Can a LLM be a Fair Judge? Practical Model Evaluation
Overview
LLM-as-a-Judge is rapidly becoming a practical default for evaluating LLM applications because it scales human-like judgments far beyond what manual QA can handle, while supporting both reference-free and reference-based scoring as well as pairwise comparisons. The rapid adoption of Large Language Models in enterprise applications has created a critical challenge for effective and efficient evaluation. This session introduces and demystifies the LLM-as-a-Judge paradigm, a powerful and practical solution for automated LLM evaluation.
I will discuss a production-ready path from rubrics to ROI, covering how to define criteria that matter for the product and how to choose the right judging mode (single-output vs. pairwise). We will also look at how automated evaluation can accelerate iteration without losing fidelity to human expectations. Attendees will learn the foundational principles and practical steps for implementing this system, including constructing effective evaluation prompts, designing robust rubrics and scoring scales, and choosing the right "judge" model for the task.
Speaker Bio
Kaushik Holla is a Senior Data Scientist at Red Ventures, where he leads cross-functional efforts to design and deploy production AI for large customer operations, emphasizing experimentation, MLOps, and reliability at scale. Previously at Asurion, Kaushik guided ML programs for sales enablement and service experience, partnering with product, engineering, and operations to deliver models that improve customer outcomes.
Beyond industry roles, Kaushik is a Red Ventures Hackathon 2025 winner, former President of Northeastern’s Data Science Club, and a frequent Medium contributor. He holds a Master's degree in Data Science from Northeastern University, where he was awarded the Khoury Graduate Fellowship (2020). His technical interests include applied NLP, recommender systems, and GenAI safety/observability.
Good to know
Highlights
- 1 hour
- Online
Location
Online event
Organized by
Analytics & Big Data Society