LLM Evaluation
9 video summaries
Videos About LLM Evaluation

Stanford CS547 HCI Seminar | Winter 2026 | Creation, Evolution, and Formalization of Notations
Stanford Online

Beating OpenAI and Anthropic by Looking At Data: the new #1 on SWE-Bench w/ W&B CTO Shawn Lewis
Latent Space

[Paper Club] Who Validates the Validators? Aligning LLM-Judges with Humans (w/ Eugene Yan)
Latent Space

Production AI Engineering starts with Evals
Latent Space

[Paper Club] DocETL: Agentic Query Rewriting + Eval for Complex Document Processing w Shreya Shankar
Latent Space

How to train a Million Context LLM — with Mark Huang of Gradient.ai
Latent Space

Your Ground Truth Is Wrong: Evaluating STT with truth files & semantic WER | AssemblyAI Workshop
AssemblyAI

RAG is a hack - with Jerry Liu of LlamaIndex
Latent Space

Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind
Latent Space