LLM Evaluation
9 video summaries
Videos About LLM Evaluation

Stanford CS547 HCI Seminar | Winter 2026 | Creation, Evolution, and Formalization of Notations
Stanford Online

Beating OpenAI and Anthropic by Looking At Data: the new #1 on SWE-Bench w/ W&B CTO Shawn Lewis
Latent Space

[Paper Club] Who Validates the Validators? Aligning LLM-Judges with Humans (w/ Eugene Yan)
Latent Space

Production AI Engineering starts with Evals
Latent Space

[Paper Club] DocETL: Agentic Query Rewriting + Eval for Complex Document Processing w Shreya Shankar
Latent Space

How to train a Million Context LLM — with Mark Huang of Gradient.ai
Latent Space

Your Ground Truth Is Wrong: Evaluating STT with truth files & semantic WER | AssemblyAI Workshop
AssemblyAI

RAG is a hack - with Jerry Liu of LlamaIndex
Latent Space

Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind
Latent Space