LLM Evaluation

9 video summaries

Videos About LLM Evaluation

Stanford CS547 HCI Seminar | Winter 2026 | Creation, Evolution, and Formalization of Notations

Stanford Online

Beating OpenAI and Anthropic by Looking At Data: the new #1 on SWE-Bench w/ W&B CTO Shawn Lewis

Latent Space

[Paper Club] Who Validates the Validators? Aligning LLM-Judges with Humans (w/ Eugene Yan)

Latent Space

Production AI Engineering starts with Evals

Latent Space

[Paper Club] DocETL: Agentic Query Rewriting + Eval for Complex Document Processing w Shreya Shankar

Latent Space

How to train a Million Context LLM — with Mark Huang of Gradient.ai

Latent Space

Your Ground Truth Is Wrong: Evaluating STT with truth files & semantic WER | AssemblyAI Workshop

AssemblyAI

RAG is a hack - with Jerry Liu of LlamaIndex

Latent Space

Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind

Latent Space