AI Evaluation
8 video summaries
Videos About AI Evaluation

Why SweetBench slipped through #substack #shorts
Latent Space

GPT 4.1: The New OpenAI Workhorse
Latent Space

In the Arena: How LMSys changed LLM Benchmarking Forever
Latent Space
[Lightning Pod] Evals: How to Improve AI Consistently — with Hamel Husain and Shreya Shankar
Latent Space

AI Dev 26 x SF | Ara Khan: Evals Are Broken Use Them Anyway
DeepLearningAI

AI Dev 26 x SF | Adit Abraham: Better Agents with Better Data
DeepLearningAI

Is AI About to “Eat Everything”? (It’s Not.)
Cal Newport

How Intelligent Is AI, Really?
Y Combinator