AI Evaluation
5 video summaries
Videos About AI Evaluation

Why SweetBench slipped through #substack #shorts
Latent Space

GPT 4.1: The New OpenAI Workhorse
Latent Space

In the Arena: How LMSys changed LLM Benchmarking Forever
Latent Space

[Lightning Pod] Evals: How to Improve AI Consistently — with Hamel Husain and Shreya Shankar
Latent Space

How Intelligent Is AI, Really?
Y Combinator