AI Evaluation

9 video summaries

Build a research pod on AI Evaluation.

9 videos curated. Save them to your own pod, ask any question across the body of expert opinion, and connect it to Claude or ChatGPT.

Videos About AI Evaluation

Why SweetBench slipped through #substack #shorts

Latent Space

GPT 4.1: The New OpenAI Workhorse

Latent Space

In the Arena: How LMSys changed LLM Benchmarking Forever

Latent Space

[Lightning Pod] Evals: How to Improve AI Consistently — with Hamel Husain and Shreya Shankar

Latent Space

AI Dev 26 x SF | Ara Khan: Evals Are Broken Use Them Anyway

DeepLearningAI

AI Dev 26 x SF | Adit Abraham: Better Agents with Better Data

DeepLearningAI

Cooking with OpenAI’s Research Chief: AGI, o1, Evals, and Scaling Laws — Mark Chen

Latent Space

Is AI About to “Eat Everything”? (It’s Not.)

Cal Newport

How Intelligent Is AI, Really?

Y Combinator