Model Evaluation

9 video summaries

Build a research pod on Model Evaluation.

9 videos curated. Save them to your own pod, ask questions across the full body of expert opinion, and connect the pod to Claude or ChatGPT.

Videos About Model Evaluation

Measuring Exponential Trends Rising (in AI) — Joel Becker, METR

Latent Space

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI

Latent Space

State of the Art: Training 70B LLMs on 10,000 H100 clusters

Latent Space

The Agent Reasoning Interface: Claude, ChatGPT Canvas, Tasks, Operator — with Karina Nguyen, OpenAI

Latent Space

⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

Latent Space

[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu

Latent Space

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 7 - Evaluation

Stanford Online

How To Read AI Research Papers Effectively

DeepLearningAI

Nuts and Bolts of Applying Deep Learning (Andrew Ng)

Lex Fridman