S-bench
Study / Research
Mentioned in 2 videos
A benchmark for AI models on which Co-op Labs' Genie model, a fine-tune of GPT-4o, achieved high scores. Genie's reasoning traces were withheld, a competitive choice that OpenAI later made as well.
Videos Mentioning S-bench

Building AGI in Real Time (OpenAI Dev Day 2024)
Latent Space
A benchmark for AI models on which Co-op Labs' Genie model, a fine-tune of GPT-4o, achieved high scores. Genie's reasoning traces were withheld, a competitive choice that OpenAI later made as well.

Building AGI with OpenAI's Structured Outputs API
Latent Space
An evaluation benchmark for LLMs that targets code writing and file manipulation capabilities, noted for its low pass rate and relevance to model assessment.