Vending Bench
Software / App
A benchmark in which AI models run a simulated business to maximize profit, cited as a way to evaluate AI capabilities in terms of monetary value.
Mentioned in 2 videos
Videos Mentioning Vending Bench

The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals
Latent Space
Mentioned as another evaluation approach, one that frames AI capabilities in terms of monetary value.

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies
AI Explained
A benchmark where AI models run a simulated business to maximize profit. GPT-5.5 outperformed Opus 4.7 in this simulation.