MT-Bench
Software / App
A static benchmark developed by LMSys, inspired by Chatbot Arena, for evaluating LLMs on multi-turn conversations.
Mentioned in 2 videos
Videos Mentioning MT-Bench

In the Arena: How LMSys changed LLM Benchmarking Forever
Latent Space

The Origin and Future of RLHF: the secret ingredient for ChatGPT - with Nathan Lambert
Latent Space
An academic leaderboard for evaluating multi-turn chat capabilities, in which GPT-4 acts as a judge and scores both the initial and the follow-up responses.