MT-Bench
Software / App
Mentioned in 2 videos
A static benchmark developed by LMSys, inspired by Chatbot Arena, for evaluating LLMs on multi-turn conversations.
Videos Mentioning MT-Bench

In the Arena: How LMSys changed LLM Benchmarking Forever
Latent Space

The Origin and Future of RLHF: the secret ingredient for ChatGPT - with Nathan Lambert
Latent Space
An academic leaderboard for evaluating multi-turn chat capabilities, in which GPT-4 acts as the judge and scores a model's initial and follow-up responses.
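
The judging scheme described above can be illustrated with a minimal sketch. This is not the LMSys implementation: the function names, the `toy_judge` stand-in, and the 1 to 10 scale are all assumptions made for illustration. In a real run the judge would be a call to GPT-4 rather than a local function.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical judge signature: (question, answer, turn_number) -> score.
# In the real benchmark the judge is GPT-4; here it is any callable.
Judge = Callable[[str, str, int], float]


def score_conversation(questions: List[str], answers: List[str], judge: Judge) -> List[float]:
    """Score every turn of one conversation (initial response and follow-ups)."""
    if len(questions) != len(answers):
        raise ValueError("each question needs exactly one answer")
    return [
        judge(question, answer, turn)
        for turn, (question, answer) in enumerate(zip(questions, answers), start=1)
    ]


def benchmark_score(conversations: List[Tuple[List[str], List[str]]], judge: Judge) -> float:
    """Average the per-turn scores over all conversations."""
    all_scores = [
        score
        for questions, answers in conversations
        for score in score_conversation(questions, answers, judge)
    ]
    return mean(all_scores)


def toy_judge(question: str, answer: str, turn: int) -> float:
    """Placeholder judge (assumption): scores by word count, clamped to 1..10."""
    return float(min(10, max(1, len(answer.split()))))


conversations = [
    (["first question", "follow-up question"],
     ["one two three", "one two three four five"]),
]
print(score_conversation(*conversations[0], toy_judge))
print(benchmark_score(conversations, toy_judge))
```

Keeping the judge as a plain callable means the aggregation logic can be checked locally before swapping in a model-backed judge.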