LMSYS Chatbot Arena
Software / App
A platform for evaluating chatbot performance through crowdsourced human preferences, used to assess LLMs like Llama 3.
Mentioned in 2 videos
Videos Mentioning LMSYS Chatbot Arena

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI
Latent Space
A platform for evaluating chatbot performance through crowdsourced human preferences, used to assess LLMs like Llama 3.

Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind
Latent Space
A platform for evaluating LLMs whose prompts are often single-turn, in contrast to real-world usage, which is typically multi-turn.