A paper critiquing how LMArena tests models and ranks them on its leaderboard.
Latent Space