Winner's curse

Concept

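A phenomenon in which the measured performance of a model selected as the best of several candidates is overstated: picking the top scorer also picks the candidate whose score benefited most from favorable statistical fluctuations, so its score tends to fall on re-evaluation. LMSys's live benchmark addresses this concern.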
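The effect can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of how LMSys's benchmark works: the function name `simulate_winners_curse` and all parameter values (20 candidate models, 200 test examples, a shared true accuracy of 0.70) are hypothetical choices made for illustration. Every candidate is equally good, so any gap between the winner's score and its score on fresh data comes from selection alone.

```python
import random
from statistics import mean


def noisy_accuracy(rng, true_acc, n_examples):
    """Measured accuracy on n_examples independent pass/fail test items."""
    hits = sum(rng.random() < true_acc for _ in range(n_examples))
    return hits / n_examples


def simulate_winners_curse(n_models=20, n_examples=200, true_acc=0.70,
                           n_trials=200, seed=0):
    """Return (mean score of the selected winner, mean score on fresh data).

    All candidates share the same true accuracy (an illustrative assumption),
    so the winner is chosen purely on evaluation noise.
    """
    rng = random.Random(seed)
    winner_scores = []
    fresh_scores = []
    for _ in range(n_trials):
        # Evaluate every candidate once and keep the best observed score.
        scores = [noisy_accuracy(rng, true_acc, n_examples)
                  for _ in range(n_models)]
        winner_scores.append(max(scores))
        # Re-evaluate the winner on new data. Its true accuracy is unchanged,
        # so this is just another independent noisy measurement.
        fresh_scores.append(noisy_accuracy(rng, true_acc, n_examples))
    return mean(winner_scores), mean(fresh_scores)


if __name__ == "__main__":
    selected, fresh = simulate_winners_curse()
    print(f"winner's score at selection time: {selected:.3f}")
    print(f"winner's score on fresh data:     {fresh:.3f}")
```

In this sketch the score at selection time sits noticeably above the true accuracy, while the fresh-data score stays close to it. Increasing `n_examples` or reducing `n_models` shrinks the gap, which is why re-evaluating a selected model on data not used for selection gives a less biased estimate.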
