An academic benchmark for evaluating AI models, particularly in multi-turn scenarios.
Mentioned in 1 video
Latent Space