One of the earliest large language model evaluation systems discussed by the speaker.
Stanford Online