
GSM8K

Study / Research
Mentioned in 1 video

A benchmark of grade-school-level math word problems (the name stands for "Grade School Math 8K") used to test AI mathematical reasoning. It was later found to have errors in its original design.