SWAG

Software / App · Mentioned in 1 video

Another benchmark that appears to suffer from the "first character equals final answer" issue, similar to the issues found in MMLU.