SWAG
Software / App
Mentioned in 1 video
Another benchmark that appears to suffer from the 'first character equals final answer' issue, similar to the one found in MMLU.