VQA Score

Concept

A method for evaluating image-text alignment: a multimodal large language model (MLLM) is asked a yes/no question about whether an image shows the content of the text prompt, and the probability it assigns to the 'yes' token is used directly as the alignment score.
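The scoring step can be sketched as below. This is a minimal illustration, not a reference implementation. It assumes you already have the MLLM's logits for the first answer token and know the vocabulary id of 'yes'. The question template, the toy logits and the token id are all hypothetical placeholders, and no real model is loaded.

```python
import math
from typing import Sequence

# Hypothetical question template; the exact wording used in practice may differ.
QUESTION_TEMPLATE = 'Does this figure show "{prompt}"? Please answer yes or no.'


def build_question(prompt: str) -> str:
    """Wrap the text prompt in a yes/no question to pose to the MLLM."""
    return QUESTION_TEMPLATE.format(prompt=prompt)


def vqa_score(next_token_logits: Sequence[float], yes_token_id: int) -> float:
    """Return P('yes') given the MLLM's logits for the first answer token.

    Applies a numerically stable softmax over the whole vocabulary
    (subtracting the max logit before exponentiating) and reads off
    the probability at the 'yes' token id.
    """
    max_logit = max(next_token_logits)
    exps = [math.exp(x - max_logit) for x in next_token_logits]
    return exps[yes_token_id] / sum(exps)


if __name__ == "__main__":
    # Toy 4-token vocabulary with made-up logits; id 0 stands in for 'yes'.
    print(build_question("a red cube on a blue table"))
    print(vqa_score([2.0, 0.5, -1.0, 0.0], yes_token_id=0))
```

Because the score is a probability, it falls in [0, 1] and can be compared across image-prompt pairs without choosing a threshold.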

Mentioned in 1 video

Videos Mentioning VQA Score

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 7 - Evaluation

Stanford Online
