VQA Score

Concept

A method for evaluating image-text alignment: a multimodal large language model (MLLM) is asked a yes/no question about whether the image shows the content of the prompt, and the probability it assigns to the 'yes' token is used directly as the alignment score.
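
The scoring step can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the question wording in `build_question`, the helper names, and the toy three-token vocabulary are all assumptions made for the example. In practice the logits would come from an MLLM's first answer-token prediction given the image and the question.

```python
import math

def build_question(prompt: str) -> str:
    # Hypothetical question template; the exact wording is an assumption.
    return f'Does this image show "{prompt}"? Please answer yes or no.'

def softmax(logits):
    # Numerically stable softmax over a list of raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def vqa_score(first_token_logits, yes_token_id):
    # The score is the probability mass the model places on the 'yes' token
    # as the first token of its answer.
    return softmax(first_token_logits)[yes_token_id]

# Toy stand-in for a model's output: a three-token vocabulary and made-up logits.
toy_vocab = {"yes": 0, "no": 1, "maybe": 2}
toy_logits = [2.0, 0.5, -1.0]

question = build_question("a cat sitting on a red sofa")
score = vqa_score(toy_logits, toy_vocab["yes"])
print(question)
print(f"VQA score (P('yes')): {score:.3f}")
```

Because the score is a probability rather than a hard yes/no decision, it gives a continuous value between 0 and 1 that can be used to rank several images against the same prompt.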

Mentioned in 1 video