VQA Score
Concept
A method for evaluating image-text alignment: a multimodal large language model (MLLM) is asked a yes/no question about whether the image shows the content of the prompt, and the probability it assigns to the 'yes' token is used directly as the alignment score.
Mentioned in 1 video
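The scoring step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The question template, the helper names `build_question` and `yes_probability`, and the toy logits are all assumptions made for the example; a real MLLM would produce the next-token logits from the image and the question.

```python
import math


def build_question(prompt: str) -> str:
    # Hypothetical template: the exact wording of the question is an assumption.
    return f'Does this image show "{prompt}"? Please answer yes or no.'


def yes_probability(next_token_logits: dict[str, float], yes_token: str = "yes") -> float:
    # Softmax over the next-token logits, shifted by the max for numerical stability.
    max_logit = max(next_token_logits.values())
    exps = {tok: math.exp(logit - max_logit) for tok, logit in next_token_logits.items()}
    total = sum(exps.values())
    # The score is the probability mass assigned to the 'yes' token.
    return exps[yes_token] / total


# Toy logits standing in for an MLLM's output at the first answer position.
toy_logits = {"yes": 2.0, "no": 0.0, "maybe": -1.0}

question = build_question("a red bicycle leaning against a wall")
score = yes_probability(toy_logits)
print(question)
print(f"VQA Score: {score:.4f}")
```

Because the score is a probability rather than a hard yes/no answer, it stays continuous in the range 0 to 1, which makes it possible to rank several images against the same prompt.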