MIT researchers found that vision-language models, widely used to analyze medical images, do not understand negation words like “no” and “not.” This could cause them to fail unexpectedly when asked to retrieve medical images that contain certain objects but not others.
Study shows vision-language models can’t handle queries with negation words
