Top AI models fail spectacularly when faced with slightly altered medical questions

Artificial intelligence has dazzled with its test scores on medical exams, but a new study suggests this success may be superficial. When answer choices were modified, AI performance dropped sharply, raising questions about whether these systems truly understand what they’re doing.