AI and LLMs struggle with historical accuracy in advanced tests

Leading AI systems perform poorly on nuanced historical exams, with even the best-performing system reaching only 46% accuracy.