These researchers used NPR Sunday Puzzle questions to benchmark AI 'reasoning' models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.