CAIS and Scale AI Unveil Results of “Humanity's Last Exam,” a Groundbreaking New Benchmark

January 23, 2025 By admin

The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark that was designed to test the limits of AI knowledge and whether the models are capable of chain-of-thought reasoning. The results demonstrated a significant improvement from the reasoning capabilities of earlier models, but current models still were only able to answer fewer than 10 percent of the expert questions correctly.