CAIS and Scale AI Unveil Results of “Humanity's Last Exam,” a Groundbreaking New Benchmark

This was originally published on post
The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark that was designed to test the limits of AI knowledge and whether the models are capable of chain-of-thought reasoning. The results demonstrated a significant improvement from the reasoning capabilities of earlier models, but current models still were only able to answer fewer than 10 percent of the expert questions correctly.