AI groups rush to redesign model testing and create new benchmarks

Rapidly advancing technology is outpacing current methods of evaluating and comparing large language models