Can AI sandbag safety checks to sabotage users? Yes, but not very well — for now

AI companies claim to have robust safety checks in place to ensure that models don’t say or do weird, illegal, or unsafe stuff. But what if the models