AIs Deceive Human Evaluators. And We’re Probably Not Freaking Out Enough

AI models have disobeyed researchers, attempted to escape containment and even lied about following rules. What happens next?
Continue reading...