Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly

Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call “evaluation awareness,” raises fundamental questions about whether the safety tests that governments and companies rely on […]
This story continues at The Next Web