14-03-2023 20:01 via techcrunch.com

With Evals, OpenAI hopes to crowdsource AI model testing

Alongside GPT-4, OpenAI has open-sourced a software framework for evaluating the performance of its AI models. Called Evals, the tooling is designed, OpenAI says, to let anyone report shortcomings in its models and help guide further improvements.
It’s a sort of crowdsourcing approach to model testing, OpenAI says.
“We use Evals to guide development of our models (both identifying shortcomings and preventing regressions), and our users can apply it for tracking performance across…”
Read more »
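
For context, here is a minimal sketch of what an eval can look like in the open-sourced framework (github.com/openai/evals): evals are typically registered as a small YAML entry that points at a JSONL file of samples, each pairing an input prompt with an ideal answer. The file name, sample contents, and eval name below are illustrative assumptions, not taken from the article.

    import json

    # Hypothetical samples for a basic "match"-style eval: each line pairs a
    # chat-formatted prompt with the answer the model is expected to produce.
    samples = [
        {
            "input": [
                {"role": "system", "content": "Answer with the capital city only."},
                {"role": "user", "content": "What is the capital of France?"},
            ],
            "ideal": "Paris",
        },
        {
            "input": [
                {"role": "system", "content": "Answer with the capital city only."},
                {"role": "user", "content": "What is the capital of Japan?"},
            ],
            "ideal": "Tokyo",
        },
    ]

    # Write one JSON object per line; the file name is an assumption for illustration.
    with open("capitals.jsonl", "w") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")

Once a samples file like this is registered under a name in the framework's YAML registry, it can be run against a model from the command line with the repo's oaieval tool, and the resulting pass/fail records are the kind of feedback OpenAI hopes users will contribute back.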