27-09-2024 17:15
via
gizmodo.com
Human Feedback Makes AI Better at Deceiving Humans, Study Shows
In a preprint study, researchers found that training a language model with human feedback teaches the model to generate incorrect responses that trick humans.
Read more »