Discovering Language Model Behaviors with Model-Written Evaluations

A paper that investigates how the behaviors of large language models, including their political leanings and stated desires, change as the models scale.