Discovering Language Model Behaviors with Model-Written Evaluations

Paper

A paper that uses language models to write evaluation datasets, then applies those evaluations to study how large language models' behaviors, including their political leanings and stated desires, change as the models scale.
