AdamW

Concept

The optimizer chosen for training: AdamW (Adam with decoupled weight decay), with betas and epsilon set following the GPT-3 paper's recommendations.
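
As a rough illustration of what these settings control, here is a minimal sketch of one AdamW update for a single scalar parameter. The beta and epsilon values (0.9, 0.95, 1e-8) are the ones reported for GPT-3 training; the function name `adamw_step` and the default learning rate and weight decay are assumptions made for this example, not values taken from the videos.

```python
import math

# Betas and epsilon as reported for GPT-3 training.
BETA1, BETA2, EPS = 0.9, 0.95, 1e-8


def adamw_step(param, grad, m, v, t, lr=3e-4, weight_decay=0.1):
    """One AdamW update for a scalar parameter.

    t is the 1-based step count. lr and weight_decay are placeholder
    defaults (assumptions for this sketch), not values from the source.
    Returns the updated (param, m, v).
    """
    # Exponential moving averages of the gradient and its square.
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad

    # Bias correction for the zero-initialized moving averages.
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)

    # Decoupled weight decay: shrink the parameter directly instead of
    # adding an L2 term to the gradient.
    param = param - lr * weight_decay * param

    # Adam step scaled by the second-moment estimate.
    param = param - lr * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


# Usage: first step from zero-initialized moment estimates.
p, m, v = adamw_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1,
                     lr=0.1, weight_decay=0.0)
```

Because the decay is applied to the parameter rather than folded into the gradient, it still shrinks the weight when the gradient is zero, which is the main behavioral difference from Adam with L2 regularization.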

Mentioned in 2 videos