The optimizer chosen for training is an AdamW variant, with betas and epsilon set to the values recommended in the GPT-3 guidance.
Andrej Karpathy
Latent Space
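
As a rough illustration of what those settings mean in practice, here is a minimal pure-Python sketch of a single AdamW update. The betas and epsilon are the Adam values reported in the GPT-3 paper (beta1 = 0.9, beta2 = 0.95, eps = 1e-8). Everything else is an assumption made for this example and is not taken from the source: the function name adamw_step, the learning rate, the weight decay, and the toy parameters and gradients.

```python
import math

# Adam hyperparameters as reported in the GPT-3 paper.
BETA1, BETA2, EPS = 0.9, 0.95, 1e-8


def adamw_step(params, grads, m, v, step, lr=6e-4, weight_decay=0.1):
    """Apply one AdamW update in place to plain lists of floats.

    lr and weight_decay are illustrative assumptions, not values from the source.
    step is the 1-based update count, used for bias correction.
    """
    bias1 = 1.0 - BETA1 ** step
    bias2 = 1.0 - BETA2 ** step
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly,
        # instead of adding an L2 term to the gradient.
        params[i] -= lr * weight_decay * params[i]
        # Exponential moving averages of the gradient and its square.
        m[i] = BETA1 * m[i] + (1.0 - BETA1) * g
        v[i] = BETA2 * v[i] + (1.0 - BETA2) * g * g
        # Bias-corrected estimates, then the Adam step.
        m_hat = m[i] / bias1
        v_hat = v[i] / bias2
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + EPS)
    return params


# Toy example: two parameters, one update.
params = [1.0, -2.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
adamw_step(params, grads=[0.5, -0.5], m=m, v=v, step=1)
```

In a PyTorch training script the same choice would typically be expressed by passing betas=(0.9, 0.95) and eps=1e-8 to torch.optim.AdamW; the hand-written version above is only meant to show where each value enters the update.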