Adam optimizer

Concept

Adam (adaptive moment estimation) is the optimization algorithm used for training in the lecture, where it is recommended as the default choice for training transformers.
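
For readers who want to see what an Adam update does, here is a minimal sketch in plain Python. It is not taken from the lecture: the function names `adam_step` and `minimize_quadratic` are hypothetical, the toy objective f(x) = (x - 3)^2 is an assumption chosen for illustration, and the hyperparameter defaults are the commonly cited ones rather than values stated in the source.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to lists of floats (illustrative helper, not a library API).

    params, grads, m, v are equal-length lists; t is the 1-based step count.
    Returns the new (params, m, v) without mutating the inputs.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step, scaled by the running gradient magnitude.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=500, lr=0.1):
    """Minimise the assumed toy objective f(x) = (x - 3)^2 starting from x = 0."""
    params, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grads = [2.0 * (params[0] - 3.0)]  # derivative of (x - 3)^2
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params[0]


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the first step the bias-corrected ratio is roughly the sign of the gradient, so the parameter moves by about `lr` regardless of the gradient's scale. In practice one would normally use a framework's built-in Adam implementation rather than a hand-written loop like this.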

Mentioned in 2 videos