G (approximate GELU)

Study / Research

The G nonlinearity (approximate GELU) is the activation function used by GPT-2. The video explains the exact and approximate forms of GELU and the historical reasons for using the approximate one.
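As a rough illustration of the two forms, the sketch below compares exact GELU (written with the error function) against the tanh-based approximation. It assumes the commonly cited definitions; the function names `gelu_exact` and `gelu_tanh` are hypothetical and not taken from the video or from any particular library.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the approximate form)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both forms side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        exact = gelu_exact(x)
        approx = gelu_tanh(x)
        print(f"x={x:+.1f}  exact={exact:+.6f}  tanh={approx:+.6f}  diff={abs(exact - approx):.2e}")
```

The two curves stay very close over typical activation ranges, so the choice between them is usually about matching a reference implementation rather than about model quality.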

Mentioned in 1 video