numerical stability trick: subtract max(logits)

Tool / Product · Mentioned in 1 video

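Subtracting each row's maximum from the logits before exponentiation, so that exp never receives a large positive argument and cannot overflow. A softmax is unchanged by a constant per-row shift, so the backward contribution through the max is zero analytically and small/near-zero in floating point; this is discussed in detail in the video.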
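A minimal sketch of the trick in plain Python, assuming the logits feed a row-wise softmax (the entry itself only mentions exponentiation). The helper names `softmax_row` and `stable_softmax` are hypothetical and chosen for illustration; they are not taken from the video.

```python
import math

def softmax_row(row, shift):
    """Softmax of one row after subtracting `shift` from every logit."""
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    """Row-wise softmax using each row's own max as the shift."""
    return [softmax_row(row, max(row)) for row in logits]

logits = [[1000.0, 1001.0, 1002.0],
          [-2.0, 0.0, 3.0]]

# Without the shift, math.exp(1000.0) raises OverflowError.
try:
    softmax_row(logits[0], 0.0)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

# With the shift, the largest exponent is exp(0) = 1, so nothing overflows.
stable = stable_softmax(logits)

# The result does not depend on which constant is subtracted, which is
# why the gradient flowing back through the max is (near) zero.
p_max = softmax_row(logits[1], max(logits[1]))
p_other = softmax_row(logits[1], max(logits[1]) - 1.0)
```

Here `p_max` and `p_other` agree up to rounding error. The max is a convenient choice of shift rather than a required one: it guarantees every exponent is at most zero.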