Distributed Data Parallel

Concept

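PyTorch's multi-process implementation of data parallelism (torch.nn.parallel.DistributedDataParallel). Each process holds a full replica of the model and works on its own shard of the batch; gradients are averaged across all processes with an all-reduce that overlaps with the backward pass, so every replica applies the same update.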
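To make the gradient-averaging idea concrete, here is a minimal sketch in plain Python that simulates the processes inside a single program. It does not use PyTorch or any real communication backend. The one-parameter model, the toy data, and the helper names (local_gradient, all_reduce_mean, simulate_step) are assumptions made for illustration only. It shows that, with equal-sized shards, averaging the per-process gradients gives the same gradient as one pass over the full batch.

```python
def local_gradient(w, xs, ys):
    """Gradient of the mean squared error for y ~ w * x on one shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def all_reduce_mean(values):
    """Hypothetical stand-in for an all-reduce that averages one value per process."""
    return sum(values) / len(values)


def simulate_step(w, xs, ys, world_size, lr):
    """One data-parallel SGD step with `world_size` simulated processes."""
    shard = len(xs) // world_size  # assumes the batch divides evenly
    grads = []
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        # Each simulated process sees only its own slice of the batch.
        grads.append(local_gradient(w, xs[lo:hi], ys[lo:hi]))
    avg_grad = all_reduce_mean(grads)  # every replica ends up with this value
    return w - lr * avg_grad, avg_grad


# Toy data generated by y = 2 * x (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w0 = 0.0

new_w, avg_grad = simulate_step(w0, xs, ys, world_size=2, lr=0.01)
full_grad = local_gradient(w0, xs, ys)  # single-process reference

print(avg_grad, full_grad, new_w)
```

Because every replica starts from the same weights and applies the same averaged gradient, the replicas stay in sync without ever exchanging the weights themselves. In real PyTorch code the DistributedDataParallel wrapper performs this averaging for you across separate processes.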
