FSDP

Software / App

Fully Sharded Data Parallelism, a memory-saving technique that shards parameters, gradients, and optimizer states across GPUs, often used in PyTorch.

Mentioned in 1 video