FSDP
Technique
Fully Sharded Data Parallel, a memory-saving distributed training technique that shards a model's parameters, gradients, and optimizer states across GPUs; available in PyTorch as FullyShardedDataParallel.
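The sharding idea can be illustrated with a toy sketch (plain Python, not the real PyTorch API): each of N workers stores only a 1/N slice of the parameters, and an all-gather reassembles the full set when needed, so per-worker memory shrinks roughly by a factor of N.

```python
# Toy illustration of the sharding idea behind FSDP (not real FSDP):
# each worker holds only its own contiguous slice of the parameters.

def shard(params, num_workers):
    """Split a flat parameter list into num_workers contiguous shards."""
    size = (len(params) + num_workers - 1) // num_workers  # ceil division
    return [params[i * size:(i + 1) * size] for i in range(num_workers)]

def all_gather(shards):
    """Reassemble the full parameter list from every worker's shard."""
    return [p for s in shards for p in s]

params = list(range(8))      # stand-in for model parameters
shards = shard(params, 4)    # each of 4 workers holds 2 of the 8
assert all(len(s) == 2 for s in shards)
assert all_gather(shards) == params
```

Real FSDP does this for parameters, gradients, and optimizer states, gathering shards just in time for each layer's forward and backward pass.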
Mentioned in 1 video