FSDP

Software / App

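Fully Sharded Data Parallel (FSDP), a memory-saving distributed training technique that shards model parameters, gradients, and optimizer states across GPUs so that each GPU stores only a fraction of the training state. PyTorch provides a widely used implementation.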
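To make the memory saving concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of any FSDP API: the function name `per_gpu_state_bytes` and its defaults are hypothetical, and the byte counts assume mixed-precision training with Adam (2 bytes per parameter, 2 bytes per gradient, and 12 bytes of optimizer state per parameter). It ignores activations, communication buffers, and the temporarily gathered parameters FSDP needs during forward and backward passes, so real usage will be higher.

```python
def per_gpu_state_bytes(
    num_params: int,
    world_size: int,
    bytes_per_param: int = 2,       # assumption: fp16/bf16 parameters
    bytes_per_grad: int = 2,        # assumption: fp16/bf16 gradients
    bytes_optimizer: int = 12,      # assumption: Adam with fp32 master copy, momentum, variance
    sharded: bool = True,
) -> float:
    """Estimate bytes of parameters + gradients + optimizer state held by one GPU.

    With sharded=False every GPU keeps a full replica (plain data parallelism).
    With sharded=True the state is split evenly across world_size GPUs.
    """
    if num_params < 0 or world_size < 1:
        raise ValueError("num_params must be >= 0 and world_size must be >= 1")
    total = num_params * (bytes_per_param + bytes_per_grad + bytes_optimizer)
    return total / world_size if sharded else float(total)


if __name__ == "__main__":
    n = 7_000_000_000  # a hypothetical 7B-parameter model
    replicated = per_gpu_state_bytes(n, world_size=8, sharded=False)
    sharded = per_gpu_state_bytes(n, world_size=8, sharded=True)
    print(f"replicated: {replicated / 1e9:.0f} GB per GPU")  # replicated: 112 GB per GPU
    print(f"sharded:    {sharded / 1e9:.0f} GB per GPU")     # sharded:    14 GB per GPU
```

Under these assumptions, sharding across 8 GPUs cuts the per-GPU training state from about 112 GB to about 14 GB, which is why the technique is described as memory-saving.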

Mentioned in 2 videos
