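DDP, FSDP, and DeepSpeed ZeRO all distribute training across GPUs but solve different problems. DDP keeps a full copy of the model, its gradients, and its optimizer state on every GPU and only splits the data. FSDP and DeepSpeed ZeRO shard some or all of that state across GPUs to reduce per-GPU memory.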
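To make the memory difference concrete, here is a minimal sketch of per-GPU memory for model states. It assumes mixed-precision Adam with the byte accounting used in the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state. It ignores activations and communication buffers. The function `per_gpu_bytes` and the strategy labels are hypothetical illustrations, not a DeepSpeed or PyTorch API. Treating FSDP's full sharding as equivalent to ZeRO stage 3 is an approximation.

```python
# Hypothetical helper: per-GPU bytes for model states (weights, grads,
# optimizer state) under mixed-precision Adam. Assumed accounting:
# 2 B/param fp16 weights, 2 B/param fp16 grads, K B/param optimizer state.
# Activations, temporary buffers, and fragmentation are ignored.

K = 12  # assumption: fp32 master weights + momentum + variance (4 B each)


def per_gpu_bytes(num_params: int, num_gpus: int, strategy: str) -> float:
    p, n = num_params, num_gpus
    if strategy == "ddp":
        # Everything replicated on every GPU.
        return (2 + 2 + K) * p
    if strategy == "zero1":
        # Optimizer state sharded; weights and grads replicated.
        return (2 + 2) * p + K * p / n
    if strategy == "zero2":
        # Optimizer state and gradients sharded; weights replicated.
        return 2 * p + (2 + K) * p / n
    if strategy in ("zero3", "fsdp_full_shard"):
        # Weights, gradients, and optimizer state all sharded.
        return (2 + 2 + K) * p / n
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    params = 7_500_000_000  # 7.5B-parameter model (illustrative)
    gpus = 64
    for s in ("ddp", "zero1", "zero2", "zero3"):
        gb = per_gpu_bytes(params, gpus, s) / 1e9
        print(f"{s:>6}: {gb:7.1f} GB per GPU")
```

For 7.5B parameters on 64 GPUs this gives about 120 GB per GPU for DDP, 31.4 GB for stage 1, 16.6 GB for stage 2, and 1.9 GB for stage 3. The memory savings from full sharding come at the cost of all-gathering parameters during the forward and backward passes.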