Simulation with NCCL-based MPI (the fastest training)

If your cross-GPU bandwidth is high (e.g., InfiniBand, NVLink, or EFA), we suggest using this NCCL-based MPI FL simulator to accelerate your development.
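
To illustrate why interconnect bandwidth matters, the sketch below simulates in plain Python the kind of collective operation that NCCL accelerates: at each aggregation step every worker contributes its full set of model weights to an all-reduce, so the traffic grows with model size. This is not the simulator's code. It assumes a FedAvg-style weighted average, uses no GPUs, NCCL, or MPI, and the function names are hypothetical.

```python
# Illustrative only: a pure-Python stand-in for an all-reduce-based
# FedAvg aggregation step. All names here are hypothetical.


def all_reduce_sum(per_worker_vectors):
    """Element-wise sum across workers, as an all-reduce(SUM) would produce."""
    return [sum(values) for values in zip(*per_worker_vectors)]


def fedavg_via_all_reduce(worker_weights, worker_sample_counts):
    """Weighted average of worker models using two simulated all-reduces."""
    # Each worker scales its weights by its local sample count.
    scaled = [
        [w * n for w in weights]
        for weights, n in zip(worker_weights, worker_sample_counts)
    ]
    # First all-reduce: sum the scaled weights across workers.
    summed = all_reduce_sum(scaled)
    # Second all-reduce: sum the sample counts across workers.
    total = all_reduce_sum([[n] for n in worker_sample_counts])[0]
    return [s / total for s in summed]


if __name__ == "__main__":
    weights = [[1.0, 2.0], [3.0, 4.0]]  # two workers, two parameters each
    counts = [1, 3]                     # local sample counts per worker
    print(fedavg_via_all_reduce(weights, counts))
```

In a real deployment the two simulated sums would be collective calls over the GPU interconnect, which is why a fast link such as InfiniBand or NVLink pays off most for large models.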