Hi all,
Currently, all the examples we have are about running a single training job in the system.
I'm wondering whether there is a way to simulate a multi-tenant scenario in the network.
For example, running two independent ResNet training jobs, one on hosts 1-8 and the other on hosts 9-16.
Is it possible to generate two separate 8-host workloads, rename them, and combine them in one folder? If so, can the correctness of the run be guaranteed?
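To make the idea concrete, here is a rough sketch of the rename-and-combine step I have in mind. It assumes a hypothetical naming scheme of one file per host, `<prefix>.<rank>.<ext>` with ranks starting at 0 in each job; the function name `merge_workloads` and the file names are only for illustration and do not come from any tool's documented API. It only renames the files. Any host ids stored inside the files are left untouched, which is exactly why I'm unsure about correctness.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming scheme: one file per host, e.g. "resnet.3.et".
RANK_PATTERN = re.compile(r"^(?P<prefix>.+)\.(?P<rank>\d+)\.(?P<ext>\w+)$")


def merge_workloads(job_dirs, out_dir):
    """Copy per-host files from each job into out_dir, shifting rank ids.

    The first job keeps ranks 0..n-1, the second gets n..n+m-1, and so on.
    Returns a dict mapping each source path to its new file name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    offset = 0
    mapping = {}
    for job_dir in job_dirs:
        # Collect the files that match the assumed naming scheme.
        matched = []
        for path in Path(job_dir).iterdir():
            m = RANK_PATTERN.match(path.name)
            if m:
                matched.append((int(m["rank"]), m, path))
        matched.sort(key=lambda item: item[0])
        for rank, m, path in matched:
            new_name = f"{m['prefix']}.{rank + offset}.{m['ext']}"
            shutil.copy(path, out / new_name)
            mapping[str(path)] = new_name
        # The next job starts after the hosts used by this one.
        offset += len(matched)
    return mapping
```

With two 8-host jobs, `merge_workloads(["job_a", "job_b"], "combined")` would leave 16 files in `combined`, with the second job's files renumbered to ranks 8-15.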
The reason I ask is that we sometimes want to study the behavior of the network when multiple training jobs are running concurrently.
I'd appreciate hearing about anyone's experience with this problem.
Best wishes,
Yuze