Let me describe the problem at a high level, and I can provide any additional information needed. I have two hosts, host1 and host2. If I list host1 on the first line of the host file, the workers start on both hosts but abort on host2. If I switch the order so that host2 is first and host1 is second, the workers abort on host1 instead. I don't see any error messages, and in both cases I'm launching the process from host1. In other words, the workers only run to completion on the first host listed in the host file and abort on the other host.