I think the problem comes from the range that was set for each parameter at Step 2.
When using parallel processing, you will notice that some cores run to the end while others do not. The cores that stop are the ones where at least one sampled parameter value is out of bounds. To solve the problem, open your workingFolder/output and then "CurrentSimulationReport.log". From there you can determine which core(s) could not complete its simulation. Open the corresponding TxtInOut_x folder, then calibration.cal; there you will see the values sampled for each parameter. You can use your experience to spot the parameter(s) and value(s) that are out of bounds, or you can delete parameters (the whole line) one at a time and run the model after each deletion using the executable that is already in the TxtInOut_x folder. It will keep failing until the out-of-bounds parameter(s) has been removed. Once you have identified the out-of-bounds parameter(s), adjust the value until SWAT (the executable) runs properly. Then readjust the range and restart the process at Step 2.
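If you would rather not delete the lines by hand, the trial-and-error search above can be sketched in Python. This is only a rough sketch under several assumptions of mine, not something that ships with the tool: the function names are made up, I assume the executable returns a non-zero exit code when it fails, and I assume the first few lines of calibration.cal are header lines that must be kept unchanged (check your own file, including whether it stores a parameter count that needs updating). The second pass puts each deleted line back to check that it really is a culprit.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def find_out_of_bound_lines(
    param_lines: Sequence[str],
    runs_ok: Callable[[List[str]], bool],
) -> List[str]:
    """Return the parameter lines that prevent the model from running.

    runs_ok(lines) must return True when the model runs with only `lines`.
    If the model fails for an unrelated reason, every line will be flagged.
    """
    active = [True] * len(param_lines)

    def current() -> List[str]:
        # Lines still present in the file, in their original order.
        return [line for line, keep in zip(param_lines, active) if keep]

    # Pass 1: delete lines one at a time until the model runs.
    removed = []
    for i in range(len(param_lines)):
        if runs_ok(current()):
            break
        active[i] = False
        removed.append(i)

    # Pass 2: restore each deleted line; it is a culprit only if the
    # model fails again once the line is back.
    culprits = []
    for i in removed:
        active[i] = True
        if not runs_ok(current()):
            active[i] = False
            culprits.append(param_lines[i])
    return culprits


def make_runs_ok(txtinout_dir: str, exe_name: str,
                 header: Sequence[str]) -> Callable[[List[str]], bool]:
    """Build a runs_ok function that rewrites calibration.cal and runs SWAT.

    Assumptions (hypothetical, adapt to your setup): exe_name is the
    executable inside txtinout_dir, header holds the lines to keep
    verbatim, and a failed run gives a non-zero exit code.
    """
    folder = Path(txtinout_dir)
    cal_file = folder / "calibration.cal"

    def runs_ok(lines: List[str]) -> bool:
        cal_file.write_text("\n".join(list(header) + list(lines)) + "\n")
        result = subprocess.run([str(folder / exe_name)], cwd=folder)
        return result.returncode == 0

    return runs_ok
```

To use it, read calibration.cal from the failing TxtInOut_x folder (keep a backup copy first), split it into header lines and parameter lines, and call find_out_of_bound_lines(param_lines, make_runs_ok(folder, exe_name, header)). Each check is a full model run, so this can take a while.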
In my case, the out-of-bounds parameter was bd.sol.
I hope this helps, good luck.