Another option is to do your pcoord calculation separately and in parallel, independently of the workers. By this I mean having the end of `runseg.sh` submit a pcoord calculation task instead of computing the pcoord itself. That task is run by a separate pool of workers, and the result is returned to the main thread.
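Here is a minimal Python sketch of that hand-off. It assumes nothing about WESTPA's own APIs: `compute_pcoord`, `finish_segment`, and the toy trajectories are hypothetical. A `ThreadPoolExecutor` stands in for the separate pool. In practice it could be a process pool or a pool running on other nodes, since anything implementing `concurrent.futures.Executor` fits.

```python
import math
from concurrent.futures import Executor, Future, ThreadPoolExecutor


# Hypothetical stand-in for the real pcoord calculation: here it is just the
# distance of the segment's final frame from the origin.
def compute_pcoord(seg_id: int, frames: list[tuple[float, float]]) -> tuple[int, float]:
    x, y = frames[-1]
    return seg_id, math.hypot(x, y)


def finish_segment(pool: Executor, seg_id: int, frames: list[tuple[float, float]]) -> Future:
    """Hypothetical 'end of runseg' step: hand the pcoord work to the separate
    pool and return immediately instead of computing it locally."""
    return pool.submit(compute_pcoord, seg_id, frames)


def main() -> dict[int, float]:
    # Fake trajectories standing in for segment outputs.
    trajectories = {0: [(0.0, 0.0), (3.0, 4.0)], 1: [(1.0, 1.0), (6.0, 8.0)]}
    pcoords: dict[int, float] = {}
    # The separate pcoord pool; threads are used here only to keep the sketch simple.
    with ThreadPoolExecutor(max_workers=4) as pcoord_pool:
        futures = [finish_segment(pcoord_pool, sid, fr) for sid, fr in trajectories.items()]
        # The main thread gathers the results as they come back.
        for fut in futures:
            seg_id, value = fut.result()
            pcoords[seg_id] = value
    return pcoords


if __name__ == "__main__":
    print(main())  # {0: 5.0, 1: 10.0}
```

Because `finish_segment` only depends on the `Executor` interface, swapping the local pool for one backed by other nodes shouldn't require changes on the segment side.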
It sounds to me like you have two (related) needs:
- A pcoord calculation that can scale beyond a single worker's resources
- Some pcoord calculations need the results from all the workers and can't be computed from an individual trajectory. Similarly, you want to compute binning from all of the pcoord results.
For the first point: if you do each calculation within `runseg.sh` as you do now, the calculations are effectively distributed across the workers, but each one is limited to a single worker's resources. If instead `runseg.sh` hands each calculation off as a separate task, that calculation can use all the resources of whichever pool it's submitted to.
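To illustrate the first point, here is a rough sketch of a single pcoord calculation fanned out over a whole pool. The pcoord used here (minimum distance from the origin over all frames) and the helper names `chunk_min_distance`, `split`, and `parallel_min_distance` are made up for the example. The idea is only that one calculation is split into chunks, mapped across every worker, and reduced back to a single value.

```python
import math
from concurrent.futures import Executor, ThreadPoolExecutor


# Hypothetical per-chunk work: minimum distance from the origin within a chunk.
def chunk_min_distance(chunk: list[tuple[float, float]]) -> float:
    return min(math.hypot(x, y) for x, y in chunk)


def split(frames: list[tuple[float, float]], n_chunks: int) -> list[list[tuple[float, float]]]:
    # Contiguous chunks of roughly equal size (the last one may be shorter).
    size = max(1, math.ceil(len(frames) / n_chunks))
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def parallel_min_distance(pool: Executor, frames: list[tuple[float, float]], n_chunks: int) -> float:
    if not frames:
        raise ValueError("need at least one frame")
    # One pcoord calculation fanned out over every worker in the pool,
    # then reduced back to a single value on the submitting side.
    partial = pool.map(chunk_min_distance, split(frames, n_chunks))
    return min(partial)


if __name__ == "__main__":
    frames = [(float(i), 0.0) for i in range(10, 0, -1)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_min_distance(pool, frames, n_chunks=4))  # 1.0
```

This only works for calculations that decompose into independent chunks plus a cheap reduction; a calculation that needs every frame at once would need a different split.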
For the second point: since all the calculations are handled by a single pool, it can also wait, collect trajectories from multiple workers, and run calculations over them in aggregate.
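For the second point, here is a sketch of what the aggregate step might look like once the final frames from every worker have been collected (for example, by waiting on all the futures). Both the collective pcoord (distance from the centroid of all final frames) and the binning rule (evenly spaced bins between the smallest and largest pcoord of the iteration) are hypothetical choices for illustration, not WESTPA's own binning schemes.

```python
import math
from bisect import bisect_right
from statistics import fmean


def ensemble_pcoords(final_frames: dict[int, tuple[float, float]]) -> dict[int, float]:
    # Hypothetical collective pcoord: distance of each segment's final frame
    # from the centroid of *all* final frames, so it needs every trajectory.
    cx = fmean(x for x, _ in final_frames.values())
    cy = fmean(y for _, y in final_frames.values())
    return {sid: math.hypot(x - cx, y - cy) for sid, (x, y) in final_frames.items()}


def adaptive_bins(pcoords: dict[int, float], n_bins: int) -> dict[int, int]:
    # Hypothetical binning rule: bin edges spread evenly between the smallest
    # and largest pcoord seen this iteration, so binning also depends on every result.
    lo, hi = min(pcoords.values()), max(pcoords.values())
    width = (hi - lo) / n_bins or 1.0  # avoid zero-width bins if all values are equal
    edges = [lo + k * width for k in range(1, n_bins)]
    # bisect_right maps each value to a bin index in 0 .. n_bins - 1.
    return {sid: bisect_right(edges, v) for sid, v in pcoords.items()}


if __name__ == "__main__":
    final_frames = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (4.0, 0.0), 3: (10.0, 0.0)}
    pcoords = ensemble_pcoords(final_frames)
    print(pcoords)                    # {0: 4.0, 1: 2.0, 2: 0.0, 3: 6.0}
    print(adaptive_bins(pcoords, 3))  # {0: 2, 1: 1, 2: 0, 3: 2}
```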
This also lets you configure and provision the nodes for pcoord calculation separately from the WESTPA nodes. For example, if your pcoord calculation isn't GPU-optimized, you can run it on nodes with more CPUs instead of running it on the GPU worker node.
-John