Hi He,
We don't have any experience using conda-pack for diffpy-cmi, which includes C++ code. I imagine the environment has to be built on an identical system (i.e., the same operating system, not necessarily the same HPC system) for it to be unpacked and run out of the box, but as I say, we have never tried. If you do try, please let me know and we can think about sharing code this way. Another option to consider, of course, is Docker. Again, we have not focussed on this ourselves.
Pavol did get a parallel version of the pair iterator working in diffpy.srreal, but as I recall he was not blown away by the speed-up. How much it helps will depend on what you are trying to do. For crystalline materials, the outer loop is a sum over the atoms in the unit cell, and the unit cell tends to be small. You can parallelize that outer loop, but the run time is bound by the inner loop, which takes the longest, especially if you are computing to reasonably high values of r. I think this is why we didn't push through and finish that work. But if you are doing "big box" modeling, where the outer loop is also of order N (the number of atoms in the box), the speed-up could be significant. I can ask Pavol what happened to his code.
Alan Coelho sped things up a lot by removing the peak-broadening convolution from the innermost loop. I have wanted to code this up in diffpy.srreal for a long time, but never quite found the right person to do it. This might be a good place to start if you are interested in contributing to the DiffPy project.
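To make the idea of pulling the broadening out of the innermost loop concrete, here is a rough NumPy sketch. It is not diffpy.srreal code and not necessarily exactly what Alan did; the function names are hypothetical, and the "profile" is just an unnormalized sum of Gaussians rather than a properly scaled G(r). The direct version adds one Gaussian per pair inside the pair loop (cost roughly Npairs x Nr). The alternative first bins all pair distances onto the r-grid, which is cheap, and then convolves that histogram with the peak shape once (cost roughly Npairs + Nr x Nkernel). The price is that each distance is rounded to the nearest grid point, so the grid spacing should be small compared with the peak width.

```python
import numpy as np


def pair_distances(xyz):
    """All unique interatomic distances for an (N, 3) array of Cartesian coordinates."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(xyz), k=1)  # each pair counted once
    return d[i, j]


def profile_direct(dists, r, sigma):
    """Broadening inside the pair loop: evaluate one Gaussian per pair over the whole r-grid."""
    g = np.zeros_like(r)
    for d in dists:
        g += np.exp(-0.5 * ((r - d) / sigma) ** 2)
    return g


def profile_histogram(dists, r, sigma):
    """Bin the distances onto the (uniform) r-grid first, then convolve once with a Gaussian."""
    dr = r[1] - r[0]
    idx = np.rint((dists - r[0]) / dr).astype(int)  # nearest grid point for each pair
    keep = (idx >= 0) & (idx < len(r))
    hist = np.bincount(idx[keep], minlength=len(r)).astype(float)
    half = int(np.ceil(5 * sigma / dr))  # truncate the kernel at +/- 5 sigma
    k = np.arange(-half, half + 1) * dr
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    # Odd-length, symmetric kernel, so mode="same" keeps peaks centred on the grid.
    return np.convolve(hist, kernel, mode="same")


rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 5.0, size=(20, 3))  # toy "big box" of 20 random atoms
r = np.linspace(0.0, 10.0, 10001)  # dr = 0.001
dists = pair_distances(xyz)
direct = profile_direct(dists, r, sigma=0.1)
binned = profile_histogram(dists, r, sigma=0.1)
print(np.max(np.abs(direct - binned)) / np.max(direct))  # small relative difference
```

In this sketch the pair loop only does integer binning, and the expensive peak-shape evaluation happens once per grid point rather than once per pair per grid point. For long r-ranges the single convolution could also be done with an FFT.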
For everyone on the thread: DiffPy is an open-source community project and we welcome contributions in the form of pull requests. If anyone wants to join the party, we can help you get started with the workflow.
S