Hey Alex,
I agree; it's not something that's supposed to be possible in Linux. I tried all of those: dynamic and static pre-compiled executables, recompiling it myself, and compiling out OpenMP.
Most recently, I found that using the local hard drives of the execute nodes worked, while the larger shared filesystems (Lustre and NFS) did not. My best guess now is:
- STAR does not free all the memory it uses during indexing, but normally the OS reclaims it
- A race condition involving a file lock is triggered intermittently (in ~50% of indexing runs)
- The file lock is still held, so one or more of STAR's processes is still running when the OS tries to free the memory
- The OS sees the process still running, and gives up on freeing the memory
- The process ends normally after that
Normally the OS acts as a safety net and frees all memory attributed to a process, but the race condition interferes, and it seems to have something to do with latency on the network filesystem. This is all pretty speculative, obviously; all I know is that local disks work, and one example each of NFS and Lustre don't.
I believe the workaround of using local storage will be fine for me, but I think this is something to look out for.
Thanks!
Brendan