I have been running large batches of INLA models on a compute cluster, with scheduling and resource allocation handled by SLURM (if that matters). Each model runs on 24 cores with ~1.5GB of memory per core. I seem to have squashed all the seg faults and other errors. Now the run just writes "done" to the output file and ends, while the error file reports "killed" with no other details.
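For concreteness, here is the arithmetic behind the per-model request described above, as a small shell sketch. The variable names are only illustrative, ~1.5GB is taken as 1536 MB, and the two `#SBATCH` lines are an assumption about how such a request would typically be written, not a copy of my actual submission script.

```shell
#!/bin/sh
# Per-model resource request (illustrative names, not from the real job script).
cores=24
mem_per_core_mb=1536   # assumption: ~1.5GB per core taken as 1536 MB

# Total memory one model is allowed to use across all of its cores.
total_mb=$((cores * mem_per_core_mb))
total_gb=$((total_mb / 1024))

echo "per-model request: ${cores} cores, ${total_mb} MB (${total_gb} GB) total"

# Assumed equivalent SLURM directives for a single model:
echo "#SBATCH --cpus-per-task=${cores}"
echo "#SBATCH --mem-per-cpu=${mem_per_core_mb}M"
```

So each model has roughly 36 GB in total. After a job ends, something like `sacct -j <jobid> --format=JobID,State,ExitCode,MaxRSS,ReqMem` should show whether peak memory (MaxRSS) came close to that limit.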
Here are the contents of my SLURM error file:
Error in inla.inlaprogram.has.crashed() :
The inla-program exited with an error. Unless you interupted it yourself, please rerun with verbose=TRUE and check the output carefully.
Calls: inla -> inla.inlaprogram.has.crashed
In addition: Warning message:
In inla.model.properties.generic(inla.trim.family(model), mm[names(mm) == :
Model 'z' in section 'latent' is marked as 'experimental'; changes may appear at any time.
Use this model with extra care!!! Further warnings are disabled.
Execution halted
I am running INLA with debug=TRUE (and debugging turned on via the environment variable), and I am using pardiso. Pardiso returns a few errors:
*** PARDISO ERROR(0): not pos.def matrix: 15 eigenvalues are negative.
*** PARDISO ERROR: I will try to work around the problem...
But these do not seem to be catastrophic. The model is a nested model similar to the one in chapter 4 of Gomez-Rubio 2020.
I am at a bit of a loss, as the output from SLURM contains no errors other than the PARDISO errors above and shows no memory- or resource-related issues. I'm happy to post any other useful information.
Thanks for your time and help.
Scott