TSUNAMI Error Code 35072


Noah Walton

Sep 17, 2021, 1:38:57 PM
to SCALE Users Group
Hey guys,

I am running a stochastic update algorithm on a 10x10 infinitely reflected grid. Each of the 100 cells contains some amount of Uranium-234, 235, 236, and 238 as well as O16 and H1. Most configurations the algorithm comes up with have been running without an issue; however, a few configurations throw error code 35072 during KENO transport. This error code appears in the output file, while the msg file shows the following message:

sh: line 1: 1318974 Killed                  /opt/scale6.2.3/bin/../bin/kenova < i_kenova.forward

I have tried running one of the failing configurations in standalone forward KENO and it completes successfully. Another strange thing is that if I decrease the number of active generations to 10 (as opposed to the original 40), the sequences all complete successfully. Based on these tests, I'm assuming this has something to do with the IFP storage of neutron histories, but I can't figure out whether this is a SCALE error or a machine error.

Another thing I tried while troubleshooting was to rescale the geometry and densities such that the optical dimensions stay the same but the densities are more realistic. I thought the algorithm might just be coming up with an extreme configuration, but this did not resolve the issue.

Any feedback is welcome, and I would be happy to share the input/output/msg files if needed. Thank you!

Noah Walton,
University of Tennessee, Knoxville


Rike Bostelmann

Sep 20, 2021, 10:47:32 PM
to SCALE Users Group
Hi Noah,

Since you mentioned IFP: are you using KENO as the transport code in a TSUNAMI calculation, or through CSAS or TRITON? IFP calculations can have a very large memory footprint depending on, e.g., the number of latent generations (cfp).
If you could post your input file, that would be helpful.
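In the meantime, a quick sanity check is to grep the deck for the Monte Carlo and IFP settings. The sketch below assumes the usual KENO parameter keywords (gen, npg, nsk, cfp) in a standard parameter block and uses a placeholder file name:

# list the generation counts and latent-generation setting requested in the input deck
# (your_case.inp is a placeholder for your actual deck)
grep -iE 'gen=|npg=|nsk=|cfp=' your_case.inp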

Best,
Rike
SCALE Team

Noah Walton

Sep 30, 2021, 10:09:42 AM
to SCALE Users Group
Hey Rike,

Apologies for the late reply; I thought I had turned on notifications for Google Groups but didn't get an alert that you had responded. I will fix this!

I am running TSUNAMI with 5 latent generations. I've attached an input file that is failing for me along with the msg and out files.

Thank you!
Noah Walton

tsunami_failing_case.msg
tsunami_failing_case.inp
tsunami_failing_case.out

Winfried Zwermann

Oct 1, 2021, 5:02:22 AM
to SCALE Users Group
Dear Noah,

The issue with your failed job is that it is killed by the operating system because it runs out of memory. Apparently in some cases, sufficient memory cannot be allocated, probably due to other processes already running on the specific node.

Your job requires some 40 GB, so you may try to request a correspondingly larger amount of memory for your job. Then it will only be started if enough memory is available, but it won't (hopefully) die during runtime.
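If your cluster happens to use Slurm, the request could look roughly like the sketch below; the memory value, wall time, and the way SCALE is launched are assumptions you would adapt to your own setup:

#!/bin/bash
#SBATCH --job-name=tsunami_ifp
#SBATCH --ntasks=1
#SBATCH --mem=48G              # ask for somewhat more than the ~40 GB the job needs
#SBATCH --time=24:00:00

# launch SCALE on the failing deck (path and input name from the earlier posts;
# adjust the driver name if you normally start SCALE through a different wrapper)
/opt/scale6.2.3/bin/scalerte tsunami_failing_case.inp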

Best regards,
Winfried

Noah Walton

Oct 1, 2021, 2:02:38 PM
to SCALE Users Group
Winfried,

Thank you, I will try this.
I have run into memory constraints on the computing cluster I am using before, but in that case I received a message from the job/queue submission system informing me that I needed to request more memory. That was only a forward solve (CSAS), so perhaps the system doesn't recognize the issue inside TSUNAMI w/ IFP.

Thanks again!
Noah Walton

Noah Walton

Oct 6, 2021, 10:03:32 AM
to SCALE Users Group
Just following up,

I was able to request more memory and the job ran successfully, thank you! 

I was wondering: where did you find the specific amount of memory required by the TSUNAMI job I posted? The message at the bottom of the output reports that the SCALE driver used around 100 MiB, which is well under 1 GB. I haven't been able to find a memory footprint in any of the output files.

Winfried Zwermann

Oct 6, 2021, 10:44:08 AM
to SCALE Users Group
Good to hear that it worked.

I observed the process status of the node on which the job was running with the top command. Say the job is running on node01 and one wants a snapshot; one can run the command:

ssh node01 top -b -n 1 | head -n 20

However, I do not know whether it is allowed on your cluster to send commands to the compute nodes in foreground mode.
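If a single snapshot is enough, one can also go straight to the largest memory users on the node (assuming a procps-style ps is available there):

# show the ten processes with the largest resident memory (RSS, in KiB) on node01
ssh node01 'ps -eo pid,rss,vsz,comm --sort=-rss | head -n 11'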