Multiple Simulation Job Failures

Albert Fabrizi

May 5, 2026, 4:53:31 PM
to GlueX Software Help
Hello experts,
When running MCWrapper with the gen_2pi_primakoff generator, most jobs fail either with no error message or with an error message saying the disk space quota was exceeded.
[Screenshot 2026-05-05 at 4.44.43 PM.png]

This is the end of a .out file (/farm_out/alfab/prim_rho0_sim/log/101582_stdout.101582_81.out), which ends without explanation at the beginning of the Geant4 step. The corresponding .err is empty.

For a different file, the .out shows that the generation ended abruptly:
[Screenshot 2026-05-05 at 4.48.20 PM.png]

The corresponding .err shows a disk quota exceeded error:
[Screenshot 2026-05-05 at 4.47.48 PM.png]

I am using the default version.xml at the time of this email and the corresponding .cfg, MC.config, and jana.config can all be found here:
/work/halld/home/alfab/cpp_analysis/mc_sim_files/primakoff_rho0_sim/

The initial command I used:
 gluex_MC.py MC.config 101582 10000000 per_file=50000 cleanmcsmear=0 recon=0 batch=2 logdir=/farm_out/alfab/prim_rho0_sim/

So far I have tried switching between this version.xml and version_7.4.0.xml, requesting more job resources, and changing the number of events I request to be generated.

Any help would be greatly appreciated!
Albert

Sean Dobbs

May 5, 2026, 4:58:36 PM
to Albert Fabrizi, GlueX Software Help
Hi Albert - 

Try checking to see if your /farm_out is nearly full - I think by default the quota is 15GB?

My jobs recently failed with a similarly unhelpful error message, and that was the cause.

Cheers,
Sean
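A rough way to check how close a log directory is to the quota is to add up the file sizes under it. The sketch below is only an illustration: the function names are hypothetical, the 15 GB figure is the guess made in this thread rather than a confirmed limit, and summing apparent file sizes can differ somewhat from what `du` or the quota system reports.

```python
import os

# ASSUMPTION: 15 GB is the quota guessed at in this thread, not a confirmed value.
ASSUMED_QUOTA_BYTES = 15 * 10**9


def directory_usage_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable (jobs may still be writing).
                continue
    return total


def quota_fraction(root, quota_bytes=ASSUMED_QUOTA_BYTES):
    """Return the fraction of the (assumed) quota used by files under root."""
    return directory_usage_bytes(root) / quota_bytes
```

For example, `quota_fraction("/farm_out/alfab")` would return a value near 1.0 when the area is almost full under the assumed quota.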

Albert Fabrizi

May 6, 2026, 8:53 AM
to Sean Dobbs, GlueX Software Help
Hi Sean,

Thanks for the note. I think that is the case: I cleared my farm_out and more jobs succeeded. However, MCWrapper is still flooding my log files with:

[Screenshot 2026-05-06 at 8.50.45 AM.png]

This is causing me to hit 15GB again quite quickly. Is this a feature, or something that can be suppressed with an option in MCWrapper?



--
___________________

Albert Fabrizi
Graduate Student, Physics
University of Massachusetts 
Amherst, MA 01003

Sean Dobbs

May 6, 2026, 8:55 AM
to Albert Fabrizi, GlueX Software Help
Not sure, can you tell from a log file which step in the process is causing this? (generator, hdgeant4, mcsmear, hd_root, ?)

---Sean
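One way to see which message dominates a large log, before tracing it back to a step, is to count repeated lines with the numbers masked out, so that lines differing only in an event number or value are grouped together. This is a minimal sketch with a hypothetical function name; it makes no assumption about what the repeated message actually says.

```python
import re
from collections import Counter

# Matches integers, decimals and scientific notation, e.g. 12, -0.5, 1.5e-3.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def most_repeated_lines(lines, top=5):
    """Return the `top` most frequent non-empty lines, with numbers replaced by '#'."""
    counts = Counter(
        _NUMBER.sub("#", line.strip()) for line in lines if line.strip()
    )
    return counts.most_common(top)
```

It can be fed an open file directly, for instance `with open(path, errors="replace") as f: print(most_repeated_lines(f))`. The lines that surround the first occurrence of the top message in the log then indicate which step was running.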

Albert Fabrizi

May 6, 2026, 8:59 AM
to Sean Dobbs, GlueX Software Help
This is from the generator step:

[Screenshot 2026-05-06 at 8.56.17 AM.png]

I am using the gen_2pi_primakoff generator with a custom MC.config file:
/work/halld/home/alfab/cpp_analysis/mc_sim_files/primakoff_rho0_sim/MC.config

In some of the files that fail, these lines are cut off abruptly before the job reaches Geant4.

[alfab@ifarm2402 ~]$ du -sh /farm_out/alfab/prim_rho0_sim/log/
14G /farm_out/alfab/prim_rho0_sim/log/

- Albert
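To see which individual log files account for most of that 14G, the files can be ranked by size. The sketch below uses a hypothetical function name and only standard-library calls; it reports apparent file sizes, which may differ slightly from the block usage that `du` shows.

```python
import heapq
import os


def largest_files(root, n=10):
    """Return up to n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that disappear or cannot be read while scanning.
                continue
    return heapq.nlargest(n, sizes)
```

Calling `largest_files("/farm_out/alfab/prim_rho0_sim/log/")` would list the ten largest logs, which is a quick way to pick a representative file to inspect.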


Shepherd, Matthew

May 6, 2026, 8:59 AM
to Sean Dobbs, Albert Fabrizi, GlueX Software Help

This looks to me like debugging output in an amplitude calculation.

Is this MC using the AmpTools framework and an amplitude for generation?

If so, I can give some guidance on the proper way to put in debugging info.

Matt


Albert Fabrizi

May 6, 2026, 9:05 AM
to Shepherd, Matthew, Sean Dobbs, GlueX Software Help
Hi Matt, 
Yes, I do use the AmpTools framework and a user-defined amplitude.

Here is a snippet from my .cfg:
[Screenshot 2026-05-06 at 9.03.31 AM.png]

I was poking around to see where the printing could be happening, but I didn't find anything here.

Shepherd, Matthew

May 6, 2026, 9:13 AM
to Albert Fabrizi, Sean Dobbs, GlueX Software Help


It is line 119 of TwoPiAngles_primakoff.cc.

It looks like it is there for a reason. One should probably understand why the amplitude is zero for these events, as that looks just a bit suspicious. A quick look didn't turn up an obvious problem.

You could comment out that line, but you may be masking a deeper problem.

You might want to run this generator on the command line with a small number of events. It is the four-vector generation that is causing this, so it should be relatively easy to test and understand without having to run hdgeant, etc.

Matt
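A middle ground between printing the diagnostic for every event and commenting it out is to cap how many times it is printed while still counting every occurrence, so the problem stays visible without filling the logs. The sketch below shows the idea in Python for illustration only. It is not the AmpTools reporting mechanism, the class name is hypothetical, and the actual amplitude code is C++.

```python
import sys


class LimitedWarning:
    """Print a warning at most `limit` times, but keep counting every occurrence."""

    def __init__(self, message, limit=10, stream=None):
        self.message = message
        self.limit = limit
        self.stream = stream if stream is not None else sys.stderr
        self.count = 0

    def warn(self, detail=""):
        """Record one occurrence; print it only while under the limit."""
        self.count += 1
        if self.count <= self.limit:
            print(f"{self.message} {detail}".rstrip(), file=self.stream)
        elif self.count == self.limit + 1:
            # Say once that output is being capped, then stay silent.
            print(f"{self.message} (further occurrences suppressed)", file=self.stream)

    def summary(self):
        """Return a one-line total suitable for printing at the end of a run."""
        return f"{self.message}: {self.count} occurrence(s) in total"
```

Printing `summary()` at the end of a small test run gives the fraction of events affected, which is the number needed to judge whether the zero amplitudes point to a deeper problem.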


Albert Fabrizi

May 6, 2026, 9:18 AM
to Shepherd, Matthew, Sean Dobbs, GlueX Software Help
I see, let me run a few small samples and see if I can figure out the issue. 