Folks,
I am getting an intermittent problem when running bggen events on
the JLab farm. It doesn't happen every time but 40% of the jobs
crash. The jobs crash in mcsmear. This is what shows up on
standard error:
Run: [FATAL] Connection error
terminate called after throwing an instance of 'std::runtime_error'
  what():  hddm_s::istream::istream error - invalid hddm header
/group/halld/Software/builds/Linux_CentOS7.7-x86_64-gcc4.8.5/gluex_MCwrapper/gluex_MCwrapper-v2.5.1/MakeMC.sh:
line 1481: 246733 Aborted (core dumped) mcsmear
$MCSMEAR_Flags -PTHREAD_TIMEOUT_FIRST_EVENT=6400
-PTHREAD_TIMEOUT=6400
-o$STANDARD_NAME\_geant$GEANTVER\_smeared.hddm
$STANDARD_NAME\_geant$GEANTVER.hddm
./run$formatted_runNumber\_random.hddm\:1\+$fold_skip_num
Standard output says:
RUNNING MCSMEAR
skipping: 43032
mcsmear -PTHREAD_TIMEOUT_FIRST_EVENT=6400 -PTHREAD_TIMEOUT=6400
-obggen_bggen_030401_219\_geant4\_smeared.hddm
bggen_bggen_030401_219\_geant4.hddm
./run030401\_random.hddm:1+43032
An hddm file was not created by mcsmear. Terminating MC
production. Please consult logs to diagnose
Jobs are run via gluex_MC.py and submitted to swif. The version set I am using is here. In this version set, halld_recon and halld_sim were recently updated from their respective master branches (earlier this week). There are some NPP tweaks in halld_recon, but those should not affect the issue I am having. The gluex_MC.py config file is attached.
This problem seems to ring a bell, but I thought I would ask the group before diving into it more deeply.
-- Mark
So...try again?
Oh yes! This might be it. Dtn1902 (which hosts the files via xrootd) was crashing off and on all morning. Jobs worked when it was up and failed when it was down. I should look into why uconn wasn’t getting the fall-backs. Also, maybe we need a way to disable xrootd.
Thomas Britton
Staff Scientist, Scientific Computing
Jefferson Lab
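
For anyone hitting the same thing: since the failures tracked the file server going up and down, one stopgap is to retry the failing step a few times with a growing delay before giving up. The following is only a minimal sketch of that idea, not how MakeMC.sh or gluex_MC.py actually behave. The function name run_with_retries, its parameters, and the delay values are all made up for illustration, and a retry only helps if the server comes back within the retry window.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, base_delay=1.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts run out.

    Hypothetical helper: returns the attempt number that succeeded and
    raises RuntimeError if every attempt fails. The delay doubles after
    each failed attempt (base_delay, 2*base_delay, ...).
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd)
        # A process killed by a signal (e.g. "Aborted (core dumped)")
        # reports a negative return code, which is also nonzero.
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")


if __name__ == "__main__":
    # Harmless stand-in command; a real job would pass its own command line.
    attempts = run_with_retries([sys.executable, "-c", "raise SystemExit(0)"])
    print(f"succeeded on attempt {attempts}")
```

The runner and sleep arguments exist only so the retry logic can be exercised without launching real jobs or actually waiting.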
--
You received this message because you are subscribed to the Google Groups "GlueX Software Help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gluex-softwar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gluex-software/066947c7-86e6-2693-17b7-2c188e53ea75%40jlab.org.
But I noticed that there are at least two flavors of standard error output. I only gave an example of one in my original post. Find both examples attached to this post.
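
With more than one flavor of failure in play, it can help to tally which messages show up in each job's standard error before digging further. Here is a rough sketch, assuming the standard error text of each job has already been read into a string. The two search strings are taken from the example in the original post; the labels, the function names, and the "unclassified" bucket are hypothetical, and the second flavor from the attachments would need its own entry.

```python
from collections import Counter

# Substrings copied from the stderr example in the original post.
# The dictionary keys are illustrative labels, not official names.
SIGNATURES = {
    "connection_error": "[FATAL] Connection error",
    "invalid_hddm_header": "invalid hddm header",
}


def classify_stderr(text, signatures=SIGNATURES):
    """Return a tuple of the signature labels found in one stderr text."""
    matched = [name for name, needle in signatures.items() if needle in text]
    return tuple(matched) if matched else ("unclassified",)


def tally(logs, signatures=SIGNATURES):
    """Count how many logs show each combination of signatures."""
    counts = Counter()
    for text in logs:
        counts[classify_stderr(text, signatures)] += 1
    return counts


if __name__ == "__main__":
    example = (
        "Run: [FATAL] Connection error\n"
        "what(): hddm_s::istream::istream error - invalid hddm header\n"
    )
    for flavor, n in tally([example, "some other failure"]).items():
        print(n, " + ".join(flavor))
```

Grouping by the combination of messages, rather than by each message separately, keeps jobs that show both errors together distinct from jobs that show only one.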
On Dec 11, 2020, at 1:24 PM, Mark Ito <mark...@gmail.com> wrote: