MCWrapper failed to test properly


Edmundo Barriga

Oct 21, 2020, 9:17:07 AM
to GlueX Software Help
On behalf of Gabriel:

Hello experts,

I requested a simulation using gen_amp through the MCWrapper webpage. It failed to test properly, but the reason is not clear. Below I include some files that should point to the cause (thanks to Thomas for providing these).
Any assistance would be appreciated.

Regards,

Gabriel
testerr.err
testlog.log

Sean Dobbs

Oct 21, 2020, 9:24:48 AM
to Edmundo Barriga, GlueX Software Help
A quick follow-up to this, since there has been some discussion off-list:

It's clear that the crash is being caused by having two consecutive
event #1's, see e.g. the warning in the log file:

JANA >>WARNING: Calling DSourceComboer::Reset_NewEvent() with repeated run number: 1

Apparently this is happening at the gen_amp level. It would be
helpful to know if other recent gen_amp samples have this "feature",
which does not always cause a crash.

---Sean

Alexander Austregesilo

Oct 21, 2020, 9:50:45 AM
to gluex-s...@googlegroups.com
I just ran gen_amp with your config file, using the exact same environment, and I did not see any duplicate event numbers. Could you please confirm that there are two events with the same event number in the generated hddm file from the test?

Peter Pauli

Oct 21, 2020, 10:09:13 AM
to GlueX Software Help
For some reason the generated hddm file seems to have been deleted, but the hddm output from the hdgeant4 step contains two eventNo=1 events. I was able to analyse the smeared hddm file with analysis-2017_01-ver39.xml (halld_recon 4.18.1), but not with analysis-2017_01-ver40.xml or ver41.xml (halld_recon 4.19.0). Comparing the two halld_recon releases, I don't see an obvious change that could cause this.

Sean Dobbs

Oct 21, 2020, 10:34:15 AM
to Alexander Austregesilo, GlueX Software Help Email List
I copied the test files for anyone who is interested:

/volatile/halld/home/sdobbs/TestProj_1359

So here I am seeing a repeated event number (mea culpa, this wasn't at the gen_amp stage; clearly I needed more coffee before I started looking into this):

ifarm1802.jlab.org> hddm-xml dana_rest_gen_amp_030274_000.hddm | grep eventNo | head
<reconstructedPhysicsEvent eventNo="0" runNo="0">
<reconstructedPhysicsEvent eventNo="1" runNo="30274">
<reconstructedPhysicsEvent eventNo="1" runNo="30274">
<reconstructedPhysicsEvent eventNo="2" runNo="30274">
<reconstructedPhysicsEvent eventNo="3" runNo="30274">
<reconstructedPhysicsEvent eventNo="4" runNo="30274">
<reconstructedPhysicsEvent eventNo="5" runNo="30274">
<reconstructedPhysicsEvent eventNo="6" runNo="30274">
<reconstructedPhysicsEvent eventNo="7" runNo="30274">
<reconstructedPhysicsEvent eventNo="8" runNo="30274">

But if I run the commands myself, I don't see this repetition:
see files in /volatile/halld/home/sdobbs/TestProj_1359.test
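As an aside, a repetition like this can be spotted automatically rather than by eye. The sketch below is illustrative only (the helper name is hypothetical, not part of the GlueX tools); it scans lines in the format that hddm-xml prints above and collects any eventNo that appears more than once.

```cpp
#include <set>
#include <string>
#include <vector>

// Collect eventNo values that occur more than once in hddm-xml output lines.
// (Illustrative helper, not part of the GlueX software.)
std::vector<std::string> find_duplicate_event_numbers(const std::vector<std::string>& lines) {
    std::set<std::string> seen;
    std::vector<std::string> duplicates;
    const std::string key = "eventNo=\"";
    for (const auto& line : lines) {
        auto pos = line.find(key);
        if (pos == std::string::npos) continue;          // not an event line
        auto start = pos + key.size();
        auto end = line.find('"', start);
        if (end == std::string::npos) continue;          // malformed attribute
        std::string ev = line.substr(start, end - start);
        if (!seen.insert(ev).second)                     // insert fails -> already seen
            duplicates.push_back(ev);
    }
    return duplicates;
}
```

Feeding it the ten lines above would flag eventNo "1" once.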

---Sean

Sean Dobbs

Oct 21, 2020, 10:51:02 AM
to Alexander Austregesilo, GlueX Software Help Email List
For what it's worth, I ran the MCWrapper configuration file used in
the test and couldn't reproduce the problem interactively on the
ifarm.
Thomas said he'll look into reproducibility more as well.

---Sean

Peter Pauli

Nov 2, 2020, 5:50:04 AM
to Sean Dobbs, Alexander Austregesilo, GlueX Software Help Email List
Hi everyone,

There is another project that seems to fail because of something similar.
All files needed to reproduce the error can be found in /work/halld/home/ppauli/test_gabyrod/. Just copy the directory and run:
gxenv /group/halld/www/halldweb/html/halld_versions/analysis-2017_01-ver43.xml
hd_root --config=ana_jana.cfg dana_rest_gen_amp_030274_000.hddm

This time I don’t think the failure is caused by multiple eventNo=1 events; maybe that was a red herring in the previous project?
As in the previous project, I can successfully run over the REST file when I use analysis-2017_01-ver39.xml instead.

Does anyone have any ideas what could go wrong?

See the terminal output of a crash below.

Cheers,
Peter


JANA >>Reading configuration from "ana_jana.cfg" ...
JANA >>OUTPUT_FILENAME: hd_root.root
JANA >>Initializing plugin "/group/halld/Software/builds/Linux_RHEL7-x86_64-gcc4.8.5/halld_recon/halld_recon-4.19.0^jana082/Linux_RHEL7-x86_64-gcc4.8.5/plugins/ReactionFilter.so" ...
Opened ROOT file "hd_root.root" ...
JANA >>Opening source "dana_rest_gen_amp_030274_000.hddm" of type: REST
   | This is a REST event stream...
JANA >>Launching threads .


JANA >>Created JCalibration object of type: JCalibrationCCDB
JANA >>Generated via: JCalibration using CCDB for MySQL and SQLite databases
JANA >>Run:30274
JANA >>context: default
JANA >>comment: Default constants for analyzing data
JANA >>Creating DGeometry:
JANA >>  Run requested:30274  found:30274
JANA >>  Run validity range: 30274-30274
JANA >>  URL="ccdb:///GEOMETRY/main_HDDS.xml"  context="default"
JANA >>  Type="JGeometryXML"
JANA >>Found 25 material maps in calib. DB
JANA >>Read in 25 material maps for run 30274 containing 76153 grid points total
JANA >>Run 30274 beam spot: x=0.1745 y=-0.002098 z=65 dx/dz=-0.0005445 dy/dz=-0.0003837
Created TGeoManager :0x7f5ff108d070
JANA >>Reading Magnetic field map from Magnets/Solenoid/solenoid_1350A_poisson_20160222 ...
 Nx=221 Ny=1 Nz=701
Reading fine-mesh B-field data from /group/halld/www/halldweb/html/resources/Magnets/Solenoid/finemeshes/solenoid_1350A_poisson_20160222
 rmin: 0 rmax: 88.5 dr: 0.1 zmin: 0 zmax: 600 dz: 0.1
 Number of points in z = 6000
 Number of points in r = 885
JANA >>154921 entries found (Created Magnetic field map of type DMagneticFieldMapFineMesh)
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//section/composition/posXYZ[@volume='DIRC']/@X_Y_Z".
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//section/composition/posXYZ[@volume='TRDGEM']/@X_Y_Z".
JANA >> Beam spot: x=0.1745 y=-0.002098 z=65 dx/dz=-0.0005445 dy/dz=-0.0003837
JANA >>vertex constraint: <none>
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//composition[@name='forwardTOF_bottom3']/mposY[@volume='FTOL']/@ncopy".
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//composition[@name='forwardTOF_top3']/mposY[@volume='FTOL']/@ncopy".
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//composition[@name='forwardTOF_bottom3']/mposY[@volume='FTOL']/@ncopy".
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//composition[@name='forwardTOF_top3']/mposY[@volume='FTOL']/@ncopy".
ReactionFilter: Reaction: ksks__B4_M16
Photon_Proton__KShort_KShort_Proton
KShort__Pi-_Pi+
KShort__Pi-_Pi+
Error [1270]: in [MySQLDataProvider::GetAssignmentShort(int, const string&, time_t, const string&)] No data was selected. Table '/PHOTON_BEAM/hodoscope/endpoint_calib' for run='30274', timestampt='0' and variation='default'  
JANA >>
JANA >> --- Configuration Parameters --
JANA >> JANA:RESOURCE_DEFAULT_PATH =               
JANA >> PLUGINS                    = ReactionFilter
JANA >> Reaction1                  = 1_14__16_16_14
JANA >> Reaction1:Flags            = B4_M16        
JANA >> THREAD_TIMEOUT             = 30 seconds    
JANA >> -------------------------------
JANA ERROR>> didn't sleep full 0.5 seconds!
  2.0 events processed  (12.0 events read)  0.0Hz  (avg.: 0.2Hz)     


===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 3 (Thread 0x7f6005af0700 (LWP 3035)):
#0  0x00007f60207a4945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000010e5ad0 in jana::JApplication::EventBufferThread (this=this@entry=0x7ffe72c4b700) at src/JANA/JApplication.cc:742
#2  0x00000000010e5d99 in LaunchEventBufferThread (arg=0x7ffe72c4b700) at src/JANA/JApplication.cc:666
#3  0x00007f60207a0dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f601faa7b3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f60052ef700 (LWP 3038)):
#0  0x00007f601fa6e159 in waitpid () from /lib64/libc.so.6
#1  0x00007f601f9ebde2 in do_system () from /lib64/libc.so.6
#2  0x00007f601f9ec191 in system () from /lib64/libc.so.6
#3  0x00007f6026355b19 in TUnixSystem::StackTrace (this=0x336a8e0) at /u/group/halld/Software/builds/Linux_RHEL7-x86_64-gcc4.8.5/root/root-6.08.06/core/unix/src/TUnixSystem.cxx:2405
#4  0x00007f60263583ec in TUnixSystem::DispatchSignals (this=0x336a8e0, sig=kSigSegmentationViolation) at /u/group/halld/Software/builds/Linux_RHEL7-x86_64-gcc4.8.5/root/root-6.08.06/core/unix/src/TUnixSystem.cxx:3625
#5  <signal handler called>
#6  mass (this=0x0) at libraries/PID/DKinematicData.h:44
#7  energy (this=0x0) at libraries/PID/DKinematicData.h:60
#8  DEventWriterROOT::Fill_BeamData (this=this@entry=0x7f5ff85639e0, locTreeFillData=locTreeFillData@entry=0x7f5fdb7684b0, locArrayIndex=locArrayIndex@entry=0, locBeamPhoton=0x7f5fd2025f60, locVertex=locVertex@entry=0x7f5fe1bc1670, locMCThrownMatching=locMCThrownMatching@entry=0x7f5fe1c35370) at libraries/ANALYSIS/DEventWriterROOT.cc:1529
#9  0x0000000000734009 in DEventWriterROOT::Fill_DataTree (this=this@entry=0x7f5ff85639e0, locEventLoop=locEventLoop@entry=0x7f5ff80008c0, locReaction=0x7f5fdb72e540, locParticleCombos=std::deque with 2 elements = {...}) at libraries/ANALYSIS/DEventWriterROOT.cc:1201
#10 0x000000000073576e in DEventWriterROOT::Fill_DataTrees (this=this@entry=0x7f5ff85639e0, locEventLoop=locEventLoop@entry=0x7f5ff80008c0, locDReactionTag="ReactionFilter") at libraries/ANALYSIS/DEventWriterROOT.cc:1026
#11 0x00007f60184aee02 in DEventProcessor_ReactionFilter::evnt (this=<optimized out>, locEventLoop=0x7f5ff80008c0, locEventNumber=<optimized out>) at plugins/Analysis/ReactionFilter/DEventProcessor_ReactionFilter.cc:75
#12 0x000000000110051a in jana::JEventLoop::OneEvent (this=this@entry=0x7f5ff80008c0) at src/JANA/JEventLoop.cc:693
#13 0x00000000011014b4 in jana::JEventLoop::Loop (this=this@entry=0x7f5ff80008c0) at src/JANA/JEventLoop.cc:496
#14 0x00000000010d859a in LaunchThread (arg=0x7ffe72c4b700) at src/JANA/JApplication.cc:1382
#15 0x00007f60207a0dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f601faa7b3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f6027145240 (LWP 2998)):
#0  0x00007f60207a7eed in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000010e193d in jana::JApplication::Run (this=this@entry=0x7ffe72c4b700, proc=proc@entry=0x3424a90, Nthreads=<optimized out>, Nthreads@entry=0) at src/JANA/JApplication.cc:1613
#2  0x00000000006622fc in main (narg=3, argv=0x7ffe72c4bd68) at programs/Analysis/hd_root/hd_root.cc:45
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum.
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  mass (this=0x0) at libraries/PID/DKinematicData.h:44
#7  energy (this=0x0) at libraries/PID/DKinematicData.h:60
#8  DEventWriterROOT::Fill_BeamData (this=this@entry=0x7f5ff85639e0, locTreeFillData=locTreeFillData@entry=0x7f5fdb7684b0, locArrayIndex=locArrayIndex@entry=0, locBeamPhoton=0x7f5fd2025f60, locVertex=locVertex@entry=0x7f5fe1bc1670, locMCThrownMatching=locMCThrownMatching@entry=0x7f5fe1c35370) at libraries/ANALYSIS/DEventWriterROOT.cc:1529
#9  0x0000000000734009 in DEventWriterROOT::Fill_DataTree (this=this@entry=0x7f5ff85639e0, locEventLoop=locEventLoop@entry=0x7f5ff80008c0, locReaction=0x7f5fdb72e540, locParticleCombos=std::deque with 2 elements = {...}) at libraries/ANALYSIS/DEventWriterROOT.cc:1201
#10 0x000000000073576e in DEventWriterROOT::Fill_DataTrees (this=this@entry=0x7f5ff85639e0, locEventLoop=locEventLoop@entry=0x7f5ff80008c0, locDReactionTag="ReactionFilter") at libraries/ANALYSIS/DEventWriterROOT.cc:1026
#11 0x00007f60184aee02 in DEventProcessor_ReactionFilter::evnt (this=<optimized out>, locEventLoop=0x7f5ff80008c0, locEventNumber=<optimized out>) at plugins/Analysis/ReactionFilter/DEventProcessor_ReactionFilter.cc:75
#12 0x000000000110051a in jana::JEventLoop::OneEvent (this=this@entry=0x7f5ff80008c0) at src/JANA/JEventLoop.cc:693
#13 0x00000000011014b4 in jana::JEventLoop::Loop (this=this@entry=0x7f5ff80008c0) at src/JANA/JEventLoop.cc:496
#14 0x00000000010d859a in LaunchThread (arg=0x7ffe72c4b700) at src/JANA/JApplication.cc:1382
#15 0x00007f60207a0dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f601faa7b3d in clone () from /lib64/libc.so.6

===========================================================



Alexander Austregesilo

Nov 2, 2020, 9:42:05 AM
to Peter Pauli, Sean Dobbs, Mark Ito, GlueX Software Help Email List

Hi Peter,

This crash is the result of a bug in halld_recon tag 4.19.0. We tried to flag the generated photon without checking whether it was tagged, which can cause problems when the generated energy is in the sampling region of the hodoscope. I did not notice it when I tested because I only generated events in the coherent peak region. This commit fixed the bug:

https://github.com/JeffersonLab/halld_recon/commit/2de3a71cb2d8a7041a860ffbf12a49178d736b06

but it was only included in 4.20.0. Since the bug only affects the simulation part, we could patch tag 4.19.0 to 4.19.1 and modify the link to analysis-2017_01-ver43.xml. What do you think, Mark?
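For anyone reading along: the trace's mass (this=0x0) frame is a member function being called through a null pointer. The sketch below only illustrates that failure pattern and the kind of guard involved; the struct and function names are hypothetical stand-ins, not the actual halld_recon code (see the linked commit for the real fix).

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for a kinematic-data object. In the scenario described
// above, the beam photon pointer can be null when the generated photon was
// never tagged, so dereferencing it unconditionally segfaults.
struct KinData {
    double px = 0, py = 0, pz = 0, E = 0;
    double mass() const { return std::sqrt(std::max(0.0, E*E - px*px - py*py - pz*pz)); }
};

// Calling photon->mass() with photon == nullptr crashes exactly like the
// trace above; guarding the pointer first avoids it.
double safe_beam_mass(const KinData* photon) {
    if (photon == nullptr) return 0.0;  // untagged generated photon: don't dereference
    return photon->mass();
}
```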

Cheers,

Alex

Mark Ito

Nov 2, 2020, 3:28:16 PM
to Alexander Austregesilo, Peter Pauli, Sean Dobbs, GlueX Software Help Email List

Yes, that could be done. It helps that you reported the specific commit. Should I do this?

Alexander Austregesilo

Nov 2, 2020, 3:40:21 PM
to Mark Ito, Peter Pauli, Sean Dobbs, GlueX Software Help Email List

OK. The halld_recon tag 4.19.0 is used in several version xml files, but as far as I can see only two were used for analysis launches:

version_4.24.0.xml

and

version_4.27.0.xml

For the moment, you can just make the new halld_recon tag 4.19.1, build a new version_4.27.1.xml with it, and Peter should be able to confirm that this really fixes the problem.

Mark Ito

Nov 2, 2020, 3:43:30 PM
to Alexander Austregesilo, Peter Pauli, Sean Dobbs, GlueX Software Help Email List

OK. Will do.

Peter Pauli

Nov 3, 2020, 4:36:28 AM
to Mark Ito, Alexander Austregesilo, Sean Dobbs, GlueX Software Help Email List
Thank you everyone!
I just re-ran the test with version_4.27.1.xml and it works absolutely fine.

I think all that is left to do is to link the version files pointing to 4.24.0 and 4.27.0 to the new version xmls and to make sure that the changes propagate to the OSG. Then we should be good to go again.

Cheers,
Peter

Alexander Austregesilo

Nov 3, 2020, 8:41:13 AM
to Peter Pauli, Mark Ito, Sean Dobbs, GlueX Software Help Email List

Excellent. I just updated the link to analysis-2017_01-ver43.xml on the group disk. It may take a few hours until it propagates to the OSG world.

Mark, can you implement the same patched halld_recon version in version_4.24.1.xml? I guess it will be called version_4.24.2.xml then.

Thank you,

Alex

Mark Ito

Nov 3, 2020, 9:07:39 AM
to Alexander Austregesilo, Peter Pauli, Sean Dobbs, GlueX Software Help Email List

OK. Got it.
