Fwd: [EXTERNAL] Error message when running jobs on farm

Mark Ito

Jan 24, 2022, 2:43:06 PM
to Software Help

A recent email conversation I've been having with Nilanga. Others may have more insight into his problem than I have...

-------- Forwarded Message --------
Subject: Re: [EXTERNAL] Error message when running jobs on farm
Date: Mon, 24 Jan 2022 11:47:31 -0500
From: Nilanga Wickramaarachchi <wickrama...@cua.edu>
To: Mark Ito <ma...@jlab.org>


All the jobs I tested produced their output files, but for some jobs SWIF reports an error, and I again see the ~/.history error in the job's error log file.

ifarm1802.jlab.org> swif status -workflow bggen_Sigma0_pi0_2018_01
workflow_id                   = 189921
workflow_name                 = bggen_Sigma0_pi0_2018_01
workflow_user                 = nwickjlb
jobs                          = 178
succeeded                     = 124
problems                      = 54
problem_types                 = SWIF-USER-NON-ZERO
problem_swif_user_non_zero    = 54
attempts                      = 261
create_ts                     = 2022-01-23 21:03:06.0
update_ts                     = 2022-01-24 10:41:49.0
current_ts                    = 2022-01-24 11:02:37.0


ifarm1802.jlab.org> cat /volatile/halld/home/nwickjlb/bggen_Sigma0_pi0_2018_01/batch02/log/040859/stderr.040859_004.err
stty: standard input: Inappropriate ioctl for device
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//section/composition/posXYZ[@volume='ForwardMWPC']/@X_Y_Z".
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//section[@name='ForwardMWPC']/box[@name='CPPF']/@X_Y_Z".
Can't load history: $< line too long.
Can't load history: $< line too long.


On Mon, Jan 24, 2022 at 9:43 AM Nilanga Wickramaarachchi <wickrama...@cua.edu> wrote:
Thanks, I don't see the previous message when logging out anymore. I'll check whether the jobs work now.

On Mon, Jan 24, 2022 at 9:39 AM Mark Ito <ma...@jlab.org> wrote:

Try deleting it. It is used by the "history" command and should not be essential.
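If you would rather keep the old history than delete it outright, here is a minimal sketch in sh that moves the file aside and keeps only its shorter lines. The function name `reset_history` and the 1000-character cutoff are arbitrary choices for illustration, not a documented tcsh limit.

```sh
#!/bin/sh
# Hypothetical helper: move an oversized history file aside instead of
# deleting it, then keep only the lines no longer than a cutoff.
# Assumption: 1000 characters is an arbitrary cutoff, not a tcsh limit.
reset_history() {
    hist_file=${1:-"$HOME/.history"}
    max_len=${2:-1000}
    [ -f "$hist_file" ] || return 0          # nothing to do
    backup="$hist_file.bak"
    mv "$hist_file" "$backup"                # keep the original (overwrites an older .bak)
    # Copy back only the lines whose length is <= max_len.
    awk -v max="$max_len" 'length($0) <= max' "$backup" > "$hist_file"
    chmod 600 "$hist_file"                   # history can contain sensitive commands
}
```

For example, `reset_history ~/.history 1000`. Since tcsh may write its in-memory history back to the file on logout, it is probably safest to run this from a different shell than the tcsh session whose history is being replaced.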

On 1/24/22 9:37 AM, Nilanga Wickramaarachchi wrote:
When I try to open that file it says it's too large. It's about 16 MB.

[nwickjlb@ifarm1802 ~]$ ls -lah .history
-rw------- 1 nwickjlb TCP 16M Jan 23 19:31 .history


On Mon, Jan 24, 2022, 8:59 AM Mark Ito <ma...@jlab.org> wrote:

Do you have a long line in your ~/.history file?
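One way to check is a short sketch that reports the longest line in a history file and where it is. The function name `longest_history_line` is hypothetical; it only assumes a POSIX `awk`.

```sh
#!/bin/sh
# Hypothetical helper: report the longest line in a (t)csh history file.
# Assumption: an unusually long line is what tcsh complains about with
# "Can't load history: $< line too long."
longest_history_line() {
    hist_file=${1:-"$HOME/.history"}
    # Track the maximum line length and the line number where it occurs.
    awk 'length($0) > max { max = length($0); n = NR }
         END { printf "line %d: %d characters\n", n, max }' "$hist_file"
}
```

Running `longest_history_line ~/.history` prints something like `line N: M characters`; a line far longer than the rest would be the prime suspect.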

On 1/24/22 8:04 AM, Nilanga Wickramaarachchi wrote:
Hi Mark,
I ran some jobs on the farm, but they failed, and when I look at the error file I see the message below about "history". I also see it when logging out of the farm.
  
Please let me know if you have any suggestions to fix it.

Thanks,
Nilanga

ifarm1802.jlab.org> cat /volatile/halld/home/nwickjlb/bggen_Sigma0_pi0_2018_01/batch02/log/040856/stderr.040856_028.err
Can't load history: $< line too long.
Can't load history: $< line too long.
df: ‘/w/eic-scshelf2104’: Stale file handle
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//section/composition/posXYZ[@volume='ForwardMWPC']/@X_Y_Z".
src/JANA/JGeometryXML.cc:347 Node or attribute not found for xpath "//section[@name='ForwardMWPC']/box[@name='CPPF']/@X_Y_Z".
Can't load history: $< line too long.
Can't load history: $< line too long.
ifarm1802.jlab.org>

ifarm1802.jlab.org> exit
logout
Can't load history: $< line too long.
Connection to ifarm1802 closed.

NILANGA INDRAJIE WICKRAMAARACHCHI

Jan 25, 2022, 5:59:35 PM
to Mark Ito, Software Help
Hi all,
I'm not sure whether this is related to the problem with the ~/.history file, but I see some bggen hddm files in my home directory, and I'm not sure how they got there.
Could someone let me know whether it's OK to delete them (are they just copies)?

/u/home/nwickjlb
dana_rest_bggen_040933_019.hddm  dana_rest_bggen_041097_064.hddm  dana_rest_bggen_041205_006.hddm
dana_rest_bggen_040933_024.hddm  dana_rest_bggen_041100_007.hddm  dana_rest_bggen_041206_026.hddm
dana_rest_bggen_041075_003.hddm  dana_rest_bggen_041100_009.hddm  dana_rest_bggen_041206_035.hddm
dana_rest_bggen_041075_048.hddm  dana_rest_bggen_041100_011.hddm  dana_rest_bggen_041206_039.hddm
dana_rest_bggen_041078_003.hddm  dana_rest_bggen_041102_056.hddm  dana_rest_bggen_041206_055.hddm
dana_rest_bggen_041078_032.hddm  dana_rest_bggen_041102_060.hddm  dana_rest_bggen_041206_064.hddm
dana_rest_bggen_041078_036.hddm  dana_rest_bggen_041106_062.hddm  dana_rest_bggen_041206_072.hddm
dana_rest_bggen_041078_047.hddm  dana_rest_bggen_041107_055.hddm  dana_rest_bggen_041206_084.hddm
dana_rest_bggen_041078_048.hddm  dana_rest_bggen_041107_064.hddm  dana_rest_bggen_041207_040.hddm
dana_rest_bggen_041079_030.hddm  dana_rest_bggen_041203_017.hddm  dana_rest_bggen_041207_054.hddm
dana_rest_bggen_041084_051.hddm  dana_rest_bggen_041203_050.hddm  dana_rest_bggen_041207_071.hddm
dana_rest_bggen_041084_060.hddm  dana_rest_bggen_041203_075.hddm  dana_rest_bggen_041287_029.hddm
dana_rest_bggen_041088_003.hddm  dana_rest_bggen_041204_000.hddm  dana_rest_bggen_051733_001.hddm
dana_rest_bggen_041089_022.hddm  dana_rest_bggen_041204_031.hddm  dana_rest_bggen_051733_004.hddm
dana_rest_bggen_041089_054.hddm  dana_rest_bggen_041204_041.hddm  dana_rest_bggen_051733_008.hddm
dana_rest_bggen_041089_069.hddm  dana_rest_bggen_041204_050.hddm  dana_rest_bggen_051748_011.hddm
dana_rest_bggen_041097_024.hddm  dana_rest_bggen_041204_061.hddm  dana_rest_bggen_051748_035.hddm
dana_rest_bggen_041097_049.hddm  dana_rest_bggen_041204_070.hddm  dana_rest_bggen_051748_039.hddm
dana_rest_bggen_041097_055.hddm  dana_rest_bggen_041205_003.hddm  dana_rest_bggen_051748_042.hddm

Thanks,
Nilanga

--
You received this message because you are subscribed to the Google Groups "GlueX Software Help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gluex-softwar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gluex-software/d2382142-15f4-a6ac-9259-bec2f4c7c910%40jlab.org.

Churamani Paudel

Jan 25, 2022, 6:15:52 PM
to NILANGA INDRAJIE WICKRAMAARACHCHI, Mark Ito, Software Help
Hello Experts: 

I also see several ROOT trees that have appeared in my home directory over the last couple of days. Their sizes seem to be hidden: the du -ah command does not show them. So I wonder whether these have any impact on SWIF jobs, or any other effects.

Sincerely
Churamani 

tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051251.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051263.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051287.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051289.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051315.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051322.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051333.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051335.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051336.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051502.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051520.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051521.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051560.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051577.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051578.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051579.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051580.root
tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051581.root


NILANGA INDRAJIE WICKRAMAARACHCHI

Jan 26, 2022, 9:09:08 AM
to Churamani Paudel, Mark Ito, Software Help
Hi Churamani,
Did you try the ls -alh command? I see that a few of the files are links to files in /cache.

ifarm1901.jlab.org> ls -lah *.hddm
-rw-rw-r-- 1 nwickjlb TCP 106M Jan 12 21:50 dana_rest_bggen_040933_019.hddm
-rw-rw-r-- 1 nwickjlb TCP 107M Jan 14 20:47 dana_rest_bggen_040933_024.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 01:48 dana_rest_bggen_041075_003.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 01:54 dana_rest_bggen_041075_048.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 02:10 dana_rest_bggen_041078_003.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 02:15 dana_rest_bggen_041078_032.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 14 15:49 dana_rest_bggen_041078_036.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 14 15:51 dana_rest_bggen_041078_047.hddm
-rw-rw-r-- 1 nwickjlb TCP  93M Jan 13 02:18 dana_rest_bggen_041078_048.hddm
-rw-rw-r-- 1 nwickjlb TCP  22M Jan 14 12:27 dana_rest_bggen_041079_030.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 02:29 dana_rest_bggen_041084_051.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 14 15:45 dana_rest_bggen_041084_060.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 02:11 dana_rest_bggen_041088_003.hddm
-rw-rw-r-- 1 nwickjlb TCP  93M Jan 13 02:48 dana_rest_bggen_041089_022.hddm
-rw-rw-r-- 1 nwickjlb TCP  93M Jan 13 02:34 dana_rest_bggen_041089_054.hddm
-rw-rw-r-- 1 nwickjlb TCP  24M Jan 13 01:22 dana_rest_bggen_041089_069.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 05:10 dana_rest_bggen_041097_024.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 14 15:58 dana_rest_bggen_041097_049.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 02:47 dana_rest_bggen_041097_055.hddm
-rw-rw-r-- 1 nwickjlb TCP 2.5M Jan 12 22:58 dana_rest_bggen_041097_064.hddm
-rw-rw-r-- 1 nwickjlb TCP  97M Jan 14 15:56 dana_rest_bggen_041100_007.hddm
-rw-rw-r-- 1 nwickjlb TCP  96M Jan 13 03:11 dana_rest_bggen_041100_009.hddm
-rw-rw-r-- 1 nwickjlb TCP  97M Jan 13 02:52 dana_rest_bggen_041100_011.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 14 16:00 dana_rest_bggen_041102_056.hddm
-rw-rw-r-- 1 nwickjlb TCP  98M Jan 13 02:32 dana_rest_bggen_041102_060.hddm
-rw-rw-r-- 1 nwickjlb TCP  98M Jan 13 02:44 dana_rest_bggen_041106_062.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 03:06 dana_rest_bggen_041107_055.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 14 16:02 dana_rest_bggen_041107_064.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 08:25 dana_rest_bggen_041203_017.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 15 00:47 dana_rest_bggen_041203_050.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 06:32 dana_rest_bggen_041203_075.hddm
-rw-rw-r-- 1 nwickjlb TCP  95M Jan 13 06:33 dana_rest_bggen_041204_000.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 14:20 dana_rest_bggen_041204_031.hddm
-rw-rw-r-- 1 nwickjlb TCP  95M Jan 13 07:00 dana_rest_bggen_041204_041.hddm
-rw-rw-r-- 1 nwickjlb TCP  95M Jan 13 07:14 dana_rest_bggen_041204_050.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 07:03 dana_rest_bggen_041204_061.hddm
-rw-rw-r-- 1 nwickjlb TCP  94M Jan 13 06:55 dana_rest_bggen_041204_070.hddm
-rw-rw-r-- 1 nwickjlb TCP  92M Jan 14 16:31 dana_rest_bggen_041205_003.hddm
-rw-rw-r-- 1 nwickjlb TCP  92M Jan 14 16:28 dana_rest_bggen_041205_006.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 07:00 dana_rest_bggen_041206_026.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 07:45 dana_rest_bggen_041206_035.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 15 05:47 dana_rest_bggen_041206_039.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 06:59 dana_rest_bggen_041206_055.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 07:00 dana_rest_bggen_041206_064.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 17:59 dana_rest_bggen_041206_072.hddm
-rw-rw-r-- 1 nwickjlb TCP  55M Jan 13 05:24 dana_rest_bggen_041206_084.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 06:47 dana_rest_bggen_041207_040.hddm
-rw-rw-r-- 1 nwickjlb TCP  99M Jan 13 06:51 dana_rest_bggen_041207_054.hddm
-rw-rw-r-- 1 nwickjlb TCP  98M Jan 13 06:56 dana_rest_bggen_041207_071.hddm
-rw-rw-r-- 1 nwickjlb TCP  26M Jan 13 10:04 dana_rest_bggen_041287_029.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:53 dana_rest_bggen_051733_001.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051733_001.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:53 dana_rest_bggen_051733_004.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051733_004.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:53 dana_rest_bggen_051733_008.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051733_008.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:56 dana_rest_bggen_051748_011.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051748_011.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:56 dana_rest_bggen_051748_035.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051748_035.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:56 dana_rest_bggen_051748_039.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051748_039.hddm
lrwxrwxrwx 1 nwickjlb TCP  126 Jan 22 09:56 dana_rest_bggen_051748_042.hddm -> /cache/halld/gluex_simulations/REQUESTED_MC/F2018_ver02_21_bggen_batch04_20211205090407pm/hddm/dana_rest_bggen_051748_042.hddm
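The links above are symbolic links into /cache, which is also why `du` reports almost nothing for them: by default it counts the link itself, not the file it points to (`du -L` follows links). A sketch to list only such links in a directory, assuming GNU `find` (for `-lname`) and using a hypothetical function name `list_cache_links`:

```sh
#!/bin/sh
# Hypothetical helper: list symbolic links in a directory (default $HOME)
# whose targets are under /cache.
list_cache_links() {
    dir=${1:-"$HOME"}
    # -maxdepth 1: top level only; -type l: symlinks only (not followed);
    # -lname: match the link's target text (GNU find extension).
    find "$dir" -maxdepth 1 -type l -lname '/cache/*' -print
}
```

Removing one of these links with `rm` deletes only the link; the file in /cache is untouched. The regular (non-link) .hddm files in the listing are real files and are not covered by this.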

Churamani Paudel

Jan 26, 2022, 9:13:13 AM
to NILANGA INDRAJIE WICKRAMAARACHCHI, Mark Ito, Software Help
Hi Nilanga: 

I checked with the command you suggested.

Yes, the files are links to /cache, as follows. So these files may not affect SWIF jobs, as they were probably soft-linked by some experts.


152 Jan 24 21:35 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051071.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051071.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:35 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051100.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051100.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:37 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051155.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051155.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:34 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051159.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051159.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:34 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051160.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051160.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:38 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051196.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051196.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:39 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051213.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051213.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:35 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051250.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051250.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:35 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051251.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051251.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:43 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051263.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051263.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:35 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051287.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051287.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:44 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051289.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051289.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:44 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051315.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051315.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:36 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051322.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051322.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:36 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051333.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051333.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:36 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051335.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051335.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:36 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051336.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051336.root
lrwxrwxrwx    1 churaman ENP   152 Jan 24 21:47 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051502.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051502.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:37 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051520.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051520.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:37 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051521.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051521.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:37 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051560.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051560.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:38 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051577.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051577.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:38 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051578.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051578.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:38 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051579.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051579.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:38 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051580.root -> /cache/halld/RunPeriod-2018-08/analysis/ver14/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17/merged/tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051580.root
lrwxrwxrwx    1 churaman ENP   152 Jan 22 11:38 tree_etapr__eta_gmissg__B2_F1_T0_U1_M35_M17_051581.root -> /cache/halld/RunPeriod-20

Nathan Baltzell

Jan 26, 2022, 9:19:48 AM
to NILANGA INDRAJIE WICKRAMAARACHCHI, Churamani Paudel, Mark Ito, Software Help
Would those happen to be input files for your batch jobs submitted with SWIF?

If so, one way I've seen that happen is when something is wrong on a node and its local /scratch filesystem isn't available; the system doesn't check whether a job is in the correct working directory, so those jobs end up running in $HOME, doing whatever file staging is requested there and making a mess. Also, if inputs are specified to be on /mss, the system creates symlinks to /cache rather than copies. To avoid all that, I run "expr $PWD : ^/scratch/slurm" and abort if it fails.
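The same kind of guard can be written with a shell `case` pattern instead of `expr`, as in the sketch below, assuming the job script calls it before doing any work of its own. The pattern `/scratch/slurm/*` is slightly stricter than the `expr` prefix match, since it also requires the trailing slash. The function name `require_scratch_dir` is hypothetical.

```sh
#!/bin/sh
# Hypothetical guard: fail unless the job's working directory is under
# /scratch/slurm/ (same intent as: expr $PWD : ^/scratch/slurm).
require_scratch_dir() {
    case "$PWD" in
        /scratch/slurm/*)
            return 0 ;;
        *)
            echo "ERROR: job is running in $PWD, not under /scratch/slurm; aborting" >&2
            return 1 ;;
    esac
}
```

At the top of the job script this would be used as `require_scratch_dir || exit 1`, so a job that lands in $HOME stops immediately instead of writing there.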

-Nathan


Churamani Paudel

Jan 26, 2022, 9:25:07 AM
to Nathan Baltzell, NILANGA INDRAJIE WICKRAMAARACHCHI, Mark Ito, Software Help
Hi Nathan,

In my case these were input files for jobs submitted with SWIF, and some of those jobs failed. For the failed jobs, I see that their input files appear as links in my home directory. Maybe, since SWIF1 is being retired, nodes are gradually becoming unavailable.
Thanks for your clarification.
Churamani 

NILANGA INDRAJIE WICKRAMAARACHCHI

Jan 26, 2022, 9:27:04 AM
to Nathan Baltzell, Churamani Paudel, Mark Ito, Software Help
> Would those happen to be input files for your batch jobs submitted with SWIF?
Yes. So that means those input files were copied to $HOME when the jobs ran.

Thanks for the clarification.

Nathan Baltzell

Jan 26, 2022, 9:27:31 AM
to Churamani Paudel, NILANGA INDRAJIE WICKRAMAARACHCHI, Mark Ito, Software Help
Well, I wouldn't think SWIF1 versus SWIF2 would have much to do with it, unless they changed whether it checks the current working directory before staging.

-Nathan


Mark Ito

Jan 26, 2022, 9:59:59 AM
to Software Help

Nathan,

Thanks for that response. Very helpful.

  -- Mark

Alexander Austregesilo

Jan 26, 2022, 10:16:21 AM
to gluex-s...@googlegroups.com

Thank you for the suggestion. Indeed, I have occasionally seen the behavior that Nathan describes. I will add the check to our production scripts, and someone should do the same for MCWrapper.

Cheers,

Alex

Nathan Baltzell

Jan 26, 2022, 10:40:10 AM
to Alexander Austregesilo, gluex-s...@googlegroups.com
The actual job directory for SWIF is supposed to be /scratch/slurm/$SLURM_JOB_ID, so we should probably check $PWD against that instead, but I haven't bothered so far...

I also see $SLURM_JOBID, and it has the same value; I'm not sure which one is more proper.
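A stricter check, comparing `$PWD` against `/scratch/slurm/$SLURM_JOB_ID` as suggested above, might look like the sketch below; it assumes that directory layout actually holds on the farm nodes. In Slurm, `SLURM_JOB_ID` is the documented name and `SLURM_JOBID` is kept for backward compatibility, so the sketch prefers the former and falls back to the latter. The function name `require_job_dir` is hypothetical.

```sh
#!/bin/sh
# Hypothetical guard: require $PWD to be exactly /scratch/slurm/<job id>.
# Assumption: SWIF jobs run in /scratch/slurm/$SLURM_JOB_ID.
require_job_dir() {
    # Prefer SLURM_JOB_ID; fall back to the older SLURM_JOBID.
    job_id=${SLURM_JOB_ID:-$SLURM_JOBID}
    if [ -z "$job_id" ]; then
        echo "ERROR: neither SLURM_JOB_ID nor SLURM_JOBID is set" >&2
        return 1
    fi
    expected="/scratch/slurm/$job_id"
    if [ "$PWD" != "$expected" ]; then
        echo "ERROR: expected to run in $expected, but PWD is $PWD" >&2
        return 1
    fi
}
```

If /scratch is reached through a symlink on some nodes, `$PWD` could spell the same directory differently and this exact comparison would reject a valid job, so it is a starting point rather than a drop-in replacement for the prefix check.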

