Problem with submitting jobs to swif2

Mariana Khachatryan

Feb 4, 2022, 3:14:56 PM
to GlueX Software Help Email List
Dear all,

I have created a swif2 workflow with "swif2 create -workflow bootstrap_SPD" and am trying to submit jobs using my old script, after replacing
--project and --track in the SWIF1 add-job call with --account and --partition,
but it gives the following error:

farm1901.jlab.org> python script_Fit.py
/w/halld-scshelf2101/Mariana/PWA_challenge/Fit_real_data_5_phase1data_massconst_flatMCrandtrig/fit_data_allM_allepsilon_loop_moments_bootstrapping/EtaPi_fit/script
/work/halld/Mariana/PWA_challenge/Fit_real_data_5_phase1data_massconst_flatMCrandtrig/fit_data_allM_allepsilon_loop_moments_bootstrapping/EtaPi_fit/script/bin_0_0.csh
/work/halld/Mariana/PWA_challenge/Fit_real_data_5_phase1data_massconst_flatMCrandtrig/fit_data_allM_allepsilon_loop_moments_bootstrapping/EtaPi_fit/script/fit_bin_0_0.py
ERROR Bad Request: Standard output path must be absolute (i.e. start with /)

I have attached the submission script I'm using (script_Fit.py) and the scripts fit_bin_0_0.py and bin_0_0.csh that it creates for each job (these are for the first job).
All the paths are absolute, so I don't know what the problem is.
Can you check whether there is a problem there?
Also, if you have a similar working example for swif2, could you please send it to me?

bin_0_0.csh
fit_bin_0_0.py
script_Fit.py

Sean Dobbs

Feb 4, 2022, 3:17:00 PM
to Mariana Khachatryan, GlueX Software Help Email List
Hi Mariana,

For the arguments to -stdout and -stderr, you need to remove the "file:" prefix.
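
A minimal sketch of that fix in Python, assuming the submission script builds the log paths as strings. The helper name clean_log_path and the example path are hypothetical and not taken from script_Fit.py:

```python
def clean_log_path(path):
    """Return `path` without a leading "file:" prefix.

    Raises ValueError if the result does not start with "/", mirroring the
    "must be absolute (i.e. start with /)" check reported by swif2.
    """
    prefix = "file:"
    if path.startswith(prefix):
        path = path[len(prefix):]
    if not path.startswith("/"):
        raise ValueError("log path must be absolute: " + path)
    return path


# Hypothetical example path, for illustration only.
print(clean_log_path("file:/some/log/dir/bin_0_0.out"))
```

Passing every -stdout and -stderr value through such a helper before the job is added would catch a leftover prefix or a relative path early.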

---Sean
> --
> You received this message because you are subscribed to the Google Groups "GlueX Software Help" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gluex-softwar...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/gluex-software/9254000D-14D3-46E1-8688-13CE8DFFFF23%40gmail.com.

Alexander Austregesilo

Feb 4, 2022, 3:20:07 PM
to Mariana Khachatryan, GlueX Software Help Email List
Hi Mariana,

Sean beat me to the answer. I only want to add that there are no
CentOS7.7 nodes anymore. You should use "-os centos79" or "-os general".

Cheers,

Alex

--
Alexander Austregesilo

Staff Scientist - Experimental Nuclear Physics Hall D
Thomas Jefferson National Accelerator Facility
Newport News, VA
aaus...@jlab.org
(757) 269-6982 W
(757) 534-8367 C

Alexander Austregesilo

Feb 5, 2022, 11:09:53 AM
to Mariana Khachatryan, GlueX Software Help Email List

Dear Mariana,

I have seen this error before, but it may not be related to the settings in your script.

For better performance, please try to write the log files (stdout, stderr) to /farm_out/<userName>. I also think that the default unit for disk space is MB, but you can specify '-disk 25GB'.
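
As a rough sketch of how these options could fit together, the Python snippet below assembles an add-job argument list without running it. The function name, the account and partition placeholders, the user name, and the script path are all hypothetical. The single-dash spelling of -account and -partition and the position of the command after the options are assumptions made by analogy with -stdout, -os and -disk, so please check them against the swif2 help output:

```python
import shlex


def build_add_job(workflow, account, partition, user, job_name, command,
                  disk="25GB", os_tag="general"):
    """Assemble a swif2 add-job argument list (nothing is executed here)."""
    log_dir = "/farm_out/" + user  # log location suggested in this thread
    return [
        "swif2", "add-job",
        "-workflow", workflow,
        "-account", account,      # assumed flag spelling
        "-partition", partition,  # assumed flag spelling
        "-os", os_tag,
        "-disk", disk,            # explicit unit instead of the default
        "-stdout", log_dir + "/" + job_name + ".out",  # no "file:" prefix
        "-stderr", log_dir + "/" + job_name + ".err",
    ] + list(command)


# Placeholder values, for illustration only.
args = build_add_job("bootstrap_SPD", "my_account", "my_partition",
                     "someuser", "bin_0_0", ["/path/to/bin_0_0.csh"])
print(" ".join(shlex.quote(a) for a in args))
```

Printing the command first makes it easy to inspect the paths and units before handing the list to subprocess.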

The official launch scripts were updated for swif2, but they are currently only providing the options to run hd_root or the DSelector:

https://github.com/JeffersonLab/hd_utilities/tree/master/launch_scripts/launch/launch.py

Cheers,

Alex


On 2/4/2022 7:04 PM, Mariana Khachatryan wrote:
> It looks like there is still a problem.
> All 12 jobs submitted to the created workflow have failed with the following problem type:
> SITE_LAUNCH_FAIL Batch job could not be created.
> Any idea what could be causing this? There are no error outputs that could point to the problem.
>
> On Feb 4, 2022, at 4:04 PM, Mariana Khachatryan <mari...@jlab.org> wrote:
>> Thank you for the suggestions, it works now.

Nathan Baltzell

Feb 5, 2022, 11:59:39 AM
to Alexander Austregesilo, Mariana Khachatryan, GlueX Software Help Email List
The SWIF developer told me SITE_LAUNCH_FAIL, which didn't exist in SWIF1, means the "sbatch" command failed, so the job never even got submitted to SLURM.

I've seen it happen because of system problems. In those cases I knew the job was configured correctly, and issuing a retry succeeded.

And I just now tested that some combinations of invalid account/partition/constraint can also result in SITE_LAUNCH_FAIL. In those cases using "sbatch" directly would have reported "invalid account/partition" or something like that. So it seems SWIF2 isn't checking the validity of those arguments.
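
Since these arguments do not seem to be validated up front, one option is to check the partition name against SLURM before submitting. A minimal Python sketch, assuming "sinfo" is available on the submit host. The function names and the sample output are made up for illustration, and account and constraint are not checked here:

```python
import subprocess


def parse_partitions(sinfo_output):
    """Extract partition names from `sinfo -h -o %P` output.

    SLURM marks the default partition with a trailing "*"; strip it.
    """
    return {line.strip().rstrip("*")
            for line in sinfo_output.splitlines() if line.strip()}


def partition_exists(partition):
    """Query SLURM for its partitions; needs a host where `sinfo` works."""
    out = subprocess.run(["sinfo", "-h", "-o", "%P"], check=True,
                         capture_output=True, text=True).stdout
    return partition in parse_partitions(out)


# Offline demonstration with made-up sinfo output.
sample = "production*\npriority\n"
print("priority" in parse_partitions(sample))
print("typo_partition" in parse_partitions(sample))
```

A check like this only rules out a mistyped partition; a SITE_LAUNCH_FAIL caused by a system problem would still need a retry.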

-Nathan
