Error initializing jobs on lxplus


Al Kas

Dec 4, 2016, 7:35:22 PM
to grid-control
Hello

I am trying to run from lxplus; my .conf looks like this [1]. However, I get this error:

*********************************************************

Revision: 469:1395 - stable
Mapping between nickname and other settings:

 Nickname |                Config file
==========+===========================================
   None   | TreeProducerFromMiniAOD_8020x_Data25ns.py

Current task ID: GC1360f62cfa59
Task started on 2016-12-05
GCError: Error while creating instance of type CMSSW_Advanced (grid_control_cms.cmssw_advanced.CMSSW_Advanced)
KeyError: X509_USER_PROXY


Any hint?

Thanks in advance

Alexis

===============================


[1]
[global]
module  = CMSSW_Advanced
; change to your working directory

workdir = /afs/cern.ch/work/a/alkaloge/Stau/work.gc_Data_MiniAODv2_80x_grid/
backend = local

[storage]
se input timeout  = 6:00
se output timeout = 10:00
se output files   = output_DATA.root
se output pattern = @NICK@/@NICK@_@MY_JOBID@.root
se path           =
;  change to the directory where plain ntuples will be stored

         srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/

[local]
submit options =
        site => hh
        os => sld6
;       h_rt => 167:59:00
        h_rt => 5:59:00
        h_vmem => 4000M
wms = LSF
proxy = VomsProxy

[jobs]
;wall time = 167:59:00
wall time = 6:59:00
in flight =  20000
monitor = dashboard
shuffle = true
;queue timeout = 119:59:00
queue timeout = 5:59:00
memory = 4000
dataset provider = DBS3Provider

[dataset]
resync interactive = False
dataset provider  = DBS3Provider

[grid]
sites      = -samtests -group_admin -monitor -lcgadmin -cern -roma1.infn.it
dataset provider  = DBS3Provider

[glite-wms]
config        = docs/glite_wms_CERN.conf
use delegate  = False
dataset provider  = DBS3Provider

[CMSSW_Advanced]
depends=gLite
dataset provider  = DBS3Provider
dataset splitter = EventBoundarySplitter
;HybridSplitter


project area = ../../../..
se runtime         = True
events per job     = 100000

Fred Stober

Dec 5, 2016, 5:20:55 AM
to grid-control
Hi,

If you use a more recent version of grid-control, it will tell you that it was unable to query DBS because you did not initialize a grid proxy (the CMS dataset services have required a grid proxy for some time).
If you want to use LSF, you also need to specify the submission queue (1nd in my example), since LSF does not define a default queue and can't match the job requirements to a queue by itself. You can find the list of all queues with "bqueues".

The "submit options" you specify are old ones for the NAF, which aren't even needed anymore.
"dataset provider  = DBS3Provider" is redundant, since it is the default; the ones in the [grid] and [jobs] sections will never be queried - only those in [CMSSW_Advanced] and [dataset] are used.

This would be a slimmer version of the config file:

[global]
module  = CMSSW_Advanced
; change to your working directory

workdir = /afs/cern.ch/work/a/alkaloge/Stau/work.gc_Data_MiniAODv2_80x_grid/
backend = LSF

[lsf]
queue = 1nd

[storage]
se input timeout  = 6:00
se output timeout = 10:00
se output files   = output_DATA.root
se output pattern = @NICK@/@NICK@_@MY_JOBID@.root
se path           =
;  change to the directory where plain ntuples will be stored

         srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/

[local]
proxy = VomsProxy

[jobs]
;wall time = 167:59:00
wall time = 6:59:00
in flight =  20000
monitor = dashboard
shuffle = true
;queue timeout = 119:59:00
queue timeout = 5:59:00
memory = 4000

[dataset]
resync interactive = False


[CMSSW_Advanced]
depends=gLite
dataset provider  = DBS3Provider
dataset splitter = EventBoundarySplitter ;HybridSplitter
project area = ../../../..
se runtime         = True
events per job     = 100000

Cheers,
Fred

Al Kas

Dec 5, 2016, 9:37:45 AM
to grid-control
Hi Fred

OK, I am trying what you're saying, but I still get the same error with respect to my proxy (which I've refreshed anyway):

Current task ID: GC9cc144b22539

Task started on 2016-12-05
GCError: Error while creating instance of type CMSSW_Advanced (grid_control_cms.cmssw_advanced.CMSSW_Advanced)
KeyError: X509_USER_PROXY

By the way, I am using version 469:1395 with Python 2.7.

Thanks

Fred Stober

Dec 5, 2016, 2:16:31 PM
to grid-control
Hi,

before starting grid-control - can you check that the environment variable X509_USER_PROXY was set to a valid path after the grid proxy was created?
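For example, a quick check like this would verify it (just a sketch; the `check_proxy` function name is made up - the point is that grid-control's VomsProxy setup reads X509_USER_PROXY from the environment, which is where your KeyError comes from):

```shell
# Illustrative sanity check (check_proxy is a made-up helper name):
# X509_USER_PROXY must be set and point at an existing proxy file,
# otherwise grid-control fails with "KeyError: X509_USER_PROXY".
check_proxy() {
    if [ -n "${X509_USER_PROXY:-}" ] && [ -f "$X509_USER_PROXY" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}
check_proxy
```

If it prints "missing", create the proxy again and export X509_USER_PROXY to its path in the same shell before launching grid-control.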

Cheers,
Fred

Al Kas

Dec 5, 2016, 2:44:09 PM
to grid-control
Hi Fred

Thanks, that did the trick. However, I am still getting errors related to the CMSSW runtime, i.e. I get this:


Copy CMSSW runtime to SE 1 failed
url_copy_single_force file:////afs/cern.ch/work/a/alkaloge/Stau/work.gc_Data_MiniAODv2_80x_grid/runtime.tar.gz srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/GC1eb4c0331e5d.tar.gz
 returned 101
===
 url_exists srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/GC1eb4c0331e5d.tar.gz
  returned 101
===
Checking for binary lcg-ls -b -T srmv2 ... FAILED
Checking for file lcg-ls ... FAILED

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! FAIL - FAIL - FAIL - FAIL !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error code 101

lcg-ls missing
12751:shell:0 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single_force:548 gc-run.lib:print_and_eval:340 gc-run.lib:url_exists:403 gc-run.lib:print_and_eval:334 gc-run.lib:checkbin:127 gc-run.lib:checkfile:103 gc-run.lib:fail:44 gc-run.lib:debug_helper:37 - fail(101)
12751:shell:0 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single_force:548 gc-run.lib:print_and_eval:340 gc-run.lib:url_exists:403 gc-run.lib:print_and_eval:334 gc-run.lib:checkbin:127 gc-run.lib:checkfile:103 gc-run.lib:fail:52 gc-run.lib:updatejobinfo:88 gc-run.lib:debug_helper:37 - updatejobinfo(101)
===
 url_copy_single file:////afs/cern.ch/work/a/alkaloge/Stau/work.gc_Data_MiniAODv2_80x_grid/runtime.tar.gz srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/GC1eb4c0331e5d.tar.gz
  returned 101
===
  url_copy_single_srm file:////afs/cern.ch/work/a/alkaloge/Stau/work.gc_Data_MiniAODv2_80x_grid/runtime.tar.gz srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/GC1eb4c0331e5d.tar.gz
   returned 101
===
Checking for binary lcg-cp ... FAILED
Checking for file lcg-cp ... FAILED

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! FAIL - FAIL - FAIL - FAIL !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error code 101

lcg-cp missing
12751:shell:0 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single_force:549 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single:534 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single_srm:472 gc-run.lib:print_and_eval:334 gc-run.lib:checkbin:127 gc-run.lib:checkfile:103 gc-run.lib:fail:44 gc-run.lib:debug_helper:37 - fail(101)
12751:shell:0 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single_force:549 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single:534 gc-run.lib:print_and_eval:340 gc-run.lib:url_copy_single_srm:472 gc-run.lib:print_and_eval:334 gc-run.lib:checkbin:127 gc-run.lib:checkfile:103 gc-run.lib:fail:52 gc-run.lib:updatejobinfo:88 gc-run.lib:debug_helper:37 - updatejobinfo(101)
===
===
===


Unable to copy CMSSW runtime! You can try to copy it manually.
Is CMSSW runtime (/afs/cern.ch/work/a/alkaloge/Stau/work.gc_Data_MiniAODv2_80x_grid/runtime.tar.gz) available on SE srm://dcache-se-cms.desy.de:8443/srm/managerv1?SFN=/pnfs/desy.de/cms/tier2/store/alkaloge/8020_23SeptReReco_v2/? [no]:


Any ideas?

Thanks again