DIRACOS environment


Daniela Bauer

27 Sept 2019, 08:34:20
to diracgrid-forum
Hi *, but especially Chris,

We have been trying out DIRACOS in v7r0p1. So far it seems to work well with the inbuilt DIRAC commands (dirac-dms-* etc), but we are having issues with external commands.

For example, our standard test job contains a perl script. With diracos enabled, perl segfaults if the job runs on a CentOS7 node.
Sourcing the DIRAC environment on the UI (on CentOS7) causes both emacs and even 'less' to malfunction :-S

We've played around with this, but it's not clear what the best way forward is. Most experiments seem to set up their own environment in detail and don't expect DIRAC to change it.

The simplest idea we have for fixing this (which is still a fairly major change) would be:

 - Modify bashrc so that it only adds the dirac-* commands to the path.
 - Create a second script "diracenv" that sets X509_USER_CERT, etc... and sources diracosrc (much like bashrc now)
   (This should _set_ the LD_LIBRARY_PATH, etc. not just add to it so the environment is completely consistent)
 - Modify the wrapper scripts for all dirac-* commands so that they source "diracenv" before starting python.
 - Modify the runsv startup scripts to source diracenv for the servers.

This way all of the dirac tools get a fixed environment and all of the user stuff gets the environment from the host (which is what the users expect).

We don't really want to go down the alternative path of having users compile their code within DIRACOS: The small experiments simply don't have the time to look at that, plus people running stuff locally won't want a DIRAC install just for the environment.

Regards,

Daniela & Simon

Marko Petric

27 Sept 2019, 15:49:25
to Daniela Bauer, diracgrid-forum
Dear Daniela,

> We have been trying out DIRACOS in v7r0p1. So far it seems to work well with the inbuilt DIRAC commands (dirac-dms-* etc), but we are having issues with external commands.
>
> For example, our standard test job contains a perl script. With diracos enabled, perl segfaults if the job runs on a CentOS7 node.
What do you mean by this? Is the payload of the job a perl script that is executed on a node, or do you use perl to submit jobs and automate things?
If it is the first case, this is just a problem of payload isolation / job setup. For instance, iLCDirac jobs always see the default node env and not the env of externals+lcg / diracos.

> Sourcing the DIRAC environment on the UI (on CentOS7) causes both emacs and even 'less' to malfunction :-S
That certain system commands don’t work in diracos is known; however, less works for me under CC7 (it doesn’t work for me under diracos+Fedora).
However, I would like to point out that this was never promised. The main aim was to provide as small a binary layer as possible, to enable running DIRAC commands and services on several platforms, not to guarantee that native binaries from these platforms will work with diracos; that would open up an endless support queue and would blow up the size of diracos. This feature would be a nice and desirable side product, but it is subordinate to cross-platform support.

> We've played around with this, but it's not clear what the best way forward is. Most experiments seem to set up their own environment in detail and don't expect DIRAC to change it.
I can only comment here for iLCDirac: we have always advised our users not to mix the “offline software”/experiment environment with the dirac environment, and we have been recommending this since long before DIRACOS.
The main reason behind this is that the python even in externals+lcg had different compile flags from the system python, or from the python that comes with the offline software of the experiment (not to mention that these are all different versions). If an experiment uses python, it is impossible for dirac not to change this. If you add pip-installed packages on top, you have a disaster.

> The simplest idea we have for fixing this (which is still a fairly major change) would be:
>
> - Modify bashrc so that it only adds the dirac-* commands to the path.
> - Create a second script "diracenv" that sets X509_USER_CERT, etc... and sources diracosrc (much like bashrc now)
> (This should _set_ the LD_LIBRARY_PATH, etc. not just add to it so the environment is completely consistent)
> - Modify the wrapper scripts for all dirac-* commands so that they source "diracenv" before starting python.
> - Modify the runsv startup scripts to source diracenv for the servers.
>
> This way all of the dirac tools get a fixed environment and all of the user stuff gets the environment from the host (which is what the users expect).
This is a viable solution that you could provide to your users. We recommend that our users use one shell to prepare their binaries with the experiment software or do development work, then open a new shell, source DIRAC/bashrc and submit jobs, and under no circumstances source the experiment software and dirac in the same shell.

> We don't really want to go down the alternative path of having users compile their code within DIRACOS: The small experiments simply don't have the time to look at that, plus people running stuff locally won't want a DIRAC install just for the environment.
This point I don’t understand: why would you compile inside DIRACOS? I completely agree you should not do this. It is of course neither desired nor intended.

Cheers,
Marko

>
> Regards,
>
> Daniela & Simon
>

Daniela Bauer

30 Sept 2019, 11:25:21
to Marko Petric, diracgrid-forum
Hi Marko,

thanks for taking the time to answer.


If it is the first case, this is just a problem of payload isolation / job setup. For instance, iLCDirac jobs always see the default node env and not the env of externals+lcg / diracos.

But how exactly do you do this?

> Sourcing the DIRAC environment on the UI (on CentOS7)  causes both emacs and even 'less' to malfunction :-S
That certain system commands don’t work in diracos is known; however, less works for me under CC7 (it doesn’t work for me under diracos+Fedora).
However, I would like to point out that this was never promised.
Yes, but it's annoying as hell that when I want to change a jdl during debugging I have to switch tabs... (And I'm not using vim, don't go there.)
 
> We've played around with this, but it's not clear what the best way forward is. Most experiments seem to set up their own environment in detail and don't expect DIRAC to change it.
I can only comment here for iLCDirac: we have always advised our users not to mix the “offline software”/experiment environment with the dirac environment, and we have been recommending this since long before DIRACOS.
We recommend the same thing...

> The simplest idea we have for fixing this (which is still a fairly major change) would be:
>
>  - Modify bashrc so that it only adds the dirac-* commands to the path.
>  - Create a second script "diracenv" that sets X509_USER_CERT, etc... and sources diracosrc (much like bashrc now)
>    (This should _set_ the LD_LIBRARY_PATH, etc. not just add to it so the environment is completely consistent)
>  - Modify the wrapper scripts for all dirac-* commands so that they source "diracenv" before starting python.
>  - Modify the runsv startup scripts to source diracenv for the servers.
>
> This way all of the dirac tools get a fixed environment and all of the user stuff gets the environment from the host (which is what the users expect).
This is a viable solution that you could provide to your users. We recommend that our users use one shell to prepare their binaries with the experiment software or do development work, then open a new shell, source DIRAC/bashrc and submit jobs, and under no circumstances source the experiment software and dirac in the same shell.

Why just our users? I assume no one out there wants the DIRAC environment showing up in their job in the first place.
 
Cheers,
Daniela

--
Sent from the pit of despair

-----------------------------------------------------------
daniel...@imperial.ac.uk
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/

Marko Petric

30 Sept 2019, 16:40:38
to diracgrid-forum
Hi Daniela,

But how exactly do you do this?
This is a generic application with which a user can run any payload; I hope this is as close as possible to your use case.

The final execution of the user code is wrapped in the DIRAC function shellCall, which opens a subprocess in the default shell (you can even specify the environment you want it to have).
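
For illustration, a minimal sketch of what such a shellCall invocation could look like; the payload name and the environment contents here are assumptions for the example, not iLCDirac's actual code:

# Sketch: run the payload via DIRAC's shellCall with an explicit, clean
# environment, so it does not inherit the DIRAC/DIRACOS env.
import os
from DIRAC.Core.Utilities.Subprocess import shellCall

# Hypothetical minimal env for the payload; copy over only what it needs.
payloadEnv = {'HOME': os.environ.get('HOME', '/tmp'),
              'PATH': '/usr/local/bin:/usr/bin:/bin'}

result = shellCall(0, 'sh payload.sh', env=payloadEnv)  # timeout 0 = none
if result['OK']:
    status, stdout, stderr = result['Value']
    print('payload exit code: %s' % status)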
 
Yes, but it's annoying as hell that when I want to change a jdl during debugging I have to switch tabs... (And I'm not using vim, don't go there.)
I completely share your viewpoint that this is annoying, and I would also like to see emacs work, but I don't think there is a good solution to this without blowing up the bundle.
 
Why just our users? I assume no one out there wants the DIRAC environment showing up in their job in the first place.
I misunderstood the situation, and I probably still don't understand how your users submit jobs, but I agree that the DIRAC env should not pop up in the user payload; I hope that shellCall solves this.

Cheers,
Marko

Igor Pelevanyuk

3 Oct 2019, 11:05:39
to diracgrid-forum
Hello Daniela and Marko, 

In our installation the same questions arise. They are not strictly related to DIRACOS, but more to environments, so I will ask here in this topic.
The standard workflow of a job is the following:
1. Download input from SE.
2. Configure the env to allow software use (a config script is provided by the physicists).
3. Perform calculations.
4. Upload result to SE.

The download-input step requires the dirac env.
The configure step breaks the dirac env and creates the software env.
The calculations require the software env.
The upload requires the dirac env again.

So for now we keep the values of some env variables and switch them depending on the step. It looks quite messy.
Are there any DIRAC recommendations for that? What is the best way to organize user jobs? Because right now it looks more like scaffolding.

Cheers,
Igor

Marko Petric

4 Oct 2019, 07:05:24
to diracgrid-forum
Hi Igor,
I would say this is the same use case; please look at the same workflow module.

Here you can see how a payload is executed via the DIRAC command shellCall, which opens a blank shell in a subprocess to execute the payload (a rough sketch follows the list below).
I think it is not advisable to modify the env in which DIRAC is running in order to execute the payload.

1. Download input from SE. -> dirac command
2. Configure env to allow software use (config script is provided by physicists) -> automatically construct an init env bash script
3. Perform calculations.  -> execute this script via shellCall
4. Upload result to SE. -> dirac command
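
A rough sketch of how steps 2 and 3 could look inside such a workflow module; the function and file names are made up for the example, while shellCall, S_OK and S_ERROR are the actual DIRAC utilities:

# Hypothetical sketch: wrap the physicists' env setup and the payload in
# one bash script, then run it in a subprocess via shellCall, so the
# software env never touches the env DIRAC itself runs in.
from DIRAC import S_OK, S_ERROR
from DIRAC.Core.Utilities.Subprocess import shellCall

def runPayload(configScript, payloadCmd):
    with open('run_payload.sh', 'w') as f:
        f.write('#!/bin/bash\n')
        f.write('source %s\n' % configScript)  # step 2: software env
        f.write('%s\n' % payloadCmd)           # step 3: the calculation
    result = shellCall(0, 'bash run_payload.sh')  # timeout 0 = none
    if not result['OK']:
        return result
    status, stdout, stderr = result['Value']
    if status != 0:
        return S_ERROR('Payload failed with exit code %s' % status)
    return S_OK((stdout, stderr))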

Cheers,
Marko

Nikolay Kutovskiy

16 Oct 2019, 02:36:12
to diracgrid-forum
On Friday, 4 October 2019 14:05:24 UTC+3, Marko Petric wrote:
Hi Igor,
I would say this is the same use case; please look at the same workflow module.

Here you can see how a payload is executed via the DIRAC command shellCall, which opens a blank shell in a subprocess to execute the payload.
I think it is not advisable to modify the env in which DIRAC is running in order to execute the payload.

1. Download input from SE. -> dirac command
2. Configure env to allow software use (config script is provided by physicists) -> automatically construct an init env bash script
3. Perform calculations.  -> execute this script via shellCall
4. Upload result to SE. -> dirac command

Hi Marko,

is there any way in such an approach to install a required python module (e.g. numpy)?

Best regards,
Nikolay.

André Sailer

16 Oct 2019, 03:35:47
to Nikolay Kutovskiy, diracgrid-forum
Hi Nikolay,

We provide all software via CVMFS; that includes a python installation into which we can install any package we need.

Cheers,
Andre

Federico Stagni

16 Oct 2019, 06:47:14
to André Sailer, Nikolay Kutovskiy, diracgrid-forum
Just to be clear: it's not up to DIRAC to provide the software (in isolation) for the user jobs. André above is referring to the CLIC (or ILC) case. In LHCb, the applications' environment is created with, basically, the "env" command, and LHCb, as well as CLIC and many other users, uses CVMFS for distributing the needed software.


Marko Petric

16 Oct 2019, 06:54:29
to diracgrid-forum
Dear Nikolay,
can you describe the use case a bit better? It's hard to help otherwise.

What are you doing now?
What is the problem you are facing with your current approach?
What would you like to achieve as the final goal?

Cheers,
Marko

Nikolay Kutovskiy

17 Oct 2019, 05:50:21
to diracgrid-forum
Dear Marko,


On Wednesday, 16 October 2019 13:54:29 UTC+3, Marko Petric wrote:
Dear Nikolay,
can you describe the use case a bit better? It's hard to help otherwise.
As I wrote in another thread, the initial problem with user analysis jobs was the necessity to have a certain version of python (e.g. 3.6 and not 2.7) and to have some python modules (e.g. numpy) installed on the WNs.
 

What are you doing now?
The user decided to rewrite his python3.6 code in python2.7 and to install the missing python modules on the WNs using the pip provided by DIRAC (e.g. /scratch/plt00/Linux_x86_64_glibc-2.12/bin/pip).
 
What is the problem you are facing with your current approach?
In the particular case I've just described above, the user would like to avoid the necessity of rewriting his code in python2.7 and still have a chance to run it on the worker node (i.e. somehow ship the required python module(s)).

What would you like to achieve as the final goal?
Users should be able to use their code as it is, and have the possibility to install missing modules in a proper way (as far as I understand, installing python modules using the pip shipped with DIRAC is not a proper and recommended way).

Daniela Bauer

17 Oct 2019, 06:11:06
to Nikolay Kutovskiy, diracgrid-forum
Hi Nikolay,

pip is certainly an option.
But wouldn't it also be possible for the user to tar up all the libraries they need to run on vanilla CentOS7, put them on cvmfs (untarred, but making a working tarball usually ensures the stuff actually works), and then use the code from there? Possibly with some clean-up of the environment, but that's where I get lost and someone else will have to help.
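
For the python part, that clean-up could be as small as pointing PYTHONPATH at the unpacked tree before starting the script. A sketch, with made-up paths:

# Illustrative only: run the user's script against libraries unpacked on
# CVMFS, using the system python rather than the DIRAC one.
import os
import subprocess

env = dict(os.environ)
env['PYTHONPATH'] = '/cvmfs/myvo.example.org/userlibs/lib/python2.7/site-packages'
subprocess.call(['/usr/bin/python', 'analysis.py'], env=env)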
Regards,
Daniela


Marko Petric

18 Oct 2019, 05:31:47
to diracgrid-forum
Dear Nikolay,
I understand the problem, but just for completeness could you please also elaborate on how your users submit jobs.
Imagine I am your user and I have a shell script that I want to send to the grid; what does the user do in this case? What dirac workflow/commands do they call?
Cheers,
Marko

Nikolay Kutovskiy

18 Oct 2019, 08:43:05
to diracgrid-forum
Hi Daniela,


On Thursday, 17 October 2019 13:11:06 UTC+3, Daniela Bauer wrote:
Hi Nikolay,

pip is certainly an option.
But wouldn't it also be possible for the user to tar up all the libraries they need to run on vanilla CentOS7, put them on cvmfs (untarred, but making a working tarball usually ensures the stuff actually works),
As far as I know, only a limited number of VO users have write access to the VO CVMFS repo, and it seems wrong to put every piece of software needed by users into the VO CVMFS repository. So, as far as I understand from the discussion above, there are a few options:
1) pack all the needed software/modules in a tarball and bring it to the WN, if the size of such a bundle is relatively small (see the sketch below);
2) in the case of python, one can try to install the missing python modules using the pip shipped with DIRAC.
In the future, a further option could hopefully be to pack all the needed software in a container (e.g. Singularity).
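
For option 1 with python modules, one concrete shape could be the following sketch; all names and paths are illustrative, and it assumes the bundle is built on a host matching the WN platform (e.g. vanilla CentOS7):

# Build once, on a host matching the WNs: install the modules into a
# standalone directory with pip's --target option and tar it up.
import os
import subprocess

subprocess.check_call(['pip', 'install', '--target=deps', 'numpy'])
subprocess.check_call(['tar', 'czf', 'deps.tar.gz', 'deps'])

# On the WN, after unpacking deps.tar.gz from the input sandbox: run the
# user script with the bundled modules on PYTHONPATH.
env = dict(os.environ)
env['PYTHONPATH'] = os.path.abspath('deps')
subprocess.call(['/usr/bin/python', 'analysis.py'], env=env)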
 

Nikolay Kutovskiy

18 Oct 2019, 08:54:27
to diracgrid-forum



On Friday, 18 October 2019 12:31:47 UTC+3, Marko Petric wrote:
Dear Nikolay,
I understand the problem, but just for completeness could you please also elaborate on how your users submit jobs.
Imagine I am your user and I have a shell script that I want to send to the grid; what does the user do in this case? What dirac workflow/commands do they call?
Dear Marko,
 
right now the user runs a shell script with a set of checks and preparations (e.g. installing the numpy module, getting a list of LFNs with the dirac-dms-find-lfns command and downloading them to the WN with the dirac-dms-get-file command), and his python script is invoked at the end of that shell script.

Marko Petric

22 Oct 2019, 08:05:56
to diracgrid-forum
Dear Nikolay,
Do I understand correctly that in your case it is the payload (your shell script) executed on a grid node that downloads the input data, and not DIRAC?
To my understanding this is not the DIRAC way, or how things are designed and meant to be used. I am referring here to the fact that the payload interacts with/calls DIRAC commands.

The proper way, as I understand DIRAC, is (see the sketch after this list):
1. Use the DIRAC JobWrapper to download the inputSandbox or to do the resolution of the input data
2. Once the files are present on the node, the payload is executed in a vanilla environment without any knowledge of DIRAC
3. The payload produces output files and finishes execution
4. The JobWrapper checks whether the payload finished successfully and whether the desired output is present, and uploads the resulting files
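
In terms of the DIRAC python API, that separation looks roughly like the sketch below; the LFN and file names are placeholders:

# Sketch: declare input and output data on the job so the JobWrapper
# handles them; the payload itself never calls dirac commands.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

j = Job()
j.setInputData(['LFN:/somevo/somedir1/dataX/ABC001.dat'])  # step 1
j.setExecutable('payload.sh')                              # steps 2-3
j.setOutputData(['result.dat'])                            # step 4
print(Dirac().submitJob(j))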

Cheers,
Marko

Nikolay Kutovskiy

24 Oct 2019, 09:48:54
to diracgrid-forum
On Tuesday, 22 October 2019 15:05:56 UTC+3, Marko Petric wrote:
Dear Nikolay,
Dear Marko,

thank you (and all the others who replied) for your time and effort in helping me understand how DIRAC works (or at least should work), as well as for sharing your understanding and for giving me advice and recommendations on how to debug and use DIRAC properly. I really appreciate it!

Please see my comments inline.
 
Do I understand correctly that in your case it is the payload (your shell script) executed on a grid node that downloads the input data, and not DIRAC?
Yes, that's correct, since we couldn't find a way to properly specify a DFC path in the InputData JDL attribute. The one below didn't work:
InputData = {"LFN:/somevo/somedir1/anotherdir/onemoredir/dataX/ABC*.dat"};

A job submitted with a JDL containing such an InputData attribute fails with this error:

 dlogging -d <jobid>
Status    MinorStatus          ApplicationStatus         Time                 Source
==========================================================================================
Received  Job accepted         Unknown                   2019-10-24 13:07:10  JobManager
Checking  JobSanity            Unknown                   2019-10-24 13:07:10  JobPath
Checking  InputData            Unknown                   2019-10-24 13:07:10  JobSanity
Failed    InputData optimizer  Input data not available  2019-10-24 13:07:10  InputData

So in order to download the required set of input files, we (my user and I) didn't find a better way than to use the dirac-dms-find-lfns and dirac-dms-get-file DIRAC commands in our wrapper shell script.


To my understanding of DIRAC this is not the DIRAC way, or how things are designed or meant to be used. I am referring here to the fact that the payload interacts/calls DIRAC commands.

The proper way, as I understand DIRAC, is:
1. Use the DIRAC JobWrapper to download the inputSandbox or to do the resolution of the input data
2. Once the files are present on the node, the payload is executed in a vanilla environment without any knowledge of DIRAC
3. The payload produces output files and finishes execution
4. The JobWrapper checks whether the payload finished successfully and whether the desired output is present, and uploads the resulting files
I completely share your understanding of DIRAC (i.e. let it perform as many operations as it is capable of, including downloading the input data), but as I wrote above we didn't find a way to properly define the required set of input files.

Best regards,
Nikolay

Marko Petric

31 Oct 2019, 09:06:36
to diracgrid-forum
Dear Nikolay,
Do I understand correctly that in your case it is the payload (your shell script) executed on a grid node that downloads the input data, and not DIRAC?
Yes, that's correct, since we couldn't find a way to properly specify a DFC path in the InputData JDL attribute. The one below didn't work:
InputData = {"LFN:/somevo/somedir1/anotherdir/onemoredir/dataX/ABC*.dat"};
Yes, wildcards in InputData do not work. But I would suggest that you rather resolve your input data (the wildcard) on the client side; then you can even do job splitting, so that you get one input file per job (if that is the use case). What hinders you from calling dirac-dms-find-lfns in the job submission step and putting the desired result in InputData? A sketch of this follows below.
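
A sketch of that submission-side resolution; the path is a placeholder, shelling out to dirac-dms-find-lfns is just one way to obtain the list, and it assumes the command prints one LFN per line:

# Sketch: resolve the wildcard on the client at submission time, then
# submit one job per matching LFN.
import fnmatch
import os
import subprocess

from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

out = subprocess.check_output(
    ['dirac-dms-find-lfns', 'Path=/somevo/somedir1/anotherdir/onemoredir/dataX']).decode()
lfns = [l.strip() for l in out.splitlines() if l.strip()]
lfns = [l for l in lfns if fnmatch.fnmatch(os.path.basename(l), 'ABC*.dat')]

dirac = Dirac()
for lfn in lfns:  # one input file per job
    j = Job()
    j.setInputData(['LFN:' + lfn])
    j.setExecutable('payload.sh')
    print(dirac.submitJob(j))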

I completely share your understanding of DIRAC (i.e. let it perform as many operations as it is capable of, including downloading the input data), but as I wrote above we didn't find a way to properly define the required set of input files.

To conclude on this: we had a longer discussion at the last developers meeting, and the majority opinion is that people should not call dirac commands from the payload, but rather execute the payload in a vanilla environment and use the JobWrapper for input/output data. Nevertheless, dirac offers the possibility to call dirac commands in the payload, and we will not actively disable this option; but if a VO, user, or operator decides to follow such an approach, it is their responsibility to ensure the environment separation between calling dirac commands and payload execution.

Cheers,
Marko

Nikolay Kutovskiy

6 Nov 2019, 21:12:00
to diracgr...@googlegroups.com
Dear Marko,

thank you again for taking the time to share your thoughts and suggestions!

Best regards,
Nikolay.