file permissions for Cylc-generated data files


Jin Lee

Nov 28, 2018, 6:35:49 PM
to cylc
Hello,

At our site (NCI, Australia), files generated by a Cylc suite seem to have read permission for the owner only. Files created by the shell have the usual file permissions, which are set in users' startup scripts, e.g. umask u=rwx,g=rx,o=rx. Before I start investigating, is someone able to tell me whether the permissions given to Cylc-generated files are controlled by Cylc? If Cylc has nothing to do with file permissions then I will look elsewhere.
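
For reference, a minimal shell sketch (not part of the original post) of how the umask in effect when a file is created determines its mode. The helper name `show_mode` and the scratch file `data.txt` are illustrative assumptions; the symbolic umask is the one quoted above.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print the mode string of a file
# created while a given umask is in effect.
show_mode() {
    (
        # Subshell, so the caller's own umask is left untouched.
        dir=$(mktemp -d)
        umask "$1"
        touch "$dir/data.txt"   # touch asks for mode 666; the umask clears bits from it
        ls -l "$dir/data.txt" | cut -c1-10
        rm -rf "$dir"
    )
}

show_mode u=rwx,g=rx,o=rx   # symbolic form, equivalent to 0022: prints -rw-r--r--
show_mode 0077              # owner-only: prints -rw-------
```

Any process that starts with a different umask (a batch job, a daemon, a non-login shell) will create files with different modes, whatever the interactive shell is configured to do.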

Thank you.




Regards,

Jin

Scott Wales

Nov 28, 2018, 6:42:15 PM
to cylc

Hi Jin,


Permissions on NCI jobs are set by a PBS flag '-W umask', which defaults to 0077 (only the owning user can read/write). 'man qsub' has more details.


Cheers, Scott


Scott Wales | Computational Modelling Systems Specialist
ARC Centre of Excellence for Climate Extremes
School of Earth Sciences
The University of Melbourne
www.climateextremes.org.au


Tom Coleman

Nov 28, 2018, 7:06:58 PM
to cy...@googlegroups.com
Hi Jin,

I tend to recommend that people add "-W umask=0022" to their "[[[directives]]]" section (or whatever value you need; 0022 usually suits my needs).

Regards,
Tom

Jin Lee

Nov 29, 2018, 4:40:55 PM
to cylc
Hi Tom and Scott,

Thanks for your reply.

Does the PBS flag '-W umask=NNNN' control the permissions of data files that Cylc produces, e.g. the permissions on the directory $CYLC_SUITE_SHARE_DIR and its subdirectories?



Jin

Hilary Oliver

Dec 1, 2018, 4:47:36 PM
to cy...@googlegroups.com
Hi Jin,

I believe that PBS directive only affects the .out and .err files created for the job by PBS (Cylc tells PBS their locations via other directives in the job script). The permissions on other files and directories that are created by the Cylc suite server program should be determined by the umask in your user environment on the suite host.

Hilary



Jin Lee

Dec 5, 2018, 1:53:51 AM
to cylc
Hi Hilary,

Thank you for that information. Yes, I was more interested in the data files that Cylc creates. In our setup each job that is submitted to the job queuing system (PBS in our case) executes the user's startup script. So I put a umask command in my startup script, and this seems to give files and subdirectories under $CYLC_SUITE_SHARE_DIR world-readable permissions. This solution is probably not ideal. Our group might have to revisit this later.
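
One alternative to relying on the startup script, sketched here under assumptions and not taken from the thread, is for the task's own script to set its umask before writing anything. The subdirectory `products` and the file `result.txt` are hypothetical names; outside Cylc, where `CYLC_SUITE_SHARE_DIR` is unset, the sketch falls back to a scratch directory so that it can be run standalone.

```shell
#!/bin/sh
# Sketch of a task script body that sets its own umask.
# Outside Cylc, CYLC_SUITE_SHARE_DIR is unset, so fall back to a scratch directory.
share_dir=${CYLC_SUITE_SHARE_DIR:-$(mktemp -d)}

umask 0022                              # from here on: new files 644, new directories 755
mkdir -p "$share_dir/products"          # hypothetical output directory
echo "example output" > "$share_dir/products/result.txt"

# Show the resulting modes.
ls -ld "$share_dir/products" "$share_dir/products/result.txt"
```

This only affects files the task itself creates after the `umask` line; it does not change the job.out and job.err files that PBS opens on the job's behalf.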



Jin

Hilary Oliver

Dec 8, 2018, 6:59:17 AM
to cy...@googlegroups.com
Hi again Jin,

It's not quite clear (to me, anyway) whether you still think this is a Cylc issue. To be a bit more explicit: job.out and job.err file permissions are determined by PBS (I'm not sure why PBS doesn't respect the user's umask) and can be modified with the "-W umask=..." directive. But Cylc does not manipulate the permissions of its server-generated (as opposed to job-generated) log files at all; they should be determined by your umask.

Hilary



Jin Lee

Dec 17, 2018, 5:54:01 PM
to cylc
Hi Hilary,

On our HPC I am experiencing 3 problems:

(1) job.out and job.err files generated on the remote task host intermittently have user-only-read file permissions
(2) files and directories other than job.out and job.err on the remote task host seem to have user-only-read file permissions
(3) job.out and job.err files that are retrieved and brought back to the suite host have user-only-read file permissions; this seems to happen intermittently

(2) is fixed by using umask in my startup script on the remote task host (our installation of PBS seems to execute the user startup script for every PBS job). I decided to live with (1), but (3) is still a problem.
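
When chasing an intermittent problem like this, it can help to list exactly which files lack the world-read bit after each run. The following is a small sketch, not from the thread; the function name `list_not_world_readable` and the demo files are illustrative assumptions.

```shell
#!/bin/sh
# list_not_world_readable DIR   (hypothetical helper)
# Prints every regular file or directory under DIR that lacks the world-read bit.
list_not_world_readable() {
    find "$1" \( -type f -o -type d \) ! -perm -0004 -print
}

# Small self-contained demo with one permissive and one restrictive log file.
demo=$(mktemp -d)
chmod 755 "$demo"
touch "$demo/job.out" "$demo/job.err"
chmod 644 "$demo/job.out"
chmod 600 "$demo/job.err"

list_not_world_readable "$demo"   # lists only the job.err path
```

Pointed at a suite's job log directory on either host (for example a path of the form `$HOME/cylc-run/<suite>/log/job`), the output shows which cycle, task and submit number were affected, which may make a common factor easier to spot.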

My current Cylc site/user configuration has the following,

> cylc get-site-config
    [[raijin.*]]
        task communication method = ssh
        ...
        retrieve job logs max size = 10M
        retrieve job logs command = rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r
        retrieve job logs = True
        use login shell = False
        ...
        copyable environment variables = 
        ...
        task event handler retry delays = 
        global init-script = 
        retrieve job logs retry delays = PT10S, PT30S, PT1M, PT3M

If I understand it correctly, log output files are retrieved using rsync and the permissions on those files should be world-readable. This is usually the case, but intermittently the retrieved job.out and job.err files have restricted file permissions. If you can suggest a possible solution I would appreciate it very much.

One question on the Cylc user config file (~/.cylc/global.rc): does a Cylc suite read this file once at the start? Or does it read it every time a task is submitted? I have been fiddling with this user config file while there are running suites and I'm not certain whether the settings are effective or not.

Thank you.

Jin

Tom Coleman

Dec 17, 2018, 6:02:17 PM
to cy...@googlegroups.com
Have you put in the directive "-W umask=0022"? Without it, yes, the log files will not be world-readable, and thus the files brought back will not be either.


Jin Lee

Dec 17, 2018, 7:04:53 PM
to cylc
Hi Tom,

Yes, I tested the PBS directive, "-W umask=0022" but the retrieved job.out and job.err files on the suite host still had restrictive file permissions. But regardless of whether this directive is used or not the following line in my Cylc site/user configuration,

retrieve job logs command = rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r


should force retrieved job.out and job.err files to have world-read permissions. Or am I misunderstanding something?


Jin

Bruno P. Kinoshita

Dec 22, 2018, 12:41:57 AM
to cy...@googlegroups.com
Hi Jin,

I took some interest in this issue as I had never used Cylc and PBS together. I managed to reproduce the issue, but I am not sure if my configuration is the same as yours [1].

I used Docker to set up two containers, and used a Docker shared volume to mimic NFS. Basically, one container `cylc` is running my suite (below), and another container `pbs` has PBS Torque running.

```
# File: suite.rc
[cylc]
    [[reference test]]
        required run mode = live
        live mode suite timeout = PT5M
[scheduling]
    [[dependencies]]
        graph = a:start => b
[runtime]
    [[a]]
        script = sleep 2
        [[[remote]]]
            host=pbs
            retrieve job logs = True
        [[[job]]]
            batch system = pbs
        [[[directives]]]
            -W umask=0077

    [[b]]
        script = cylc poll "$CYLC_SUITE_NAME" 'a'

# from Cylc 7.8.0 code, cylc/tests/cylc-poll/07-pbs
```

And my `global.rc`:

```
[hosts]
    [[pbs]]
        retrieve job logs command = rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r
```

(I copied your command with an extra space, and realized Cylc was not complaining about rsync exiting with status code 1 - https://github.com/cylc/cylc/pull/2911)

The output logs in my PBS container were created in the normal location (i.e. /var/spool/torque), but as Cylc appends -o -e directives in the script submitted to PBS, and these values are pointing to `cylc-run` directory, **my logs were generated in the `cylc-run` directory**.

The last part of the sentence above is important, as the `cylc-run` directory is shared, which means that I already had the out and err logs of my PBS job. Then, I hacked task_events_mgr.py to print the command executed for log retrieval, and got:

rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r '--rsh=ssh -oBatchMode=yes -oConnectTimeout=10' -v --include=/1 --include=/1/a --include=/1/a/01 '--include=/1/a/01/**' '--exclude=/**' 'pbs:$HOME/cylc-run/pbs1/log/job/' /home/testuser/cylc-run/pbs1/log/job/


You probably recognised the first part of the command. The rest is created by Cylc. That includes the two directories (source and target), which are the same here, as they are shared through a Docker volume.
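
As an aside, the include/exclude part of that command limits the transfer to a single job directory. A minimal local sketch of the same filter (no ssh; the directory layout and file contents are made up for illustration):

```shell
#!/bin/sh
# Sketch: which paths the include/exclude filter from the command above selects.
# Local directories stand in for the remote and local log/job directories.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; exit 0; }

work=$(mktemp -d)
for job_dir in 1/a/01 1/b/01 2/a/01; do
    mkdir -p "$work/src/$job_dir"
    echo "log text" > "$work/src/$job_dir/job.out"
done
mkdir "$work/dst"

rsync -rltgoD \
    --include=/1 --include=/1/a --include=/1/a/01 \
    '--include=/1/a/01/**' '--exclude=/**' \
    "$work/src/" "$work/dst/"

# Only the job directory for cycle point 1, task a, submit 01 is copied.
copied=$(cd "$work/dst" && find . -type f | sort)
echo "$copied"   # prints ./1/a/01/job.out
```

Each parent directory has to be included explicitly, because the final `--exclude=/**` would otherwise hide it and rsync would never descend into the job directory.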

If you are sharing `cylc-run` through NFS or another distributed file system, then you could have a similar situation.

It looks (see the conclusion of [1]) like rsync --chmod works for newly created files. But if you have existing files that are up to date, then rsync won't update the permissions with chmod. I had a quick look at the source of rsync, but couldn't pinpoint where exactly chmod was used (chmod.c contains the logic for parsing the command, FWIW). So I will post a message to the rsync mailing list to confirm if that's the case.

So just in case you have `cylc-run` shared, you **may** have the same issue. But it was fun trying to reproduce it locally.
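
This matches what the rsync man page says about `--perms` and `--chmod`: `--chmod` only changes what rsync treats as the source permissions, and without `--perms` (`-p`, which `-rltgoD` leaves out) existing files keep their mode while new files get the result masked by the receiving process's umask. The sketch below reproduces that with a local copy instead of ssh; the helper name `retrieved_mode` and its arguments are hypothetical, and it assumes an rsync that supports `--chmod`.

```shell
#!/bin/sh
# Sketch: rsync --chmod with and without --perms, using a local copy instead of ssh.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; exit 0; }

CHMOD_OPT='--chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r'   # same option as in the thread

# retrieved_mode UMASK STATE [EXTRA_RSYNC_OPTION]   (hypothetical helper)
# STATE is "new" (destination empty) or "existing" (an up-to-date copy with
# mode 600 is already in the destination). Prints the final mode of job.out.
retrieved_mode() {
    (
        work=$(mktemp -d)
        mkdir "$work/src" "$work/dst"
        echo "log text" > "$work/src/job.out"
        chmod 600 "$work/src/job.out"          # restrictive mode on the "task host"
        if [ "$2" = existing ]; then
            cp -p "$work/src/job.out" "$work/dst/job.out"   # same content, mtime, mode
        fi
        umask "$1"                             # umask of the receiving process
        rsync -rltgoD "$CHMOD_OPT" ${3:+"$3"} "$work/src/" "$work/dst/"
        ls -l "$work/dst/job.out" | cut -c1-10
        rm -rf "$work"
    )
}

retrieved_mode 022 new                # -rw-r--r--  new file, permissive umask
retrieved_mode 077 new                # -rw-------  new file, masked by the umask
retrieved_mode 022 existing           # -rw-------  existing file keeps its mode
retrieved_mode 077 existing --perms   # -rw-r--r--  --perms applies the --chmod result
```

If the same applies to log retrieval, then adding `--perms` (or `-p`) to the `retrieve job logs command` would make the `--chmod` result independent of both pre-existing files and the umask of the process running rsync on the suite host. That is an untested suggestion, not something confirmed in this thread.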

Hope that helps, and happy holidays!
Bruno

[1] https://kinoshita.eti.br/2018/12/22/running-cylc-tasks-on-pbs-torque-with-docker/



Jin Lee

Jan 15, 2019, 9:05:33 PM
to cylc
Hello Bruno,

Sorry for this late reply; I had a break from work and didn't have a chance to read your post.

On our HPC, the local host and the remote host don't share 'cylc-run'. Thank you for looking into this.


Regards,

Jin

Bruno P. Kinoshita

Jan 16, 2019, 9:28:01 PM
to cy...@googlegroups.com
Hi Jin,

Not a problem. I quickly removed the shared volume from the Docker containers [1], so now my container "cylc" has its cylc-run folder, and the "pbs" has nothing.

Executed the same suite, with the same command to retrieve logs. The cylc-run directory was created on the "pbs" container/node, and the logs were copied successfully.

Furthermore, it looks like the permissions were successfully updated (screenshot).


This test was done with Cylc 7.8.0. Then I tried the version on master, which has a bit more logging. I executed the same suite, but changed the permissions to "rwx", and added `--verbose --debug`. The final logs on my "cylc" container/node were correctly copied, with the right new permissions too. Here are the logs that I got with the latest release.


2019-01-17T02:19:02Z DEBUG - [TaskJobLogsRetrieveContext(key='job-logs-retrieve', ctx_type='job-logs-retrieve', user_at_host='pbs', max_size=None) cmd] rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rwx,Fgo=rwx '--rsh=ssh -oBatchMode=yes -oConnectTimeout=10' -v --include=/1 --include=/1/a --include=/1/a/01 '--include=/1/a/01/**' '--exclude=/**' 'pbs:$HOME/cylc-run/pbs1/log/job/' /home/testuser/cylc-run/pbs1/log/job/
[TaskJobLogsRetrieveContext(key='job-logs-retrieve', ctx_type='job-logs-retrieve', user_at_host='pbs', max_size=None) ret_code] 0
[TaskJobLogsRetrieveContext(key='job-logs-retrieve', ctx_type='job-logs-retrieve', user_at_host='pbs', max_size=None) out]
opening connection using: ssh -oBatchMode=yes -oConnectTimeout=10 pbs rsync --server --sender -vvlogDtre.iLsfx . "$HOME/cylc-run/pbs1/log/job/" (10 args)
receiving incremental file list
[sender] showing directory 1 because of pattern /1
delta-transmission enabled
[sender] showing directory 1/a because of pattern /1/a
[sender] showing directory 1/a/01 because of pattern /1/a/01
[sender] hiding file 1/a/NN because of pattern /**
[sender] showing file 1/a/01/job.xtrace because of pattern /1/a/01/**
[sender] showing file 1/a/01/job.err because of pattern /1/a/01/**
[sender] showing file 1/a/01/job because of pattern /1/a/01/**
[sender] showing file 1/a/01/job.out because of pattern /1/a/01/**
[sender] showing file 1/a/01/job.status because of pattern /1/a/01/**
1/a/01/job.err is uptodate
1/a/01/job.out is uptodate
1/a/01/
1/a/01/job
1/a/01/job.status
1/a/01/job.xtrace
total: matches=2 hash_hits=2 false_alarms=0 data=9777

sent 163 bytes received 10,768 bytes 7,287.33 bytes/sec
total size is 11,187 speedup is 1.02

It could be something with your command (spaces, a missing comma, etc.). Otherwise, it could be a different version of rsync, or another library/tool in your OS. But at least it looks like Cylc is using the command to retrieve logs correctly.

Cheers
Bruno


[1] https://github.com/kinow/cylc-docker/tree/master/pbs




[Attachment: 51291243-55834580-1a6b-11e9-9f30-cfa0dc138d5a.png]

Jin Lee

Jan 22, 2019, 6:36:11 PM
to cylc
Hi Bruno,

Thanks for checking Cylc's handling of job log output. The problem we're experiencing on our HPC is intermittent and I've been trying to identify any common event that seems to result in the restricted file permission for log output files. So far I don't seem to have identified any commonality.

I'll try to look into the areas you've suggested. Hopefully something may turn up!

Thank you once again for all your help.


Cheers,

Jin