permission denied to file (afs problem??)

nbecker

Dec 15, 2011, 6:32:00 AM
to pic...@googlegroups.com
Several jobs died with 'Permission denied' after running for hours.  The failures came from calls to cloud.files.putf.

Traceback (most recent call last):
  File "/root/.local/lib/python2.7/site-packages/cloudserver/workers/employee/child.py", line 583, in run
  File "./run-cloud-11121401.py", line 23, in run_test
  File "test_8psk_cancel_lms.py", line 919, in run_line
  File "test_8psk_cancel_lms.py", line 689, in run
OSError: [Errno 13] Permission denied

Aaron Staley

Dec 16, 2011, 2:27:44 AM
to pic...@googlegroups.com
Hello,

Our apologies.  We use OpenAFS as our distributed file system.  Security is controlled by a "ticket system": we issue a ticket to your job right before it begins.  Unfortunately, the tickets expire after 10 hours.  Once your job passed the 10-hour mark, it no longer had permission to read the file system, which resulted in the error you saw.

The underlying issue will be fixed in our next system update; until then, please be sure to catch and ignore any OSError exceptions.  We also recommend keeping job times to no more than a few hours if possible.
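For readers applying this workaround, here is a minimal sketch of one way to catch the error. It is not PiCloud code: the helper name call_ignoring_permission_denied is hypothetical, and it deliberately swallows only errno 13 (EACCES, the error in the traceback above) rather than every OSError, on the assumption that other OSErrors point to a different problem worth seeing.

```python
import errno
import logging

log = logging.getLogger(__name__)


def call_ignoring_permission_denied(func, *args, **kwargs):
    """Call func(*args, **kwargs), swallowing only 'Permission denied'.

    Hypothetical helper for illustration. Returns func's result, or None
    if the call failed with errno 13 (EACCES). Any other OSError is
    re-raised unchanged.
    """
    try:
        return func(*args, **kwargs)
    except OSError as exc:
        if exc.errno == errno.EACCES:
            # Log instead of failing silently, so the skipped call is traceable.
            log.warning("Ignoring permission error from %s: %s",
                        getattr(func, "__name__", func), exc)
            return None
        raise
```

To use it, pass the failing call (for example cloud.files.putf) as func, followed by the arguments your script already gives it. Note that a swallowed error still means that particular file operation did not complete.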

Finally, we are crediting the errored jobs' computation time to you.  Please file a support ticket if you wish for us to restart the job.

Regards,
Aaron Staley

Neal Becker

Dec 16, 2011, 11:18:57 AM
to pic...@googlegroups.com
I can ignore OSError, but is this just a transient problem, or will a long-running process keep getting OSError after 10 hours?

Aaron Staley

Dec 16, 2011, 6:44:06 PM
to pic...@googlegroups.com
We won't be able to deploy our system update for at least a week, possibly longer.  We will notify you when the issue is resolved.