Race condition with os.makedirs?


Andrey Novoseltsev

Jun 24, 2015, 9:51:13 PM
to sage-n...@googlegroups.com
Hello,

I have the following problem: the notebook sort of works, but it is impossible to create new users or worksheets, or to open old ones, and the console log shows permission errors. Looking at the places where files failed to be created, I see directories with mode drw-r-xr-x, i.e. the execute bit is off for the owner. The directories are created here:
https://github.com/sagemath/sagenb/blob/master/sagenb/storage/filesystem_storage.py#L93
If I put something right after that line, such as sleep(1) or printing the permissions of the created directories, everything works fine (just very slowly with sleep(1)).

This suggests that makedirs is not completely finished setting permissions before returning, although it does not make any sense to me.
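One way to check this directly is to read the mode of every directory immediately after os.makedirs returns, with no sleep in between. The following is a minimal sketch, assuming Python 3; the helper name created_dir_modes and the path "a/b/c" are made up for illustration and are not part of sagenb.

```python
import os
import stat
import tempfile


def created_dir_modes(root, relative_path, mode=0o755):
    """Create root/relative_path with os.makedirs and report each new directory's mode."""
    target = os.path.join(root, relative_path)
    os.makedirs(target, mode)
    modes = {}
    current = target
    # Walk back up from the leaf to root, reading each mode right after creation.
    while os.path.normpath(current) != os.path.normpath(root):
        modes[current] = stat.filemode(os.stat(current).st_mode)
        current = os.path.dirname(current)
    return modes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path, mode in sorted(created_dir_modes(root, "a/b/c").items()):
            print(mode, path)
```

If any line printed on the affected machine lacks the owner's x bit, the problem can be reproduced without the notebook at all, which narrows it down to the Python build or the system underneath it.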

This behaviour appeared for both old and new installations after I upgraded the host from wheezy to jessie. It does not matter whether the LXC containers that run the notebooks use old wheezy, wheezy upgraded to jessie, or a "fresh jessie".

Any ideas on what is really going on and how to fix it?

Thank you,
Andrey

Andrey Novoseltsev

Jun 24, 2015, 9:59:12 PM
to sage-n...@googlegroups.com
Replacing that line with

        if not os.path.exists(p):
            head, tail = os.path.split(p)
            self._makepath(head)
            os.mkdir(p)

works, but it would be nice to understand what is going on and fix the issue.

kcrisman

Jun 24, 2015, 11:38:51 PM
to sage-n...@googlegroups.com, novo...@gmail.com
> I have the following problem: notebook sort of works, but it is impossible to create new users, worksheets, or open old ones and the log in the

I regret to say that I do not know anything about the internals of os.makedirs. 

William Stein

Jun 25, 2015, 12:53:37 AM
to sage-notebook, Andrey Novoseltsev
Here are the internals of os.makedirs:

def makedirs(name, mode=0777):
    """makedirs(path [, mode=0777])

    Super-mkdir; create a leaf directory and all intermediate ones.
    Works like mkdir, except that any intermediate path segment (not
    just the rightmost) will be created if it does not exist. This is
    recursive.

    """
    head, tail = path.split(name)
    if not tail:
        head, tail = path.split(head)
    if head and tail and not path.exists(head):
        try:
            makedirs(head, mode)
        except OSError, e:
            # be happy if someone already created the path
            if e.errno != errno.EEXIST:
                raise
        if tail == curdir:  # xxx/newdir/. exists if xxx/newdir exists
            return
    mkdir(name, mode)


The above is just pure Python, except for references into the
Modules/posixmodule.c, e.g., the definition of mkdir is here:

https://github.com/python/cpython/blob/master/Modules/posixmodule.c#L3948
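Since everything ends up in mkdir, the mode that makedirs passes along is filtered through the process umask before it reaches the directory. Here is a minimal sketch of that interaction, assuming Python 3 on a POSIX system; the helper name mkdir_with_umask is hypothetical.

```python
import os
import tempfile


def mkdir_with_umask(parent, name, mode, umask):
    """Create parent/name under a temporary umask and return its permission bits."""
    path = os.path.join(parent, name)
    previous = os.umask(umask)  # os.umask returns the old value
    try:
        os.mkdir(path, mode)
    finally:
        os.umask(previous)  # always restore the caller's umask
    return os.stat(path).st_mode & 0o777


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(oct(mkdir_with_umask(root, "open", 0o777, 0o022)))
        print(oct(mkdir_with_umask(root, "private", 0o777, 0o077)))
```

A umask only ever clears bits, so by itself it does not account for a mode that changes a moment after the directory is created.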

Anyway, the way you describe the problem suggests to me that maybe
your filesystem or container is somehow not telling the truth to the
OS. E.g., maybe you have some sort of filesystem caching enabled,
which makes things seem way faster, but is dangerous and not
compliant... I don't know. Does the problem go away when you use
ext4 outside a container?

--
William (http://wstein.org)

Andrey Novoseltsev

Jun 25, 2015, 1:32:13 AM
to sage-n...@googlegroups.com, novo...@gmail.com

I've changed the default to 0755 and added chmod(name, mode) at the very end - things seem to work now (my previous "fix" was not enough as there are multiple calls to makedirs in sagenb).
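A standalone version of that workaround might look like the sketch below. It assumes Python 3 and is one possible reading of the change, not the actual patch; the function name makedirs_with_chmod is hypothetical.

```python
import os


def makedirs_with_chmod(path, mode=0o755):
    """Create path and any missing parents, then chmod each directory created here."""
    path = os.path.abspath(path)
    missing = []
    current = path
    # Collect the missing directories from the leaf upwards.
    while not os.path.isdir(current):
        missing.append(current)
        current = os.path.dirname(current)
    # Create them from the top down, fixing the mode right after each mkdir.
    for directory in reversed(missing):
        try:
            os.mkdir(directory, mode)
        except FileExistsError:
            if not os.path.isdir(directory):
                raise
            continue  # another process created it first; leave its mode alone
        os.chmod(directory, mode)  # chmod is not filtered by the umask
```

The explicit chmod makes the final mode independent of the umask, but it only papers over the symptom: it does not explain why the mode was wrong after mkdir in the first place.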
 

> The above is just pure Python, except for references into the
> Modules/posixmodule.c, e.g., the definition of mkdir is here:
>
> https://github.com/python/cpython/blob/master/Modules/posixmodule.c#L3948
>
> Anyway, the way you describe the problem, suggests to me that maybe
> your filesystem or container is somehow not telling the truth to the
> OS.  E.g., maybe you have some sort of filesystem caching enabled,
> which makes things seem way faster, but is dangerous and not
> compliant...  I don't know.  Does the problem go away when you use
> ext4 outside a container?


I can't quickly check ext4 at the moment. LXC+BTRFS worked smoothly under wheezy with backports for the kernel and LXC for quite a while (a couple of years), and on a different machine there are no problems using BTRFS with jessie directly (for a year or so). I didn't fiddle with any cache optimizations myself, so it looks like a bug in something. I may try to look into it more closely or see if upgrading to versions from testing resolves the issues; for now I'd better write the final for tomorrow...

Jeroen Demeyer

Jun 25, 2015, 4:58:56 AM
to sage-n...@googlegroups.com
On 2015-06-25 03:51, Andrey Novoseltsev wrote:
> Any ideas on what is really going on and how to fix it?
If sleep(1) helps, this is almost certainly a problem with your
operating system, not with Python or Sagenb.

Andrey Novoseltsev

Jun 25, 2015, 2:36:08 PM
to sage-n...@googlegroups.com

Of course, although it is way easier for me to trace and attempt to fix issues in Sage/SageNB than kernel/filesystem ;-) I'll report if I get any more insight into the problem - with a workaround found there are more pressing things for me to do.

Volker Braun

Jun 25, 2015, 6:35:27 PM
to sage-n...@googlegroups.com
The "if not os.path.exists(p): os.makedirs(p)" pattern is a common mistake: if multiple processes do this, there is a race between checking for the directory and creating it. Directory creation (with given permissions) is atomic, but you must take the "ask for forgiveness" approach: attempt the creation and handle the error if the directory already exists.
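The "ask for forgiveness" approach can be sketched as follows, assuming Python 3; the helper name ensure_dir is hypothetical. On Python 3.2 and later, os.makedirs(path, exist_ok=True) covers the same case.

```python
import errno
import os


def ensure_dir(path, mode=0o777):
    """Create path (and parents) without a separate existence check."""
    try:
        os.makedirs(path, mode)
    except OSError as e:
        # EEXIST is fine only if what already exists is really a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Because there is no gap between a check and the creation, two processes calling ensure_dir on the same path at the same time both succeed, whichever one wins the mkdir.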

Though that doesn't explain what you are seeing; it's clearly an LXC/kernel bug.

Andrey Novoseltsev

Jun 26, 2015, 4:28:39 PM
to sage-n...@googlegroups.com, novo...@gmail.com
On Wednesday, 24 June 2015 22:53:37 UTC-6, William Stein wrote:
> The above is just pure Python, except for references into the
> Modules/posixmodule.c, e.g., the definition of mkdir is here:
>
> https://github.com/python/cpython/blob/master/Modules/posixmodule.c#L3948


Without much hope I looked at this code, and the thread-related statements reminded me that there had been semaphore issues in LXC before that prevented threads from working. I had a manual mount at /run/shm, which on jessie is a link to /dev/shm, so my containers had mounts at both locations. Celebrating victory, I removed the manual mount, but it had no effect.

Containers in my setup use Sage built on the host and stored at /var/opt, which is visible from the containers. For good measure, I rebuilt Sage from scratch and tested it outside and inside of containers - there were some glitches with "Bad exit" that were not reproduced when testing separately (although at least one of them was in sagenb/cell.py, which under parallel testing may be related to my problems).

Then I've rebuilt Sage from scratch inside of the container, also with a couple of non-reproducible glitches for ptestlong, and now things seem to be OK, although I'll test more extensively tomorrow.

This suggests to me that somehow build environments behave differently inside and outside of containers, even though both host and containers are now configured to use plain jessie repositories and nothing else and both are upgraded to latest versions from there. Any suggestions on how to debug it further to fix whatever is wrong in the setup? One of the reasons for using containers was that it is so easy to share Sage installations between them, including making upgrades to new versions or applying fresh patches when necessary, so building inside is a workaround but an inconvenient one.

Thanks for all the input so far!
Andrey

Volker Braun

Jun 26, 2015, 4:35:45 PM
to sage-n...@googlegroups.com
We just recently fixed a race in Sage (http://trac.sagemath.org/ticket/17924) that was triggered when running inside Docker. At least some of the sage doctest failures are genuine bugs ;-)

Andrey Novoseltsev

Jun 27, 2015, 10:12:37 PM
to sage-n...@googlegroups.com
On Friday, 26 June 2015 14:35:45 UTC-6, Volker Braun wrote:
> We just recently fixed a race in Sage (http://trac.sagemath.org/ticket/17924) that was triggered when running inside Docker. At least some of the sage doctest failures are genuine bugs ;-)


Thanks for the pointer - that's the type of error I was seeing. While compiling the freshest beta, I noticed an attempt at the beginning to download configuration files because autotools were not installed - and this was a difference from the container environment, where my script was installing automake. So I installed automake on the host, built Sage-6.8.beta6 and then Sage-6.7 on the host, and tried them in containers - both versions have worked fine without any issues so far! If that's indeed the case, I would say that there is a bug in the "default configuration" or in the attempt to use it on an unsuitable machine. Why not make autotools a prerequisite (and, if necessary, have a way to override it with a "standard configuration")?

Volker Braun

Jun 28, 2015, 3:57:45 AM
to sage-n...@googlegroups.com
There is an autotools optional package, but it's pretty big (70 MB iirc), which is why I'd rather not include it. By design, you should only need autotools when creating the source tarball, and not when compiling. Anything else is a bug. I didn't quite understand the problem that you had, though. The expected result is

* Sage git tree + autotools installed globally: We run autotools, no download.

* Sage git tree + no autotools installed: we download configure-nn.tar.gz with the autotools output.

This is all implemented in the SAGE_ROOT/bootstrap script.
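To see which of the two cases a given machine falls into, the decision can be approximated as in the sketch below. This is not the logic of the bootstrap script itself: the tool names checked and the function name bootstrap_plan are assumptions made for illustration.

```python
import shutil


def bootstrap_plan(find_executable=shutil.which):
    """Return which of the two expected bootstrap paths applies on this machine."""
    # Assumption: "autotools installed globally" is approximated by
    # autoconf and automake both being found on PATH.
    tools = ("autoconf", "automake")
    if all(find_executable(tool) for tool in tools):
        return "run autotools locally"
    return "download pre-generated configure tarball"


if __name__ == "__main__":
    print(bootstrap_plan())
```

Running this on both the host and inside a container shows whether the two environments would take different paths, which is the difference described in this thread.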

Andrey Novoseltsev

Jun 28, 2015, 6:00:39 PM
to sage-n...@googlegroups.com
On Sunday, 28 June 2015 01:57:45 UTC-6, Volker Braun wrote:
> There is an autotools optional package but its pretty big (70mb iirc) which is why I'd rather not include it. By design, you should only need autotools when creating the source tarball, and not when compiling. Anything else is a bug. I didn't quite understand the problem that you had, though. The expected result is
>
> * Sage git tree + autotools installed globally: We run autotools, no download.

And the resulting build runs fine in my containers.
 

> * Sage git tree + no autotools installed: we download configure-nn.tar.gz with the autotools output.

And the resulting build has problems while creating directories in containers.
 

Andrey Novoseltsev

Jun 28, 2015, 10:21:50 PM
to sage-n...@googlegroups.com
Correction to the above statements: nothing so far results in a reliably running notebook. There is something that allows it to work fine, but so far I have been wrong in naming that something...