Race condition in star_imports cache?

49 views
Skip to first unread message

Nils Bruin

unread,
Dec 12, 2012, 12:48:50 PM12/12/12
to sage-devel
There's a patchbot problem on #8327 that needs more eyeballs (or at
least Robert's eyeballs):

The patchbot there sporadically fails on various tests with errors of
the type

Traceback (most recent call last):
File "/mnt/storage2TB/patchbot/.sage/tmp/
volker_desktop.stp.dias.ie-14095/multireplace_29004.py", line 6, in
<module>
from sage.all_cmdline import *;
File "/mnt/storage2TB/patchbot/Sage/sage-5.5.rc0/local/lib/python/
site-packages/sage/all_cmdline.py", line 14, in <module>
from sage.all import *
File "/mnt/storage2TB/patchbot/Sage/sage-5.5.rc0/local/lib/python/
site-packages/sage/all.py", line 72, in <module>
from sage.rings.all import *
File "/mnt/storage2TB/patchbot/Sage/sage-5.5.rc0/local/lib/python/
site-packages/sage/rings/all.py", line 169, in <module>
lazy_import("sage.rings.universal_cyclotomic_field.all","*")
File "lazy_import.pyx", line 850, in
sage.misc.lazy_import.lazy_import (sage/misc/lazy_import.c:5168)
names[ix:ix+1] = get_star_imports(module)
File "lazy_import.pyx", line 900, in
sage.misc.lazy_import.get_star_imports (sage/misc/lazy_import.c:5924)
star_imports = pickle.load(open(cache_file))
EOFError

The code in question:

{{{
if star_imports is None:
cache_file = get_cache_file()
if os.path.exists(cache_file):
star_imports = pickle.load(open(cache_file))
else:
star_imports = {}
}}}

The error seems to indicate that the cache file either gets corrupted
or that
between the path.exists and the open, the file gets truncated or
deleted.

The code that writes the cache file seems careful: It writes the file
to a tmp
and then moves it into place. That should be atomic. However, it does
seem that
the *same* cache file is used by all parallel doctest instances, so
there seems
plenty of room for race conditions that would be more likely to get
triggered on
a ticket that uses star_imports (as this one does). Would:

{{{
if star_imports is None:
cache_file = get_cache_file()
try:
star_imports = pickle.load(open(cache_file))
except IOError:
star_imports = {}
}}}

be a more robust alternative?

Frédéric Chapoton

unread,
Dec 12, 2012, 1:32:41 PM12/12/12
to sage-...@googlegroups.com
There is the same problem with ticket #8386 (http://trac.sagemath.org/sage_trac/ticket/8386)

Volker Braun

unread,
Dec 12, 2012, 1:52:56 PM12/12/12
to sage-...@googlegroups.com
No, writing to tmp and moving the file in place is not atomic! The problem should be clear from the filesystem layout:

On Wednesday, December 12, 2012 5:48:50 PM UTC, Nils Bruin wrote:
Traceback (most recent call last):
  File "/mnt/storage2TB/patchbot/.sage/tmp/
volker_desktop.stp.dias.ie-14095/multireplace_29004.py", line 6, in
<module>

I'm running the patchbot on a separate harddisk, mounted under /mnt/storage2TB. But my temp directory is tmpfs. So the pickle is moved across block devices, which is of course not atomic. Hence the patchbot sometimes dies here while opening a half-written file. The temporary file should be created in the target directory, only then can we be sure that the move is atomic.


Volker Braun

unread,
Dec 12, 2012, 1:58:35 PM12/12/12
to sage-...@googlegroups.com

Nils Bruin

unread,
Dec 15, 2012, 3:52:37 AM12/15/12
to sage-devel
On http://trac.sagemath.org/13826 there is now a workaround. However,
there is currently a problem with where the cache file lives: in
`.sage`. The problem is that running `sage -b` can invalidate the
cache. Indeed `sage -b` removes the cache file from the .sage of the
UID who runs `sage -b`. Other UIDs using that sage still have their
old cache file, which means that `lazy star import` may present the
wrong names as lazily imported.

One solution would be to place the cache file somewhere in SAGE_ROOT.
Then `sage -b` would have to create that cache file. That would
involve running sage as part of `sage -b`. Is it acceptable to do so?
Since the cache file currently gets deleted in setup.py, that would
probably also be the place to create the new one. Hopefully the
original author of lazy_import, Robert, can comment on this?

An alternative would be to check a timestamp during startup, located
somewhere in SAGE_ROOT that gets updated every time `sage -b` gets
run. Then at least a change of the timestamp can be detected and a
rebuild of the cache triggered. Then every UID can synchronize cache
by itself after a `sage -b`. It still means that a cache that really
reflects the particular sage install gets saved in potentially many
UID .sage dirs, but at least they'll be up-to-date.

Volker Braun

unread,
Dec 15, 2012, 8:05:52 AM12/15/12
to sage-...@googlegroups.com
This isn't the only place where something has to be run once after something in the filesystem changed, for example the GAP workspace and running sage-location both have the same underlying problem and independent Python code to it. It would be nice if we could settle on a unified framework. Ideally in a way that works for a system-wide Sage install.  

The first question would be: are we still allowing people to "develop Sage" on a system-wide install? And how (presumably by making a copy of the Sage library in the user's home directory)?

Nils Bruin

unread,
Dec 16, 2012, 2:11:49 PM12/16/12
to sage-devel
On Dec 15, 5:05 am, Volker Braun <vbraun.n...@gmail.com> wrote:
> This isn't the only place where something has to be run once after
> something in the filesystem changed, for example the GAP workspace and
> running sage-location both have the same underlying problem

Perhaps the lowest cost change is to go with a reliable time stamp.
The best I could find is

$SAGE_ROOT/local/lib/python/site-packages/sage-*.egg-info

which I think is the only file that is guaranteed to get rewritten by
`sage -b`.
So one could just save the time stamp of that file in .sage somewhere
(either in a pickle or separately) and compare on startup to see if
`sage -b` has been run since. This would work for both
star_import_cache and for Gap (I imagine Gap could even get a better
time stamp from somewhere else).

It doesn't remove the redundancy that a whole bunch of .sage
directories contain exactly the same cache files, but at least they're
up-to-date.

The sage-location thing seems different to me: Its trigger is NOT a
rebuild. In fact, its action is also different. It has no choice but
to write in SAGE_ROOT. Its action can be unified with the above cases
by, for instance, providing a new option

sage --location-and-cache-housekeeping

which needs to be run after relocation and any `sage -b` (perhaps
automatically?) by the UID that owns sage. Then all system workspaces
and caches can live with SAGE_ROOT. That has the side-benefit that
after upgrading sage a couple of times, you're not left with a lot of
defunct cache files in `.sage`.
(another indication that these cache files are currently living in the
wrong location).

> The first question would be: are we still allowing people to "develop Sage"
> on a system-wide install? And how (presumably by making a copy of the Sage
> library in the user's home directory)?

I think that's a more complicated question. However, I do think we
want to make it possible to actively develop (or at least patch!) a
sage that gets run by multiple UIDs, or at least by a UID different
from the UID that has write permission on the install. If a patch
exists for a particularly nasty bug for a group's research, I could
see how someone would want to patch a production version of sage, so
that students can benefit from the bug fix. These students would need
to refresh caches!

Testing the notebook can similarly lead to running under different
UIDs. Increasing the differences between "developing sage" and
"running sage" would negate some of the peculiar benefits that sage
has (getting quick bug-fixes if you need them).
Reply all
Reply to author
Forward
0 new messages