Failure in doctesting framework when running Sage in docker

Nicolas M. Thiery

Mar 3, 2015, 4:06:47 PM
to sage-...@googlegroups.com, Vincen...@lri.fr
Dear Sage developers,

A colleague of mine (Vincent Neri, in CC) is investigating the usage
of docker to run multiple instances of the sage patchbot within
sandboxes on a cluster here. This works smoothly, except that the
doctesting framework fails mysteriously on three files, and we haven't
been able to debug it. So this is a blocker for moving further, and help
would be much appreciated.


Vincent tried the docker centos-based image built by Volker:

https://github.com/sagemath/docker

He also built his own docker images using several other linux
distributions and various recent versions of Sage (including 6.4.1 and
6.5).

In all cases, all tests pass smoothly, except for three files:

sage -t --warn-long 19.5 src/sage/interfaces/qsieve.py # Bad exit: 1
sage -t --warn-long 19.5 local/lib/python2.7/site-packages/sagenb-0.11.4-py2.7.egg/sagenb/notebook/worksheet.py # Bad exit: 1
sage -t --warn-long 19.5 local/lib/python2.7/site-packages/sagenb-0.11.4-py2.7.egg/sagenb/notebook/cell.py # Bad exit: 1

In all three cases, it's actually the doctesting framework that fails,
somewhere during the multiprocessing handling:

Traceback (most recent call last):
  File "/home/sage/sage-6.4.1/local/lib/python/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/sage/sage-6.4.1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1839, in run
    task(self.options, self.outtmpfile, msgpipe, self.result_queue)
  File "/home/sage/sage-6.4.1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 2159, in __call__
    result_queue.put(result, False)
  File "/home/sage/sage-6.4.1/local/lib/python/multiprocessing/queues.py", line 102, in put
    raise Full
Full
Bad exit: 1

The failing line in queues.py is doing something like acquiring a semaphore:

if not self._sem.acquire(block, timeout):
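
For reference, the same exception can be provoked outside of Sage with a plain multiprocessing queue. This is only a sketch of the stock behaviour of a non-blocking put() when the queue's semaphore cannot be acquired; the helper name second_put_outcome and the maxsize=1 setting are made up for illustration, and nothing here is a claim about what actually happens inside the docker container:

```python
import multiprocessing
from queue import Full  # on Python 2.7 (as shipped with Sage 6.x): from Queue import Full


def second_put_outcome(maxsize=1):
    """Do two non-blocking puts on a bounded queue and report the second one.

    Hypothetical helper: put() first acquires the queue's internal bounded
    semaphore; with block=False it raises Full as soon as that acquire fails.
    """
    q = multiprocessing.Queue(maxsize=maxsize)
    q.put("first result", False)  # takes one slot of the semaphore
    try:
        q.put("second result", False)  # fails if no slot is left
    except Full:
        outcome = "Full"
    else:
        outcome = "ok"
    finally:
        q.close()
        q.cancel_join_thread()  # do not wait at exit for items nobody reads
    return outcome


if __name__ == "__main__":
    print(second_put_outcome())
```

So the traceback says that the semaphore guarding result_queue could not be acquired at the time of the put; why that would be the case here is exactly the open question.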

The failure occurs as well when running the tests on a single file
(thus without real need for multiprocessing). Steps to reproduce:

> docker run -t -i sagemath/sage su - sage
> cd sage-6.4.1/
> ./sage -t src/sage/interfaces/qsieve.py

Occasionally another test file fails similarly. There can also be some
minor unrelated failures in sagedev.py, but we are not worried about those.

As a complement, here is a minimal file that triggers a systematic failure:

=====8<----------------------- bla.py
r"""
sage: from sage.interfaces.qsieve import qsieve
sage: k = 19; n = next_prime(10^k)*next_prime(10^(k+1))
sage: v, t = qsieve(n, time=True) # uses qsieve; optional - time
sage: q = qsieve(next_prime(10^20)*next_prime(10^21), block=False)
"""
=====8<-----------------------

Any idea of what might go wrong?

Thanks!

Cheers,
Nicolas
--
Nicolas M. Thiéry "Isil" <nth...@users.sf.net>
http://Nicolas.Thiery.name/

Jeroen Demeyer

Mar 3, 2015, 4:44:13 PM
to sage-...@googlegroups.com
On 2015-03-03 22:06, Nicolas M. Thiery wrote:
> Any idea of what might go wrong?
Does "docker" involve unexpected forking or multi-threading?

I have seen that error before and it usually happens because there is a
duplicate process in the doctester somehow.

Also: can you please post the *actual* and *complete* output from
running the command you posted?

Volker Braun

Mar 3, 2015, 5:23:47 PM
to sage-...@googlegroups.com, Vincen...@lri.fr, Nicolas...@u-psud.fr
I'm guessing you are hitting resource limits, e.g. compare "ipcs -l" inside and outside of the docker container.

Afair my image passed doctests on my machine, though that machine is still boxed up in my closet right now...

Nicolas M. Thiery

Mar 4, 2015, 2:58:14 AM
to sage-...@googlegroups.com, Vincen...@lri.fr
Hi,

On Tue, Mar 03, 2015 at 10:44:06PM +0100, Jeroen Demeyer wrote:
> On 2015-03-03 22:06, Nicolas M. Thiery wrote:
> >Any idea of what might go wrong?
> Does "docker" involve unexpected forking or multi-threading?

I don't know. In a first approximation, this should be similar to
running within a virtual machine. But there may be peculiarities with
certain system calls since we are using the host kernel.

Vincent, any clue?

> I have seen that error before and it usually happens because there
> is a duplicate process in the doctester somehow.

Ok.

> Also: can you please post the *actual* and *complete* output from
> running the command you posted?

Sure thing, see below. I ran the test in verbose mode in case that
would be useful.

Thanks!

Cheers,
Nicolas


root@xxx:~# docker run -t -i sagemath/sage su - sage
Last login: Thu Nov 27 15:28:05 GMT 2014
[sage@63182b1b1561 ~]$ uname -a
Linux 63182b1b1561 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[sage@63182b1b1561 ~]$ cd sage-6.4.1/
[sage@63182b1b1561 sage-6.4.1]$ ./sage -v
Sage Version 6.4.1, Release Date: 2014-11-23
[sage@63182b1b1561 sage-6.4.1]$ cat > bla.py << EOF
> r"""
> sage: from sage.interfaces.qsieve import qsieve
> sage: k = 19; n = next_prime(10^k)*next_prime(10^(k+1))
> sage: v, t = qsieve(n, time=True) # uses qsieve; optional - time
> sage: q = qsieve(next_prime(10^20)*next_prime(10^21), block=False)
> """
> EOF
[sage@63182b1b1561 sage-6.4.1]$ ./sage -t --verbose bla.py
init.sage does not exist ... creating
no stored timings available
Running doctests with ID 2015-03-04-07-53-04-191d76e2.
Doctesting 1 file.
sage -t bla.py
Trying (line 2): from sage.interfaces.qsieve import qsieve
Expecting nothing
ok [0.00 s]
Trying (line 3): k = 19; n = next_prime(10^k)*next_prime(10^(k+1))
Expecting nothing
ok [0.00 s]
Trying (line 5): q = qsieve(next_prime(10^20)*next_prime(10^21), block=False)
Expecting nothing
ok [0.23 s]
Trying (line 6): sig_on_count()
Expecting:
0
ok [0.00 s]
1 item passed all tests:
4 tests in bla
4 tests in 1 item.
4 passed and 0 failed.
Test passed.
Process DocTestWorker-1:
Traceback (most recent call last):
  File "/home/sage/sage-6.4.1/local/lib/python/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/sage/sage-6.4.1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1839, in run
    task(self.options, self.outtmpfile, msgpipe, self.result_queue)
  File "/home/sage/sage-6.4.1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 2159, in __call__
    result_queue.put(result, False)
  File "/home/sage/sage-6.4.1/local/lib/python/multiprocessing/queues.py", line 102, in put
    raise Full
Full
Bad exit: 1
**********************************************************************
Tests run before process (pid=108) failed:
sage: from sage.interfaces.qsieve import qsieve ## line 2 ##
sage: k = 19; n = next_prime(10^k)*next_prime(10^(k+1)) ## line 3 ##
sage: q = qsieve(next_prime(10^20)*next_prime(10^21), block=False) ## line 5 ##
sage: sig_on_count() ## line 6 ##
0

**********************************************************************
----------------------------------------------------------------------
sage -t bla.py # Bad exit: 1
----------------------------------------------------------------------
Total time for all tests: 0.4 seconds
cpu time: 0.0 seconds
cumulative wall time: 0.0 seconds
[sage@63182b1b1561 sage-6.4.1]$ logout

Nicolas M. Thiery

Mar 4, 2015, 3:05:42 AM
to sage-...@googlegroups.com, Vincen...@lri.fr
Hi Volker!

On Tue, Mar 03, 2015 at 02:23:47PM -0800, Volker Braun wrote:
> I'm guessing you are hitting resource limits, e.g. compare "ipcs -l"
> inside and outside of the docker container.

Thanks for the tip!

There does not seem to be a major difference though; the sections are
listed in a different order, and there are just fewer message queues
system wide (1285 inside versus 1542 outside):

root@sage:~# docker run sagemath/sage ipcs -l > inside
root@sage:~# ipcs -l > outside
root@sage:~# diff outside inside
1a2,6
> ------ Messages Limits --------
> max queues system wide = 1285
> max size of message (bytes) = 8192
> default max size of queue (bytes) = 16384
>
14,18d18
<
< ------ Messages Limits --------
< max queues system wide = 1542
< max size of message (bytes) = 8192
< default max size of queue (bytes) = 16384

I would have guessed the test would not be a big resource hog either.
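
Since a plain diff is noisy when the sections come out in a different order, here is a rough sketch of an order-independent comparison. The helper names parse_ipcs_limits and changed_limits are hypothetical, the parser only assumes the "------ Title --------" / "name = value" layout visible in the diff above, and the sample strings hold just the Messages Limits section quoted there (in real use one would feed it the complete outside and inside files):

```python
import re


def parse_ipcs_limits(text):
    """Parse `ipcs -l` style output into {section title: {setting: value}}."""
    limits = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"^-+\s*(.+?)\s*-+$", line)
        if header:
            # A line such as "------ Messages Limits --------" opens a section.
            section = header.group(1)
            limits[section] = {}
        elif section is not None and "=" in line:
            key, value = line.split("=", 1)
            limits[section][key.strip()] = value.strip()
    return limits


def changed_limits(outside_text, inside_text):
    """List (section, setting, outside value, inside value) for each difference."""
    outside = parse_ipcs_limits(outside_text)
    inside = parse_ipcs_limits(inside_text)
    changes = []
    for section in sorted(set(outside) | set(inside)):
        out_sec = outside.get(section, {})
        in_sec = inside.get(section, {})
        for key in sorted(set(out_sec) | set(in_sec)):
            if out_sec.get(key) != in_sec.get(key):
                changes.append((section, key, out_sec.get(key), in_sec.get(key)))
    return changes


# Sample data: only the section shown in the diff above.
OUTSIDE = """\
------ Messages Limits --------
max queues system wide = 1542
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
"""
INSIDE = OUTSIDE.replace("1542", "1285")

if __name__ == "__main__":
    for section, key, out_val, in_val in changed_limits(OUTSIDE, INSIDE):
        print("%s / %s: outside=%s inside=%s" % (section, key, out_val, in_val))
```

On the section quoted above, the only entry reported is the system-wide queue count.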

> Afair my image passed doctests on my machine, though that machine
> is still boxed up in my closet right now...

Ok!

Nicolas M. Thiery

Mar 10, 2015, 5:44:04 AM
to sage-...@googlegroups.com, Vincen...@lri.fr
Hi!

Vincent has investigated the issue further with Florent. This is now:

http://trac.sagemath.org/ticket/17924

No good explanation yet, but a proposed workaround. Comments and
review welcome!

Cheers,
Nicolas