problems with testing sandpiles.py on 6.1b2

121 views
Skip to first unread message

Nils Bruin

unread,
Dec 27, 2013, 1:16:06 PM12/27/13
to sage-...@googlegroups.com
I'm finding that "sandpiles.py" often fails for me for varous reasons while doing `make ptest`. Testing the file individually or via `sage -tp src/sage/sandpiles` (i.e., just the relevant directory) seems fine.

The errors I have received up to now are "Timed out (2 times)"
---
 File "src/sage/sandpiles/sandpile.py", line 4684, in sage.sandpiles.sandpile.complete_sandpile
Failed example:
    K.betti(verbose=False)
...
      File "multi_polynomial_libsingular.pyx", line 912, in sage.rings.polynomial.multi_polynomial_libsingular.MPolynomialRing_libsingular.__call__ (sage/rings/polynomial/multi_polynomial_libsingular.cpp:7912)
    TypeError: Could not find a mapping of the passed element to this ring.
---
(same place but different error:)
      File "multi_polynomial_libsingular.pyx", line 807, in sage.rings.polynomial.multi_polynomial_libsingular.MPolynomialRing_libsingular.__call__ (sage/rings/polynomial/multi_polynomial_libsingular.cpp:6207)
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/singular.py", line 1245, in __repr__
        elif self.type() == 'matrix':
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/singular.py", line 1976, in type
        return m.group(1)
    AttributeError: 'NoneType' object has no attribute 'group'
---

I ran into this issue while trying to see if I can be of help to #10963, so some of the failures above were with the branch there (but the errors seem unrelated to that, since failures and timeouts have happened on both). This is on Fedora 19, Intel i7. I figured this would be a good release to hunt for heisenbugs.

Nils Bruin

unread,
Dec 28, 2013, 6:33:59 PM12/28/13
to sage-...@googlegroups.com
On Friday, December 27, 2013 10:16:06 AM UTC-8, Nils Bruin wrote:
I'm finding that "sandpiles.py" often fails for me for varous reasons while doing `make ptest`. Testing the file individually or via `sage -tp src/sage/sandpiles` (i.e., just the relevant directory) seems fine.

It seems that the code involved in sandpiles explicitly uses the "singular" interface rather than libsingular, so I suspect it may be some IPC that goes wrong (the one error I am seeing seems the processing of a mismatched regular expression and for the time-out I am seeing the python and Singular processes just sitting there, not using any  CPU). I have been able to trigger the error much more economically by extracting one of the doctests:

$ cat ~/tt.py
def tt():
    r"""
    The partitions of the vertices of ``S`` into ``k`` parts,
    each of which is connected.

    INPUT:

    ``S`` - Sandpile
    ``k`` - integer

    OUTPUT:

    list of partitions

    EXAMPLES::

        sage: S = Sandpile(graphs.CycleGraph(4), 0)
        sage: P = [admissible_partitions(S, i) for i in [2,3,4]]
        sage: P
        [[{{0}, {1, 2, 3}},
          {{0, 2, 3}, {1}},
          {{0, 1, 3}, {2}},
          {{0, 1, 2}, {3}},
          {{0, 1}, {2, 3}},
          {{0, 3}, {1, 2}}],
         [{{0}, {1}, {2, 3}},
          {{0}, {1, 2}, {3}},
          {{0, 3}, {1}, {2}},
          {{0, 1}, {2}, {3}}],
         [{{0}, {1}, {2}, {3}}]]
        sage: for p in P:
        ...    sum([partition_sandpile(S, i).betti(verbose=False)[-1] for i in p])
        6
        8
        3
        sage: S.betti()
                   0     1     2     3
        ------------------------------
            0:     1     -     -     -
            1:     -     6     8     3
        ------------------------------
        total:     1     6     8     3
    """
    pass
$ ln -s tt.py tt1.py # and similarly for tt2.py ... tt9.py
$ sage -tp 2 tt*.py

This doesn't trigger the errors with absolute certainty, but at least quite regularly (I guess I could up the number of symlinks to increase the probability of at least one failure). Can other people confirm similar failures? My homedir is NFS mounted on this machine, so if the singular interface decides to use temporary files to transmit information I can imagine something going wrong.

Volker Braun

unread,
Dec 28, 2013, 9:43:49 PM12/28/13
to sage-...@googlegroups.com
I can't reproduce it on my ssd, but might be timing related. Can you trigger it when you run as 

strace -s 1024 -ff -o tmpdir/log sage -tp2 tt*.py

If there is a temp file conflict then we can figure it out from the log...

Nils Bruin

unread,
Dec 29, 2013, 4:30:33 AM12/29/13
to sage-...@googlegroups.com
On Saturday, December 28, 2013 6:43:49 PM UTC-8, Volker Braun wrote:
I can't reproduce it on my ssd, but might be timing related. Can you trigger it when you run as 

strace -s 1024 -ff -o tmpdir/log sage -tp2 tt*.py

If there is a temp file conflict then we can figure it out from the log...
 
Wow, running under strace seems to seriously reduce the probability of whatever goes wrong, but it seems I finally got a run to fail under strace control:

sage -t /tmp/HOME/ttg.py
**********************************************************************
File "/tmp/HOME/ttg.py", line 36, in ttg.tt
Failed example:
    S.betti()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 480, in _run
        self.execute(example, compiled, test.globs)
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 839, in execute
        exec compiled in globs
      File "<doctest ttg.tt[4]>", line 1, in <module>
        S.betti()
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/sandpiles/sandpile.py", line 2036, in betti
        print singular.eval('print(betti(%s),"betti")'%self._singular_resolution.name())
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/sandpiles/sandpile.py", line 408, in __getattr__
        self._set_resolution()
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/sandpiles/sandpile.py", line 1909, in _set_resolution
        row = new[j].transpose().sage_matrix(self._ring)
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/singular.py", line 1763, in sage_matrix
        A[x,y]=R(self[x+1,y+1])
      File "multi_polynomial_libsingular.pyx", line 807, in sage.rings.polynomial.multi_polynomial_libsingular.MPolynomialRing_libsingular.__call__ (sage/rings/polynomial/multi_polynomial_libsingular.cpp:6226)

      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/singular.py", line 1245, in __repr__
        elif self.type() == 'matrix':
      File "/usr/local/sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/singular.py", line 1976, in type
        return m.group(1)
    AttributeError: 'NoneType' object has no attribute 'group'
**********************************************************************
1 item had failures:
   1 of   6 in ttg.tt
    [5 tests, 1 failure, 0.80 s]

The strace log directory sits in:

http://sage.math.washington.edu/home/nbruin/strace.tgz

Nils Bruin

unread,
Dec 29, 2013, 2:08:53 PM12/29/13
to sage-...@googlegroups.com
On Sunday, December 29, 2013 1:30:33 AM UTC-8, Nils Bruin wrote:

More specifically, I think the python process experiencing the error is 31307 and the singular process it is communicating with is  31312. You can compare it with (python) 31305 communicating with (singular) 31308.

Meld is able to find sufficient similarities to easily compare the files. The bottom of log.31312 gives an indication: For some reason 31307 sends "type(sage698);" to singular where 31305 sends some other string, and the result of "type(sage698)" seems to quickly lead to problems. It may well be that the error happens earlier already, because I don't see why 31307 and 31305 decide to do different things.

Nils Bruin

unread,
Dec 29, 2013, 3:46:01 PM12/29/13
to sage-...@googlegroups.com
On Sunday, December 29, 2013 11:08:53 AM UTC-8, Nils Bruin wrote:
 I don't see why 31307 and 31305 decide to do different things.
In fact, it may be the following: Singular at some point does this write:

read(0, "print(sage698);\n", 1024)      = 16
write(1, "x2^2-x0^2", 9)                = 9
write(1, "\n", 1)                       = 1
write(1, "> ", 2)                       = 2

The corresponding reads are:

log.31305:read(26, "print(sage698);\r\nx2^2-x0^2\r\n> ", 1000) = 30
log.31307:read(26, "print(sage698);x2^2-x0^2\r\n> \r\n", 1000) = 30

As you can see, a new line ends up misplaced when 31307 is reading this. It looks like the tty echo doesn't finish before singular manages to write its output. I think you found a problem like this before. It would be nice if we could get a more systematic solution to this problem, e.g.:
 1) turn off echoing altogether
 2) get the tty to complete its echo before returning the line to the process to which the tty is stdin
 3) include a newline with the string we feed to singular (and other interfaces): hopefully the line itself gets echoed in one write, so if we delimit it by a newline ourselves perhaps we can more easily recognize/separate the echo from actual return output.

Nils Bruin

unread,
Dec 30, 2013, 12:50:44 AM12/30/13
to sage-...@googlegroups.com
OK, I think the following clearly indicates the sync problem we're running into. This is a tiny example of a program that prompt for input and then returns (fixed) output.

$cat hiho.c
#include<stdio.h>
#include<unistd.h>
int main(int argc, char** argv) {
    char *line[1000];
    size_t len = 0;
    ssize_t readlen;
    while (1) {
        printf("prompt:");fflush(stdout);
        readlen=read(0,line,1000);
        printf("received 'hiho'\n",line);       
    };
}
$ gcc hiho.c -o hiho
$ sage
[...]
sage: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import pexpect
:child=pexpect.spawn("env ./hiho")
:i=0
:s='hiho'
:Atemplate = s+"\r\n"
:Btemplate = "received '%s'\r\n"%s
:while true:
:    i+=1
:    u=child.expect('prompt:')
:    u=child.sendline(s)
:    A=child.readline()
:    B=child.readline()
:    if A != Atemplate:
:        print "ERROR! received %s"%repr(A)
:        break
:    if B != Btemplate:
:        print "ERROR! received %s"%repr(B)
:        break
:    if (i%100) == 0:
:        print "completed iteration %d"%i
:--
completed iteration 100
completed iteration 200
ERROR! received "hihoreceived 'hiho'\r\n"

This stuff is very sensitive! For instance, using "getline" rather than "read did not allow me to trigger the error, and also with read it is very hard to get it triggered. It's interesting that Singular seems so apt at triggering the condition (how much tighter than the loop above can we get?)

Volker Braun

unread,
Dec 30, 2013, 1:50:31 AM12/30/13
to sage-...@googlegroups.com
This is similar to http://trac.sagemath.org/15440, the problem boils down to: The expect pattern must(!) trigger at a place where all input has been eaten. Including newline/spaces. If not then there is still input coming into the subprocess, and the echo of that further input will show up at random places in the output stream.

The annoying thing is that it is hard to test that the expect is being used correctly, and that it leads to somewhat improbable Heisenbugs.

Nils Bruin

unread,
Dec 30, 2013, 3:33:57 AM12/30/13
to sage-...@googlegroups.com
On Sunday, December 29, 2013 10:50:31 PM UTC-8, Volker Braun wrote:
This is similar to http://trac.sagemath.org/15440, the problem boils down to: The expect pattern must(!) trigger at a place where all input has been eaten. Including newline/spaces. If not then there is still input coming into the subprocess, and the echo of that further input will show up at random places in the output stream.

I don't think that's the issue here. The problem here seems to be that the subprocess sends output before the pty is done echoing. That doesn't seem to within pexpect's control: it's the subprocess that's too fast. Actually, it seems to me that the pty is being a bit careless: shouldn't it make sure that it's done echoing before it actually delivers the character to the process? Particularly because the "\r\n" that the pty make part of its echo is actually invented by the pty. Is the problem that the pty doesn't flush before delivering the input to the subprocess?

Anyway, if we would be able to turn off echoing altogether, we'd be able to get rid of a lot of problems.For instance, in the example above, when I do

sage: import pexpect
sage: child=pexpect.spawn("env ./myecho")
sage: child.setecho(0)
sage: child.expect('prompt:')
0
sage: child.sendline('hiho')
5
sage: child.readline()
"received 'hiho'\r\n"

we see no echoed input any more. In particular there isn't an extra "\r\n" that can go astray. With singular we can perhaps do:

sage: import pexpect
sage: child=pexpect.spawn("Singular -t")
sage: child.setecho(0)
sage: child.expect("> ")
0
sage: child.sendline("1+2;")
5
sage: child.readline()
'3\r\n'

Note that we did not see "1+2;\r\n" echoed to us!

Hopefully the other systems also have a way of turning of echoing (it seems necessary to include the -t option: otherwise Singular starts setting some terminal options of its own, which seems to make it impossible to turn off echoing.

Volker Braun

unread,
Dec 30, 2013, 12:09:38 PM12/30/13
to sage-...@googlegroups.com
On Monday, December 30, 2013 8:33:57 AM UTC, Nils Bruin wrote:
I don't think that's the issue here. The problem here seems to be that the subprocess sends output before the pty is done echoing. That doesn't seem to within pexpect's control: it's the subprocess that's too fast.

That is exactly what I am talking about: The pexpect pattern must match at a point where the subprocess has stopped processing and is just waiting for further input. But if you match for a prompt, say, and then the subprocess spits out another space or return before it waits for input then the additional characters will end up in random places in the output stream.

Actually, it seems to me that the pty is being a bit careless: shouldn't it make sure that it's done echoing before it actually delivers the character to the process?

There is no such guarantee, pexpect is akin to typing into a terminal. If echo is on then the typed key shows up immediately and in-between stdout.
 

Nils Bruin

unread,
Dec 30, 2013, 2:09:19 PM12/30/13
to sage-...@googlegroups.com
On Monday, December 30, 2013 9:09:38 AM UTC-8, Volker Braun wrote:
That is exactly what I am talking about: The pexpect pattern must match at a point where the subprocess has stopped processing and is just waiting for further input. But if you match for a prompt, say, and then the subprocess spits out another space or return before it waits for input then the additional characters will end up in random places in the output stream.

I still have trouble fitting that diagnosis to what I observe:

Parent:

P1: write(26, "print(sage698);", 15)        = 15
P2: select(0, NULL, NULL, NULL, {0, 0})     = 0 (Timeout)
P3: write(26, "\n", 1)                      = 1

Child:

C1: read(0, "print(sage698);\n", 1024)      = 16
C2: write(1, "x2^2-x0^2", 9)                = 9
C3: write(1, "\n", 1)                       = 1
C4: write(1, "> ", 2)                       = 2

Parent:

P4: read(26, "print(sage698);x2^2-x0^2\r\n> \r\n", 1000) = 30

The fact that C1 ends in a newline sugggests that both P1 and P3 have completed before C1 executes. However, P4 suggests that it is seeing the result of
<echo of P1>, C2, C3, C4, <echo of P3>. That looks more like a race condition between the pty echo and the output produced by the child. I don't see how pexpect could avoid this. from happening (and it's rare indeed).

Actually, it seems to me that the pty is being a bit careless: shouldn't it make sure that it's done echoing before it actually delivers the character to the process?

There is no such guarantee, pexpect is akin to typing into a terminal. If echo is on then the typed key shows up immediately and in-between stdout.

 Hm, I've never seen a key I typed end up being echoed *after* the next prompt had printed, and I thought this was because the prompt was printed only *after* the input was received, hence *after* the echo had completed. That is not quite the same scenario as two processes writing to the same stream in an uncoordinated way.

Possibly, if we merge P1 and P3 we can significantly reduce the probability of this happening, since then P1+P3 will arrive together at the pty, so it has a better chance of echoing them both.

Alternatively, if we get rid of the echo altogether, then there is nothing to have a race condition with, so this particular issue might be going away completely.

On #15440 it also seems to be an *echoed* space that gets in the way, so it might help there too. Preliminary tests suggest:

sage: import pexpect
sage: child=pexpect.spawn("maxima")
sage: child.expect("(%i.*) ")

0
sage: child.sendline("1+2;")
5
sage: child.readline()
'1+2;\r\n'
sage: child.readline()
'(%o1)                                  3\r\n'

whereas with echo turned off:

sage: import pexpect
sage: child=pexpect.spawn("maxima")
sage: child.setecho(0)
sage: child.expect("(%i.*) ")

0
sage: child.sendline("1+2;")
5
sage: child.readline()
'(%o1)                                  3\r\n'


Volker Braun

unread,
Dec 31, 2013, 11:23:35 PM12/31/13
to sage-...@googlegroups.com
Yes, agree that this is slightly different.

I also agree that it would be much better if the pexpect interfaces would have been written against non-echoing tty. Of course that'll only work if the subprocess doesn't turn echoing back on. The downside is of course that you have to rewrite the way the pexpect interface works.

The alternative is to get rid of pexpect altogether, and patch the subprocess input to instead setup & read from a socket (and flush on command). I did some work in that direction but haven't had the time to finish it...

Nils Bruin

unread,
Jan 1, 2014, 12:13:23 AM1/1/14
to sage-...@googlegroups.com
On Tuesday, December 31, 2013 8:23:35 PM UTC-8, Volker Braun wrote:
I also agree that it would be much better if the pexpect interfaces would have been written against non-echoing tty. Of course that'll only work if the subprocess doesn't turn echoing back on. The downside is of course that you have to rewrite the way the pexpect interface works.

I tried: With maxima it's easy to inset a self._pexpect.setecho(0) in the _start method. The echo line is explicitly read by a .readline() and deactivating that is enough to get doctests to pass.

For other interfaces (especially singular) it's not so easy: The standard eval in interfaces/pexpect returns the string between the first "\n" and the last "\r", so it actually uses the echoed "\n" to delineate the returned result. Making that work without echo requires a little more work than I tried.

It does seem that at least singular, maxima, gp, don't turn echoing back on. Gap does seem to turn it back on.

Volker Braun

unread,
Jan 4, 2014, 5:07:36 PM1/4/14
to sage-...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages