Too many Maximas!

Erik Bray

Feb 7, 2017, 11:30:26 AM
to sage-devel
A problem I've been having lately when running Sage's test suite on
Cygwin (i.e. sage -t -a).

Several of the tests that use Maxima are spinning up Maxima processes
(I guess interacted with via pexpect?) and not killing them. And
what's worse, is that each one seems to sit in a busy wait. So even
once the test suite has long since moved on to other tests I'm left
with dozens of maxima processes chewing up my CPU for nothing. I can
kill them manually and it doesn't seem to break any tests.

But in the meantime, can anyone point me in the right direction to
look for what might be causing this? It seems to only be maxima
that's the problem. Other pexpect interfaces don't leave processes
running (much less eating up CPU).

Thanks,
Erik

William Stein

Feb 7, 2017, 11:47:40 AM
to sage-devel
Basic question: is there any reason whatsoever for us to even have a
pexpect interface to maxima anymore? Nils Bruin (and others) put a
massive amount of effort into a C library version (using ecl) over the
years...


--
William (http://wstein.org)

Erik Bray

Feb 7, 2017, 12:36:28 PM
to sage-devel
On Tue, Feb 7, 2017 at 5:46 PM, William Stein <wst...@gmail.com> wrote:
> On Tue, Feb 7, 2017 at 11:30 AM, Erik Bray <erik....@gmail.com> wrote:
>> A problem I've been having lately when running Sage's test suite on
>> Cygwin (i.e. sage -t -a).
>>
>> Several of the tests that use Maxima are spinning up Maxima processes
>> (I guess interacted with via pexpect?) and not killing them. And
>> what's worse, is that each one seems to sit in a busy wait. So even
>> once the test suite has long since moved on to other tests I'm left
>> with dozens of maxima processes chewing up my CPU for nothing. I can
>> kill them manually and it doesn't seem to break any tests.
>>
>> But in the meantime, can anyone point me in the right direction to
>> look for what might be causing this? It seems to only be maxima
>> that's the problem. Other pexpect interfaces don't leave processes
>> running (much less eating up CPU).
>
> Basic question: is there any reason whatsoever for us to even have a
> pexpect interface to maxima anymore? Nils Bruin (and others) put a
> massive amount of effort into a C library version (using ecl) over the
> years...

On this I have no idea. It might be that the pexpect interface is
retained for backwards compat, and that the tests for it are causing
problems. But I don't know--this is why I'm asking.

William Stein

Feb 7, 2017, 12:38:33 PM
to sage-devel
> On this I have no idea. It might be that the pexpect interface is
> retained for backwards compat, and that the tests for it are causing
> problems. But I don't know--this is why I'm asking.

That is my understanding/guess, and we could probably deprecate it or
consider it unsupported/experimental. But I could be wrong...


--
William (http://wstein.org)

Jeroen Demeyer

Feb 7, 2017, 12:47:02 PM
to sage-...@googlegroups.com
On 2017-02-07 17:46, William Stein wrote:
> Basic question: is there any reason whatsoever for us to even have a
> pexpect interface to maxima anymore? Nils Bruin (and others) put a
> massive amount of effort into a C library version (using ecl) over the
> years...

Basic answer: there are tons of things in Sage which don't have any
reason to exist whatsoever except for historical reasons. Most of these
can and should be fixed, but that takes time...

Jeroen Demeyer

Feb 7, 2017, 12:49:14 PM
to sage-...@googlegroups.com
On 2017-02-07 17:30, Erik Bray wrote:
> A problem I've been having lately when running Sage's test suite on
> Cygwin (i.e. sage -t -a).
>
> Several of the tests that use Maxima are spinning up Maxima processes
> (I guess interacted with via pexpect?) and not killing them.

This is probably Cygwin-specific. It would help if you could give some
more details. For example: is the problem reproducible or does it only
happen sometimes? Do you know which files cause the problem? Do the
doctests actually pass? Does Cygwin have something like strace which
might help to debug this?

Jean-Pierre Flori

Feb 7, 2017, 3:35:18 PM
to sage-devel
Our pexpect interface happily leaves zombies around on linux because it never waits on the processes it forks.
At least for each instantiation of the pexpect class it tries to launch sage-cleaner to have at least one instance running and only the first does actual work, the other ones becoming zombies.
Not sure about the real process, i.e. maxima, magma, ..., anymore.
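
A minimal Python sketch of the reaping point above, using only the standard `subprocess` module. This is not Sage's pexpect code, and `spawn_and_reap` is a made-up name: on POSIX systems a child that has exited stays in the process table as a zombie until its parent waits on it.

```python
import subprocess
import sys


def spawn_and_reap():
    # Start a child that exits right away.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    # Until the parent waits, the exited child is a zombie on POSIX systems.
    # wait() collects the exit status, which lets the kernel drop the entry.
    return child.wait()


exit_status = spawn_and_reap()
```

A zombie holds a process-table slot but uses no CPU, so this alone would not explain processes that keep spinning.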

Nils Bruin

Feb 7, 2017, 5:08:23 PM
to sage-devel
On Tuesday, February 7, 2017 at 8:47:40 AM UTC-8, William wrote:

> Basic question: is there any reason whatsoever for us to even have a
> pexpect interface to maxima anymore? Nils Bruin (and others) put a
> massive amount of effort into a C library version (using ecl) over the
> years...

Not for the use of calculus in sage. However, sage's pexpect interfaces in general are pretty useful and it can be worthwhile to be able to compute with separate maxima executables (so that the lisp is *outside* the sage process, and can be a different (higher performance?) lisp). So yes, I think there is some benefit to maintaining the maxima pexpect interface, and for ease of transition, maintain a side to maxima_lib that is compatible with pexpect interfaces. Whether the benefit is worth the cost is another matter ...

William Stein

Feb 7, 2017, 5:12:30 PM
to sage-devel
On Tue, Feb 7, 2017 at 5:08 PM, Nils Bruin <nbr...@sfu.ca> wrote:
> On Tuesday, February 7, 2017 at 8:47:40 AM UTC-8, William wrote:
>>
>>
>> Basic question: is there any reason whatsoever for us to even have a
>> pexpect interface to maxima anymore? Nils Bruin (and others) put a
>> massive amount of effort into a C library version (using ecl) over the
>> years...
>>
> Not for the use of calculus in sage.

Wow. Calculus in Sage is nearly the only critical reason we have
Maxima in Sage; I didn't realize all that C library interface work
wasn't useful for that. Oh well. Thanks for the clarification. Is
the issue that Maxima tries to ask questions?





--
William (http://wstein.org)

Nils Bruin

Feb 7, 2017, 5:26:12 PM
to sage-devel
I think you misunderstood my answer. Having a pexpect interface to maxima is not useful for calculus, because calculus uses the maxima_lib interface. In some places it is still communicating with maxima_lib through constructing and parsing strings, pretending that maxima_lib offers a pexpect interface. Apparently this is not a bottleneck, so people haven't bothered converting to the more efficient APIs that maxima_lib offers.

The "ask questions" problem doesn't arise: maxima_lib gets monkey-patched to raise an error instead of expecting the question to be answered.

Ralf Stephan

Feb 8, 2017, 2:17:27 AM
to sage-devel
On Tuesday, February 7, 2017 at 5:47:40 PM UTC+1, William wrote:
> Basic question: is there any reason whatsoever for us to even have a
> pexpect interface to maxima anymore?

Erik Bray

Feb 8, 2017, 4:32:49 AM
to sage-devel
What do you mean here exactly? It would be fine if it left a zombie
process since that would ultimately be reaped. What's weird is that
the processes are all busy-waiting.

Erik Bray

Feb 8, 2017, 4:36:06 AM
to sage-devel
Yes, almost certainly Cygwin-specific. Though I'm not sure when it
started--this didn't happen when I was running the tests a few months
ago.

It's reproducible insofar as every time I run the full test suite it
happens. I haven't pinpointed any specific tests that cause the
problem--that's mainly what I was asking for help with. I.e. what are
some tests that use Maxima?

And yes, the tests pass. I just watch `ps` and every few minutes
there will be about a dozen new `maxima` processes running, each using
an equal percentage of the CPU time. I kill them and it doesn't seem
to have any impact on the tests, as whatever tests were responsible
for starting them are long passed.
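
The `ps` watching described above can be scripted. Below is a rough Python sketch; the function names are invented for illustration, the sample text is made up rather than captured from a real run, and the `ps -eo pid,comm` invocation is an assumption that holds for procps-style `ps` (Cygwin's own `ps` takes different options, so the parser is kept separate from the command).

```python
import subprocess


def pids_by_command(ps_output, name):
    """Return PIDs from `ps`-style text whose executable basename is `name`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        pid, command = fields
        # Compare only the basename of the first word of the command.
        executable = command.split()[0].rsplit("/", 1)[-1]
        if executable == name:
            pids.append(int(pid))
    return pids


def running_pids(name):
    # Assumption: a procps-style ps; adjust the arguments on other systems.
    result = subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    )
    return pids_by_command(result.stdout, name)


# Made-up sample text, only to show the parsing.
sample = """  PID COMMAND
 1201 /usr/bin/python
 1342 maxima
 1377 /opt/sage/local/bin/maxima
"""
stray = pids_by_command(sample, "maxima")
```

Calling `running_pids("maxima")` before and after a single test file would be one way to narrow down which tests leave processes behind.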

Erik Bray

Feb 8, 2017, 4:44:20 AM
to sage-devel
On Wed, Feb 8, 2017 at 10:36 AM, Erik Bray <erik....@gmail.com> wrote:
> On Tue, Feb 7, 2017 at 6:49 PM, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
>> On 2017-02-07 17:30, Erik Bray wrote:
>>>
>>> A problem I've been having lately when running Sage's test suite on
>>> Cygwin (i.e. sage -t -a).
>>>
>>> Several of the tests that use Maxima are spinning up Maxima processes
>>> (I guess interacted with via pexpect?) and not killing them.
>>
>>
>> This is probably Cygwin-specific. It would help if you could give some more
>> details. For example: is the problem reproducible or does it only happen
>> sometimes? Do you know which files cause the problem? Do the doctests
>> actually pass? Does Cygwin have something like strace which might help to
>> debug this?
>
> Yes, almost certainly Cygwin-specific. Though I'm not sure when it
> started--this didn't happen when I was running the tests a few months
> ago.
>
> It's reproducible insofar as every time I run the full test suite it
> happens. I haven't pinpointed any specific tests that cause the
> problem--that's mainly what I was asking for help with. I.e. what are
> some tests that use Maxima?

To answer this question for myself--as the discussion on what Maxima
is used for in Sage pointed me in the right direction--the
sage/calculus tests reliably start up at least 3 maxima processes,
which then run away with my CPU even after those tests are
finished. I'll see if I can see what exactly they are doing.

kcrisman

Feb 8, 2017, 10:58:26 AM
to sage-devel

> To answer this question for myself--as the discussion on what Maxima
> is used for in Sage pointed me in the right direction--the
> sage/calculus tests reliably start up at least 3 maxima processes,
> which then run away with my CPU even after those tests are
> finished. I'll see if I can see what exactly they are doing.

I believe that some doctests explicitly start up maxima not via maxima_lib (as opposed to some perhaps silently doing this, see #17753), maybe in the tutorial or constructions documents. I think it would be reasonable to replace some of those if they are causing problems.

But there is no reason to *remove* the pexpect, just to make Sage not dependent upon it for any "normal" operations.

Ralf Stephan

Feb 8, 2017, 11:07:10 AM
to sage-devel
On Wednesday, February 8, 2017 at 4:58:26 PM UTC+1, kcrisman wrote:
> But there is no reason to *remove* the pexpect, just to make Sage not dependent upon it for any "normal" operations.

You mean we should leave its dead code in Sage when we're done, just to make sure porting stays inherently difficult?

William Stein

Feb 8, 2017, 11:14:32 AM
to sage-devel
As explained by several people above, it is not required for
symbolics, but it does implement several unique features that are not
available in any other way...

William

Erik Bray

Feb 9, 2017, 7:29:11 AM
to sage-devel
I've gained a little insight into the problem. On one hand, I would
say there are some bugs in ecl, but on the other hand it can't be
entirely blamed as we're veering into the territory of undefined
behavior here.

The TL;DR version is that when `maxima.quit()` (or something similar)
is called, `SagePtyProcess.close()` calls `self.fileobj.close()`.
This closes the file for the master pty from the forkpty that started
the child process, resulting in an (unhandled, afaict) SIGHUP, and
subsequently broken stdio streams. *How* exactly they are broken
though seems to be platform dependent, leading to different behaviors
(some of which I think are buggy). In turn, there are some buglets in
ECL's error handling on both Cygwin *and* Linux. The bugs on Linux
happen to be a bit nicer, so they allow Maxima to exit quickly. The bug
on Cygwin, on the other hand, sends it into an infinite loop of
select() calls. Even though Sage tries to kill the process, this loop
is such a CPU drain that once you get 2 or 3 of them going
simultaneously it bogs down the system.

When the pty is closed, if maxima's REPL is waiting for user input,
it's in a blocking read() on stdin. This read exits with an error
status, triggering an exception in ECL, which drops into the LISP
debugger. On the way though, it passes through Maxima's custom debug
handler, which prints a message on how to disable Maxima's debug
handler, then passes execution back to Maxima's REPL.

On the way, in the course of printing that message there are some
intermediate steps, but ultimately it goes into this function:

https://gitlab.com/embeddable-common-lisp/ecl/blob/310b51b677aef80f39bdd784e958b5727bcf8c5e/src/c/file.d#L3347

That function calls an fwrite() for one character and uses the return
value of fwrite() to determine if a write error occurred. This,
however, is not reliable. On Linux, in fact, fwrite() is returning 1
here, though errno gets set to 5 (EIO). I'm inclined to call that a
bug in my glibc (2.19, FWIW), but it's not clear exactly what the
behavior should be. ISTM calling ferror() here instead is the only
truly reliable way to determine if an error occurred in fwrite(). So
I think this is "bug" 1 in ECL (and possibly glibc).
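
The general hazard can be shown with a Python analogy. This is not ECL's code, and whether buffering is what makes glibc's fwrite() return 1 here is not established in this thread. The sketch only shows that the return value of a buffered write reports what the buffer accepted, so the error has to be checked separately (ferror() in C; in Python it surfaces as an exception from flush()). A pipe with no reader stands in for the broken stream, and the function name is hypothetical.

```python
import os


def buffered_write_then_flush():
    read_end, write_end = os.pipe()
    os.close(read_end)  # no reader left: any real write will fail
    stream = os.fdopen(write_end, "wb")  # buffered, like a stdio FILE*
    accepted = stream.write(b"x")  # only copies the byte into the buffer
    try:
        stream.flush()  # the actual write happens here
        flushed_ok = True
    except OSError:
        flushed_ok = False
    try:
        stream.close()  # close() retries the flush, so it may raise again
    except OSError:
        pass
    return accepted, flushed_ok


accepted, flushed_ok = buffered_write_then_flush()
```

The write call claims one byte was written even though nothing can reach the other side; only the explicit flush (or an error-indicator check) reveals the failure.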

The fact that an error isn't detected here on Linux is good news. It
finishes "printing" (with errors) Maxima's debug message, then returns
to the input prompt. It then immediately reads an EOF and exits. No
problem.

On Cygwin, however, this is where everything blows up. On Cygwin, the
error on fwrite() *is* detected. This results in recursively looping
back into ECL's error handler. Fortunately it has a mechanism to
prevent re-entering a custom debug handler if an error occurred in
that debug handler, so it doesn't re-enter Maxima's debug handler and
instead skips straight to ECL's default debugger. One of the first
things this does is to call a function called (clear-input), which is
meant to clear any pending input waiting on stdin. The implementation
for this (for a stdio stream) is here:

https://gitlab.com/embeddable-common-lisp/ecl/blob/310b51b677aef80f39bdd784e958b5727bcf8c5e/src/c/file.d#L3401

It goes into this dastardly loop which first calls a function called
flisten() which typically calls select() on stdin to check if it's
ready for reading. Then it calls getc(stdin) and throws away the
result. It doesn't check the return value of getc() for an error
condition. In flisten() it *does* check if the file is at EOF. On
Linux, if it gets to this point (after fixing the first bug),
feof(stdin) returns 1. On Cygwin, on the other hand, it does not set
EOF on stdin, but ferror(stdin) does return 1 (in other words, it
doesn't treat the stream as at EOF, though it is in an error
condition). I think this is a bug in Cygwin, but that's also a bit
unclear. According to [1]:

"If a read error occurs, the error indicator for the stream shall be
set, fgetc() shall return EOF, and shall set errno to indicate the
error."

It does not explicitly say that the stream's end-of-file indicator
should be set, even though the function returns EOF. Linux is setting
the end-of-file indicator even on error, while Cygwin is not. I think
it would be better if it did, though I don't think it's strictly wrong
that it doesn't.

Anyhow, because this loop doesn't check for errors, it goes on
infinitely and busily until the process is killed. Adding an ferror()
check in the flisten() function fixes it.
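
The loop logic described above can be modelled in a few lines of Python. This is a toy model, not ECL's C code: `BrokenStream`, `listen` and `clear_input` are made-up stand-ins, the model assumes select() always reports the dead stream as readable, and the iteration cap exists only so the sketch terminates.

```python
class BrokenStream:
    """Toy stand-in for stdin after the pty has gone away."""

    def __init__(self, sets_eof_on_error):
        self.eof = False
        self.error = False
        self.sets_eof_on_error = sets_eof_on_error

    def getc(self):
        # Every read fails. The error indicator is always set; the
        # end-of-file indicator only in the "Linux-like" variant.
        self.error = True
        if self.sets_eof_on_error:
            self.eof = True
        return -1  # models the EOF return value


def listen(stream, check_error):
    # Models flisten(): is there input left to discard?
    if stream.eof:
        return False
    if check_error and stream.error:
        return False  # the extra ferror()-style check
    return True  # models select() reporting the stream as readable


def clear_input(stream, check_error, max_iterations=1000):
    iterations = 0
    while listen(stream, check_error) and iterations < max_iterations:
        stream.getc()  # result thrown away, as in the original loop
        iterations += 1
    return iterations


linux_like = clear_input(BrokenStream(sets_eof_on_error=True), check_error=False)
cygwin_like = clear_input(BrokenStream(sets_eof_on_error=False), check_error=False)
cygwin_fixed = clear_input(BrokenStream(sets_eof_on_error=False), check_error=True)
```

In this model the Linux-like stream stops after one read because the end-of-file indicator gets set, the Cygwin-like stream runs until the artificial cap, and adding the error check stops it after one read as well.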

Aside from the fixes directly to ECL, I think this can be worked
around in Sage by not explicitly closing the master pty until the
process has exited. This could be done by modifying
SagePtyProcess.terminate_async to accept a callback function to be
called after the child process is terminated, by handling SIGCHLD.
(Among other possibilities).
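
A rough Python sketch of that ordering, under stated assumptions: `terminate_then_close` and `on_closed` are hypothetical names, not the SagePtyProcess API; a background thread that waits on the child is swapped in for the SIGCHLD handling mentioned above; and the write end of a pipe stands in for the master pty.

```python
import os
import subprocess
import sys
import threading


def terminate_then_close(proc, fd, on_closed=None):
    """Ask `proc` to exit, wait for it, and only then close `fd`."""

    def worker():
        proc.terminate()  # send SIGTERM
        proc.wait()  # block until the child is really gone
        os.close(fd)  # nobody is left to see the hangup
        if on_closed is not None:
            on_closed(proc.returncode)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


# Demo: the pipe's write end plays the role of the master side of the pty.
read_end, write_end = os.pipe()
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    stdin=read_end,
)
os.close(read_end)  # the child holds its own copy

exit_codes = []
terminate_then_close(child, write_end, on_closed=exit_codes.append).join(timeout=30)
```

The caller is not blocked while the child shuts down, and the descriptor is closed only once there is no process left that could react badly to the hangup.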

Erik

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fgetc.html

Jeroen Demeyer

Feb 9, 2017, 7:42:06 AM
to sage-...@googlegroups.com
On 2017-02-09 13:29, Erik Bray wrote:
> I think this can be worked
> around in Sage by not explicitly closing the master pty until the
> process has exited.

I think this is doing things in the wrong order. Closing a pty is a way
to signal to a process that it should exit.

This isn't Sage-specific, upstream pexpect/ptyprocess also quits
processes this way.

If terminate_async() is called, it should kill the process anyway. Do
you know why this isn't working?

Erik Bray

Feb 9, 2017, 8:37:36 AM
to sage-devel
On Thu, Feb 9, 2017 at 1:41 PM, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
> On 2017-02-09 13:29, Erik Bray wrote:
>>
>> I think this can be worked
>> around in Sage by not explicitly closing the master pty until the
>> process has exited.
>
>
> I think this is doing things in the wrong order. Closing a pty is a way to
> signal to a process that it should exit.

I think you're right. It's just surprising how much buggy behavior
this led to in this case.

> This isn't Sage-specific, upstream pexpect/ptyprocess also quits processes
> this way.
>
> If terminate_async() is called, it should kill the process anyway. Do you
> know why this isn't working?

I think it is working--I should have been more explicit though: Cygwin
takes a long time to fork() a process, as it does in
terminate_async(). In the meantime a *lot* of time is spent in that
select() loop (it calls select with 0 timeout) and it's a very busy
loop eating up significant CPU time. By one test I did, I count 29290
select() calls before it sees the SIGTERM.
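
To give a feel for why a zero-timeout select() loop is so expensive, here is a small Python sketch. It is an illustration on an idle pipe, not a reproduction of the ECL loop; `count_select_calls` is a made-up helper, and the counts it produces depend entirely on the machine.

```python
import os
import select
import time


def count_select_calls(timeout, duration=0.2):
    """Count how many select() calls fit into `duration` seconds."""
    read_end, write_end = os.pipe()  # idle pipe: never becomes readable
    calls = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            select.select([read_end], [], [], timeout)
            calls += 1
    finally:
        os.close(read_end)
        os.close(write_end)
    return calls


busy = count_select_calls(timeout=0)  # returns immediately every time
patient = count_select_calls(timeout=0.05)  # sleeps in the kernel instead
```

With a timeout of 0 the loop never sleeps, so it burns a full core for as long as it runs; with even a small timeout the same wait costs a handful of calls.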

This isn't such a problem in a normal context because it still exits
after a few seconds. But when running the test suite it's starting up
multiple maxima instances in rapid succession, all of which get stuck
in this loop simultaneously, slowing each other down, and also slowing
down creation of the processes that start them.

Another possible solution, just for Cygwin (for now, until I can fix
ECL) would be to not use terminate_async on Cygwin, and instead
terminate synchronously. This still might not help if running the
tests in parallel though?