[PATCH] fix races in sman

Qian Yun

May 6, 2024, 7:41:04 AM
to fricas-devel
I have not met the races in graphics, but Ralf, Waldek and
I have all met the races in sman:

Namely, when building the book, sometimes a few lines are
missing at the beginning or at the end of the tex file.
The tex file comes from the spool file, which is the piped
output of the "fricas" script.

This is quite easy to reproduce if you build the book with
more parallel jobs than your CPU has threads.

First, the missing lines at the end.

sman kills all of its child processes when FRICASsys quits.

The output of FRICASsys goes to a pty, is captured by "sman"
and forwarded to "session" via a socket, then "session" forwards
it to "spadclient" via a socket, and finally "spadclient" prints
it to stdout.

So when FRICASsys has output everything and quits, sman
sometimes kills everything while a few final lines are still
in a socket buffer, not yet printed.

I solve this by adding a 100 ms sleep before "kill_all_children".
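
For illustration only: a millisecond sleep of this kind could be built
on POSIX nanosleep. The helper name "sleep_ms" is hypothetical, and this
is not the actual fricas_sleep from the FriCAS tree. The retry loop is
the point of the sketch: in a process that may receive signals,
nanosleep can return early, which would shorten the grace period.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper, NOT the real fricas_sleep.
   Sleeps for at least `ms` milliseconds, resuming the sleep if a
   signal interrupts it. Returns 0 on success, -1 on failure. */
int sleep_ms(long ms)
{
    struct timespec req, rem;

    assert(ms >= 0);                      /* precondition of this sketch */
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;                    /* genuine failure */
        req = rem;                        /* interrupted: sleep the rest */
    }
    return 0;
}
```

A caller would use it as sleep_ms(100) before tearing down the children.
A fixed delay only makes losing the final lines less likely; it does not
guarantee that the downstream buffers have been drained.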


Second, the missing lines at the beginning.

At startup, "spadclient" connects to "session", then "session"
connects to "FRICASsys" and gets a prompt ("(1) -> ") back.

So after "connect_to_local_server", quite a few things are still
going on in the background. It is therefore necessary to sleep for
a while before forwarding stdin (in this case, ")read xxx" from a
pipe) to "FRICASsys". Otherwise the output order is sometimes
messed up.
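
A different technique from the fixed sleep, sketched under assumptions:
the client could wait until the prompt has actually arrived before it
starts forwarding input. "wait_for_prompt" is a hypothetical helper, not
something in spadclient.c. Note that it consumes the data it reads, so a
real client would still have to print that data, and that the timeout
applies to each wait for new data rather than to the whole call.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until `prompt` shows up in the data
   received so far. Returns 1 if the prompt was seen, 0 on timeout, EOF,
   or a full buffer, and -1 on a poll/read error.
   Data containing NUL bytes is not handled by this sketch. */
int wait_for_prompt(int fd, const char *prompt, int timeout_ms)
{
    char buf[4096];
    size_t used = 0;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(prompt != NULL && *prompt != '\0');

    while (used < sizeof buf - 1) {
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0)
            return -1;                    /* poll failed */
        if (ready == 0)
            return 0;                     /* no new data within timeout */

        ssize_t n = read(fd, buf + used, sizeof buf - 1 - used);
        if (n < 0)
            return -1;                    /* read failed */
        if (n == 0)
            return 0;                     /* peer closed before prompting */

        used += (size_t)n;
        buf[used] = '\0';                 /* keep buf a valid C string */
        if (strstr(buf, prompt) != NULL)
            return 1;                     /* prompt has arrived */
    }
    return 0;                             /* buffer full, prompt not seen */
}
```

With something like this, the client would call
wait_for_prompt(sock, "(1) -> ", some_timeout) and only then start
forwarding, instead of guessing how long the startup takes.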


Ralf, please test if this fixes the build problem on your side.

- Qian

diff --git a/src/sman/sman.c b/src/sman/sman.c
index d5114624..05208cd3 100644
--- a/src/sman/sman.c
+++ b/src/sman/sman.c
@@ -773,9 +773,10 @@ monitor_children(void)
}
switch(proc->death_action) {
case Die:
+ fricas_sleep(100);
kill_all_children();
clean_up_sockets();
- fricas_sleep(200);
+ fricas_sleep(100);
exit(0);
case NadaDelShitsky:
break;
diff --git a/src/sman/spadclient.c b/src/sman/spadclient.c
index b0121181..0e3f5bb9 100644
--- a/src/sman/spadclient.c
+++ b/src/sman/spadclient.c
@@ -56,6 +56,7 @@ main(void)
{
sock = connect_to_local_server(SessionServer, InterpWindow, Forever);
bsdSignal(SIGINT, inter_handler,RestartSystemCalls);
+ fricas_sleep(60);
remote_stdio(sock);
return(0);
}

Ralf Hemmecke

May 6, 2024, 8:09:05 AM
to fricas...@googlegroups.com
Qian,

I can only test later, but your analysis sounds reasonable.
And since these milliseconds would be added for the book only, your
patch would only affect the building of the book. I think that would be
acceptable.

I will come back after testing.

Thanks
Ralf

Waldek Hebisch

May 6, 2024, 12:06:42 PM
to fricas...@googlegroups.com
Looks OK. Just one remark: this is a workaround, not a fix. A proper
fix would involve a change to the protocol: in particular, all I/O
would need an acknowledgment, and 'FRICASsys' would exit only after
receiving an acknowledgment of its output. Of course a proper fix
needs much more work than your patch, and we can live with the
workaround, possibly for a long time.
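
A minimal sketch of the acknowledgment idea for a single hop, assuming a
stream socket that supports half-close. All names here (and ACK_BYTE)
are hypothetical and not part of the existing sman protocol. In the real
chain (pty, sman, session, spadclient) the acknowledgment would have to
travel back through every hop, which is where the extra work lies.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define ACK_BYTE 0x06   /* ASCII ACK; an arbitrary choice for this sketch */

/* Writer: send all of buf, then half-close to signal "no more output". */
int send_all_and_finish(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return shutdown(fd, SHUT_WR);
}

/* Reader: consume everything up to EOF, then acknowledge.
   Copies at most cap bytes into out; returns total bytes seen or -1. */
long drain_and_ack(int fd, char *out, size_t cap)
{
    long total = 0;
    char chunk[512];
    const char ack = ACK_BYTE;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                        /* writer has finished */
        for (ssize_t i = 0; i < n; i++) {
            if ((size_t)total < cap)
                out[total] = chunk[i];
            total++;
        }
    }
    if (write(fd, &ack, 1) != 1)
        return -1;
    return total;
}

/* Writer: block until the reader confirms; only then is exiting safe. */
int wait_for_ack(int fd)
{
    char c;
    ssize_t n = read(fd, &c, 1);
    return (n == 1 && c == ACK_BYTE) ? 0 : -1;
}
```

The writer would call send_all_and_finish, then wait_for_ack, and exit
only when the latter returns 0, so no delay has to be guessed.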

--
Waldek Hebisch

Ralf Hemmecke

May 6, 2024, 1:05:26 PM
to fricas...@googlegroups.com
This time the following error shows. And indeed 23DColB.ps is nowhere to
be found.

Ralf
====================
LaTeX Warning: File `23DColB.ps' not found on input line 612.


tmp/ug07.tex:612: LaTeX Error: File `23DColB.ps' not found.

See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...

l.612 \epsffile[0 0 295 295]{23DColB.ps}

Output written on book.dvi (238 pages, 1767652 bytes).
Transcript written on book.log.
==> Detected problem while running LaTeX.
==> LaTeX must be installed.
==> The following LaTeX packages must be available.
==> amsmath, breqn, tensor, mleftright, graphicx, verbatim,
==> hyperref, color, listings, makeidx, xparse, tikz
make: *** [Makefile:895: book.dvi] Error 1
make: *** Waiting for unfinished jobs....

Qian Yun

May 6, 2024, 7:54:54 PM
to fricas...@googlegroups.com
Does this error persist if you redo the build by
cd src/doc; make clean; make book.pdf ?

If it persists, can you examine tmp/ug07.spool to see if there is
an error related to the generation of this image?

- Qian