On Sun, May 05, 2024 at 05:39:21PM +0200, Ralf Hemmecke wrote:
> On 5/5/24 16:16, Waldek Hebisch wrote:
> > Testing recent patch by Qian I got 'view3D' which showed empty
> > window and used 100% of a core. A little investigation using 'gdb'
> > shows that this 'view3D' was trying to exit. This was triggered from
> > a signal handler: 'view3D' received a signal (SIGTERM I think) and
> > 'goodbye' in 'stuff3d.c' was the handler. However, signal arrived during
> > X11 call, namely 'view3D' was calling 'XGetWindowAttributes' from
> > 'libX11'. IIUC this is incorrect: one is not allowed to perform
> > X11 call when another call is in progress.
>
> So, IIUC what you are saying then X11 is to blame.
Well, X-related code.
> With Qian's patch one file, for example, ug11.input is processed under
> xvfb. This is basically one X11 session running. If we could split the
> commands to several xvfb+fricas calls, then they cannot possibly
> interact with each other since these would be different X11 sessions.
> Is this a correct picture?
>
> Well, I am not saying, that it is actually possible to split the \psXtc
> stuff into separate parts, since some might depend on the execution of
> previous \psXtc environments. But I guess such a splitting would also
> avoid races. Am I wrong?
Yes, you are wrong. Maybe it will help if I explain a bit of the design
of X11. First, there is the X server. The X server does the real graphics
work (that is, drawing images on the screen) and also some administrative
work. Programs connect to the X server using sockets, and _each_ client
program has its own connection. AFAICS the X server is working fine and
the trouble is within the client program.
Theoretically a client program could talk directly to the X server. However,
clients usually use Xlib. The client calls Xlib functions and Xlib is
responsible for the actual communication with the X server. There are
higher-level libraries, but our programs currently call Xlib directly.
The "100% cpu use" problem is due to signal handling. A client like
'view3D' receives a signal and tries to shut down its connection to the
X server. However, signal handling is tricky: when a signal arrives, the
program may be in the middle of modifying critical data structures.
In particular, most Unix/Linux libraries, including Xlib, do not
allow calls from signal handlers. There is a handful of Xlib calls
that are unconditionally safe, and _most_ of the time calling Xlib
from a signal handler works as naively expected. But sometimes
calls from a signal handler cause trouble. If you want to find the
guilty party, that is easy: our code in 'view3D' is doing a forbidden
thing, which causes the trouble.
Concerning possible fixes: Xt has special support for signal handlers;
it remembers that a signal arrived and calls the handler when it is safe.
We could implement a similar thing ourselves: set a flag inside the handler
and regularly test whether the flag is set.
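A minimal sketch of the flag approach, to make the idea concrete. This is
not the actual 'view3D' code: the names 'shutdown_requested',
'install_shutdown_handler', 'shutdown_pending' and 'poll_events' are
hypothetical, and I assume SIGTERM is the signal in question. The handler
only stores to a 'volatile sig_atomic_t' variable; anything that touches
Xlib would happen later, from the main loop, outside of any X11 call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

/* The handler only records that the signal arrived.
   It makes no Xlib calls (and no other library calls). */
static void on_terminate(int signo)
{
    (void)signo;
    shutdown_requested = 1;
}

/* Install the handler for SIGTERM (assumed to be the shutdown signal).
   Returns 0 on success, -1 on failure. */
int install_shutdown_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Nonzero once a shutdown signal has been seen. */
int shutdown_pending(void)
{
    return shutdown_requested != 0;
}

/* Hypothetical stand-in for the event loop: runs at most max_iterations
   steps and stops early once the flag is set.  Returns the number of
   steps actually performed. */
int poll_events(int max_iterations)
{
    int done = 0;
    while (!shutdown_pending() && done < max_iterations) {
        /* One unit of normal work (X11 calls) would go here. */
        done++;
    }
    /* After the loop no X11 call is in progress, so this would be the
       place to do the cleanup that the handler does today. */
    return done;
}
```

One caveat with such a scheme: if the loop blocks waiting for X events, the
flag is only noticed once the blocking call returns, so the loop also needs
some way to wake up (a timeout, for example).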
In a sense the core trouble is that we have parallel channels of
communication. Data mostly goes via sockets, but there are also
signals, ordinary files and Unix standard input/output.
I know a clear reason for just one problem ("100% cpu use"). Sometimes
graphics processes just hang waiting for something; probably in this
case the signal to shut down was lost (but this is just a guess).
> > Recent changes just give me extra incentive to run 'make
> > book.pdf'...
>
> I am against inclusion of Qian's patch if it forces me only to hope that the
> book will be compiled correctly.
Sorry to say this: this is not a new problem; all along you had to
hope that the book would be compiled correctly. Simply, the probabilities
were in our favour and most of the time the book worked correctly. So now
could be the first time that you see the problem.
> Waldek, what you are saying does not sound like these races come from "make
> -jN ..." with N>1, since each file is processed under a separate xvfb. These
> races should also happen with -j1, right?
Yes, races can happen with -j1. Simply changing machine load, adding
multiple jobs, or changing delays affects timing and may change the result
of the race.
--
Waldek Hebisch