I just read this email after I finished typing it. Turned into a small
book. Oh well. Here it is:
Ok, I've been trying to track down this problem all day. We (the people at
this company and I) have been experiencing a lot of bad UI response in all
of our applications in certain situations. I wrote a test program and
figured out that a call to XCopyArea from a fairly complex part of our
code was directly or indirectly causing this problem. Basically, the test
program does this:
- Set up a form with a canvas, a button that does nothing, an input box
(to make sure key events are lagging, too), and a quit button.
- Enter into a very simple loop, like this:
while (fl_check_forms() != test->quit) {
FunctionThatCallsXCopyArea();
};
We know it's the XCopyArea call because when we comment out just that
call, or replace it with something like XFillRectangle, everything is
smooth. The UI sluggishness we are experiencing is on the order of about
3/4 of a second, but sometimes it can be up to 7 or 8 seconds (note: the
program and x server are both running on the local machine).
I've traced the problem down to somewhere in get_next_event() in forms.c.
Here are the steps I took to narrow it down to that (without getting too
descriptive, it took me all day):
- Figured out how much time was spent in fl_check_forms, and how much time
was spent in the image calls (the parts that end up calling XCopyArea).
- I wanted to know if the problem was either 1) XCopyArea taking a long
time and lagging things or 2) XCopyArea creating some conflict somewhere
causing events to be dropped or delayed. I figured out the average time
that one call to our imaging functions took. It was 1200 microseconds.
- I replaced the image calls with a loop that delayed the program for 1200
usecs, and calculated the function timings again.
- With the delay instead of the image calls, the loop was running much
slower (surprise!). The reason for this is that the fl_check_forms() call
was taking almost 60x longer to execute than it was with the actual
imaging calls instead. What this suggested is that when XCopyArea was
being called, fl_check_forms() was returning a lot sooner than it should,
apparently before the events were processed or without processing the
events at all.
- More timing and experimenting narrowed it down to the
fl_treat_interaction_events() call, then to the fl_do_interaction_step()
call (rather than form_event_queued()).
- In do_interaction_step(), it turned out that when we used the 1200usec
delay, the ratio of null to non-null returns from get_next_event() was
about 68:1 with no activity, and 1:1 with lots of activity. By contrast,
when we used XCopyArea (so the UI was real laggy), the ratio was 2277:1
with no activity, and 68:1 with lots of activity (mouse movements and
such). Although these ratios varied a lot, the trend was the same, and
suggested that get_next_event() was returning 0 a lot more than it should
when we used XCopyArea (which would definitely cause the UI to be very
unresponsive).
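
For anyone who wants to repeat the timing comparison, here is a minimal sketch of the helpers it needs: one to measure the average cost of a call, and one to burn the same amount of time without touching X. The names usec_between(), busy_delay_usec() and average_usec() are made up for this illustration and are not part of the test program or the forms library. It assumes a POSIX system with clock_gettime().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Microseconds elapsed from *start to *end (both taken from clock_gettime). */
int64_t usec_between(const struct timespec *start, const struct timespec *end)
{
    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
               + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
    return ns / 1000;
}

/* Busy-wait for at least `usec` microseconds without calling into X.
   This stands in for the imaging call during the comparison run. */
void busy_delay_usec(int64_t usec)
{
    struct timespec start, now;
    assert(usec >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (usec_between(&start, &now) < usec);
}

/* Average cost, in microseconds, of calling fn() `calls` times. */
double average_usec(void (*fn)(void), int calls)
{
    struct timespec start, end;
    assert(fn != NULL && calls > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < calls; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)usec_between(&start, &end) / calls;
}
```

In the experiment described above, the image call would first be timed with average_usec() and then swapped for busy_delay_usec(1200).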
Here is what I understand about get_next_event() (directly copied from the
notes I was typing at the time):
- ok here, get_next_event kind of works like this (it's real hard to look
at):
+ check for an event, if none: flush buffer, read new ones
+ if there's an event, process it like so:
+ if it's MappingNotify, handle_global_event() then return 0
+ if it's not meant for a form, it does some weird stuff
(fl_compress_event)
that i can't figure out, then ungets the event.
+ if it was meant for a form then it got one so return !0.
+ now check again, if none: flush buffer, read new ones
+ if there's an event, then don't do anything, just return 0.
+ if there's still no event, call fl_watch_io. this also happens regardless
of whether there's an event or not every 11 events, to keep non-x things
flowing smoothly. fl_watch_io does a bunch of socket crap.
There are a lot of odd things about the get_next_event() call. Lots of
sloppy coding and some minor hacks here and there. Here are the few things
that stood out to me (they may or may not suggest the real cause of this
problem, I dunno):
By the way, "point 3" refers to the 3rd "return" statement in
get_next_event(), the one here:
if ((dox % 11) && XEventsQueued(flx->display, QueuedAfterFlush)) {
return has_event;
};
This is significant because, out of the 4 return statements in the
function, when the program was run for about 8 seconds with lots of
activity, it went like this (the first number is the 1200usec delay one,
the second number is with the real image calls):
Returned from 1st one: 0 no imaging, 0 with imaging
Returned from 2nd one: 1012 no imaging, 288 with imaging
Returned from 3rd one: 5 no imaging, 5810 with imaging
Returned from 4th one: 1905 no imaging, 618 with imaging
See how many more times it returned from "point 3" when we used the
XCopyArea call instead of the delay? Anyway, notes ("point 2" is the return
in the "if (has_event || do_x_only)" conditional):
- note that get_next_event() returned from point 3 lots of times when the iv
stuff was there, but not many times when the delay was there.
- point 3 is around forms.c:2626. odd things about that code:
- when it gets to point 3 it returns has_event, but to get to point 3
has_event must be 0 (otherwise it would have returned at point 2).
this seems like sloppy coding.
- has_event will always be 0 for point 1, which handles global events,
of which the only one is MappingNotify, because has_event is init'd
to 0 and never gets set before that point. also seems sloppy, why
not just return 0?
- the only time has_event gets modified is just before point 1, as a
result of calling select_form_event(). this does not happen every 11th
event (i'm sure there's a reason for this), and does not happen if
there are no xevents queued.
- do_x_only is a global variable used to skip the fl_watch_io() call.
do_x_only is only used in a small bit of code in close_form_win().
incidentally, it's not even used here because it's in the #else of
a "#if 1". looks like a hack that got #if'd away by more sloppiness.
- the x reference index says XEventsQueued is on page 468, but it's
actually on 249 (468 is XPeekEvent). that's annoying.
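
The per-return-point counts quoted earlier can be gathered with very little code. This is only a sketch of one way to do it: counted(), return_counts and print_return_counts() are hypothetical names, not anything that exists in forms.c, and the idea is simply to rewrite each `return x;` as `return counted(n, x);`.

```c
#include <assert.h>
#include <stdio.h>

#define RETURN_POINTS 4   /* get_next_event() has four return statements */

/* Hypothetical counters, one per return statement. */
unsigned long return_counts[RETURN_POINTS];

/* Record which return statement was taken and pass the value through
   unchanged, so `return x;` can become `return counted(3, x);`. */
int counted(int point, int value)
{
    assert(point >= 1 && point <= RETURN_POINTS);
    return_counts[point - 1]++;
    return value;
}

/* Dump the counters in the same shape as the table in the email. */
void print_return_counts(FILE *out)
{
    for (int i = 0; i < RETURN_POINTS; i++)
        fprintf(out, "Returned from point %d: %lu\n", i + 1, return_counts[i]);
}
```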
Another very odd thing that I noticed is that by the time get_next_event()
has reached point 3, it has called XEventsQueued twice with
QueuedAfterFlush (which flushes the event buffer and checks for new events
if there's no events in the buffer when it is called). Now, since
get_next_event() will normally be getting called over and over again with
not too much lower level xevent handling stuff in between, I know this
isn't really a problem. It still seemed strange to me, though. Earlier on
(it's worth mentioning now), I changed my test loop in my program to this
(and used the actual XCopyArea stuff):
while (fl_check_forms() != test->quit) {
if (fl_check_forms() == test->quit)
break;
FunctionThatCallsXCopyArea();
};
Calling fl_check_forms() twice completely solved the lag problem. I think
that it is weird that calling fl_check_forms() twice instead of once
eliminates UI lags that would normally be over 3 or 4 seconds long and
makes the application completely responsive. Calling fl_check_forms()
twice in many of our loops, however, is an unacceptable hack. Because of
the market we develop our applications for, stability and clean code are
crucial, so it is very important that I find the real cause of this
problem.
I removed the second fl_check_forms() call and continued playing around.
As an experiment, I modified get_next_event() so that it behaved like
this:
- If it was about to return from point 3, jump back up to the top of the
function and (without incrementing dox or calling XEventsQueued a third
time) try to deal with the event again. If, during this second time
around, it got to point 3 again, just return. Don't try it again.
This also completely solved the problem, suggesting something... I have no
idea what, though. I'm primarily a Windows and console-only developer and
I am not yet very familiar with X or the internal workings of the forms
library. For some reason, it appeared as though every other call to
get_next_event was missing events, perhaps because of some weird logic in
get_next_event(). But, more experimenting suggests otherwise.
Moving on, I read this comment in get_next_event():
/* if incoming XEvent has already being pumped from the socket,
watch_io() will time out, causing a bad delay in handling xevent.
Make sure there is no event in the X event queue before we go into
watch_io() */
This comment is just before the "point 3" return statement. It is another
oddity of get_next_event(), as the actual code is:
if ((dox % 11) && XEventsQueued(flx->display, QueuedAfterFlush)) {
return has_event;
};
Which means that every 11th call to get_next_event() should proceed to
the fl_watch_io() stuff regardless of whether or not there is an event in
the queue (this is not what the comment indicates that it should do).
This supports the findings I had above when I made get_next_event() try
processing the event twice, I think. Here is my logic: If, every 11th
event, there is an event in the queue, fl_watch_io() will still happen
"causing a bad delay in handling xevents". The comment indicates that
there should be no XEvent in the queue when fl_watch_io() is called,
therefore if we are on an 11th event, and there is an XEvent in the queue,
WE SHOULD PROCESS IT BEFORE PROCEEDING TO fl_watch_io(), rather than
leaving it in there. That seems to make sense, and I have already seen
that by processing this event I fix the problem.
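
To make the mismatch between the comment and the code concrete, here is a small model of the "point 3" test on its own. It is a sketch, not the library code: returns_at_point3() copies the shape of the quoted condition, returns_per_comment() is my reading of what the comment asks for, and the assumption that the call counter runs 1, 2, 3, ... is mine.

```c
#include <assert.h>

/* Shape of the real test: nonzero means "return now, skip fl_watch_io()".
   `dox` is the call counter and `queued` is what XEventsQueued() reports. */
int returns_at_point3(unsigned dox, int queued)
{
    assert(queued >= 0);
    return (dox % 11) && queued;
}

/* What the source comment describes: never go on to fl_watch_io()
   while X events are still queued, whatever the call counter says. */
int returns_per_comment(int queued)
{
    assert(queued >= 0);
    return queued > 0;
}

/* Number of calls in 1..calls that reach fl_watch_io() even though
   `queued` events are waiting (assumes the counter starts at 1). */
int watch_io_calls_with_backlog(unsigned calls, int queued)
{
    int n = 0;
    for (unsigned dox = 1; dox <= calls; dox++)
        if (queued > 0 && !returns_at_point3(dox, queued))
            n++;
    return n;
}
```

With 300 events waiting, returns_at_point3(10, 300) is 1 but returns_at_point3(11, 300) is 0, so the 11th call falls through to fl_watch_io() with the queue still full, while returns_per_comment(300) would have returned early both times.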
Here is a very small portion of output from a log file that is filled with
some information during get_next_event() calls. The first number on each
line is the number of events in the queue reported by XEventsQueued at the
beginning of the function, before the event processing might occur. The
second number is the number of queued events just before point 3, where
XEventsQueued is called to determine if the function should return before
calling fl_watch_io(). The string "GOT ONE!!!" is printed if has_event is
non-zero and the function returns before point 3. The string "TO WATCH
IO!" is printed just before fl_watch_io() is called.
293 292
292 291
291 291 TO WATCH IO!
291 GOT ONE!!!
325 324
325 324
324 323
323 322
322 321
321 GOT ONE!!!
324 323
324 323
323 322
322 322 TO WATCH IO!
Keeping in mind that file comment from above, notice how once every 11
calls to the function, fl_watch_io() is called with about 300 events in
the queue! This is during a period of lots of mouse and keyboard activity.
Here is an excerpt from the log with my "event reprocessing" hack in
place (which eliminates the lag completely):
1 0 TO WATCH IO!
1 1 TO WATCH IO!
1 0 REHANDLING GOT ONE!!!
2 1 REHANDLING 0 TO WATCH IO!
1 0 TO WATCH IO!
1 0 TO WATCH IO!
1 0 TO WATCH IO!
1 0 TO WATCH IO!
1 0 TO WATCH IO!
1 GOT ONE!!!
1 0 TO WATCH IO!
1 0 TO WATCH IO!
1 1 TO WATCH IO!
1 0 TO WATCH IO!
1 GOT ONE!!!
2 1 REHANDLING 0 TO WATCH IO!
2 1 REHANDLING GOT ONE!!!
The string "REHANDLING" is printed when get_next_event() would normally
return without processing the event from point 3, but instead went back to
process the event that is still in the queue. The number after that string
but just before "TO WATCH IO!" is the number of events in the queue when
fl_watch_io() is about to be called. This part of the log represents the
peak of the queue's capacity. 1 or 2 events, as it should be.
So, some weird logic in get_next_event() is leading to a backlog of
hundreds of unprocessed events in the queue. Here is why I believe this
backlog is happening (talking about get_next_event() without my hacks and
modifications):
- When more than one event is in the queue when get_next_event() is
called, it only processes one. It leaves the rest of the events in the
queue to be handled the next time get_next_event() is called. When only
one event is in the queue, it processes it and leaves none in the queue.
This is all good and well.
- Every 11th event, if you look at get_next_event(), you see that event
processing is completely disabled. So every 11th event, one extra event
remains unprocessed in the queue. These eventually add up and create a lag
problem. I think they peak around 320 because perhaps that is when the
queue is full.
Imagine it like this:
Say you have a bunch of candy. At regular intervals you add a single piece
of candy to an initially empty bowl. Each time you do that, a child walks
by and takes one piece of candy. So, you never really have more than one
piece of candy in the bowl.
Situation 1: You are doing this, but every once in a while you put 2
pieces of candy in the bowl instead of one. Each child, however, is only
taking one piece of candy. So the extra pieces of candy slowly add up, and
eventually you have a bowl full of candy (because you're putting it in
faster than the children are taking it out -- for every N>10 pieces you
put in, only 10 get removed).
Situation 2: You are doing this, but every 11th child does not take a
piece of candy. These pieces of candy add up, also leaving you with a bowl
full of candy (once again, because for every 11 pieces you put in, only 10
get removed).
This situation: Not only are you putting in 2 pieces of candy sometimes,
but every 11th child isn't taking a piece. So the bowl gets filled even
quicker.
It's almost like get_next_event() is experiencing two logic errors (I
guess those are called 'bugs', heh) at once, both resulting in a backlog
of xevents in the queue.
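
The "every 11th child" case is easy to check with a toy simulation. This is a sketch under simplifying assumptions that are mine, not measured facts: exactly one new event arrives per loop iteration, each call removes exactly one event unless it is a skipped call, and the backlog is capped at `capacity` (320 below is just the peak seen in the logs, not a confirmed queue limit). simulate_backlog() is a hypothetical name.

```c
#include <assert.h>

/* Toy model: every iteration queues one event and makes one
   get_next_event()-style call. The call removes one event, except on
   every `skip_period`-th call, which removes nothing. The backlog never
   grows past `capacity`. Returns the backlog after `iterations` calls. */
int simulate_backlog(int iterations, int skip_period, int capacity)
{
    int queued = 0;
    assert(iterations >= 0 && skip_period > 0 && capacity > 0);
    for (int call = 1; call <= iterations; call++) {
        if (queued < capacity)
            queued++;                 /* one new event arrives */
        if (call % skip_period != 0 && queued > 0)
            queued--;                 /* normal call: consume one event */
    }
    return queued;
}
```

In this model simulate_backlog(110, 11, 320) leaves 10 events behind, one per skipped call, and after enough iterations the backlog sits at the cap, which is the same shape as the logged queue sizes.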
Here are some other things I found that solve the problem, in addition to
the things I mentioned above:
- As I stated, calling fl_check_forms() twice solves the problem (perhaps
this is because the events are getting handled faster relative to how
fast you, the human, are causing them to be generated; since this also
results in every 11th call happening twice as much).
- Also as I stated, processing the "leftover" event in get_next_event()
fixes the problem, too. This may or may not be a real fix, since if you
started putting 3 or more events in the queue, the backlog would happen
again.
- Commenting out the fl_watch_io() stuff, or just forcing a return from
"point 3" fixes this. But it also causes fl_watch_io() to not be called,
which is bad, I think.
- Running a loop with the imaging calls in it on a different thread from
the one the fl_check_forms() calls are on (using proper X threading
techniques [i.e. XInitThreads, XLockDisplay, XUnlockDisplay], of course).
I don't know why this works, maybe it has something to do with different
event queues for different threads.
It is also beyond me as to why the XCopyArea call alone indirectly causes
this problem. I have some theories but I'd like to test them out first.
XFillRectangle does not do this, nor do any of the other drawing calls I
tested.
Point is, the lag is definitely being caused by a backlog of events due to
a problem in the way get_next_event() removes events from the queue. But
I don't think that that is the root cause of the problem, because the
question remains: Why are there so many more events being generated when
XCopyArea is called rather than when we are not calling any X routines,
and simply delaying for the same amount of time that the X calls would
normally take?
I have a theory that involves select_form_event and events sent to canvas
windows. But I'm not sure if this makes any sense, and I really need to
look into it a little further, so don't attack that just yet.
Any thoughts or anything would be appreciated...
Jason Cipriani
_________________________________________________
To unsubscribe, send the message "unsubscribe" to
xforms-...@bob.usuhs.mil or see
http://bob.usuhs.mil/mailserv/xforms.html
XForms Home Page: http://world.std.com/~xforms
List Archive: http://bob.usuhs.mil/mailserv/list-archives/
Development: http://savannah.nongnu.org/files/?group=xforms
One very important thing I forgot to mention was that at the beginning,
when we replaced the imaging stuff with the 1200usec delay, there was no
longer any lag and the UI was as responsive as it should have been.
Jason
> I suspect that you will find detailed answers hard to come by, for the
> simple reason that the author of the code (TC) and its maintainer for
> the last donkey's years (SPL) are the ones with the real, detailed
> knowledge about this end of the library. Your delvings have probably
> made you "the" expert.
This is very unfortunate because in reality, I have absolutely no idea
what I'm talking about.
One last thing. Just as an "fyi" for people who are interested, here is
the email I sent to my boss last night which contains a half-coherent
summary of the stuff I found (btw, pardon the "some genius" comment, I was
pretty annoyed at that point):
after 9 hours of digging through forms library code and all sorts of crap,
i found the cause of the problem, why it's happening, and why only
XCopyArea is doing it (XCopyPlane would do it, too). turns out it's a
combination of a few bugs in the forms library and some unfortunate
default settings in the [removed].
if you look under GraphicsExpose and NoExpose events in the xlib reference
manual, it states:
"If graphics_exposures is True in the GC used for the copy, either one
NoExpose event or one or more GraphicsExpose events will be generated for
every XCopyArea or XCopyPlane call made."
so what is happening is, every time you call fl_check_forms, you follow it
immediately by a call to [removed], which calls XCopyArea. this
causes a NoExpose event to be sent to the canvas (GraphicsExpose events
weren't the problem, they weren't getting sent). via a few other functions,
fl_check_forms eventually calls the function get_next_event(). some bad
logic in the function eventually leads to the event queue getting too full
and lagging the ui (it peaks around 320 events when it should only be
peaking around 2 or 3... and it only stops at 320 because it can't hold
any more). here's what get_next_event() does:
- if there's at least one event in the queue, grab the first event.
- if this event is not destined for a form window (if it's going to say, a
canvas window), do some minor tweaking of the event then put it back on
the queue so that it can be handled by the next call to get_next_event().
this is all good and well. now, get_next_event() also calls a function
fl_watch_io(). this function is called every 11th call to get_next_event()
just so it doesn't eat up too much cpu time. fl_watch_io() does a bunch of
socket stuff that i'm not too clear on. there is a comment in
get_next_event() that says fl_watch_io() shouldn't be called with xevents
in the queue because it will delay processing of the events. but this is
ok because the queue should be empty or only contain 1 or 2 events when
fl_watch_io() is called (and 1 or 2 events doesn't lag it that much).
HOWEVER, some genius decided that on every 11th call to get_next_event(),
when fl_watch_io() is called, event processing should be completely
skipped! this means that for every 11 calls to get_next_event(), only 10
events are removed from the queue. now recall that since you are calling
one XCopyArea() per fl_check_forms(), you are adding 1 event to the queue
each time. so for every 11 events you add to the queue, only 10 are
removed. this builds up quickly and, not only does it overflow the queue,
but it leads to fl_watch_io() being called with over 300 events in the
queue, which *really* slows things down. by the way, an interesting thing
to note is that fl_check_forms() will ALWAYS return NULL on every 11th
call.
when you don't call XCopyArea, the only time events get added to the queue
are in response to mouse and keyboard and expose events and such. in this
case it's ok that 10 out of 11 calls to get_next_event() actually remove
an event, because there's so few events in the queue that they all get
processed very quickly.
this is why calling fl_check_forms() twice fixed the problem. because
every time you called XCopyArea and a NoExpose event happened, you removed
at least 1 event from the queue, so it was always cool. calling
fl_check_forms() twice doesn't work if you call XCopyArea() twice.
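
The arithmetic behind the "call it twice" workaround, and why it breaks with two XCopyArea calls, can be sketched with a small model. The assumptions are mine: each XCopyArea queues exactly one NoExpose event, each fl_check_forms() call removes exactly one event, and every 11th call overall removes none. backlog_after() is a made-up name.

```c
#include <assert.h>

/* Each loop iteration queues `events_per_iter` events (one per XCopyArea)
   and then makes `checks_per_iter` fl_check_forms()-style calls. Every
   call removes one event, except every 11th call overall, which removes
   none. Returns the number of events still queued at the end. */
long backlog_after(int iterations, int events_per_iter, int checks_per_iter)
{
    long queued = 0;
    unsigned long call = 0;
    assert(iterations >= 0 && events_per_iter >= 0 && checks_per_iter >= 0);
    for (int i = 0; i < iterations; i++) {
        queued += events_per_iter;
        for (int c = 0; c < checks_per_iter; c++) {
            call++;
            if (call % 11 != 0 && queued > 0)
                queued--;
        }
    }
    return queued;
}
```

Over 1100 iterations this model gives a backlog of 100 for one event and one check, 0 for one event and two checks, and 200 for two events and two checks, which matches the behaviour described above.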
i have many possible solutions to this but i narrowed them down to two
simple ones. i have tried them both and they both work perfectly
(in the test program, anyway) with no side effects:
1) make get_next_event() *not* skip the event processing every 11 events.
there is no reason for it to do so. but still make it call fl_watch_io()
every 11 events. this way, all the events get processed and fl_watch_io()
still gets called. about 5% of the time, fl_watch_io() is called with 1 or
2 events in the queue, but this is ok and fl_watch_io() doesn't noticeably
hang.
2) make the [removed] GC have its graphics_exposures set to False so
that NoExpose events aren't generated. do this by modifying the part where
_gc gets initialized like so:
XGCValues values;
values.graphics_exposures = False;
_gc = XCreateGC(_display, _window, GCGraphicsExposures, &values);
instead of:
_gc = XCreateGC(_display, _window, 0, NULL);
both ways are good for different reasons, and i'd actually recommend doing
them both. we have the xforms 1.0 source and we can modify it however we
want, so we can fix it there and stop using 0.89. if we don't do way 1,
the possibility for this problem to occur is still there -- we've only
fixed one of the things that lead up to the queue overflow occurring. i
don't like that.
another thing is, the bug is kind of "unfixable" in a way... but it's
weird because the only way to "fix" it would be to hack things in and
start doing risky things like ignoring events in weird places and such.
the reason is: fl_check_forms(), for the most part, only processes one
event each time it is called. so if you are explicitly generating more
than one event per fl_check_forms() call, you're kind of screwed. there's
no way around this except to use fl_do_forms() instead of
fl_check_forms(), which basically processes all the events in the queue
before returning.
so, here's my suggestions:
1) make sure graphics_exposures is false in the [removed], and
2) fix get_next_event() so it never skips an event, and
3) use fl_do_forms() whenever possible, which i guess is tough for the way
[removed] and the [removed] work.
jason
On Wednesday 23 April 2003 6:18 am, j...@casurgica.com wrote:
> So, my conclusions on this are that I suggest one of two possible fixes to
> this problem: Solution 7 (very easy, no side effects) and solution 3
> (haven't tried it). Does anybody have any thoughts as to what would be the
> most appropriate?
I suspect that you will find detailed answers hard to come by, for the simple
reason that the author of the code (TC) and its maintainer for the last
donkey's years (SPL) are the ones with the real, detailed knowledge about
this end of the library. Your delvings have probably made you "the" expert.
I would suggest trying out both clean solutions for a while to see if there
are any unexpected side effects and then send in a patch. I also think that
detailed commentary of the code would be a good thing in this case; it does
seem quite subtle.
> Are NoExpose and GraphicsExpose events ever really needed by
> the forms library?
I am very ambivalent about dictating what is and is not important. The library
is meant to be extensible, meaning weird and wonderful widgets may well
exist.
Best regards,
Angus
Ok so I know exactly what the problem is:
I made it so that select_form_event writes the Window of the canvas on my
test form, along with the Window that the XEvent is for and the event type
itself to the log file in addition to all the other stuff. Here is a small
excerpt:
[win=0x200003c canvas=0x200003c 0 NoExpose] 122 [win=0x200003c
canvas=0x200003c 0 NoExpose] 121
[win=0x200003c canvas=0x200003c 0 NoExpose] 121 121 TO WATCH
IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 121 [win=0x200003c
canvas=0x200003c 0 NoExpose] 120
[win=0x200003c canvas=0x200003c 0 NoExpose] 120 [win=0x200003c
canvas=0x200003c 0 NoExpose] 119
[win=0x200003c canvas=0x200003c 0 NoExpose] 119 [win=0x200003c
canvas=0x200003c 0 NoExpose] 118
[win=0x200003c canvas=0x200003c 0 NoExpose] 118 [win=0x200003c
canvas=0x200003c 0 NoExpose] 117
[win=0x200003c canvas=0x200003c 0 NoExpose] 117 [win=0x200003c
canvas=0x200003c 0 NoExpose] 116
[win=0x200003c canvas=0x200003c 0 NoExpose] 217 [win=0x200003c
canvas=0x200003c 0 NoExpose] 216
[win=0x200003c canvas=0x200003c 0 NoExpose] 216 [win=0x200003c
canvas=0x200003c 0 NoExpose] 215
[win=0x200003c canvas=0x200003c 0 NoExpose] 215 [win=0x200003c
canvas=0x200003c 0 NoExpose] 214
Yeah, it sucks and it's hard to read. But, as you can see, when
select_form_event() processes an event that is intended for the canvas, it
returns 0 which causes that event to be put back in the queue by
get_next_event(). This documentation taken from the Xlib reference manual
explains why this is happening with XCopyArea only (apparently it would
also happen with XCopyPlane):
"If graphics_exposures is True in the GC used for the copy, either one
NoExpose event or one or more GraphicsExpose events will be generated for
every XCopyArea or XCopyPlane call made."
One interesting thing I noticed, which you can see in the above log
snippet, is every time the event queue increases in size, it increases by
exactly 100 events and this increase happens every 100 events. The only
exception is just after the test program starts, when the event queue is
rising to its peak of around 300 events (the increases happen at the same
frequency, but the increase is by 120-130 events). This probably has
something to do with the way events are queued or read when they start
coming in real fast. Aside from the relatively few expected form events,
they are all NoExpose events directed to the canvas.
The initial rise of the number of events in the event queue supports my
point about an event not being handled every 11th call to
get_next_event(). It takes too long for me to paste the log file in, but
what happens is it starts out around 1, then increases slowly, one event
at a time (as the backlog builds up because every 11th call to
get_next_event() ignores an event). Eventually the increases start
happening in jumps, but like I said, this is probably a side effect of the
delay, or some other weird thing that doesn't really matter. Here is a
part of the log that shows the initial growth of the event queue:
[win=0x200003c canvas=0x200003c 0 NoExpose] 1 1 TO WATCH
IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 1 [win=0x200003c
canvas=0x200003c 0 NoExpose] TO WATCH IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 1 [win=0x200003c
canvas=0x200003c 0 NoExpose] TO WATCH IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 1 [win=0x200003c
canvas=0x200003c 0 NoExpose] TO WATCH IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 1 [win=0x200003c
canvas=0x200003c 0 NoExpose] TO WATCH IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 2 TO WATCH
IO!
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 3 [win=0x200003c
canvas=0x200003c 0 NoExpose] 2
[win=0x200003c canvas=0x200003c 0 NoExpose] 2 [win=0x200003c
canvas=0x200003c 0 NoExpose] 1
[win=0x200003c canvas=0x200003c 0 NoExpose] 3 [win=0x200003c
canvas=0x200003c 0 NoExpose] 2
It is very hard to read, I know, because of the line breaks. But you can
see how it starts to build up.
So I've come to the conclusion that what is happening is that events are
not removed from the queue fast enough by get_next_event(), because an
event gets ignored every once in a while. Additionally, get_next_event()
incorrectly proceeds to fl_watch_io() with events still in the queue,
which causes fl_watch_io() to hang as that source comment indicates it
should.
I can think of a number of solutions but I'm not sure which one is correct
with the least bad side effects:
1) Make select_form_event() ignore NoExpose events. Bad because: This is a
hack, it's not what select_form_event() is supposed to be doing.
2) Make get_next_event() ignore NoExpose events right after the call to
XNextEvent, and keep calling XNextEvent until either no events are left or
a different event type is received. Good because: This is an appropriate
place to do that.
3) Don't set graphics_exposures to True in the GC, so that NoExpose events
are never generated. Good because: Cuts down on the overhead of receiving
and ignoring NoExpose events, and is as low level as possible a way of
stopping these events.
4) Make get_next_event() keep taking in events until there aren't any
before the fl_watch_io() call. Bad because: Some events won't be handled,
since the dispatching is done in do_interaction_step(), not in
get_next_event(). This looping could go on for a long time if lots of
events come in.
5) Remove the fl_watch_io() stuff from get_next_event(), and put it in
do_interaction_step(). Then, make do_interaction_step() loop and handle
xevents until there are none in the queue, and only call fl_watch_io()
after that is done. Good because: All events get handled. Bad because:
This starts to defeat the purpose of using fl_check_forms() instead of
fl_do_forms().
6) Move the fl_watch_io() stuff into do_interaction_step() like in #5, but
instead of looping, make it do this: Every 11th call, skip event
processing and call fl_watch_io(). On the *next* call, process and handle
2 events instead of 1. Good because: events get handled, no looping. Bad
because: If two different objects end up getting the FL_* event, which one
should fl_check_forms() return?
7) Make it so that fl_watch_io() still gets called every 11th call, *but*
an event still gets received (never skip event processing). After the
event is received, it will be removed from the queue. If there's still
some events left in the queue by the time fl_watch_io() is called, then
too bad... 1 or 2 events are better than 300 of them.
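
As an illustration of solution 2 in the list above, here is a sketch of the "keep reading until something other than NoExpose turns up" loop, written against a mock queue so it can be tried without an X server. Everything here is a stand-in: enum mock_type, struct mock_queue and next_interesting_event() are hypothetical, and real code would call XNextEvent() and compare the event type against the Xlib constants instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for X event types. */
enum mock_type {
    MOCK_NO_EXPOSE,
    MOCK_GRAPHICS_EXPOSE,
    MOCK_BUTTON_PRESS,
    MOCK_KEY_PRESS
};

/* A read-only array of pending events plus a read position. */
struct mock_queue {
    const enum mock_type *events;
    size_t count;
    size_t head;
};

/* Pop events, silently discarding NoExpose ones. Stores the first other
   event in *out and returns 1, or returns 0 if the queue ran dry. */
int next_interesting_event(struct mock_queue *q, enum mock_type *out)
{
    assert(q != NULL && out != NULL);
    while (q->head < q->count) {
        enum mock_type ev = q->events[q->head++];
        if (ev != MOCK_NO_EXPOSE) {
            *out = ev;
            return 1;
        }
    }
    return 0;
}
```

The loop is bounded by the number of events already queued, but like solution 4 it can still run for a while if NoExpose events arrive faster than they are discarded.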
Most of these are based on the observation that do_interaction_step() does
nothing with NoExpose and GraphicsExpose events; they end up getting
passed along to the form as FL_OTHER. But I haven't looked at the canvas
code close enough to see if it does anything with these.
I like solution 3, but then the problem still remains with the every 11th
event being skipped thing. Solution 7 seems OK to me, too, because it is
very, very simple.
I tried solution 7, it was implemented very easily. It worked perfectly
and resulted in fl_watch_io() being called with a single event in the
queue only about 5% of the times it was called, even during periods of
lots of activity. No more than 3 messages were in the queue at any given
time.
So, my conclusions on this are that I suggest one of two possible fixes to
this problem: Solution 7 (very easy, no side effects) and solution 3
(haven't tried it). Does anybody have any thoughts as to what would be the
most appropriate? Are NoExpose and GraphicsExpose events ever really
needed by the forms library?
Jason
Hi Jason,
Your analysis is right on the money. As to the solutions, my view is that the
cleanest way is to turn off graphics_exposures in the GC, over which you
have control and presumably know whether you want it or not in *your*
application. Except for those automatic events, it's not easy for humans
to fill up the queue so fast that processing ten events in a row would
still let a large number of events accumulate.
Giving X events too high a priority by emptying the X event queue before
calling fl_watch_io() can create problems for people who use IO callbacks (to
monitor or drive external devices) and maybe some other timing issues if many
X events are generated by something like NoExpose. No matter how we look at
this, we're only simulating multitasking/multithreading imperfectly, and will
run into difficulties under some situations that can only be solved by some
type of collaborative effort from applications.
One thing that may work is to adjust the queue priority dynamically. Say, if
there are no IO callbacks (need to check if timer, signal facility connects
to this or not, can't quite remember off the top of my head), we can process
X events N (say 25) times before calling fl_watch_io() or even process X
events until there are none left before calling fl_watch_io(). I think if we
were to change how the events are processed, this is probably the only way
that might work with minimal chance of breaking other applications.
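
A rough sketch of the dynamic-priority idea, purely as an illustration: pick a budget of X events to handle between I/O polls depending on whether any IO callbacks are registered. The function names are invented, the 25 is the "say 25" from the paragraph above, and the 10 is only my stand-in for the current ten-in-a-row behaviour. None of this is existing library code.

```c
#include <assert.h>
#include <stddef.h>

/* How many X events may be handled before the next fl_watch_io()-style
   poll. Both numbers are illustrative assumptions. */
int x_event_budget(int io_callbacks_registered)
{
    return io_callbacks_registered ? 10 : 25;
}

/* Handle up to `budget` queued events. `*queued` models the queue length
   and is decremented once per handled event. Returns the number handled. */
int handle_x_events(int *queued, int budget)
{
    int handled = 0;
    assert(queued != NULL && *queued >= 0 && budget >= 0);
    while (*queued > 0 && handled < budget) {
        (*queued)--;
        handled++;
    }
    return handled;
}
```

With 300 events waiting and no IO callbacks, one pass handles 25 events before the poll. With callbacks registered, the poll comes around after at most 10.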
-TC