Threading unfairness

Matt Kimball

May 18, 2002, 4:54:36 PM

While writing some multithreaded GUI code with ActivePython 2.2.1 under Windows
XP, on a single processor machine, I noticed that my user-interface thread
was very unresponsive when I had a background computation thread going. This
annoyed me. It seemed like the main thread where the GUI code was running was
being starved of CPU time. Since there doesn't seem to be any way to set thread
priorities with Python, I "fixed" the problem by inserting frequent
'time.sleep(0)' calls in my compute-intensive thread, and that seemed to give my
main GUI thread more time to execute when a user-interface event occurred.

However, it got me wondering just how bad the situation was, so I wrote a simple
program to test Python threads and find the maximum time that individual
threads go without getting any CPU time. Interestingly, the problem seems worse
with a low number of threads. In the case of only the main thread and one
thread spawned by the main thread, I see execution gaps of ~0.25 seconds. That
is, each thread is getting a full 0.25 second execution time without the other
thread executing at all! However, if I increase the number of threads spawned by
the main thread to 10, it seems much better. Nine out of ten spawned threads and
the main thread have a maximum gap time of ~0.005 seconds, which seems much more
reasonable to me, but the first thread spawned by the main thread has an
execution gap of ~0.17 seconds, every time.

Here is a typical case of one spawned thread:

Thread     Gap time (s)

main       0.241325084613
thread1    0.292677141927

Three spawned threads:

Thread     Gap time (s)

main       0.100827339788
thread1    0.163031893718
thread2    0.100502438159
thread3    0.100673130244

Five spawned threads:

Thread     Gap time (s)

main       0.00585884518841
thread1    0.174716898377
thread2    0.00717269932352
thread3    0.00577838803542
thread4    0.00579235629107
thread5    0.00580353089572

Ten spawned threads:

Thread     Gap time (s)

main       0.00570351818465
thread1    0.175031463496
thread10   0.00545823561356
thread2    0.00513724509665
thread3    0.00610692141026
thread4    0.0051269085875
thread5    0.00494085142122
thread6    0.0050587434996
thread7    0.00564568960567
thread8    0.00589795630435
thread9    0.00749285174516

The code I used to measure this is at http://matt.kimball.net/test_thread.py --
Maybe someone wants to look it over to make sure I'm not doing anything funny,
and run a few tests on other operating systems to see how they differ.

I think this behavior is probably highly operating system dependent. The python
interpreter is assuming the OS will do something reasonably fair when it gives
up the global interpreter lock, but it looks like this isn't the case, at least
under Windows XP. Still, it would be nice if something could be done to make
threading execution more fair, but I'm not sure how to do this without creating
some sort of scheduler in the python interpreter itself.

Is there anything which can be done?

Is the problem as bad under other operating systems?

Am I insane to try to do anything in the neighborhood of soft-realtime with
Python?

--
Matt Kimball
ma...@kimball.net

Ingo Blank

May 18, 2002, 6:40:42 PM

"Matt Kimball" <ma...@kimball.net> wrote in message
news:mailman.1021755390...@python.org...

Although I don't understand what you want to achieve with your test, I
downloaded it and ran it under Windows 2000 Pro.

Regardless of how many threads it spawns (I tried 2-10), it yields an empty
<gaps> list here.

So I inserted a statement to dump the <times> list.
Maybe that helps you ...

=====================

threads=2
[['thread1', 0.0010048763180795324], ['thread2', 0.30240240030506671],
 ['main', 0.39534354226584661]]

Thread Gap time

threads=4
[['thread1', 0.0010048763180795324], ['thread2', 0.30136707318946959],
 ['thread3', 0.60282382258080291], ['thread4', 0.90467950535612762],
 ['main', 0.99707951708946252]]

Thread Gap time

threads=6
[['thread1', 0.0010023620320459723], ['thread2', 0.301279073178295],
 ['thread3', 0.60228045743243908], ['thread4', 0.90370312427976185],
 ['thread5', 1.204664000592254], ['thread6', 1.5053779181432279],
 ['main', 1.5845150710495328]]

Thread Gap time

threads=8
[['thread1', 0.0010043175878498524], ['thread2', 0.3018732827775597],
 ['thread3', 0.60322694644151698], ['thread4', 0.9041073656009353],
 ['thread5', 1.2055020959367742], ['thread6', 1.5066381341762711],
 ['thread7', 1.8069075818295341], ['thread8', 2.1083386296303024],
 ['main', 2.2006606984965966]]

Thread Gap time

threads=10
[['thread1', 0.0010051556831943724], ['thread2', 0.30162017798351465],
 ['thread3', 0.60236398760177623], ['thread4', 0.90368384808683788],
 ['thread5', 1.2039647497098094], ['thread6', 1.5042448132374366],
 ['thread7', 1.8056222229361554], ['thread8', 2.1058894356684998],
 ['thread9', 2.4071897405955225], ['thread10', 2.7074871247602696],
 ['main', 2.7877696746374188]]

Thread Gap time

=========================

--ingo
voy NG arktb QBG qr [ rot13 ]

PGP PublicKey
http://keys.pgp.com:11371/pks/lookup?op=get&search=0x18B44974

Tim Peters

May 18, 2002, 9:08:55 PM

[Matt Kimball]

> While writing some multithreaded GUI code with ActivePython 2.2.1
> under Windows XP, on a single processor machine, I noticed that
> my user-interface thread was very unresponsive when I had a background
> computation thread going. This annoyed me. It seemed like the main
> thread where the GUI code was running was being starved of CPU time.

It can be very difficult to sort things like that out.

> Since there doesn't seem to be any way to set thread priorities
> with Python,

There is not. And it wouldn't help if there were <0.5 wink>.

> I "fixed" the problem by inserting frequent 'time.sleep(0)' calls
> in my compute-intensive thread, and that seemed to give my main GUI
> thread more time to execute when a user-interface event occurred.

That works on Windows because the underlying Win32 API Sleep() call takes an
argument of 0 as *meaning* "yield the timeslice". On other OSes it may or
may not work. For example, people playing this trick on Solaris usually use
time.sleep(0.01) instead, because time.sleep(0) is a nop there.
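
As a rough illustration of the trick discussed above, here is a minimal sketch written for current Python 3 (the thread itself concerns Python 2.2). The names compute and YIELD_EVERY are made up for the example, and how often to yield is an assumption you would have to tune for your own workload.

```python
import threading
import time

YIELD_EVERY = 1000  # hypothetical tuning knob: iterations between yields


def compute(n, result):
    """CPU-bound loop that periodically offers its timeslice to other threads."""
    total = 0
    for i in range(n):
        total += i * i
        if i % YIELD_EVERY == 0:
            # Per the explanation above, a zero sleep means "yield the
            # timeslice" on Windows. Elsewhere it may do nothing useful,
            # in which case a small positive value is the usual substitute.
            time.sleep(0)
    result.append(total)


result = []
worker = threading.Thread(target=compute, args=(200000, result))
worker.start()
worker.join()
print(result[0])
```

Yielding on every iteration would probably slow the computation noticeably, while yielding too rarely is likely to bring the unresponsiveness back, so the interval is a trade-off rather than a fixed recipe.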

> However, it got me wondering just how bad the situation was, so I
> wrote a simple program to test Python threads, and find the maximum
> time that individual threads go without getting any CPU time.
> Interestingly, the problem seems worse with a low number of threads.

> ...

> The code I used to measure this is at
> http://matt.kimball.net/test_thread.py --
> Maybe someone wants to look it over to make sure I'm not doing
> anything funny, and run a few tests on other operating systems to see
> how they differ.

Sorry, I had too much trouble understanding the code, so wrote a different
driver that seems to me much simpler and more informative. See attached.
Here's a typical run on Win98SE, after killing most background processes
(note: that's important -- Python isn't the OS, and has no control over
when the OS decides to boot your program to run some other task):

With span 1.0 and 1 threads:
tid 0 meangap 0.01 maxgap 47.98 ms at iter 2313 of 103305

With span 1.0 and 2 threads:
tid 0 meangap 0.04 maxgap 12.41 ms at iter 22985 of 22985
tid 1 meangap 0.05 maxgap 15.06 ms at iter 18494 of 22079

With span 1.0 and 3 threads:
tid 0 meangap 0.07 maxgap 0.73 ms at iter 517 of 14530
tid 1 meangap 0.07 maxgap 1.02 ms at iter 14018 of 14101
tid 2 meangap 0.07 maxgap 1.02 ms at iter 14013 of 14107

With span 1.0 and 4 threads:
tid 0 meangap 0.09 maxgap 1.06 ms at iter 298 of 10765
tid 1 meangap 0.10 maxgap 1.04 ms at iter 10475 of 10506
tid 2 meangap 0.10 maxgap 1.05 ms at iter 10469 of 10513
tid 3 meangap 0.10 maxgap 0.94 ms at iter 10461 of 10509

With span 1.0 and 5 threads:
tid 0 meangap 0.11 maxgap 1.08 ms at iter 2550 of 8853
tid 1 meangap 0.12 maxgap 1.11 ms at iter 4399 of 8422
tid 2 meangap 0.12 maxgap 1.08 ms at iter 2074 of 8425
tid 3 meangap 0.12 maxgap 1.11 ms at iter 8385 of 8424
tid 4 meangap 0.12 maxgap 1.00 ms at iter 8379 of 8425

With span 1.0 and 6 threads:
tid 0 meangap 0.13 maxgap 1.39 ms at iter 5312 of 7646
tid 1 meangap 0.14 maxgap 1.50 ms at iter 6894 of 6970
tid 2 meangap 0.14 maxgap 1.31 ms at iter 746 of 6972
tid 3 meangap 0.14 maxgap 1.53 ms at iter 6883 of 6972
tid 4 meangap 0.14 maxgap 1.31 ms at iter 6509 of 6971
tid 5 meangap 0.14 maxgap 1.39 ms at iter 729 of 6971

With span 1.0 and 7 threads:
tid 0 meangap 0.16 maxgap 1.61 ms at iter 1396 of 6213
tid 1 meangap 0.17 maxgap 1.56 ms at iter 2792 of 5981
tid 2 meangap 0.17 maxgap 1.60 ms at iter 2786 of 5982
tid 3 meangap 0.17 maxgap 1.52 ms at iter 1136 of 5982
tid 4 meangap 0.17 maxgap 1.66 ms at iter 1130 of 5981
tid 5 meangap 0.17 maxgap 1.57 ms at iter 1126 of 5980
tid 6 meangap 0.17 maxgap 1.56 ms at iter 2765 of 5978

Note the gap times are measured in milliseconds here, not seconds. The
worst gap was less than 50ms, which isn't noticeable to most people. Note
that the iterations at which the max gaps occur tend to cluster: this is
evidence that *all* threads in the program are getting booted, due to the OS
giving some other process a chance to run.

WRT other platforms:

1. This will probably print gaps of 0. time.clock() has extraordinarily
fine resolution on Windows (finer than microsecond). On other
platforms, change the "now" import to use "time" instead of "clock".

2. Of all the OSes I've futzed with, Windows flavors are actually the
best-behaved in this sense wrt threads: Windows is much happier to
switch threads frequently than other OSes. Since typical Windows apps
are thread-happy GUI programs, this probably isn't an accident.

> I think this behavior is probably highly operating system
> dependent.

Extremely, yes.

> The python interpreter is assuming the OS will do something reasonably
> fair when it gives up the global interpreter lock,

Python isn't assuming anything here. "It's a feature" that Python thread
behavior reflects your native thread behavior; Python isn't trying to be an
operating system.

> ...

> Still, it would be nice if something could be done to make
> threading execution more fair,

As the output above shows, scheduling is very fair on Win98SE, except for
slightly favoring the first thread spawned (tid 0 always got to do more
iterations of its loop than the other tids, while the number of iterations
the other tids got to do are amazingly close to each other).

> but I'm not sure how to do this without creating some sort of
> scheduler in the python interpreter itself.

It wouldn't work -- Python can't tell the OS kernel how to schedule. You
may be able to *influence* that via piles of platform-specific extension
modules. BTW, XP may have a slider or switch buried in one of the Control
Panel applets to influence whether foreground processes get preference in
scheduling (I can't remember, and don't have XP here now).

> ...

> Is the problem as bad under other operating systems?

Probably worse.

> Am I insane to try to do anything in the neighborhood of
> soft-realtime with Python?

Start with a soft-realtime OS. After spending a few miserable weeks on it,
you'll probably find that the apparent unfairness in your GUI app is a
shallow problem after all <0.9 wink>.

threads-are-a-hoot-ly y'rs - tim

import threading
from time import clock as now

SPAN = 1.0
MAXTHREADS = 100

maxgap = [None] * MAXTHREADS
meangap = [None] * MAXTHREADS
maxiter = [None] * MAXTHREADS
totalit = [None] * MAXTHREADS

def busy_thread(index):
    maxgap[index] = -1.0
    current = last = now()
    finish = current + SPAN
    i = 0
    totalgaps = 0.0
    while current <= finish:
        current = now()
        i += 1
        delta = current - last
        totalgaps += delta
        if delta > maxgap[index]:
            maxgap[index] = delta
            maxiter[index] = i
        last = current
    totalit[index] = i
    meangap[index] = totalgaps / i

def drive(nthreads):
    threads = [threading.Thread(target=busy_thread, args=(i,))
               for i in range(nthreads)]

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print "With span", SPAN, "and", nthreads, "threads:"
    for i in range(nthreads):
        print ("tid %3d meangap %6.2f maxgap %6.2f ms "
               " at iter %6d of %6d" % (i, meangap[i] * 1000,
               maxgap[i] * 1000, maxiter[i], totalit[i]))
    print

for i in range(1, 8):
    drive(i)
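
The driver above is Python 2 code: it uses the print statement and time.clock(), which was removed in Python 3.8. A possible Python 3 port might look like the sketch below. Using time.perf_counter as the clock, passing results through a stats list instead of module-level lists, and the shorter span in the demo loop are all choices made for this example, not part of the original attachment.

```python
import threading
from time import perf_counter as now  # assumption: stand-in for time.clock


def busy_thread(index, span, stats):
    """Spin for `span` seconds, recording the largest gap between clock reads."""
    max_gap = -1.0
    max_iter = 0
    current = last = now()
    finish = current + span
    i = 0
    total_gaps = 0.0
    while current <= finish:
        current = now()
        i += 1
        delta = current - last
        total_gaps += delta
        if delta > max_gap:
            max_gap = delta
            max_iter = i
        last = current
    # (mean gap, max gap, iteration of max gap, total iterations)
    stats[index] = (total_gaps / i, max_gap, max_iter, i)


def drive(nthreads, span=1.0):
    """Run `nthreads` busy threads at once and report their gap statistics."""
    stats = [None] * nthreads
    threads = [threading.Thread(target=busy_thread, args=(i, span, stats))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("With span", span, "and", nthreads, "threads:")
    for i, (mean_gap, max_gap, max_iter, total) in enumerate(stats):
        print("tid %3d meangap %6.2f maxgap %6.2f ms "
              " at iter %6d of %6d" % (i, mean_gap * 1000,
                                       max_gap * 1000, max_iter, total))
    print()
    return stats


if __name__ == "__main__":
    for n in range(1, 8):
        drive(n, span=0.25)  # shorter than the original 1.0 s to keep the demo quick
```

Numbers from a current interpreter should not be compared directly with the Win98SE run above, since the interpreter's thread switching was reworked in Python 3.2.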

Peter Hansen

May 19, 2002, 8:50:41 AM

Tim Peters wrote:
>
> [Matt Kimball]

> > I "fixed" the problem by inserting frequent 'time.sleep(0)' calls
> > in my compute-intensive thread, and that seemed to give my main GUI
> > thread more time to execute when a user-interface event occurred.
>
> That works on Windows because the underlying Win32 API Sleep() call takes an
> argument of 0 as *meaning* "yield the timeslice". On other OSes it may or
> may not work. For example, people playing this trick on Solaris usually use
> time.sleep(0.01) instead, because time.sleep(0) is a nop there.

Minor, almost off-topic note:

We had been unthinkingly using time.sleep(0.01) for a while, just
kind of blindly using it as a free "yield timeslice". We had
some latency issues and eventually realized that on our system
10 ms is a rather long time to sleep sometimes. We changed to
time.sleep(0.001), again without decent analysis that would let me
back up any specific claims as to the effect, but our problems appeared
to go away.

Just in case people start thinking of 0.01 as an idiom for "yield
timeslice", without understanding that it can really slow down
certain programs...

(I'll offer here to listen intently to anyone willing to share
his or her knowledge of implementation details (under Linux) relating
to what sleep() does with that delay... what _is_ the resolution?)

-Peter

Jochen Küpper

May 19, 2002, 11:57:55 PM

On Sun, 19 May 2002 08:50:41 -0400 Peter Hansen wrote:

Peter> We had been unthinkingly using time.sleep(0.01) for a while,
Peter> just kind of blindly using it as a free "yield timeslice".
^^^^^^^

Yep. And then no documentation... :)

Peter> (I'll offer here to listen intently to anyone willing to share
Peter> his or her knowledge of implementation details (under Linux) relating
Peter> to what sleep() does with that delay... what _is_ the resolution?)

Python uses 'select' for the sleep implementation (besides special
cases like Mac, Windozer, ...).

On Linux the resolution of select depends on the HZ kernel config
variable. On x86 it is (standard-wise) 100 (Hz), giving you 10 ms.
On Linux/AXP it is 1024, IIRC. Some people set it to higher values,
although that has other implications, too.
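
One way to check what a given machine actually does is to time the sleeps directly. The sketch below is written for current Python 3. The helper name measure_sleep and the repeat count are made up for the example, and the HZ value is simply the x86 figure quoted above, not something read from the running kernel.

```python
import time


def measure_sleep(requested, repeats=20):
    """Return the mean wall-clock duration of time.sleep(requested), in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        time.sleep(requested)
    return (time.perf_counter() - start) / repeats


# Tick length implied by the figure quoted above: HZ = 100 gives 1/100 s.
HZ = 100
print("tick at HZ=%d: %.1f ms" % (HZ, 1000.0 / HZ))

for requested in (0.0, 0.001, 0.01):
    actual = measure_sleep(requested)
    print("requested %6.1f ms  measured %6.2f ms"
          % (requested * 1000, actual * 1000))
```

If the measured value for a 1 ms request comes out close to a whole tick, the request is being rounded up. What a present-day kernel and interpreter report may well differ from the 2002 behavior described in this thread.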

You might want to look at something like userspace-RTAI or KURT if you
do soft-realtime on Linux.

Greetings,
Jochen
--
Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de
Liberté, Égalité, Fraternité GnuPG key: 44BCCD8E
Sex, drugs and rock-n-roll

Peter Hansen

May 21, 2002, 12:26:55 AM

Jochen Küpper wrote:
>
> On Sun, 19 May 2002 08:50:41 -0400 Peter Hansen wrote:
>
> Peter> We had been unthinkingly using time.sleep(0.01) for a while,
> Peter> just kind of blindly using it as a free "yield timeslice".
>

> Python uses 'select' for the sleep implementation (besides special
> cases like Mac, Windozer, ...).
>
> On Linux the resolution of select depends on the HZ kernel config
> variable. On x86 it is (standard-wise) 100 (Hz), giving you 10 ms.

So does that mean that while time.sleep(0.01) really sleeps for 10 ms,
time.sleep(0.001) translates to a "yield", or does it round up to
the next tick? (Or if you'd rather teach me how to fish, where's
a good place (please don't say "the source" ;-) to find this out myself?)

> You might want to look at something like userspace-RTAI or KURT if you
> do soft-realtime on Linux.

It's working fine without that step so far, but thanks for the tip.

-Peter

James J. Besemer

May 21, 2002, 12:43:16 AM

Peter Hansen wrote:

> We had been unthinkingly using time.sleep(0.01) for a while, just
> kind of blindly using it as a free "yield timeslice".

Too bad there is no such implementation for this useful function.

It'd be great if time.sleep(0) did the job. Or even better would be if there was
an explicit thread.yield().

--jb

--
James J. Besemer 503-280-0838 voice
http://cascade-sys.com 503-280-0375 fax
mailto:j...@cascade-sys.com

David LeBlanc

May 21, 2002, 4:04:07 AM

hehehe

We'd have to call it thread.giveitup() since, in spite of my arguments to
the contrary, "yield" now has different semantics in Python than in most
other languages/libraries that support the keyword.

David LeBlanc
Seattle, WA USA

> --
> http://mail.python.org/mailman/listinfo/python-list

Peter Hansen

May 21, 2002, 7:57:48 AM

David LeBlanc wrote (top-posting...David, please don't top-post):

>
> hehehe
>
> We'd have to call it thread.giveitup() since, in spite of my arguments to
> the contrary, "yield" now has different semantics in Python than in most
> other languages/libraries that support the keyword.

Alternatively, we could use threading.Thread.yield_(), in keeping with the
ugly precedent established with unittest.TestCase.assert_() and the
recently discussed poplib.POP3.pass_() methods.

-Peter

David LeBlanc

May 21, 2002, 2:14:44 PM

and in the fullness of time, when Python has acquired enough cruft, will it
be Threaded.Threadable.threading.Thread.yield()? :-)

David LeBlanc
Seattle, WA USA


Jochen Küpper

May 21, 2002, 12:35:42 AM

On Tue, 21 May 2002 00:26:55 -0400 Peter Hansen wrote:

Peter> Jochen Küpper wrote:

>> On Linux the resolution of select depends on the HZ kernel config
>> variable. On x86 it is (standard-wise) 100 (Hz), giving you 10 ms.

Peter> So does that mean that while time.sleep(0.01) really sleeps for 10 ms,
Peter> time.sleep(0.001) translates to a "yield", or does it round up to
Peter> the next tick? (Or if you'd rather teach me how to fish, where's
Peter> a good place (please don't say "the source" ;-) to find this out myself?)

I'd say you would look up the kernel source; or maybe glibc is good
enough... :)) Am not sure. The man-page says that 'timeout' is an
upper bound, but I think that is sloppy. I seem to remember that it
always waits at least one scheduler cycle on a blocking call. Ask a
guru, though.

Greetings,
Jochen

Ps: Since you don't like Luke the source, Stevens book(s) on IPC
might be a good source for this kind of stuff. (Sorry, my copies
are ~6000 km away.)

John La Rooy

May 22, 2002, 4:04:57 AM

On Tue, 21 May 2002 11:14:44 -0700
"David LeBlanc" <whi...@oz.net> wrote:

> and in the fullness of time, when Python has acquired enough cruft, will it
> be Threaded.Threadable.threading.Thread.yield()? :-)
>
> David LeBlanc
> Seattle, WA USA
>
>

Perhaps. But only if you:

from __the_good_old_days__ import when_yield_was_not_a_keyword

;o)

Cliff Wells

May 23, 2002, 1:23:54 PM

On Sat, 18 May 2002 13:54:36 -0700 (PDT)
Matt Kimball wrote:

> While writing some multithreaded GUI code with ActivePython 2.2.1 under
> Windows XP, on a single processor machine, I noticed that my
> user-interface thread was very unresponsive when I had a background
> computation thread going. This annoyed me. It seemed like the main thread
> where the GUI code was running was being starved of CPU time. Since there
> doesn't seem to be any way to set thread priorities with Python, I "fixed"
> the problem by inserting frequent 'time.sleep(0)' calls in my
> compute-intensive thread, and that seemed to give my main GUI thread more
> time to execute when a user-interface event occurred.

I wonder if sys.setcheckinterval() would help. I've seen it in the docs but
haven't tested it myself.

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308

Aahz

May 23, 2002, 5:17:03 PM

In article <mailman.102217480...@python.org>,

Cliff Wells <logiplex...@earthlink.net> wrote:
>
>I wonder if sys.setcheckinterval() would help. I've seen it in the
>docs but haven't tested it myself.

You really don't want to muck with that. You also don't get much
utility out of it.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"In the end, outside of spy agencies, people are far too trusting and
willing to help." --Ira Winkler

Tim Peters

May 24, 2002, 1:54:17 PM

[Cliff Wells]

> I wonder if sys.setcheckinterval() would help. I've seen it in
> the docs but haven't tested it myself.

Try it. It can make a significant difference for better or worse, depending
on your app. For example, Zope has threads that want to get a lot done per
timeslice, and the default check interval is too low for its needs; Zope
boosts it to 120. That's probably the wrong direction to go if you're
worried about GUI responsiveness, though.
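
For readers on current Python 3: sys.setcheckinterval(), which counted bytecode instructions, was removed in Python 3.9. The nearest equivalent is sys.setswitchinterval(), which takes a duration in seconds. The sketch below shows one way to try a different value for the duration of a single call and then restore the old one. The helper name with_switch_interval is made up for the example, and whether a smaller interval helps a given GUI app is something to measure rather than assume.

```python
import sys


def with_switch_interval(seconds, func, *args):
    """Call func(*args) with a temporary thread switch interval, then restore it."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        # Always put the previous value back, even if func raises.
        sys.setswitchinterval(old)


# Example: read the interval back while the temporary value is in effect.
print(with_switch_interval(0.001, sys.getswitchinterval))
print(sys.getswitchinterval())
```

A larger value lets each thread get more done before it is asked to give way, in the spirit of the Zope setting mentioned above; a smaller value asks for more frequent switching at the cost of more overhead.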