Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Overcoming python performance penalty for multicore CPU
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  20 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
John Nagle  
View profile  
 More options Feb 2 2010, 6:02 pm
Newsgroups: comp.lang.python
From: John Nagle <na...@animats.com>
Date: Tue, 02 Feb 2010 15:02:49 -0800
Local: Tues, Feb 2 2010 6:02 pm
Subject: Overcoming python performance penalty for multicore CPU
    I know there's a performance penalty for running Python on a
multicore CPU, but how bad is it?  I've read the key paper
("www.dabeaz.com/python/GIL.pdf"), of course.  It would be adequate
if the GIL just limited Python to running on one CPU at a time,
but it's worse than that; there's excessive overhead due to
a lame locking implementation.  Running CPU-bound multithreaded
code on a dual-core CPU runs HALF AS FAST as on a single-core
CPU, according to Beasley.

    My main server application, which runs "sitetruth.com"
has both multiple processes and multiple threads in each process.
The system rates web sites, which involves reading and parsing
up to 20 pages from each domain.  Analysis of each domain is
performed in a separate process, but each process uses multiple
threads to read process several web pages simultaneously.

    Some of the threads go compute-bound for a second or two at a time as
they parse web pages.  Sometimes two threads (but never more than three)
in the same process may be parsing web pages at the same time, so
they're contending for CPU time.

    So this is nearly the worst case for the lame GIL lock logic.
Has anyone tried using "affinity" ("http://pypi.python.org/pypi/affinity")
to lock each Python process to a single CPU?  Does that help?

                                John Nagle


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
exar...@twistedmatrix.com  
View profile  
 More options Feb 2 2010, 6:24 pm
Newsgroups: comp.lang.python
From: exar...@twistedmatrix.com
Date: Tue, 02 Feb 2010 23:24:59 -0000
Local: Tues, Feb 2 2010 6:24 pm
Subject: Re: Overcoming python performance penalty for multicore CPU
On 11:02 pm, na...@animats.com wrote:

>    I know there's a performance penalty for running Python on a
>multicore CPU, but how bad is it?  I've read the key paper
>("www.dabeaz.com/python/GIL.pdf"), of course.  It would be adequate
>if the GIL just limited Python to running on one CPU at a time,
>but it's worse than that; there's excessive overhead due to
>a lame locking implementation.  Running CPU-bound multithreaded
>code on a dual-core CPU runs HALF AS FAST as on a single-core
>CPU, according to Beasley.

It's not clear that Beasley's performance numbers apply to any platform
except OS X, which has a particularly poor implementation of the
threading primitives CPython uses to implement the GIL.

You should check to see if it actually applies to your deployment
environment.

The GIL has been re-implemented recently.  Python 3.2, I think, will
include the new implementation, which should bring OS X performance up
to the level of other platforms.  It may also improve certain other
aspects of thread switching.

Jean-Paul


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
alex23  
View profile  
 More options Feb 2 2010, 9:02 pm
Newsgroups: comp.lang.python
From: alex23 <wuwe...@gmail.com>
Date: Tue, 2 Feb 2010 18:02:09 -0800 (PST)
Local: Tues, Feb 2 2010 9:02 pm
Subject: Re: Overcoming python performance penalty for multicore CPU
On Feb 3, 9:02 am, John Nagle <na...@animats.com> wrote:

>     I know there's a performance penalty for running Python on a
> multicore CPU, but how bad is it?  I've read the key paper
> ("www.dabeaz.com/python/GIL.pdf"), of course.

It's a shame that Python 3.x is dead to you, otherwise you'd be able
to enjoy the new GIL implementation in 3.2: http://www.dabeaz.com/python/NewGIL.pdf

Actually, it looks like you probably still can:
 + patch for 2.5.4: http://thread.gmane.org/gmane.comp.python.devel/109929
 + patch for 2.7? http://bugs.python.org/issue7753

(Can't comment on affinity, though, sorry)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Terry Reedy  
View profile  
 More options Feb 3 2010, 1:33 am
Newsgroups: comp.lang.python
From: Terry Reedy <tjre...@udel.edu>
Date: Wed, 03 Feb 2010 01:33:20 -0500
Local: Wed, Feb 3 2010 1:33 am
Subject: Re: Overcoming python performance penalty for multicore CPU
On 2/2/2010 9:02 PM, alex23 wrote:

> On Feb 3, 9:02 am, John Nagle<na...@animats.com>  wrote:
>>      I know there's a performance penalty for running Python on a
>> multicore CPU, but how bad is it?  I've read the key paper
>> ("www.dabeaz.com/python/GIL.pdf"), of course.

> It's a shame that Python 3.x is dead to you, otherwise you'd be able
> to enjoy the new GIL implementation in 3.2: http://www.dabeaz.com/python/NewGIL.pdf

> Actually, it looks like you probably still can:
>   + patch for 2.5.4: http://thread.gmane.org/gmane.comp.python.devel/109929
>   + patch for 2.7? http://bugs.python.org/issue7753

The patch was rejected for 2.7 (and earlier) because it could break code
as explained in the discussion. One would have to apply and compile
their own binary.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Paul Rubin  
View profile  
 More options Feb 3 2010, 8:51 pm
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Wed, 03 Feb 2010 17:51:36 -0800
Local: Wed, Feb 3 2010 8:51 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

John Nagle <na...@animats.com> writes:
> Analysis of each domain is
> performed in a separate process, but each process uses multiple
> threads to read process several web pages simultaneously.

>    Some of the threads go compute-bound for a second or two at a time as
> they parse web pages.  

You're probably better off using separate processes for the different
pages.  If I remember, you were using BeautifulSoup, which while very
cool, is pretty doggone slow for use on large volumes of pages.  I don't
know if there's much that can be done about that without going off on a
fairly messy C or C++ coding adventure.  Maybe someday someone will do
that.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Nagle  
View profile  
 More options Feb 3 2010, 10:46 pm
Newsgroups: comp.lang.python
From: John Nagle <na...@animats.com>
Date: Wed, 03 Feb 2010 19:46:43 -0800
Local: Wed, Feb 3 2010 10:46 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

Paul Rubin wrote:
> John Nagle <na...@animats.com> writes:
>> Analysis of each domain is
>> performed in a separate process, but each process uses multiple
>> threads to read process several web pages simultaneously.

>>    Some of the threads go compute-bound for a second or two at a time as
>> they parse web pages.  

> You're probably better off using separate processes for the different
> pages.  If I remember, you were using BeautifulSoup, which while very
> cool, is pretty doggone slow for use on large volumes of pages.  I don't
> know if there's much that can be done about that without going off on a
> fairly messy C or C++ coding adventure.  Maybe someday someone will do
> that.

    I already use separate processes for different domains.  I could
live with Python's GIL as long as moving to a multicore server
doesn't make performance worse.  That's why I asked about CPU dedication
for each process, to avoid thrashing at the GIL.

    There's enough intercommunication between the threads working on
a single site that it's a pain to do them as subprocesses. And I
definitely don't want to launch subprocesses for each page; the
Python load time would be worse than the actual work.  The
subprocess module assumes you're willing to launch a subprocess
for each transaction.

    The current program organization is that there's a scheduler
process which gets requests, prioritizes them, and runs the requested
domains through the site evaluation mill.  The scheduler maintains a
pool of worker processes which get work request via their input pipe, in Pickle
format, and return results, again in Pickle format.  When not in
use, the worker processes sit there dormant, so there's no Python
launch cost for each transaction.  If a worker process crashes, the
scheduler replaces it with a fresh one, and every few hundred uses,
each worker process is replaced with a fresh copy, in case Python
has a memory leak.   It's a lot like the way
FCGI works.

    Scheduling is managed using an in-memory
table in MySQL, so the load can be spread over a cluster if desired,
with a scheduler process on each machine.

    So I already have a scalable architecture.  The only problem
is excess overhead on multicore CPUs.

                                John Nagle


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Steve Holden  
View profile  
 More options Feb 3 2010, 10:50 pm
Newsgroups: comp.lang.python
From: Steve Holden <st...@holdenweb.com>
Date: Wed, 03 Feb 2010 22:50:19 -0500
Local: Wed, Feb 3 2010 10:50 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

I believe it's already been said that the GIL thrashing is mostly MacOS
specific. You might also find something in the affinity module

  http://pypi.python.org/pypi/affinity/0.1.0

to ensure that each process in your pool runs on only one processor.

regards
 Steve
--
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Nagle  
View profile  
 More options Feb 4 2010, 1:11 am
Newsgroups: comp.lang.python
From: John Nagle <na...@animats.com>
Date: Wed, 03 Feb 2010 22:11:56 -0800
Local: Thurs, Feb 4 2010 1:11 am
Subject: Re: Overcoming python performance penalty for multicore CPU

    No, the original analysis was MacOS oriented, but the same mechanism
applies for fighting over the GIL on all platforms.  There was some
pontification that it might be a MacOS-only issue, but no facts
were presented.  It might be cheaper on C implementations with mutexes
that don't make system calls for the non-blocking cases.

                                        John Nagle


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Paul Rubin  
View profile  
 More options Feb 4 2010, 1:57 am
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Wed, 03 Feb 2010 22:57:40 -0800
Local: Thurs, Feb 4 2010 1:57 am
Subject: Re: Overcoming python performance penalty for multicore CPU

John Nagle <na...@animats.com> writes:
>    There's enough intercommunication between the threads working on
> a single site that it's a pain to do them as subprocesses. And I
> definitely don't want to launch subprocesses for each page; the
> Python load time would be worse than the actual work.  The
> subprocess module assumes you're willing to launch a subprocess
> for each transaction.

Why not just use socketserver and have something like a fastcgi?

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Anh Hai Trinh  
View profile  
 More options Feb 4 2010, 6:13 am
Newsgroups: comp.lang.python
From: Anh Hai Trinh <anh.hai.tr...@gmail.com>
Date: Thu, 4 Feb 2010 03:13:58 -0800 (PST)
Local: Thurs, Feb 4 2010 6:13 am
Subject: Re: Overcoming python performance penalty for multicore CPU
On Feb 4, 10:46 am, John Nagle <na...@animats.com> wrote:

>     There's enough intercommunication between the threads working on
> a single site that it's a pain to do them as subprocesses. And I
> definitely don't want to launch subprocesses for each page; the
> Python load time would be worse than the actual work.  The
> subprocess module assumes you're willing to launch a subprocess
> for each transaction.

You could perhaps use a process pool inside each domain worker to work
on the pages?  There is multiprocessing.Pool and other
implementations.

For examples, in this library, you can s/ThreadPool/ProcessPool/g and
this example would work: <http://www.onideas.ws/stream.py/#retrieving-
web-pages-concurrently>.

If you want to DIY, with multiprocessing.Lock/Pipe/Queue, I don't
understand why it would be more of a pain to write your threads as
processes.

// aht
http://blog.onideas.ws


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Stefan Behnel  
View profile  
 More options Feb 8 2010, 3:59 am
Newsgroups: comp.lang.python
From: Stefan Behnel <stefan...@behnel.de>
Date: Mon, 08 Feb 2010 09:59:42 +0100
Local: Mon, Feb 8 2010 3:59 am
Subject: Re: Overcoming python performance penalty for multicore CPU
Paul Rubin, 04.02.2010 02:51:

> John Nagle writes:
>> Analysis of each domain is
>> performed in a separate process, but each process uses multiple
>> threads to read process several web pages simultaneously.

>>    Some of the threads go compute-bound for a second or two at a time as
>> they parse web pages.  

> You're probably better off using separate processes for the different
> pages.  If I remember, you were using BeautifulSoup, which while very
> cool, is pretty doggone slow for use on large volumes of pages.  I don't
> know if there's much that can be done about that without going off on a
> fairly messy C or C++ coding adventure.  Maybe someday someone will do
> that.

Well, if multi-core performance is so important here, then there's a pretty
simple thing the OP can do: switch to lxml.

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Stefan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Paul Rubin  
View profile  
 More options Feb 8 2010, 4:10 am
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Mon, 08 Feb 2010 01:10:07 -0800
Local: Mon, Feb 8 2010 4:10 am
Subject: Re: Overcoming python performance penalty for multicore CPU

Stefan Behnel <stefan...@behnel.de> writes:
> Well, if multi-core performance is so important here, then there's a pretty
> simple thing the OP can do: switch to lxml.

> http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Well, lxml is uses libxml2, a fast XML parser written in C, but AFAIK it
only works on well-formed XML.  The point of Beautiful Soup is that it
works on all kinds of garbage hand-written legacy HTML with mismatched
tags and other sorts of errors.  Beautiful Soup is slower because it's
full of special cases and hacks for that reason, and it is written in
Python.  Writing something that complex in C to handle so much
potentially malicious input would be quite a lot of work to write at
all, and very difficult to ensure was really safe.  Look at the many
browser vulnerabilities we've seen over the years due to that sort of
problem, for example.  But, for web crawling, you really do need to
handle the messy and wrong HTML properly.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Antoine Pitrou  
View profile  
 More options Feb 8 2010, 9:28 am
Newsgroups: comp.lang.python
From: Antoine Pitrou <solip...@pitrou.net>
Date: Mon, 8 Feb 2010 14:28:41 +0000 (UTC)
Local: Mon, Feb 8 2010 9:28 am
Subject: Re: Overcoming python performance penalty for multicore CPU
Le Tue, 02 Feb 2010 15:02:49 -0800, John Nagle a écrit :

> I know there's a performance penalty for running Python on a multicore
> CPU, but how bad is it?  I've read the key paper
> ("www.dabeaz.com/python/GIL.pdf"), of course.  It would be adequate if
> the GIL just limited Python to running on one CPU at a time, but it's
> worse than that; there's excessive overhead due to a lame locking
> implementation.  Running CPU-bound multithreaded code on a dual-core CPU
> runs HALF AS FAST as on a single-core CPU, according to Beasley.

That's on certain types of workloads, and perhaps on certain OSes, so you
should try benching your own workload to see whether it applies.

Two closing remarks:
- this should (hopefully) be fixed in 3.2, as exarkun noticed
- instead of spawning one thread per Web page, you could use Twisted or
another event loop mechanism in order to process pages serially, in the
order of arrival

Regards

Antoine.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
J Kenneth King  
View profile  
 More options Feb 8 2010, 11:21 am
Newsgroups: comp.lang.python
From: J Kenneth King <ja...@agentultra.com>
Date: Mon, 08 Feb 2010 11:21:07 -0500
Local: Mon, Feb 8 2010 11:21 am
Subject: Re: Overcoming python performance penalty for multicore CPU

If the difference is great enough, you might get a benefit from
analyzing all pages with lxml and throwing invalid pages into a bucket
for later processing with BeautifulSoup.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Krukoff  
View profile  
 More options Feb 8 2010, 11:43 am
Newsgroups: comp.lang.python
From: John Krukoff <jkruk...@ltgc.com>
Date: Mon, 08 Feb 2010 09:43:10 -0700
Local: Mon, Feb 8 2010 11:43 am
Subject: Re: Overcoming python performance penalty for multicore CPU

Actually, lxml has an HTML parser which does pretty well with the
standard level of broken one finds most often on the web. And, when it
falls down, it's easy to integrate BeautifulSoup as a slow backup for
when things go really wrong (as J Kenneth King mentioned earlier):

http://codespeak.net/lxml/lxmlhtml.html#parsing-html

At least in my experience, I haven't actually had to parse anything that
lxml couldn't handle yet, however.
--
John Krukoff <jkruk...@ltgc.com>
Land Title Guarantee Company


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin v. Loewis  
View profile  
 More options Feb 21 2010, 4:22 pm
Newsgroups: comp.lang.python
From: "Martin v. Loewis" <mar...@v.loewis.de>
Date: Sun, 21 Feb 2010 22:22:28 +0100
Local: Sun, Feb 21 2010 4:22 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

John Nagle wrote:
>    I know there's a performance penalty for running Python on a
> multicore CPU, but how bad is it?  I've read the key paper
> ("www.dabeaz.com/python/GIL.pdf"), of course.  It would be adequate
> if the GIL just limited Python to running on one CPU at a time,
> but it's worse than that; there's excessive overhead due to
> a lame locking implementation.  Running CPU-bound multithreaded
> code on a dual-core CPU runs HALF AS FAST as on a single-core
> CPU, according to Beasley.

I couldn't reproduce these results on Linux. Not sure what "HALF AS
FAST" is; I suppose it means "it runs TWICE AS LONG" - this is what I
couldn't reproduce.

If I run Beazley's program on Linux 2.6.26, on a 4 processor Xeon (3GHz)
machine, I get 30s for the sequential execution, 40s for the
multi-threaded case, and 32s for the multi-threaded case when pinning
the Python process to a single CPU (using taskset(1)).

So it's 6% overhead for threading, and 25% penalty for multicore CPUs -
far from the 100% you seem to expect.

Regards,
Martin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Kelly  
View profile  
 More options Feb 21 2010, 4:45 pm
Newsgroups: comp.lang.python
From: Ryan Kelly <r...@rfk.id.au>
Date: Mon, 22 Feb 2010 08:45:50 +1100
Local: Sun, Feb 21 2010 4:45 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

It's far from scientific, but I've seen behaviour that's close to a 100%
performance penalty on a dual-core linux system:

   http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2

Short story:  a particular test suite of mine used to run in around 25
seconds, but a bit of ctypes magic to set thread affinity dropped the
running time to under 13 seconds.

  Cheers,

     Ryan

--
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

  signature.asc
< 1K Download

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin v. Loewis  
View profile  
 More options Feb 21 2010, 5:05 pm
Newsgroups: comp.lang.python
From: "Martin v. Loewis" <mar...@v.loewis.de>
Date: Sun, 21 Feb 2010 23:05:44 +0100
Local: Sun, Feb 21 2010 5:05 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

> It's far from scientific, but I've seen behaviour that's close to a 100%
> performance penalty on a dual-core linux system:

>    http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2

> Short story:  a particular test suite of mine used to run in around 25
> seconds, but a bit of ctypes magic to set thread affinity dropped the
> running time to under 13 seconds.

Indeed, it's not scientific - but with a few more details, you could
improve it quite a lot: what specific Linux distribution (the posting
doesn't even say it's Linux), what specific Python version had you been
using? (less important) what CPUs? If you can: what specific test suite?

A lot of science is about repeatability. Making a systematic study is
(IMO) over-valued - anecdotal reports are useful, too, as long as they
allow for repeatable experiments.

Regards,
Martin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin v. Loewis  
View profile  
 More options Feb 21 2010, 5:05 pm
Newsgroups: comp.lang.python
From: "Martin v. Loewis" <mar...@v.loewis.de>
Date: Sun, 21 Feb 2010 23:05:44 +0100
Local: Sun, Feb 21 2010 5:05 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

> It's far from scientific, but I've seen behaviour that's close to a 100%
> performance penalty on a dual-core linux system:

>    http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2

> Short story:  a particular test suite of mine used to run in around 25
> seconds, but a bit of ctypes magic to set thread affinity dropped the
> running time to under 13 seconds.

Indeed, it's not scientific - but with a few more details, you could
improve it quite a lot: what specific Linux distribution (the posting
doesn't even say it's Linux), what specific Python version had you been
using? (less important) what CPUs? If you can: what specific test suite?

A lot of science is about repeatability. Making a systematic study is
(IMO) over-valued - anecdotal reports are useful, too, as long as they
allow for repeatable experiments.

Regards,
Martin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Kelly  
View profile  
 More options Feb 21 2010, 5:39 pm
Newsgroups: comp.lang.python
From: Ryan Kelly <r...@rfk.id.au>
Date: Mon, 22 Feb 2010 09:39:27 +1100
Local: Sun, Feb 21 2010 5:39 pm
Subject: Re: Overcoming python performance penalty for multicore CPU

On Sun, 2010-02-21 at 23:05 +0100, Martin v. Loewis wrote:
> > It's far from scientific, but I've seen behaviour that's close to a 100%
> > performance penalty on a dual-core linux system:

> >    http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2

> > Short story:  a particular test suite of mine used to run in around 25
> > seconds, but a bit of ctypes magic to set thread affinity dropped the
> > running time to under 13 seconds.

> Indeed, it's not scientific - but with a few more details, you could
> improve it quite a lot: what specific Linux distribution (the posting
> doesn't even say it's Linux), what specific Python version had you been
> using? (less important) what CPUs? If you can: what specific test suite?

I'm on Ubuntu Karmic, Python 2.6.4, an AMD Athlon 7750 dual core.

Unfortunately the test suite is for a proprietary application.  I've
been able to reproduce similar behaviour with an open-source test suite,
using the current trunk of the "pyfilesystem" project:

  http://code.google.com/p/pyfilesystem/

In this project "OSFS" is an object-oriented interface to the local
filesystem.  The test case "TestOSFS.test_cases_in_separate_dirs" runs
three theads, each doing a bunch of IO in a different directory.

Running the tests normally:

    rfk@durian:/storage/software/fs$ nosetests fs/tests/test_fs.py:TestOSFS.test_cases_in_separate_dirs
    .
    ----------------------------------------------------------------------
    Ran 1 test in 9.787s

That's the best result from five runs - I saw it go as high as 12
seconds.  Watching it in top, I see CPU usage at around 150%.

Now using threading2 to set the process cpu affinity at the start of the
test run:

    rfk@durian:/storage/software/fs$ nosetests fs/tests/test_fs.py:TestOSFS.test_cases_in_separate_dirs
    .
    ----------------------------------------------------------------------
    Ran 1 test in 3.792s

Again, best of five.  The variability in times here is much lower - I
never saw it go above 4 seconds.  CPU usage is consistently 100%.

  Cheers,

     Ryan

--
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

  signature.asc
< 1K Download

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »