
Speeding up network access: threading?


Jens Müller

Jan 4, 2010, 11:22:42 AM
Hello,

What would be best practice for speeding up a large number of HTTP GET
requests made via urllib? Until now they have been made in sequence, each
request taking up to one second. The results must be merged into a list, but
the original order need not be preserved.

I think speed could be improved by parallelizing, for example with multiple
threads.
Are there any Python best practices, or even existing modules, for creating
and handling a task queue with a fixed number of concurrent threads?

Thanks and regards!

Terry Reedy

Jan 4, 2010, 12:17:46 PM
to pytho...@python.org

I believe code of this type has been published here in various threads.
The fairly obvious thing to do is use a queue.Queue for tasks and
another for results, plus a pool of threads that read, fetch, and write.
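
A minimal sketch of the arrangement described above, assuming Python 3 (where the module is named queue). The names fetch_all, fetch, and num_workers are illustrative and not part of any library; fetch stands for whatever callable performs the actual HTTP GET.

```python
import queue
import threading


def fetch_all(urls, fetch, num_workers=4):
    """Call fetch(url) for every URL using a fixed number of worker threads.

    Returns a list of (url, result) pairs in completion order, not input order.
    """
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                break
            try:
                results.put((url, fetch(url)))
            except Exception as exc:
                # Report the failure as the result instead of killing the thread.
                results.put((url, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have finished, so the results queue can be drained safely.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

With urllib in Python 3, fetch could be something like `lambda url: urllib.request.urlopen(url).read()`. The results come back in completion order, which fits the requirement that the original sequence need not be kept.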


exa...@twistedmatrix.com

Jan 4, 2010, 11:52:28 AM

Using multiple threads is one approach. There are a few thread pool
implementations lying about; one is part of Twisted,
<http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.

Another approach is to use non-blocking or asynchronous I/O to make
multiple requests without using multiple threads. Twisted can help you
out with this, too. There are two async HTTP client APIs available. The
older one:

http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html

And the newer one, introduced in 9.0:

http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html

Jean-Paul

Jens Müller

Jan 5, 2010, 11:15:45 AM
Hello,

> The fairly obvious thing to do is use a queue.Queue for tasks and another
> for results, plus a pool of threads that read, fetch, and write.

Thanks, indeed.

Is a list thread-safe, or do I need to lock when adding the results of my
worker threads to a list? The order of the elements in the list does not
matter.

Jens

MRAB

Jan 5, 2010, 12:02:29 PM
to pytho...@python.org
Terry said "queue", not "list". Use the Queue class (it's thread-safe)
in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
it's called the "queue" module).
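
As a small illustration of the module naming difference, here is one common way to write an import that works on both Python 2.x and 3.x. The variable name results is only an example.

```python
try:
    import queue            # Python 3.x module name
except ImportError:
    import Queue as queue   # Python 2.x module name

results = queue.Queue()     # thread-safe FIFO queue
results.put("page body")
print(results.get())        # blocks until an item is available
```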

Antoine Pitrou

Jan 5, 2010, 12:43:06 PM
to pytho...@python.org
On Tue, 05 Jan 2010 15:04:56 +0100, Jens Müller wrote:
>
> Is a list thread-safe, or do I need to lock when adding the results of my
> worker threads to a list? The order of the elements in the list does not
> matter.

The built-in list type is thread-safe, but it doesn't provide the waiting
features that queue.Queue provides.
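
A small sketch of what this means in practice, assuming CPython, where a single list.append call is atomic. Several threads append to one shared list without an explicit lock. Because a list has no blocking get(), the main thread has to join the workers before it reads the results. The names worker and results are illustrative.

```python
import threading

results = []  # shared by all worker threads


def worker(n):
    for i in range(1000):
        results.append((n, i))  # no explicit lock around append


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all workers; the list itself offers no way to wait

print(len(results))  # 8 threads * 1000 appends each
```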

Regards

Antoine.

Jens Müller

Jan 5, 2010, 1:45:50 PM
Hi and sorry for double posting - had mailer problems,

> Terry said "queue", not "list". Use the Queue class (it's thread-safe)
> in the "Queue" module (assuming you're using Python 2.x; in Python 3.x
> it's called the "queue" module).

Yes yes, I know. I use a queue to implement the thread pool's task queue, and
that works all right.

But each worker thread calculates a result and needs to make it available
to the application in the main thread again. Therefore, it appends its
result to a common list. This seems to work as well, but I was thinking of
possible conflicts that could occur when two threads append their results
to that same result list at the same moment.

Regards,
Jens

Steve Holden

Jan 5, 2010, 2:11:05 PM
to pytho...@python.org, pytho...@python.org
If you don't need to take anything off the list ever, just create a
separate thread that reads items from an output Queue and appends them
to the list.

If you *do* take them off, then use a Queue.
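
A minimal sketch of the first suggestion, assuming Python 3: workers put their results on an output Queue, and a single collector thread is the only code that touches the list. The DONE sentinel and the function names are illustrative, not part of the queue module.

```python
import queue
import threading

output = queue.Queue()
results = []
DONE = object()  # sentinel telling the collector to stop


def collector():
    # The only thread that appends to the list.
    while True:
        item = output.get()  # blocks until a worker delivers something
        if item is DONE:
            break
        results.append(item)


def worker(n):
    output.put(n * n)  # stand-in for a fetched result


collector_thread = threading.Thread(target=collector)
collector_thread.start()

workers = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()

output.put(DONE)         # all workers are done, so nothing follows the sentinel
collector_thread.join()
print(sorted(results))
```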

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
