Do I have to use threads?
Philip Semanchuk  
Newsgroups: comp.lang.python
From: Philip Semanchuk <phi...@semanchuk.com>
Date: Tue, 5 Jan 2010 23:36:19 -0500
Local: Tues, Jan 5 2010 11:36 pm
Subject: Re: Do I have to use threads?

On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:

> Hello people,

> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one. My
> question is: for doing it simultaneously, do I have to use threads?

No. You could spawn 5 copies of wget (or curl or a Python program that  
you've written). Whether or not that will perform better or be easier  
to code, debug and maintain depends on the other aspects of your  
program(s).

bye
Philip
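A minimal sketch of the "spawn 5 copies of wget" approach using the standard `subprocess` module. It starts one `wget` process per URL/directory pair, lets them all run at once, and then waits for every one of them to finish. The helper names are hypothetical. The wget options are one common recipe for grabbing the images linked from a page: `-r -l 1` follows links one level deep, `-nd` flattens the remote tree, `-A` accepts only image suffixes, and `-P` sets the target directory. Adjust them to your case.

```python
import subprocess


def wget_command(url, directory):
    """Build a wget command line that saves the images linked from `url` into `directory`."""
    return [
        "wget",
        "-q",                        # quiet: no progress output
        "-r", "-l", "1",             # follow links from the page, one level deep
        "-nd",                       # don't recreate the remote directory tree locally
        "-A", "jpg,jpeg,png,gif",    # keep only files with these suffixes
        "-P", directory,             # save everything under this directory
        url,
    ]


def run_concurrently(commands):
    """Start every command at once, then wait for all of them; return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all processes now run in parallel
    return [p.wait() for p in procs]  # block until each one has finished
```

Usage, assuming a hypothetical `jobs` dict that maps each of your five URLs to its directory: `run_concurrently([wget_command(url, d) for url, d in jobs.items()])`. A non-zero exit code means that particular wget failed.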


 
Gary Herron  
Newsgroups: comp.lang.python
From: Gary Herron <gher...@islandtraining.com>
Date: Tue, 05 Jan 2010 23:29:20 -0800
Local: Wed, Jan 6 2010 2:29 am
Subject: Re: Do I have to use threads?

aditya shukla wrote:
> Hello people,

> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one. My
> question is: for doing it simultaneously, do I have to use threads?

> Please point me in the right direction.

> Thanks

> Aditya

You've been given some bad advice here.

First -- threads are lighter-weight than processes, so threads are
probably *more* efficient. However, with only five threads/processes,
the difference is probably not noticeable. (If the prejudice against
threads comes from concerns over the GIL -- that also is a misplaced
concern in this instance. Since you only have one network connection, you
will receive only one packet at a time, so only one thread will be
active at a time. If the extraction process uses a significant enough
amount of CPU time so that the extractions are all running at the same
time *AND* if you are running on a machine with separate CPUs/cores *AND*
you would like the extractions to be running truly in parallel on those
separate cores, *THEN*, and only then, will processes be more efficient
than threads.)

Second, running 5 wgets is equivalent to 5 processes, not 5 threads.

And third -- you don't have to use either threads *or* processes. There
is another possibility which is much more lightweight: asynchronous
I/O, available through the low-level select module, or more usefully
via the higher-level asyncore module. (Although the learning curve
might trip you up, and some people find the programming model for
asyncore hard to fathom, I find it more intuitive in this case than
threads/processes.)

In fact, the asyncore manual page has a ~20-line class which implements
a web page retrieval. You could replace that example's single call to
http_client with five calls, one for each of your URLs. Then when you
enter the last line (that is, the asyncore.loop() call) the five
downloads will run simultaneously.

See http://docs.python.org/library/asyncore.html

Gary Herron
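The asyncore module Gary mentions was deprecated in Python 3.6 and removed in Python 3.12 (PEP 594). On a current Python the same single-threaded, event-driven idea is usually written with asyncio instead. Below is a minimal sketch under stated assumptions: plain `http://` URLs only, a bare HTTP/1.0 GET, and no redirects, TLS, or error handling. The function names are illustrative rather than part of any library. Each `fetch` coroutine opens its own connection, and `asyncio.gather` runs all of them on one event loop, so the downloads overlap without threads or extra processes.

```python
import asyncio
from urllib.parse import urlsplit


async def fetch(url):
    """Download `url` with a bare HTTP/1.0 GET and return the response body as bytes."""
    parts = urlsplit(url)
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    reader, writer = await asyncio.open_connection(parts.hostname, port)
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"
    writer.write(request.encode("ascii"))
    await writer.drain()
    response = await reader.read()  # HTTP/1.0: the server closes the socket when done
    writer.close()
    await writer.wait_closed()
    _headers, _, body = response.partition(b"\r\n\r\n")  # drop status line and headers
    return body


async def fetch_all(urls):
    """Run one fetch() per URL concurrently; results come back in the order of `urls`."""
    return await asyncio.gather(*(fetch(u) for u in urls))
```

For example, `bodies = asyncio.run(fetch_all(urls))`. Writing each body into its directory is left out.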


 
Philip Semanchuk  
Newsgroups: comp.lang.python
From: Philip Semanchuk <phi...@semanchuk.com>
Date: Wed, 6 Jan 2010 08:24:54 -0500
Local: Wed, Jan 6 2010 8:24 am
Subject: Re: Do I have to use threads?

On Jan 6, 2010, at 12:45 AM, Brian J Mingus wrote:

???

Process != thread


 
exar...@twistedmatrix.com  
Newsgroups: comp.lang.python
From: exar...@twistedmatrix.com
Date: Wed, 06 Jan 2010 14:11:34 -0000
Local: Wed, Jan 6 2010 9:11 am
Subject: Re: Do I have to use threads?
On 04:26 am, adityashukla1...@gmail.com wrote:

>Hello people,

>I have 5 directories corresponding to 5 different URLs. I want to
>download images from those URLs and place them in the respective
>directories. I have to extract the contents and download them
>simultaneously. I can extract the contents and do them one by one. My
>question is: for doing it simultaneously, do I have to use threads?

>Please point me in the right direction.

See Twisted,

  http://twistedmatrix.com/

in particular, Twisted Web's asynchronous HTTP client,

  http://twistedmatrix.com/documents/current/web/howto/client.html
  http://twistedmatrix.com/documents/current/api/twisted.web.client.html

Jean-Paul


 
Marco Salden  
Newsgroups: comp.lang.python
From: Marco Salden <marco.sal...@gmail.com>
Date: Thu, 7 Jan 2010 00:00:05 -0800 (PST)
Local: Thurs, Jan 7 2010 3:00 am
Subject: Re: Do I have to use threads?
On Jan 6, 5:36 am, Philip Semanchuk <phi...@semanchuk.com> wrote:

Yep, the easier and more straightforward the approach, the better:
threads are by nature prone to programmer error.
But my question would be: does it REALLY need to be simultaneous?
The CPU/OS only has more overhead doing this in parallel with
processes. Measuring sequential processing and then trying to
optimize (e.g. for user response or whatever) would be my preferred way
to go. Less = more.

regards,
Marco
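Marco's "measure first" advice needs nothing more than `time.perf_counter`. The sketch below rests on an assumption: `fake_download` is a hypothetical stand-in that only sleeps, to imitate network latency. It times the plain sequential loop and then the same work on a small thread pool (the standard library's `concurrent.futures`, swapped in here for illustration). Keep the concurrent version only if the measured difference matters for your case.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, latency=0.2):
    """Hypothetical stand-in for a real download: it just waits `latency` seconds."""
    time.sleep(latency)
    return url


def timed(func, *args):
    """Return (result, elapsed seconds) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def sequential(urls):
    """Baseline: one download after another."""
    return [fake_download(u) for u in urls]


def threaded(urls, workers=5):
    """Same work on a small thread pool; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, urls))
```

With five simulated 0.2-second downloads, `timed(sequential, urls)` needs at least a second, while the threaded run overlaps the waits. If your real downloads are dominated by bandwidth rather than latency, the gap may be much smaller, which is exactly what measuring tells you.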


 
Jorgen Grahn  
Newsgroups: comp.lang.python
From: Jorgen Grahn <grahn+n...@snipabacken.se>
Date: 7 Jan 2010 16:32:58 GMT
Local: Thurs, Jan 7 2010 11:32 am
Subject: Re: Do I have to use threads?

Normally when you do HTTP in parallel over several TCP sockets, it
has nothing to do with CPU overhead. You just don't want every GET to
be delayed just because the server(s) are lazy responding to the first
few ones; or you might want to read the text of a web page and the CSS
before a few huge pictures have been downloaded.

His "I have to [do them] simultaneously" makes me want to ask "Why?".

If he's expecting *many* pictures, I doubt that the parallel download
will buy him much.  Reusing the same TCP socket for all of them is
more likely to help, especially if the pictures aren't tiny. One
long-lived TCP connection is much more efficient than dozens of
short-lived ones.

Personally, I'd popen() wget and let it do the job for me.

/Jorgen

--
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
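Jorgen's suggestion of reusing one long-lived TCP connection can be tried with the standard library's `http.client`. An `HTTPConnection` sends successive requests over the same socket as long as two conditions hold: the server keeps the HTTP/1.1 connection alive, and each response is read completely before the next request goes out. If the server closes the connection instead, `HTTPConnection` opens a new one on the next request. A minimal sketch in which the function name is hypothetical and error handling is omitted:

```python
import http.client


def fetch_many(host, paths, port=80):
    """Fetch several paths from one host over a single persistent HTTP/1.1 connection."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # Read the whole body before reusing the connection for the next request.
            bodies.append(response.read())
    finally:
        conn.close()
    return bodies
```

For example, `fetch_many("example.com", ["/a.jpg", "/b.jpg"])`, where the host and paths are placeholders.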


 
MRAB  
Newsgroups: comp.lang.python
From: MRAB <pyt...@mrabarnett.plus.com>
Date: Thu, 07 Jan 2010 17:38:46 +0000
Local: Thurs, Jan 7 2010 12:38 pm
Subject: Re: Do I have to use threads?

From my own experience:

I wanted to download a number of webpages.

I noticed that there was a significant delay before the server would reply, and
an especially long delay for one of them, so I used a number of threads,
each one reading a URL from a queue, performing the download, and then
reading the next URL, until there were none left (actually, until it
read the sentinel None, which it put back for the other threads).

The result?

Shorter total download time because it could be downloading one webpage
while waiting for another to reply.

(Of course, I had to make sure that I didn't have too many threads,
because that might've put too many demands on the website, not a nice
thing to do!)
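MRAB's scheme can be sketched with the standard `queue` and `threading` modules. Worker threads pull URLs from a shared queue until they read the `None` sentinel, and each worker puts the sentinel back so the others see it too. In this sketch, `download` is a hypothetical callable you supply, for example a function wrapping `urllib.request.urlopen`. `num_threads` caps how many requests are in flight at once, which covers MRAB's point about not overloading the website. Exceptions raised by `download` are not handled here.

```python
import queue
import threading


def download_all(urls, download, num_threads=4):
    """Download every URL using a fixed number of worker threads; return {url: result}."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:
                tasks.put(None)  # put the sentinel back for the remaining workers
                return
            result = download(url)
            with lock:  # guard the shared results dict
                results[url] = result

    for url in urls:
        tasks.put(url)
    tasks.put(None)  # a single sentinel marks the end of the work

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has seen the sentinel
    return results
```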


 
Philip Semanchuk  
Newsgroups: comp.lang.python
From: Philip Semanchuk <phi...@semanchuk.com>
Date: Thu, 7 Jan 2010 12:53:39 -0500
Local: Thurs, Jan 7 2010 12:53 pm
Subject: Re: Do I have to use threads?

On Jan 7, 2010, at 11:32 AM, Jorgen Grahn wrote:

Exactly what I was thinking. He's surely doing something more  
complicated than his post suggests, and without that detail it's  
impossible to say whether threads, processes, async or voodoo is the
best approach.

bye
P


 
Jorgen Grahn  
Newsgroups: comp.lang.python
From: Jorgen Grahn <grahn+n...@snipabacken.se>
Date: 8 Jan 2010 14:21:38 GMT
Local: Fri, Jan 8 2010 9:21 am
Subject: Re: Do I have to use threads?

I wonder what that "extraction" would be, by the way.  Unless you ask
for compression of the HTTP data, the images come as-is on the TCP
stream.

> so that the extractions are all running at the same
> time *AND* if you are running on a machine with separate CPU/cores *AND*
> you would like the extractions to be running truly in parallel on those
> separate cores,  *THEN*, and only then, will processes be more efficient
> than threads.)

I can't remember what the bad advice was, but here processes versus
threads clearly doesn't matter performance-wise.  I generally
recommend processes, because how they work is well-known and they're
not as vulnerable to weird synchronization bugs as threads.

> Second, running 5 wgets is equivalent to 5 processes not 5 threads.

> And third -- you don't have to use either threads *or* processes.  There
> is another possibility which is much more light-weight:  asynchronous
> I/O,  available through the low level select module, or more usefully
> via the higher-level asyncore module.

Yeah, that would be my first choice too for a problem which isn't
clearly CPU-bound.  Or my second choice -- the first would be calling
on a utility like wget(1).

/Jorgen

r0g  
Newsgroups: comp.lang.python
From: r0g <aioe....@technicalbloke.com>
Date: Fri, 08 Jan 2010 14:31:21 +0000
Local: Fri, Jan 8 2010 9:31 am
Subject: Re: Do I have to use threads?

Threads aren't as hard as some people make out, although it does depend on
the problem. If your processes are effectively independent then threads
are probably the right solution. You can turn any function into a thread
quite easily; I posted a function for this a while back...

http://groups.google.com/group/comp.lang.python/msg/3361a897db3834b4?...

Also, it's often a good idea to build in a flag that switches your app
from multithreaded to single-threaded, as it's easier to debug the latter.

Roger.
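Roger's debugging tip is easy to build in. The sketch below is not his posted function; `run_jobs` and its `threaded` flag are hypothetical names. When `threaded=True` it starts one thread per item, and otherwise it runs a plain loop in the calling thread. Results come back in the same order either way, so the rest of the program doesn't care which mode ran.

```python
import threading


def run_jobs(func, items, threaded=True):
    """Call func(item) for every item and return the results in input order.

    With threaded=False everything runs in the calling thread, which makes
    tracebacks and debugger sessions much easier to follow.
    """
    items = list(items)
    results = [None] * len(items)

    def call(index, item):
        results[index] = func(item)  # each job writes only its own slot

    if not threaded:
        for i, item in enumerate(items):
            call(i, item)
        return results

    threads = [threading.Thread(target=call, args=(i, item)) for i, item in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```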


 
Tom  
Newsgroups: comp.lang.python
From: Tom <sharpbla...@gmail.com>
Date: Wed, 13 Jan 2010 09:09:59 -0800 (PST)
Local: Wed, Jan 13 2010 12:09 pm
Subject: Re: Do I have to use threads?
On Jan 7, 5:38 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:

A fair few of my scripts require multiple uploads and downloads, and I
always use threads to do so. I was using an API which was quite badly
designed, and I got a list of UserIds from one API call, then had to
query another API method to get info on each of the UserIds I got
from the first API. I could have used twisted, but in the end I just
made a simple thread pool (30 threads and an in/out Queue). The
result? A *massive* speedup, even with the extra complications of
waiting until all the threads are done then grouping the results
together from the output Queue.

Since then I always use native threads.

Tom
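Tom built his pool by hand (30 threads plus input and output Queues). Since Python 3.2 the standard library's `concurrent.futures.ThreadPoolExecutor` packages the same pattern, including the "wait until all the threads are done, then group the results" step. The sketch below swaps in that executor for illustration. `get_user_info` is a hypothetical stand-in for the second, per-ID API call. Per MRAB's caution, keep `max_workers` modest if the API is someone else's server.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def lookup_all(user_ids, get_user_info, max_workers=30):
    """Call get_user_info(uid) for every ID on a pool of threads; return {uid: info}.

    The executor's internal work queue plays the role of the input Queue,
    and the futures play the role of the output Queue.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(get_user_info, uid): uid for uid in user_ids}
        for future in as_completed(futures):
            # result() re-raises any exception that happened in the worker thread
            results[futures[future]] = future.result()
    return results
```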


 