
Please help with Threading


Jurgens de Bruin

May 18, 2013, 4:58:13 AM
This is my first script where I want to use the Python threading module. I have a large dataset, a list of dicts, which can contain as many as 200 dictionaries. The final goal is a histogram for each dict, with 16 histograms on a page (4x4); this already works.
What I currently do is create a nested list [ [ {} ], [ {} ] ] in which each inner list contains 16 dictionaries, so each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long, so I would like multiple inner lists to be processed simultaneously, creating the graphs in "parallel".
I am trying to use Python threading for this. I create 4 threads, loop over the outer list, and send an inner list to each thread. This seems to work if my nested list contains only 2 elements, that is, fewer elements than threads. Currently the script runs and then seems to hang. I monitor the resources on my Mac: Python starts off well, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%.

My thread creation is based on the following: http://www.tutorialspoint.com/python/python_multithreading.htm

Any help would be great!!!

Peter Otten

May 18, 2013, 5:09:03 AM
to pytho...@python.org
Can you show us the code?

Jurgens de Bruin

May 18, 2013, 7:23:12 AM
I will post code - the entire script is 1000 lines of code - can I post the threading functions only?

Peter Otten

May 18, 2013, 8:01:04 AM
to pytho...@python.org
Jurgens de Bruin wrote:

> I will post code - the entire scripts is 1000 lines of code - can I post
> the threading functions only?

Try to condense it to the relevant parts, but make sure that it can be run
by us.

As a general note, when you add new stuff to an existing longish script it
is always a good idea to write it in such a way that you can test it
standalone so that you can have some confidence that it will work as
designed once you integrate it with your old code.

Dave Angel

May 18, 2013, 9:55:56 AM
to pytho...@python.org
CPython, and apparently (all of?) the other current Python
implementations, uses a GIL to prevent multi-threaded applications from
shooting themselves in the foot.

However the practical effect of the GIL is that CPU-bound applications
do not multi-thread efficiently; the single-threaded version usually
runs faster.

The place where CPython programs gain from multithreading is where each
thread spends much of its time waiting for some external trigger.

(More specifically, if such a wait is inside well-written C code, it
releases the GIL so other threads can get useful work done. An example is
a thread that waits for network activity and blocks inside a system call.)
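
A minimal sketch of that effect, assuming time.sleep() as a stand-in for a blocking system call (the function names are made up for illustration). Four threads that each wait 0.2 seconds should finish in roughly 0.2 seconds overall rather than 0.8, because a waiting thread does not hold the GIL:

```python
import threading
import time

def wait_for_io(seconds):
    # time.sleep() stands in for a blocking system call such as a socket
    # read; CPython releases the GIL while the thread is blocked in it.
    time.sleep(seconds)

def run_threads(count, seconds):
    """Start `count` waiting threads and return the total wall-clock time."""
    threads = [threading.Thread(target=wait_for_io, args=(seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threads(4, 0.2)
print("4 threads, 0.2 s wait each: %.2f s total" % elapsed)
```

Replacing the sleep with a pure-Python computation would be expected to remove the gain, which is the CPU-bound case described above.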


--
DaveA

Carlos Nepomuceno

May 18, 2013, 8:02:29 PM
to pytho...@python.org
----------------------------------------
> To: pytho...@python.org
> From: wlf...@ix.netcom.com
> Subject: Re: Please help with Threading
> Date: Sat, 18 May 2013 15:28:56 -0400
>
> On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
> <debr...@gmail.com> declaimed the following in
> gmane.comp.python.general:

>
>> This is my first script where I want to use the python threading module. I have a large dataset which is a list of dict this can be as much as 200 dictionaries in the list. The final goal is a histogram for each dict 16 histograms on a page ( 4x4 ) - this already works.
>> What I currently do is a create a nested list [ [ {} ], [ {} ] ] each inner list contains 16 dictionaries, thus each inner list is a single page of 16 histograms. Iterating over the outer-list and creating the graphs takes to long. So I would like multiple inner-list to be processes simultaneously and creating the graphs in "parallel".
>> I am trying to use the python threading for this. I create 4 threads loop over the outer-list and send a inner-list to the thread. This seems to work if my nested lists only contains 2 elements - thus less elements than threads. Currently the scripts runs and then seems to get hung up. I monitor the resource on my mac and python starts off good using 80% and when the 4-thread is created the CPU usages drops to 0%.
>>
>
> The odds are good that this is just going to run slower...

Just been told that GIL doesn't make things slower, but as I didn't know that such a thing even existed I went out looking for more info and found that document: http://www.dabeaz.com/python/UnderstandingGIL.pdf

Is it current? I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.

What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

> One: The common Python implementation uses a global interpreter lock
> to prevent interpreted code from interfering with itself in multiple
> threads. So "number cruncher" applications don't gain any speed from
> being partitioned into thread -- even on a multicore processor, only one
> thread can have the GIL at a time. On top of that, you have the overhead
> of the interpreter switching between threads (GIL release on one thread,
> GIL acquire for the next thread).
>
> Python threads work fine if the threads either rely on intelligent
> DLLs for number crunching (instead of doing nested Python loops to
> process a numeric array you pass it to something like NumPy which
> releases the GIL while crunching a copy of the array) or they do lots of
> I/O and have to wait for I/O devices (while one thread is waiting for
> the write/read operation to complete, another thread can do some number
> crunching).
>
> If you really need to do this type of number crunching in Python
> level code, you'll want to look into the multiprocessing library
> instead. That will create actual OS processes (each with a copy of the
> interpreter, and not sharing memory) and each of those can run on a core
> without conflicting on the GIL.

Which library do you suggest?

> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlf...@ix.netcom.com HTTP://wlfraed.home.netcom.com/
>
> --
> http://mail.python.org/mailman/listinfo/python-list

Chris Angelico

May 18, 2013, 8:38:14 PM
to pytho...@python.org
On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
<carlosne...@outlook.com> wrote:
> I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.
>
> What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

Preemption isn't really the issue here. On the C level, preemptive vs
cooperative usually means the difference between a stalled thread
locking everyone else out and not doing so. Preemption is done at a
lower level than user code (eg the operating system or the CPU),
meaning that user code can't retain control of the CPU.

With interpreted code eg in CPython, it's easy to implement preemption
in the interpreter. I don't know how it's actually done, but one easy
implementation would be "every N bytecode instructions, context
switch". It's still done at a lower level than user code (N bytecode
instructions might all actually be a single tight loop that the
programmer didn't realize was infinite), but it's not at the OS level.

But none of that has anything to do with multiple core usage. The
problem there is that shared data structures need to be accessed
simultaneously, and in CPython, there's a Global Interpreter Lock to
simplify that; but the consequence of the GIL is that no two threads
can simultaneously execute user-level code. There have been
GIL-removal proposals at various times, but the fact remains that a
global lock makes a huge amount of sense and gives pretty good
performance across the board. There's always multiprocessing when you
need multiple CPU-bound threads; it's an explicit way to separate the
shared data (what gets transferred) from local (what doesn't).
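
As a rough sketch of what that could look like for the original problem (pages of 16 dicts), assuming a hypothetical render_page() that merely counts values instead of drawing histograms, so that it runs without a plotting library. The names chunk, render_page and render_all are invented for this example:

```python
from collections import Counter
from multiprocessing import Pool

def chunk(items, size):
    """Split a list into consecutive pages of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def render_page(page):
    # Hypothetical stand-in for drawing one 4x4 page of histograms:
    # it only counts the values of each dict.
    return [sorted(Counter(d.values()).items()) for d in page]

def render_all(pages, workers=4):
    # Each page is pickled and sent to a worker process, and the result is
    # pickled back; nothing else is shared between the processes.
    with Pool(processes=workers) as pool:
        return pool.map(render_page, pages)

if __name__ == "__main__":
    data = [{"a": i % 3, "b": i % 5, "c": i % 3} for i in range(200)]
    pages = chunk(data, 16)
    results = render_all(pages)
    print(len(results), "pages rendered in worker processes")
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes start by importing the main module again.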

ChrisA

Cameron Simpson

May 18, 2013, 11:10:36 PM
to Carlos Nepomuceno, pytho...@python.org
On 19May2013 03:02, Carlos Nepomuceno <carlosne...@outlook.com> wrote:
| Just been told that GIL doesn't make things slower, but as I
| didn't know that such a thing even existed I went out looking for
| more info and found that document:
| http://www.dabeaz.com/python/UnderstandingGIL.pdf
|
| Is it current? I didn't know Python threads aren't preemptive.
| Seems to be something really old considering the state of the art
| on parallel execution on multi-cores.
| What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?

Depends what you mean by preemptive. If you have multiple CPU bound
pure Python threads they will all get CPU time without any of them
explicitly yielding control. But thread switching happens between
python instructions, mediated by the interpreter.

The standard answers for using multiple cores are to either run
multiple processes (either explicitly spawning other executables,
or spawning child python processes using the multiprocessing module),
or to use (as suggested) libraries that can do the compute intensive
bits themselves, releasing the GIL while doing so, so that the Python
interpreter can run other bits of your python code.

Plenty of OS system calls (and calls to other libraries from the
interpreter) release the GIL during the call. Other python threads
can run during that window.

And there are Python implementations other than CPython.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

Processes are like potatoes. - NCR device driver manual

Chris Angelico

May 19, 2013, 5:52:23 PM
to pytho...@python.org
On Mon, May 20, 2013 at 7:46 AM, Dennis Lee Bieber
<wlf...@ix.netcom.com> wrote:
> On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico <ros...@gmail.com>
> declaimed the following in gmane.comp.python.general:
>> With interpreted code eg in CPython, it's easy to implement preemption
>> in the interpreter. I don't know how it's actually done, but one easy
>> implementation would be "every N bytecode instructions, context
>> switch". It's still done at a lower level than user code (N bytecode
>
> Which IS how the common Python interpreter does it -- barring the
> thread making some system call that triggers a preemption ahead of time
> (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
> or 100 byte-code instructions -- as I recall, it DID change a few
> versions back.

Incidentally, is the context-switch check the same as the check for
interrupt signal raising KeyboardInterrupt? ISTR that was another
"every N instructions" check.

ChrisA
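
On the "every N bytecode instructions" point: in CPython 2.x the count is exposed through sys.getcheckinterval() and defaults to 100. Since CPython 3.2 the switch request is based on elapsed time instead, with a default of 5 milliseconds. A small sketch for inspecting it, assuming Python 3.2 or later:

```python
import sys

# Python 3.2+ requests a thread switch after a time interval (in seconds)
# rather than after a number of bytecode instructions.
interval = sys.getswitchinterval()
print("switch interval: %.3f s" % interval)

# The interval can be tuned; it is restored straight away so the sketch
# has no lasting effect on the interpreter.
sys.setswitchinterval(0.001)
shorter = sys.getswitchinterval()
sys.setswitchinterval(interval)
print("temporarily set to: %.3f s" % shorter)
```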

Dave Angel

May 19, 2013, 9:04:42 PM
to pytho...@python.org
On 05/19/2013 05:46 PM, Dennis Lee Bieber wrote:
> On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico <ros...@gmail.com>
> declaimed the following in gmane.comp.python.general:
>
>> On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
>> <carlosne...@outlook.com> wrote:
>>> I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores.
>>>
>>> What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?
>>
> <snip>
>
>> With interpreted code eg in CPython, it's easy to implement preemption
>> in the interpreter. I don't know how it's actually done, but one easy
>> implementation would be "every N bytecode instructions, context
>> switch". It's still done at a lower level than user code (N bytecode
>
> Which IS how the common Python interpreter does it -- barring the
> thread making some system call that triggers a preemption ahead of time
> (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
> or 100 byte-code instructions -- as I recall, it DID change a few
> versions back.
>
> Part of the context switch is to transfer the GIL from the preempted
> thread to the new thread.
>
> So, overall, on a SINGLE CORE processor running multiple CPU bound
> threads takes a bit longer just due to the overhead of thread swapping.
>
> On a multi-core processor, the effect is the same, since -- even
> though one may have a thread running on each core -- the GIL is only
> assigned to one thread, and other threads get blocked when trying to
> access runtime data structures. And you may have even more overhead from
> processor cache misses if a thread gets assigned to a different
> core.
>
> (yes -- I'm restating the same thing as I had just trimmed below
> this point... but the target is really the OP, where repetition may be
> helpful in understanding)
>

So what's the mapping between real (OS) threads, and the fake ones
Python uses? The OS keeps track of a separate stack and context for
each thread it knows about; are they one-to-one with the ones you're
describing here? If so, then any OS thread that gets scheduled will
almost always find it can't get the GIL, and spend time thrashing. But
the change that CPython does intentionally would be equivalent to a
sleep(0).

On the other hand, if these threads are distinct from the OS threads, is
it done with some sort of thread pool, where CPython has its own stack,
and doesn't really use the one managed by the OS?

Understand the only OS threading I really understand is the one in
Windows (which I no longer use). So assuming Linux has some form of
lightweight threading, the distinction above may not map very well.



--
DaveA
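
On the mapping question: in CPython each threading.Thread is backed by a real OS thread, one-to-one. A quick way to see this, assuming Python 3.8 or later for threading.get_native_id() (the helper name record_ids is made up here):

```python
import threading

def record_ids(barrier, ids, index):
    # get_native_id() returns the thread ID assigned by the OS kernel.
    ids[index] = threading.get_native_id()
    # Keep every thread alive until all have recorded their ID, so the OS
    # cannot hand a finished thread's ID to a later one.
    barrier.wait()

count = 4
barrier = threading.Barrier(count)
ids = [None] * count
threads = [threading.Thread(target=record_ids, args=(barrier, ids, i))
           for i in range(count)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("main thread:", threading.get_native_id())
print("workers:    ", ids)
```

Each worker reports a distinct kernel-level ID, different from the main thread's, so the OS scheduler does see them all; the GIL only decides which one may run Python bytecode at a given moment.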

Fábio Santos

May 20, 2013, 2:25:17 AM
to Dennis Lee Bieber, pytho...@python.org


On 18 May 2013 20:33, "Dennis Lee Bieber" <wlf...@ix.netcom.com> wrote:
>         Python threads work fine if the threads either rely on intelligent
> DLLs for number crunching (instead of doing nested Python loops to
> process a numeric array you pass it to something like NumPy which
> releases the GIL while crunching a copy of the array) or they do lots of
> I/O and have to wait for I/O devices (while one thread is waiting for
> the write/read operation to complete, another thread can do some number
> crunching).

Has nobody thought of a context manager to allow a part of your code to free up the GIL? I think the GIL is not inherently bad, but if it poses a problem at times, there should be a way to get it out of your... Way.

Cameron Simpson

May 20, 2013, 3:45:14 AM
to Fábio Santos, pytho...@python.org, Dennis Lee Bieber
The GIL makes individual python operations thread safe by never
running two at once. This makes the implementation of the operations
simpler, faster and safer. It is probably totally infeasible to
write meaningful python code inside your suggested context
manager that didn't rely on the GIL; if the GIL were not held the
code would be unsafe.

It is easy for a C extension to release the GIL, and then to do
meaningful work until it needs to return to python land. Most C
extensions will do that around non-trivial sections, and anything
that may stall in the OS.

So your use case for the context manager doesn't fit well.
--
Cameron Simpson <c...@zip.com.au>

Gentle suggestions being those which are written on rocks of less than 5lbs.
- Tracy Nelson in comp.lang.c

Carlos Nepomuceno

May 20, 2013, 3:45:22 AM
to pytho...@python.org
----------------------------------------
> Date: Sun, 19 May 2013 13:10:36 +1000
> From: c...@zip.com.au
> To: carlosne...@outlook.com
> CC: pytho...@python.org

> Subject: Re: Please help with Threading
>
> On 19May2013 03:02, Carlos Nepomuceno <carlosne...@outlook.com> wrote:
> | Just been told that GIL doesn't make things slower, but as I
> | didn't know that such a thing even existed I went out looking for
> | more info and found that document:
> | http://www.dabeaz.com/python/UnderstandingGIL.pdf
> |
> | Is it current? I didn't know Python threads aren't preemptive.

> | Seems to be something really old considering the state of the art
> | on parallel execution on multi-cores.
> | What's the catch on making Python threads preemptive? Are there any ongoing projects to make that?
>
> Depends what you mean by preemptive. If you have multiple CPU bound
> pure Python threads they will all get CPU time without any of them
> explicitly yielding control. But thread switching happens between
> python instructions, mediated by the interpreter.

I meant operating system preemptive. I've just checked and Python does not start Windows threads.

> The standard answers for using multiple cores is to either run
> multiple processes (either explicitly spawning other executables,
> or spawning child python processes using the multiprocessing module),
> or to use (as suggested) libraries that can do the compute intensive
> bits themselves, releasing the GIL while doing so, so that the Python
> interpreter can run other bits of your python code.

I've just discovered the multiprocessing module[1] and will make some tests with it later. Are there any other modules for that purpose?

I've found the following articles about Python threads. Any suggestions?

http://www.ibm.com/developerworks/aix/library/au-threadingpython/
http://pymotw.com/2/threading/index.html
http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/


[1] http://docs.python.org/2/library/multiprocessing.html

Carlos Nepomuceno

May 20, 2013, 3:53:24 AM
to pytho...@python.org
----------------------------------------
> Date: Mon, 20 May 2013 17:45:14 +1000
> From: c...@zip.com.au
> To: fabiosa...@gmail.com

> Subject: Re: Please help with Threading
> CC: pytho...@python.org; wlf...@ix.netcom.com

>
> On 20May2013 07:25, Fábio Santos <fabiosa...@gmail.com> wrote:
> | On 18 May 2013 20:33, "Dennis Lee Bieber" <wlf...@ix.netcom.com> wrote:
> |> Python threads work fine if the threads either rely on intelligent
> |> DLLs for number crunching (instead of doing nested Python loops to
> |> process a numeric array you pass it to something like NumPy which
> |> releases the GIL while crunching a copy of the array) or they do lots of
> |> I/O and have to wait for I/O devices (while one thread is waiting for
> |> the write/read operation to complete, another thread can do some number
> |> crunching).
> |
> | Has nobody thought of a context manager to allow a part of your code to
> | free up the GIL? I think the GIL is not inherently bad, but if it poses a
> | problem at times, there should be a way to get it out of your... Way.
>
> The GIL makes individual python operations thread safe by never
> running two at once. This makes the implementation of the operations
> simpler, faster and safer. It is probably totally infeasible to
> write meaningful python code inside your suggested context
> manager that didn't rely on the GIL; if the GIL were not held the
> code would be unsafe.

I just got my hands dirty trying to synchronize Python prints from many threads.
Sometimes they mess up when printing the newlines.

I tried several approaches using threading.Lock and Condition. None of them worked perfectly and all of them made the code sluggish.

Is there a 100% sure method to make print thread safe? Can it be fast???


> It is easy for a C extension to release the GIL, and then to do
> meaningful work until it needs to return to python land. Most C
> extensions will do that around non-trivial sections, and anything
> that may stall in the OS.
>
> So your use case for the context manager doesn't fit well.
> --
> Cameron Simpson <c...@zip.com.au>
>
> Gentle suggestions being those which are written on rocks of less than 5lbs.
> - Tracy Nelson in comp.lang.c


Fábio Santos

May 20, 2013, 3:55:22 AM
to Cameron Simpson, pytho...@python.org, Dennis Lee Bieber

My use case was a tight loop processing an image pixel by pixel, or crunching a CSV file. If it only uses local variables (and probably hold a lock before releasing the GIL) it should be safe, no?

My idea is that it's a little bad to have to write C or use multiprocessing just to do simultaneous calculations. I think an application using a reactor loop such as twisted would actually benefit from this. Sure, it will be slower than a C implementation of the same loop, but isn't fast prototyping a very important feature of the Python language?

On 20 May 2013 08:45, "Cameron Simpson" <c...@zip.com.au> wrote:
On 20May2013 07:25, Fábio Santos <fabiosa...@gmail.com> wrote:
| On 18 May 2013 20:33, "Dennis Lee Bieber" <wlf...@ix.netcom.com> wrote:
| >         Python threads work fine if the threads either rely on intelligent
| > DLLs for number crunching (instead of doing nested Python loops to
| > process a numeric array you pass it to something like NumPy which
| > releases the GIL while crunching a copy of the array) or they do lots of
| > I/O and have to wait for I/O devices (while one thread is waiting for
| > the write/read operation to complete, another thread can do some number
| > crunching).
|
| Has nobody thought of a context manager to allow a part of your code to
| free up the GIL? I think the GIL is not inherently bad, but if it poses a
| problem at times, there should be a way to get it out of your... Way.

The GIL makes individual python operations thread safe by never
running two at once. This makes the implementation of the operations
simpler, faster and safer. It is probably totally infeasible to
write meaningful python code inside your suggested context
manager that didn't rely on the GIL; if the GIL were not held the
code would be unsafe.

Cameron Simpson

May 20, 2013, 4:35:20 AM
to Carlos Nepomuceno, pytho...@python.org
On 20May2013 10:53, Carlos Nepomuceno <carlosne...@outlook.com> wrote:
| I just got my hands dirty trying to synchronize Python prints from many threads.
| Sometimes they mess up when printing the newlines.
| I tried several approaches using threading.Lock and Condition.
| None of them worked perfectly and all of them made the code sluggish.

Show us some code, with specific complaints.

Did you try this?

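from threading import Lock

_lock = Lock()  # one lock shared by every caller

def lprint(*a, **kw):
    global _lock  # not strictly needed: the name is only read here
    with _lock:
        print(*a, **kw)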

and use lprint() everywhere?

For generality the lock should be per file: the above hack uses one
lock for any file, so that's going to stall overlapping prints to
different files; inefficient.

There are other things than the above, but at least individual prints will
never overlap. If you have interleaved prints, show us.
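
A sketch of the per-file variant mentioned above, under the assumption that keying the locks by id() of the stream is acceptable (ids can be reused once a stream object is gone, so this suits long-lived streams only). The worker function and the StringIO target are there purely so the result can be checked:

```python
import io
import sys
import threading
from collections import defaultdict

_registry_lock = threading.Lock()          # guards _file_locks itself
_file_locks = defaultdict(threading.Lock)  # one lock per output stream

def lprint(*args, **kwargs):
    """print() that only waits for other writers to the same stream."""
    target = kwargs.get("file")
    if target is None:
        target = sys.stdout
    # Keyed by id() for brevity; assumes the streams are long-lived.
    with _registry_lock:
        lock = _file_locks[id(target)]
    with lock:
        print(*args, **kwargs)

def worker(stream, n):
    for i in range(50):
        lprint("thread", n, "line", i, file=stream)

buf = io.StringIO()  # stands in for a real file so the output can be checked
threads = [threading.Thread(target=worker, args=(buf, n)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = buf.getvalue().splitlines()
print(len(lines), "intact lines")
```

Prints to different streams no longer wait for each other; prints to the same stream still take turns.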

| Is there a 100% sure method to make print thread safe? Can it be fast???

Depends on what you mean by "fast". It will be slower than code
with no lock; how much would require measurement.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

My own suspicion is that the universe is not only queerer than we suppose,
but queerer than we *can* suppose.
- J.B.S. Haldane "On Being the Right Size"
in the (1928) book "Possible Worlds"

Chris Angelico

May 20, 2013, 5:09:13 AM
to pytho...@python.org
On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson <c...@zip.com.au> wrote:
> _lock = Lock()
>
> def lprint(*a, **kw):
>     global _lock
>     with _lock:
>         print(*a, **kw)
>
> and use lprint() everywhere?

Fun little hack:

def print(*args,print=print,lock=Lock(),**kwargs):
    with lock:
        print(*args,**kwargs)

Question: Is this a cool use or a horrible abuse of the scoping rules?

ChrisA

Fábio Santos

May 20, 2013, 5:26:20 AM
to Chris Angelico, pytho...@python.org

It is pretty cool although it looks like a recursive function at first ;)

Cameron Simpson

May 20, 2013, 5:54:19 AM
to Chris Angelico, pytho...@python.org
On 20May2013 19:09, Chris Angelico <ros...@gmail.com> wrote:
| On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson <c...@zip.com.au> wrote:
| > _lock = Lock()
| >
| > def lprint(*a, **kw):
| >     global _lock
| >     with _lock:
| >         print(*a, **kw)
| >
| > and use lprint() everywhere?
|
| Fun little hack:
|
| def print(*args,print=print,lock=Lock(),**kwargs):
|     with lock:
|         print(*args,**kwargs)
|
| Question: Is this a cool use or a horrible abuse of the scoping rules?

I carefully avoided monkey patching print itself:-)

That's... mad! I can see what the end result is meant to be, but
it looks like a debugging nightmare. Certainly my scoping-fu is too
weak to see at a glance how it works.
--
Cameron Simpson <c...@zip.com.au>

I will not do it as a hack I will not do it for my friends
I will not do it on a Mac I will not write for Uncle Sam
I will not do it on weekends I won't do ADA, Sam-I-Am
- Gregory Bond <g...@bby.com.au>

Carlos Nepomuceno

May 20, 2013, 6:01:13 AM
to pytho...@python.org
----------------------------------------
> Date: Mon, 20 May 2013 18:35:20 +1000
> Subject: Re: Please help with Threading
>
> On 20May2013 10:53, Carlos Nepomuceno <carlosne...@outlook.com> wrote:
> | I just got my hands dirty trying to synchronize Python prints from many threads.
> | Sometimes they mess up when printing the newlines.
> | I tried several approaches using threading.Lock and Condition.
> | None of them worked perfectly and all of them made the code sluggish.
>
> Show us some code, with specific complaints.
>
> Did you try this?
>
> _lock = Lock()
>
> def lprint(*a, **kw):
>     global _lock
>     with _lock:
>         print(*a, **kw)
>
> and use lprint() everywhere?


It works! Think I was running the wrong script...

Anyway, the suggestion you've made is the third and latest attempt that I've tried to synchronize the print outputs from the threads.

I've also used:

### 1st approach ###
lock  = threading.Lock()
[...]
lock.acquire()
try:
    [thread protected code]
finally:
    lock.release()


### 2nd approach ###
cond  = threading.Condition()
[...]
try:
    [thread protected code]
    with cond:
        print '[...]'


### 3rd approach ###
from __future__ import print_function

def safe_print(*args, **kwargs):
    global print_lock
    with print_lock:
        print(*args, **kwargs)
[...]
try:
    [thread protected code]
    safe_print('[...]')

Except for the first one, they all have about the same performance. The
problem with the first was that I placed the acquire/release around the
whole code block instead of only around the print statements.

Thanks a lot! ;)

Chris Angelico

May 20, 2013, 6:09:13 AM
to pytho...@python.org
On Mon, May 20, 2013 at 7:54 PM, Cameron Simpson <c...@zip.com.au> wrote:
> On 20May2013 19:09, Chris Angelico <ros...@gmail.com> wrote:
> | On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson <c...@zip.com.au> wrote:
> | > _lock = Lock()
> | >
> | > def lprint(*a, **kw):
> | >     global _lock
> | >     with _lock:
> | >         print(*a, **kw)
> | >
> | > and use lprint() everywhere?
> |
> | Fun little hack:
> |
> | def print(*args,print=print,lock=Lock(),**kwargs):
> |     with lock:
> |         print(*args,**kwargs)
> |
> | Question: Is this a cool use or a horrible abuse of the scoping rules?
>
> I carefully avoided monkey patching print itself:-)
>
> That's... mad! I can see what the end result is meant to be, but
> it looks like a debugging nightmare. Certainly my scoping-fu is too
> weak to see at a glance how it works.

Hehe. Like I said, could easily be called abuse.

Referencing a function's own name in a default has to have one of
these interpretations:

1) It's a self-reference, which can be used to guarantee recursion
even if the name is rebound
2) It references whatever previously held that name before this def statement.

Either would be useful. Python happens to follow #2; though I can't
point to any piece of specification that mandates that, so all I can
really say is that CPython 3.3 appears to follow #2. But both
interpretations make sense, and both would be of use, and use of
either could be called abusive of the rules. Figure that out. :)

The second defaulted argument (lock=Lock()), of course, is a common
idiom. No abuse there, that's pretty Pythonic.

This same sort of code could be done as a decorator:

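from threading import Lock

def serialize(fn):
    lock=Lock()  # one lock per decorated function
    def locked(*args,**kw):
        with lock:
            return fn(*args,**kw)  # pass the wrapped function's result through
    return locked

print=serialize(print)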

Spelled like this, it's obvious that the argument to serialize has to
be the previous 'print'. The other notation achieves the same thing,
just in a quirkier way :)

ChrisA

Ned Batchelder

May 20, 2013, 6:46:38 AM
to Chris Angelico, pytho...@python.org
On 5/20/2013 6:09 AM, Chris Angelico wrote:
> Referencing a function's own name in a default has to have one of
> these interpretations:
>
> 1) It's a self-reference, which can be used to guarantee recursion
> even if the name is rebound
> 2) It references whatever previously held that name before this def statement.

The meaning must be #2. A def statement is nothing more than a fancy
assignment statement. This:

def foo(a):
    return a + 1

is really just the same as:

foo = lambda a: a+1

(in fact, they compile to identical bytecode). More complex def's don't
have equivalent lambdas, but are still assignments to the name of the
function. So your "apparently recursive" print function is no more
ambiguous than "x = x + 1". The x on the right hand side is the old value of
x, the x on the left hand side will be the new value of x.

# Each of these updates a name
x = x + 1
def print(*args,print=print,lock=Lock(),**kwargs):
    with lock:
        print(*args,**kwargs)

Of course, if you're going to use that code, a comment might be in order
to help the next reader through the trickiness...
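
The same rule can be seen without touching print. A small sketch with a made-up function named greet: the default value is evaluated when the def statement runs, while the name still refers to the earlier function, and only then is the name rebound.

```python
def greet(name):
    return "hello " + name

# The default `greet=greet` is evaluated while the name still refers to the
# function above; only afterwards is `greet` rebound to the new function.
def greet(name, greet=greet):
    return greet(name).upper()

result = greet("world")
print(result)
```

Inside the second function, greet is a local parameter holding the old function, so the call is not recursive.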

--Ned.

Dave Angel

May 20, 2013, 7:06:04 AM
to pytho...@python.org
On 05/20/2013 03:55 AM, Fábio Santos wrote:
> My use case was a tight loop processing an image pixel by pixel, or
> crunching a CSV file. If it only uses local variables (and probably hold a
> lock before releasing the GIL) it should be safe, no?
>

Are you making function calls, using system libraries, or creating or
deleting any objects? All of these use the GIL because they use common
data structures shared among all threads. At the lowest level, creating
an object requires locked access to the memory manager.


Don't forget, the GIL gets used much more for Python internals than it
does for the visible stuff.


--
DaveA

Chris Angelico

May 20, 2013, 10:52:25 AM
to pytho...@python.org
On Mon, May 20, 2013 at 8:46 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> On 5/20/2013 6:09 AM, Chris Angelico wrote:
>>
>> Referencing a function's own name in a default has to have one of
>> these interpretations:
>>
>> 1) It's a self-reference, which can be used to guarantee recursion
>> even if the name is rebound
>> 2) It references whatever previously held that name before this def
>> statement.
>
>
> The meaning must be #2. A def statement is nothing more than a fancy
> assignment statement.

Sure, but the language could have been specced up somewhat
differently, with the same syntax. I was fairly confident that this
would be universally true (well, can't do it with 'print' per se in
older Pythons, but for others); my statement about CPython 3.3 was
just because I hadn't actually hunted down specification proof.

> So your "apparently recursive" print function is no more
> ambiguous than "x = x + 1". The x on the right hand side is the old value of x,
> the x on the left hand side will be the new value of x.
>
> # Each of these updates a name
> x = x + 1
>
> def print(*args,print=print,lock=Lock(),**kwargs):
>     with lock:
>         print(*args,**kwargs)

Yeah. The decorator example makes that fairly clear.

> Of course, if you're going to use that code, a comment might be in order to
> help the next reader through the trickiness...

Absolutely!!

ChrisA

Fábio Santos

May 20, 2013, 2:04:50 PM
to Dave Angel, pytho...@python.org


On 20 May 2013 12:10, "Dave Angel" <da...@davea.name> wrote:
> Are you making function calls, using system libraries, or creating or deleting any objects?  All of these use the GIL because they use common data structures shared among all threads.  At the lowest level, creating an object requires locked access to the memory manager.
>
>
> Don't forget, the GIL gets used much more for Python internals than it does for the visible stuff.

I did not know that. It's both interesting and somehow obvious, although I didn't know it yet.

88888 Dihedral

May 20, 2013, 9:44:46 PM5/20/13
to
On Monday, May 20, 2013 at 5:09:13 PM UTC+8, Chris Angelico wrote:
OK, if the Python interpreter has a global hidden print-out
buffer of, say, 2 to 16 KB, and all string print functions
just construct the output string from the format into this string
in an efficient low-level way, then the next question
would be whether users can use functions on this
low-level buffer for other string formatting jobs.

Chris Angelico

May 20, 2013, 9:50:17 PM5/20/13
to pytho...@python.org
On Tue, May 21, 2013 at 11:44 AM, 88888 Dihedral
<dihedr...@googlemail.com> wrote:
> OK, if the Python interpreter has a global hidden print-out
> buffer of, say, 2 to 16 KB, and all string print functions
> just construct the output string from the format into this string
> in an efficient low-level way, then the next question
> would be whether users can use functions on this
> low-level buffer for other string formatting jobs.

You remind me of George.
http://www.chroniclesofgeorge.com/

Both make great reading when I'm at work and poking around with random
stuff in our .SQL file of carefully constructed mayhem.

ChrisA

Carlos Nepomuceno

May 20, 2013, 10:53:46 PM5/20/13
to pytho...@python.org
sys.stdout.write() does not suffer from the newline mix-ups that the print statement shows when printing from many threads.

The only usage difference, AFAIK, is that you have to add '\n' at the end of the string yourself.

It's faster and thread safe (really?) by default.
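
A small sketch of the single-write idea, for what it's worth. The helper names `write_line` and `run_writers` are made up here, and the demo writes to an `io.StringIO` instead of the real `sys.stdout`. It builds the complete line first and hands it to the stream in one `write()` call, so there is no separate newline write for another thread to slip in front of. Whether a particular stream serializes concurrent `write()` calls is an assumption this thread does not settle, so a lock is still the safer choice when it matters.

```python
import io
import threading


def write_line(stream, *parts, sep=" "):
    """Format the whole line up front, then emit it with one write() call."""
    stream.write(sep.join(str(p) for p in parts) + "\n")


def run_writers(stream, n_threads=4, lines_per_thread=50):
    """Have several threads write complete lines to the same stream."""
    def worker(tag):
        for i in range(lines_per_thread):
            write_line(stream, "thread", tag, "line", i)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Passing `sys.stdout` as `stream` gives the usage described above; the in-memory buffer just makes the result easy to inspect.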

BTW, why can't I find the source code for the sys module in the 'Lib' directory?


Carlos Nepomuceno

May 20, 2013, 11:36:02 PM5/20/13
to pytho...@python.org
> On Tue, May 21, 2013 at 11:44 AM, 88888 Dihedral
> <dihedr...@googlemail.com> wrote:
>> OK, if the Python interpreter has a global hidden print-out
>> buffer of, say, 2 to 16 KB, and all string print functions
>> just construct the output string from the format into this string
>> in an efficient low-level way, then the next question
>> would be whether users can use functions on this
>> low-level buffer for other string formatting jobs.
>
> You remind me of George.
> http://www.chroniclesofgeorge.com/
>
> Both make great reading when I'm at work and poking around with random
> stuff in our .SQL file of carefully constructed mayhem.
>
> ChrisA


lol I need more cowbell!!! Please!!! lol

Steven D'Aprano

May 21, 2013, 12:48:31 AM5/21/13
to
On Tue, 21 May 2013 05:53:46 +0300, Carlos Nepomuceno wrote:

> BTW, why I didn't find the source code to the sys module in the 'Lib'
> directory?

Because sys is a built-in module. It is embedded in the Python
interpreter.
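
This can be checked from Python itself. A minimal sketch follows; the helper name `describe_module` is made up for illustration. `sys.builtin_module_names` lists the modules compiled into the interpreter, and such modules have no `__file__`, whereas a module loaded from the 'Lib' directory (threading is used as the example here) does.

```python
import importlib
import sys


def describe_module(name):
    """Report whether a module is compiled into the interpreter and
    which file, if any, it was loaded from."""
    module = importlib.import_module(name)
    return {
        "builtin": name in sys.builtin_module_names,
        # Built-in modules have no __file__ attribute at all.
        "file": getattr(module, "__file__", None),
    }
```

Comparing `describe_module("sys")` with `describe_module("threading")` shows the difference.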

--
Steven

Jurgens de Bruin

Jun 2, 2013, 9:47:48 PM6/2/13
to
On Saturday, 18 May 2013 10:58:13 UTC+2, Jurgens de Bruin wrote:
> This is my first script where I want to use the Python threading module. I have a large dataset which is a list of dicts; there can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page (4x4) - this already works.
>
> What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner list contains 16 dictionaries, thus each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long. So I would like multiple inner lists to be processed simultaneously, creating the graphs in "parallel".
>
> I am trying to use Python threading for this. I create 4 threads, loop over the outer list, and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to get hung up. I monitor the resources on my Mac: Python starts off well, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%.
>
> My thread creation is based on the following: http://www.tutorialspoint.com/python/python_multithreading.htm
>
> Any help would be great!!!

Thanks to all for the discussion and comments on threading. Although I have not been commenting, I have been following along. I have learnt a lot and I am still reading up on everything mentioned. Thanks again.
I will see how I am going to solve my scenario.
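
For later readers, one possible shape for the page-per-worker setup described in the original question, as a rough sketch only. The names `paginate`, `render_page` and `render_all` are hypothetical, and `render_page` is a stand-in that just sums each dict's values where the real script would draw a 4x4 page of histograms. It hands each page of 16 dicts to a pool of 4 threads via `concurrent.futures.ThreadPoolExecutor` (Python 3.2+), which takes care of queuing and joining. As discussed in this thread, the GIL means threads may not make CPU-bound plotting any faster.

```python
from concurrent.futures import ThreadPoolExecutor


def paginate(items, page_size=16):
    """Split a flat list into consecutive pages of at most page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def render_page(page):
    # Hypothetical placeholder: the real script would plot 16 histograms.
    # Here each dict is reduced to the sum of its values.
    return [sum(d.values()) for d in page]


def render_all(dataset, workers=4):
    """Process every page on a small thread pool, keeping page order."""
    pages = paginate(dataset)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input pages.
        return list(pool.map(render_page, pages))
```

A dataset of 200 dicts would give 13 pages, the last one holding 8 dicts.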