
PyThread_acquire_lock freezes at pthread_cond_wait although lock not occupied


Gernot Hillier

06.02.2003, 09:16:23
Hi!

I have been debugging a very nasty bug in a multithreaded program embedding
Python for several days now.

After quite some work I found the following:

One of the threads occasionally locks at
PyThread_acquire_lock/pthread_cond_wait when trying to get the
interpreter_lock or the import_lock. This thread will block there forever.

But other threads can apparently get the same lock without any problem at all.
And when I look at it in gdb, it looks even more astonishing:

(gdb) bt
#0 0x4026cea9 in sigsuspend () from /lib/libc.so.6
#1 0x4003bd48 in __pthread_wait_for_restart_signal () from
    /lib/libpthread.so.0
#2 0x4003814b in pthread_cond_wait () from /lib/libpthread.so.0
#3 0x080fc44a in PyThread_acquire_lock (lock=0x81ac940, waitflag=1)
    at Python/thread_pthread.h:374
#4 0x080d3187 in PyEval_RestoreThread (tstate=0x815a358) at
    Python/ceval.c:342
#5 0x0806ac04 in capisuite_disconnect (args=0x8208bdc) at
    capisuitemodule.cpp:422
#6 0x080ab897 in PyCFunction_Call (func=0x8172fa0, arg=0x8208bdc,
    kw=0x4035bc90) at Objects/methodobject.c:90
#7 0x080d0290 in eval_frame (f=0x81e42dc) at Python/ceval.c:2004
#8 0x080d0c9c in PyEval_EvalCodeEx (co=0x826c318, globals=0xfffffffc,
    locals=0x0, args=0x81fbda4, argcount=5, kws=0x81fbdb8, kwcount=0,
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#9 0x080d1c4d in fast_function (func=0xfffffffc, pp_stack=0xfffffffc, n=-4,
    na=5, nk=136297892) at Python/ceval.c:3161
#10 0x080d01e7 in eval_frame (f=0x81fbbec) at Python/ceval.c:2024
#11 0x080d0c9c in PyEval_EvalCodeEx (co=0x82ae2a8, globals=0xfffffffc,
    locals=0x0,args=0x81f77b8, argcount=1, kws=0x0, kwcount=0, defs=0x0,
    defcount=0, closure=0x0) at Python/ceval.c:2585
#12 0x08115f84 in function_call (func=0x8265f0c, arg=0x81f77ac, kw=0x0)
    at Objects/funcobject.c:374
#13 0x080936ae in PyObject_Call (func=0x8, arg=0x81f77ac, kw=0x0) at
    Objects/abstract.c:1684
#14 0x080d1a49 in PyEval_CallObjectWithKeywords (func=0x8265f0c,
    arg=0x81f77ac, kw=0x0) at Python/ceval.c:3049
#15 0x08093685 in PyObject_CallObject (o=0x8265f0c, a=0x81f77ac) at
    Objects/abstract.c:1675
#16 0x08066386 in PythonScript::run() (this=0x8174ee0) at
    pythonscript.cpp:80
#17 0x0806822e in IdleScript::run() (this=0x8174ee0) at idlescript.cpp:59
#18 0x40090ee1 in ost::ThreadImpl::ThreadExecHandler(ost::Thread*) ()
    from /usr/lib/libccgnu2-0.99.so.0
#19 0x4009025c in ccxx_exec_handler.1 () from /usr/lib/libccgnu2-0.99.so.0
#20 0x400391b0 in pthread_start_thread () from /lib/libpthread.so.0

Ok, it tries to get the global lock but:

(gdb) print *((pthread_lock*) interpreter_lock)
$7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
__spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0, __m_count = 0,
__m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}

So the lock is indeed not locked!! I don't understand this at all.

I saw the same phenomenon once when trying to get the import_lock.

pthread_cond_wait () was blocked but import_lock_level was 0 and
import_lock_thread was -1.

Anybody seen anything like this?

It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
2.2.1 but can't see that 2.2.2 will improve this somehow...

This is very important for me as the program will become my diploma thesis,
and if I can't get this problem solved in the next week, I'll be in real
trouble. :-((

So please, please if any of you has some idea which could be helpful, please
tell me!! TIA!!!

--
Ciao,

Gernot

Jeremy Hylton

06.02.2003, 13:18:09
Gernot Hillier <ghi...@suse.de> wrote in message news:<b1tqqs$q4e$1...@Fourier.suse.de>...

> pthread_cond_wait () was blocked but import_lock_level was 0 and
> import_lock_thread was -1.
>
> Anybody seen anything like this?
>
> It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
> 2.2.1 but can't see that 2.2.2 will improve this somehow...
>
> This is very important for me as the program will become my diploma thesis,
> and if I can't get this problem solved in the next week, I'll be in real
> trouble. :-((
>
> So please, please if any of you has some idea which could be helpful, please
> tell me!! TIA!!!

A thread waiting on a condition variable does not hold the lock. You can
only call wait with the lock acquired, and the first thing wait does
is release the lock. So there shouldn't be any surprise that you are
blocked in wait without holding the lock.
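
As a minimal sketch of that behaviour, assuming a POSIX threads platform: the waiter below blocks in pthread_cond_wait(), and the other thread can still take the mutex while it is blocked. The names mut, cond, waiting, ready and mutex_is_free_during_wait are made up for illustration and are not taken from Python's source.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical names for illustration; none of this is Python's code. */
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int waiting = 0; /* set by the waiter just before it waits */
static int ready = 0;   /* the predicate the waiter is waiting for */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mut);
    waiting = 1;
    while (!ready) {
        /* Atomically releases mut and blocks; mut is reacquired
         * before pthread_cond_wait returns. */
        pthread_cond_wait(&cond, &mut);
    }
    pthread_mutex_unlock(&mut);
    return NULL;
}

/* Returns 1 if mut could be taken while the waiter was blocked in wait. */
static int mutex_is_free_during_wait(void)
{
    pthread_t t;
    int got_mutex = 0;
    int rc;

    waiting = 0;
    ready = 0;
    rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);
    (void)rc;

    for (;;) {
        pthread_mutex_lock(&mut);
        if (waiting) {
            /* We hold mut and the waiter has already set `waiting`,
             * so the waiter can only be inside pthread_cond_wait,
             * with mut released on its behalf. */
            got_mutex = 1;
            ready = 1;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&mut);
            break;
        }
        pthread_mutex_unlock(&mut);
        sched_yield(); /* the waiter has not started waiting yet */
    }
    pthread_join(t, NULL);
    return got_mutex;
}
```

Calling mutex_is_free_during_wait() from main() and linking with the platform's thread library is enough to try it out.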

If you have a multi-threaded application, you should look at what all
the other threads are doing. One of the other threads has the lock
and hasn't released it. There's a good chance that there is inconsistent
use of app-level locking with the GIL.

Jeremy

Tim Peters

06.02.2003, 14:48:22
[Gernot Hillier]

> I have been debugging a very nasty bug in a multithreaded program embedding
> Python for several days now.

So you've barely got a start on it <wink>.

> After quite some work I found the following:
>
> One of the threads occasionally locks at
> PyThread_acquire_lock/pthread_cond_wait when trying to get the
> interpreter_lock or the import_lock. This thread will block there forever.
>
> But other threads can apparently get the same lock without any problem
> at all.
> And when I look at it in gdb, it looks even more astonishing:
>
> (gdb) bt
> #0 0x4026cea9 in sigsuspend () from /lib/libc.so.6
> #1 0x4003bd48 in __pthread_wait_for_restart_signal () from
> /lib/libpthread.so.0

That's enough. If it never gets a restart signal, it will stay there
forever. Whether it does get a restart signal is out of Python's hands,
though -- that's up to the pthreads implementation.

> ...


> Ok, it tries to get the global lock but:
>
> (gdb) print *((pthread_lock*) interpreter_lock)
> $7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
> __spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0,
> __m_count = 0,
> __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}
>
> So the lock is indeed not locked!! I don't understand this at all.

"The lock" is ambiguous. There's the pthreads mutex ("mut" in the above),
and there's the Python GIL (implemented by that entire data structure). As
Jeremy said, it's normal for mut to be unlocked during a condvar wait.
Whether the GIL is locked is really irrelevant, because your stack trace
shows that it's in the bowels of the platform condvar wait implementation,
presumably waiting for a signal it's never going to get.

> I saw the same phenomenon once when trying to get the import_lock.
>
> pthread_cond_wait () was blocked but import_lock_level was 0 and
> import_lock_thread was -1.
>
> Anybody seen anything like this?
>
> It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
> 2.2.1 but can't see that 2.2.2 will improve this somehow...

Trying Python 2.3a1 might. The GIL under Linux is implemented via POSIX
semaphores in 2.3, instead of via a condvar+mutex pair.
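
For comparison, here is a minimal sketch of what a lock built on a POSIX semaphore can look like, assuming Linux with unnamed semaphores (sem_init). It is not the code from Python 2.3; sem_lock, sem_lock_acquire, sem_lock_release and run_sem_lock_demo are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical lock type: a binary semaphore, 1 = free, 0 = held. */
typedef struct {
    sem_t sem;
} sem_lock;

static int sem_lock_init(sem_lock *l)
{
    return sem_init(&l->sem, 0, 1); /* not shared across processes */
}

/* Returns 1 if the lock was acquired, 0 otherwise.
 * waitflag != 0 blocks; waitflag == 0 only tries once. */
static int sem_lock_acquire(sem_lock *l, int waitflag)
{
    int status;
    do {
        status = waitflag ? sem_wait(&l->sem) : sem_trywait(&l->sem);
    } while (status == -1 && errno == EINTR); /* retry if interrupted */
    return status == 0;
}

static void sem_lock_release(sem_lock *l)
{
    sem_post(&l->sem);
}

enum { NTHREADS = 4, ITERATIONS = 10000 };
static sem_lock demo_lock;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        sem_lock_acquire(&demo_lock, 1);
        counter++; /* protected by demo_lock */
        sem_lock_release(&demo_lock);
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
static long run_sem_lock_demo(void)
{
    pthread_t threads[NTHREADS];
    int rc;

    counter = 0;
    rc = sem_lock_init(&demo_lock);
    assert(rc == 0);
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    (void)rc;
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    sem_destroy(&demo_lock.sem);
    return counter;
}
```

If mutual exclusion holds, run_sem_lock_demo() returns NTHREADS * ITERATIONS. Note that there is no separate "locked" flag and no condition variable here, so there is no wakeup signal that has to be delivered separately from the state change.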

> This is very important for me as the program will become my diploma
> thesis, and if I can't get this problem solved in the next week,
> I'll be in real trouble. :-((

Then let me ask you an odd question: are you using fork()? If so, move
heaven and earth to get rid of it. Over a year ago a number of people spent
more than a week trying to solve a similar problem on Linux, and never did
manage to solve it. All the evidence pointed to a bug in the Linux pthreads
implementation, due to improper treatment of internal pthreads memory after
a fork. Forking and threading mix like bananas and motor oil under the best
of conditions.

If you're not using fork(), I have no ideas other than to try a different
OS, or move to Python 2.3a1 and hope the same bug doesn't plague your
platform semaphore implementation.

all-oses-are-buggy-ly y'rs - tim


Gernot Hillier

06.02.2003, 16:13:15
Hi!

Jeremy Hylton wrote:
> A thread waiting on a condition variable does not hold the lock. You can

I think you misunderstood me. I'm not talking about a Python-level condition
variable. I mean the condition variable which is used internally to
implement the lock:

I'll quote Python/thread_pthread.h (shortened):

PyThread_acquire_lock(PyThread_type_lock lock, int waitflag)
{
    status = pthread_mutex_lock( &thelock->mut );
    success = thelock->locked == 0;
    if (success) thelock->locked = 1;
    status = pthread_mutex_unlock( &thelock->mut );

    if ( !success && waitflag ) {
        status = pthread_mutex_lock( &thelock->mut );
        while ( thelock->locked ) {
            status = pthread_cond_wait(&thelock->lock_released,
                                       &thelock->mut);
        }
        thelock->locked = 1;
        status = pthread_mutex_unlock( &thelock->mut );
        success = 1;
    }
    if (error) success = 0;
    return success;
}

So if I'm understanding this right, pthread_cond_wait() will be only called
if the lock is held and ...

void
PyThread_release_lock(PyThread_type_lock lock)
{
    status = pthread_mutex_lock( &thelock->mut );
    thelock->locked = 0;
    status = pthread_mutex_unlock( &thelock->mut );

    status = pthread_cond_signal( &thelock->lock_released );
}

... will be signalled as soon as the GIL is released.

So IMHO thelock->locked MUST be 1 while pthread_cond_wait is blocking.

But sometimes it isn't :-(

> If you have a multi-threaded application, you should look at what all
> the other threads are doing. One of the other threads has the lock
> and hasn't released
> it.

That was my first idea, but no other thread is doing anything Python-related
at the moment the one thread is blocking.

And what is very astonishing is the fact that other threads can (later on)
quite happily acquire and release the GIL as often as they want to - the
thread I started first still blocks at the cond_wait() call and will do so
forever :-(

> There's a good chance that there is inconsistent use of app-level
> locking with the GIL.

Possible, but I triple-checked all parts of my program - and that wouldn't
explain how a similar block can occur with the import_lock - as I don't use
this thingie myself anywhere in my code...

--
Ciao,

Gernot

Gernot Hillier

06.02.2003, 16:38:44
Hi!

Tim Peters wrote:
> [Gernot Hillier]


>> (gdb) bt
>> #0 0x4026cea9 in sigsuspend () from /lib/libc.so.6
>> #1 0x4003bd48 in __pthread_wait_for_restart_signal () from
>> /lib/libpthread.so.0
>
> That's enough. If it never gets a restart signal, it will stay there
> forever. Whether it does get a restart signal is out of Python's hands,
> though -- that's up to the pthreads implementation.

Anything I can do to see if this signal arrives?

Any nice gdb magic to get closer to the solution?

But as this also seems to be a race condition (doesn't happen every time,
seems to depend on the machine speed, doesn't occur on every machine, ...), I
doubt it will occur when running in gdb at all. *sigh*
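
One way to probe this without gdb, sketched here under assumptions: replace the blocking wait with pthread_cond_timedwait() in a private copy of the lock and count every timeout after which the lock turns out to be free already. Such a timeout means the release happened but the waiter was never woken. The type probe_lock, the waiters field and all function names below are hypothetical; only the fields locked, lock_released and mut mirror the gdb output above. The demo simulates a lost wakeup by releasing without signalling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Hypothetical diagnostic lock; `waiters` exists only for the demo. */
typedef struct {
    char locked;
    int waiters;
    pthread_cond_t lock_released;
    pthread_mutex_t mut;
} probe_lock;

static probe_lock demo = {
    1, 0, PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static int missed_wakeups;

/* Blocking acquire that wakes up every timeout_ms milliseconds.
 * A timeout that finds the lock already free is counted in *missed. */
static void probe_acquire(probe_lock *l, int timeout_ms, int *missed)
{
    pthread_mutex_lock(&l->mut);
    l->waiters++;
    while (l->locked) {
        struct timespec deadline;
        int rc;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        rc = pthread_cond_timedwait(&l->lock_released, &l->mut, &deadline);
        if (rc == ETIMEDOUT && !l->locked)
            (*missed)++; /* released, but nobody woke us */
    }
    l->waiters--;
    l->locked = 1;
    pthread_mutex_unlock(&l->mut);
}

/* send_signal == 0 simulates a wakeup that never arrives. */
static void probe_release(probe_lock *l, int send_signal)
{
    pthread_mutex_lock(&l->mut);
    l->locked = 0;
    pthread_mutex_unlock(&l->mut);
    if (send_signal)
        pthread_cond_signal(&l->lock_released);
}

static void *demo_waiter(void *arg)
{
    (void)arg;
    probe_acquire(&demo, 50, &missed_wakeups);
    return NULL;
}

/* Returns the number of missed wakeups the waiter observed. */
static int run_probe_demo(void)
{
    pthread_t t;
    int rc, seen = 0;

    demo.locked = 1; /* pretend another thread holds the lock */
    demo.waiters = 0;
    missed_wakeups = 0;
    rc = pthread_create(&t, NULL, demo_waiter, NULL);
    assert(rc == 0);
    (void)rc;

    while (!seen) { /* wait until the waiter is really waiting */
        pthread_mutex_lock(&demo.mut);
        seen = demo.waiters > 0;
        pthread_mutex_unlock(&demo.mut);
        if (!seen)
            sched_yield();
    }
    probe_release(&demo, 0); /* release without signalling */
    pthread_join(t, NULL);
    assert(demo.locked == 1); /* the waiter got the lock after a timeout */
    return missed_wakeups;
}
```

In a real program the counter would be logged rather than returned. A non-zero count with a release path that always signals would point at the threads implementation rather than at the embedding code.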

>> (gdb) print *((pthread_lock*) interpreter_lock)
>> $7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
>> __spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0,
>> __m_count = 0,
>> __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock =
>> 0}}}
>>
>> So the lock is indeed not locked!! I don't understand this at all.
>
> "The lock" is ambiguous. There's the pthreads mutex ("mut" in the above),
> and there's the Python GIL (implemented by that entire data structure).
> As Jeremy said, it's normal for mut to be unlocked during a condvar wait.

I referred to interpreter_lock->locked.

What I wanted to say: when it blocks in pthread_cond_wait() while "locked"
is 0 at the same time, it can't be that I've forgotten a ReleaseLock()
anywhere in my source code, so it can't be my fault, can it?

> Whether the GIL is locked is really irrelevant, because your stack trace
> shows that it's in the bowels of the platform condvar wait implementation,
> presumably waiting for a signal it's never going to get.

Just want to be sure it can't be a problem of my embedding/extending code...

>> It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
>> 2.2.1 but can't see that 2.2.2 will improve this somehow...
>
> Trying Python 2.3a1 might. The GIL under Linux is implemented via POSIX
> semaphores in 2.3, instead of via a condvar+mutex pair.

Hmmm... No real solution for me as this program must run on the Python
stable tree. But anyway surely worth a try. Thx for the suggestion...

> Then let me ask you an odd question: are you using fork()? If so, move
> heaven and earth to get rid of it. Over a year ago a number of people

[...]

No, I'm not using fork(). Only threads via CommonC++ on top of libpthreads.

> If you're not using fork(), I have no ideas other than to try a different
> OS,

I doubt that this will be an option for me (look at my mail address) ;-))

> or move to Python 2.3a1 and hope the same bug doesn't plague your
> platform semaphore implementation.

Hmmm... I'll first try to get rid of the CommonC++ library and implement
my threads myself using pthreads. I already had some very odd bugs which
turned out to occur only when using the CommonC++ library and which
disappeared when I used libpthreads directly.

So I'll give that a try in the next hours/days (let's see)...

> all-oses-are-buggy-ly y'rs - tim

:)

--
Ciao,

Gernot

Laura Creighton

06.02.2003, 16:51:29
> Forking and threading mix like bananas and motor oil under the best
> of conditions. - tim

+1 for QOTW -- and it's been a good week ...

Laura

Erik Max Francis

06.02.2003, 18:38:25
Laura Creighton wrote:

> +1 for QOTW -- and it's been a good week ...

How come nobody was making QOTW recommendations when I was doing the
Python-URL! digests? :-)

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ War is like love, it always finds a way.
\__/ Bertolt Brecht
Bosskey.net: Return to Wolfenstein / http://www.bosskey.net/rtcw/
A personal guide to Return to Castle Wolfenstein.

o maj

06.02.2003, 20:50:49
> Is python ideal for beginners?

Well, Pedro, you do have a point, although you should try to change
your method of presentation. It does look like a cheap marketing
gimmick, and I am sure it put lots of people off reading it.

Okay, 2-3 years ago, I tried to study Python as a first language,
because I went to Eric Raymond's site and he basically recommended it
for beginners. Initially, I did have problems getting my head around it,
because I couldn't find many basic examples. But because I was determined
to learn it, I simply borrowed a book of basic algorithms and studied
them. I then transferred this knowledge to Python.

I also found it very easy then to pick up VB, VBScript and later C. I
doubt I would have found it easy to pick up other languages had I
started with VB.

Python is ideal for the determined beginner, but is a poor choice for
an unmotivated person who wants to be spoon fed.

Jimmy Retzlaff

06.02.2003, 21:28:42
o maj [oma...@talk21.com] wrote:

> Python is ideal for the determined beginner, but is a poor choice
> for an unmotivated person who wants to be spoon fed.

I'm not sure an "unmotivated person who wants to be spoon fed" is going
to develop into a very effective programmer anyway. In my estimation,
programming at any non-trivial level takes a fair amount of persistence
and initiative.

Jimmy

Laura Creighton

07.02.2003, 03:10:00
> Laura Creighton wrote:
>
> > +1 for QOTW -- and it's been a good week ...
>
> How come nobody was making QOTW recommendations when I was doing the
> Python-URL! digests? :-)

You were doing such a good job we didn't think you needed any help.

Laura

Laura Creighton

07.02.2003, 03:58:05

Some bright people, who have always had excellent textbooks written for
their level of experience in the world at hand whenever they have had to
learn anything, miss out on developing these skills. They have never
had to work at learning anything ... it always slipped right into their
brains with no fuss whatsoever. For them, learning something becomes
only a matter of reading.

They don't know that this means that the authors of their books
and their teachers have been doing a truly excellent job at a really
hard task. Because they are indeed very, very, bright, it is
not surprising that they believe that the reason they learn well has
only to do with how intelligent they are.

Universities have to watch for these people. Some of them cannot adapt
to an environment where they are exposed to people who cannot teach,
textbooks which are clear as mud, and where a failed undergraduate
<them> is not considered a major existential tragedy. After first term
exams, a significant subset of them jump off bridges and the like.

They could learn to program just fine. But first they have to learn how
to learn.

Laura


Michael Hudson

07.02.2003, 07:38:43
oma...@talk21.com (o maj) writes:

> Python is ideal for the determined beginner, but is a poor choice
> for an unmotivated person who wants to be spoon fed.

What is a good choice for such a person? The OP seemed to suggest C,
which is quite funny :-)

Cheers,
M.

--
> Emacs is a fashion statement.
No, Gnus is a fashion statement. Emacs is clothing. Everyone
else is running around naked.
-- Karl Kleinpaste & Jonadab the Unsightly One, gnu.emacs.gnus

Arthur

07.02.2003, 08:29:00
> > Python is ideal for the determined beginner, but is a poor choice
> > for an unmotivated person who wants to be spoon fed.
>
> I'm not sure an "unmotivated person who wants to be spoon fed" is going
> to develop into a very effective programmer anyway. In my estimation,
> programming at any non-trivial level takes a fair amount of persistence
> and initiative.

I think both points are quite true.

Which is why I found it so bizarre that there was a period of time here in
Python land where some of the most trivial of the challenges faced by anyone
approaching programming via Python were being given so much attention, and
with so much solemnity. I felt that this is one area that I could speak to
with some authority, knowing quite well that I faced some real challenges,
but they were miles away from the issues on which there was so much focus.
I have developed some concrete ideas, from my experience, of what might be
done to tweak things to improve the Python learning experience for those
approaching it with persistence and initiative. Well, the time seems to
have passed where that is a subject of much interest.

Art


Gernot Hillier

07.02.2003, 11:52:58
Hi!

Gernot Hillier wrote:
>> or move to Python 2.3a1 and hope the same bug doesn't plague your
>> platform semaphore implementation.
>
> Hmmm... I'll first try to get rid of the CommonC++ library and implement
> my threads myself using pthreads. I already had some very odd bugs which
> turned out to occur only when using the CommonC++ library and which
> disappeared when I used libpthreads directly.

Ok, done that. Doesn't help either. :-((

--
Ciao,

Gernot

Carlos Ribeiro

07.02.2003, 15:45:25
On Friday 07 February 2003 12:38 pm, Michael Hudson wrote:
> oma...@talk21.com (o maj) writes:
> > Python is ideal for the determined beginner, but is a poor choice
> > for an unmotivated person who wants to be spoon fed.
>
> What is a good choice for such a person? The OP seemed to suggest C,
> which is quite funny :-)

Prozac :-)


Carlos Ribeiro
crib...@mail.inet.com.br

Mike Meyer

07.02.2003, 20:56:01
Laura Creighton <l...@strakt.com> writes:
> > How come nobody was making QOTW recommendations when I was doing the
> > Python-URL! digests? :-)
> You were doing such a good job we didn't think you needed any help.

Hey!

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
