"Fatal Python error: Couldn't create autoTLSkey mapping"


Lisper

Oct 2, 2011, 11:35:47 AM
to mod...@googlegroups.com
SLSIA pretty much.  This happens to me when I call subprocess.Popen from within a WSGI script with mod_wsgi running in daemon mode.  It doesn't seem to matter what the subprocess actually does (I use 'whoami' for testing).  It's a very weird error because it is intermittent but not random.  It's 100% predictable and repeatable.  It happens exactly every other invocation of the WSGI script.  When I call thread.get_ident() it turns out that there are two threads in my WSGI process.  One of them produces this error and the other one doesn't.  This is particularly weird because the default number of threads is supposed to be 15, so I don't understand how I can end up with only two.

I'm running a clean build of mod_wsgi 3.3 on OS X 10.6.8.

What does this error mean anyway?  What is autoTLS?  TLS normally means Transport Layer Security but I don't see how that could have anything to do with this problem.

Graham Dumpleton

Oct 2, 2011, 7:47:37 PM
to mod...@googlegroups.com
TLS = Thread Local Storage
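For readers unfamiliar with the idea: thread-local storage gives each thread its own private copy of a variable. CPython's autoTLSkey is a C-level TLS key that the PyGILState API uses to find the current thread's Python thread state; Python code can see the same concept through the standard threading.local class. A minimal sketch (Python 3; the names tls, seen and worker are illustrative only):

```python
import threading

tls = threading.local()  # attributes are private to the thread that sets them
tls.owner = 'main'

seen = {}

def worker(name):
    # A fresh thread starts with an empty namespace, so 'owner' is missing.
    seen[name + ':before'] = getattr(tls, 'owner', None)
    tls.owner = name
    seen[name + ':after'] = tls.owner

threads = [threading.Thread(target=worker, args=(n,)) for n in ('t1', 't2')]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The workers' assignments never touched the main thread's copy.
seen['main'] = tls.owner
```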

What Apache version are you using? The Apache supplied one or have you
built Apache from very recent Apache source code?

Graham

> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/modwsgi/-/FTE7A8WZ8msJ.
> To post to this group, send email to mod...@googlegroups.com.
> To unsubscribe from this group, send email to
> modwsgi+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/modwsgi?hl=en.
>

Ron Garret

Oct 2, 2011, 11:56:55 PM
to mod...@googlegroups.com

On Oct 2, 2011, at 4:47 PM, Graham Dumpleton wrote:

> TLS = Thread Local Storage

Ah. That makes a lot more sense :-)

> What Apache version are you using? The Apache supplied one or have you
> built Apache from very recent Apache source code?

The built-in one.

Apache/2.2.17 (Unix) PHP/5.3.4 mod_ssl/2.2.17 OpenSSL/0.9.8r DAV/2 mod_wsgi/3.3 Python/2.7.2

rg

Graham Dumpleton

Oct 3, 2011, 4:36:56 AM
to mod...@googlegroups.com
If you are using the default of 15 request threads, you should actually be
seeing 18 threads in total in the mod_wsgi daemon process. One is the
original process thread, which waits for process shutdown; two are
background threads that monitor the liveness of the process and
shutdown triggers; and then there are the 15 request threads.

How are you determining that there are two threads, a core stack dump?
If yes, I would like to see those stack dumps.

FWIW, I did see this issue recently, but that was with a custom Apache
source build and Apple Python 2.6.5. Using the Apache and Python supplied
by Apple, I have no problems with the same program that was causing the
issue. If you can provide a simple WSGI hello world that shows the
problem, then I can perhaps test it on my Apache/Python combination, which
I know might be susceptible, as I still have it set up.

Graham


Ron Garret

Oct 3, 2011, 11:35:22 AM
to mod...@googlegroups.com

On Oct 3, 2011, at 1:36 AM, Graham Dumpleton wrote:

> How are you determining that there are two threads, a core stack dump?

No, it's the result of calling thread.get_ident() within the application itself. See below.

> FWIW, I did see this issue recently, but was with a custom Apache
> source build and Apple Python 2.6.5. Using Apache and Python supplied
> by Apple, have no problems for same program that was causing the
> issue. If you can provide a simple WSGI hello world that shows the
> problem, then I can perhaps test it on my Apache/Python combo which I
> know might be susceptible as still have it set up.

Here you go. It's about as simple as it can be.


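import subprocess, thread

def hello_world(environ, start_response):
    # Spawn a trivial child process; what it runs doesn't matter.
    p = subprocess.Popen(['whoami'], stdout=subprocess.PIPE)
    s = p.communicate()[0]  # read the output and reap the child
    start_response('200 OK', [('Content-type', 'text/plain')])
    # Python 2: str values; thread.get_ident() identifies the request thread.
    return [str(thread.get_ident()), ' - ', s]

application = hello_world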
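For anyone trying the same test today, a rough Python 3 port might look like the following sketch. It is not the original poster's code: subprocess.run replaces Popen, threading.get_ident() replaces thread.get_ident(), the body is bytes as Python 3 WSGI requires, and make_app and call_app are hypothetical helpers so the app can be exercised in-process without Apache. Called this way it runs in an ordinary process's main interpreter, so on its own it would not be expected to reproduce the failure reported here.

```python
import subprocess
import threading

def make_app(command=('whoami',)):
    """Build the test WSGI app; 'whoami' mirrors the original report."""
    def hello_world(environ, start_response):
        # subprocess.run starts the child, waits for it and collects stdout.
        out = subprocess.run(list(command), stdout=subprocess.PIPE,
                             check=True).stdout
        start_response('200 OK', [('Content-Type', 'text/plain')])
        # Python 3 WSGI bodies are iterables of bytes.
        return [str(threading.get_ident()).encode('ascii'), b' - ', out]
    return hello_world

application = make_app()

def call_app(app, environ=None):
    """Call a WSGI app in-process; return (status, headers, body bytes)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None  # legacy write() callable, unused here

    body = b''.join(app(environ if environ is not None else {}, start_response))
    return captured['status'], captured['headers'], body
```

For example, call_app(make_app(['echo', 'rg'])) gives a status of '200 OK' and a body ending in b' - rg\n'.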


Lisper

Oct 7, 2011, 8:26:12 AM
to mod...@googlegroups.com
BTW, I get the exact same result on Linux.

Lisper

Oct 7, 2011, 11:13:47 AM
to mod...@googlegroups.com
More info: it turns out that what is failing is the call to os.fork().  The system call itself succeeds, but every other time the child process terminates with the error in the subject line.

Lisper

Oct 7, 2011, 1:16:53 PM
to mod...@googlegroups.com
I've dug into this some more, and this issue seems to be quite a rabbit hole.  The TL;DR version seems to be that calling fork() from a process that has threads is a Really Bad Idea (tm) and cannot be reasonably expected to work reliably.

If anyone wants details let me know.

Graham Dumpleton

Oct 7, 2011, 10:45:15 PM
to mod...@googlegroups.com
On 8 October 2011 04:16, Lisper <ron.g...@gmail.com> wrote:
> I've dug into this some more, and this issue seems to be quite a rabbit
> hole.  The TL;DR version seems to be that calling fork() from a process that
> has threads is a Really Bad Idea (tm) and cannot be reasonably expected to
> work reliably.

If all the process then does is a subsequent exec, it is
generally not a problem. It is when it wants to do more complicated
stuff that it becomes an issue. So to say as a blanket statement that
it is a really bad idea is plain wrong.

Sorry that I haven't responded to this issue properly. I've been tied up
with other changes in mod_wsgi, related to work, that I have been trying
to get on top of first. I will try to investigate it soon.

If someone can come up with a test program that doesn't involve
the subprocess module and just uses fork/exec directly, that would
help.
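A minimal sketch of such a test, using only os.fork, os.dup2, os.execvp and os.waitpid (POSIX only, written for Python 3; run_forked is a hypothetical name, and which command to run is up to the tester). In a mod_wsgi test it would be called from inside the WSGI application, in place of the subprocess call. Note that the child still executes a few Python statements between the fork and the exec.

```python
import os

def run_forked(argv):
    """Fork, exec argv in the child, and return the child's stdout as bytes."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: still running Python code here until execvp replaces it.
        try:
            os.close(r)
            os.dup2(w, 1)             # the pipe's write end becomes stdout
            os.execvp(argv[0], argv)  # does not return on success
        finally:
            os._exit(127)             # reached only if something above failed
    # Parent: drop the write end so read() sees EOF once the child exits.
    os.close(w)
    with os.fdopen(r, 'rb') as reader:
        output = reader.read()
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError('child %r failed, wait status %r' % (argv, status))
    return output
```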

The subprocess module has been a source of problems in the past because
of assumptions it makes, such as flushing stdout/stderr after the fork,
ignoring the fact that there may be buffered data that the parent would
also flush. The end result is duplicated content in logs.
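The duplicated-output effect can be reproduced without subprocess at all. In this minimal sketch (POSIX only, Python 3; write_then_fork is a hypothetical helper), text written to a buffered file object but not yet flushed is copied into the child by fork; the child's explicit flush() stands in for any code that flushes inherited streams after the fork, and the parent flushes its own copy when the file is closed. Flushing before forking, and leaving the child via os._exit() so it skips interpreter cleanup, avoids the duplicate.

```python
import os
import tempfile

def write_then_fork(flush_first):
    """Write one line, fork, let both processes flush; return the file text."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, 'w') as f:        # buffered; the offset is shared after fork
        f.write('logged once\n')      # sits in the user-space buffer for now
        if flush_first:
            f.flush()                 # empty the buffer before forking
        pid = os.fork()
        if pid == 0:
            # Child: flushes whatever buffered copy it inherited, then exits
            # immediately without running normal interpreter cleanup.
            f.flush()
            os._exit(0)
        os.waitpid(pid, 0)
    # Leaving the 'with' block closes f, flushing the parent's buffered copy.
    with open(path) as f:
        data = f.read()
    os.remove(path)
    return data
```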

Graham

> If anyone wants details let me know.
>


Graham Dumpleton

Oct 7, 2011, 11:45:37 PM
to mod...@googlegroups.com
Hmmm, it turns out I am not getting this issue on the local setup I
thought I was. The error on my local setup was:

[Fri Sep 23 15:05:13 2011] [notice] child pid 97541 exit signal Abort trap (6)
Fatal Python error: XXX block stack underflow

So, something entirely different.

If you can distill that test case down to exclude subprocess, it would
help, because if it is anything, I would suspect subprocess of doing
something.

If you aren't already, try adding:

WSGIApplicationGroup %{GLOBAL}

into the Apache configuration, just for that WSGI application.

This issue, if there is one, is likely going to relate to subprocess and
forking in sub interpreters.

Graham

Ron Garret

Oct 8, 2011, 2:06:41 AM
to mod...@googlegroups.com

On Oct 7, 2011, at 8:45 PM, Graham Dumpleton wrote:

> Hmmm, the local setup I thought I was getting this issue on, I am not,
> The error on my local setup was:
>
> [Fri Sep 23 15:05:13 2011] [notice] child pid 97541 exit signal Abort trap (6)
> Fatal Python error: XXX block stack underflow
>
> So, something entirely different.
>
> If you can distill that test case down to exclude subprocess it would
> thus help because if it is anything would suspect subprocess doing
> something.

I can reliably reproduce the problem thusly:

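import os, sys

def application(environ, start_response):
    pid = os.fork()
    if pid:
        # Parent: log the child's PID and return a normal response.
        sys.stderr.write('Fork succeeded (PID=%s)\n' % pid)
        start_response('200 OK', [('Content-type', 'text/plain')])
        return ['OK - ', str(pid)]
    else:
        # Child: log, then exit at once without interpreter cleanup.
        sys.stderr.write('Fork succeeded (child PID=%s)\n' % os.getpid())
        os._exit(0)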

Here's a small excerpt from my error log, the result of reloading the page four times:

[Fri Oct 07 08:15:15 2011] [error] Fork succeeded (PID=90595)
[Fri Oct 07 08:15:15 2011] [error] Fork succeeded (child PID=90595)
[Fri Oct 07 08:15:35 2011] [error] Fork succeeded (PID=90599)
Fatal Python error: Couldn't create autoTLSkey mapping
[Fri Oct 07 08:15:43 2011] [error] Fork succeeded (PID=90600)
[Fri Oct 07 08:15:43 2011] [error] Fork succeeded (child PID=90600)
[Fri Oct 07 08:15:45 2011] [error] Fork succeeded (PID=90601)
Fatal Python error: Couldn't create autoTLSkey mapping

> If you aren't already, try adding:
>
> WSGIApplicationGroup %{GLOBAL}
>
> into Apache configuration just for that WSGI application.

That made the problem go away.

> This issue if there is one is likely going to relate to subprocess and
> forking in sub interpreters.

I'm pretty sure the error is being generated by this code in pystate.c:

/* Reset the TLS key - called by PyOS_AfterFork.
 * This should not be necessary, but some - buggy - pthread implementations
 * don't flush TLS on fork, see issue #10517.
 */
void
_PyGILState_Reinit(void)
{
    PyThreadState *tstate = PyGILState_GetThisThreadState();
    PyThread_delete_key(autoTLSkey);
    if ((autoTLSkey = PyThread_create_key()) == -1)
        Py_FatalError("Could not allocate TLS entry");

    /* re-associate the current thread state with the new key */
    if (PyThread_set_key_value(autoTLSkey, (void *)tstate) < 0)
        Py_FatalError("Couldn't create autoTLSkey mapping");
}

> If all the process is then doing is subsequent exec then it is
> generally not a problem. It is when it wants to do more complicated
> stuff that it becomes an issue. So to blanket say it is a really bad
> idea is plain wrong.


Yes, but this is not a C program; it is a Python program. Unless the subprocess module is re-coded in C, it is not possible to call exec immediately after fork. After calling fork from Python, control is necessarily returned to the Python interpreter.
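For context: Python 3.2 and later do implement subprocess's POSIX fork-and-exec step in C (the private _posixsubprocess module), and Python 3.8 added os.posix_spawn() and os.posix_spawnp(), which let the C library create and exec the child without running any Python code in it. A minimal sketch using the latter (POSIX only, Python 3.8+; run_spawned is a hypothetical name):

```python
import os

def run_spawned(argv):
    """Spawn argv via os.posix_spawnp and return its stdout as bytes."""
    r, w = os.pipe()  # both ends are close-on-exec by default (Python 3.4+)
    try:
        pid = os.posix_spawnp(
            argv[0], argv, os.environ,
            # Applied in the child before exec: the pipe becomes its stdout.
            file_actions=[(os.POSIX_SPAWN_DUP2, w, 1)],
        )
    except BaseException:
        os.close(r)
        raise
    finally:
        os.close(w)  # the parent only needs the read end
    with os.fdopen(r, 'rb') as reader:
        output = reader.read()
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError('child %r failed, wait status %r' % (argv, status))
    return output
```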

rg

Graham Dumpleton

Oct 8, 2011, 2:21:44 AM
to mod...@googlegroups.com

This code doesn't exist in the Python 2.7.1 source that I have on my box.

This must be a new change, and whatever they are doing has broken
things for a fork done from a sub interpreter. So, it could be a bug in
Python.

I'm querying the bug report for the original issue about whether they
checked the fix with sub interpreters. Often they don't bother.

I will do a standalone test when I get a chance.

Graham

>> If all the process is then doing is subsequent exec then it is
>> generally not a problem. It is when it wants to do more complicated
>> stuff that it becomes an issue. So to blanket say it is a really bad
>> idea is plain wrong.
>
>
> Yes, but this is not a C program, this is a Python program.  Unless the subprocess module is re-coded in C, it is not possible to call exec immediately after fork.  After calling fork from Python, control is necessarily returned to the Python interpreter.
>
> rg
>


Ron Garret

Oct 8, 2011, 12:45:45 PM
to mod...@googlegroups.com

It was apparently introduced in 2.7.2.

> This must be a new change and whatever they are doing has broken
> things for a fork done from a sub interpreter. So,could be a bug in
> Python.

I tried reproducing the problem standalone but was unable to. It seems like it must be something mod_wsgi is doing because of the intermittent reproducibility. Also the fact that WSGIApplicationGroup %{GLOBAL} makes the problem go away is a pretty big clue.

For the record, I had WSGIApplicationGroup set to the name of my WSGIDaemonProcess. I have half a dozen WSGI applications running on the same machine. Each one has its own named WSGIDaemonProcess and a corresponding WSGIApplicationGroup set to the same name. But I don't actually understand what WSGIApplicationGroup is supposed to do, so this whole thing could be a configuration error on my part.

> Querying the bug report for original issue and whether they checked
> fix for sub interpreters. Often they don't bother.
>
> Will do standalone test when get chance.

Thanks.

rg

Graham Dumpleton

Oct 8, 2011, 9:08:05 PM
to mod...@googlegroups.com
The only way to reproduce it is going to be to write a C program which
embeds Python, manually initialises Python via the C API, then
creates a sub interpreter, manages the sub interpreter thread states
correctly and then executes the problem code in the context of that. It
isn't a straightforward thing that most would be able to do. Sub
interpreters can be created from pure Python code so you must use C code.
Is that what you have done so far to replicate it standalone?

Graham

Graham Dumpleton

Oct 8, 2011, 9:08:37 PM
to mod...@googlegroups.com
On 9 October 2011 12:08, Graham Dumpleton <graham.d...@gmail.com> wrote:
> The only way to reproduce it is going to be wrote a C program which
> embeds Python and manually initialises Python via C API and then
> creates sub interpreter, manages sub interpreter thread states
> correctly and then executes problem code in the context of that. It
> isn't straight forward thing that most would be able to do. Sub
> interpreters can be created from pure Python code so must use C code.

Sub interpreters can not be created ....

Ron Garret

Oct 9, 2011, 12:59:22 AM
to mod...@googlegroups.com

On Oct 8, 2011, at 6:08 PM, Graham Dumpleton wrote:

> The only way to reproduce it is going to be wrote a C program which
> embeds Python and manually initialises Python via C API and then
> creates sub interpreter, manages sub interpreter thread states
> correctly and then executes problem code in the context of that. It
> isn't straight forward thing that most would be able to do. Sub

> interpreters can [not] be created from pure Python code so must use C code.


> Is that what you have done so far to replicate it standalone?

No, I did a pure-python test. I thought that my mod_wsgi configuration wasn't using sub-interpreters because I'm running (I thought) in daemon mode with one process per wsgi application.
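A pure-Python standalone test of the kind described here might look like the sketch below (POSIX only, Python 3; fork_many is a hypothetical name). It forks repeatedly and records how each child ended; a child killed by Py_FatalError, which aborts, would show up as a negative signal number rather than an exit code. Because a plain python process runs this code in the main interpreter, a test like this would not be expected to hit the sub-interpreter failure.

```python
import os

def fork_many(n=4):
    """Fork n times; return each child's exit code (or -signal if killed)."""
    outcomes = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: leave immediately without running interpreter cleanup.
            os._exit(0)
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            outcomes.append(os.WEXITSTATUS(status))
        else:
            # e.g. SIGABRT raised by a fatal error in the child
            outcomes.append(-os.WTERMSIG(status))
    return outcomes
```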

rg

Graham Dumpleton

Oct 9, 2011, 1:24:14 AM
to mod...@googlegroups.com
Unless you have:

WSGIApplicationGroup %{GLOBAL}

you are running in a sub interpreter of the daemon process.

The %{GLOBAL} value to that directive says to use the main interpreter.

Graham
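One way to check which interpreter an application is actually running in is to look at the WSGI environ: mod_wsgi publishes the application group name as environ['mod_wsgi.application_group'], and it is an empty string when WSGIApplicationGroup %{GLOBAL} selects the main interpreter. A minimal sketch (Python 3; describe_interpreter is a hypothetical helper):

```python
def describe_interpreter(environ):
    """Say which Python interpreter mod_wsgi is using for this request."""
    group = environ.get('mod_wsgi.application_group')
    if group is None:
        return 'not running under mod_wsgi'
    if group == '':
        # WSGIApplicationGroup %{GLOBAL}: the main interpreter.
        return 'main interpreter'
    # Any other name identifies a sub interpreter of that name.
    return 'sub interpreter %r' % group

def application(environ, start_response):
    body = describe_interpreter(environ).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]
```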

Graham Dumpleton

Oct 12, 2011, 2:48:41 AM
to mod...@googlegroups.com
Here is the bug report I have lodged against Python, with a potential fix.

http://bugs.python.org/issue13156

Basically, you're screwed if you want to use Python 2.7.2 or 3.2.1+ and
fork subprocesses from an application running in mod_wsgi sub
interpreters.

You are forced to use the main interpreter.

Working around the problem in mod_wsgi would require a very naughty
hack to force initialisation of the auto thread state for the main
interpreter and then manually increment its reference count by one so
it isn't automatically deleted.

Graham

Ademir Francisco da Silva

Oct 22, 2011, 2:46:10 PM
to mod...@googlegroups.com
On 07/10/2011 09:26, Lisper wrote:
> BTW, I get the exact same result on Linux.

BTW ???

-- 
Ademir Francisco da Silva
Skype  ...: Ademir_Francisco_da_Silva

Ron Garret

Aug 9, 2012, 12:43:37 PM
to mod...@googlegroups.com
I'm not sure this is the right place to ask this question because it's not really about modwsgi but this is the best place I know to find expertise about WSGI in general.

Yesterday I fired up some old code using the wsgiref server and got the following error:

"Hop-by-hop headers not allowed"

This turned out to be caused by my code including a "Connection: close" header in order to work around an old Safari bug. The trick is, the last time I ran this code under wsgiref it worked, and the code hasn't changed. And when I run it under modwsgi it works.

So my question is: does anyone here know why wsgiref doesn't allow Connection headers? And did this change recently? Looking at the wsgiref code, it seems to reject Connection headers at least as far back as Python 2.6. My code is old, but it's not that old (less than two years). I'm pretty sure it has run successfully under Python 2.6, if not 2.7. It contains with statements, so the last time I ran it could not have been under anything earlier than 2.6.

I'm baffled. Can anyone here shed any light on this?

Thanks,
rg

Graham Dumpleton

Aug 9, 2012, 11:16:24 PM
to mod...@googlegroups.com, Web SIG
Probably better off asking on the Python WEB-SIG. I have cc'd this there.

http://www.python.org/community/sigs/current/web-sig/

Someone has probably felt that the wsgiref implementation should somehow
be checking for things which aren't notionally allowed but which go
beyond just API usage checks. Checking for hop-by-hop headers should
possibly have been the job of wsgiref.validate and not the server
in wsgiref.

I know of no other server which will outright error when a hop-by-hop
header is returned by an application, and as you note, there are
times when it is useful to pass back Connection to ensure that
the web server/client drops the current connection and doesn't try to
maintain a keep-alive connection.

Graham
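A workaround sketch, not part of the original thread: in wsgiref's BaseHandler.start_response the check is an assert (the source of the "Hop-by-hop headers not allowed" message), so one option is to wrap the application in a small middleware that drops hop-by-hop headers, as classified by the standard wsgiref.util.is_hop_by_hop helper, before wsgiref sees them. This silently discards the Connection: close the app asked for rather than honouring it. strip_hop_by_hop, legacy_app and collect_headers are illustrative names (Python 3).

```python
from wsgiref.util import is_hop_by_hop

def strip_hop_by_hop(app):
    """Wrap a WSGI app so hop-by-hop headers never reach the server."""
    def wrapper(environ, start_response):
        def filtered_start_response(status, headers, exc_info=None):
            kept = [(name, value) for name, value in headers
                    if not is_hop_by_hop(name)]
            return start_response(status, kept, exc_info)
        return app(environ, filtered_start_response)
    return wrapper

def legacy_app(environ, start_response):
    # Hypothetical app that sends 'Connection: close', as described above.
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Connection', 'close')])
    return [b'ok']

def collect_headers(app):
    """Call app in-process and return the headers passed to start_response."""
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.extend(headers)
        return lambda data: None

    app({}, start_response)
    return captured
```

Serving strip_hop_by_hop(legacy_app) with wsgiref.simple_server instead of legacy_app would avoid the assertion.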