Lazy Initialisation of Python (mod_wsgi, Django)

venu k

Jun 19, 2013, 5:17:25 AM

to mod...@googlegroups.com

Hi Graham,

Thank you for responding to my tweets. Sorry, I should have posted my query here.

This is httpd.conf:

ServerName bingo
WSGIPythonHome /usr/local
WSGILazyInitialization on
#WSGIRestrictEmbedded on

This is the bingo.conf file:

<VirtualHost *:80>
ServerName www.bingo.com
WSGIDaemonProcess venukulala-bingo user=venukulala group=venukulala processes=2 threads=5 python-eggs=/tmp/python-eggs/ python-path=/$
WSGIProcessGroup venukulala-bingo
WSGIScriptAlias / /var/www/bingoproject/bingo/wsgi.py

<Directory /var/www/bingoproject/>
        Order deny,allow
        Allow from all
</Directory>

Alias /static /usr/local/lib/python2.7/dist-packages/django/contrib/admin/static
<Directory/usr/local/lib/python2.7/dist-packages/django/contrib/admin/static >
        Order allow,deny
        Allow from all
        SetHandler None
        FileETag none
        Options FollowSymLinks
</Directory>

ErrorLog /var/log/apache2/bingo-error.log
LogLevel info
CustomLog /var/log/apache2/bingo-access.log combined
</VirtualHost>

I have tried changing WSGIPythonHome to the actual path in /usr/lib, and also creating a virtualenv and pointing to its path. Neither worked.

These are the errors I see in the error.log:

[Wed Jun 19 14:31:03 2013] [info] [client 10.74.152.157] mod_wsgi (pid=29245): Connect after WSGI daemon process restart, attempt #1.
[Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Stopping process 'venukulala-bingo'.
[Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Destroying interpreters.
[Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Destroy interpreter 'www.bingo.com|'.
[Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Cleanup interpreter ''.
[Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Terminating Python.
[Wed Jun 19 14:31:03 2013] [info] mod_wsgi (pid=7585): Create interpreter 'www.bingo.com|'.
[Wed Jun 19 14:31:03 2013] [info] [client 10.74.152.157] mod_wsgi (pid=7585, process='venukulala-bingo', application='www.bingo.com|'): Loading WSGI script '/var/www/bingoproject/bingo/wsgi.py'.
[Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Python has shutdown.
[Wed Jun 19 14:31:03 2013] [info] mod_wsgi (pid=7699): Attach interpreter '

Any help is greatly appreciated.

Thanks,

Venu

Graham Dumpleton

Jun 19, 2013, 7:01:11 AM

to mod...@googlegroups.com

Brain dump mode on.

At this point I am a bit confused about what specifically you are having a problem with, but let me explain in detail how startup of processes and loading of applications occurs.

Firstly, when Apache starts up, in its parent process it will load the mod_wsgi module for Apache. This is linked to the Python runtime library and thus the Python library is also loaded.

Under mod_wsgi 3.X, the Python interpreter is not itself initialised in the Apache parent process. Prior to mod_wsgi 3.0 it was initialised in the Apache parent process, but due to how recent Python versions are implemented, initialising Python itself in the Apache parent process can result in memory growth in the Apache parent process when an Apache restart/reload is done. This is because Python doesn't clean up after itself properly when the interpreter is destroyed. In other words, it leaks memory.

The reason we even get into this situation with mod_wsgi 2.X is that on an Apache restart/reload, the mod_wsgi module is told to clean up after itself. This in turn causes the Python interpreter to be destroyed. Apache then unloads the mod_wsgi module from the process, as well as the Python library. After rereading the configuration and seeing that mod_wsgi is still needed, it will load mod_wsgi again and reinitialise the Python interpreter. The problem is that because the Python library was unloaded, when initialisation is done again, it is against freshly zeroed out memory. Thus the memory objects that Python left around and didn't delete properly can't be reused when the interpreter is initialised again, as they would be if the interpreter was created, destroyed and created again in the same process memory space.

Anyway, the point of describing this is to indicate that using mod_wsgi 2.X is probably not a good idea at this point and mod_wsgi 3.X should always be used. Unfortunately Ubuntu 10.04 LTS only provides mod_wsgi 2.8 and so many people still use it. The risk in using mod_wsgi 2.X is that if you do many restart/reloads of Apache, rather than a complete stop/start, then that parent process can grow.

Now, once the Apache parent process is set up, it will then fork its child worker processes. These are the processes which then handle HTTP requests. The number of these is dictated by the MPM settings. Because these are forks of the parent, if that parent process does grow in size due to the above issues, then all the worker processes will in turn be using more memory. A further reason not to use mod_wsgi 2.X.

Although allowing Python to be initialised in the parent process is a bad idea now, the WSGILazyInitialization directive still exists. This defaults to On. When set to On, the Python interpreter will not be initialised in the parent. This is the default behaviour in mod_wsgi 3.X. You could set it to Off and restore the mod_wsgi 2.X behaviour, but don't go there.

With the lazy initialisation of the Python interpreter, once those Apache child worker processes are forked, only then will the Python interpreter be initialised. In being initialised, the main Python interpreter context will be created. This is equivalent to having used the command line Python.

If you are using WSGIDaemonProcess directive and delegating your WSGI applications to run in those separate process groups, then no actual WSGI application will run in the Apache child worker processes. This means that you don't actually need to have the Python interpreter be initialised in the Apache child worker processes. Doing so will just waste CPU and slow down setup of the Apache child worker processes, delaying how long before they can start accepting HTTP requests.

Thus, if you are using WSGIDaemonProcess directive and always delegating WSGI applications to the daemon process groups, then set:

WSGIRestrictEmbedded On

This will prevent initialisation of the Python interpreter in the Apache child worker processes, saving CPU and memory in those processes.

I talk about this whole problem more in:

http://blog.dscpl.com.au/2009/11/save-on-memory-with-modwsgi-30.html

In setting this directive, you will also get an error if you have managed to stuff up the configuration when using daemon process groups and haven't actually delegated the WSGI application to run in the daemon process group you created. This is actually really common because there are some stupid blog posts out there which are wrong.

In short, if you have:

WSGIDaemonProcess group-name

you must also be setting:

WSGIProcessGroup group-name

in an appropriate context, so that requests for the WSGI application are actually handled in the daemon process group.

If using mod_wsgi 3.X, you can also use the process-group option to WSGIScriptAlias. Thus:

WSGIScriptAlias / /some/path/wsgi.py process-group=group-name

If you don't have one of these being applied, then your WSGI application will still run in the Apache child worker processes, making the WSGIDaemonProcess directive pointless.

So, if only using daemon mode, make sure you set WSGIRestrictEmbedded and set it to On.

You can check whether a WSGI application is running in daemon mode by using the test in:

http://code.google.com/p/modwsgi/wiki/CheckingYourInstallation#Embedded_Or_Daemon_Mode
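A minimal sketch of such a check, assuming mod_wsgi exposes the process group and application group through the `mod_wsgi.process_group` and `mod_wsgi.application_group` keys of the WSGI environ, with an empty process group meaning embedded mode and an empty application group meaning the main interpreter. The helper name `describe_mode` is purely illustrative and not part of mod_wsgi:

```python
def describe_mode(environ):
    """Summarise where this request is being handled.

    Assumes the 'mod_wsgi.*' environ keys are present under mod_wsgi;
    falls back to empty strings so it can also be called outside Apache.
    """
    process_group = environ.get('mod_wsgi.process_group', '')
    application_group = environ.get('mod_wsgi.application_group', '')
    # An empty process group means the Apache child worker process itself.
    mode = 'DAEMON MODE' if process_group else 'EMBEDDED MODE'
    # An empty application group means the main interpreter.
    interpreter = application_group or 'main interpreter'
    return 'mode=%s\nprocess_group=%r\napplication_group=%s\n' % (
        mode, process_group, interpreter)


def application(environ, start_response):
    """Throwaway WSGI application that reports the mode as plain text."""
    body = describe_mode(environ).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Temporarily mounting this in place of the real WSGI script file and requesting any URL should show whether the request was delegated to the daemon process group you expected.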

You can also ensure you set:

LogLevel info

and mod_wsgi will log messages about when it is loading WSGI scripts and will tell you which process group it is loading them in. If you see an empty string for the process group, the application is actually running in the Apache child worker processes, which is what is referred to as embedded mode.
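As a rough sketch, the error log can be scanned for those messages with a few lines of Python. The pattern below is an assumption based only on the "Loading WSGI script" line quoted earlier in this thread, so the exact wording may differ between mod_wsgi versions. The names `LOAD_RE` and `find_script_loads` are illustrative:

```python
import re

# Pattern modelled on the "Loading WSGI script" line quoted in this thread.
LOAD_RE = re.compile(
    r"mod_wsgi \(pid=(?P<pid>\d+), process='(?P<process>[^']*)', "
    r"application='(?P<application>[^']*)'\): "
    r"Loading WSGI script '(?P<script>[^']*)'"
)


def find_script_loads(lines):
    """Return one dict per 'Loading WSGI script' message found in lines."""
    loads = []
    for line in lines:
        match = LOAD_RE.search(line)
        if match:
            info = match.groupdict()
            # An empty process name means the script was loaded in an
            # Apache child worker process, i.e. embedded mode.
            info['embedded'] = info['process'] == ''
            loads.append(info)
    return loads
```

For example, `find_script_loads(open('/var/log/apache2/bingo-error.log'))` would list each load recorded in the error log configured above, and any entry with `embedded` set to True points at a WSGI application that was not delegated to a daemon process group.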

Now, no matter whether running in embedded mode or daemon mode, by default each mounted WSGI application, whether set up by WSGIScriptAlias or AddHandler/SetHandler, will run in a separate interpreter within each process.

When Python is first initialised in a process, be that the Apache child worker processes or daemon mode processes, it will always create the main interpreter context, the one which is equivalent to what you get when running the command line interpreter.

This main interpreter by default isn't actually used. Instead each WSGI application is run in a separate sub interpreter created for it within the same process. Although the main interpreter is created when the process is first forked and Python initialised, these sub interpreters are not. Instead these application specific sub interpreters are only created on demand when the first request comes in for a WSGI application.

This is done because, when using AddHandler/SetHandler, it is by default not possible to know in advance what the name of the sub interpreter context to create should be: one does not know which WSGI scripts exist, since the mapping to the WSGI script file in the file system is dynamic.

The fact that WSGIProcessGroup exists, and that the process group can also be set dynamically through mod_rewrite rules, means that even for WSGIScriptAlias you can't be sure in advance what to create.

The end result is that the sub interpreter for a specific WSGI application is only created on demand, in the process handling the request, the first time a request is handled by that WSGI application. Thus the sub interpreter is lazily created, which is why at 'info' logging level you will see mod_wsgi saying the sub interpreter is only created on the first request.

If you are using daemon process groups and only one WSGI application runs per daemon process group, you can avoid this lazy creation of the sub interpreter by forcing the WSGI application to run in the main interpreter context created when Python was initialised in the process.

This is done using:

WSGIApplicationGroup %{GLOBAL}

or:

WSGIScriptAlias / /some/path/wsgi.py process-group=group-name application-group=%{GLOBAL}

Because it runs in the main interpreter, it saves on the extra memory used by the sub interpreter. More importantly, it avoids problems with third party C extension modules for Python that aren't implemented correctly to work in sub interpreters.

So, if using daemon mode and delegating one WSGI application to a daemon process group, always set the application group (interpreter context) to %{GLOBAL}.

If you had multiple WSGI applications running in the same daemon process group, you can't necessarily force them to all run in the same main interpreter, as some frameworks such as Django will not allow you to run multiple Django sites in the same interpreter. In that case you are better off delegating each Django instance to a separate daemon process group and then forcing use of the main interpreter.

If using embedded mode, seriously consider using daemon mode with one WSGI application to each daemon process group.

So that is how you can eliminate use of sub interpreters and the apparent lazy creation of them.

The next issue is the lazy loading of the WSGI script file itself and thus the lazy loading of your WSGI application on the first request. This is done lazily for the same reasons as above: you just cannot know what WSGI applications may need to be loaded, and into what process/interpreter, until the request actually arrives.

There is, though, also a way of forcing preloading of the WSGI script file.

If you are using mod_wsgi 3.X and you say:

WSGIScriptAlias / /some/path/wsgi.py process-group=group-name application-group=%{GLOBAL}

That is, you set both process-group and application-group options at the same time, it is saying in advance that that WSGI application will always run in that context no matter what other settings such as WSGIProcessGroup and WSGIApplicationGroup may say.

As a result, when mod_wsgi sees both options set on WSGIScriptAlias, it will preload that WSGI script file when the process first starts rather than on the first request.

If you are using mod_wsgi 2.X, a bit more work has to be done as those options aren't accepted by WSGIScriptAlias on older mod_wsgi. In that case you need to use:

WSGIScriptAlias / /some/path/wsgi.py

WSGIProcessGroup group-name

WSGIApplicationGroup %{GLOBAL}

WSGIImportScript /some/path/wsgi.py process-group=group-name application-group=%{GLOBAL}

The WSGIImportScript directive says to preload that Python code file when the process starts. The WSGIProcessGroup and WSGIApplicationGroup in the context where the WSGIScriptAlias is applied must then match the process and application group names given for WSGIImportScript. In other words, the static and dynamic settings must match. If they don't align, you will end up loading the WSGI script file more than once, in two distinct interpreter contexts, and waste memory.

So, use mod_wsgi 3.X and the options to WSGIScriptAlias to avoid mistakes.

Finally, although you can force preloading of the WSGI script file on process start rather than on the first request, this doesn't mean your whole application will load. This is because some frameworks such as Django will only initialise themselves upon the first request occurring. Thus, to force the WSGI application to initialise and potentially load stuff, you would need to fake a web request against the WSGI application at the point the WSGI script file is being loaded. The easiest way to do this is to use WebTest.

For example, if you have:

import os
os.environ["DJANGO_SETTINGS_MODULE"] = "mysite.settings"

from django.core.wsgi import get_wsgi_application

application = get_wsgi_application()

at the end of the WSGI script file you would add:

from webtest import TestApp
testapp_wrapper = TestApp(application)

testapp_wrapper.get('/')

Because Django can also lazily load parts of the application at the time specific URLs are hit, you may want to make requests in this way against a few key URLs to get important parts of your application loaded.
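If pulling in WebTest is not an option, a similar warm-up can be sketched with only the standard library by calling the WSGI application directly with a synthetic environ. This is an illustrative alternative and not something mod_wsgi or Django provides. The `warm_up` name, the `host` default and the example paths are assumptions, and the host would need to be one the application accepts:

```python
from wsgiref.util import setup_testing_defaults


def warm_up(application, paths, host='localhost'):
    """Issue one synthetic GET per path; return {path: status or error}."""
    results = {}
    for path in paths:
        environ = {
            'REQUEST_METHOD': 'GET',
            'SCRIPT_NAME': '',
            'PATH_INFO': path,
            'SERVER_NAME': host,
            'HTTP_HOST': host,
        }
        # Fills in the remaining keys a WSGI application expects
        # (wsgi.input, wsgi.url_scheme, SERVER_PORT, ...).
        setup_testing_defaults(environ)
        captured = []

        def start_response(status, headers, exc_info=None):
            captured.append(status)
            return lambda data: None  # legacy write() callable

        try:
            response = application(environ, start_response)
            try:
                for _ in response:  # consume the body so views fully run
                    pass
            finally:
                close = getattr(response, 'close', None)
                if close is not None:
                    close()
            results[path] = captured[0] if captured else None
        except Exception as exc:
            # A failed warm-up should not stop the WSGI script loading.
            results[path] = 'error: %r' % (exc,)
    return results
```

At the end of the WSGI script file this could be called as `warm_up(application, ['/', '/admin/'])`, where the paths are placeholders for whichever URLs matter for your site.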

Enough. Brain dump mode off. Hopefully I didn't make too many mistakes in that. I have been getting too many queries about related stuff in recent times, so about time I got all that out so I can just refer to it all in one place.

Graham

venu k

Jun 23, 2013, 12:10:09 AM

to mod...@googlegroups.com

Hi Graham,

Thank you for your explanation. I was able to troubleshoot this: it is due to the lazy loading of Django itself.

I have installed the admin app in settings.py.

Observation 1 :

In the mysite.conf file, at first I didn't specify the path for the static files related to the admin module.

When I open the site at http://mysite/admin, the request is very fast and I immediately get the login page for Django Administration.

Observation 2 :

I copied the static files under /usr/local/lib/python2.7/django/contrib/static to the bingoapp/static folder.

In the .conf file, I specified the path for /static as bingoapp/static.

The request is too slow. It takes a lot of time to load the login page.

Observation 3 :

This time, in the conf file, I pointed the /static directory directly to /usr/local/lib/python2.7/django/contrib/static.

Comparatively, it is faster than "Observation 2" but not as immediate as "Observation 1".

But in both observations 2 and 3, even though I specify the static directory, it doesn't serve the CSS files.

So the issue may not be related to mod_wsgi.

Regards,

Venu
