Performance; PHP vs CGI vs mod_wsgi


Joe

Sep 3, 2008, 7:59:02 PM
to mod...@googlegroups.com
Hi,

I've just converted a PHP (Apache 2.2 mod_php5) application first to
Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd
representative pages because I wasn't very impressed by the observed
performance.

The conversion to CGI was almost a line-for-line translation of PHP to
Python. The PHP code did not use templates and had a few classes,
mostly to interface with a Postgres db, using standard PHP pg_xxx
calls. For Python, I used psycopg2. The conversion to mod_wsgi
replaced print statements by accumulating the output, e.g., stream += 'more
stuff\n'. It also uses a very simplistic dispatcher that examines
PATH_INFO in an if/elif/else construct.

I used 'ab' for testing, invoking it with -n 100 and capturing the 'Time
per request (mean)'. I ran the tests twice to ensure the results were
comparable. The overall total for the PHP pages was about 14 seconds,
for CGI about 23 seconds and for WSGI about 35 seconds.

For the pages that do no database access, PHP took on average 6.6 ms,
CGI 123 ms and WSGI 113 ms. However, for these pages the WSGI results
were very uneven, with a low of 13.6 ms and a high of 246.6 ms, vs.
6.0-7.3 ms in PHP and 113.3-140.5 ms in CGI. OTOH, the WSGI results were
roughly correlated to the amount of text on each page.

For the pages that do a single db access, for a simple nav menu, PHP
used 132 ms, CGI 321 ms and WSGI 229 ms. For the pages that retrieve a
non-existent object (and also include the nav menu), the results were:
PHP 164 ms, CGI 364 ms, WSGI 165 ms. For the remainder of the pages,
which do multiple db retrievals, PHP took an average of 438 ms, CGI 642
ms and WSGI 1137 ms.

Based on what I had read about mod_wsgi, I had expected generally better
results than for CGI so I was surprised by the above. Since I don't
have much experience with Python web apps, I am wondering if the results
can be explained just by the simplistic dispatcher and string
concatenation or if there is something else I should be doing or
checking. Note the current WSGI app will not stay as-is, but I'd like
to understand what may be affecting performance.

Joe

Brett Hoerner

Sep 3, 2008, 8:10:25 PM
to mod...@googlegroups.com
On Wed, Sep 3, 2008 at 6:59 PM, Joe <d...@freedomcircle.net> wrote:
> I've just converted a PHP (Apache 2.2 mod_php5) application first to
> Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd
> representative pages because I wasn't very impressed by the observed
> performance.

Can you please post the relevant parts of your Apache configuration,
especially under mod_wsgi? It really, really can affect any sort of
benchmarks if you're "doin' it wrong".

Brett

Graham Dumpleton

Sep 3, 2008, 8:15:23 PM
to mod...@googlegroups.com
2008/9/4 Joe <d...@freedomcircle.net>:

>
> Hi,
>
> I've just converted a PHP (Apache 2.2 mod_php5) application first to
> Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd
> representative pages because I wasn't very impressed by the observed
> performance.
>
> The conversion to CGI was almost a line-for-line translation of PHP to
> Python. The PHP code did not use templates and had a few classes,
> mostly to interface with a Postgres db, using standard PHP pg_xxx
> calls. For Python, I used psycopg2. The conversion to mod_wsgi
> replaced print statements by accumulating the output, e.g., stream += 'more
> stuff\n'.

Which is inefficient, as every time you append to the string it needs
to allocate a new string, copy the old contents into it and then append
the extra text.

You should look at using StringIO instead:

import StringIO
output = StringIO.StringIO()

print >> output, 'more'
print >> output, 'more'

result = output.getvalue()
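For readers on Python 3, where the `StringIO` module and the `print >>` syntax no longer exist, a minimal sketch of the same idea uses `io.StringIO` with `print(..., file=...)`, or simply collects the pieces in a list and joins them once at the end. The `render_page_*` functions and the page markup are hypothetical. Note that CPython can sometimes perform `s += t` in place, so the actual gain is worth measuring rather than assuming.

```python
import html
import io


def render_page_stringio(title, rows):
    """Write the page into an in-memory text buffer instead of using +=."""
    out = io.StringIO()
    print("<html><head><title>%s</title></head><body>" % html.escape(title), file=out)
    for row in rows:
        print("<p>%s</p>" % html.escape(row), file=out)
    print("</body></html>", file=out)
    return out.getvalue()


def render_page_join(title, rows):
    """Collect the pieces in a list and join them once at the end."""
    parts = ["<html><head><title>%s</title></head><body>\n" % html.escape(title)]
    parts.extend("<p>%s</p>\n" % html.escape(row) for row in rows)
    parts.append("</body></html>\n")
    return "".join(parts)
```

Both functions build the same text; the list-and-join form is the more common Python idiom when the output is assembled in one place.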

> It also uses a very simplistic dispatcher that examines
> PATH_INFO in an if/elif/else construct.

Without seeing what you have done I can't comment, but a long
if/elif/else construct wouldn't be efficient.
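To illustrate why a dictionary lookup usually scales better than a long if/elif/else chain (the lookup is roughly constant time, whereas the chain tests each condition in turn), here is a minimal Python 3 sketch. The paths and handler functions are hypothetical placeholders, not code from this thread; routes that carry a parameter in a path prefix are still checked in order.

```python
import html


# Hypothetical page handlers; each returns the page body as bytes.
def index_page(environ):
    return b"<h1>index</h1>"


def about_page(environ):
    return b"<h1>about</h1>"


def item_page(environ, param):
    return ("<h1>item %s</h1>" % html.escape(param)).encode("utf-8")


# Exact paths are resolved with a single dictionary lookup.
EXACT_ROUTES = {
    "": index_page,
    "/": index_page,
    "/index": index_page,
    "/about": about_page,
}

# Prefix routes pass the remainder of PATH_INFO to the handler.
PREFIX_ROUTES = [("/item/", item_page)]


def dispatch(environ, start_response):
    path_info = environ.get("PATH_INFO", "")
    handler = EXACT_ROUTES.get(path_info)
    if handler is not None:
        body = handler(environ)
    else:
        for prefix, prefix_handler in PREFIX_ROUTES:
            if path_info.startswith(prefix):
                body = prefix_handler(environ, path_info[len(prefix):])
                break
        else:
            body = ("Path %s not implemented yet" % path_info).encode("utf-8")
            start_response("404 Not Found",
                           [("Content-Type", "text/plain; charset=utf-8")])
            return [body]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    # Return a list holding one bytestring, not the bare bytestring.
    return [body]
```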

Sounds like you would have been better off using Apache to do dispatch
for URLs by having handlers for each URL in separate files. This would
be closer to what you had with PHP where each was in a separate file
as well.

One though would perhaps want to ensure that all the URL handler WSGI
files were delegated to run in the same Python interpreter instance
rather than the default of using a separate one for each.

> I used 'ab' for testing, invoking it with -n 100 and capturing the 'Time
> per request (mean)'.

Using a small number of requests like that will give unreliable
results for various reasons, including Apache activating additional
processes, lazy loading of the WSGI application, etc.

I would never consider fewer than 3000-5000 requests, possibly more
depending on the application being tested, and you need to ensure that
the processes are correctly primed.
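`ab` against the real Apache server remains the right tool, but the two points above (a large sample, and discarding warm-up requests) can be illustrated with a minimal in-process Python 3 sketch. It calls a WSGI application directly, so it measures only the Python side and none of Apache's overhead; the `hello_app` application and the `time_requests` helper are hypothetical.

```python
import statistics
import time
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Hypothetical application under test."""
    body = b"<p>hello</p>"
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body]


def time_requests(app, n=5000, warmup=500):
    """Time n calls of app after `warmup` untimed calls; returns seconds per call."""

    def start_response(status, headers, exc_info=None):
        return lambda data: None  # the legacy write() callable, unused here

    def one_request():
        environ = {}
        setup_testing_defaults(environ)  # fill in a minimal WSGI environ
        result = app(environ, start_response)
        try:
            for _ in result:  # consume the body as a server would
                pass
        finally:
            if hasattr(result, "close"):
                result.close()

    for _ in range(warmup):  # prime imports, caches, lazy initialisation
        one_request()

    samples = []
    for _ in range(n):
        start = time.perf_counter()
        one_request()
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    samples = time_requests(hello_app)
    print("mean %.3f ms, stdev %.3f ms over %d requests"
          % (statistics.mean(samples) * 1000,
             statistics.stdev(samples) * 1000, len(samples)))
```

Reporting the spread alongside the mean makes uneven results, like the 13.6-246.6 ms range reported above, visible at a glance.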

> I ran the tests twice to ensure the results were
> comparable. The overall total for the PHP pages was about 14 seconds,
> for CGI about 23 seconds and for WSGI about 35 seconds.
>
> For the pages that do no database access, PHP took on average 6.6 ms,
> CGI 123 ms and WSGI 113 ms. However, for these pages the WSGI results
> were very uneven, with a low of 13.6 ms and a high of 246.6 ms, vs.
> 6.0-7.3 ms in PHP and 113.3-140.5 ms in CGI. OTOH, the WSGI results were
> roughly correlated to the amount of text on each page.

Which shows as I said that such a small number of requests can yield
quite unreliable results.

> For the pages that do a single db access, for a simple nav menu, PHP
> used 132 ms, CGI 321 ms and WSGI 229 ms. For the pages that retrieve a
> non-existent object (and also include the nav menu), the results were:
> PHP 164 ms, CGI 364 ms, WSGI 165 ms. For the remainder of the pages,
> which do multiple db retrievals, PHP took an average of 438 ms, CGI 642
> ms and WSGI 1137 ms.
>
> Based on what I had read about mod_wsgi, I had expected generally better
> results than for CGI so I was surprised by the above. Since I don't
> have much experience with Python web apps, I am wondering if the results
> can be explained just by the simplistic dispatcher and string
> concatenation or if there is something else I should be doing or
> checking. Note the current WSGI app will not stay as-is, but I'd like
> to understand what may be affecting performance.

Can you post some examples of your code. We can then evaluate it and
suggest better ways of doing things.

BTW, one also has to be careful about comparing PHP to Python, as the
ways the hosting mechanisms work are quite different. For a discussion
of the principal differences see:

http://blog.ianbicking.org/2008/01/12/what-php-deployment-gets-right/

Graham

Joe

Sep 3, 2008, 8:33:39 PM
to mod...@googlegroups.com

Sorry, I meant to do that, but I hit Send before I remembered.

There's really not much to it:

LoadModule wsgi_module /usr/lib/apache2/modules/mod_wsgi.so

(This is mod_wsgi 2.1-2 from Debian)

# CGI testing
Alias /static/ "/var/www/pycgi/static/"
ScriptAlias /fccgi/ "/var/www/pycgi/"
AddHandler cgi-script .py

# mod_wsgi testing
WSGIScriptAlias /fcwsgi /var/www/pywsgi/fcdir.wsgi

There was a WSGIReloadMechanism Module while I was converting but I
removed it for the tests.

Joe

Graham Dumpleton

Sep 3, 2008, 8:40:55 PM
to mod...@googlegroups.com
2008/9/4 Joe <d...@freedomcircle.net>:

Which indicates one script for all URLs. I am presuming you had one
script for each URL with PHP rather than doing dispatching within PHP.

As I said before, Apache/mod_wsgi can still do the dispatching for
you, like with PHP, and it is usually going to be quicker than you
doing it yourself.

> There was a WSGIReloadMechanism Module while I was converting but I
> removed it for the tests.

The default for WSGIReloadMechanism in embedded mode is 'Module' so
setting it explicitly wouldn't have made a difference and reloading is
still on. Having it on shouldn't affect the performance to any
noticeable degree anyway.

Now, what does your actual WSGI application script contain? If you're
worried that the code doing the work can't be shown, at least indicate
how you are doing the main dispatching from the application entry point
and stub out the handler function code. Maybe leave one representative
handler function in there though.

Graham

Joe

Sep 3, 2008, 8:50:22 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> Sounds like you would have been better off using Apache to do dispatch
> for URLs by having handlers for each URL in separate files. This would
> be closer to what you had with PHP where each was in a separate file
> as well.
>

I was hoping to eventually move to a more intelligent dispatcher.

> One though would perhaps want to ensure that all the URL handler WSGI
> files were delegated to run in the same Python interpreter instance
> rather than the default of using a separate one for each.
>

I'm not quite sure I understand ("one though"?).

> Which shows as I said that such a small number of requests can yield
> quite unreliable results.
>

I understand, but this was just a proof of concept to get a general
idea of how it performs.

> Can you post some examples of your code. We can then evaluate it and
> suggest better ways of doing things.
>
> BTW, one also has to be careful about comparing PHP to Python, as the
> ways the hosting mechanisms work are quite different. For a discussion
> of the principal differences see:
>
> http://blog.ianbicking.org/2008/01/12/what-php-deployment-gets-right/
>

Thanks for your comments. I'll try StringIO and report further if
necessary.

Joe

Graham Dumpleton

Sep 3, 2008, 8:54:11 PM
to mod...@googlegroups.com
2008/9/4 Joe <d...@freedomcircle.net>:

> Thanks for your comments. I'll try StringIO and report further if
> necessary.

If you want to work at a low level, i.e., you don't want to use one of
the big frameworks, might I suggest you look at:

http://werkzeug.pocoo.org/

Graham

Joe

Sep 3, 2008, 9:20:40 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> Which indicates one script for all URLs. I am presuming you had one
> script for each URL with PHP rather than doing dispatching within PHP.
>
> As I said before, Apache/mod_wsgi can still do the dispatching for
> you, like with PHP, and it is usually going to be quicker than you
> doing it yourself.
>

I guess I was misled by looking at things like Django/TG/Pylons/Routes
(and even Trac) that mostly do dispatching in the app. I assume that by
letting Apache/mod_wsgi do the dispatching you mean defining
WSGIScriptAlias for each partial path desired. After merging some PHP
files, I still have about 14 entry paths, which is doable, but less
flexible.

> The default for WSGIReloadMechanism in embedded mode is 'Module' so
> setting it explicitly wouldn't have made a difference and reloading is
> still on. Having it on shouldn't affect the performance to any
> noticeable degree anyway.
>
> Now, what does your actual WSGI application script contain? If you're
> worried that the code doing the work can't be shown, at least indicate
> how you are doing the main dispatching from the application entry point
> and stub out the handler function code. Maybe leave one representative
> handler function in there though.
>

Here's roughly what it looks like (fcdir.wsgi and fcdir.py could of
course be merged):

--- fcdir.wsgi ---
import sys

run_path = '/var/www/pywsgi'
if run_path not in sys.path:
    sys.path.insert(0, run_path)

import fcdir
application = fcdir.dispatch

--- fcdir.py ---
import index, module1, module2

def dispatch(environ, start_response):
    # some config file stuff

    path_info = environ['PATH_INFO']
    if path_info == '' or path_info == '/' or path_info == '/index':
        response = index.render(environ)
    elif path_info[:7] == '/xxxx/':
        xxxparam = path_info[7:]
        response = module1.render(environ, xxxparam)
    elif path_info == '/xxxxx':
        response = module2.render(environ)
    ....
    else:
        start_response('404 Not Found', [])
        return ['Path %s not implemented yet' % path_info]
    start_response('200 OK', [('Content-type','text/html')])
    return response

--- moduleX.py ---
...

def render(environ):
    title = 'something'
    stream = print_header(title, ...)
    stream += page_body()
    stream += print_footer()
    return stream

As mentioned, this was a quick POC to verify that it was doable and
produced the correct results.

Joe

Graham Dumpleton

Sep 3, 2008, 10:15:05 PM
to mod...@googlegroups.com
Your WSGI application entry point is coded wrongly. See comments
below, but at least make the fix mentioned below.

2008/9/4 Joe <d...@freedomcircle.net>:


>
> Graham Dumpleton wrote:
>> Which indicates one script for all URLs. I am presuming you had one
>> script for each URL with PHP rather than doing dispatching within PHP.
>>
>> As I said before, Apache/mod_wsgi can still do the dispatching for
>> you, like with PHP, and it is usually going to be quicker than you
>> doing it yourself.
>>
>
> I guess I was misled by looking at things like Django/TG/Pylons/Routes
> (and even Trac) that mostly do dispatching in the app.

You weren't misled, but Python people do like to do everything in
Python. Doing it all in Python does mean you can test outside of
Apache, which can be a benefit for many things.
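As a small illustration of testing outside Apache, Python 3's standard `wsgiref.simple_server` can serve a WSGI application directly. The sketch below (the `application`, `QuietHandler` and `fetch_once` names are hypothetical) starts a throwaway server on a free local port, makes one request to it and shuts it down; it is a development aid only, not a production server.

```python
import threading
import urllib.request
from wsgiref.simple_server import WSGIRequestHandler, make_server


def application(environ, start_response):
    """Hypothetical WSGI app that echoes the PATH_INFO it received."""
    body = ("PATH_INFO=%s" % environ.get("PATH_INFO", "")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress the per-request access log on stderr


def fetch_once(path):
    """Serve exactly one request on an ephemeral port and return (status, body)."""
    httpd = make_server("127.0.0.1", 0, application, handler_class=QuietHandler)
    httpd.timeout = 5  # handle_request() gives up if no request arrives
    port = httpd.server_address[1]
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    # Bypass any proxy settings from the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open("http://127.0.0.1:%d%s" % (port, path), timeout=5) as resp:
            return resp.status, resp.read()
    finally:
        worker.join()
        httpd.server_close()
```

Calling `fetch_once("/index")` exercises the same application object that mod_wsgi would load, without any Apache configuration.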

I guessed that since you came from PHP background you may find the one
file per handler model more familiar. :-)

> I assume that by
> letting Apache/mod_wsgi do the dispatching you mean defining
> WSGIScriptAlias for each partial path desired. After merging some PHP
> files, I still have about 14 entry paths, which is doable, but less
> flexible.

You don't need a WSGIScriptAlias for each URL. There are a few options,
but I will just explain the one that seems to best match all of what you
appear to be wanting to do.

# Map to directory of WSGI script files.
Alias /fcwsgi/ /var/www/pywsgi/

<Directory /var/www/pywsgi>

# Map .wsgi extension to mod_wsgi.
AddHandler wsgi-script .wsgi

# Allow executable scripts in directory and MultiViews so can
# leave off .wsgi extension.
Options ExecCGI MultiViews
MultiviewsMatch Handlers

# Map request against directory or an unknown resource to index.wsgi.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.wsgi/$1 [QSA,PT,L]

# Force all to run in same Python interpreter instance.
# Don't do this though if they can't coexist together.
WSGIApplicationGroup %{GLOBAL}

</Directory>

Your index.wsgi file in that directory would then contain:

import index

def application(environ, start_response):
    response = index.render(environ)

    start_response('200 OK', [('Content-type','text/html')])

    return [response]

Take very close note here: your previous code was broken, as you were
returning a string object from the WSGI application rather than a list
containing a single string object. I.e., you should have had:

return [response]

That you returned a string meant Apache/mod_wsgi was flushing after
each individual character in the string, which would have caused bad
performance. Make just that change and you may find it works a lot
better.
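The difference can be checked without Apache using the standard library's `wsgiref.validate.validator`, which wraps an application and raises `AssertionError` when it breaks the WSGI rules, including returning a bare string as the response iterable. A minimal Python 3 sketch, where response bodies must be bytes; `good_app`, `bad_app` and `run` are hypothetical names:

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator

BODY = b"<html><body>hello</body></html>"


def good_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [BODY]  # an iterable holding one bytestring


def bad_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return BODY  # a bare bytestring: a server would iterate it piece by piece


def run(app):
    """Call app through the WSGI validator and return (status, body chunks)."""
    environ = {"QUERY_STRING": ""}
    setup_testing_defaults(environ)  # adds the remaining required keys
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        return lambda data: None  # legacy write() callable, unused

    result = validator(app)(environ, start_response)
    try:
        chunks = list(result)
    finally:
        result.close()  # the validator insists the iterable is closed
    return seen["status"], chunks
```

`run(good_app)` returns the status and a single chunk, while `run(bad_app)` raises `AssertionError` before any body is produced.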

Anyway, index.wsgi would get mapped for URLs:

/fcwsgi/
/fcwsgi/index
/fcwsgi/index.wsgi

Plus:

/fcwsgi/non-existent-resource.ext

That is, it will map to index.wsgi if the URL didn't otherwise find a
static file or resource to handle the request. This may not actually be
desirable, in which case index.wsgi should filter out that case.
Alternatively, don't use the rewrite rules in the configuration above,
meaning that only URLs:

/fcwsgi/index
/fcwsgi/index.wsgi

would work.

If you do that, the above Apache configuration can actually be
simplified even further, with WSGIScriptAlias being used to map a URL
mount point to a directory of scripts rather than a single one.

Now, your file xxxxx.wsgi would similarly contain:

import module2

def application(environ, start_response):
    response = module2.render(environ)

    start_response('200 OK', [('Content-type','text/html')])

    return [response]

The URLs for it would be:

/fcwsgi/xxxxx
/fcwsgi/xxxxx.wsgi

Note though that for this and index.wsgi, they can also accept
additional path information:

/fcwsgi/xxxxx/extra/path/info
/fcwsgi/xxxxx.wsgi/extra/path/info

as the default is that Apache will allow extra path information.

Thus for your /xxxx/ case, script would be xxxx.wsgi and the handler
just needs to do the right thing.
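A minimal Python 3 sketch of what such an xxxx.wsgi might contain, assuming Apache delivers the extra path after the script name in `PATH_INFO` (so `/fcwsgi/xxxx/abc` arrives with `PATH_INFO` equal to `/abc`). The parameter handling and page markup are hypothetical stand-ins for Joe's `module1.render(environ, xxxparam)`.

```python
import html


def application(environ, start_response):
    # Everything after the script name, e.g. "/abc" (assumed Apache behaviour).
    param = environ.get("PATH_INFO", "").lstrip("/")
    if not param:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"No item given"]
    body = "<p>item: %s</p>" % html.escape(param)
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```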

If only certain URLs are supposed to accept additional path
information, Apache can be used to control it:

AcceptPathInfo Off

<Files xxxx.wsgi>
AcceptPathInfo On
</Files>

So that pushes this aspect of routing URLs onto Apache as well.

Since routing is now being put onto Apache, there may not be much
point in having the separation you have between the .wsgi file and the
.py file. Ie., how you have index.wsgi and index.py.

The model as described above now gets you closer to the file based
resource model of PHP where each file handles a single request and
with Apache being used to do routing.

Yes one can do things this way and for small scripts which need to be
super efficient having Apache do routing will be quicker than doing
dispatch in Python, but overall you may be better just going to Python
based routing from an existing toolkit/framework rather than trying to
roll your own.

For now at least, make that change to return a list of strings rather
than a string.

Graham

Joe

Sep 3, 2008, 10:15:02 PM
to mod...@googlegroups.com
Joe wrote:
> --- fcdir.py ---
> import index, module1, module2
>
> def dispatch(environ, start_response):
>     # some config file stuff
>
>     path_info = environ['PATH_INFO']
>     if path_info == '' or path_info == '/' or path_info == '/index':
>         response = index.render(environ)
>     elif path_info[:7] == '/xxxx/':
>         xxxparam = path_info[7:]
>         response = module1.render(environ, xxxparam)
>     elif path_info == '/xxxxx':
>         response = module2.render(environ)
>     ....
>     else:
>         start_response('404 Not Found', [])
>         return ['Path %s not implemented yet' % path_info]
>     start_response('200 OK', [('Content-type','text/html')])
>     return response
>

Now this is weird. I just realized that the response ought to be a
list, so I changed the last line accordingly and now I'm seeing results
similar to or even better than PHP.

Joe

Brett Hoerner

Sep 3, 2008, 11:28:51 PM
to mod...@googlegroups.com
If you were giving a string when an iterable (list) was expected, it
would have iterated over each of the characters, one at a time. I'm
not sure what mod_wsgi does at that point, but if it were pushing out
one character at a time with a syscall, that would definitely have
hurt.

for x in "this is a test":
    print x

t
h
i
s

i
s

a

t
e
s
t

Brett

Graham Dumpleton

Sep 4, 2008, 12:02:46 AM
to mod...@googlegroups.com
2008/9/4 Brett Hoerner <bretth...@gmail.com>:

>
> If you were giving a string when an iterable (list) was expected, it
> would have iterated over each of the characters, one at a time. I'm
> not sure what mod_wsgi does at that point, but if it were pushing out
> one character at a time with a syscall, that would definitely have
> hurt.

It is worse than that, as each character is pushed one at a time
through the Apache output filter brigade, thus even more overhead than
simply calling write() with a single character at a time.
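The cost grows with the number of chunks, because a server does its per-chunk work (a write call or, in mod_wsgi's case, a pass through Apache's output filters) once for every item the response iterable yields. A toy Python 3 sketch makes the count visible. It uses `str` to mimic the Python 2 behaviour in this thread (iterating a `str` yields one-character strings), and the `serve` loop is a hypothetical stand-in for a real server, not mod_wsgi's code.

```python
PAGE = "<html><body>" + "x" * 1000 + "</body></html>"


def returns_list(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [PAGE]  # one item: one write


def returns_string(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return PAGE  # iterated character by character: one write per character


def serve(app, environ=None):
    """Toy server loop: records one write per item the app yields."""
    writes = []

    def start_response(status, headers, exc_info=None):
        return writes.append  # the legacy write() callable

    for chunk in app(environ or {}, start_response):
        writes.append(chunk)  # a real server would push this chunk out here
    return writes


if __name__ == "__main__":
    for app in (returns_list, returns_string):
        print(app.__name__, len(serve(app)), "writes")
```

Both versions deliver identical bytes to the client; only the number of trips through the output path differs.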

Graham
