I've just converted a PHP (Apache 2.2 mod_php5) application first to Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd representative pages because I wasn't very impressed by the observed performance.
The conversion to CGI was almost a line-for-line translation of PHP to Python. The PHP code did not use templates and had a few classes, mostly to interface with a Postgres db, using standard PHP pg_xxx calls. For Python, I used psycopg2. The conversion to mod_wsgi replaced the print statements with accumulation of the output, e.g., stream += 'more stuff\n'. It also uses a very simplistic dispatcher that examines PATH_INFO in an if/elif/else construct.
I used 'ab' for testing, invoking it with -n 100 and capturing the 'Time per request (mean)'. I ran the tests twice to ensure the results were comparable. The overall total for the PHP pages was about 14 seconds, for CGI about 23 seconds and for WSGI about 35 seconds.
For the pages that do no database access, PHP took on average 6.6 ms, CGI 123 ms and WSGI 113 ms. However, for these pages the WSGI results were very uneven, with a low of 13.6 ms and a high of 246.6 ms, vs. 6.0-7.3 in PHP and 113.3-140.5 in CGI. OTOH, the WSGI results were roughly correlated to the amount of text on each page.
For the pages that do a single db access, for a simple nav menu, PHP used 132 ms, CGI 321 ms and WSGI 229 ms. For the pages that retrieve a non-existent object (and also include the nav menu), the results were: PHP 164 ms, CGI 364 ms, WSGI 165 ms. For the remainder of the pages, which do multiple db retrievals, PHP took an average of 438 ms, CGI 642 ms and WSGI 1137 ms.
Based on what I had read about mod_wsgi, I had expected generally better results than for CGI so I was surprised by the above. Since I don't have much experience with Python web apps, I am wondering if the results can be explained just by the simplistic dispatcher and string concatenation or if there is something else I should be doing or checking. Note the current WSGI app will not stay as-is, but I'd like to understand what may be affecting performance.
On Wed, Sep 3, 2008 at 6:59 PM, Joe <d...@freedomcircle.net> wrote:
> I've just converted a PHP (Apache 2.2 mod_php5) application first to Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd representative pages because I wasn't very impressed by the observed performance.
Can you please post the relevant parts of your Apache configuration, especially under mod_wsgi? It really, really can affect any sort of benchmarks if you're "doin' it wrong".
> I've just converted a PHP (Apache 2.2 mod_php5) application first to > Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd > representative pages because I wasn't very impressed by the observed > performance.
> The conversion to CGI was almost a line-for-line translation of PHP to > Python. The PHP code did not use templates and had a few classes, > mostly to interface with a Postgres db, using standard PHP pg_xxx > calls. For Python, I used psycopg2. The conversion to mod_wsgi > replaced print's by accumulating the output, e.g., stream += 'more > stuff\n'.
Which is inefficient, as every time you append to the string it needs to reallocate the string, copy the old contents to the new one, and then append the extra text.
You should look at using StringIO instead:
    import StringIO
    output = StringIO.StringIO()

    print >> output, 'more'
    print >> output, 'more'

    result = output.getvalue()
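For readers on Python 3, where the StringIO module has moved to `io` and `print` is a function, the same idea might be sketched as follows. The helper names `render_with_stringio` and `render_with_join` are hypothetical, and the list-plus-join variant is a common alternative rather than something proposed in this thread.

```python
import io


def render_with_stringio(lines):
    """Accumulate output in an in-memory text buffer (Python 3 form of the advice above)."""
    output = io.StringIO()
    for line in lines:
        # print() appends a newline, like the Python 2 "print >> output" form.
        print(line, file=output)
    return output.getvalue()


def render_with_join(lines):
    """Alternative sketch: build all the pieces first, then join them once."""
    return "".join(line + "\n" for line in lines)


result = render_with_stringio(["more", "more"])
```

Either helper produces the whole page as one string, which can then be wrapped in a list before being returned from the WSGI application.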
> It also uses a very simplistic dispatcher that examines > PATH_INFO in a if/elif/else construct.
Not seeing what you have done, can't comment, but a long if/elif/else construct wouldn't be efficient.
Sounds like you would have been better off using Apache to do dispatch for URLs by having handlers for each URL in separate files. This would be closer to what you had with PHP where each was in a separate file as well.
One though would perhaps want to be ensuring that all URL handler WSGI files were delegated to run in same Python interpreter instance rather than default of using separate one.
> I used 'ab' for testing, invoking it with -n 100 and capturing the 'Time > per request (mean)'.
Using a small number of requests like that will give unreliable results for various reasons, including the activation of Apache processes, lazy loading of the WSGI application, etc.
I would never consider fewer than 3000-5000 requests, and possibly more depending on the application being tested, and you need to ensure that the processes are correctly primed.
> I ran the tests twice to ensure the results were > comparable. The overall total for the PHP pages was about 14 seconds, > for CGI about 23 seconds and for WSGI about 35 seconds.
> For the pages that do no database access, PHP took on average 6.6 ms, > CGI 123 ms and WSGI 113 ms. However, for these pages the WSGI results > were very uneven, with a low of 13.6 ms and a high of 246.6 ms, vs. > 6.0-7.3 in PHP and 113.3-140.5 in CGI. OTOH, the WSGI results were > roughly correlated to the amount of text on each page.
Which shows as I said that such a small number of requests can yield quite unreliable results.
> For the pages that do a single db access, for a simple nav menu, PHP > used 132 ms, CGI 321 ms and WSGI 229 ms. For the pages that retrieve a > non-existent object (and also include the nav menu), the results were: > PHP 164 ms, CGI 364 ms, WSGI 165 ms. For the remainder of the pages, > which do multiple db retrievals, PHP took an average of 438 ms, CGI 642 > ms and WSGI 1137 ms.
> Based on what I had read about mod_wsgi, I had expected generally better > results than for CGI so I was surprised by the above. Since I don't > have much experience with Python web apps, I am wondering if the results > can be explained just by the simplistic dispatcher and string > concatenation or if there is something else I should be doing or > checking. Note the current WSGI app will not stay as-is, but I'd like > to understand what may be affecting performance.
Can you post some examples of your code? We can then evaluate it and suggest better ways of doing things.
BTW, one also has to be careful about comparing PHP to Python, as the ways the hosting mechanisms work are quite different. For a discussion of the principal differences see:
Brett Hoerner wrote:
> On Wed, Sep 3, 2008 at 6:59 PM, Joe <d...@freedomcircle.net> wrote:
>> I've just converted a PHP (Apache 2.2 mod_php5) application first to >> Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd >> representative pages because I wasn't very impressed by the observed >> performance.
> Can you please post the relevant parts of your Apache configuration, > especially under mod_wsgi? It really, really can affect any sort of > benchmarks if you're "doin' it wrong".
Sorry, I meant to do that, but I hit Send before I remembered.
> Brett Hoerner wrote:
>> On Wed, Sep 3, 2008 at 6:59 PM, Joe <d...@freedomcircle.net> wrote:
>>> I've just converted a PHP (Apache 2.2 mod_php5) application first to >>> Python CGI and then to mod_wsgi, so I decided to benchmark some 40-odd >>> representative pages because I wasn't very impressed by the observed >>> performance.
>> Can you please post the relevant parts of your Apache configuration, >> especially under mod_wsgi? It really, really can affect any sort of >> benchmarks if you're "doin' it wrong".
> Sorry, I meant to do that, but I hit Send before I remembered.
Which indicates one script for all URLs. I am presuming you had one script for each URL with PHP rather than doing dispatching within PHP.
As I said before, Apache/mod_wsgi can still do the dispatching for you, like with PHP, and it is usually going to be quicker than you doing it yourself.
> There was a WSGIReloadMechanism Module while I was converting but I > removed it for the tests.
The default for WSGIReloadMechanism in embedded mode is 'Module' so setting it explicitly wouldn't have made a difference and reloading is still on. Having it on shouldn't affect the performance to any noticeable degree anyway.
Now, what does your actual WSGI application script contain? If you are worried that the code doing the work can't be shown, at least indicate how you are doing the main dispatching from the application entry point and stub out the handler function code. Maybe leave one representative handler function in there though.
Graham Dumpleton wrote:
> Sounds like you would have been better off using Apache to do dispatch for URLs by having handlers for each URL in separate files. This would be closer to what you had with PHP where each was in a separate file as well.
I was hoping to eventually move to a more intelligent dispatcher.
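As a rough illustration of one step up from an if/elif/else chain, a table-driven dispatcher could key on the first segment of PATH_INFO. This is only a sketch written for Python 3; the handler names, the `ROUTES` table, and the 404 fallback are hypothetical and not taken from the application discussed here.

```python
def index(environ, start_response):
    # Hypothetical handler standing in for a real page.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"index page\n"]


def not_found(environ, start_response):
    # Fallback used when no route matches.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]


# Map the first path segment to a handler; one dict lookup replaces the chain.
ROUTES = {
    "": index,
    "index": index,
}


def dispatch(environ, start_response):
    path = environ.get("PATH_INFO", "")
    first_segment = path.strip("/").split("/", 1)[0]
    handler = ROUTES.get(first_segment, not_found)
    return handler(environ, start_response)
```

Adding a page then means adding one entry to the table, and the lookup cost does not grow with the number of routes.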
> One though would perhaps want to be ensuring that all URL handler WSGI > files were delegated to run in same Python interpreter instance rather > than default of using separate one.
I'm not quite sure I understand ("one though"?).
> Which shows as I said that such a small number of requests can yield > quite unreliable results.
I understand, but this was just a proof-of-concept to get a general idea of how it performs.
> Can you post some examples of your code. We can then evaluate it and > suggest better ways of doing things.
> BTW, one also has to be careful about comparing PHP to Python as the > ways the hosting mechanisms work is quite different. For a discussion > of principle differences see:
Graham Dumpleton wrote:
> Which indicates one script for all URLs. I am presuming you had one script for each URL with PHP rather than doing dispatching within PHP.
> As I said before, Apache/mod_wsgi can still do the dispatching for > you, like with PHP, and it is usually going to be quicker than you > doing it yourself.
I guess I was misled by looking at things like Django/TG/Pylons/Routes (and even Trac) that mostly do dispatching in the app. I assume that by letting Apache/mod_wsgi do the dispatching you mean defining WSGIScriptAlias for each partial path desired. After merging some PHP files, I still have about 14 entry paths, which is doable, but less flexible.
> The default for WSGIReloadMechanism in embedded mode is 'Module' so > setting it explicitly wouldn't have made a difference and reloading is > still on. Having it on shouldn't affect the performance to any > noticeable degree anyway.
> Now, what does you actual WSGI application script contain. If worried > that code doing work cant be shown, at least indicate how you are > doing main dispatching from application entry point and stuff out the > handler function code. Maybe leave one representative handler function > in there though.
Here's roughly what it looks like (fcdir.wsgi and fcdir.py could of course be merged):
--- fcdir.wsgi ---
import sys

run_path = '/var/www/pywsgi'
if run_path not in sys.path:
    sys.path.insert(0, run_path)

import fcdir
application = fcdir.dispatch

--- fcdir.py ---
import index, module1, module2

def dispatch(environ, start_response):
    # some config file stuff
> Graham Dumpleton wrote:
>> Which indicates one script for all URLs. I am presuming you had one script for each URL with PHP rather than doing dispatching within PHP.
>> As I said before, Apache/mod_wsgi can still do the dispatching for >> you, like with PHP, and it is usually going to be quicker than you >> doing it yourself.
> I guess I was misled by looking at things like Django/TG/Pylons/Routes > (and even Trac) that mostly do dispatching in the app.
You weren't misled, but Python people do like to do everything in Python. Doing it all in Python does mean you can test outside of Apache, which can be a benefit for many things.
I guessed that since you came from PHP background you may find the one file per handler model more familiar. :-)
> I assume that by > letting Apache/mod_wsgi do the dispatching you mean defining > WSGIScriptAlias for each partial path desired. After merging some PHP > files, I still have about 14 entry paths, which is doable, but less > flexible.
You don't need a WSGIScriptAlias for each URL. There are a few options, but I will just explain the one that seems to best match what you appear to want to do.
# Map to directory of WSGI script files.
Alias /fcwsgi/ /var/www/pywsgi/

<Directory /var/www/pywsgi>

# Map .wsgi extension to mod_wsgi.
AddHandler wsgi-script .wsgi

# Allow executable scripts in directory and multiviews so can leave off .wsgi extension.
Options ExecCGI MultiViews
MultiviewsMatch Handlers

# Map request against directory or an unknown resource to index.wsgi.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.wsgi/$1 [QSA,PT,L]

# For all to run in same Python interpreter instance.
# Don't do this though if they can't coexist together.
WSGIApplicationGroup %{GLOBAL}

</Directory>
Your index.wsgi file in that directory would then contain:
Take very close note here: your previous code was broken, as you were returning a string object from the WSGI application rather than an array (list) containing a single string object. I.e., you should have had:
return [response]
That you returned a string meant Apache/mod_wsgi was flushing after each individual character in the string which would have caused bad performance. Make just that change and you may find it works a lot better.
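The difference can be seen without Apache by counting how many chunks a server would receive when it iterates over the return value. The sketch below is an assumption-laden illustration: `count_chunks` and both application functions are hypothetical names, and a text string is used only to mirror the Python 2 setting of this thread (under Python 3, a WSGI response body must consist of byte strings).

```python
BODY = "<html><body>Hello</body></html>"


def app_returning_string(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return BODY  # a string is iterable, so it is consumed one character at a time


def app_returning_list(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [BODY]  # a one-element list is consumed as a single chunk


def count_chunks(app):
    """Iterate over the app's result the way a WSGI server would and count the pieces."""
    def start_response(status, headers):
        pass  # headers are ignored in this sketch
    return len(list(app({}, start_response)))


chunks_from_string = count_chunks(app_returning_string)
chunks_from_list = count_chunks(app_returning_list)
```

Here `chunks_from_string` equals the number of characters in the page, while `chunks_from_list` is 1, which is why wrapping the response in a list matters so much for a page of any size.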
Anyway, index.wsgi would get mapped for URLs:
/fcwsgi/
/fcwsgi/index
/fcwsgi/index.wsgi
Plus:
/fcwsgi/non-existant-resource.ext
That is, it will map to index.wsgi if the URL didn't otherwise match a static file or resource to handle the request. This may not actually be desirable, in which case index.wsgi should filter out that case. Alternatively, don't use the rewrite rules in the configuration above, meaning that only the URLs:
/fcwsgi/index
/fcwsgi/index.wsgi
would work.
Note that the above Apache configuration can actually be simplified even further, with WSGIScriptAlias being used to map a URL mount point to a directory of scripts rather than a single one.
Now, your file xxxxx.wsgi would similarly contain:
as the default is that Apache will allow extra path information.
Thus for your /xxxx/ case, script would be xxxx.wsgi and the handler just needs to do the right thing.
If only certain URLs are supposed to accept additional path information, Apache can be used to control it:
AcceptPathInfo Off
<Files xxxx.wsgi>
AcceptPathInfo On
</Files>
So, pushing that aspect of routing URLs on to Apache as well.
Since routing is now being put onto Apache, there may not be much point in having the separation you have between the .wsgi file and the .py file. Ie., how you have index.wsgi and index.py.
The model as described above now gets you closer to the file based resource model of PHP where each file handles a single request and with Apache being used to do routing.
Yes, one can do things this way, and for small scripts which need to be super efficient, having Apache do the routing will be quicker than doing dispatch in Python. Overall, though, you may be better off just going to Python-based routing from an existing toolkit/framework rather than trying to roll your own.
For now at least, make that change to return a list of strings rather than returning a string.
>> The default for WSGIReloadMechanism in embedded mode is 'Module' so >> setting it explicitly wouldn't have made a difference and reloading is >> still on. Having it on shouldn't affect the performance to any >> noticeable degree anyway.
>> Now, what does you actual WSGI application script contain. If worried >> that code doing work cant be shown, at least indicate how you are >> doing main dispatching from application entry point and stuff out the >> handler function code. Maybe leave one representative handler function >> in there though.
> Here's roughly what it looks like (fcdir.wsgi and fcdir.py could of > course be merged):
> --- fcdir.wsgi ---
> import sys
>
> run_path = '/var/www/pywsgi'
> if run_path not in sys.path:
>     sys.path.insert(0, run_path)
Now this is weird. I just realized that the response ought to be a list, so I changed the last line accordingly and now I'm seeing results similar to or even better than PHP.
If you were giving a string when an iterable (list) was expected, it would have iterated over each of the characters, one at a time. I'm not sure what mod_wsgi does at that point, but if it were pushing out one character at a time with a syscall, that would definitely have hurt.
> Now this is weird. I just realized that the response ought to be a > list, so I changed the last line accordingly and now I'm seeing results > similar to or even better than PHP.
> If you were giving a string when an iterable (list) was expected, it > would have iterated over each of the characters, one at a time. I'm > not sure what mod_wsgi does at that point, but if it were pushing out > one character at a time with a syscall, that would definitely have > hurt.
It is worse than that, as each character is pushed one at a time through the Apache output filter brigade, thus even more overhead than simply calling write() with a single character at a time.
>> Now this is weird. I just realized that the response ought to be a >> list, so I changed the last line accordingly and now I'm seeing results >> similar to or even better than PHP.