wsgi design question


wmiller

Oct 22, 2008, 9:17:49 AM
to modwsgi
just starting out with mod_wsgi on apache and I'm wondering, is it
less than ideal to have separate *.wsgi files for each URL versus just
one root *.wsgi file that parses and handles all the URLs for a
website?

In the sandbox environment I'm working in I have a separate *.wsgi
file for each URL because it's an easy way to get started. I'm
guessing for performance and resource reasons, it won't scale very
well - multiple Python interpreters running?

I'd prefer to avoid using frameworks or middleware and just stay as
low level and basic as possible, e.g. for each URL a simple file like
this with additional code/functionality depending on requirements:

    def application(environ, start_response):
        start_response('200 OK', [('content-type', 'text/html')])
        HTML = "<html><body>%s</body></html>"
        HTML %= "Hello world!"
        # WSGI response bodies must be byte strings (required on Python 3)
        return [HTML.encode('utf-8')]


-Walter

Clodoaldo Pinto Neto

Oct 22, 2008, 11:00:08 AM
to mod...@googlegroups.com
2008/10/22 wmiller <walter...@gmail.com>:

>
> just starting out with mod_wsgi on apache and I'm wondering, is it
> less than ideal to have separate *.wsgi files for each URL versus just
> one root *.wsgi file that parses and handles all the URLs for a
> website?
>
> In the sandbox environment I'm working in I have a separate *.wsgi
> file for each URL because it's an easy way to get started. I'm
> guessing for performance and resource reasons, it won't scale very
> well - multiple Python interpreters running?
>
> I'd prefer to avoid using frameworks or middleware and just stay as
> low level and basic as possible, e.g. for each URL a simple file like
> this with additional code/functionality depending on requirements:

You will need at least a low-level toolbox like Werkzeug, otherwise
you will be working at too low a level. It includes a URL routing
system and request and response objects.

Regards, Clodoaldo

Carl Nobile

Oct 22, 2008, 11:03:56 AM
to mod...@googlegroups.com
The *.wsgi file is just a hook into your actual code and should be no more than 15 or 20 lines of code. I've seen some that were less than 5 lines of code. This hook file should call your framework or whatever you are using to handle your site. I do all URI parsing in my code, not in the hook file.

-Carl

Graham Dumpleton

Oct 23, 2008, 2:17:15 AM
to mod...@googlegroups.com
2008/10/23 wmiller <walter...@gmail.com>:

There are a few issues here to consider.

1. The default of a separate sub interpreter per WSGI application will
mean more overall memory use. Provided that the applications can
coexist, this is easily solved by delegating them to all run in the
same sub interpreter using the WSGIApplicationGroup. For example, to
delegate them to run in the main Python interpreter (the first created
by Python), use:

WSGIApplicationGroup %{GLOBAL}

Alternatively, if you have multiple sites, then use:

WSGIApplicationGroup %{SERVER}

This will mean that all WSGI scripts in that virtual host site would
run in the same sub interpreter.
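
One way to check which sub interpreter a script ended up in is to have it report what mod_wsgi places in the WSGI environ. This is a minimal sketch: the keys mod_wsgi.application_group and mod_wsgi.process_group are set by mod_wsgi, while the fallback text is just an assumption for running the same code elsewhere. With %{GLOBAL} the application group name should show up as an empty string.

```python
def application(environ, start_response):
    # mod_wsgi adds these keys to the WSGI environ; under any other
    # server they are absent, hence the placeholder defaults.
    group = environ.get('mod_wsgi.application_group', '(not under mod_wsgi)')
    process = environ.get('mod_wsgi.process_group', '(not under mod_wsgi)')
    body = 'application group: %r\nprocess group: %r\n' % (group, process)
    data = body.encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(data)))])
    return [data]
```

Dropping this into two different .wsgi files and comparing the output shows whether they share an application group.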

2. Apache URL dispatch using file based resource matching, i.e.,
AddHandler and lots of .wsgi files, should be a lot faster than doing
URL dispatch in Python code using Routes or some other Python based
URL matcher.

Apache also provides MultiViews matching allowing you to drop .wsgi
extension in URLs, so URLs can still be clean.

Although it can be a bit more complicated than a Python-based URL
routing solution, mod_rewrite rules can be used to do a lot of tricky
stuff with mapping path info back to GET style request parameters if
you want to use REST style URL semantics, with segments of URLs
becoming form input parameters.
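
To make the idea of "segments of URLs becoming form input parameters" concrete, here is a small Python illustration of the mapping itself. It is not a mod_rewrite rule, only a sketch of the transformation such a rule would perform; the /orders/<order_no>/<view> layout and the function name are hypothetical.

```python
import re
from urllib.parse import urlencode

# Hypothetical URL layout: /orders/<order_no>/<view>
PATTERN = re.compile(r'^/orders/(?P<order_no>\d+)/(?P<view>[a-z]+)$')


def path_to_query(path):
    """Return the query string a path would map to, or None if no match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return urlencode([('order_no', match.group('order_no')),
                      ('view', match.group('view'))])


print(path_to_query('/orders/4/summary'))  # order_no=4&view=summary
```

The script behind the URL then only ever sees ordinary GET parameters, whichever form the visitor typed.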

The Files directive in Apache also means you can do much finer grained
control over the configuration of URLs than what a Python framework
may allow easily.

For example, you could set AcceptPathInfo to Off as default, and only
set it to On for those resources which actually allow passing of path
info.

Similarly, the Limit directive can be used to allow only GET requests
by default, and then other request methods can be allowed for specific
resources as appropriate.

Various frameworks don't filter either of these things properly: any
request method is accepted, and path info can be supplied even where
it isn't wanted, with the latter then stuffing up relative links
in pages.
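
For code that runs outside Apache, the same two checks can be approximated with a small wrapper. This is a sketch only, with hypothetical names (restrict, allow_path_info), and it assumes the wrapped application is mounted so that PATH_INFO holds just the trailing path after the script, as it does for a .wsgi file under mod_wsgi.

```python
def restrict(app, methods=('GET', 'HEAD'), allow_path_info=False):
    """Wrap a WSGI app so unwanted methods and extra path info are refused."""
    allowed = ', '.join(methods)

    def wrapper(environ, start_response):
        if environ.get('REQUEST_METHOD') not in methods:
            # Tell the client which methods this resource does accept.
            start_response('405 Method Not Allowed',
                           [('Allow', allowed),
                            ('Content-Type', 'text/plain')])
            return [b'Method not allowed\n']
        if not allow_path_info and environ.get('PATH_INFO', ''):
            # Trailing path segments are treated as a missing resource.
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'Not found\n']
        return app(environ, start_response)

    return wrapper
```

A script would then end with something like application = restrict(application), passing methods=('GET', 'HEAD', 'POST') for resources that accept form posts.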

Other things that can be controlled on a per-resource basis include
LimitRequestBody for POST requests.

The resource approach also means you can mix WSGI scripts in the same
directory as static media and not have to separate them out into a
separate area. This allows everything to be localised with one
directory potentially being a self contained application.

All this acknowledges that Apache itself is a web application
framework with a lot of power in its own right and not just a hopping
off point to a pure Python world for your application. Personally I
think Apache gets underutilised by Python people. When Apache 2.4
comes out with mod_session and support for single sign on across
applications this will be more evident.

What would sort of be nice is if Python web application component
providers would recognise that Apache has some good to offer rather
than spurn it. In particular, it would be nice if Python auth/session
mechanisms would provide a way of being used that is compatible with
how mod_session works. This way one could use a Python session
management mechanism if running Python standalone application, but
then defer to mod_session in context of Apache, thereby allowing
single sign on to work. Adopting the mod_session style interfacing
method would also allow mix and match of Python components in a Python
only application, whereas at the moment they all do things their own
way and come as monolithic solutions to the problem. If no one else
sees this, and I have time, I'll probably end up bringing out my own
auth/session components that work in this way, as well as modifying
mod_wsgi to integrate with mod_session so that Python code can act
as a session store.

3. The low level nature of WSGI is also an issue and packages like
Paste, WebOb and Werkzeug can help there if you want to work at that
level.
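
As an illustration of how low level raw WSGI is, here is a sketch of reading one field from a form POST using only the standard library. The field name "name" is hypothetical. Packages like WebOb or Werkzeug wrap this kind of bookkeeping in request and response objects.

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty, so treat that as zero.
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    # Read exactly the declared number of bytes from the request body.
    raw = environ['wsgi.input'].read(length)
    form = parse_qs(raw.decode('utf-8'))
    name = form.get('name', ['world'])[0]
    data = ('Hello %s!\n' % name).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(data)))])
    return [data]
```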

Overall I personally find working down at this Apache resource level
with WSGI scripts for each major URL entry point of a single
application quite attractive. This doesn't mean I wouldn't use some
URL mapping in Python, but rather than using a really high level and
potentially inefficient system like routes, I would use purpose built
WSGI components for mapping that do a very specific job. That way they
can be more efficient.
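
A purpose-built mapping component can be very small. The sketch below is not the code referred to above, only an assumption of what such a component might look like: it dispatches on the first path segment using wsgiref.util.shift_path_info from the standard library. The names dispatch_on_first_segment and not_found are hypothetical.

```python
from wsgiref.util import shift_path_info


def not_found(environ, start_response):
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'Not found\n']


def dispatch_on_first_segment(table, default=not_found):
    """Build a WSGI app that picks a sub-app from the first path segment."""
    def dispatcher(environ, start_response):
        # shift_path_info moves one segment from PATH_INFO to SCRIPT_NAME
        # and returns it (None when PATH_INFO is empty).
        segment = shift_path_info(environ)
        app = table.get(segment, default)
        return app(environ, start_response)
    return dispatcher
```

Because SCRIPT_NAME and PATH_INFO are adjusted before the sub-application is called, the sub-application sees only the remainder of the path and can itself be another dispatcher.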

I can't give examples right now as the stuff I was playing with is on
another machine. I also haven't really had time to properly translate
what I was doing in this respect from mod_python to WSGI, so it's
all a bit of a mess and possibly not ready for public viewing.

In some respects it comes down to personal taste. Some, like myself,
like tinkering at the lower levels with a more hands-on approach;
others prefer big frameworks which do a lot of stuff for you, but
which can also put a straitjacket on you and force you to do
things their way.


Graham

wmiller

Oct 23, 2008, 9:28:54 AM
to modwsgi
On Oct 23, 2:17 am, "Graham Dumpleton" <graham.dumple...@gmail.com>
wrote:
> 2008/10/23 wmiller <walter.mil...@gmail.com>:
and I was so close to giving up entirely on the low-level multi wsgi
file approach in favor of exclusively using middleware like Werkzeug.
It's good to hear the alternative is at least feasible. I imagine
there's probably a sweet spot combination of wsgi middleware and low-
level multi wsgi file approach leveraging what Apache does best and
what wsgi middleware does best.

The preference is to keep the entire website within one sub
interpreter so thanks for the WSGIApplicationGroup %{GLOBAL} tip.

My httpd.conf file specifies *.htm files as the wsgi-script files,
i.e. "AddHandler wsgi-script .htm". This just makes it easier for my
html editors to edit the files since the template and the wsgi file
are one and the same for now. I'm thinking in the very near future to
separate them into a template file and an application file, each pair
with the same name but different extension, e.g. index.tpl and
index.htm.
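
A sketch of how such a pairing could work, assuming the template sits next to the script with the same base name and uses $name placeholders from the standard library's string.Template. The helper names load_template and make_application are hypothetical.

```python
import os
from string import Template


def load_template(script_path, extension='.tpl'):
    """Load the template that sits next to script_path with the same base name."""
    template_path = os.path.splitext(script_path)[0] + extension
    with open(template_path, encoding='utf-8') as handle:
        return Template(handle.read())


def make_application(script_path, values):
    """Return a WSGI app that renders the paired template with fixed values."""
    def application(environ, start_response):
        # Reloaded on every request so template edits show up immediately.
        page = load_template(script_path).safe_substitute(values)
        data = page.encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8'),
                                  ('Content-Length', str(len(data)))])
        return [data]
    return application
```

With these helpers in a shared module, index.htm would reduce to one line such as application = make_application(__file__, {'greeting': 'Hello world!'}), leaving index.tpl free for the HTML editors.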

Looking forward to mod_session support in Apache. We're currently
trying to use mod_auth_digest but it's not clear to me how to
integrate that with a wsgi application. In the meantime, we're using
session cookies which seems to be working pretty well so far.
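
On the mod_auth_digest question, one hedged pointer: when Apache itself authenticates a request, the authenticated user name is passed to the application in the REMOTE_USER key of the WSGI environ. The sketch below assumes Apache is already configured to require a valid user for the URL; the 403 branch is only there for the case where it is not.

```python
def application(environ, start_response):
    # REMOTE_USER is only present when Apache authenticated the request.
    user = environ.get('REMOTE_USER')
    if user is None:
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'No authenticated user\n']
    data = ('Logged in as %s\n' % user).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [data]
```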

-Walter

MilesTogoe

Oct 23, 2008, 10:17:19 AM
to mod...@googlegroups.com
I guess I don't understand why. Assembler code is more low level than C
but C is still low enough and gains you so much more productivity that
few program at the assembler level any more - I think of Werkzeug as C
in this analogy - it gives you so much but doesn't carry the often
unnecessary baggage of higher level stuff.

wmiller

Oct 23, 2008, 11:40:07 AM
to modwsgi
mostly because using wsgi middleware and frameworks feels to me a
little like implementing a file system in a database instead of just
using the file system of the OS. Mapping URLs to files using Python
instead of using Apache is a good example of more moving parts, more
complexity, and less transparency. Because we're still using Python
to do the heavy lifting it's not as low level as it might seem.


Carl Nobile

Oct 23, 2008, 12:15:31 PM
to mod...@googlegroups.com
If you're doing REST, which is what I do, you need to keep in mind that REST is very URI intensive. Trying to define all of them in Apache config files is an impossible task. Remember these are URIs, not URLs. URIs point to a virtual object, not a physical one, and with a large database you can have millions of them. Apache, even with regexes and rewrite rules, will never handle the number of URIs in a REST web service as dynamically as code can; doing it in code is much more robust.

If you are really mapping files to URLs then maybe REST isn't what you need.

-Carl

wmiller

Oct 23, 2008, 3:11:55 PM
to modwsgi

On Oct 23, 12:15 pm, "Carl Nobile" <carl.nob...@gmail.com> wrote:
> If you're doing REST, which is what I do, you need to keep in mind that REST
> is very URI intensive. Trying to define all of them in apache config files
> is an impossible task. Remember these are URIs not URLs. URIs point to a
> virtual object not a physical one and with a large database you can have
> millions of them. Trying to do these in apache even using REGEX and rewrite
> rules will never be as dynamically able to handle the number of URIs you
> could have in a REST web service, doing it in code is a much more robust way
> to do it.
>
> If you are really mapping files to URLs then maybe REST isn't what you need.
>
> -Carl

why do they have to be mutually exclusive? A single physical html
file can have RESTful properties and serve as a basis for looking up
many items in a database:

http://example.com/index.html?order_no=4

The above has all the qualities of being RESTful. It points to a
stateless resource and it's cacheable.
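
A minimal sketch of what the script behind such a URL might look like. The ORDERS dictionary is a hypothetical stand-in for the database, and the 60 second Cache-Control value is an arbitrary assumption.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for a database table of orders.
ORDERS = {4: 'Order 4: two widgets'}


def application(environ, start_response):
    query = parse_qs(environ.get('QUERY_STRING', ''))
    try:
        order_no = int(query['order_no'][0])
    except (KeyError, ValueError):
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return [b'order_no must be an integer\n']
    order = ORDERS.get(order_no)
    if order is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'No such order\n']
    data = (order + '\n').encode('utf-8')
    # The response depends only on the URL, so it can be marked cacheable.
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Cache-Control', 'max-age=60')])
    return [data]
```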

Carl Nobile

Oct 23, 2008, 3:27:49 PM
to mod...@googlegroups.com
Actually no, that would not be REST if you do everything from one URL. You need to read RESTful Web Services by Leonard Richardson & Sam Ruby, ISBN: 978-0-596-52926-0. This book has become the RESTful bible. Also read RFC 2616 (http://www.faqs.org/rfcs/rfc2616.html), the HTTP 1.1 standard document; it explains the use of the HTTP methods DELETE, GET, HEAD, OPTIONS, POST, and PUT. There are two more (TRACE and CONNECT), but they are not used in REST. You will also be using a lot more status codes besides 200, 404, 500.

-Carl
--
-------------------------------------------------------------------------------
Carl J. Nobile (Software Engineer)
carl....@gmail.com
-------------------------------------------------------------------------------

wmiller

Oct 23, 2008, 3:48:59 PM
to modwsgi
I think it depends on how you define a stateless cacheable resource.
I would like someone to clearly and explicitly demonstrate how a web
URL with a query string does not have these RESTful properties.

e.g. http://en.wikipedia.org/wiki/Representational_State_Transfer

"One can argue that any sufficiently rich command line interface can
be considered "ReSTful" ... in that a fully qualified command line and
set of switches/options and arguments can access any accessible
application state. A web URL with query string is effectively a sort
of command line with arguments."

Carl Nobile

Oct 23, 2008, 4:12:03 PM
to mod...@googlegroups.com
Yes, but a command line with arguments is not RESTful. This is REST:

/v1/users/ -> GET, HEAD, OPTIONS, PUT
/v1/users/{user name}/ -> DELETE, HEAD, GET, OPTIONS, PUT

You break down the URIs into business units, something that the CEO of the company can understand, not something that only developers understand. You don't send all sorts of info that goes into an SQL query on a command line. You set up the URIs to handle all the different use cases that are necessary to do the work you need done. I'm not saying you cannot use CGI variables, but they are not used like they used to be. Yes, this is more work for the developer, but it makes your web services loosely coupled and more idempotent, allowing them to be used for many different things. REST is used when you are building horizontal layers (new way), not vertical silos (old way).
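
A rough sketch of the second resource above as a plain WSGI callable, with one branch per HTTP method. It assumes PATH_INFO holds only the {user name} part (i.e. the callable is mounted at /v1/users/); the USERS dictionary is a hypothetical in-memory store, and the choice of status codes is one reasonable reading of RFC 2616 rather than the only one.

```python
USERS = {'alice': 'Alice Example'}  # hypothetical in-memory store


def user_resource(environ, start_response):
    """Handle /v1/users/{user name}/ with one branch per HTTP method."""
    name = environ.get('PATH_INFO', '').strip('/')
    method = environ.get('REQUEST_METHOD', 'GET')
    allowed = 'DELETE, GET, HEAD, OPTIONS, PUT'
    plain = ('Content-Type', 'text/plain; charset=utf-8')

    if method == 'OPTIONS':
        start_response('204 No Content', [('Allow', allowed)])
        return [b'']
    if method in ('GET', 'HEAD'):
        if name not in USERS:
            start_response('404 Not Found', [plain])
            return [b'']
        body = (USERS[name] + '\n').encode('utf-8')
        start_response('200 OK', [plain, ('Content-Length', str(len(body)))])
        # HEAD sends the same headers as GET but no body.
        return [body if method == 'GET' else b'']
    if method == 'PUT':
        length = int(environ.get('CONTENT_LENGTH') or 0)
        created = name not in USERS
        USERS[name] = environ['wsgi.input'].read(length).decode('utf-8')
        start_response('201 Created' if created else '204 No Content', [plain])
        return [b'']
    if method == 'DELETE':
        if USERS.pop(name, None) is None:
            start_response('404 Not Found', [plain])
        else:
            start_response('204 No Content', [])
        return [b'']
    start_response('405 Method Not Allowed', [('Allow', allowed), plain])
    return [b'Method not allowed\n']
```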

If you are using HTML pages in a REST environment it is probably not REST or at best it's a REST/RPC hybrid.

If you want to continue this chat find my email on my web site and email me: http://www.tetrasys.homelinux.org/

-Carl

wmiller

Oct 23, 2008, 5:00:40 PM
to modwsgi
I think we're going to have to agree to disagree here. In addition to
the quote from the Wikipedia article below, a quick Google search on
Amazon's SimpleDB shows them describing URLs with query strings as
RESTful:

http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/index.html?MakingRESTRequests.html

Not to say that Wikipedia or Amazon are necessarily right, but I
haven't seen any definition that says definitively that URLs with
query strings are not RESTful.

On Oct 23, 4:12 pm, "Carl Nobile" <carl.nob...@gmail.com> wrote:
> Yes, but a command line with arguments is not RESTful. This is REST:
>
> /v1/users/  -> GET, HEAD, OPTION, PUT
> /v1/users/{user name}/ -> DELETE, HEAD, GET, OPTION, PUT
>
> You break down the URIs into business units, something that the CEO of the
> company can understand not something that only developers understand. You
> don't send all sorts of info that goes into an SQL query on a command line.
> You setup the URIs to handle all the different USE CASES that are necessary
> to do the work you need done. I'm not saying you cannot use CGI variables,
> but they are not used like they used to be. Yes, this is more work for the
> developer, but makes your web services loosely  coupled and more idempotent
> allowing them to be used for many different things. REST is used when you
> are building horizontal layers (new way) not vertical silos (old way).
>
> If you are using HTML pages in a REST environment it is probably not REST or
> at best it's a REST/RPC hybrid.
>
> If you want to continue this chat find my email on my web site and email me:http://www.tetrasys.homelinux.org/
>
> -Carl
>
>
>
> On Thu, Oct 23, 2008 at 3:48 PM, wmiller <walter.mil...@gmail.com> wrote:
>
> > I think it depends on how you define a stateless cacheable resource.
> > I would like someone to clearly and explicitly demonstrate how a web
> > URL with a query string does not have these RESTful properties.
>
> > e.g.http://en.wikipedia.org/wiki/Representational_State_Transfer

Clodoaldo Pinto Neto

Oct 23, 2008, 7:01:37 PM
to mod...@googlegroups.com
2008/10/23 Graham Dumpleton <graham.d...@gmail.com>:

All of item number 2 sounds like a recipe for a mod_route. I wish
something like that existed, as I'm no Apache expert and I think
mod_rewrite is a bit cryptic. Perhaps I should bite the bullet and
really learn it. The problem is that my memory is not very good, and
by the time I come back to Apache to set up a new virtual host, 3 to 6
months later, I would have to learn it again.

How to avoid lots of the same imports in a multiple entry point
application? Isn't it like importing the same single entry point code
to many wsgi scripts? The tail wagging the dog? Sorry if these
questions sound so basic but since the OP posted as a design thread i
thought i could ask them. If you have samples please post them.
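
One possible answer, offered as a sketch rather than as anyone's actual setup: put the shared code in an ordinary module and keep each script down to one import plus the page-specific part. Python caches imported modules per interpreter, so when all the scripts run in the same application group the shared module is loaded only once per process. The module name common and the helper html_page are hypothetical, and everything is shown in one block here so that it runs as is.

```python
# --- common.py (hypothetical shared module) ---
def html_page(render):
    """Turn a function returning an HTML fragment into a WSGI application."""
    def application(environ, start_response):
        body = '<html><body>%s</body></html>' % render(environ)
        data = body.encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8'),
                                  ('Content-Length', str(len(data)))])
        return [data]
    return application


# --- index.wsgi would hold "from common import html_page" and then ---
@html_page
def index(environ):
    return 'Hello world!'


# --- orders.wsgi would hold the same one-line import and then ---
@html_page
def orders(environ):
    return 'Order list'
```

In the real files the decorated function would have to be named application, since that is the name mod_wsgi looks for in each script.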

Regards, Clodoaldo

Carl Nobile

Oct 23, 2008, 9:27:43 PM
to mod...@googlegroups.com
I never said that a query in your URI/URL is not RESTful; what I did say is that they are used differently in a RESTful architecture. Read the book, really, it will help tremendously with the understanding of REST.

-Carl