Good habits / best practices for routing wsgi endpoints


Dev Mukherjee

31 Oct 2015 20:58
to mod...@googlegroups.com
Hi all,

The following is more of a best-practices question.

We've been developing WSGI apps for a while, and also maintain a REST server micro-framework. Our applications, like everything else in the Python world, are built from micro-frameworks. We would typically use something like webapp2 to serve out "pages" and then build APIs using prestans.

Both frameworks provide routers, and we end up having routes in Apache like

    Alias       /assets/        /srv/app/static/assets/
    Alias       /js/            /srv/app/static/js/

    WSGIScriptAliasMatch    ^/api/(.*)  /srv/app/wsgi/api.wsgi
    WSGIScriptAliasMatch    ^/(.*)      /srv/app/wsgi/app.wsgi

where the two WSGI endpoints point to routers provided by the two frameworks.

Most examples (in the mod_wsgi docs), and the configuration of mod_wsgi-express and frameworks like werkzeug, seem to suggest that a WSGI app should have a single WSGI endpoint, and then perhaps use a middleware to wrap/dispatch the routes?

Is there a correct way of addressing this? Any thoughts / experiences?

If middlewares are the solution, any suggestions on where / which frameworks to look at?

Many thanks for sparing your time.

Graham Dumpleton

1 Nov 2015 05:05
to mod...@googlegroups.com
On 1 Nov 2015, at 11:58 am, Dev Mukherjee <dev...@gmail.com> wrote:

Hi all,

The following is more of a best-practices question.

We've been developing WSGI apps for a while, and also maintain a REST server micro-framework. Our applications, like everything else in the Python world, are built from micro-frameworks. We would typically use something like webapp2 to serve out "pages" and then build APIs using prestans.

Both frameworks provide routers, and we end up having routes in Apache like

    Alias       /assets/        /srv/app/static/assets/
    Alias       /js/            /srv/app/static/js/

    WSGIScriptAliasMatch    ^/api/(.*)  /srv/app/wsgi/api.wsgi
    WSGIScriptAliasMatch    ^/(.*)      /srv/app/wsgi/app.wsgi

where the two WSGI endpoints point to routers provided by the two frameworks.

I actually highly discourage the use of WSGIScriptAliasMatch as it can do unexpected things as far as its effect on the relationship between SCRIPT_NAME and PATH_INFO. There is generally very little need for it.

The configuration above could be done as:

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi

BTW, I presume you meant them to refer to different WSGI script files.

Also, for '/api/', that would result in SCRIPT_NAME being '/api', with the remainder of the URL being in PATH_INFO. In other words, the WSGI application will not see itself as notionally being mounted at the root of the site.
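To make the split concrete, here is a minimal sketch (not from the thread) of a WSGI app that echoes back how mod_wsgi would populate those two variables for a request to /api/users/1; the environ in the demo is faked for illustration:

```python
# Minimal WSGI app that reports the mount-point split. Under
# 'WSGIScriptAlias /api/ ...', Apache/mod_wsgi would set
# SCRIPT_NAME='/api' and put the remainder in PATH_INFO.

def application(environ, start_response):
    body = 'SCRIPT_NAME=%(SCRIPT_NAME)s PATH_INFO=%(PATH_INFO)s' % environ
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body.encode('utf-8')]

if __name__ == '__main__':
    # Fake the environ values mod_wsgi would pass for GET /api/users/1.
    environ = {'SCRIPT_NAME': '/api', 'PATH_INFO': '/users/1'}
    result = application(environ, lambda status, headers: None)
    print(result[0].decode())  # SCRIPT_NAME=/api PATH_INFO=/users/1
```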

If you wanted both WSGI applications to think they were mounted at the root of the site, you would use:

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi/api/
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi

Note that I'm not sure if the trailing slash is needed on the end of the last argument of the first line. It shouldn't harm anything if it isn't needed, I don't think, but do check.

I don’t remember how WSGIScriptAliasMatch does the breaking up between SCRIPT_NAME and PATH_INFO and where it can cause surprises, which is why I suggest it be avoided unless you have no choice.

Most examples (on mod_wsgi docs) and from seeing configuration of mod_wsgi-express and frameworks like werkzeug seem to suggest that a WSGI app should have a single WSGI endpoint, and then perhaps use a middleware to wrap/dispatch the routes?

If you are talking about taking two distinct WSGI applications, implementing micro services, which you happen to want to appear under different URLs (sub-URLs) of the one site, I completely disagree with the idea that you must composite them together by using a WSGI middleware that grafts them into the same process.

The use of WSGI middleware to graft together what are really distinct service endpoints is really a result of the limitations of whatever WSGI server is used. In Apache/mod_wsgi you don’t need to do that, as you can handle it at the Apache level, plus rely on mod_wsgi to separate the distinct WSGI applications into separate Python interpreter namespaces in the same process, or better still, have them run in separate daemon process groups.

WSGIDaemonProcess api processes=3 threads=3
WSGIDaemonProcess main processes=2 threads=2

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi process-group=api application-group=%{GLOBAL}
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi process-group=main application-group=%{GLOBAL}

The reason using separate daemon process groups for each is so much better than using a WSGI middleware within one interpreter context is that you can then separately control the number of processes/threads used for each.

This flexibility is very important because those different WSGI applications, the web UI and the REST API, may have entirely different profiles for the amount of traffic, whether they are CPU or I/O bound, length of response times, etc. To assume they are the same and bundle them into the one process means you likely aren’t going to be able to tune the WSGI server as readily as you might.

That said, even within one WSGI application or another, there can be widely different requirements as well. In that case, further dividing up the URL namespace so that you separate work across daemon process groups based on things like CPU usage, response times, etc. can help. This is something that is impossible to do with a WSGI server such as gunicorn by itself.
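As a hypothetical sketch of that kind of vertical partitioning (the URL prefix and process/thread numbers here are made up), a CPU-heavy section of the same application could be delegated to its own daemon process group like so:

```apache
# Hypothetical sketch: a CPU-heavy /reports/ section of the one
# application gets its own daemon process group with fewer threads,
# while the rest of the site runs with a more I/O-oriented tuning.
WSGIDaemonProcess reports processes=2 threads=1
WSGIDaemonProcess main processes=2 threads=5

WSGIScriptAlias /reports/ /srv/app/wsgi/app1.wsgi/reports/ process-group=reports application-group=%{GLOBAL}
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi process-group=main application-group=%{GLOBAL}
```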

I have talked about this idea of breaking up applications vertically and sending URLs into different process groups so they can be tuned for their respective workloads. You can find what I had to say in:


Is there a correct way of addressing this? Any thoughts / experiences?

If middlewares are the solution, any suggestions on where / which frameworks to look at?

I don’t think middlewares are necessarily the solution. I strongly believe partitioning should be managed at a higher level.

This doesn’t mean you can’t use such middleware, and the Paste library has a configuration-file-driven approach for grafting together WSGI applications at different sub-URL contexts so they can all run in one process.

I would only use such WSGI middlewares as a fallback though, when you need to run it all in one process as part of development, or if you had no other choice because you were deployed to a hosting service that didn’t give you the flexibility to use a decent WSGI server, or provide some means at its routing layer to direct traffic for different sub-URLs to different backends.

Even with such WSGI middleware in place you can still use the above with Apache/mod_wsgi to map the different URLs into different processes. Now that you are using WSGI middleware for grafting, though, you end up importing potentially dead code into processes, as each will also have loaded the part of the URL namespace it isn’t handling, unless the WSGI middleware is smart enough to do lazy loading, which usually it isn’t.
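For comparison, a minimal hand-rolled version of such grafting middleware, in the spirit of paste.urlmap or werkzeug's DispatcherMiddleware (the two apps here are toy placeholders), might look like:

```python
# Minimal path-dispatching WSGI middleware: routes requests to a
# mounted app by URL prefix, shifting the prefix from PATH_INFO onto
# SCRIPT_NAME so each app believes it is mounted at the root.
# Hand-rolled sketch; the app names are hypothetical.

class PathDispatcher(object):
    def __init__(self, default_app, mounts):
        # mounts: dict mapping a prefix (e.g. '/api') to a WSGI app.
        self.default_app = default_app
        self.mounts = mounts

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.mounts.items():
            if path == prefix or path.startswith(prefix + '/'):
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return self.default_app(environ, start_response)

def api_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'api']

def main_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'main']

application = PathDispatcher(main_app, {'/api': api_app})
```

Note that both apps are imported into every process regardless of which URLs that process handles, which is exactly the dead-code cost described above.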

The next level above doing separation with Apache/mod_wsgi alone is if you are using Docker to bundle up the separate WSGI applications. In this case you use Apache purely as a front end to proxy through to different Docker containers, with each WSGI application running in one of them; inside the Docker containers you can use mod_wsgi-express.

I talk about that topic in:


As far as handling this at the level of a PaaS goes, the typical PaaS doesn’t provide such support.

Amusingly older types of hosting services such as WebFaction can, but Heroku and OpenShift 2 cannot.

Next-generation PaaS offerings coming out, such as OpenShift 3 (based around Docker and Kubernetes), will allow you to handle vertically separating WSGI applications under sub-URLs of the same host name.

On OpenShift 3, for example, you can deploy your two separate WSGI applications, and when you expose each service using a route, as well as specifying a hostname you can also specify a path. The routing layer of OpenShift will then handle passing through HTTP requests under the different URL namespaces for you. This means you don’t need to set up Apache to do such proxying.
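A hypothetical sketch of such a route definition (the hostname and service name here are made up) might look like:

```yaml
# Hypothetical OpenShift 3 route exposing one service under a path of
# a hostname shared with other services.
apiVersion: v1
kind: Route
metadata:
  name: api
spec:
  host: www.example.com
  path: /api
  to:
    kind: Service
    name: api-service
```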

OpenShift 3 has some really interesting capabilities around handling of many micro services. This is not just related to routing and exposing them under the one site at different URLs, but also the fact that each micro service can run independently, with different CPU and memory resources allocated to it. This way you can adjust the resources allocated to match the actual amount used by your tuned WSGI server and application.

You therefore don’t have the situation you get with the current generation of PaaS, where you are given a fixed bucket of resources and you never use it all. You either try and screw around all the time with your WSGI server processes/threads to try to fill the space, or you give up and waste resources when adding more instances.

With OpenShift, you tune your WSGI server and application as best you can, then set CPU and memory based on what that uses. When you need to scale, you simply create more replicas. You don’t have wasted CPU and memory, as your allocation is a more accurate depiction of what is used. Thus when you scale you can fit more instances within your global allocation of CPU and memory.

So the important difference here is that next-generation PaaS has your CPU and memory allocation per project, not per instance. That way you can divide up the allocation how you see fit. This need not even be restricted to a single WSGI application, as within the one project you can run more than one service: api, main, database, etc., and they all take from the project-level bucket of CPU and memory. You thus have maximum flexibility.

Of course monitoring becomes even more important in this than it has in the past. If you don’t have good monitoring, you are going to lack the ability to properly tune your application and WSGI server, understand what resources they do use, and so make the most of the new flexibility to break up resources.

Anyway, hopefully you understand this ramble.

Graham

Dev Mukherjee

22 Nov 2015 17:10
to mod...@googlegroups.com
On Sun, Nov 1, 2015 at 9:05 PM, Graham Dumpleton <graham.d...@gmail.com> wrote:

On 1 Nov 2015, at 11:58 am, Dev Mukherjee <dev...@gmail.com> wrote:


where the two WSGI endpoints point to routers provided by the two frameworks.

I actually highly discourage the use of WSGIScriptAliasMatch as it can do unexpected things as far as its effect on the relationship between SCRIPT_NAME and PATH_INFO. There is generally very little need for it.

The configuration above could be done as:

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi


Thanks for pointing that out :-)

How would I go about configuring something similar in mod_wsgi-express? Or just point me to documentation and I can take it from there. 
 
BTW, I presume you meant them to refer to different WSGI script files.

Anyway, hopefully you understand this ramble.


Makes perfect sense. Thanks for taking the time to write such a detailed response.

Graham Dumpleton

24 Nov 2015 18:41
to mod...@googlegroups.com

On 23 Nov 2015, at 9:10 AM, Dev Mukherjee <dev...@gmail.com> wrote:

On Sun, Nov 1, 2015 at 9:05 PM, Graham Dumpleton <graham.d...@gmail.com> wrote:

On 1 Nov 2015, at 11:58 am, Dev Mukherjee <dev...@gmail.com> wrote:


where the two WSGI endpoints point to routers provided by the two frameworks.

I actually highly discourage the use of WSGIScriptAliasMatch as it can do unexpected things as far as its effect on the relationship between SCRIPT_NAME and PATH_INFO. There is generally very little need for it.

The configuration above could be done as:

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi


Thanks for pointing that out :-)

How would I go about configuring something similar in mod_wsgi-express? Or just point me to documentation and I can take it from there. 

mod_wsgi-express is primarily intended for running a single WSGI application in one daemon process group.

To that end, the preferred setup, if needing to host multiple WSGI applications within the same URL namespace of one hostname, is to use nginx or some other proxy in front of mod_wsgi-express. The front end would then be configured to route requests for that host and the subset of URLs to the appropriate mod_wsgi-express instance. To ensure that the original request details get through to the WSGI application properly, mod_wsgi-express has various options to say what the trusted proxy headers and proxies are, so that the request details can be fixed up.
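A hypothetical nginx front end for that arrangement (the ports and hostname here are made up) might look something like:

```nginx
# Hypothetical sketch: nginx routing sub-URLs of one host to two
# separate mod_wsgi-express instances listening on local ports.
server {
    listen 80;
    server_name example.com;

    location /api/ {
        proxy_pass http://127.0.0.1:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```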

The reason mod_wsgi-express is going down this path is that a primary reason it was created was as the basis for a much simpler way of running WSGI applications inside of Docker, with a curated configuration, thereby avoiding the general problem of Apache not being set up correctly for Python. As well as mod_wsgi-express, I also have Docker images I have been working on to develop a best-of-breed Docker solution for hosting Python web applications. They go well beyond the official Docker Python images as far as best practices go, using techniques to ensure everything works properly and you don’t open yourself up to security issues.

That all said, there are two ways that one can still introduce additional WSGI applications so that mod_wsgi-express can host more than one WSGI application.

The first can be used where you have a primary WSGI application but just need to add some small additional WSGI scripts to perform minor tasks. In this approach, the additional WSGI applications run in the same process space as the existing primary WSGI application. Because of that, this can only be used where the WSGI applications will not interfere with each other. That is, you couldn’t use this to host two Django instances by itself.

For this you would run a command like:

    mod_wsgi-express start-server --document-root htdocs --add-handler .wsgi loader.py site/wsgi.py

The '--add-handler' argument allows one to specify a WSGI application to be passed the request when static files with a specific extension are requested from the document directory. This can be used to create special dynamic handlers to process static resource requests.

In this case we are actually going to use a handler which loads up the WSGI script file and executes the WSGI application it contains.

The loader.py file for this is:

import sys
import imp
import hashlib

def application(environ, start_response):
    script = environ['SCRIPT_FILENAME']
    # Encode the path so the digest works on both Python 2 and 3.
    name = '_script_%s' % hashlib.md5(script.encode('utf-8')).hexdigest()

    # Check if module exists.

    if name in sys.modules:
        module = sys.modules[name]

    else:
        # Doesn't so may need to load it.

        try:
            imp.acquire_lock()

            # Check if module exists again now that have lock.

            if name not in sys.modules:
                # Load script file as module.

                module = imp.new_module(name)
                module.__file__ = script

                with open(script, 'r') as fp:
                    code = compile(fp.read(), script, 'exec',
                            dont_inherit=True)
                    exec(code, module.__dict__)

                sys.modules[name] = module

            else:
                module = sys.modules[name]

        finally:
            imp.release_lock()

    application = getattr(module, 'application')

    return application(environ, start_response)

By default, a URL for the second application would then be something like:

    /subapp.wsgi

One can, if need be, do some stuff so that the .wsgi extension isn’t in the URL, but that still requires the --include-file option which is mentioned below for the second way.

The second way of doing things is to provide your own Apache configuration snippet and use a more traditional configuration to add in an extra WSGI application. Doing it this way you can create an extra daemon process group and delegate the application to run in that. Thus technically you could run multiple Django instances.

For this you would run a command like:

    mod_wsgi-express start-server --include-file extra.conf site/wsgi.py

In extra.conf you would then have:

WSGIDaemonProcess extra-app

WSGIScriptAlias /suburl /Users/graham/Projects/mod_wsgi/tests/environ.wsgi \
    process-group=extra-app application-group=%{GLOBAL}

<Directory /Users/graham/Projects/mod_wsgi/tests>
Order allow,deny
Allow from all
</Directory>

You are obviously then back to ensuring you set up the daemon process group properly if the defaults for the mod_wsgi module aren’t appropriate. The mod_wsgi-express main application daemon process group has a lot of overrides applied for timeouts and other things to make it more robust than the default Apache module settings.

For example, the generated mod_wsgi-express config for the main daemon process group has something like:

WSGIDaemonProcess localhost:8000 \
   display-name='(wsgi:localhost:8000:502)' \
   home='/Users/graham/Projects/mod_wsgi' \
   threads=5 \
   maximum-requests=0 \
   python-path='' \
   python-eggs='/tmp/mod_wsgi-localhost:8000:502/python-eggs' \
   lang='en_AU.UTF-8' \
   locale='en_AU.UTF-8' \
   listen-backlog=100 \
   queue-timeout=45 \
   socket-timeout=60 \
   connect-timeout=15 \
   request-timeout=60 \
   inactivity-timeout=0 \
   deadlock-timeout=60 \
   graceful-timeout=15 \
   eviction-timeout=0 \
   shutdown-timeout=5 \
   send-buffer-size=0 \
   receive-buffer-size=0 \
   response-buffer-size=0 \
   server-metrics=Off

Just note that the main Apache child processes' configuration is based off the processes/threads used for the main WSGI application. If the extra applications in separate daemon processes got a lot of traffic, you would want to use the --max-clients option to ensure that the Apache child processes were given more capacity for proxying requests to the now multiple daemon process groups. By default the number of Apache child process worker threads is something like 1.5 * (processes*threads), with a minimum floor of 10 so as not to starve static file requests.
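The sizing rule described there can be sketched as simple arithmetic (my reading of the description, not the actual mod_wsgi-express code):

```python
# Sketch of the Apache child worker-thread sizing rule described
# above: roughly 1.5 * (processes * threads) of the main WSGI
# application, with a floor of 10 so static file requests aren't
# starved. A reading of the description, not the real implementation.

def child_worker_threads(processes, threads, floor=10):
    return max(floor, int(1.5 * processes * threads))

print(child_worker_threads(2, 5))   # main app processes=2 threads=5 -> 15
print(child_worker_threads(1, 5))   # small app, hits the floor -> 10
```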

So I hope that gives you some things to think about, and we can still talk about it offline if you want.

Graham
