Ideas


Lateef Jackson

unread,
Oct 26, 2009, 12:33:26 PM
to magnum-py
I wrote an experimental server called frisky (http://bitbucket.org/lateefj/frisky/wiki/Home). I wanted to share some of the ideas and features I implemented, because I was just about to turn the proof of concept into something useful. However, I have found that this project has already done that, and I would rather collaborate.
JSON configuration file:
The reason I was using a JSON configuration file is that it is so easy to generate. The deployments I have worked on used a template system for deployment configuration, and I thought it was key to have a format that was easy to generate. The files are also easy to edit by hand and are not as wordy as XML.
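To illustrate the generate/parse symmetry JSON gives you, here is a minimal sketch; the key names are hypothetical, not frisky's actual schema:

```python
import json

# Hypothetical deployment settings; a template/deploy script can build
# this dict and dump it without any escaping concerns.
config = {
    "host": "0.0.0.0",
    "port": 8000,
    "workers": 4,
    "request_limit": 1000,
}

# Writing and reading back is symmetric, unlike most templated formats.
text = json.dumps(config, indent=2)
loaded = json.loads(text)
```

The round trip loses nothing, which is exactly what a deploy tool wants.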
per-(WSGI-)process request limits:
IMO Apache does a lot of things well. One of the things I really liked was the ability to set request limits. In my case it was fastcgi processes, but the same idea applies to WSGI. When an application has been in the wild for a while, it tends to pile up third-party dependencies and a lot of code with subtle memory leaks or failure modes that only manifest in production. Side note: since I knew I was going to reimplement, I never wrote a code-change monitor; I just set the request limit to 1 in development. The downside is that requests for static files, which don't need a fresh process, triggered one anyway.
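The request-limit idea can be sketched as a counter in the worker loop; these names are illustrative, not frisky's or Magnum's actual API:

```python
MAX_REQUESTS = 1000  # hypothetical limit; set to 1 in development


def worker_loop(get_request, handle):
    """Serve requests until the limit is hit, then return so the
    master can fork a fresh process (any leaks die with the worker)."""
    served = 0
    while served < MAX_REQUESTS:
        request = get_request()
        if request is None:  # shutdown signal from the master
            break
        handle(request)
        served += 1
    return served  # the caller respawns the process after this returns
```

The point is that the worker exits voluntarily, so the master never has to diagnose a leak; it just replaces the process.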
Beaker caching:
Pros and cons are described in this blog post:
http://blog.hackingthought.com/2009/08/integrated-beaker-cache-with-frisky.html
The key to performance was to have the "master process" or "event process" look up cached items itself. However, this was a performance hack: I couldn't seem to get multiprocessing to perform well, though that is probably more my fault than the library's.
WSGI application routing:
I never implemented this, but the idea is to configure a mapping from URL to WSGI application, so a single server can host more than one application.
IPC vs pyevent:
As I have mentioned, IPC did not seem to work consistently fast on all platforms. It occurred to me that maybe the best way to communicate with the back-end processes is to use something like pyevent (epoll, select, kqueue) instead of IPC. It feels like there is some obvious reason this would not work, but I can't put my finger on it.
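For reference, the epoll/select/kqueue family that pyevent wraps is also exposed directly by later Python standard libraries via the `selectors` module; a toy readiness loop over a socketpair, just to show the shape of event-driven back-end communication:

```python
import selectors
import socket

# One socketpair stands in for a pipe to a back-end process.
# selectors.DefaultSelector picks epoll/kqueue/select for the platform,
# the same family pyevent exposes through libevent.
sel = selectors.DefaultSelector()
front, back = socket.socketpair()
front.setblocking(False)
sel.register(front, selectors.EVENT_READ)

# The "back-end" writes a response; the front end waits for readiness
# instead of blocking on a read.
back.sendall(b"response from backend")

received = b""
for key, _events in sel.select(timeout=1):
    received = key.fileobj.recv(4096)

sel.close()
front.close()
back.close()
```

The front end only touches sockets that are actually readable, which is the property that makes one event process able to juggle many back ends.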

Sorry for the info dump; very excited to see others sharing the same vision!
Cheers,
Lateef



gattis

unread,
Oct 27, 2009, 7:02:37 PM
to magnum-py
Thanks for the ideas Lateef.

1) Configuration Files

I hate XML as well. If we are to go with a writable/parseable file, I
agree we should use JSON. The only alternative is to keep the
configuration file in Python itself, which is easier to use since you
don't have to learn any new syntax, but harder to write if we end up
with a GUI tool that writes out to a config file when you save.

2) Per-process Request Limits

Completely agree... I'll add this to the feature requests page. In
the future we may want to get really smart with policing the app layer
and have options to restart a process after it grows to a certain
amount of memory or has a certain number of files open.

3) Beaker Caching

I think we should make use of the Magnum shared memory pool for built-in caching (check out shared.py). We just need an mmap-based shared-memory dictionary class; then you could get/set from the child processes without any serialization, which should be super-fast, with only one copy held in (shared) memory.
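The mechanics behind such a dictionary can be sketched with the stdlib `mmap` module; this is not Magnum's shared.py, just the underlying idea of one anonymous shared mapping that a forked child would see:

```python
import mmap

# Anonymous mapping; on POSIX, mmap(-1, ...) is shared, so a child
# created by fork() sees the same page. A real shared dict would layer
# a hash layout and locking on top of this buffer.
buf = mmap.mmap(-1, 4096)

payload = b"cached-value"
buf[:len(payload)] = payload          # "set" from one process

data = bytes(buf[:len(payload)])      # "get", with no serialization
buf.close()
```

The whole point is that the get/set is a memory copy, not a pickle round trip through a pipe.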

4) WSGI routing

This is already possible! Your config.py would look like this:

HANDLER_CLASS = magnum.http.dispatch.DispatchWrapper({
    "/app_one/": magnum.http.wsgi.WSGIWrapper(AppOne.handlers.WSGIHandler()),
    "/app_two/": magnum.http.wsgi.WSGIWrapper(AppTwo.handlers.WSGIHandler()),
})
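Roughly, a dispatcher like that does prefix matching over PATH_INFO; here is a generic sketch of the technique (not Magnum's actual DispatchWrapper implementation):

```python
def make_dispatcher(routes):
    """routes: {path_prefix: wsgi_app}. Longest prefix wins."""
    ordered = sorted(routes.items(), key=lambda kv: -len(kv[0]))

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        for prefix, app in ordered:
            if path.startswith(prefix):
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    return dispatcher
```

Because the dispatcher is itself a WSGI app, it can sit anywhere a single application could.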

5) Pyevent

I think we should probably move to a libevent wrapper like pyevent for
connections to the front-end (to make Magnum compatible with Windows/
OSX/BSD). Connections to the back-end processes would ideally be done
over shared memory (to be as fast as possible), but are currently done
over pipes via the multiprocessing module's Queue object. We can hook
up libevent on both sides of the pipe if it makes a difference in the
benchmarks.


Lateef Jackson

unread,
Oct 29, 2009, 6:51:20 AM
to magn...@googlegroups.com
Caching note:
I agree shared memory would be faster, but I was mainly thinking of a clustered deployment environment. The only reason I would use a different approach is if using a caching system like memcached (large amounts of cache memory, multiple nodes for redundancy/performance). Beaker's dog-pile protection only supports a single host, and it decreases performance. However, in all the applications I have worked on, the disk IO (DB) was so much slower than the web server that, once the cache was cleared without dog-pile locks, the data store got overloaded. Food for thought.
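The dog-pile problem described above (everyone stampedes the DB when a hot key expires) is usually handled with a per-key lock so only one caller recomputes; a single-host sketch of that technique, separate from what Beaker or memcached actually do:

```python
import threading

_cache = {}
_locks = {}
_meta_lock = threading.Lock()


def get_or_compute(key, compute):
    """Only one thread recomputes a missing key; the rest wait on the
    per-key lock and then read the freshly cached value."""
    if key in _cache:
        return _cache[key]
    with _meta_lock:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        if key not in _cache:        # re-check after winning the lock
            _cache[key] = compute()  # only the winner hits the DB
        return _cache[key]
```

Without the re-check inside the lock, every waiter would still recompute once the winner released it, which is the stampede all over again.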

Per-process Request Limits
I am on the same page with this. I have been thinking about whether the WSGI process configuration needs to look more like fastCGI configurations. So if you are going to run a WSGI process for an application, you would configure its number of processes, memory, and maybe even CPU time.
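A fastCGI-style per-application section in the JSON config might look something like this; every key name here is purely illustrative, not an agreed schema:

```python
import json

# Hypothetical per-application process limits, fastCGI style.
config_text = """
{
  "apps": {
    "blog": {"processes": 4, "max_memory_mb": 256, "request_limit": 1000}
  }
}
"""

config = json.loads(config_text)
blog = config["apps"]["blog"]
```

The master would read this once and enforce the limits when spawning and recycling workers.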
Pyevent
libevent + shared memory sounds great! This would be a fun hack. My hesitation would be the dependency on a third-party module :( I think it would be nice to use pyevent as an optimization if it is installed. Maybe a sprint down the road.

I will put some time in today getting things working on OS X, run my existing test apps, and see how they benchmark.
--
Lateef Jackson
Phone: 1.704.835.0112