On Sun, Mar 15, 2009 at 19:09, Carl Byström <cgby...@gmail.com> wrote:
>
> I've poked/looked around the Smisk framework and I wonder how it
> handles concurrency. From what I can tell, it's plain-old per-process
> concurrency?
That is correct. Processor time is seldom the constraining factor in
the kinds of applications Smisk is built for. If you require heavy
math or similar, Smisk can scale across several machines, widening
the bottleneck.
>
> While Smisk may be very performant, how do you handle blocking I/O
> operations? Will you need to spawn multiple processes to handle
> multiple outstanding I/O calls?
Blocking I/O should be avoided (this can be done using asynchronous
mechanisms like kqueue, epoll, poll and select, or libraries like
libevent). However, sometimes you cannot avoid performing blocking
I/O when serving a request. In those cases your Smisk application
should run in as many instances as required. Many servers support
dynamic spawning of FastCGI processes. For example, lighttpd supports
this (through the "max-procs" parameter in the lighttpd
configuration[1]), as does mod_fastcgi for Apache (the "-processes"
option sets the number of instances to spawn[2]) and mod_fcgid for
Apache, which provides a fairly extensive set of configuration
directives[3] for controlling how instances are spawned.
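As a rough illustration, a lighttpd entry along these lines (the URL
prefix, socket and paths are invented for the example) makes lighttpd
keep four instances of the application running:

  fastcgi.server = ( "/myapp" =>
    (( "socket"      => "/tmp/myapp.sock",
       "bin-path"    => "/srv/myapp/process.py",
       "max-procs"   => 4,
       "check-local" => "disable"
    ))
  )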
Smisk itself handles blocking I/O (like accept and file session I/O)
by releasing the Python Global Interpreter Lock, as required by the
Python C API specification. This is of low relevance in most cases,
but if your application is threaded and other threads perform I/O,
true concurrency will occur.
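For example, something along these lines (a sketch only -- the
stats-flushing thread and MyApp are made up, not part of Smisk) lets
a background thread keep working while the main thread is blocked
waiting for the next request:

  import threading, time
  from smisk.core import Application

  def flush_stats():
      # Keeps running while the main thread blocks in accept and
      # other I/O calls, since the GIL is released around them.
      while True:
          time.sleep(10)
          # ... write accumulated statistics somewhere ...

  class MyApp(Application):
      def service(self):
          self.response.headers = ["Content-Type: text/plain"]
          self.response.write("Hello\n")

  t = threading.Thread(target=flush_stats)
  t.setDaemon(True)
  t.start()
  MyApp().run()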
>
> Anyway, I love the idea about Smisk and it being open-source. Another
> (somewhat similar) solution I've been looking at is nginx + mod_wsgi.
> WSGI actually supports asynchronous requests so implementing a
> coroutine based application would yield quite good concurrency and
> performance (assuming nginx handles the HTTP deserialization natively)
Coroutines are very interesting. The FastCGI protocol also supports
asynchronous I/O (multiple requests being handled asynchronously over
a single FastCGI connection, a.k.a. connection multiplexing). However,
the cost of adding support for multiplexing and concurrent
transactions to Smisk is very high, as Smisk utilizes many global
objects (for instance, to save time Smisk reuses a single Request
object). Also, the exposed API provides the global objects app,
request and response, and many parts of the API as well as existing
applications use these objects.
Most parts of the Smisk core are, however, prepared for concurrent
processing (most of the FastCGI transaction mechanisms, except for
the aforementioned shared Request and Response instances).
Another approach to this "problem" might be to add dynamic forking of
Smisk itself. See src/Application.c[4] for details.
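Forking Smisk itself would mean something in the spirit of the
pre-fork sketch below (plain os.fork; this is not an existing Smisk
API, and the worker count is picked arbitrarily). A dynamic variant
would fork additional workers on demand instead of up front:

  import os
  from smisk.core import Application

  class MyApp(Application):
      def service(self):
          self.response.headers = ["Content-Type: text/plain"]
          self.response.write("Hello from pid %d\n" % os.getpid())

  NUM_WORKERS = 4
  for _ in range(NUM_WORKERS - 1):
      if os.fork() == 0:
          break  # child: stop forking and go serve requests

  # Parent and children all inherit the FastCGI listening socket
  # from the spawner, so each runs its own accept loop.
  MyApp().run()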
If anyone is interested in adding connection multiplexing support to
Smisk, please be my guest! Fork the repository at GitHub[5] and we can
later merge in any contributions.
Smisk aims to provide a reasonable compromise between performance,
usability and areas of application. Looking for raw HTTP handling
performance? Write something in C using the HTTP library in
libevent[6]. Looking for a wide array of features and functionality?
Django[7] provides really good means for creating web services.
I hope you got answers to your questions!
> /Carl
>
[1] http://redmine.lighttpd.net/wiki/lighttpd/Docs:ModFastCGI
[2] http://www.fastcgi.com/mod_fastcgi/docs/mod_fastcgi.html#FastCgiServer
[3] http://fastcgi.coremail.cn/doc.htm
[4] http://github.com/rsms/smisk/blob/v1.1.4/src/Application.c#L278
[5] http://github.com/rsms/smisk/fork
[6] http://www.monkey.org/~provos/libevent/
[7] http://www.djangoproject.com/
--
Rasmus Andersson
I think your message got through clearly. Smisk does use the
lowest-level part of libfcgi, so the core supports connection
multiplexing, but the way Smisk is used would mess things up.
>
> Blocking I/O to the file system might be just fine; however, blocking socket
> I/O is a bit of a pain. How do you handle that at Spotify? Lengthy database
> queries for example.
We simply don't do lengthy database queries. If you do have slow
queries, consider adding a cache such as memcached or the bsddb-backed
store in smisk.ipc[1], or move the data to something faster than an
RDBMS, like Tokyo Cabinet[2][3] and Tokyo Tyrant[4].
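With memcached the pattern is roughly this (a sketch assuming the
python-memcached client; the key layout and the slow query function
are made up):

  import memcache

  mc = memcache.Client(["127.0.0.1:11211"])

  def artist_albums(artist_id):
      key = "albums:%d" % artist_id
      albums = mc.get(key)
      if albums is None:
          albums = run_slow_database_query(artist_id)  # hypothetical
          mc.set(key, albums, time=60)  # cache for 60 seconds
      return albums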
However, as I mentioned before, avoiding blocking I/O is impossible
in most situations that rely on outside sources like databases. In
those cases you should run as many Smisk instances as needed to
handle the amount of (projected or real) traffic. Load balancing and
close-to-horizontal scaling are among the coolest parts of FastCGI
and Smisk!
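For instance, lighttpd can balance one URL prefix across several
backend machines simply by listing them (hosts and ports below are
invented for the example):

  fastcgi.server = ( "/myapp" =>
    (( "host" => "10.0.1.10", "port" => 5000, "check-local" => "disable" ),
     ( "host" => "10.0.1.11", "port" => 5000, "check-local" => "disable" ))
  )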
[1] http://python-smisk.org/docs/current/library/smisk.ipc.html
[2] http://tokyocabinet.sourceforge.net/index.html
[3] http://github.com/rsms/tc
[4] http://tokyocabinet.sourceforge.net/tyrantdoc/