GSoC Intro, Update, and Proposal -- HTTP & WSGI Support Improvements

ccahoon

Apr 24, 2009, 7:47:57 PM
to Django developers
Hello there django-dev, my name is Chris Cahoon. I'm a junior at the
University of Maryland, Baltimore County (UMBC), heading towards a BS
in CS. There's a good amount of auto-biography in my GSoC proposal, so
I'll leave that for later. I was accepted to GSoC with Malcolm
Tredinnick as my mentor, with the proposal I have included below.

This week has been pretty busy at school, but I have started putting
together an app for testing purposes, to get myself back into full
Django mode after a few months of schoolwork. I decided to go through
the tutorial and docs another time to solidify my grasp of how things
fit together. I plan to choose a ticket to address during the bonding
period sometime within the next week.

Here is my (very slightly revised) proposal:

The problem

There is a wide array of accepted tickets related to HTTP and WSGI
support in Django. While some of them are minor, there are a few
features that require a good deal of consideration, planning, and a
fairly large set of changes to the code, but could add useful features
and performance improvements for Django. There are a few tickets in
particular that could be addressed. These tickets combined represent
the need for additional attention paid to HTTP support in Django.

Improving sendfile support will increase the efficiency of Django's
sending files on some servers. WSGI proxying could allow Django to
act as a proxy in front of other WSGI apps. These and some other persistent and
controversial tickets need to be resolved to allow further enhancement
of HTTP and WSGI support.

Proposal

Ticket #2131 has received a lot of attention, but the patches
currently attached do not address some larger issues in serving static
files over HttpResponse. So far, the code seems to support only the
X-Sendfile header for efficient file transfer (though the name of the
header is configurable). The task of efficiently transferring static
files needs to take into account the variety of webservers Django is
deployed on, including the possibility that a specialized header for
efficient file transfer is not available. Django needs to have a
framework in place to support as many of these webservers as possible,
and to be able to fall back to a normal (inefficient) method of
transferring static files in the case that there is no efficient method
available.
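
As a rough illustration of the fallback described above, here is a minimal sketch in plain Python. It is not taken from any patch on #2131; the function name build_file_response and its parameters are hypothetical. When a header name is configured, the front-end server is asked to send the file; otherwise the file is streamed in blocks.

```python
import os


def build_file_response(path, sendfile_header=None, chunk_size=8192):
    """Return (headers, body) for serving the file at `path`.

    `sendfile_header` is the configurable header name (e.g. "X-Sendfile").
    None means no efficient mechanism is available on this server.
    Hypothetical helper for illustration only.
    """
    if sendfile_header:
        # The front-end server reads the path from the header and sends
        # the file itself, so the application returns an empty body.
        return [(sendfile_header, os.path.abspath(path))], []

    def read_chunks():
        # Fallback: stream the file through Python in fixed-size blocks.
        with open(path, "rb") as fh:
            while True:
                block = fh.read(chunk_size)
                if not block:
                    break
                yield block

    return [("Content-Length", str(os.path.getsize(path)))], read_chunks()
```

A real framework would presumably pick the mechanism per deployment rather than per call, but the two branches show the shape of the decision.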

Ticket #8927 is a feature request to allow proxy running of WSGI apps
from within Django. mod_wsgi maintainer grahamd mentions that its use
should be limited to fairly specific instances, such as to "easily
wrap a separate WSGI application within Django session based login
mechanism." On similar lines, djc says "One other thing where this
came up is with hosting a Mercurial application (push/pull interface)
within Django (that is, same url-space, same authentication/
authorization). That kind of thing would be massively easier if Django
provided some bits to do this kind of thing." This would essentially
allow Django to share some services and authentication with other WSGI
applications. The use-cases mentioned should not be infrequent as
Django use becomes more widespread.
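
To make the idea concrete, here is a small sketch of running a WSGI app from inside other Python code and collecting its response, using only the standard WSGI calling convention. The names call_wsgi_app, guarded_call, and hello_app are hypothetical, and the authentication check is a stand-in for whatever Django's session machinery would provide.

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi_app(app, environ):
    """Run a WSGI app and collect (status, headers, body bytes)."""
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(headers)
        return written.append  # legacy write() callable required by WSGI

    result = app(environ, start_response)
    try:
        chunks = list(result)
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return captured["status"], captured["headers"], b"".join(written + chunks)


def guarded_call(app, environ, user_is_authenticated):
    """Only hand the request to the wrapped app for logged-in users."""
    if not user_is_authenticated:
        return "403 Forbidden", [], b""
    return call_wsgi_app(app, environ)


def hello_app(environ, start_response):
    # Stand-in for a separate WSGI application being wrapped.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from ", environ["PATH_INFO"].encode("ascii")]


environ = {"PATH_INFO": "/repo"}
setup_testing_defaults(environ)  # fills in the remaining required keys
status, headers, body = guarded_call(hello_app, environ, True)
```

Buffering the whole body is the simplest approach; a real implementation would have to decide whether to stream the wrapped app's output instead.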

Tickets #6527, #7581, and #2504 deal with issues in HttpResponse. They
all have code but decisions need to be made about how to keep
middleware from interfering with or breaking HttpResponses in certain
instances, and about sending out content via iterators (e.g. what
servers/modules support chunked transfer encoding, how do we handle
those that do not?). These have been on the table for a significant
period of time and it would be good to either finalize the code that
has been created for it or come up with alternative solutions.
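
The interaction between middleware and iterator content can be sketched without any Django code. In this hypothetical example (all names are made up for illustration), one middleware buffers the body to compute a length, which exhausts the generator, while another transforms chunks lazily and keeps the response streamable.

```python
def streaming_body():
    # Content produced piece by piece, e.g. a large report.
    yield b"first "
    yield b"second"


def buffering_middleware(body):
    """Needs the whole body, so it consumes the iterator up front."""
    data = b"".join(body)
    return [("Content-Length", str(len(data)))], [data]


def lazy_uppercase_middleware(body):
    """Transforms each chunk as it passes through, preserving streaming."""
    for chunk in body:
        yield chunk.upper()
```

After buffering_middleware runs, the original generator is empty, so anything downstream that still holds a reference to it sees no content. That is one way middleware can break an iterator-based response.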

Many of these tickets are controversial and need further consideration
for design. I plan to gain as full an understanding as possible of
relevant standards, modules, and work already done on these tickets
during the interim period. Concurrently, I will participate in
discussions on django-dev and with relevant parties about factors that
need to be taken into account. Once an acceptable level of consensus
has been reached, I will set about implementing the code and tests for
the tickets mentioned. The goal will be to maintain compatibility with
currently usable webservers and modules while enhancing Django's
capabilities in the areas mentioned above.

My general goal is to gain a really thorough understanding of this
part of the codebase so that, as I am working on these tickets and
testing the webservers, I can get a better feel for what else needs to
be done that might not be in the tracker. But putting this into a
timeline requires me to have a better feel for how much time other
issues will take up.

Other tickets I would like to look into: 6267, which seems simple
enough. It is a good example of a slight quirk in HTTP support that
would be better fixed. 5076 has patches but could use cleanup and some
additional design. 9054 has patches that are reported functional, but
the authors consider them 'hacks' and 'temporary fixes' – I'd like to
spend time on this and create a permanent, better solution (or
determine that the ones that are there are Good Enough). Malcolm has
made suggestions for 10190 that I can work on.

The use of X-FORWARDED-HOST headers is at issue in 9064. Since I will
be testing with multiple servers already, I should be in a good
position for verifying server support for these non-standard headers
(or deciding whether to discontinue their use). I think this is a
major area that I could be of use in: pushing for a decision on what
sorts of non-standard headers should be used, and what kind of
fallback mechanisms we need.
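
A sketch of the kind of fallback in question, written against a plain WSGI environ. This is not Django's implementation; get_host and the trust_forwarded_host flag are hypothetical. The final fallback follows the URL reconstruction rules in the WSGI spec.

```python
def get_host(environ, trust_forwarded_host=False):
    """Pick the host name for a request, with explicit fallbacks."""
    # Non-standard header set by some proxies; only used when opted in.
    if trust_forwarded_host and environ.get("HTTP_X_FORWARDED_HOST"):
        return environ["HTTP_X_FORWARDED_HOST"]
    # Standard Host header.
    if environ.get("HTTP_HOST"):
        return environ["HTTP_HOST"]
    # Last resort: server name and port from the WSGI environ.
    host = environ["SERVER_NAME"]
    port = environ["SERVER_PORT"]
    scheme = environ.get("wsgi.url_scheme", "http")
    default_port = "443" if scheme == "https" else "80"
    if port == default_port:
        return host
    return "%s:%s" % (host, port)
```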

Ticket 10554 shows that the Python Cookie module can be a limiting
factor. It would be interesting to try out jdunck's plan or look
around for other solutions.

I would like to verify that the work done on 6880 and 10188 is of
good quality, or make improvements to it.

In short, I would like to clean out the HTTP module's tickets in the
tracker as much as possible, and if I encounter other issues along the
way, attempt to address those as well. I predict that as I get more
familiar with the code base I will get better at identifying problems
and coming up with possible solutions, and can work through some of
these tickets fairly quickly.

Though this proposal may seem vague, a great amount of the work on
some of these tickets is determining how Django should behave, not in
implementing the behavior. There are patches for most of the tickets,
but they are all in need of review and in some cases may need to be
discarded. I consider this planning to be a large and difficult part
of the project. My intention is to become well-versed in the
technologies I will be dealing with so that the HTTP and WSGI support
can get more attention than they currently receive. Particularly with
the assistance of a mentor, I am confident I can accomplish this task.

Timeline

Now - SoC begins (May 23) - Discuss tickets on Trac and django-dev,
and with mentor. Get up to speed on various parts of HTTP core, HTTP
and WSGI specs, mod_python and mod_wsgi, etc. Get various webservers
set up on home computer to play with so I will be more familiar with
them when working with the varied support for headers.
05/23 - 05/26 - I take a birthday trip to Ohio for Ultimate Frisbee
nationals. Work I get done here will be reading/communicating on the
road and further solidification of design plans.
05/27 - 06/12 - Ticket #2131 Apache support inside a framework/API
that makes additional server support easy. Augment tests concurrently
and try to keep as many regression tests clean as possible.
06/13 - 06/20 - Continue work on #2131. Additional server support.
Scope out other sample WSGI apps to wrap in Django for #8927.
06/20 - 07/04 - Ticket #8927. Firmly establish API then fill holes for
a sample app. Make sure to have good tests for the new functionality.
07/05 - 07/20 - By now, design decisions about #6527, #7581, and #2504
should be done. Implement those here.
07/21 - 08/10 - Documentation, bug fixes, ensure code is Pythonic and
Django-fied. Patches are on tickets and hopefully being tested by
community members.
08/11 - 08/17 - Cleanup and bugfixes.

In addition, if tickets are moving more smoothly than expected, I can
move forward to the tickets mentioned above and go for a more thorough
cleanout of the tracker's HTTP support bugs.

I plan to devote at least 25 hours per week to the problem. If I see
that I am not reaching milestones I have set on my timeline, I will
augment the amount of time I spend on the problem and consult with my
mentor to ensure that I am not running into roadblocks that have
already been surmounted.

About me

I am in my sixth semester at university, on track to get my BS in
Computer Science in the fall of 2010. I enjoy reading (hard science
fiction, literary fiction, history, popular science, articles on
software, and whatever cognition papers I can sort of understand),
watching off-the-air science fiction television (X-Files, Firefly,
Battlestar Galactica), playing Ultimate, and programming, among other
things.

I built my first, very shoddy website when I was twelve, and began
using server-side scripting languages a few years later, modifying
bulletin boards and building websites for clients as well as for
personal projects. I took programming classes in high school and after
school at a local community college. I began using Python in high
school while working at a medical record copy service. I built client
and 'multithreaded' server scripts to facilitate joining PDF medical
records images without locking up the web servers, as well as various
other scripts to automate cover page generation and printing of stored
medical records. Since getting to university, I have used Python on a
variety of school projects (parsers for NLP, an atmospheric data
visualization app, implementations of various graph algorithms for
game playing in AI, and others), as well as for personal enjoyment. I
have also used a variety of other non-Python languages at work and
school, in various client and web-based contexts.

I began using Django on my own machine out of curiosity in the middle
of 2008. While building an e-commerce site using Django and Satchmo, I
went to a Django pre-1.0 sprint in College Park, Maryland, and, with
significant assistance, submitted a patch to give Atom Feeds timezone
awareness (changeset 8216). I also submitted a minor patch to Satchmo
to fix a divide-by-zero error in the product rating code (Satchmo
changeset 1372). The ecommerce site never got off the ground (graduate
school got in the way of the project leader, and we decided to
postpone it), but I gained a great appreciation for Django. Having
patches accepted into two separate codebases has given me the
confidence to delve deeper into the code of various projects to
understand how pieces fit together. The sprint made me feel like part
of the community, and made me want to make further contributions.

Graham Dumpleton

Apr 24, 2009, 8:53:42 PM
to Django developers
That shouldn't be the case. Supporting X-Sendfile was I recall
actually an after thought. The original intent was to be able to make
use of mod_python req.sendfile() or WSGI wsgi.file_wrapper mechanisms.

Graham

Jacob Kaplan-Moss

Apr 24, 2009, 10:04:41 PM
to django-d...@googlegroups.com
On Fri, Apr 24, 2009 at 7:53 PM, Graham Dumpleton
<Graham.D...@gmail.com> wrote:
> That shouldn't be the case. Supporting X-Sendfile was I recall
> actually an after thought. The original intent was to be able to make
> use of mod_python req.sendfile() or WSGI wsgi.file_wrapper mechanisms.

Right; I think the idea should be to support "fast file sending" using
some sort of pluggable mechanism that works with req.sendfile,
wsgi.file_wrapper, or whatever happens to work best for the particular
installation.
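
For reference, the wsgi.file_wrapper path can be sketched along the lines of the example in the WSGI spec. The function name send_file is hypothetical; wsgiref's FileWrapper stands in here for whatever wrapper a real server would provide.

```python
import io
from wsgiref.util import FileWrapper


def send_file(environ, start_response, filelike, block_size=8192):
    """Return a response body for `filelike`, using the server's
    file wrapper when it offers one."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server may turn this into an efficient OS-level transfer.
        return wrapper(filelike, block_size)
    # Portable fallback: read fixed-size blocks until EOF.
    return iter(lambda: filelike.read(block_size), b"")


def _ignore_start_response(status, headers, exc_info=None):
    # Stand-in start_response for the demonstration below.
    return None


data = b"abc" * 5
plain = send_file({}, _ignore_start_response, io.BytesIO(data), 4)
wrapped = send_file({"wsgi.file_wrapper": FileWrapper},
                    _ignore_start_response, io.BytesIO(data), 4)
```

A pluggable mechanism would add further branches (req.sendfile under mod_python, a header-based method, and so on) in front of the same fallback.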

Jacob

Carl Meyer

Apr 27, 2009, 10:37:11 AM
to Django developers
Hi Chris,

On Apr 24, 7:47 pm, ccahoon <chris.cah...@gmail.com> wrote:
> Tickets #6527, #7581, and #2504 deal with issues in HttpResponse. They
> all have code but decisions need to be made about how to keep
> middleware from interfering with or breaking HttpResponses in certain
> instances, and about sending out content via iterators (e.g. what
> servers/modules support chunked transfer encoding, how do we handle
> those that do not?).

#9163 might be worth brief consideration in this category too. The
name and category make it look like primarily a contrib-CSRF issue,
but there are more general design questions there about how content-
modifying middlewares interact with ETag headers; whose responsibility
is it to keep an ETag up-to-date?
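
One possible answer, sketched in plain Python, is that whichever middleware changes the content also refreshes the ETag. This is only an illustration of the question, not a decision from the ticket; make_etag and append_token are hypothetical names, and the hash choice is arbitrary.

```python
import hashlib


def make_etag(content):
    # An ETag is an opaque quoted string; a content hash is one option.
    return '"%s"' % hashlib.sha256(content).hexdigest()


def append_token(headers, content, token=b"<!-- token -->"):
    """Content-modifying middleware that keeps an existing ETag in sync."""
    new_content = content + token
    new_headers = dict(headers)
    if "ETag" in new_headers:
        # The old tag described the unmodified body, so recompute it.
        new_headers["ETag"] = make_etag(new_content)
    return new_headers, new_content
```

If the middleware skipped the recompute step, clients could receive a modified body under the tag of the original one.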

Carl