[GSoC] Django Template Compilation rev.2

xtrqt

Mar 29, 2011, 11:18:03 PM
to django-d...@googlegroups.com
Django Template Compilation rev.2
=================================
About Me 
~~~~~~~~ 
I'm a final-year student at the Technical University of Lodz, Poland, in the
faculty of electronic engineering and computer science, and in parallel I'm
working on my second diploma in electronic engineering at Polytech de Nantes
in France. I've been using Python for 8 years, and after getting totally
frustrated with PHP and its frameworks, I decided to choose something else
for my webdev gigs. I have been with Django since version 0.96. I was always
a reader, never a committer; maybe I was too scared to develop something and
then have to convince people that my idea was right. I decided to change
that and take part in Django, the framework that made me like web
development ;) I know that it's a bit late to prove my qualities, but I
believe I can succeed in this project. I hope the research I've done for
this proposal will convince you ;)

Background 
~~~~~~~~~~ 
It has been one year since Alex Gaynor published the first version of this
proposal. I've tried to track the code that Alex reviewed, and after a deep
analysis of the existing Django code itself, I found that the improvement in
template generation can be huge, and not only through simple compilation.

Plan 
~~~~ 
Compile Django templates into Python functions, and cache the produced code
to speed up template generation. Optionally compile templates to machine
code.
 
Rationale 
~~~~~~~~~ 
The Django template language still exists at a level above the Python
interpreter, and Django templates are themselves interpreted. As was agreed,
this makes template generation slow. We could optimise this process in two
ways: preprocessing the template source as much as possible (in our case, by
compiling it), and reducing the time required to access this preprocessed
data (my idea is to allow compiled code to be cached).
 
Method 
~~~~~~ 
Since last year's proposal hasn't been rejected, and nothing has been
implemented in this area since then, let me cite Alex Gaynor's method,
which I fully support and want to implement.

Alex Gaynor wrote:
    Templates will be compiled by turning each template into a series of
    functions, one per block (note that the base template from which other
    templates extend is a single function, not one per block). This is
    accomplished by taking the tree of ``Node`` objects which currently
    exist  for a template and translating it into an alternate
    representation that  more closely mirrors the structure of Python
    code, but which still has the semantics of the template language. For
    example, the new tree for a loop  using the ``{% for %}`` tag would
    become a for loop in Python, plus  assignments to set up the ``{{
    forloop }}`` variable that the ``{% for %}`` tag provides. The
    semantics of Python code is that variables assigned in a for loop
    exist beyond the loop itself, including the looping variable. Django
    templates, however, pop the top layer from the context stack at the
    end of a for loop. This intermediate representation uses the scoping
    of  Django templates.
    
    After an intermediate representation is created a compiler is invoked
    which translates the IR into Python code. This handles the details of
    Django template scoping, spilling variables in the event of conflicts,
    and calling template tag functions. An important feature of Django
    templates is that users can write template tags which have access to
    the full context, including the ability to modify the context.  In
    order to maintain  backwards compatibility with existing template
    tags, we must create  a template context object whenever an
    uncompilable template tag is used, and mirror any changes made to the
    context in the function's locals.

    This presents a complication, as we forfeit the speed benefits of a
    compiled template (lookups of a Python local are a single index in a C
    array) and must perform a dictionary lookup for every variable.
    Unfortunately, mirroring a context dictionary back into local
    variables  requires maintaining a dictionary of arbitrary names to
    values, which  can't be efficiently implemented with Python's locals
    (use of ``exec``  causes locals to degrade to dictionary lookups).
    Furthermore, constructing a dictionary of the full context requires
    additional effort.

    To provide an optimal solution we must know which variables a given
    template tag needs, and which variables it can mutate. This can be
    accomplished by attaching a new class attribute to ``Nodes`` and
    passing only those values to the class, (instead of the full context
    dictionary).  Subsequently, we would only need to mirror a few given
    values into the  locals, and since these are known names, we avoid
    degrading the local  lookups into dictionaries. Old-style ``Nodes``
    will continue to work,  but in a less efficient manner.
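
To make the last paragraph of the quote more concrete, here is a rough sketch of what such a class attribute could look like. The names `reads`, `writes` and `call_node` are my own assumptions for illustration; Django's ``Node`` class has no such attributes today, and the real design may well differ.

```python
# Hypothetical sketch: `reads`, `writes` and `call_node` are invented names,
# not part of Django's template API.

class Node(object):
    reads = None    # None means "needs the full context" (old-style node)
    writes = None

    def render(self, context):
        raise NotImplementedError


class UpperNode(Node):
    """New-style node: declares the only variable it touches."""
    reads = ("name",)
    writes = ()

    def render(self, context):
        return context["name"].upper()


def call_node(node, local_vars):
    """Call a node from compiled code, passing as little as possible."""
    if node.reads is None:
        # Old-style node: build a dictionary of the full context (slow path).
        context = dict(local_vars)
    else:
        # New-style node: pass only the declared names (fast path).
        context = dict((name, local_vars[name]) for name in node.reads)
    return node.render(context)
```

With this shape, `call_node(UpperNode(), {"name": "django", "other": 1})` only ever hands `name` to the node, so the compiled function knows exactly which locals could be affected.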


As we all know, everything needs to stay backward compatible, as was
mentioned in the original proposal. We should develop this as a new custom
loader, which would use all the performance benefits of compilation. For
some period of time everyone could choose between the old backend and the
new backend, the latter requiring updated code for custom nodes, as was
done with the newforms/oldforms change. At this point we can also imagine
a compatibility mode in which encountering old-style nodes would trigger
the old template rendering.


I believe that templates should be processed in these steps.

    1. Parsing string representation
    2. Creating AST.

In fact, the first two steps are already done by the existing template
engine. NodeList is a kind of AST.
    
    3. Creating IR (Intermediate Representation) for AST
    4. Generating Python code and inline compilation
    
Creating an IR from the AST should allow further optimizations, like
eliminating dead code.
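
As a minimal sketch of steps 3 and 4, assuming a toy IR made of `("text", ...)` and `("var", ...)` tuples (this IR and the function names are hypothetical, not anything that exists in Django), code generation plus inline compilation could look roughly like this:

```python
# Hypothetical mini-IR: tuples such as ("text", "Hello ") and ("var", "name").
# None of this is Django API; it only illustrates steps 3 and 4.

def generate_source(ir, func_name="render"):
    """Turn a flat list of IR nodes into the source of a Python function."""
    lines = ["def %s(context):" % func_name, "    result = []"]
    for kind, value in ir:
        if kind == "text":
            if value:  # trivial optimisation: drop empty text nodes
                lines.append("    result.append(%r)" % value)
        elif kind == "var":
            lines.append("    result.append(str(context[%r]))" % value)
        else:
            raise ValueError("unknown IR node: %r" % kind)
    lines.append("    return ''.join(result)")
    return "\n".join(lines)


def compile_template(ir, func_name="render"):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    code = compile(generate_source(ir, func_name), "<template>", "exec")
    exec(code, namespace)
    return namespace[func_name]
```

The empty-text check stands in for the kind of dead code removal an IR makes easy; a real implementation would of course need scoping, filters and tags on top.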

    5. Optional: Cython or Psyco compilation 
    
Since we are generating Python code from scratch, we could generate it with
Cython language extensions to declare some variables as simpler `C` types.
This would allow us to compile them to machine code. Another way could be
incorporating Psyco. As we also need to take care of users in restricted
environments like `GoogleAppEngine`, this feature would be optional.

    6. Caching resulting code
    
For now the default Django behavior is not to cache the template NodeList.
These objects are created each time we call a view. For a greater speed
improvement we should take into account that templates don't change during
server execution. Of course, some kind of API to reload/auto-reload the
cache should be implemented.
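
One possible shape for such a cache, sketched under the assumption that templates live in files and that a changed modification time is a good enough signal for auto-reload. `CompiledTemplateCache` and `compile_func` are hypothetical names; `compile_func` stands in for the whole parse/IR/code generation pipeline.

```python
import os

# Hypothetical cache, not an existing Django class. `compile_func` takes the
# template source and returns a compiled render function.

class CompiledTemplateCache(object):
    def __init__(self, compile_func):
        self._compile = compile_func
        self._entries = {}  # path -> (mtime, compiled function)

    def get(self, path):
        mtime = os.path.getmtime(path)
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            # First use, or the file changed on disk: recompile.
            with open(path) as f:
                entry = (mtime, self._compile(f.read()))
            self._entries[path] = entry
        return entry[1]

    def clear(self):
        """Explicit reload hook: forget every compiled template."""
        self._entries.clear()
```

Checking the modification time costs one `stat` call per lookup; in production that check could be switched off so that only `clear()` triggers recompilation.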

    7. Code is fetched from the cache and, together with the context
       variables, generates the page.

Not much to comment on, but we should notice that every page reload is then
only a matter of executing step 7.

Regarding Armin Ronacher's proposal, I believe that for now Django should
still contribute to its own well-known template system. Building a
compilation mechanism dedicated to Django allows the Django community to
have greater control over the architecture and the way it works. Django was
always [as long as I remember] a 'batteries included' framework. In my
opinion, building the template engine on top of an external library in an
external repository could begin the process of dividing Django into small
independent blocks which, at some point in time, will stop playing nicely
with each other. Also important here is the matter of tracking and fixing
bugs, providing patches, and taking responsibility for any problems.

Building a pluggable application infrastructure is important, but the
template module is still very much a core component, and should be developed
inside the community.


Alex Gaynor's example
~~~~~~~~~~~~~~~~~~~~~
The following are some examples of what I'd expect a compiled template to
look like:

.. sourcecode:: html+django

    {% for i in my_list %} 
        {% if i|divisibleby:2 == 0 %} 
            {{ i }} 
        {% endif %} 
    {% endfor %} 

.. sourcecode:: python

    from django.template.defaultfilters import divisibleby
    from django.utils.encoding import force_unicode

    def templ(context, divisibleby=divisibleby):
        my_list = context.get("my_list")
        _loop_len = len(my_list)
        result = []
        # _index is the loop position; i is the item itself.
        for _index, i in enumerate(my_list):
            forloop = {
                "counter0": _index,
                "counter": _index + 1,
                "revcounter": _loop_len - _index,
                "revcounter0": _loop_len - _index - 1,
                "first": _index == 0,
                "last": (_index == _loop_len - 1),
            }
            if divisibleby(i, 2) == 0:
                result.append(force_unicode(i))
        return "".join(result)

For comparison, here is the performance of these two::
    >>> %timeit t.render(Context({"my_list": range(1000)})) 
    10 loops, best of 3: 38.2 ms per loop 
    >>> %timeit templ(Context({"my_list": range(1000)})) 
    100 loops, best of 3: 3.63 ms per loop 
That's a 10-fold improvement! 

Timeline 
~~~~~~~~ 
 * 1 week -- develop a benchmark suite of templates and tests for comparing
   compatibility
 * 3 weeks -- develop the frontend portion of this, code which translates 
   Django's included template tags into the IR. 
   * 1 week -- developing the internal IR generation API. 
   * 2 weeks -- hooking up all of Django's template tags to actually use 
     it. 
 * 4 weeks -- develop the backend code generator.  This takes the IR and 
   translates it into Python, including handling the semantic changes. 
   * 2 weeks -- basic code generation support. Does nothing but generate 
     code that looks exactly like what's already executed, this means 
     variable lookups are still lookups in a ``Context`` dictionary. 
   * 2 weeks -- optimize known names into local variables at the python 
       level. 
 * 2 weeks -- time set aside for dealing with bugs, corner cases, and 
   anything else. 
 * 1 week -- Explore possibility for additional optimizations, eliminating 
   duplicate values (for example removing unused ``{{ forloop }}`` 
   variables), allowing an external app to provide "type" data to IR nodes 
   such that variable lookups could be resolved as indexing vs attribute 
   lookup at compile time. 
 * 1 week -- Explore the speed gain from machine code compilation
   (Psyco/Cython) and the use of different backends for storing/caching
   compiled code objects.
   
   
Goals 
~~~~~ 
As with any good project we need some criteria by which to measure success: 
 * Successfully compile complete (real world) templates. 
 * Speed up templates. For reference purposes, Jinja2 is about 10-20x 
   faster than Django templates.  My goal is to come within a factor of 
   2-3 of this. 
 * Complete backwards compatibility. 
 * Develop a complete porting guide for old-style template tags to minimize
   any pain in the transition. 

   
As I wrote in the title, this is revision 2 of Alex Gaynor's proposal. It
is not a completely new idea. After analyzing everything, I just wanted to
add a few things of my own. And as I read on the django-developers group
yesterday, this project is quite popular, which worries me to some extent,
because I'm determined to work on it this summer. So now, how should I
convince you to hand me this job ;)?

Please post any questions or comments; I'll be glad to reply.

You can contact me by the standard means:
email: jan.rzepecki (at) gmail (dot) com
jabber/gtalk: same as above
irc: I've just started to idle every day on #django and #django-dev under
the nick `xtrqt` (it is also my nick on the Django tracker).

PS I'm sorry for starting a third thread on this issue, but I thought that
one-thread-one-proposal is fair to everyone applying.

Jonathan Slenders

Mar 30, 2011, 11:06:04 AM
to Django developers

> Alex Gaynor wrote:
>
>     ... The
>     semantics of Python code is that variables assigned in a for loop
>     exist beyond the loop itself, including the looping variable. Django
>     templates, however, pop the top layer from the context stack at the
>     end of a for loop. This intermediate representation uses the scoping
>     of  Django templates.

You should map the context altering template tags to Python scopes,
like

from:
{% with a as b %} ... {{ b }} {% endwith %}

to:

def __(b):
    print(b)
__(a)

this may be a little overhead, but I think it's the best way to
overcome the side-effects of assignments.
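
A slightly fuller sketch of the same idea, assuming the compiled template collects its output in a `result` list (the function names are made up for illustration):

```python
def render(context):
    result = []
    a = context["a"]

    # {% with a as b %}{{ b }}{% endwith %}
    def _with_block(b):
        result.append(str(b))
    _with_block(a)

    # `b` lived only inside _with_block, so nothing leaks into this scope.
    result.append("leaked" if "b" in locals() else "clean")
    return " ".join(result)
```

The inner function can still append to the outer `result` list, but the name `b` disappears when the call returns, which mirrors popping the context at `{% endwith %}`.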



> Alex Gaynor's example
> ~~~~~~~~~~~~~~~~~~~~~
> .. sourcecode:: html+django
>     {% for i in my_list %}
>         {% if i|divisibleby:2 == 0 %}
>             {{ i }}
>         {% endif %}
>     {% endfor %}
> .. sourcecode:: python
>     def templ(context, divisibleby=divisibleby):
>         my_list = context.get("my_list")
>         _loop_len = len(my_list)
>         result = []
>         for forloop, i in enumerate(my_list):
>             forloop = {
>                 "counter0": forloop,
>                 "counter": forloop+1,
>                 "revcounter": _loop_len - i,
>                 "revcounter0": _loop_len - i - 1,
>                 "first": i == 0,
>                 "last": (i == _loop_len - 1),
>             }
>             if divisibleby(i, 2) == 0:
>                 result.append(force_unicode(i))
>         return "".join(result)

In a perfect world, you would be able to know at compile-time which
variables are used in the inner scope. (Custom template tags don't
report which context variables they access. Imho, they should only be
able to access variables which have been passed as template tag
parameters, but they often access 'request.user' and other variables
without asking.)
In this example, you know for sure that the {{ forloop }} variable has
not been used, so there's no need to build this object.
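
As a very rough sketch of such a compile-time check, assuming we only look at the raw template source of the loop body: the regular expressions below are a simplification I made up for illustration, they over-approximate (tag names and filter names are collected too), and they cannot see what a custom tag reads from the context behind our back.

```python
import re

# Matches the inside of {{ ... }} and of {% ... %}.
_TAG_RE = re.compile(r"\{\{(.*?)\}\}|\{%(.*?)%\}", re.S)
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def names_used(template_source):
    """Collect every identifier that appears inside a variable or block tag."""
    names = set()
    for var_part, tag_part in _TAG_RE.findall(template_source):
        names.update(_NAME_RE.findall(var_part or tag_part))
    return names


def needs_forloop(loop_body_source):
    """True if the loop body might refer to the forloop variable."""
    return "forloop" in names_used(loop_body_source)
```

Because the check errs on the side of reporting a name as used, the worst case is building a `forloop` object that nobody reads, never omitting one that is needed by the built-in syntax.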

>     def templ(context, divisibleby=divisibleby):
>         my_list = context.get("my_list")
>         _loop_len = len(my_list)
>         result = []
>         for i in my_list:
>             if i % 2 == 0:
>                 result.append(i)
>         return "".join(map(force_unicode, result))

I translated the divisibleby filter to "i % 2 == 0". This should be
possible. Allow template filters to be implemented as a built-in which
renders Python code. Doing this for all the core template tags may
significantly improve speed. My implementation of the divisibleby
filter is: [1]

> @register_native_template_filter('divisibleby')
> def divisibleby(generator, subject, arg):
>     """ {{ var|divisibleby:3 }} """
>     return '(%s %% %s == 0)' % (subject, generator.convert_variable(arg))

[1] https://github.com/citylive/django-template-preprocessor/blob/master/src/template_preprocessor/render_engine/render.py#L1388
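
To show how such a filter could plug into a code generator, here is a self-contained toy version. The registry and the `Generator` class below are simplified stand-ins written for this example; they are not the actual django-template-preprocessor code, and `compile_filter` is a made-up method.

```python
NATIVE_FILTERS = {}


def register_native_template_filter(name):
    """Register a function that returns Python source for a filter."""
    def decorator(func):
        NATIVE_FILTERS[name] = func
        return func
    return decorator


class Generator(object):
    def convert_variable(self, token):
        # Integer literals pass through; anything else is a context lookup.
        try:
            int(token)
            return token
        except ValueError:
            return "context[%r]" % token

    def compile_filter(self, subject, filter_name, arg):
        func = NATIVE_FILTERS[filter_name]
        return func(self, self.convert_variable(subject), arg)


@register_native_template_filter('divisibleby')
def divisibleby(generator, subject, arg):
    """ {{ var|divisibleby:3 }} """
    return '(%s %% %s == 0)' % (subject, generator.convert_variable(arg))
```

Calling `Generator().compile_filter("i", "divisibleby", "2")` yields an expression string that can be pasted straight into the generated function, so no filter call happens at render time.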

Further, don't forget to support the {% filter %} tag scenario. I
don't know whether using a 'result' variable is the best way. But if
it is, you could turn a filter tag into the following code and nest in
your other generated code. So, it'll override 'result' in the inner
scope.

> def _f():
>     result = []
>     ...
>     return my_filter(''.join(result))
> result.append(_f())
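
Spelled out for a concrete case, and assuming the filter is `upper` (written here as a plain `.upper()` call rather than a Django filter function), the pattern could look like this:

```python
def render(context):
    result = []
    result.append("before ")

    # {% filter upper %}hello {{ name }}{% endfilter %}
    def _f():
        result = []  # shadows the outer list inside the block
        result.append("hello ")
        result.append(str(context["name"]))
        return "".join(result).upper()
    result.append(_f())

    result.append(" after")
    return "".join(result)
```

The generated code inside `_f` is identical to what would be emitted at top level; only the binding of `result` changes, so the block's output is captured and filtered before it reaches the outer list.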

Isn't context.my_list faster than context.get("my_list") ?




It's interesting to see someone else taking the challenge of writing a
template-to-python compiler. :)
Cheers,
Jonathan

akaariai

Mar 30, 2011, 12:32:44 PM
to Django developers

On Mar 30, 6:18 am, xtrqt <jan.rzepe...@gmail.com> wrote:
>     def templ(context, divisibleby=divisibleby):
>         my_list = context.get("my_list")
>         _loop_len = len(my_list)
>         result = []
>         for forloop, i in enumerate(my_list):
>             forloop = {
>                 "counter0": forloop,
>                 "counter": forloop+1,
>                 "revcounter": _loop_len - i,
>                 "revcounter0": _loop_len - i - 1,
>                 "first": i == 0,
>                 "last": (i == _loop_len - 1),
>             }
>             if divisibleby(i, 2) == 0:
>                 result.append(force_unicode(i))
>         return "".join(result)
> For comparison here is the performnace of these 2::
>     >>> %timeit t.render(Context({"my_list": range(1000)}))
>     10 loops, best of 3: 38.2 ms per loop
>     >>> %timeit templ(Context({"my_list": range(1000)}))
>     100 loops, best of 3: 3.63 ms per loop
> That's a 10-fold improvement!

I did a little test by adding localize(i) in there. On my computer the
time went to around 25ms. For datetimes the time needed is somewhere
around 100ms. If you could inline the localize(i) call for the integer
case you would get back to around 4ms, as it doesn't actually do
anything else than return force_unicode(i)... So, when designing
template compilation it is essential to see how the localization stuff
could be made faster, else much of the benefit will be lost. It seems
that at least for this test case localization uses over 50% of the
time, so there would be bigger gain in making localization faster than
in making compiled templates.
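
A sketch of what I mean by inlining the integer case. The `localize` function below is a made-up stand-in for a slower generic routine, not Django's real one, and the assumption that plain integers need no locale-specific formatting only holds for the settings I tested with.

```python
def localize(value):
    """Stand-in for a generic, slower localisation routine (hypothetical)."""
    if isinstance(value, float):
        return ("%.2f" % value).replace(".", ",")
    return str(value)


def render_value(value, _int=int, _str=str, _localize=localize):
    # Fast path: an exact int is converted directly, skipping localize().
    if type(value) is _int:
        return _str(value)
    # Everything else (floats, dates, subclasses of int) takes the slow path.
    return _localize(value)
```

Binding `int`, `str` and `localize` as default arguments turns them into local lookups, the same trick the compiled example uses for `divisibleby`.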

- Anssi