[GSOC] Template Compilation

Alex Gaynor

unread,

Mar 3, 2010, 6:29:31 PM3/3/10

to django-d...@googlegroups.com

Django Template Compilation
===========================

About Me
~~~~~~~~

I'm a sophomore computer science student at Rensselaer Polytechnic Institute.
I'm a frequent contributor to Django (including last year's successful multiple
database GSoC project) and other related projects; I'm also a commiter on both
`Unladen Swallow <http://code.google.com/p/unladen-swallow/>`_ and
`PyPy <http://codespeak.net/pypy/dist/pypy/doc/>`_.

Background
~~~~~~~~~~

I've spent quite a bit of time reviewing alternate template implementations,
particularly Spitfire and Jinja, as well as speaking with people like Armin
Ronacher (the author of Jinja). I'm also involved in multiple alternative VMs
for Python, so I'm extremely familiar with the performance characteristics of
various Python operations.

Plan
~~~~

Compile Django templates into Python functions.

Rationale
~~~~~~~~~

Currently the Django template language exists at a level above the Python
interpreter, and interprets Django templates. This leads to Django templates
being particularly slow compared to implementations of template languages that
compile templates into Python code.

Method
~~~~~~

Templates will be compiled by turning each template into a series of functions,
one per block (note that the base template from which other templates extend is
a single function, not one per block). This is accomplished by taking the tree
of ``Node`` objects which currently exist for a template and translating it
into an alternate representation that more closely mirrors the structure of
Python code, but which still has the semantics of the template language. For
example, the new tree for a loop using the ``{% for %}`` tag would become a for
loop in Python, plus assignments to set up the ``{{ forloop }}`` variable that
the ``{% for %}`` tag provides. The semantics of Python code is that variables
assigned in a for loop exist beyond the loop itself, including the looping
variable. Django templates, however, pop the top layer from the context stack
at the end of a for loop. This intermediate representation uses the scoping of
Django templates.

After an intermediate representation is created a compiler is invoked which
translates the IR into Python code. This handles the details of Django template
scoping, spilling variables in the event of conflicts, and calling template tag
functions.

An important feature of Django templates is that users can write template tags
which have access to the full context, including the ability to modify the
context. In order to maintain backwards compatibility with existing template
tags, we must create a template context object whenever an uncompilable
template tag is used, and mirror any changes made to the context in the
function's locals.

This presents a complication, as we forfeit the speed benefits of a compiled
template (lookups of a Python local are a single index in a C array) and must
perform a dictionary lookup for every variable. Unfortunately, mirroring a
context dictionary back into local variables requires maintaining a dictionary
of arbitrary names to values, which can't be efficiently implemented with
Python's locals (use of ``exec`` causes locals to degrade to dictionary
lookups). Furthermore, constructing a dictionary of the full context requires
additional effort.

To provide an optimal solution we must know which variables a given template
tag needs, and which variables it can mutate. This can be accomplished by
attaching a new class attribute to ``Nodes`` and passing only those values to
the class, (instead of the full context dictionary). Subsequently, we would
only need to mirror a few given values into the locals, and since these are
known names, we avoid degrading the local lookups into dictionaries. Old-style
``Nodes`` will continue to work, but in a less efficient manner.

Alternate ideas
---------------

James Bennett has suggested making template compilation something that is
opt-in by use of a custom template loader. The major advantage of this approach
is that we can require all ``Nodes`` to specify the names they use, and thus
not have to deal with mirroring the locals back and forth, which drastically
reduces the possibility of performance regressions. I am in favor of going
with this proposal as it gives us a clear transition between the current
implementation and the future (see next section).

Long Term
---------

In the short term almost all template tags will be old-style (consider the
number of tags using the "as X" idiom to modify the context). However, a long
term goal should be to move to exclusively compiled template tags. This has
several benefits:

* Fewer codepaths to maintain
* No chance for differing semantics (this follows from the first)
* More consistant performance.

One of the critical goals of this project will therefore be to develop the API
for template tags in a manner where old-style tags can easily migrate to the
new form.

Examples
~~~~~~~~

The following are some examples of what I'd expect a compiled template to look
like:

.. sourcecode:: html+django

{% for i in my_list %}
{% if i|divisibleby:2 == 0 %}
{{ i }}
{% endif %}
{% endfor %}

.. sourcecode:: python

def templ(context, divisibleby=divisibleby):
my_list = context.get("my_list")
_loop_len = len(my_list)
result = []
for forloop, i in enumerate(my_list):
forloop = {
"counter0": forloop,
"counter": forloop+1,
"revcounter": _loop_len - i,
"revcounter0": _loop_len - i - 1,
"first": i == 0,
"last": (i == _loop_len - 1),
}
if divisibleby(i, 2) == 0:
result.append(force_unicode(i))
return "".join(result)

For comparison here is the performnace of these 2::

>>> %timeit t.render(Context({"my_list": range(1000)}))
10 loops, best of 3: 38.2 ms per loop
>>> %timeit templ(Context({"my_list": range(1000)}))
100 loops, best of 3: 3.63 ms per loop

That's a 10-fold improvement!

Timeline
~~~~~~~~

* 1 week -- develop a benchmark suite of templates, composed of real world
templates as well as microbenchmarks, as well as simple scripts for regular
testing of compatibility and performance. On this particular point I'm
hoping the community will be able to help with providing example templates
and fixture data for benchmarking and compatibility testing.
* 3 weeks -- develop the frontend portion of this, code which translates
Django's included template tags into the IR.
* 1 week -- developing the internal IR generation API.
* 2 weeks -- hooking up all of Django's template tags to actually use it.
* 5 weeks -- develop the backend code generator. This takes the IR and
translates it into Python, including handling the semantic changes.
* 2 weeks -- basic code generation support. Does nothing but generate
code that looks exactly like what's already executed, this means variable
lookups are still lookups in a ``Context`` dictionary.
* 3 weeks -- optimize known names into local variables at the python level,
based on speaking with Armin Ronacher this is critical to getting good
performance. This basically involves implementing something similar to a
compiler's register allocator, except we aren't constrained by number of
names, but rather by what the names are.
* 2 weeks -- time set aside for dealing with bugs, corner cases, and anything
else.
* 1 week -- Explore possibility for additional optimizations, eliminating
duplicate values (for example removing unused ``{{ forloop }}`` variables),
allowing an external app to provide "type" data to IR nodes such that
variable lookups could be resolved as indexing vs attribute lookup at
compile time.

Goals
~~~~~

As with any good project we need some criteria by which to measure success:

* Successfully compile complete (real world) templates.
* Speed up templates. For reference purposes, Jinja2 is about 10-20x faster
than Django templates. My goal is to come within a factor of 2-3 of this.
* Complete backwards compatibility.
* Develop a complete porting guide for old-style template tags to minimize any
pain in the transition.

Any thoughts, feedback, cometary, Nobel prize nominations, and death
threats are welcome!
Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

Russell Keith-Magee

unread,

Mar 4, 2010, 3:41:42 AM3/4/10

to django-d...@googlegroups.com

On Thu, Mar 4, 2010 at 7:29 AM, Alex Gaynor <alex....@gmail.com> wrote:
> Django Template Compilation
> ===========================

First up, this looks like a solid proposal Alex. Anything that speeds
up template generation is good, and if your indicative stats turn out
to be representative of real-world performance, it would be a major
gain for Django.

I have a couple of quick concerns that pop out. In all likelihood,
they aren't actually problems, just areas that require clarification
in your proposal.

> Templates will be compiled by turning each template into a series of functions,
> one per block (note that the base template from which other templates extend is
> a single function, not one per block).

...

> After an intermediate representation is created a compiler is invoked which
> translates the IR into Python code. This handles the details of Django template
> scoping, spilling variables in the event of conflicts, and calling template tag
> functions.

You don't ever say this explicitly, but I presume that the plan is to
do this as an in-memory construction -- that is, the intermediate
representation is handled in-memory, not as a python file on disk.
You're proposing to writing a (2 pass?) compiler, not a code
generator, right?

> To provide an optimal solution we must know which variables a given template
> tag needs, and which variables it can mutate. This can be accomplished by
> attaching a new class attribute to ``Nodes`` and passing only those values to
> the class, (instead of the full context dictionary). Subsequently, we would
> only need to mirror a few given values into the locals, and since these are
> known names, we avoid degrading the local lookups into dictionaries. Old-style
> ``Nodes`` will continue to work, but in a less efficient manner.

The term "less efficient" needs clarification here -- less efficient
than a compiled node that has been specifically optimized for
compilation, or less efficient than existing template rendering? It's
completely understandable if a node doesn't render as fast as it's
optimized and compiled counterparts, but I would consider it a failure
if a compiled template resulted in a slowdown when compared to
existing non-compiled rendering speed.

> Alternate ideas
> ---------------
>
> James Bennett has suggested making template compilation something that is
> opt-in by use of a custom template loader. The major advantage of this approach
> is that we can require all ``Nodes`` to specify the names they use, and thus
> not have to deal with mirroring the locals back and forth, which drastically
> reduces the possibility of performance regressions. I am in favor of going
> with this proposal as it gives us a clear transition between the current
> implementation and the future (see next section).

I concur. The template system is a criticial part of enough Django
stacks that I'd like to see this as an explicit opt-in feature.

> Long Term
> ---------
>
> In the short term almost all template tags will be old-style (consider the
> number of tags using the "as X" idiom to modify the context). However, a long
> term goal should be to move to exclusively compiled template tags. This has
> several benefits:
>
> * Fewer codepaths to maintain
> * No chance for differing semantics (this follows from the first)
> * More consistant performance.
>
> One of the critical goals of this project will therefore be to develop the API
> for template tags in a manner where old-style tags can easily migrate to the
> new form.

I'd like to see a little more elaboration here - exactly what do you
mean by an "API for template tags"? Do you have any initial thoughts
on what this would constitute this API? What sort of problems need to
be addressed or solved? Does this have any crossover with the eternal
pony to "make template tags easier to write?"

> Goals
> ~~~~~
>
> As with any good project we need some criteria by which to measure success:
>
> * Successfully compile complete (real world) templates.
> * Speed up templates. For reference purposes, Jinja2 is about 10-20x faster
> than Django templates. My goal is to come within a factor of 2-3 of this.
> * Complete backwards compatibility.
> * Develop a complete porting guide for old-style template tags to minimize any
> pain in the transition.

Looks like an interesting proposal, and certainly one with aims that
make it attractive for Django's core (assuming you can deliver on the
performance gains you suggest are possible).

Yours,
Russ Magee %-)

Alex Gaynor

unread,

Mar 4, 2010, 11:26:29 AM3/4/10

to django-d...@googlegroups.com

My plan was an in-memory construction, but there's no reason once we
have the infrastructure in place that we couldn't generate templates
to disk. As for a compiler vs. a code generator, a compiler *is* a
code generator, so I'm not sure I see the distinction being made.

>> To provide an optimal solution we must know which variables a given template
>> tag needs, and which variables it can mutate. This can be accomplished by
>> attaching a new class attribute to ``Nodes`` and passing only those values to
>> the class, (instead of the full context dictionary). Subsequently, we would
>> only need to mirror a few given values into the locals, and since these are
>> known names, we avoid degrading the local lookups into dictionaries. Old-style
>> ``Nodes`` will continue to work, but in a less efficient manner.
>
> The term "less efficient" needs clarification here -- less efficient
> than a compiled node that has been specifically optimized for
> compilation, or less efficient than existing template rendering? It's
> completely understandable if a node doesn't render as fast as it's
> optimized and compiled counterparts, but I would consider it a failure
> if a compiled template resulted in a slowdown when compared to
> existing non-compiled rendering speed.
>

less efficient than existing rendering, for that specific node. This
is because the Context dictionary would have to be rematerialized for
each and every old-style node in the template.

Right now the runtime API for template tags is the render() method of
nodes. This takes the current context, and can mutate it however it
wants, read any data from the context, and then return some string.
Now, every template tag I've ever seen knows at compile time what
values will be read and written at compile now, however there's no way
for the template system to know what these names are. Therefore we
want to transition to a runtime API that can provide this info to the
system.

>> Goals
>> ~~~~~
>>
>> As with any good project we need some criteria by which to measure success:
>>
>> * Successfully compile complete (real world) templates.
>> * Speed up templates. For reference purposes, Jinja2 is about 10-20x faster
>> than Django templates. My goal is to come within a factor of 2-3 of this.
>> * Complete backwards compatibility.
>> * Develop a complete porting guide for old-style template tags to minimize any
>> pain in the transition.
>
> Looks like an interesting proposal, and certainly one with aims that
> make it attractive for Django's core (assuming you can deliver on the
> performance gains you suggest are possible).
>
> Yours,
> Russ Magee %-)
>

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Russell Keith-Magee

unread,

Mar 4, 2010, 6:58:28 PM3/4/10

to django-d...@googlegroups.com

On Fri, Mar 5, 2010 at 12:26 AM, Alex Gaynor <alex....@gmail.com> wrote:
> On Thu, Mar 4, 2010 at 3:41 AM, Russell Keith-Magee
> <freakb...@gmail.com> wrote:
>> On Thu, Mar 4, 2010 at 7:29 AM, Alex Gaynor <alex....@gmail.com> wrote:

>>> Templates will be compiled by turning each template into a series of functions,
>>> one per block (note that the base template from which other templates extend is
>>> a single function, not one per block).
>> ...
>>> After an intermediate representation is created a compiler is invoked which
>>> translates the IR into Python code. This handles the details of Django template
>>> scoping, spilling variables in the event of conflicts, and calling template tag
>>> functions.
>>
>> You don't ever say this explicitly, but I presume that the plan is to
>> do this as an in-memory construction -- that is, the intermediate
>> representation is handled in-memory, not as a python file on disk.
>> You're proposing to writing a (2 pass?) compiler, not a code
>> generator, right?
>>
>
> My plan was an in-memory construction, but there's no reason once we
> have the infrastructure in place that we couldn't generate templates
> to disk. As for a compiler vs. a code generator, a compiler *is* a
> code generator, so I'm not sure I see the distinction being made.

Ok - sure - a compiler is a code generator. What I'm driving at is
that you're not proposing that we move to a Cheetah-style model. The
output of the compiler isn't something the end user should ever need
to see, and won't ever see under normal circumstances. The compiled
output isn't something the user will (under normal conditions) have
the opportunity of modifying and editing, and they won't need to have
makefiles or some similar construct to make sure their compiled
templates are up to date. Django template use isn't going to devolve
into "import mytemplate".

>>> To provide an optimal solution we must know which variables a given template
>>> tag needs, and which variables it can mutate. This can be accomplished by
>>> attaching a new class attribute to ``Nodes`` and passing only those values to
>>> the class, (instead of the full context dictionary). Subsequently, we would
>>> only need to mirror a few given values into the locals, and since these are
>>> known names, we avoid degrading the local lookups into dictionaries. Old-style
>>> ``Nodes`` will continue to work, but in a less efficient manner.
>>
>> The term "less efficient" needs clarification here -- less efficient
>> than a compiled node that has been specifically optimized for
>> compilation, or less efficient than existing template rendering? It's
>> completely understandable if a node doesn't render as fast as it's
>> optimized and compiled counterparts, but I would consider it a failure
>> if a compiled template resulted in a slowdown when compared to
>> existing non-compiled rendering speed.
>>
>
> less efficient than existing rendering, for that specific node. This
> is because the Context dictionary would have to be rematerialized for
> each and every old-style node in the template.

That's what I expected; I just wanted to make sure we were on the same page.

Yours,
Russ Magee %-)

Alex Gaynor

unread,

Mar 4, 2010, 7:02:29 PM3/4/10

to django-d...@googlegroups.com

On Thu, Mar 4, 2010 at 6:58 PM, Russell Keith-Magee

That's correct, although as I said there's no reason someone couldn't
utilize the compilation APIs to write them out to disk and do "import
mytemplate".

>>>> To provide an optimal solution we must know which variables a given template
>>>> tag needs, and which variables it can mutate. This can be accomplished by
>>>> attaching a new class attribute to ``Nodes`` and passing only those values to
>>>> the class, (instead of the full context dictionary). Subsequently, we would
>>>> only need to mirror a few given values into the locals, and since these are
>>>> known names, we avoid degrading the local lookups into dictionaries. Old-style
>>>> ``Nodes`` will continue to work, but in a less efficient manner.
>>>
>>> The term "less efficient" needs clarification here -- less efficient
>>> than a compiled node that has been specifically optimized for
>>> compilation, or less efficient than existing template rendering? It's
>>> completely understandable if a node doesn't render as fast as it's
>>> optimized and compiled counterparts, but I would consider it a failure
>>> if a compiled template resulted in a slowdown when compared to
>>> existing non-compiled rendering speed.
>>>
>>
>> less efficient than existing rendering, for that specific node. This
>> is because the Context dictionary would have to be rematerialized for
>> each and every old-style node in the template.
>
> That's what I expected; I just wanted to make sure we were on the same page.
>
> Yours,
> Russ Magee %-)
>

SmileyChris

unread,

Mar 4, 2010, 8:01:46 PM3/4/10

to Django developers

Would whitespace handling be identical to the current template system?

Alex Gaynor

unread,

Mar 4, 2010, 8:07:17 PM3/4/10

to django-d...@googlegroups.com

On Thu, Mar 4, 2010 at 8:01 PM, SmileyChris <smile...@gmail.com> wrote:
> Would whitespace handling be identical to the current template system?
>

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

Yes, it would. All whitespace just becomes TextNodes, which just
become _result.append calls. I removed all the whitespace from my
benchmarking, because I didn't want to figure out how to place it in
the pure python version, but the real implementation it will be
trivially correct.

Jonathan S

unread,

Mar 5, 2010, 2:54:05 PM3/5/10

to Django developers

Just a few thoughts, this is my idea and I'm not an expert at
compilers, so forgive me if I'm somewhere wrong.

(1) For me, personally, I think the scoping of variables should be
kept inside a block.

{% with "x" as x %}
{% with "y" as x %}
{% endwith %}
{{ x }}
- you would print "y", because you don't pop the context, while
the current template engine would print "x" (no? i didn't test it)
{% endwith %}

(2) Another issue will occur during a rendering error. What if a
template tag throws an exception? The traceback will show the trace
through the generated code, instead of the original templates.

(3) You're code snippets in the examples you are giving are not
equivalent. The second omits at least four print statements, and what
is say in (4) below. So, the timing is not really worth that much.

(4) How would you compile {{ a.b }} ? I think this can be interpreted
as either a.b(), a[b], or a.b
Python is typed dynamically. So, everytime you want to read out
{{ i }}, this should be changed to:
var_i = i() if callable(i) else i

(5) Be sure that custom template tags, like {% filter escape %} ... {%
endfilter %} still work. A separate result[], to capture the content
of this tag, and apply the filter, is required.

(6) I think that the code generator / compiler you are planning to
write, will be hard to maintain and hard to debug. Because compilers
need to evaluate several ways of how your source code could be written
in the destination language, it should decide which route will mostly
lead to the fasted path at runtime... The dynamic behavior of python
will make it difficult. Compare that to the very clean template.Node
structure, which is understandable for Django developers.

Correct me if I'm somewhere wrong, but I think that the slowness of
the current Django template engine is caused mostly because of the
pushing/popping of contexts and the expensive Variable.resolve. I
think that precompiling the templates to python code and further into
binary, would at most result in 2x the performance.You're proposing to
change the context behavior, are you planning to change the resolve
method also, in order to improve the performance? That and giving up
on the custom template tags, makes me think this is just another,
maybe better, but not really compatible template language.

At the positive side: some things can be improved by the preprocessor.
{% spaceless %} ... {% endspaceless %} can be executed at runtime.

b.t.w. I wrote a compiler for a Python based template language once.
It worked pretty well, also by compiling the template to python
(*.pyc) binary files. (The project is almost dead now, because I
started using Django now for most of my websites.)
http://code.google.com/p/python-pages/

Alex Gaynor

unread,

Mar 5, 2010, 10:29:00 PM3/5/10

to django-d...@googlegroups.com

On Fri, Mar 5, 2010 at 2:54 PM, Jonathan S <jonathan...@gmail.com> wrote:
> Just a few thoughts, this is my idea and I'm not an expert at
> compilers, so forgive me if I'm somewhere wrong.
>
> (1) For me, personally, I think the scoping of variables should be
> kept inside a block.
>
> {% with "x" as x %}
> {% with "y" as x %}
> {% endwith %}
> {{ x }}
> - you would print "y", because you don't pop the context, while
> the current template engine would print "x" (no? i didn't test it)
> {% endwith %}
>

Yes, that's how python scoping works, however as I specifically noted
there's no reason we can't spill variable names to ensure that the
semantics remain unchanged.

> (2) Another issue will occur during a rendering error. What if a
> template tag throws an exception? The traceback will show the trace
> through the generated code, instead of the original templates.
>

I haven't given a ton of thought to this but if
settings.TEMPLATE_DEBUG we can insert debug markers into the source
(before each node starts rendering set an attribute about hte current
tag int he template is being rendered) then if an exception is raised
we can combine this bookkeeping data with the existing node structure.

> (3) You're code snippets in the examples you are giving are not
> equivalent. The second omits at least four print statements, and what
> is say in (4) below. So, the timing is not really worth that much.
>

The versions of those templates I did benchmarks on both omitted
whitespace, they were completely equivalent as far as I can tell.

> (4) How would you compile {{ a.b }} ? I think this can be interpreted
> as either a.b(), a[b], or a.b
> Python is typed dynamically. So, everytime you want to read out
> {{ i }}, this should be changed to:
> var_i = i() if callable(i) else i
>

Variable lookups become calls to the runtime, no reason to reinvent
the semantics.

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

Alex

Reply all

Reply to author

Forward