GSoC 2015: Template engine optimisation


Oleksii Oleksenko

Mar 11, 2015, 12:52:55 PM
to django-d...@googlegroups.com
Hi,

My name is Oleksii Oleksenko. I'm a master's student in Distributed System Engineering at TU Dresden (Germany), and I would like to participate in GSoC by contributing to your project. I decided to apply to Django because Python is my main and favorite programming language and I work mostly on web projects.

Among all the ideas, 'Template engine optimization' is the most interesting to me. I have some questions regarding this task.

1. Do I need to consider only django.template, or can other modules influence template rendering too?

2. What should I use for profiling? Is there a profiling library that is commonly used for Django (like cProfile)? Or can I use anything I want?

3. Regarding benchmarks, are there any special cases (e.g. some parts of the template language) that are known to be especially slow? Or is it my task to find them?

 Thanks,
 Oleksii

Oleksii Oleksenko

Mar 11, 2015, 5:40:39 PM
to django-d...@googlegroups.com
Also, I would like to know whether I understand this task correctly. Here is how I see the implementation.

It will consist of the following parts:
  • Test suite 
    • write templates for all types of template constructions: 
      • variables 
      • filters 
      • inherited templates 
      • method calls 
      • etc 
      • all combinations of them. 
    • write contexts for each of these templates 
  • Profiling 
    • run profiler (cProfile?) on all of these tests. As I see it, the basic algorithm (pseudocode) would look like this:
for context, template in zip(contexts, templates):
    t = Template(template)
    c = Context(context)
    start profiler
    t.render(c)
    stop profiler
    store profiling results
  • when I have all results, I'll have to analyse them and try to optimize rendering.
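The loop above could be made concrete roughly as follows. This is only a sketch under assumptions: to keep it runnable without a configured Django project, `render` is a stand-in built on `string.Template` from the standard library, not Django's engine; with Django, the profiled call would be `Template(template).render(Context(context))`. The names `profile_render` and `cases` are hypothetical.

```python
import cProfile
import io
import pstats
from string import Template as StdTemplate  # stand-in, NOT django.template.Template


def render(template, context):
    """Stand-in renderer; swap in Django's Template(...).render(Context(...))."""
    return StdTemplate(template).substitute(context)


def profile_render(template, context, top=5):
    """Profile a single render call and return (output, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()                      # start profiler
    output = render(template, context)
    profiler.disable()                     # stop profiler

    # Store the profiling results as text, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return output, buffer.getvalue()


# Hypothetical test cases: one (template, context) pair per construct.
cases = [
    ("Hello, $name!", {"name": "World"}),
    ("$a + $b", {"a": 1, "b": 2}),
]

results = [profile_render(template, context) for template, context in cases]
```

Each entry in `results` holds the rendered output and a short cProfile report, so the reports can be compared across template constructs afterwards.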
Is it correct?

Thanks,
Oleksii


On Wednesday, March 11, 2015, at 17:52:55 UTC+1, Oleksii Oleksenko wrote:

Shai Berger

Mar 11, 2015, 5:53:06 PM
to django-d...@googlegroups.com
On Wednesday 11 March 2015 23:40:39 Oleksii Oleksenko wrote:
> - Profiling
>   - run profiler (cProfile?) on all of these tests. As I see it, the basic
>     algorithm (pseudocode) would look like this:
>
> for context, template in zip(contexts, templates):
>     t = Template(template)
>     c = Context(context)
>     start profiler
>     t.render(c)
>     stop profiler
>     store profiling results
>

The profiler can be useful when you're trying to optimize a given piece of
code, but it also changes your measurements; the most obvious example is,
since the profiler "takes notes" on every function call, code with a lot of
function calls will look worse in the profiler than it really is. So, for
identifying which piece of code to optimize -- for your very initial
performance analysis -- I would just use timeit.
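
A minimal sketch of what a timeit-based first measurement might look like. The renderer here is a stand-in (`string.Template` from the standard library) so the snippet runs on its own; with Django you would pass a callable that renders a pre-built template instead. The helper name `min_render_time` is hypothetical.

```python
import timeit
from string import Template as StdTemplate  # stand-in for Django's Template


def min_render_time(render, number=1000, repeat=5):
    """Return the best per-call time in seconds over `repeat` batches."""
    # timeit.repeat runs `render` `number` times per batch and returns
    # one total time per batch; the minimum is the least noisy figure.
    batches = timeit.repeat(render, number=number, repeat=repeat)
    return min(batches) / number


template = StdTemplate("Hello, $name!")
best = min_render_time(lambda: template.substitute(name="World"))
print("best per-render time: %.2e s" % best)
```

Taking the minimum rather than the mean follows the usual timeit advice: slower batches mostly reflect interference from other processes, not the code being measured.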

HTH,
Shai.

Preston Timmons

Mar 12, 2015, 12:31:58 AM
to django-d...@googlegroups.com
Hi Oleksii,

I found that cProfile isn't that helpful when rendering templates. There are a lot
of function calls and the output is too verbose to really reveal where Django
spends its time.

Also, keep in mind that rendering is only one step of the template cycle, and
usually only a small part of it. There are these steps to consider:

* Template loading
* Lexing, parsing and compiling
* Rendering

Here are some recent benchmarks I've done on the template engine:


From what I can tell, if we compare Django templates to Jinja2, which is
considered quite fast, the biggest visible difference doesn't come from
Jinja2 having a faster parser or renderer. It comes from the internal cache
it maintains. Jinja2 only recompiles templates when it has to.
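
To illustrate the idea of recompiling only when necessary, here is a simplified sketch. It is not Jinja2's actual implementation; the `CompileCache` class, the `load_source` and `compile_source` callables, and the integer "version" (a stand-in for something like a file's modification time) are all assumptions made for the example.

```python
class CompileCache:
    """Cache compiled templates; recompile only when the source version changes."""

    def __init__(self, load_source, compile_source):
        self._load_source = load_source        # name -> (source, version)
        self._compile_source = compile_source  # source -> compiled object
        self._cache = {}                       # name -> (version, compiled)
        self.compile_count = 0

    def get(self, name):
        source, version = self._load_source(name)
        cached = self._cache.get(name)
        if cached is not None and cached[0] == version:
            return cached[1]                   # hit: skip lexing/parsing/compiling
        compiled = self._compile_source(source)
        self.compile_count += 1
        self._cache[name] = (version, compiled)
        return compiled


# Hypothetical in-memory "loader": name -> (source, version).
sources = {"hello.html": ("Hello, {name}!", 1)}
cache = CompileCache(sources.__getitem__, lambda src: src.format)

first = cache.get("hello.html")
second = cache.get("hello.html")            # served from the cache
sources["hello.html"] = ("Hi, {name}!", 2)  # simulate an edited template
third = cache.get("hello.html")             # version changed, so recompiled
```

A real loader would check the version cheaply (for example via the file's mtime) instead of reloading the source on every lookup.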

Depending on how things go with ticket #15053, internal caching might become
part of Django, though. If that's so, your proposal will need to home in on
identifying other specific areas where you think performance can be improved.

The Django parser and lexer are parts that could be completely rewritten,
for example, while easily maintaining backward compatibility. Changing the
rendering layer is much more difficult because multiple 3rd-party libraries
depend on the Node class.

If you're serious about working on this, I suggest digging into the benchmarks,
identifying an area that can be improved, and providing a proposal for how
you think it can be made faster.

Good luck.

Preston

Curtis Maloney

Mar 12, 2015, 9:30:55 AM
to django-d...@googlegroups.com
I have convinced myself [with absolutely no hard evidence, just familiarity with the code] that the template engine is overly cautious when it comes to ensuring values are strings and are escaped properly.

After a while I believe layers and layers of caution have accrued, and nobody is sure any more where these have overlapped excessively.

I believe a motivated person could create a call graph and find at least some places where some of these checks could be removed to the benefit of rendering performance.

My prior experience with the template engine has convinced me there is little benefit to optimising the _parsing_ of templates, as anyone concerned about template parsing speeds should generally be using the caching template loader.  [Yes, I realise there are other cases, but I don't believe there is a lot left to optimise in the parsing -- happy to be proven wrong]
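
For reference, a minimal sketch of enabling the cached loader with the Django 1.8 `TEMPLATES` setting. The `DIRS` path is a placeholder, and the choice of wrapped loaders is an assumption about a typical project.

```python
# settings.py fragment (Django 1.8 style). The directory is a placeholder.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": ["/path/to/templates"],
        # APP_DIRS is left unset on purpose: it cannot be combined with
        # an explicit "loaders" option.
        "OPTIONS": {
            "loaders": [
                (
                    "django.template.loaders.cached.Loader",
                    [
                        "django.template.loaders.filesystem.Loader",
                        "django.template.loaders.app_directories.Loader",
                    ],
                ),
            ],
        },
    },
]
```

With this in place, each template is parsed once per process and the compiled result is reused, which is why parsing speed matters less for such deployments.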

--
Curtis


--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/c16befb5-c12d-43cc-84a2-6931c245b8a5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Preston Timmons

Mar 12, 2015, 10:57:35 AM
to django-d...@googlegroups.com
> After a while I believe layers and layers of caution have accrued, and nobody is sure any more where these have overlapped excessively.

Do you have examples of which layers these are?

Escaping seems to happen in Variable, VariableNode, FilterExpression, and render_value_in_context. I don't see a lot of work being done twice there.

If you think the escape implementation is slow, though, it wouldn't be hard to simply remove that and benchmark with no escape code running. That would at least reveal the theoretical limit to which the escape code could be improved.
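
A rough sketch of that kind of comparison, under assumptions: `html.escape` from the standard library stands in for Django's escaping, and joining 200 strings stands in for rendering. The point is only the method: time the same work with and without escaping and treat the difference as the most that faster escaping could save.

```python
import html
import timeit

# 200 values that all need escaping, loosely mimicking a list-heavy page.
values = ["<b>item %d</b> & more" % i for i in range(200)]


def render_escaped():
    return "".join(html.escape(value) for value in values)


def render_raw():
    return "".join(values)


number = 500
escaped = min(timeit.repeat(render_escaped, number=number, repeat=5)) / number
raw = min(timeit.repeat(render_raw, number=number, repeat=5)) / number
print("escaped: %.2e s, raw: %.2e s, upper bound on saving: %.2e s"
      % (escaped, raw, escaped - raw))
```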

On another note, in my benchmark the time to render a somewhat complex template differs between Django and Jinja2 by a factor of 8-9.

From one run, just grabbing the minimum time to render a template:

Django: 9.08e-05
Jinja2: 1.38e-05

The big difference here is because Django uses a recursive node-based renderer, whereas Jinja2 just translates the template into Python. That means a lot less overhead when rendering happens.
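
A toy illustration of the two approaches. Neither half is Django's or Jinja2's real code: the node classes are simplified stand-ins, and the "generated" source is written by hand to show the kind of function a template-to-Python translator might emit.

```python
class TextNode:
    def __init__(self, text):
        self.text = text

    def render(self, context):
        return self.text


class VariableNode:
    def __init__(self, name):
        self.name = name

    def render(self, context):
        return str(context[self.name])


class ForNode:
    def __init__(self, loop_var, sequence, body):
        self.loop_var, self.sequence, self.body = loop_var, sequence, body

    def render(self, context):
        parts = []
        for item in context[self.sequence]:
            inner = dict(context, **{self.loop_var: item})  # new context layer
            parts.extend(node.render(inner) for node in self.body)
        return "".join(parts)


def render_nodes(nodes, context):
    """Approach 1: walk the node tree, one method call per node per render."""
    return "".join(node.render(context) for node in nodes)


# Approach 2: translate the template into Python source once, then call
# the resulting plain function on every render.
GENERATED_SOURCE = '''
def render_compiled(context):
    parts = ["<ul>"]
    for item in context["items"]:
        parts.append("<li>")
        parts.append(str(item))
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts)
'''
namespace = {}
exec(compile(GENERATED_SOURCE, "<template>", "exec"), namespace)
render_compiled = namespace["render_compiled"]

nodes = [
    TextNode("<ul>"),
    ForNode("item", "items",
            [TextNode("<li>"), VariableNode("item"), TextNode("</li>")]),
    TextNode("</ul>"),
]
context = {"items": [1, 2, 3]}
print(render_nodes(nodes, context) == render_compiled(context))
```

Both produce the same output, but the node version pays for a method call and a context copy on every node and loop iteration, while the compiled version is a single flat function.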

Optimizing Django rendering means either:

1) Identifying the nodes which are a bottleneck and reducing the work they do
2) Replacing node rendering with something that's faster.

Option 2 probably has the biggest opportunity for gain, but it would require some real creativity to maintain backwards-compatibility with existing tags.

Preston

Sam Cooke

Mar 12, 2015, 11:39:26 AM
to django-d...@googlegroups.com
I've done a couple of days of investigation into template performance recently trying to speed up our site and my main takeaway was that there was no silver bullet - no particular node taking up all of the time. I was mostly trying to optimise a particularly complicated template we render a lot in a loop so I was modifying the template, Django code and making custom tags to see what would make a difference rather than trying to decipher profiles. We use the cached template loader so compile time wasn't really considered.

We tried the following things and none of them made more than a couple of percent of difference each:

 - we made a cut down {% url %} tag that just does what we need - the built in tag can handle a lot more at a performance cost
 - I grouped {% with %} statements together - ideally grouping into {% include ... a=b c=d %} - to avoid extra layers of context
 - I ditched TextNodes that just contained whitespace (both by removing the whitespace and by automatically removing the Nodes on compile) - it's easy for these to build up when you have code like the following and every extra node slows things down a bit (whitespace is sometimes meaningful so we would have only implemented this for particular bits of code)

{% if whatever %}
   {{ my_var }}
{% endif %}

 - I also tried commenting out (or replacing with "return '' ") chunks of the template engine code and our template and it just seemed that the more I commented out, the faster it ran - no particular jumps in speed, just a gradual change as more was removed. Escaping was one of the first things I commented out and it made a surprisingly small amount of difference.
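
The whitespace-only TextNode removal mentioned in the list above could look roughly like this. It is a sketch with toy node classes rather than Django's own, and `drop_whitespace_text_nodes` is a hypothetical helper; as noted, it is only safe where the whitespace carries no meaning.

```python
class TextNode:
    def __init__(self, text):
        self.text = text


class VariableNode:
    def __init__(self, name):
        self.name = name


def drop_whitespace_text_nodes(nodelist):
    """Return a new list without TextNodes that contain only whitespace."""
    return [
        node for node in nodelist
        if not (isinstance(node, TextNode) and not node.text.strip())
    ]


# Body of the {% if %} block shown above: newline + indent, variable, newline.
body = [TextNode("\n   "), VariableNode("my_var"), TextNode("\n")]
trimmed = drop_whitespace_text_nodes(body)
print(len(body), "->", len(trimmed))
```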

I'd be very happy to be proven wrong but thought it was worth sharing my findings since - particularly in the context of Preston's suggestion that we might find a bottleneck - I don't think there is one particular bottleneck.

Our "solution" for now has been to speed up the processors in our servers and investigate switching to pypy - we'll probably be looking at Jinja2 once we upgrade to 1.8 as well.

Sam


Preston Timmons

Mar 12, 2015, 1:06:55 PM
to django-d...@googlegroups.com
> I've done a couple of days of investigation into template performance recently trying to speed up our site and my main takeaway was that there was no silver bullet - no particular node taking up all of the time. I was mostly trying to optimise a particularly complicated template we render a lot in a loop so I was modifying the template, Django code and making custom tags to see what would make a difference rather than trying to decipher profiles. We use the cached template loader so compile time wasn't really considered.

Thanks, Sam. That's helpful information.

I'd be interested to know the template you used, or at least one representative of the template you used. Templates that highlight real-world pain points would help to benchmark the things that matter.

Did you apply all the internal Django changes you mentioned cumulatively? If so, what was your overall rendering time improvement? How did the speedups from changing nodes compare to the speedups from modifying the engine? It sounds like the engine changes were a factor because of multiple calls to the include tag.

While the changes you mentioned don't sound like ones that could go into Django, they do shed light on what can potentially be done or not for optimization.

Preston

Sam Cooke

Mar 13, 2015, 7:21:30 AM
to django-d...@googlegroups.com
Preston - I'll send the template to you directly, I'm not sure how useful it will be so I don't want to spend time checking if it's fine for public consumption unnecessarily.

The test template we were using to test the performance was a simple:

{% for item in item_list %}{% include "item.html" %}{% endfor %}

Where "item.html" is the complex one (for a real world example, it's the "cards" you can see in the chart on https://www.mixcloud.com/tag/house/) and item_list is a list of 200 items (with all of the database queries done in the view). Averaging runs after the first run (which would include the compile) it was taking close to 1s per render. With the non-breaking changes (i.e. not commenting out necessary engine or template code) I managed to get it down to just under 0.9s. In both cases there was an overhead of around 0.2s (render time if I ran with the entire template commented out). Running the test with pypy (after warming it up) the render time was 0.5s. Sorry I don't have more exact numbers - this is just the headline notes I took away from the task.
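
To put those headline figures side by side, here is the arithmetic as a small snippet. The inputs are the rounded numbers from the paragraph above, so the percentages are only as precise as those notes; whether the 0.2s overhead also applies to the pypy run is not stated, so no net figure is computed for it.

```python
# Rounded figures from the message above, in seconds per render.
baseline = 1.0    # "close to 1s"
optimised = 0.9   # "just under 0.9s"
pypy = 0.5
overhead = 0.2    # render time with the entire template commented out


def saving(before, after):
    """Fractional reduction in time going from `before` to `after`."""
    return (before - after) / before


overall = saving(baseline, optimised)
net = saving(baseline - overhead, optimised - overhead)
pypy_overall = saving(baseline, pypy)

print("overall saving:      %.1f%%" % (100 * overall))
print("net of overhead:     %.1f%%" % (100 * net))
print("pypy overall saving: %.1f%%" % (100 * pypy_overall))
```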

As I removed chunks of template or engine I just found that anything I removed made a couple of percent of difference. The first thing I tried to optimise was the include tag and it only made a small difference - I even tried merging the "for", "include" and "with" tags into a new tag to avoid the 2 unnecessary contexts created by the two inner tags - my rough memory was that made a 3-4% difference in this particular case and we decided not to use it due to the added complexity of our code.

Most of our page render times on www.mixcloud.com are proportional to the number of cards we have on the page.

Sam

Aymeric Augustin

Mar 13, 2015, 4:40:50 PM
to django-d...@googlegroups.com
2015-03-13 12:21 GMT+01:00 Sam Cooke <sdc...@gmail.com>:
> The test template we were using to test the performance was a simple:
> {% for item in item_list %}{% include "item.html" %}{% endfor %}

Bad luck -- including a template in a loop is one of the known pathological
performance cases of the DTL :-(

--
Aymeric.

Sam Cooke

Mar 13, 2015, 5:33:11 PM
to django-d...@googlegroups.com
Even with the cached template loader? Before realising it was unnecessary one of the first things I did was hack something together that took the nodelist from the include and stuck it straight into the for loop's nodelist at compile time and it didn't make much difference.

Sam


Oleksii Oleksenko

Mar 22, 2015, 4:33:08 PM
to django-d...@googlegroups.com

Hi,

I've run this benchmark (https://github.com/prestontimmons/templatebench) on my laptop and got the following results: http://pastebin.com/kkyTcfi7.

From what I see, internal caching mainly solves the problem with parsing. However, rendering is still an issue, so I tried to figure out what slows rendering down. To do that, I built a call graph for rendering a template with many includes (source: http://pastebin.com/jjF1kME8). The resulting graph is attached. From this graph I see that a significant part of rendering is devoted to building templates, not rendering them (calls to django.template.engine.Engine.get_template).

So here is my question: is it supposed to behave like that? And why does it happen?
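
As a cross-check that does not need a graph, the same question can be asked of cProfile output directly. This is a sketch with stand-in functions: the local `get_template` and `render_with_includes` only mimic a lookup being repeated per include and are not Django's code, and `calls_and_time` is a hypothetical helper.

```python
import cProfile
import pstats


def get_template(name):
    """Stand-in for a template lookup that rebuilds its result on every call."""
    return [token for token in name.split(".")]


def render_with_includes(count):
    """Stand-in for a render that hits `count` include tags."""
    return [get_template("item.html") for _ in range(count)]


def calls_and_time(profiler, function_name):
    """Sum call count and cumulative seconds for functions with this name."""
    stats = pstats.Stats(profiler)
    calls = 0
    seconds = 0.0
    # Each entry maps (file, line, name) to (cc, nc, tt, ct, callers).
    for (_file, _line, name), (_cc, nc, _tt, ct, _callers) in stats.stats.items():
        if name == function_name:
            calls += nc
            seconds += ct
    return calls, seconds


profiler = cProfile.Profile()
profiler.enable()
render_with_includes(50)
profiler.disable()

calls, seconds = calls_and_time(profiler, "get_template")
print("get_template: %d calls, %.6f s cumulative" % (calls, seconds))
```

If the call count scales with the number of includes rendered, the lookup is being repeated at render time rather than done once at compile time.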


Thanks,

Oleksii

Oleksii Oleksenko

Mar 22, 2015, 4:35:19 PM
to django-d...@googlegroups.com
Graph

On Sunday, March 22, 2015, at 21:33:08 UTC+1, Oleksii Oleksenko wrote:
pycallgraph.png