site optimization / closure compiler / pipeline


Laurent Savaete

May 4, 2012, 12:40:45 PM
to ductus-d...@googlegroups.com
I took a look at html5boilerplate (h5bp) and related optimisations.
Below is a summary of what I learnt; the not-so-tech intro is probably
worth reading for anyone interested :)

The problem is to reduce the amount of data sent to the browser so
that the page loads faster, and also make scripts run faster (so that
the "app" feels more responsive, and our users don't fall asleep
looking at an hourglass when they click on something)

Here are a few things we should be doing to improve page serving and
loading time to a "state of the art" status:

- minify javascript and CSS files: essentially remove all spaces,
comments, shorten variable names, and optionally run a few
optimizations so the files get to the user faster, and javascript
possibly executes faster
- concatenate JS and CSS files into a minimal number of files. Each
request to the webserver has overhead, so fewer requests mean less
overhead.
- possibly make use of a CDN. Some files like jQuery are used on many
many sites, so instead of serving it ourselves, we can let someone
else's server do it for us (think google for instance). That saves us
bandwidth so we can serve something else, and often visitors will
already have the file in question in their browser cache from some
other site, so they don't even need to wait for that one!
- optimise images. According to the guys behind h5bp, that's possibly
the biggest saver. Some programs can reduce image file sizes quite
drastically without humans actually seeing the difference.
- use versioning so that browsers can cache aggressively, and we don't
fear that users will see outdated versions of the site. Basically, it
relies on adding a version number to the filenames of (almost)
everything, so that we're sure what the user is getting. Call it
cachebusting if you wanna sound cool.

So that's for the theory. Now there are tons of ways to implement
this. </"not-so-tech" intro>
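
To make the versioning point above a bit more concrete, here is a minimal sketch of content-based cachebusting. It is only an illustration: the function name `versioned_name`, the choice of SHA-256 and the 12-character digest length are assumptions, not what any of the tools discussed in this thread actually do internally.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 12) -> str:
    """Insert a short hash of the file content before the extension.

    The name changes whenever the content changes, so the file can be
    served with "cache forever" headers without risking stale copies.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    # Same path, different content: two different public names.
    print(versioned_name("js/choice.js", b"var a = 1;"))
    print(versioned_name("js/choice.js", b"var a = 2;"))
```

The templates then have to reference the versioned name rather than the plain one, which is the part the django asset managers automate.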

h5bp does all of that in its build script. Unfortunately, it's only
aimed at "simple" sites, and because we use django, it's not much use
to us as is.
BUT lots of other people have had this problem before and developed
"asset managers" specifically for django.

They are essentially an application we include in our django app
(ductus), and then tell it:
- "this is a dev site": don't do anything
- "this is a prod site": do all the optimisation above.
Because they're made specifically for django, they're aware of all the
tricks required to work well with it, so I reckon we should use one of
them.
Now the next question is: which one?

Here's a list of packages
http://djangopackages.com/grids/g/asset-managers/ that we could use.

Of all those, 2 seem interesting in my opinion (ie: big enough,
maintained, documented and providing most of the features we'd want):

- django-pipeline:
http://django-pipeline.readthedocs.org/en/latest/index.html (very
active dev, but assets have to be described in settings.py, which
seems a bit odd)

- django_compressor:
http://django_compressor.readthedocs.org/en/latest/ (a bit less active
dev, but better doc and an interesting concept of including
optimisation settings directly in django templates. The user base
seems to be the biggest, though that's not very reliable numbers)

they both integrate closure compiler natively (which requires a jvm to
run, so we'd need one on the server) and YUI for CSS/JS. So I'll try
them both out, and see what comes out (since that seems to be the
consensus answer to "closure compiler or YUI?" around the web)

I'll go ahead and play a bit with django_compressor now.

Laurent Savaete

May 6, 2012, 10:08:26 PM
to ductus-d...@googlegroups.com
What I've learnt so far while testing django_compressor.

- it's fairly easy to set up. Broadly speaking, a few lines in
settings.py, some {% compress %} tags in templates around the JS/CSS
you want compressed/combined, and the next request to a page will
automatically run the filters/compressors, cache the output, and do
all the work, so that users get efficient content served. Also, it
does not impact the way we develop, and we don't need to put any
minified files in the git repository.

An example of template tag, from choice.html:
{% compress js %}
<script type="text/javascript" src="{% static
"ductus/modules/flashcards/js/choice.js" %}"></script>
<script type="text/javascript" src="{% static
"ductus/common/js/jQuery.jPlayer.2.1.0/jquery.jplayer.min.js"
%}"></script>
<script type="text/javascript" src="{% static
"ductus/common/js/audio_player.js" %}"></script>
{% endcompress %}
will turn into:
<script type="text/javascript" src="/static/CACHE/js/504776bdaa18.js"></script>

The resulting filename is a hash based on the content of the
compressed files, so a new filename will be created when we deploy new
code.

- to actually make use of the compression system, we need to set up a
(django) caching mechanism. Just a simple "file on disk" system is
good enough to start with. Best practice is to set up nginx to serve
the cached compressed files with far-future expiry headers ("expires
max" in nginx), since we know that filenames will change whenever the
content changes.

- the other option (django-pipeline) will essentially provide the same
function. The difference being that compressor defines which files to
compress/combine in the templates (ie: where we call them), whereas
pipeline does that job in settings.py, which seems harder to implement
as we'd have to sync puppet and ductus repositories. With compressor,
we just define how things would happen in the ductus code, and
settings.py just says whether we do it or not.
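
As a rough idea of what the "few lines in settings.py" could look like, here is a sketch. The setting and filter names are the ones I believe django_compressor documents at the moment, so check them against the installed version; the jar paths and the cache directory are made-up placeholders.

```python
# settings.py fragment (sketch only; all filesystem paths are hypothetical)

INSTALLED_APPS = [
    "django.contrib.staticfiles",
    "compressor",
]

STATICFILES_FINDERS = [
    "django.contrib.staticfiles.finders.FileSystemFinder",
    "django.contrib.staticfiles.finders.AppDirectoriesFinder",
    # lets staticfiles find the files compressor generates
    "compressor.finders.CompressorFinder",
]

# "dev site": False, the {% compress %} tags leave their content alone.
# "prod site": True, files are filtered and combined.
COMPRESS_ENABLED = True

# Closure Compiler for JS, YUI compressor for CSS (both need a JVM).
COMPRESS_JS_FILTERS = ["compressor.filters.closure.ClosureCompilerFilter"]
COMPRESS_CSS_FILTERS = ["compressor.filters.yui.YUICSSFilter"]
COMPRESS_CLOSURE_COMPILER_BINARY = "java -jar /opt/closure/compiler.jar"
COMPRESS_CLOSURE_COMPILER_ARGUMENTS = "--compilation_level SIMPLE_OPTIMIZATIONS"
COMPRESS_YUI_BINARY = "java -jar /opt/yui/yuicompressor.jar"

# Simple "file on disk" cache, good enough to start with.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/ductus_cache",
    }
}
```

The point of the design is that only COMPRESS_ENABLED differs between dev and prod; which files get combined stays in the templates.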

About closure compiler:

- it has 3 different levels of optimisation. The first removes
whitespace and comments only. The second, "simple" (the default), also
shortens local variable names in functions. The advanced one is really
what we want to use (see below). See
https://developers.google.com/closure/compiler/docs/compilation_levels
for details. It does not deal with CSS; we need YUI compressor for
that.

- YUI and closure compiler (CC in simple mode) are pretty equivalent
in terms of results on our code. (viewing a choice lesson, we get a
~10% discount on data transferred, media excluded). The real big
change is if we use CC's advanced mode, where we seem to hit ~30%
reduction, essentially because CC will look into jquery/ui and wipe
out any function that we do not use, BUT...

- our javascript code will require a bit of cleanup/optimisation to
allow for closure compiler's (CC) advanced optimisation mode.
Basically, tricks to tell the compiler which names it can crunch and
which ones are called in other files. As it is now, CC breaks the
code, which is anticipated, as explained in the docs. (full story at
https://developers.google.com/closure/compiler/docs/api-tutorial3)

- it's a java app, so we'd need java installed on the server. An
alternative is to rely on the google API, which may be more efficient
than running the app ourselves. YUI compressor (for CSS) runs on java
too; there are alternatives, but they don't seem to be maintained
much.
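
For reference, the three levels map onto the compiler's --compilation_level flag. Below is a small sketch of how the jar could be invoked by hand from Python, outside of any asset manager. The helper names `closure_command` and `compile_js` are made up for illustration; the flags themselves (--compilation_level, --js, --js_output_file) are the compiler's own.

```python
import subprocess
from typing import List, Sequence

LEVELS = ("WHITESPACE_ONLY", "SIMPLE_OPTIMIZATIONS", "ADVANCED_OPTIMIZATIONS")


def closure_command(jar: str, sources: Sequence[str], output: str,
                    level: str = "SIMPLE_OPTIMIZATIONS") -> List[str]:
    """Build the java command line that compiles `sources` into `output`."""
    if level not in LEVELS:
        raise ValueError("unknown compilation level: %s" % level)
    cmd = ["java", "-jar", jar, "--compilation_level", level]
    for src in sources:
        cmd += ["--js", src]  # input files are combined in the order given
    cmd += ["--js_output_file", output]
    return cmd


def compile_js(jar: str, sources: Sequence[str], output: str,
               level: str = "SIMPLE_OPTIMIZATIONS") -> None:
    """Run the compiler; raises CalledProcessError if it exits non-zero."""
    subprocess.run(closure_command(jar, sources, output, level), check=True)
```

Passing jquery, jquery-ui and our own files together in one ADVANCED_OPTIMIZATIONS run is what lets the compiler drop unused functions across all of them.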

Optimisation in general:

- google and yahoo give a lot of advice on best practices (good news
is: they seem to agree on things!) and firefox/chrome extensions to
help point out ways to improve pages (PageSpeed or YSlow)

- from looking at things, jquery-ui is 30 to 50% of the weight of our
pages (excluding media). I'll look into getting rid of as much of it
as possible.

- the more doc I read, the more ideas for optimising I get, so I don't
think we need to go over the top with it. Deploying compressor,
setting up "cache forever" in nginx and getting rid of (obvious)
useless JS/CSS would be more than enough for a stage 1. When we start
feeling the limits of this, we can think of stage 2.

Enough for this round.

Jim Garrison

May 6, 2012, 10:42:55 PM
to ductus-d...@googlegroups.com
On 05/06/12 19:08, Laurent Savaete wrote:
> What I've learnt so far while testing django_compressor.
>
> - it's fairly easy to set up. Broadly speaking, a few lines in
> settings.py, some {% compress %} tags in templates around the JS/CSS
> you want compressed/combined, and the next request to a page will
> automatically run the filters/compressors, cache the output, and do
> all the work, so that users get efficient content served. Also, it
> does not impact the way we develop, and we don't need to put any
> minified files in the git repository.
>
> An example of template tag, from choice.html:
> {% compress js %}
> <script type="text/javascript" src="{% static
> "ductus/modules/flashcards/js/choice.js" %}"></script>
> <script type="text/javascript" src="{% static
> "ductus/common/js/jQuery.jPlayer.2.1.0/jquery.jplayer.min.js"
> %}"></script>
> <script type="text/javascript" src="{% static
> "ductus/common/js/audio_player.js" %}"></script>
> {% endcompress %}
> will turn into:
> <script type="text/javascript" src="/static/CACHE/js/504776bdaa18.js"></script>
>
> The resulting filename is a hash based on the content of the
> compressed files, so a new filename will be created when we deploy new
> code.

Does this play nicely with the new CachedStaticFilesStorage in Django
1.4? They seem to offer similar functionality, but this one is more
aggressive being that it turns several javascript files into one.

https://docs.djangoproject.com/en/dev/ref/contrib/staticfiles/#cachedstaticfilesstorage

In particular, they aren't going to step on each others' toes, are they?
That would be my only concern.

Regarding java, I don't mind it being part of our build system,
especially if it works with openjdk. The benefits clearly outweigh the
costs.

[snip]

> - from looking at things, jquery-ui is 30 to 50% of the weight of our
> pages (excluding media). I'll look into getting rid of as much of it
> as possible.

Sad! And we've created a custom build of jquery-ui that only uses
certain features of it! (unless you changed something somewhere... :)

Also, re django-pipeline, can you explain the bit about it requiring us
to keep ductus and puppet in sync?

These options seem nice, but I always imagined that first we would
simply have a build script that compiles things during the deploy
process, instead of being done on the fly. But if doing it on the fly
is even easier, then that makes me very happy.

Jim Garrison

May 6, 2012, 11:53:50 PM
to ductus-d...@googlegroups.com
On 05/04/12 09:40, Laurent Savaete wrote:
> - possibly make use of a CDN. Some files like jQuery are used on many
> many sites, so instead of serving it ourselves, we can let someone
> else's server do it for us (think google for instance). That saves us
> bandwidth so we can serve something else, and often visitors will
> already have the file in question in their browser cache from some
> other site, so they don't even need to wait for that one!

It shouldn't be too hard to find CDN-as-a-service. I'm not sure what
gandi offers, but both Rackspace CloudFiles and Amazon S3 allow you to
host your static assets on a CDN.

Speaking of which, if we're dynamically generating our compressed files
using django-compressor or django-pipeline, then it may be more
difficult to further optimize them by using a CDN if we choose. On the
other hand, if we are deploying things with a build script that results
in a bunch of static files, we can more easily (in the future) send
those static assets directly to a hosting service that allows CDN access.

I'm a bit skeptical of relying on Google since it would involve allowing
them to log which users come to our site any time a user doesn't already
have jQuery downloaded and cached. More important from our users'
perspective, there have been instances in the past where Google has
botched the CDN while attempting to upgrade jQuery, and then half the
internet was down for about an hour or so. See, e.g.,
<http://news.ycombinator.com/item?id=2398095>

> - optimise images. According to the guys behind h5bp, that's possibly
> the biggest saver. Some programs can reduce image file sizes quite
> drastically without humans actually seeing the difference.

Yes, in fact we should put optimized images directly into git. I've
been using optipng on everything there, with good results.

Laurent Savaete

May 7, 2012, 6:21:49 AM
to ductus-d...@googlegroups.com
> Speaking of which, if we're dynamically generating our compressed files
> using django-compressor or django-pipeline, then it may be more
> difficult to further optimize them by using a CDN if we choose.  On the
> other hand, if we are deploying things with a build script that results
> in a bunch of static files, we can more easily (in the future) send
> those static assets directly to a hosting service that allows CDN access.

compressor (and pipeline though it's not as nicely documented)
provides integration with S3. So all we'd have to do would be to
declare that we use S3, put our credentials in the conf somewhere, and
that's it. Have a look at
http://django_compressor.readthedocs.org/en/latest/remote-storages/
for details. That doesn't seem to be a problem, but let me know if you
see anything I missed.
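
Roughly, and going by the remote-storages page linked above, the S3 setup would come down to something like the sketch below. The bucket name is a placeholder, and reading the credentials from environment variables is just one option. The `s3boto` backend path is the django-storages one I believe is current as I write this; later django-storages releases may use a different path (e.g. `storages.backends.s3boto3.S3Boto3Storage`), so check before copying.

```python
# settings.py fragment (sketch only; bucket name is hypothetical)
import os

# Credentials come from the environment rather than from the repository.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
AWS_STORAGE_BUCKET_NAME = "example-ductus-static"

# collectstatic uploads to the bucket, and pages link to it.
STATIC_URL = "https://%s.s3.amazonaws.com/" % AWS_STORAGE_BUCKET_NAME
STATICFILES_STORAGE = "storages.backends.s3boto.S3BotoStorage"

# compressor writes its combined files to the same place.
COMPRESS_URL = STATIC_URL
COMPRESS_STORAGE = STATICFILES_STORAGE
```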

> I'm a bit skeptical of relying on Google since it would involve allowing
> them to log which users come to our site any time a user doesn't already
> have jQuery downloaded and cached.  More important from our users'
> perspective, there have been instances in the past where Google has
> botched the CDN while attempting to upgrade jQuery, and then half the
> internet was down for about an hour or so.  See, e.g.,
> <http://news.ycombinator.com/item?id=2398095>

I agree on the logging point, but I wouldn't expect us to do a better
job than google at keeping webservers up :) Also, looking at your
link, we most likely wouldn't be hit by the problem: to allow caching,
you need to point to specific versions of jquery, not "latest" (which
can't have "cache forever" headers).
But let's ditch google jquery anyway.

>> - optimise images. According to the guys behind h5bp, that's possibly
>> the biggest saver. Some programs can reduce image file sizes quite
>> drastically without humans actually seeing the difference.
>
> Yes, in fact we should put optimized images directly into git.  I've
> been using optipng on everything there, with good results.

I'll run it on all the files we currently have in the repo as a
one-off, and we'll try to run it on anything new we add from then on.

Laurent Savaete

May 7, 2012, 7:59:19 AM
to ductus-d...@googlegroups.com
> Does this play nicely with the new CachedStaticFilesStorage in Django
> 1.4?  They seem to offer similar functionality, but this one is more
> aggressive being that it turns several javascript files into one.
>
> https://docs.djangoproject.com/en/dev/ref/contrib/staticfiles/#cachedstaticfilesstorage
>
> In particular, they aren't going to step on each others' toes, are they?
>  That would be my only concern.

It turns out that the guy who wrote compressor
(https://github.com/jezdez/) also wrote CachedStaticFilesStorage. Just
that gives me confidence that he's done a clean job.
More rationally, CachedStaticFilesStorage will act when we run
collectstatic, and add a copy of each file with a hash appended to the
name.
We can then run "manage.py compress" from fabric. This forces
compression of files when we deploy them, as opposed to waiting for a
user to actually request them through the web, so no user needs to
wait for the compilers to run.
Compress will take the output of collectstatic, ie: look into the
templates, where {% static %} tags now resolve to names like
"filename.hash.js", run the compression operation on those files, and
spit out a compressed file instead. So the intermediate files will be
useless, but harmless. A cleaner way to do things would be to tell
post_process (in CachedStaticFilesStorage) to ignore those files that
we're going to compress, but that might not be worth the effort.
So I don't think there's anything to fear there.
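
The deploy-time order described above could be sketched like this. It uses plain subprocess so the snippet stands on its own; in practice the same two commands would go into a fabric task. The names `DEPLOY_STEPS`, `run_step` and `build_assets` are made up for illustration. As far as I can tell, "manage.py compress" also expects COMPRESS_OFFLINE = True in settings (or its --force option), so treat that as something to verify.

```python
import subprocess
from typing import Callable, List

# Order matters: compress works on what collectstatic has produced.
DEPLOY_STEPS: List[List[str]] = [
    ["python", "manage.py", "collectstatic", "--noinput"],
    ["python", "manage.py", "compress"],
]


def run_step(cmd: List[str]) -> None:
    """Run one command and stop the deploy if it fails."""
    subprocess.run(cmd, check=True)


def build_assets(run: Callable[[List[str]], object] = run_step) -> List[List[str]]:
    """Run every deploy step in order and return the commands executed."""
    executed = []
    for cmd in DEPLOY_STEPS:
        run(cmd)
        executed.append(cmd)
    return executed
```

Passing a different `run` callable (for instance one that only logs) makes it easy to dry-run the sequence without touching the server.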

> Regarding java, I don't mind it being part of our build system,
> especially if it works with openjdk.  The benefits clearly outweigh the
> costs.

indeed. I'm running it on openjdk6 on my box, it seems happy with it :)

>> - from looking at things, jquery-ui is 30 to 50% of the weight of our
>> pages (excluding media). I'll look into getting rid of as much of it
>> as possible.
>
> Sad!  And we've created a custom build of jquery-ui that only uses
> certain features of it!  (unless you changed something somewhere... :)

The only change I made was to upgrade jquery-ui, look at which options
we actually need, and drop 3 more, which took the size from ~130 to
~100 kB. Our pages weigh between approx. 200 and 300 kB before media.
Good news is, closure compiler in advanced optimisation mode actually
drops some useless bits of code, so that (jquery+jqueryui+our js)
weighs less than 100 kB in the end.

> Also, re django-pipeline, can you explain the bit about it requiring us
> to keep ductus and puppet in sync?

pipeline does the same job as compressor, but instead of specifying
which files to compress/combine in the templates, you do that in
settings.py, where you'd say "combine choice.js and audio_player.js
into choice.min.js" for instance.
Now if we change files in the ductus code, say we add fancystuff.js or
rename choice.js into view.js (in the templates), we need to update
settings.py to reflect that.
Granted, we could do that in the ductus settings.py (instead of puppet
settings.py), but I still feel that would be 2 files to sync, as
opposed to just one file. Not a dramatic issue, but I find it more
error prone.
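
To illustrate the "2 files to sync" point, here is roughly what the pipeline side would look like, plus a tiny check for scripts that templates use but no bundle lists. `PIPELINE_JS` with `source_filenames` / `output_filename` is the layout I understand django-pipeline to use at the moment (other releases may structure this differently); the helper `unbundled` and the path given to fancystuff.js are invented for the example.

```python
# settings.py fragment (sketch): bundles are declared away from the templates
PIPELINE_JS = {
    "choice": {
        "source_filenames": (
            "ductus/modules/flashcards/js/choice.js",
            "ductus/common/js/audio_player.js",
        ),
        "output_filename": "js/choice.min.js",
    },
}


def unbundled(referenced, bundles):
    """Return the scripts used by templates that no bundle lists."""
    bundled = {
        name
        for spec in bundles.values()
        for name in spec["source_filenames"]
    }
    return set(referenced) - bundled


if __name__ == "__main__":
    # Adding fancystuff.js to a template without touching settings.py
    # leaves it out of every bundle.
    used = [
        "ductus/modules/flashcards/js/choice.js",
        "ductus/common/js/fancystuff.js",
    ]
    print(unbundled(used, PIPELINE_JS))
```

With compressor there is nothing to keep in step, since the list of files is the template itself.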

> These options seem nice, but I always imagined that first we would
> simply have a build script that compiles things during the deploy
> process, instead of being done on the fly.  But if doing it on the fly
> is even easier, then that makes me very happy.

As I mentioned above (and forgot to include in my first email), we can
add a line to fabfile.py to get the compression done as we deploy, to
avoid having a user waiting for the compiler to run. What I like about
this solution is that it's all packaged and made specifically for
django (by a guy who's committing to django master), so it's probably
done better than if we did it ourselves :)

Jim Garrison

May 7, 2012, 8:29:29 PM
to ductus-d...@googlegroups.com
On 05/07/12 03:21, Laurent Savaete wrote:
>> Speaking of which, if we're dynamically generating our compressed files
>> using django-compressor or django-pipeline, then it may be more
>> difficult to further optimize them by using a CDN if we choose. On the
>> other hand, if we are deploying things with a build script that results
>> in a bunch of static files, we can more easily (in the future) send
>> those static assets directly to a hosting service that allows CDN access.
>
> compressor (and pipeline though it's not as nicely documented)
> provides integration with S3. So all we'd have to do would be to
> declare that we use S3, put our credentials in the conf somewhere, and
> that's it. Have a look at
> http://django_compressor.readthedocs.org/en/latest/remote-storages/
> for details. That doesn't seem to be a problem, but let me know if you
> see anything I missed.

Great to hear. That all looks good to me.

>> I'm a bit skeptical of relying on Google since it would involve allowing
>> them to log which users come to our site any time a user doesn't already
>> have jQuery downloaded and cached. More important from our users'
>> perspective, there have been instances in the past where Google has
>> botched the CDN while attempting to upgrade jQuery, and then half the
>> internet was down for about an hour or so. See, e.g.,
>> <http://news.ycombinator.com/item?id=2398095>
>
> I agree on the logging point, but I wouldn't expect us to do a better
> job than google at keeping webservers up :) Also, looking at your
> link, we most likely wouldn't be hit by the problem: to allow caching,
> you need to point to specific versions of jquery, not "latest" (which
> can't have "cache forever" headers).
> But let's ditch google jquery anyway.

Yeah, I agree that google's CDN breaking is unlikely to be a problem.
The real reason I wouldn't want to use google is because of the logging
potential.

>>> - optimise images. According to the guys behind h5bp, that's possibly
>>> the biggest saver. Some programs can reduce image file sizes quite
>>> drastically without humans actually seeing the difference.
>>
>> Yes, in fact we should put optimized images directly into git. I've
>> been using optipng on everything there, with good results.
>
> I'll run it on all files we currently have in the repo as a one shot,
> and we'll try to run it on every new stuff we insert then.

Great; it will be interesting to see if I missed anything in the past.

Thanks for looking into all this!

Jim Garrison

May 7, 2012, 8:40:00 PM
to ductus-d...@googlegroups.com
Everything sounds great. Let's go ahead and move forward with the next
logical steps for integrating django-compressor.

Laurent Savaete

May 7, 2012, 8:44:37 PM
to ductus-d...@googlegroups.com
> Everything sounds great.  Let's go ahead and move forward with the next
> logical steps for integrating django-compressor.

haha, the next logical step is for me to figure out why the compiler
gets stuck on the following code:
function() { return true; }

I've solved pretty much everything for compiling in advanced mode.
Hopefully, I'll have a new branch with all the code on gitorious
before I go to bed, so we can test that out on devbox.

Laurent Savaete

May 8, 2012, 9:48:21 PM
to ductus-d...@googlegroups.com
the combined/compressed javascript setup is now up on
laurent.dev.wikiotics.net as described in previous emails to this
thread.

You shouldn't notice any difference beyond the fact that pages should
load faster the first time you visit the site, and much faster for
subsequent visits/page views, since caching is now set to expire after
30 days (only for javascript and css, other files are not affected
yet).
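
For anyone checking response headers in their browser, a 30-day expiry corresponds to the max-age value computed below. This is only the arithmetic, written as a sketch; the function name `cache_headers` is made up, and the actual nginx configuration on the dev box is not shown here.

```python
from datetime import timedelta


def cache_headers(days: int = 30) -> dict:
    """Return a Cache-Control header for a lifetime of `days` days."""
    max_age = int(timedelta(days=days).total_seconds())
    return {"Cache-Control": "public, max-age=%d" % max_age}


if __name__ == "__main__":
    print(cache_headers(30))  # {'Cache-Control': 'public, max-age=2592000'}
```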

There's a cheat mode available to load pages the old way (with
compression/caching disabled) if you want to compare. Just append
?whatever=1 to any URL.

If you notice a bug, please retest the page with ?whatever=1 appended
to see whether it's related to compression, and let me know.

cheers,
L.