Strip Whitespace Middleware

Doug Van Horn

Jul 28, 2006, 12:26:54 AM
to Django developers
I don't know if anyone will find this useful, but I thought I'd throw
it out there.

I wrote a little Middleware class to strip trailing and leading
whitespace from a response:

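import re

class StripWhitespaceMiddleware:
    """
    Strips leading and trailing whitespace from response content.
    """

    def __init__(self):
        # Matches any run of whitespace that contains at least one newline
        # (blank lines plus the indentation around them).
        self.whitespace = re.compile(r'\s*\n+\s*')

    def process_response(self, request, response):
        # Collapse each such run to a single newline.
        new_content = self.whitespace.sub('\n', response.content)
        response.content = new_content
        return response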


This is /nothing special/, I know. It might not even be worth it,
performance-wise. Just thought I'd throw it out there. Just my noob
way of trying to help out...

If you use it, just make sure it comes after GZipMiddleware in the
MIDDLEWARE_CLASSES tuple.

doug.

Adrian Holovaty

Jul 28, 2006, 11:41:01 AM
to django-d...@googlegroups.com
On 7/27/06, Doug Van Horn <dougv...@gmail.com> wrote:
> I don't know if anyone will find this useful, but I thought I'd throw
> it out there.
>
> I wrote a little Middleware class to strip trailing and leading
> whitespace from a response:

Hey Doug,

Thanks for contributing this! If you could, post it to the collection
of user-contributed middleware here:

http://code.djangoproject.com/wiki/ContributedMiddleware

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

Will McCutchen

Jul 31, 2006, 11:19:25 AM
to Django developers
Doug Van Horn wrote:
> I wrote a little Middleware class to strip trailing and leading
> whitespace from a response:

I may be misunderstanding something, but doesn't your middleware
actually leave whitespace at the beginning and end of the response?

If response.content = '\n\n\nHello.\n\n' it is converted to
'\nHello.\n'.

Couldn't you change your middleware to something simpler, like this, to
actually strip off all of the leading and trailing whitespace?


class StripWhitespaceMiddleware:
    """Strips leading and trailing whitespace from response content."""

    def process_response(self, request, response):
        response.content = response.content.strip()
        return response


Just a thought. As I said, I may be misunderstanding something.


Will.

Doug Van Horn

Jul 31, 2006, 11:43:41 AM
to Django developers
The intent is primarily to remove extra non-meaningful lines and
indents from the response. As an example:

<ol>
{% for foo in foo_list %}
<li>{{ foo }}</li>
{% endfor %}
</ol>

yields before middleware:

<ol>

<li>biz</li>

<li>baz</li>

<li>buz</li>

</ol>

and after middleware:

<ol>
<li>biz</li>
<li>baz</li>
<li>buz</li>
</ol>
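The effect of the original pattern on this example can be checked in plain Python, without Django. The indentation in the sample string is an assumption (the archived example lost its indents); the regex is the one from the middleware in the first post.

```python
import re

# Same pattern as the middleware: any whitespace run containing a newline.
WHITESPACE = re.compile(r'\s*\n+\s*')

# Rendered output of the <ol> template; the indentation here is assumed.
before = (
    "<ol>\n"
    "    \n"
    "        <li>biz</li>\n"
    "    \n"
    "        <li>baz</li>\n"
    "    \n"
    "        <li>buz</li>\n"
    "    \n"
    "</ol>"
)

# Each blank line plus the indentation around it collapses to one newline.
after = WHITESPACE.sub('\n', before)
print(after)
```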


It's nothing special, and if I've made an error with the RE please let
me know (I'm by no means an expert).

Regarding leading and trailing whitespace (that is, before and after
the response content), I think it /mostly/ works, truncating everything
down to 1 extra line on either side of the response. My guess would be
someone more clever than I am could come up with the appropriate RE to
chomp those two extra characters off, too.

Hopefully that makes the intent more clear.
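For the one leftover newline at each end, the simplest route may be to skip a cleverer regex and call .strip() after the substitution, combining Will's suggestion with the original pattern. A minimal sketch; the function name collapse_whitespace is hypothetical.

```python
import re

WHITESPACE = re.compile(r'\s*\n+\s*')

def collapse_whitespace(content):
    """Collapse newline-bearing whitespace runs, then trim both ends."""
    # The substitution alone leaves one newline at each end...
    collapsed = WHITESPACE.sub('\n', content)
    # ...which str.strip() then removes.
    return collapsed.strip()

print(repr(WHITESPACE.sub('\n', '\n\n\nHello.\n\n')))  # '\nHello.\n'
print(repr(collapse_whitespace('\n\n\nHello.\n\n')))   # 'Hello.'
```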

Will McCutchen

Aug 1, 2006, 10:12:52 AM
to Django developers
Doug Van Horn wrote:
> Hopefully that makes the intent more clear.

Yes, much more clear! For some reason, I thought you were only
concerned with whitespace at the beginning and end of the response.
Sorry for the trouble!


Will.

Gary Wilson

Aug 2, 2006, 1:53:49 AM
to Django developers
I never really liked how the templating system leaves all those
newlines. This middleware is cool, but it would really be nice if the
templating system could collapse the lines that only contain one or
more evaluating-to-nothing template tags.

Thoughts on a possible implementation:

What if an endofline token were added? Lexer.tokenize() creates the
tokens and Parser.parse() creates the nodelist as normal. Each node
in the nodelist is rendered, except for the endofline nodes. Then pass
through the nodelist again, removing whitespace-strings and
empty-strings if those whitespace-strings and empty-strings are all
that exist between two endofline nodes. The endofline nodes following
removed whitespace-strings or empty-strings are also removed, while
all other endofline nodes get rendered to a newline. Join and return
the rendered nodelist as normal.

Would this work? Is there a better way?
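Gary's second pass can be sketched on a flattened list of already-rendered node output. This is not Django's template API: the EOL sentinel stands in for the proposed endofline node, and join_collapsing_blank_lines is a hypothetical name. As described, it also drops lines that were blank in the template source, not only lines holding tags that rendered to nothing.

```python
# Hypothetical stand-in for the proposed endofline node.
EOL = object()

def join_collapsing_blank_lines(rendered):
    """Join rendered node output, dropping lines that rendered to nothing.

    `rendered` is a flat list of strings (each node's rendered output) with
    EOL markers where the template source had newlines.
    """
    # Group the pieces into lines, remembering whether each line was
    # terminated by an EOL marker.
    lines, current = [], []
    for piece in rendered:
        if piece is EOL:
            lines.append((current, True))
            current = []
        else:
            current.append(piece)
    lines.append((current, False))  # whatever follows the last EOL

    out = []
    for pieces, ended_with_eol in lines:
        if ended_with_eol and all(not p.strip() for p in pieces):
            # Only whitespace or empty strings: drop the line and its EOL.
            continue
        out.append(''.join(pieces))
        if ended_with_eol:
            out.append('\n')  # every surviving EOL renders to a newline
    return ''.join(out)

rendered = ['<ol>', EOL, '', EOL, '<li>biz</li>', EOL, '  ', EOL, '</ol>']
print(join_collapsing_blank_lines(rendered))
```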

Malcolm Tredinnick

Aug 2, 2006, 3:09:34 AM
to django-d...@googlegroups.com

The big question with this sort of thing is always going to be speed.
Rendering templates is pretty fast at the moment, but it wants to be,
too. That being said, I haven't implemented or profiled your approach,
so I have no idea of its real impact, but you are introducing another
pass over the source text chunks (chunks == the results of an re.split()
call).

I've been experimenting with a somewhat funky reg-exp change inside the
template parser that would have the same effect as yours. I'm still
optimising it (I *knew* there was a reason that part of Friedl's book
existed) and profiling the results, but it looks possible. Essentially,
this would have the same effect you are after: a blank line that results
from just template directives is removed entirely. Any spaces or other
stuff on the line are left alone, though, so it's a very selective
reaper.
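Malcolm's actual regex is not shown in the thread. As a rough illustration of the same "selective reaper" idea, applied as a preprocessing step on the template source rather than inside the parser, one could drop the newline that ends a line made up of nothing but {% ... %} tags, while leaving lines with leading spaces or other text alone. The pattern and the function name reap_tag_only_newlines are hypothetical.

```python
import re

# A line consisting only of one or more {% ... %} tags, starting at column 0.
# (?:(?!%\}).)* stops each tag at its first closing %}, so any text between
# two tags on the same line prevents a match.
TAG_ONLY_LINE = re.compile(r'^((?:\{%(?:(?!%\}).)*%\})+)\n', re.MULTILINE)

def reap_tag_only_newlines(source):
    """Remove the newline after every tag-only line in template source."""
    return TAG_ONLY_LINE.sub(r'\1', source)

template = (
    "<ol>\n"
    "{% for foo in foo_list %}\n"
    "<li>{{ foo }}</li>\n"
    "{% endfor %}\n"
    "</ol>\n"
)
print(reap_tag_only_newlines(template))
```

Rendering the reaped source would then emit one `<li>` per line with no blank lines in between, since the loop body now starts directly after the `{% for %}` tag.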

My motivation here was having to debug an email generation template
yesterday that was like a train wreck with all the template tags jammed
together to avoid spurious blank lines. It's going to be a few more days
before I can work on this seriously, I suspect (there are two more
urgent Django things I need to finish first, for a start), so you might
like to experiment along those lines too, if you're keen. I'm not sure I
like my solution a lot, either, since it makes things a little more
opaque in the code; still having debates with myself about that.

Regards,
Malcolm

Nebojša Đorđević

Aug 2, 2006, 7:32:50 AM
to django-d...@googlegroups.com

On 28 Jul 2006, at 17:41, Adrian Holovaty wrote:

>
> On 7/27/06, Doug Van Horn <dougv...@gmail.com> wrote:
>> I don't know if anyone will find this useful, but I thought I'd throw
>> it out there.
>>
>> I wrote a little Middleware class to strip trailing and leading
>> whitespace from a response:
>
> Hey Doug,
>
> Thanks for contributing this! If you could, post it to the collection
> of user-contributed middleware here:
>
> http://code.djangoproject.com/wiki/ContributedMiddleware


Thanks, works great, but ... :)

Your process_response doesn't check the Content-Type, so it will strip
whitespace characters from every response (e.g. from images).

Add at start of process_response:

    if 'text/html' not in response.headers.get('Content-Type', '').lower():
        return response
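Folding the Content-Type check into the middleware might look like this in the newer callable style (configured through the MIDDLEWARE setting instead of MIDDLEWARE_CLASSES). This is a sketch, assuming a Python 3 Django where response.content is bytes and responses support get(), has_header() and header assignment; it also skips streaming responses, which have no .content to rewrite.

```python
import re


class StripWhitespaceMiddleware:
    """Collapse blank lines and indentation in HTML responses only."""

    # A bytes pattern, since response.content is bytes on Python 3.
    whitespace = re.compile(rb'\s*\n+\s*')

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        content_type = response.get('Content-Type', '').lower()
        # Leave images, JSON, CSS, etc. alone, and skip streaming responses.
        if 'text/html' not in content_type or getattr(response, 'streaming', False):
            return response
        response.content = self.whitespace.sub(b'\n', response.content)
        # Keep Content-Length accurate if something already set it.
        if response.has_header('Content-Length'):
            response['Content-Length'] = str(len(response.content))
        return response
```

As with the original advice, it would be listed after GZipMiddleware so that it sees the response before compression.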

--
Nebojša Đorđević - nesh
Studio Quattro - Niš - Serbia
http://studioquattro.biz/ | http://trac.studioquattro.biz/djangoutils/
Registered Linux User 282159 [http://counter.li.org]

What is the sound of one backpack EMP weapon discharging? -- Joe
Thompson
"Clickety-click" -- Charles Cazabon



Steven Armstrong

Aug 2, 2006, 12:31:10 PM
to django-d...@googlegroups.com
On 08/02/06 09:09, Malcolm Tredinnick wrote:
[...]

> My motivation here was having to debug an email generation template
> yesterday that was like a train wreck with all the template tags jammed
> together to avoid spurious blank lines. It's going to be a few more days
> before I can work on this seriously, I suspect (there are two more
> urgent Django things I need to finish first, for a start), so you might
> like to experiment along those lines too, if you're keen. I'm not sure I
> like my solution a lot, either, since it makes things a little more
> opaque in the code; still having debates with myself about that.
>

I've written a template tag [1] for just that, more control over
whitespace when generating config files and emails. Maybe that's useful
for someone.

[1] http://www.c-area.ch/code/django/templatetags/normalize.py


Ian Holsman

Aug 2, 2006, 5:38:19 PM
to django-d...@googlegroups.com
Hi Doug.

you have to be careful when stripping out whitespace, as sometimes it carries meaning:
content such as JavaScript and <pre> blocks shouldn't have its whitespace stripped, as you might change the look or behavior of the page.
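One way to respect this caveat is to collapse whitespace only outside elements whose contents are whitespace-sensitive. A minimal sketch; the element list (pre, script, textarea) and the function name collapse_outside_protected are assumptions, and a regex is not a real HTML parser, so nested or malformed markup can fool it.

```python
import re

WHITESPACE = re.compile(r'\s*\n+\s*')
# Elements whose contents are copied through untouched (assumed list).
PROTECTED = re.compile(r'<(pre|script|textarea)\b.*?</\1\s*>',
                       re.IGNORECASE | re.DOTALL)

def collapse_outside_protected(html):
    """Collapse newline-bearing whitespace everywhere except protected blocks."""
    out, pos = [], 0
    for match in PROTECTED.finditer(html):
        # Collapse the ordinary markup before this protected block...
        out.append(WHITESPACE.sub('\n', html[pos:match.start()]))
        # ...then copy the block itself verbatim.
        out.append(match.group(0))
        pos = match.end()
    out.append(WHITESPACE.sub('\n', html[pos:]))
    return ''.join(out)

page = "<div>\n\n   <pre>\n  a\n\n  b\n</pre>\n\n<p>x</p>\n"
print(collapse_outside_protected(page))
```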

regards
Ian

Jacob Kaplan-Moss

Aug 2, 2006, 6:05:32 PM
to django-d...@googlegroups.com
Howdy guys --

Just out of curiosity, have you tried using the {% spaceless %} tag?
What's it missing that you've needed to use middleware for?

(see http://www.djangoproject.com/documentation/templates/#spaceless)

Jacob

gabor

Aug 2, 2006, 6:18:23 PM
to django-d...@googlegroups.com

while talking about {% spaceless %}....

wouldn't it make sense to also have something like
{% really_really_spaceless %}, which would (surprise :-) make the html
spaceless? :-)


i was actually quite confused... i assumed (from the name of the tag)
that it would remove all the spaces, which seemed like a great
solution. but then i found out that it still keeps one space :-(

if we could ignore backwards-compatibility, i would recommend to have:

{% spaceless %}...this would completely strip all the whitespace, meaning
this:
<ul>
<li>1</li>
<li>2</li>
</ul>
would become: <ul><li>1</li><li>2</li></ul>

{% normspaces %}...this would do what the current spaceless-tag does.
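The two behaviors described here differ only in what replaces the whitespace between tags. A sketch with hypothetical helper names (not Django's implementation), using a simple `>\s+<` pattern so that whitespace inside text nodes is left alone:

```python
import re

BETWEEN_TAGS = re.compile(r'>\s+<')

def normspaces(html):
    """Collapse whitespace between tags to a single space."""
    return BETWEEN_TAGS.sub('> <', html)

def really_spaceless(html):
    """Remove whitespace between tags entirely."""
    return BETWEEN_TAGS.sub('><', html)

sample = "<ul>\n<li>1</li>\n<li>2</li>\n</ul>"
print(normspaces(sample))        # <ul> <li>1</li> <li>2</li> </ul>
print(really_spaceless(sample))  # <ul><li>1</li><li>2</li></ul>
```

The single space is the safer default: between inline elements such as two adjacent links, removing it entirely can visibly run them together.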

what do you think?

gabor
