Understanding autoescape-aware filters

Ivan Sagalaev

Nov 17, 2007, 12:49:15 PM
to django...@googlegroups.com
Hello!

I'm about to convert my apps to play well with recently introduced
autoescaping but I have to confess that I don't get mark_safe, is_safe
and needs_autoescape.

First, I don't get why .is_safe attribute is needed at all. If my filter
returns any HTML I should escape it and mark_safe the result, no?

Then, looking at default filters I see that .is_safe is set to False for
all filters returning non-string values. Though these values are pretty
safe for HTML when they would be converted into strings in the end.

And 'needs_autoescape' escapes me absolutely... If I'm dealing with user
content and HTML why, again, can't I escape it inside my filter's code
and mark_safe it?

----

Anyway... Malcolm (as the main implementer), sorry, but the docs are
written in Linux how-to style: "make these magic passes and hope for the
best and don't try to understand the thing since you never will". Could
you please clarify why are those things needed and what exact effect
they are intended to cause?

For example. I'm writing a filter that gets a string and wraps its
first letter in a <b>...</b>. I'm going to split the first letter,
conditional_escape the letter and the rest, wrap a letter in <b>...</b>,
concatenate and mark_safe. Now, should I stick .is_safe? Because yes, I
think it will return safe output given a safe string. What will break if
I didn't (my experiments so far show that nothing breaks). Should I also
ask for autoescape parameter and how am I supposed to use it?

Ok, this was a bit messy but I honestly thought it should be easier :-)

SmileyChris

Nov 17, 2007, 3:31:32 PM
to Django users
On Nov 18, 6:49 am, Ivan Sagalaev <Man...@SoftwareManiacs.Org> wrote:
> the docs are
> written in Linux how-to style: "make these magic passes and hope for the
> best and don't try to understand the thing since you never will". Could
> you please clarify why are those things needed and what exact effect
> they are intended to cause?
It's explained here:
http://www.djangoproject.com/documentation/templates_python/#filters-and-auto-escaping

Probably could be clearer, it's still an area that makes my head spin
for a few minutes when I read it.

Ivan Sagalaev

Nov 17, 2007, 3:34:43 PM
to django...@googlegroups.com

Yes, I've asked the group after I've read those docs, twice :-). First
time I thought that I was just slow but the second time I didn't
understand again and asked for help...

Karen Tracey

Nov 17, 2007, 6:59:40 PM
to django...@googlegroups.com
On 11/17/07, Ivan Sagalaev <Man...@softwaremaniacs.org> wrote:

> Hello!
>
> I'm about to convert my apps to play well with recently introduced
> autoescaping but I have to confess that I don't get mark_safe, is_safe
> and needs_autoescape.

I'm also just getting started on learning this, but feel like I've got a reasonably good understanding from the docs and a bit of looking at the code, so I'll take a stab at answering. 

> First, I don't get why .is_safe attribute is needed at all. If my filter
> returns any HTML I should escape it and mark_safe the result, no?

From reading the doc, I got the impression that is_safe is for filters that don't mark_safe their output, but that also do not do anything to introduce anything "unsafe" in their output.  Therefore, if they are given a safe string on input, their output will be automatically marked safe.  Setting is_safe to True for a filter that always mark_safe's its output appears to be a no-op -- the framework will call mark_safe a 2nd time on something that has already been marked safe, which is harmless.  However, there are a few filters in defaultfilters.py that do in fact always return mark_safe'd output but also have is_safe set to True.  I don't understand what that accomplishes so perhaps I am missing something here.

> Then, looking at default filters I see that .is_safe is set to False for
> all filters returning non-string values. Though these values are pretty
> safe for HTML when they would be converted into strings in the end.

But they are not returning strings, they are returning ints (or lists, or whatever).  If is_safe was set to True for these filters, then their output would automatically be marked safe whenever they were called with safe input, meaning whatever they were returning would be turned into a (safe) string, changing the type of their output.  is_safe=True is for filters that return strings, not numbers or whatever else.

> And 'needs_autoescape' escapes me absolutely... If I'm dealing with user
> content and HTML why, again, can't I escape it inside my filter's code
> and mark_safe it?

You said "dealing with user content", so you have in your mind that input your filter is given must be escaped.  What if you were writing a filter that could operate on either user-generated (untrusted) input that does need to be escaped or trusted input that may contain HTML and should not be escaped?  That's what needs_autoescape is for.  It's for filters that are going to mark_safe their output but need to know whether or not their input should be escaped as they process it.  They're producing something that will be exempt from further escaping, so they need to know the current autoescape setting in order to determine whether their input should be escaped as it is incorporated into their output, because this is that last chance for getting it escaped.
 
[snip]


> For example. I'm writing a filter that gets a string and wraps its
> first letter in a <b>...</b>. I'm going to split the first letter,
> conditional_escape the letter and the rest, wrap a letter in <b>...</b>,
> concatenate and mark_safe. Now, should I stick .is_safe? Because yes, I
> think it will return safe output given a safe string. What will break if
> I didn't (my experiments so far show that nothing breaks). Should I also
> ask for autoescape parameter and how am I supposed to use it?

As I mentioned above, I don't believe it is necessary to set is_safe to True for a filter that mark_safe's its output, but I might be missing something there.

As for whether you need to ask for autoescape -- is there any use case for your filter where its input could contain HTML that should not be escaped?  If so, then you should ask for autoescape and only escape the input you are given if autoescape is on. 

Anyway, that's my take on it.  Malcolm can correct where I've gotten things wrong.

Cheers,
Karen

Malcolm Tredinnick

Nov 17, 2007, 7:12:31 PM
to django...@googlegroups.com

On Sat, 2007-11-17 at 20:49 +0300, Ivan Sagalaev wrote:
> Hello!
>
> I'm about to convert my apps to play well with recently introduced
> autoescaping but I have to confess that I don't get mark_safe, is_safe
> and needs_autoescape.
>
> First, I don't get why .is_safe attribute is needed at all. If my filter
> returns any HTML I should escape it and mark_safe the result, no?

The is_safe attribute is a large time-saver when you're writing filters.
Normally, you'll just want auto-escaping behaviour to be applied
automatically and when writing a filter that doesn't add raw HTML markup
you should be able to just write the code without having to worry about
escaping. The only difficulty is when you pass a safe string into the
filter. It's very easy to end up with a result that isn't a SafeData
instance after a few string manipulations, so this isn't a trivial
issue. For many filters, the actions they perform won't remove that
safe-ness in effect, but they won't be a SafeData instance. So Django
notes that the input was a SafeData and the function is marked is_safe
and, thus, it calls mark_safe() on the result so that you don't have to
in your filter (all other input is automatically escaped at the right
moment, since it isn't safe from further escaping).

Thus, is_safe: if True, you are guaranteeing a safe input string
will always generate an output string that can be marked as safe (and
Django will automatically do that for you). If not True, it is up to you
to either mark the output safe manually or have it auto-escaped when
auto-escaping is in effect.

> Then, looking at default filters I see that .is_safe is set to False for
> all filters returning non-string values. Though these values are pretty
> safe for HTML when they would be converted into strings in the end.

For filters returning non-strings, is_safe is a no-op, so I just picked
a value. The reason False is better than True is because you don't even
have to bother adding is_safe to those types of filters ("absent"
defaults to False). Adding it won't harm anything, though.

> And 'needs_autoescape' escapes me absolutely... If I'm dealing with user
> content and HTML why, again, can't I escape it inside my filter's code
> and mark_safe it?

Because you wouldn't be able to write a filter that worked correctly in
both auto-escaping and non-auto-escaping environments, which is a
compulsory requirement in most cases. You don't want to escape inside
the filter if the current context doesn't have auto-escaping in effect.
The needs_autoescape attribute tells Django that your function needs to
be passed a parameter called "autoescape" that is the value of the
current auto-escaping effect (True or False).

Yes, you can ignore needs_autoescape if you're going to restrict your
filters to only working in an auto-escaping environment, but that's
highly non-portable (and certainly not an option in Django's core, for
example). Anybody distributing an application, for example, that was
designed to work with other people's templates and didn't allow for
auto-escaping to be either True or False at render time would have a bug
in their code.

> Anyway... Malcolm (as the main implementer), sorry, but the docs are
> written in Linux how-to style: "make these magic passes and hope for the
> best and don't try to understand the thing since you never will".

A little bit hyperbolic, even for somebody who's frustrated. It is never
the intention to say "you'll never understand this" and the current
documentation does not even come close to saying that. When writing
feedback, as welcome as it is, try to have some respect for the insane
number of hours that have gone into developing this and the amount of
nonsense it's generated. Consider, also, that documentation written by
the person doing the design and implementation often misses
some of the easier things because they're too easy by the time it's at a
state where the documentation is written and that person is too close to
things. This is just part of the ironing out problems phase.

Given the types of things people complain about not understanding in the
documentation (we always get requests to add things that are effectively
"warning: Python will behave as it normally does and gravity has an
effect on this planet."), if we go into all the fine details of how
things work, the effect gets lost in the implementation. So there's a
limit. Apparently you feel I've fallen short here, but it's going to be
very difficult to find the middle ground.

I'll have one more pass at it and after that I look forwards to reading
your patch to improve things.

> For example. I'm writing a filter that gets a string and wraps its
> first letter in a <b>...</b>. I'm going to split the first letter,
> conditional_escape the letter and the rest, wrap a letter in <b>...</b>,
> concatenate and mark_safe. Now, should I stick .is_safe?

If you're always returning a safe string, then adding is_safe is a
no-op. The is_safe attribute is only a necessary consideration when you
aren't marking for manual safeness.

> Should I also
> ask for autoescape parameter and how am I supposed to use it?

If your input can contain non-safe strings, you'll need to accept it. As
to how to use it: test the value!! It tells you if autoescaping is in
effect. From the documentation:

When the filter is called, the ``autoescape`` keyword argument
will be ``True`` if auto-escaping is in effect.

> Ok, this was a bit messy but I honestly thought it should be easier :-)

Remember that I've managed to achieve what you said was impossible in
your original list of 10 things you hated about Django: auto-escaping is
implemented in a way that means that almost all code is very close to
backwards-compatible and filters and templates will work in both types
of environments, so people don't have to write two sets of filters and
two sets of templates for fragments that might be included in either
environment. The effort is required by the developer rather than the
template writer.

Regards,
Malcolm

--
Why be difficult when, with a little bit of effort, you could be
impossible.
http://www.pointy-stick.com/blog/

Ivan Sagalaev

Nov 18, 2007, 4:04:14 AM
to django...@googlegroups.com
Malcolm, first of all, I should apologize. I actually intended my letter
to be 'funny' but after your answer I understand that it was just harsh
:-(. I'm sorry. And let me again express that I never stopped to wonder
how you manage to do so many great things in Django. Thank you very much!

Still I believe that my dumbness in understanding new filters is a good
use-case to work out since I understand other Django docs well. So here
goes...

Malcolm Tredinnick wrote:
> It's very easy to end up with a result that isn't a SafeData
> instance after a few string manipulations, so this isn't a trivial
> issue. For many filters, the actions they perform won't remove that
> safe-ness in effect, but they won't be a SafeData instance.

Got it. If I do SafeString('test') + 'test' the result will be a str,
not SafeData.
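
This behaviour can be reproduced without Django. A minimal sketch, assuming a hypothetical `SafeText` marker class that only stands in for Django's safe string types (it is not Django's implementation): ordinary `str` operations return plain `str`, so the marker is lost.

```python
class SafeText(str):
    """Hypothetical stand-in for a 'safe' string type; a marker subclass only."""


safe = SafeText("test")
combined = safe + "test"   # str.__add__ returns a plain str
upper = safe.upper()       # so do most other str methods

print(type(safe).__name__)      # SafeText
print(type(combined).__name__)  # str
print(type(upper).__name__)     # str
```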

> So Django
> notes that the input was a SafeData and the function is marked is_safe
> and, thus, it calls mark_safe() on the result so that you don't have to
> in your filter

This is my first misunderstanding. mark_safe seems trivial enough, why
not just use it instead of .is_safe=True on a filter?

Looking at this from template/__init__.py:

    if getattr(func, 'is_safe', False) and isinstance(obj, SafeData):
        obj = mark_safe(new_obj)

they're essentially equivalent modulo type checking. Why doesn't
mark_safe do type checking itself?

> Because you wouldn't be able to write a filter that worked correctly in
> both auto-escaping and non-auto-escaping environments, which is a
> compulsory requirement in most cases. You don't want to escape inside
> the filter if the current context doesn't have auto-escaping in effect.

Uhm.. This is the second thing I'm missing. I thought that {% autoescape
off %} is a backward-compat measure. So .needs_autoescape exists only
for filters that used to do non-safe output and should behave as such in
a non-autoescaped environment. And I thought that in a new era all new
filters and tags actually should *always* return safe values. No?

> I'll have one more pass at it and after that I look forwards to reading
> your patch to improve things.

I will certainly try to do this.

>> For example. I'm writing a filter that gets a string and wraps its
>> first letter in a <b>...</b>. I'm going to split the first letter,
>> conditional_escape the letter and the rest, wrap a letter in <b>...</b>,
>> concatenate and mark_safe. Now, should I stick .is_safe?
>
> If you're always returning a safe string, then adding is_safe is a
> no-op.

Yes, but *should* I always return a safe string? I believe in my case I
really should because I'm returning some HTML and nothing after my
filter could magically decipher it and escape parts of the string that I
didn't escape. Right?

If yes, does it mean that I should use .needs_autoescape to know if my
input was already escaped manually (if autoescape is None) or I should
do it myself (if autoescape == True)?

> Remember that I've managed to achieve what you said was impossible in
> your original list of 10 things you hated about Django:

Uhm... Originally it was "'N things I don't like..." and I called it a
"hate-list" as a joke :-). And actual wording about autoescaping was
that "it's now impossible to fix the easy way so it has to be fixed the
hard way". As far as I understand this indeed was hard.

Thanks for your answer!

Malcolm Tredinnick

Nov 18, 2007, 5:48:52 AM
to django...@googlegroups.com

On Sun, 2007-11-18 at 12:04 +0300, Ivan Sagalaev wrote:
> Malcolm, first of all, I should apologize. I actually intended my letter
> to be 'funny' but after your answer I understand that it was just harsh
> :-(. I'm sorry.

Fair enough. I misunderstood your intent. No hard feelings. :-)

[...]


> > So Django
> > notes that the input was a SafeData and the function is marked is_safe
> > and, thus, it calls mark_safe() on the result so that you don't have to
> > in your filter
>
> This is my first misunderstanding. mark_safe seems trivial enough, why
> not just use it instead of .is_safe=True on a filter?
>
> Looking at this from template/__init__.py:
>
>     if getattr(func, 'is_safe', False) and isinstance(obj, SafeData):
>         obj = mark_safe(new_obj)
>
> they're essentially equivalent modulo type checking. Why doesn't
> mark_safe do type checking itself?

I'm not sure I understand what you're suggesting. Here's another
explanation of what's going on in those two lines:

- 'obj' is initially the data we pass to the filter. The thing
  that is being filtered.

- 'new_obj' is what the filter returns.

- now if 'obj' (the original input) was safe *and* the filter
  says that safe input will generate safe output (func.is_safe ==
  True), we can mark the output as safe.

It's not really easy to collapse all this into mark_safe() because
here mark_safe() is acting on the new result based on the state of the
original object. So you'd end up having to pass two things to
mark_safe() in this isolated case.

If we didn't have is_safe, every filter that did some kind of string
manipulation such as input = input + 'x' would need to end with lines
like

    if isinstance(orig_input, SafeData):
        result = mark_safe(result)
    return result

and they would have to remember to save the original input (or test its
type very early). So the 'is_safe' attribute is a way for filter authors
to say "I don't want to worry about marking this result safe if the
input is safe. I know I'm not introducing unsafe characters, so Django
can take care of that".
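
The division of labour described here can be sketched in a few lines. This is a simplified model, not Django's code: `SafeText`, `mark_safe`, `append_x` and `apply_filter` are hypothetical stand-ins written for illustration, and only the `is_safe` check mirrors the two lines quoted from template/__init__.py.

```python
class SafeText(str):
    """Hypothetical marker for strings that need no further escaping."""


def mark_safe(value):
    """Stand-in for Django's mark_safe(): wrap the value in the marker type."""
    return SafeText(value)


def append_x(value):
    # Plain string manipulation: the result is an ordinary str,
    # even when the input was a SafeText.
    return value + "x"


# The filter author's promise: safe input still yields safe output.
append_x.is_safe = True


def apply_filter(func, obj):
    """Simplified model of the framework-side check."""
    new_obj = func(obj)
    if getattr(func, "is_safe", False) and isinstance(obj, SafeText):
        return mark_safe(new_obj)
    return new_obj


print(type(apply_filter(append_x, SafeText("<b>hi</b>"))).__name__)  # SafeText
print(type(apply_filter(append_x, "<b>hi</b>")).__name__)            # str
```

The filter body never mentions safeness at all; the attribute alone lets the caller restore the marker for safe input.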

>
> > Because you wouldn't be able to write a filter that worked correctly in
> > both auto-escaping and non-auto-escaping environments, which is a
> > compulsory requirement in most cases. You don't want to escape inside
> > the filter if the current context doesn't have auto-escaping in effect.
>
> Uhm.. This is the second thing I'm missing. I thought that {% autoescape
> off %} is a backward-compat measure. So .needs_autoescape exists only
> for filters that used to do non-safe output and should behave as such in
> a non-autoescaped environment. And I thought that in a new era all new
> filters and tags actually should *always* return safe values. No?

Not really. Okay, three cases to think about here. Firstly, there are
some people who have deep objections to auto-escaping for various
reasons. They want to be able to turn it off and never use it.
Apparently their code never contains any bugs and we're just slowing
them down. :-)

More seriously (case #2), the conversion from Django 0.96 to
accommodating auto-escaping is not entirely trivial. Jeremy Dunck has
estimated about a week's worth of time for him to port some of the stuff
he maintains (which I'm guessing includes the Ellington instance he
works with). For large projects, they might be running with
auto-escaping off for quite a while yet. This might not affect your
use-cases, but there are going to be some people writing applications
intended to be used by the general unknown public and, in those cases,
writing to be able to work in both situations will be good practice.

Finally, auto-escaping is only appropriate for HTML text. You don't want
it on in templates that generate email, or text documents, or even
Javascript fragments (you'd be amazed at how poorly "if (2 &lt; 3)"
works in Javascript). So there will be quite legitimate cases when you
want to wrap entire blocks of output in "{% autoescape off %}...{%
endautoescape %}" sections. However, some of your filters might still be
useful in those sorts of sections. Imagine, for example, a filter that
always replaced the word "and" by "&". It will need to behave
differently in different auto-escaping contexts (use "&amp;" in HTML
templates, and "&" in email). If you let Django handle the autoescaping
by doing nothing to your output, that's fine. It'll work. But if you
also need to add raw HTML, as in your examples, you need to know when to
escape things yourself. Hence "needs_autoescape".
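
The "and" to "&" case can be sketched as follows. This is an illustration only: `ampersand` and `render` are hypothetical names, and `render` is a simplified model of what happens to a variable on output, using the standard library's `html.escape` rather than anything from Django.

```python
from html import escape


def ampersand(value):
    """Hypothetical filter: replace the word 'and' with '&'; adds no markup."""
    return value.replace(" and ", " & ")


def render(value, autoescape):
    """Simplified model of output: escape only when auto-escaping is on."""
    return escape(value) if autoescape else value


text = ampersand("salt and pepper")
print(render(text, autoescape=True))   # salt &amp; pepper  (HTML template)
print(render(text, autoescape=False))  # salt & pepper      (e.g. plain-text email)
```

Because the filter leaves its output unmarked, the same code is correct in both environments; it is only once a filter adds raw HTML and marks the result safe that it has to make the escaping decision itself.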

> > I'll have one more pass at it and after that I look forwards to reading
> > your patch to improve things.
>
> I will certainly try to do this.

I've rewritten most of the filtering and auto-escaping section (in
[6692]). Have a read of it and see if it makes more sense from the point
of view of where you were 24 hours ago. I've tried to approach it from a
different direction, hopefully motivating things a bit more without
getting us bogged down in unimportant details.

> >> For example. I'm writing a filter that gets a string and wraps its
> >> first letter in a <b>...</b>. I'm going to split the first letter,
> >> conditional_escape the letter and the rest, wrap a letter in <b>...</b>,
> >> concatenate and mark_safe. Now, should I stick .is_safe?
> >
> > If you're always returning a safe string, then adding is_safe is a
> > no-op.
>
> Yes, but *should* I always return a safe string? I believe in my case I
> really should because I'm returning some HTML and nothing after my
> filter could magically decipher it and escape parts of the string that I
> didn't escape. Right?

That's correct.

> If yes, does it mean that I should use .needs_autoescape to know if my
> input was already escaped manually (if autoescape is None) or I should
> do it myself (if autoescape == True)?

Well if autoescape == False in such a method, you should do *no*
escaping of your output, since it's being used in, e.g., an email or
Javascript or something. As an aside, the reason I chose autoescape=None
as the default there was so that filters could be written that worked
with Django 0.96 (if autoescape is None, you are using a pre-autoescape
version of Django and can conditionally import mark_safe() and friends
only if autoescape == True). That was a subtle trick 15 months ago that
I possibly should have removed in the final version, but it does no real
harm.

If autoescape == True, you should escape all data that isn't already
marked as safe. There is a function
django.utils.html.conditional_escape() that makes this easier. It's like
escape() except it doesn't do anything on SafeData instances. I forgot
to document conditional_escape() earlier, but it's in the new version.
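
The behaviour described for conditional_escape() amounts to the following. A minimal sketch, assuming a hypothetical `SafeText` marker class in place of SafeData; the function below is a stand-in modelled on the description above, not the code in django.utils.html.

```python
from html import escape


class SafeText(str):
    """Hypothetical marker for strings that are already safe for HTML."""


def conditional_escape(value):
    """Stand-in: escape the value unless it is already marked safe."""
    if isinstance(value, SafeText):
        return value
    return SafeText(escape(value))


print(conditional_escape("1 < 2"))               # 1 &lt; 2
print(conditional_escape(SafeText("<i>x</i>")))  # <i>x</i>
```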

For an example of how all this pulls together, see either the new
example in the docs (which looks a lot like your example) or see, say,
the linebreaks filter in django.template.defaultfilters, which is a
perfect example of something that is introducing HTML into safe or
unsafe input data. Under no circumstances look at urlize for an example
of how to handle mixed content. It gives *me* nose bleeds
(unsurprisingly, it's the one I've screwed up the most so far).

Hopefully this clears up some of your questions. As I said, I've tried
again with the documentation. I'm going to leave it alone for a while
now and let the madding crowds file patches for a bit (and let Adrian
sharpen his blue pencil and go to work editing it).

Regards,
Malcolm

--
The sooner you fall behind, the more time you'll have to catch up.
http://www.pointy-stick.com/blog/

Ivan Sagalaev

Nov 21, 2007, 4:00:22 AM
to django...@googlegroups.com
Thanks for the clarification! I have a couple more things to iron out though...

Malcolm Tredinnick wrote:
> If we didn't have is_safe, every filter that did some kind of string
> manipulation such as input = input + 'x' would need to end with lines
> like
>
>     if isinstance(orig_input, SafeData):
>         result = mark_safe(result)
>     return result
>
> and they would have to remember to save the original input (or test its
> type very early).

Got it now. So if I understand correctly

- mark_safe means that filter takes full responsibility for its output
- .is_safe means that filter doesn't want to know details of its input
and thus takes responsibility only for its own additions

> Finally, auto-escaping is only appropriate for HTML text.

Yes, I've completely forgotten about emails etc... :-( It now makes sense
why one might want to not escape and mark_safe output.

> However, some of your filters might still be
> useful in those sorts of sections. Imagine, for example, a filter that
> always replaced the word "and" by "&". It will need to behave
> differently in different auto-escaping contexts (use "&amp;" in HTML
> templates, and "&" in email).

And this is a very clear example :-).

> I've rewritten most of the filtering and auto-escaping section (in
> [6692]). Have a read of it and see if it makes more sense from the point
> of view of where you were 24 hours ago. I've tried to approach it from a
> different direction, hopefully motivating things a bit more without
> getting us bogged down in unimportant details.

Yes, it really is better now! Thanks :-). There are a couple of small
points however:

> This attribute tells Django that if a “safe” string is passed into
> your filter, the result will still be “safe”

I kinda think that emphasizing the safeness of input here is
distracting. I'd rather emphasize that ".is_safe" means that author
doesn't want to think of input very much and wants to let Django think
for him, and the details of how it will be done are not important. They
are still interesting though and might be noted afterwards. Something
like this (though it's a bit verbose):

This attribute tells Django that your filter works with
various input types and can only be sure that it doesn't
do any "unsafe" changes to it. Django will then decide if
the whole output needs to be escaped or not keeping track
of whether or not input was already safe.

Another thing is the example code of initial_letter_filter. I think it
can be written shorter and without lambda:

-    if autoescape:
-        esc = conditional_escape
-    else:
-        esc = lambda x: x
-    result = '<strong>%s</strong>%s' % (esc(first), esc(other))
+    if autoescape:
+        first, other = conditional_escape(first), conditional_escape(other)
+    result = '<strong>%s</strong>%s' % (first, other)
