Auto-escaping patch


Malcolm Tredinnick

Jul 16, 2006, 7:04:22 AM
to django-d...@googlegroups.com
I have put an initial version of the auto-escaping patch I mentioned
yesterday into ticket #2359. I'll briefly describe what it does below.
The patch includes changes to the core and a test suite for the
auto-escaping changes (which is about half the patch).

My reason for posting this first pass is that there are a few issues
that have come up that I would like to get some consensus on now, rather
than after I have ported the rest of the core over to use this stuff. If
I need to change some things, this is the easiest time to do it. There
is no documentation patch in this pass because, again, I don't really
feel like having to re-edit practically the whole doc change if we
decide to rename some things. This email acts as a documentation proxy
for the moment.

The whole implementation is very close to Simon Willison's original
proposal [1]. With one exception, I have only modified it where there
were technical requirements to do so. I'm going to assume familiarity
with that throughout. His proposal really is very good and it seems to
meet all the sensible requirements brought up in the three threads
listed at the bottom of that page (the two older threads are probably
more informative than the most recent one, for those wanting to get up
to speed here).

[1] http://code.djangoproject.com/wiki/AutoEscaping

A summary of the points I'd like opinions on is at the end of the email.

What does this add?
-------------------
(1) An "autoescape" template tag that turns automatic escaping on or off
throughout its scope.

(2) A "noescape" filter that marks its result as safe for use without
further escaping (see the description of "safe strings" below).

(3) SafeContext and SafeRequestContext classes that act like Context and
RequestContext, except that they automatically enable auto-escaping in
the templates they are applied to (you can also set context.autoescape
on any Context-derived instance, just as in Simon's proposal).

(4) A "mark_safe()" method to mark strings as not requiring further
escaping.

How does it work?
-----------------
When a variable is evaluated in a context in a template, it is
considered to be either "safe" or not (Simon used the term "escaped",
but that seemed less universally true than "safe"). By default, strings
are not marked as safe.

When automatic escaping is enabled, either because {% autoescape on %}
is in effect, or because a SafeContext class is being used, all strings
that are not marked as safe are escaped at rendering time (here,
"escaped" means "conservative HTML escaping": &, <, >, " and ' are
converted to entities always).

Any string marked as safe (or passed through the "noescape" filter) is
not automatically escaped.

When automatic escaping is disabled in the template, all variable
results are output without further escaping, unless the "escape" filter
is applied to them (this is the same behaviour as currently in Django).

Because some filters are designed to return raw markup, the mark_safe()
function exists so that the returned strings can be designated as safe.
For example, {{ var|markdown }} returns raw HTML and the result is not
subject to any further auto-escaping.
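
As a rough illustration of the rules above (a minimal sketch, not the code from the patch), a safe string can be modelled as a marker subclass of str. The names SafeString and render_variable are assumptions made for this example; only mark_safe() is named in this email.

```python
from html import escape as html_escape


class SafeString(str):
    """A str subclass used purely as a marker: needs no further escaping."""


def mark_safe(value):
    # Tag the value as safe; the text itself is unchanged.
    return SafeString(value)


def render_variable(value, autoescape):
    # Hypothetical render step: with auto-escaping on, anything that is
    # not marked safe has &, <, >, " and ' converted to entities.
    if autoescape and not isinstance(value, SafeString):
        return html_escape(str(value), quote=True)
    return str(value)


escaped = render_variable("<b>", autoescape=True)
untouched = render_variable(mark_safe("<b>"), autoescape=True)
legacy = render_variable("<b>", autoescape=False)
```

Here only the first call changes its input; the marked string and the non-auto-escaping path both pass through unchanged.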

Some filters (e.g. unordered_list) wrap HTML around raw content. If
auto-escaping is enabled, these filters will escape the content before
wrapping it in the HTML tags (the returned result is a safe string in
all cases). So such filters are auto-escaping-aware.
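
A sketch of what "auto-escaping-aware" could look like for a wrapping filter. The function simple_list is a hypothetical, flat stand-in for something like unordered_list, and leaving already-safe items alone is an assumption of this example rather than something stated above.

```python
from html import escape as html_escape


class SafeString(str):
    """Marker type for strings that need no further escaping."""


def simple_list(items, autoescape):
    # Hypothetical stand-in for a filter that wraps HTML around content.
    parts = []
    for item in items:
        text = str(item)
        # Escape raw content only when auto-escaping is on and the item
        # has not already been marked safe (an assumption of this sketch).
        if autoescape and not isinstance(item, SafeString):
            text = html_escape(text, quote=True)
        parts.append("<li>" + text + "</li>")
    # The wrapped result is markup, so it is always returned as safe.
    return SafeString("\n".join(parts))
```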

Filters that accept text strings as input and return text strings are
marked (with the "is_safe" attribute) as to whether or not they return a
safe string whenever they are passed a safe string as input. This has
the effect of preserving "safeness" for those filters. Note that this
attribute is a *guarantee* of preserving safeness, so a filter like
"cut" has is_safe = False: {{ var|cut:"&" }} could turn an escaped
string into a monster. If a safe string is passed into a filter that is
not marked as safe and auto-escaping is enabled, the resulting string
will be escaped. If the is_safe attribute is not attached to a function,
it is assumed to be not safe.
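
The is_safe rule can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: apply_filter and SafeString are names invented for the example, and the two filters are simplified versions of "lower" and "cut".

```python
class SafeString(str):
    """Marker type for strings that need no further escaping."""


def apply_filter(func, value, *args):
    # Run the filter, then restore the marker only if the input was safe
    # and the filter guarantees (via is_safe) that it preserves safeness.
    result = func(value, *args)
    if isinstance(value, SafeString) and getattr(func, "is_safe", False):
        return SafeString(result)
    return result


def lower(value):
    return value.lower()
lower.is_safe = True


def cut(value, arg):
    return value.replace(arg, "")
cut.is_safe = False  # cutting "&" or ";" can break an entity


kept = apply_filter(lower, SafeString("A&amp;B"))
dropped = apply_filter(cut, SafeString("a&amp;b"), "&")
```

In this sketch "kept" is still marked safe, while "dropped" comes back as a plain string ("aamp;b") and would therefore be escaped again at rendering time when auto-escaping is on.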

Because of the is_safe attribute, it will be possible to change the
automatically generated documentation in the admin interface to annotate
each filter with its "safeness" guarantee.

The "noescape" filter acts as a way to annotate the result of a filter
chain as safe. So, although "cut" is not a safe filter, we know that
cut:"x" is safe (it can't harm our HTML-escaped strings) and thus
{{ var|cut:"x"|noescape }} will prevent further escaping of the result,
even in auto-escaping-enabled situations. The "noescape" filter does
nothing except mark the result as safe -- the string output is identical
to the input.

Is it backwards compatible?
---------------------------
Mostly.

Auto-escaping is not turned on by default (Adrian made a statement in
[2] and I'm going with that preference at the moment). If you pass a
normal Context or Context-derived class into a template and do not use
the autoescape tag, it will be very close to what happens today.

[2] http://groups.google.com/group/django-developers/msg/5a57f37667e1e941?


Four filters had their behaviour slightly changed.

Three of these are: linebreaks, linebreaksbr and linenumbers. All three
now respect the current auto-escaping setting on their input content
(before applying breaks or numbering). Previously, linenumbers would
always escape and linebreaks and linebreaksbr would never escape. So,
the main change for somebody not enabling auto-escaping here is that the
linenumbers filter will not escape the output any longer.

The fourth filter is "escape". To make forward porting easier and so
that template designers do not have to feel restricted in their use of
the escape filter, I implemented it so that applying "escape" to a safe
string has no effect. Since "escape" itself makes the result safe,
applying escape multiple times in a chain has the same effect as
applying it exactly once.

Previously (var = "&"): {{ var|escape|escape }} => &amp;amp;
Now: {{ var|escape|escape }} => &amp;
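
The before/after difference can be reproduced with a small sketch. SafeString and escape_filter are names assumed for this example; the actual implementation in the patch may differ.

```python
from html import escape as html_escape


class SafeString(str):
    """Marker type for strings that need no further escaping."""


def escape_filter(value):
    # New behaviour: a safe string is returned untouched, and the result
    # of escaping is itself marked safe, so chaining has no extra effect.
    if isinstance(value, SafeString):
        return value
    return SafeString(html_escape(value, quote=True))


# Old behaviour: every application escapes again.
previously = html_escape(html_escape("&"))
now = escape_filter(escape_filter("&"))
```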

This particular case of chaining "escape" is obviously not common (I
would hope).

All of the default filters (template/defaultfilters.py) and the markup
filters have been ported in this patch. I have not done anything else
under contrib/ or the i18n filters.

Points to note
--------------
(1) Because "is_safe" has to be valid for all arguments, the "pluralize"
filter is not safe at the moment. The bizarre {{ var|pluralize:"&" }} is
an example of the problem case. I'm thinking of fixing this so that we
check for unsafe characters in the argument(s) and then return safe
strings on safe input and no unsafe args.
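
One way the proposed fix for point (1) might look, as a sketch only. This simplified pluralize handles a single suffix argument and a numeric input (which cannot itself contain unsafe characters); UNSAFE_CHARS and SafeString are assumptions of the example.

```python
UNSAFE_CHARS = set("&<>\"'")


class SafeString(str):
    """Marker type for strings that need no further escaping."""


def pluralize(value, suffix="s"):
    # Simplified: return the suffix unless the count is exactly one.
    result = "" if int(value) == 1 else suffix
    # Only mark the result safe when the argument contains none of the
    # characters that HTML escaping would have converted.
    if not UNSAFE_CHARS.intersection(suffix):
        return SafeString(result)
    return result
```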

(2) Because of the way "noescape" works, it was not really possible to
make {% filter noescape %} work in an auto-escaping block (the contents
were escaped long before the filter tag was applied). Fortunately, this
particular filter tag construct is equivalent to {% autoescape off %},
so there's no functionality loss. A TemplateSyntaxError is raised if the
illegal construct is used.

(3) Filters that take non-string arguments (e.g. "join") or return
something other than a string (e.g. "length") have is_safe = False. This
is convention more than requirement, but it makes things explicit.

Performance impact
------------------
Obviously we are doing a little bit more work here, even in the
non-auto-escaping paths (just testing what the auto-escape setting is,
for example). As far as I can work out, the performance impact is very
minor, but I do not have a really good performance test suite for this
at the moment.

One simple test: running the tests/othertests/template.py file 100 times
takes 21.0436 seconds before these changes and 21.1674 seconds
afterwards -- averaged over five runs on my desktop machine. This tests
the "no escaping" path. That's a slowdown of 0.6% (and that was almost
within the "noise" of the various runs). These tests aren't particularly
comprehensive, but they do test the templating code a reasonable amount
(although not the filters very much at all).
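
For reference, the percentage follows directly from the two timings quoted above:

```python
before = 21.0436  # seconds for 100 runs, without the patch
after = 21.1674   # seconds for 100 runs, with the patch

# Relative slowdown, expressed as a percentage of the original time.
slowdown_percent = (after - before) / before * 100
print(f"{slowdown_percent:.2f}%")
```

This works out to a little under 0.6%.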

What are the issues at the moment?
----------------------------------
Now we get to the things I want to sort out before going much further.

(1) Any violent (or even just passionate) objections to using terms like
"safe" and mark_safe()?
- Should we use Simon's original proposal of escaped and
mark_escaped()? I feel "safe" is a bit more consistent with the
behaviour (an opposite-but-similar term to Perl's "tainted").

(2) Is the new behaviour of "escape" reasonable (i.e. it does nothing on
safe strings)?
- The only drawback of this is that there is no way to give an
escaped version of a safe string in the templates. That is,
there is no opposite to the "noescape" filter.

- If we make "escape" apply to safe strings as well, then views
must be very consistent about variables always having the same
"safeness" state. Otherwise, the template would have to escape
sometimes and not escape other times and it has no way of
knowing when. The current implementation lets you whack an
escape filter on there and it will work always.

- Current behaviour also makes forward porting easier (you don't
have to run around removing all the escape filters in your code
immediately).

(3) Auto-escaping inherits down through template inclusions. That is, if
you extend a template that has auto-escaping enabled, you get
auto-escaping enabled (obviously the autoescape template tag can control
this). Anybody have a strong reason not to do this?
- Personally, I think this is a no-brainer, but I've been wrong
plenty of times before.

(4) Should generic views use SafeContext by default?
- I haven't touched this yet, but it's not an insane idea. I
guess most people will divide along the same lines as those
wanting auto-escaping on or off by default. The waverers will be
those favouring consistency over all. I'm not a big enough
generic views user to really have a vote in this one.

(5) Adrian, Jacob: do you guys still want "off by default"?
- I *really* don't care what the answer is here, but I would
rather not have to change things after porting everything under
contrib/ .

- For people thinking it's auto-escaping or nothing, {%
autoescape on %} at the beginning of a template (and {%
endautoescape %} at the end) is not a huge imposition.

Feedback obviously welcome and appreciated.

Regards,
Malcolm

Tom Tobin

Jul 16, 2006, 12:10:41 PM
to django-d...@googlegroups.com
On 7/16/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
>
> I have put an initial version of the auto-escaping patch I mentioned
> yesterday into ticket #2359. I'll briefly describe what it does below.
> The patch includes changes to the core and a test suite for the
> auto-escaping changes (which is about half the patch).

Reading through the changes, it looks pretty good (and a +0 from me)
-- with my quibbles noted below. :-)


> What are the issues at the moment?
> ----------------------------------
> Now we get to the things I want to sort out before going much further.
>
> (1) Any violent (or even just passionate) objections to using terms like
> "safe" and mark_safe()?
> - Should we use Simon's original proposal of escaped and
> mark_escaped()? I feel "safe" is a bit more consistent with the
> behaviour (an opposite-but-similar term to Perl's "tainted").

Here's a double-barreled passionate objection: 1) Templates aren't
always going to be used for outputting HTML; the term "safe" loses
meaning in non-HTML contexts. 2) "Safe" conveys a moral tone, IMHO;
"escaped" comes across as much more neutral.


> (4) Should generic views use SafeContext by default?
> - I haven't touched this yet, but it's not an insane idea. I
> guess most people will divide along the same lines as those
> wanting auto-escaping on or off by default. The waverers will be
> those favouring consistency over all. I'm not a big enough
> generic views user to really have a vote in this one.

Definitely not; this goes back to "don't screw with my data unless
instructed to".


> (5) Adrian, Jacob: do you guys still want "off by default"?
> - I *really* don't care what the answer is here, but I would
> rather not have to change things after porting everything under
> contrib/ .
>
> - For people thinking it's auto-escaping or nothing, {%
> autoescape on %} at the beginning of a template (and {%
> endautoescape %}) at the end is not a huge imposition.

Changing to "on by default", as I've stated in the past, turns this
whole proposal into a *vehement* -1 from me. I'll let Adrian/Jacob
speak for themselves, but I think this has already been discussed
extensively and shot down.

Michael Radziej

Jul 16, 2006, 3:30:17 PM
to django-d...@googlegroups.com
Hi,

I really appreciate your work, it goes all along my wishes--thanks a
*lot*, Malcolm!

I'll try to find some time in the next few days to test how my
existing stuff would look using autoescape.

I have only looked at your patch cursorily, so my comments refer to the
general approach and not the actual code bits. I will try to look
into the details later.

> (3) SafeContext and SafeRequestContext classes that act like
> Context and
> RequestContext, except that they automatically enable auto-escaping in
> the templates they are applied to (you can also set context.autoescape
> on any Context-derived instance, just as in Simon's proposal).

My feeling about this is -1; I think there should be only one place
to switch autoescape on or off. If it's in the template, that's
enough. In your patch, you only need it for the 500 error page and in
the tests; both could easily be done with the template tag, couldn't they?
If I want global autoescape, I can switch it on in my site_base
template (from which I derive all my templates). I also fear it would
complicate the documentation.

> (1) Any violent (or even just passionate) objections to using terms
> like
> "safe" and mark_safe()?
> - Should we use Simon's original proposal of escaped and
> mark_escaped()? I feel "safe" is a bit more consistent with
> the
> behaviour (an opposite-but-similar term to Perl's "tainted").

I'm more for 'escaped' and 'raw', but not really violently. This is a
minor issue, and I wouldn't like to get the work delayed by it.
Also ... I volunteer to rewrite the docs if these terms change. But
only once ;-)

>
> (2) Is the new behaviour of "escape" reasonable (i.e. it does
> nothing on
> safe strings)?

I think so. Is there a case for escaping two times? I don't see any,
and one could still easily craft a custom filter that does escape two
times.

>
> (3) Auto-escaping inherits down through template inclusions. That
> is, if
> you extend a template that has auto-escaping enabled, you get
> auto-escaping enabled (obviously the autoescape template tag can
> control
> this). Anybody have a strong reason not to do this?
> - Personally, I think this is a no-brainer, but I've been
> wrong
> plenty of times before.

I like it since it makes it possible to switch on global escape in my
site_base template. (+0)

> (4) Should generic views use SafeContext by default?
> - I haven't touched this yet, but it's not an insane idea. I
> guess most people will divide along the same lines as those
> wanting auto-escaping on or off by default. The waverers
> will be
> those favouring consistency over all. I'm not a big enough
> generic views user to really have a vote in this one.

I'm against SafeContext in any case ... and even if SafeContext
should make it, I'm strongly against using it in the generic views.

Regards,

Michael

Martina Oefelein

Jul 16, 2006, 5:53:21 PM
to django-d...@googlegroups.com
Hi Malcolm,

> (3) Auto-escaping inherits down through template inclusions. That
> is, if
> you extend a template that has auto-escaping enabled, you get
> auto-escaping enabled (obviously the autoescape template tag can
> control
> this). Anybody have a strong reason not to do this?
> - Personally, I think this is a no-brainer, but I've been
> wrong
> plenty of times before.
>

No idea whether my opinion counts, but I'm -1 on this.
Inheriting the autoescape setting would be inconsistent with custom
tag and filter libraries, which aren't inherited:

http://www.djangoproject.com/documentation/templates/#custom-libraries-and-template-inheritance

Also, if autoescape is inherited, changing the setting in a base
template could "break" templates which inherit from it. Thus I would
prefer if every template states explicitly whether autoescaping
should be on or off.

ciao
Martina

Todd O'Bryan

Jul 16, 2006, 6:07:58 PM
to django-d...@googlegroups.com
On Jul 16, 2006, at 5:53 PM, Martina Oefelein wrote:

>
>> (3) Auto-escaping inherits down through template inclusions. That
>> is, if
>> you extend a template that has auto-escaping enabled, you get
>> auto-escaping enabled (obviously the autoescape template tag can
>> control
>> this). Anybody have a strong reason not to do this?
>> - Personally, I think this is a no-brainer, but I've been
>> wrong
>> plenty of times before.
>>

Given that, as Simon mentioned, Django is the odd framework out in
not having auto-escaping by default, having to set it in every single
template one creates would be a HUGE pain in the butt. Since there
will be a 'raw' filter available to forego auto-escaping, template
writers can override this for individual tags as appropriate.

In addition, you have to say which template you extend, so you should
know whether that base template includes auto-escaping or not. Yes, a
change to the base template could screw up templates that extend it,
but I don't think that problem is unique to auto-escaping and is yet
another reason to have some functional tests for your site.

So, I'm +1. But I'm +1 for having it on by default, too, and that
doesn't seem to be a popular position. :-)

Todd

Michael Radziej

Jul 16, 2006, 6:46:52 PM
to django-d...@googlegroups.com
Hi,

On 16.07.2006 at 23:53, Martina Oefelein wrote:

> Hi Malcolm,
>
>> (3) Auto-escaping inherits down through template inclusions. That
>> is, if
>> you extend a template that has auto-escaping enabled, you get
>> auto-escaping enabled (obviously the autoescape template tag can
>> control
>> this). Anybody have a strong reason not to do this?
>> - Personally, I think this is a no-brainer, but I've been
>> wrong
>> plenty of times before.
>>
>
> No idea whether my opinion counts, but I'm -1 on this.
> Inheriting the autoescape setting would be inconsistent with custom
> tag and filter libraries, which aren't inherited:
>
> http://www.djangoproject.com/documentation/templates/#custom-libraries-and-template-inheritance

Sure, but where's the connection between {% load %} and {% autoescape
%}?

I think {% autoescape %} is conceptually quite similar to e.g. {%
spaceless %}, which also affects inherited blocks.

BTW, I don't think it's been a deliberate decision to make {% load %}
work like it does; it is implemented in the compilation function
(since it works during template compilation) and so it is natural
that it doesn't work across inheritance. On the other hand,
everything that is implemented in the render() method of a template
node quite naturally works across inheritance, and this behaviour is
shared by all the tags, with the single exception of {% load %}.
(Well, extends/block are of course special in a different way ...)

> Also, if autoescape is inherited, changing the setting in a base
> template could "break" templates which inherit from it. Thus I would
> prefer if every template states explicitly whether autoescaping
> should be on or off.

An inherited template is by design very dependent on the parent
template. I don't see this as a problem. As noted above, if the parent
template uses {% spaceless %} around the block, this will change how
the child template is rendered. I haven't seen any ticket or
complaint about this behaviour. Another example is {% filter ... %}.

Michael

Malcolm Tredinnick

Jul 16, 2006, 10:56:40 PM
to django-d...@googlegroups.com
On Sun, 2006-07-16 at 21:30 +0200, Michael Radziej wrote:
> Hi,
>
> I really appreciate your work, it goes all along my wishes--thanks a
> *lot*, Malcolm!
>
> I'll try to find some time in the next few days to test how my
> existing stuff would look using autoescape.
>
> I have only looked at your patch cursorily, so my comments refer to the
> general approach and not the actual code bits. I will try to look
> into the details later.
>
> > (3) SafeContext and SafeRequestContext classes that act like
> > Context and
> > RequestContext, except that they automatically enable auto-escaping in
> > the templates they are applied to (you can also set context.autoescape
> > on any Context-derived instance, just as in Simon's proposal).
>
> My feeling about this is -1; I think there should be only one place
> to switch autoescape on or off.

That's the reasonable counter-argument to the current approach. I was
wondering how big an issue that would be. (FWIW, I probably agree with
you: having this done in one place if not on by default is more
consistent.)

[...]


> > (1) Any violent (or even just passionate) objections to using terms
> > like
> > "safe" and mark_safe()?
> > - Should we use Simon's original proposal of escaped and
> > mark_escaped()? I feel "safe" is a bit more consistent with
> > the
> > behaviour (an opposite-but-similar term to Perl's "tainted").
>
> I'm more for 'escaped' and 'raw', but not really violently. This is a
> minor issue, and I wouldn't like to get the work delayed by it.
> Also ... I volunteer to rewrite the docs if these terms change. But
> only once ;-)

"Escaped" strikes me as bogus because it's not really the case: we are
just saying this output can be dumped in without further escaping. I
thought about "raw" on Saturday and wondered if it would lead to
confusion: is a raw string "untreated" or "should not be treated
further" (we intend the latter). Whatever we do, we want the designation
to apply to strings we have marked as not requiring more escaping:
default strings should be "unsafe" or "not raw" by default for
auto-escaping purposes.

I don't really have a strong opinion here, so all the suggestions so far
work. But I'm also not entirely illogical, so it would be nice if I
could use the word in a sentence without sounding like I'm redefining
the normal English meaning.

> > (2) Is the new behaviour of "escape" reasonable (i.e. it does
> > nothing on
> > safe strings)?
>
> I think so. Is there a case for escaping two times? I don't see any,
> and one could still easily craft a custom filter that does escape two
> times.

Damn. Your phrasing tipped me off to a case where we need this: RSS feeds
and Atom content elements with type="html". :-(

We might need a "mark as unsafe" filter for these cases, so that
{{ var|escape|unsafe|escape }} works (or just make "escape" not mark the
string as safe, but I suspect that will have unintended annoying
side-effects).
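
A sketch of how such a chain could behave. The "unsafe" filter does not exist in the patch; unsafe_filter, escape_filter and SafeString are hypothetical names used only to illustrate the idea for content that must be escaped twice, such as HTML embedded in a feed.

```python
from html import escape as html_escape


class SafeString(str):
    """Marker type for strings that need no further escaping."""


def escape_filter(value):
    # Idempotent escape: safe strings pass through unchanged.
    if isinstance(value, SafeString):
        return value
    return SafeString(html_escape(value, quote=True))


def unsafe_filter(value):
    # Hypothetical "mark as unsafe": same text, marker removed.
    return str(value)


once = escape_filter("<b>")
still_once = escape_filter(once)
twice = escape_filter(unsafe_filter(once))
```

Without the intermediate step, the second "escape" is a no-op; with it, the already-escaped text is escaped a second time.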

This comment highlights a hole that needs to be fixed. Excellent.

Malcolm

James Bennett

Jul 17, 2006, 12:03:31 AM
to django-d...@googlegroups.com
On 7/16/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> What does this add?
> -------------------
> (1) An "autoescape" template tag that turns automatic escaping on or off
> throughout its scope.

OK.

> (2) A "noescape" filter that marks its result as safe for use without
> further escaping (see the description of "safe strings" below).

I don't like that we'd have to have this, but if we're going to offer
an autoescape mechanism there needs to be a way to get out of it.

> (3) SafeContext and SafeRequestContext classes that act like Context and
> RequestContext, except that they automatically enable auto-escaping in
> the templates they are applied to (you can also set context.autoescape
> on any Context-derived instance, just as in Simon's proposal).

Huge -1 from me.

Turning escaping on or off should be something that only ever happens
in one place, because the more places we have which can fiddle with
it, the more bugs we'll have from people who forgot that it could be
toggled somewhere else.

And since escaping is a transformation of the output, that means the
logical place to do it is in the template. Also, putting an escape
toggle into views would have serious effects on application
portability, since you'd have to start building up useless blocks of
template logic to try to figure out whether the view turned escaping
on, and enable/disable it according to what you want.

> (4) A "mark_safe()" method to mark strings as not requiring further
> escaping.

OK.

> Auto-escaping is not turned on by default (Adrian made a statement in
> [2] and I'm going with that preference at the moment). If you pass a
> normal Context or Context-derived class into a template and do not use
> the autoescape tag, it will be very close to what happens today.

This is ideal.

> The fourth filter is "escape". To make forward porting easier and so
> that template designers do not have to feel restricted in their use of
> the escape filter, I implemented it so that applying "escape" to a safe
> string has no effect. Since "escape" itself makes the result safe,
> applying escape multiple times in a chain has the same effect as
> applying it exactly once.

Very nice. Multi-escaped strings are just a nightmare waiting to happen.

> (1) Any violent (or even just passionate) objections to using terms like
> "safe" and mark_safe()?

As Tom pointed out, "safe" makes very little sense outside the context
of SGML/XML-based output, so I'd be wary of using the term. Saying
"escaped" or "unescaped" also feels much more consistent with what
we're actually doing -- the content may still be unsafe in other ways,
so we're really just saying that characters which delimit tags in
SGML/XML have been escaped to entities or NCRs.

> (2) Is the new behaviour of "escape" reasonable (i.e. it does nothing on
> safe strings)?

I don't think it's perfect, because of the inability to escape "safe"
strings, but I think it's the most sane thing we can do; we should
never send people into "did I already escape that or not" territory,
and making the 'escape' filter idempotent is the best way to avoid
that.

> (3) Auto-escaping inherits down through template inclusions. That is, if
> you extend a template that has auto-escaping enabled, you get
> auto-escaping enabled (obviously the autoescape template tag can control
> this). Anybody have a strong reason not to do this?

The argument in favor is that this isn't like tag/filter library
inheritance, where we don't inherit loaded libraries into an included
template; rather, this is something which affects all the output
between the time it turns on and the time it turns off, regardless of
where that output comes from.

The argument against is that it increases complexity quite a bit by
forcing you to remember, in templates which can be included elsewhere,
to check whether escaping is on or off and re-set it accordingly
inside your escaped template.

Given that, I think "explicit is better than implicit" wins, and
escaping shouldn't inherit; an included template should be responsible
for explicitly turning its own escaping on or off.

> (4) Should generic views use SafeContext by default?

No, because I'm violently opposed to SafeContext even existing ;)

> (5) Adrian, Jacob: do you guys still want "off by default"?

I would personally prefer having it off by default, both because I
don't like the idea of my data being changed when I didn't ask for it
to be changed and because it would break every Django template ever
written.

But as a matter of policy I can see an argument for doing everything
we can to keep people from shooting themselves in the foot; having to
explicitly make a template be "unsafe" is better than having to
explicitly make a template be "safe".

--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Michael Radziej

Jul 17, 2006, 6:00:33 AM
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> On Sun, 2006-07-16 at 21:30 +0200, Michael Radziej wrote:
>> I'm more for 'escaped' and 'raw', but not really violently. This is a
>> minor issue, and I wouldn't like to get the work delayed by it.
>> Also ... I volunteer to rewrite the docs if these terms change. But
>> only once ;-)
>
> "Escaped" strikes me as bogus because it's not really the case: we are
> just saying this output can be dumped in without further escaping.

I see your point, you're right. But 'safe' still isn't necessarily
safe; I can perfectly well mark unsafe strings as safe ;-)

> I thought about "raw" on Saturday and wondered if it would lead to
> confusion: is a raw string "untreated" or "should not be treated
> further" (we intend the latter).

Interesting ... I was sure for everybody it meant the first one,
along the line 'still needs cooking'. Wellllll ... seems not to work.

I'd have lots of other ideas, but feel this is getting too far.
How about brainstorming this on irc? Perhaps suggest a time that
suits you.

>> I think so. Is there a case for escaping two times? I don't see any,
>> and one could still easily craft a custom filter that does escape two
>> times.
>
> Damn. Your phrasing tipped me off to a case we need this more: RSS feeds
> and Atom content elements with type="html". :-(

Hmm, really ... I've not been into RSS or Atom, so I wasn't
aware. I feel a little stupid about this, now. I assume that
inside the <summary> element you have to escape html?

> We might need a "mark as unsafe" filter for these cases (so that {{ var|
> escape|unsafe|escape }}) works (or just make "escape" not mark the
> string as safe, but I suspect that will have unintended annoying
> side-effects).

Alternatively, you could add a filter that escapes 'safe' strings
once and unsafe strings twice. Call it 'double_escape'. But this
is a minor issue. I'm presently not sure what is better.

Michael

SmileyChris

Jul 17, 2006, 6:30:36 AM
to Django developers
Great job on the patch, Malcolm!
I posted this in the ticket, then felt guilty because you told me not
to. So I'll post here for discussion.

A couple of points:
If a markup filter fails due to an import error, I don't think it
should be marked as safe.
From a skim read of the patch, I'm missing the purpose of having an
.is_safe property on filters - can't you just check the outputted
string to see if it's SafeData?

Malcolm Tredinnick

Jul 17, 2006, 7:00:38 AM
to django-d...@googlegroups.com
On Mon, 2006-07-17 at 03:30 -0700, SmileyChris wrote:
> Great job on the patch, Malcolm!
> I posted this in the ticket, then felt guilty because you told me not
> to. So I'll post here for discussion.
>
> A couple of points:
> If a markup filter fails due to an import error, I don't think it
> should be marked as safe.

Why not? The returned result is the empty string in that case and
there's certainly no danger of that being presented in the raw. "Safe"
implies nothing beyond "does not require further HTML escaping" (and
that is the quite reasonable argument for finding another name for it).
If a filter is returning a safe (or whatever we end up calling it)
string, it should *always* return a safe string. Otherwise the end users
will be uncertain about whether the returned result is safe or not and
will always have to wrap it in an "escape" filter, which they will
forget to do.

> From a skim read of the patch, I'm missing the purpose of having an
> .is_safe property on filters - can't you just check the outputted
> string to see if it's SafeData?

No. Take a SafeString instance, split it, do something to it, run join on
the result. What you have is now a string (not a SafeString). So rather
than overloading every single string method on SafeString and every
single Unicode method on SafeUnicode -- which will add quite a bit of
function call overhead -- and rather than requiring filter writers to
always have to do "if SafeData" tests around string operations (which
they'll screw up), we mark the filter appropriately. If a filter is
marked "is_safe" and a safe string is passed in, then no matter how much
it gets put through the meat grinder internally, we can happily convert
it back to a SafeString at the end. Ditto for SafeUnicode. This makes
the code much shorter and adds a large measure of certainty to the
process.

For filters that are not universally safe (e.g. the pluralize case I
mentioned), it is still possible in some cases to check internally
whether the munging would keep a safe string "safe" and then
explicitly call mark_safe() on it. That is what I intend to do to
pluralize().
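
The point about string methods dropping the marker can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the patch: SafeString, mark_safe and is_safe are named in the thread, while apply_filter and collapse_spaces are hypothetical helpers invented here.

```python
class SafeString(str):
    """Hypothetical marker type: a str that needs no further HTML escaping."""


def mark_safe(value):
    # Wrap a plain string in the marker type.
    return SafeString(value)


def apply_filter(func, value, *args):
    """Run a filter, re-marking the result only if the filter promises safety."""
    result = func(value, *args)
    if isinstance(value, SafeString) and getattr(func, "is_safe", False):
        return mark_safe(result)
    return result


def collapse_spaces(value):
    # split() and join() hand back plain str objects, so the marker is lost.
    return " ".join(value.split())


collapse_spaces.is_safe = True  # cannot introduce &, <, >, " or '


def cut(value, arg):
    # Remove every occurrence of arg from value.
    return value.replace(arg, "")


cut.is_safe = False  # cut:"&" would turn "&lt;" into "lt;"


escaped = mark_safe("a   &lt;   b")
print(type(collapse_spaces(escaped)).__name__)                # str
print(type(apply_filter(collapse_spaces, escaped)).__name__)  # SafeString
print(type(apply_filter(cut, escaped, "&")).__name__)         # str
```

The filter body never has to test for SafeData itself; the single check in the wrapper restores the marker, which is the design choice argued for above.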

Thanks for the feedback anyway.

Regards,
Malcolm

Malcolm Tredinnick

Jul 17, 2006, 7:13:17 AM7/17/06
to django-d...@googlegroups.com
On Mon, 2006-07-17 at 12:00 +0200, Michael Radziej wrote:
> Malcolm Tredinnick wrote:
> > On Sun, 2006-07-16 at 21:30 +0200, Michael Radziej wrote:
> >> I'm more for 'escaped' and 'raw', but not really violently. This is a
> >> minor issue, and I wouldn't like to get the work delayed by it.
> >> Also ... I volunteer to rewrite the docs if these terms change. But
> >> only once ;-)
> >
> > "Escaped" strikes me as bogus because it's not really the case: we are
> > just saying this output can be dumped in without further escaping.
>
> I see your point, you're right. But 'safe' still isn't necessary
> safe, I can perfectly mark unsafe strings as safe ;-)

Yeah. Btw, I can completely understand the flip side of this argument;
I'm partly just doing the Devil's Advocate thing so that things end up
on a solid foundation. You guys just need to invent a good name. I'll
write the code, you do the hard bit and come up with the
nomenclature. :-)

>
> > I thought about "raw" on Saturday and wondered if it would lead to
> > confusion: is a raw string "untreated" or "should not be treated
> > further" (we intend the latter).
>
> Interesting ... I was sure for everybody it meant the first one,
> along the line 'still needs cooking'. Wellllll ... seems not to work.

If you want to mark every "still needs cooking" string then you have to
mark *every* string that comes into the system (a la Perl's tainted
strings). The current method is to treat all strings as "untrusted" (or
requiring escaping) by default. Then we mark them when we have worked on
them. So the word you're looking for needs to apply to "marked" strings,
not their former versions. (It's just too error-prone to try and catch
all strings on input and treat the ones we miss as safe.)
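
As a rough sketch of "untrusted by default, mark only what has been worked on": the render_value helper below is hypothetical, and Python's standard html.escape stands in for whatever escaping function the patch actually uses.

```python
from html import escape


class SafeString(str):
    """Hypothetical marker: content that needs no further HTML escaping."""


def render_value(value, autoescape=True):
    # Every string is treated as untrusted unless it was explicitly marked.
    if autoescape and not isinstance(value, SafeString):
        return escape(value, quote=True)
    return str(value)


print(render_value("<b>hi</b>"))              # &lt;b&gt;hi&lt;/b&gt;
print(render_value(SafeString("<b>hi</b>")))  # <b>hi</b>
```

A string that nobody remembered to mark is escaped, so a missed mark shows up as visible markup rather than as an injection hole.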

>
> I'd have lots of other ideas, but feel this is getting too far.
> How about brainstorming this on irc? Perhaps suggest a time that
> suits you.

I'll try to hang around on #django tomorrow if I don't get too busy.
But, seriously, just come up with some good names and make people pick
one. Stop letting people push back on your ideas and become The
Enforcer.

>
> >> I think so. Is there a case for escaping two times? I don't see any,
> >> and one could still easily craft a custom filter that does escape two
> >> times.
> >
> > Damn. Your phrasing tipped me off to a case we need this more: RSS feeds
> > and Atom content elements with type="html". :-(
>
> Hmm, really ... I've not been into RSS or Atoms, so I wasn't
> aware. I feel a little stupid about this, now. I assume that
> inside the <summary> element you have to escape html?

Only for type="html" in Atom (which is Django's default production
method). There are a whole bunch of rules in the spec to accommodate
everybody from those who just can't get enough double-escaped markup in
their lives to those who want to use well-formed XML fragments to those
who want to shove raw bytes (base-64 encoded) in there. For RSS, things
are double-escaped everywhere as a matter of tradition.
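
To make the double-escaping case concrete, here is a minimal sketch. It uses Python's standard html module rather than Django's escape function, and the sample markup is invented for illustration.

```python
from html import escape, unescape

# Markup that is already correct HTML: the ampersand was escaped once.
markup = "<p>Fish &amp; chips</p>"

# Embedding it as escaped text (Atom type="html", or RSS by tradition)
# needs a second round of escaping over the whole fragment.
payload = escape(markup, quote=False)
print(payload)  # &lt;p&gt;Fish &amp;amp; chips&lt;/p&gt;

# A feed reader undoes one level and gets the original markup back.
print(unescape(payload) == markup)  # True
```

With an "escape" filter that does nothing on already-marked strings, the second round is exactly the step a template can no longer express.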

> > We might need a "mark as unsafe" filter for these cases (so that
> > {{ var|escape|unsafe|escape }} works), or just make "escape" not mark
> > the string as safe, but I suspect that will have unintended annoying
> > side-effects.
>
> Alternatively, you could add a filter that escapes 'safe' strings
> once and unsafe strings twice. Call it 'double_escape'. But this
> is a minor issue. I'm presently not sure what is better.

Again, I don't really care what the answer is here. I've thrown out some
ideas. People should propose better/different ones. This is very much a
non-religious issue for me beyond not wanting to sound stupid if/when I
stand up at a conference to give a tutorial on Django. So try to avoid
forming a consensus around calling the filter "elephant" or something,
ok? :-)

Regards,
Malcolm


Michael Radziej

Jul 17, 2006, 8:20:28 AM7/17/06
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:

> If you want to mark every "still needs cooking" string then you have to
> mark *every* string that comes into the system (a la Perl's tainted
> strings).

Nonono ... I just was talking about terminology. We need a term
for "safe" and "unsafe" strings. I take it as granted that you'll
only actually mark "safe" ones.

>> I'd have lots of other ideas, but feel this is getting too far.
>> How about brainstorming this on irc? Perhaps suggest a time that
>> suits you.
>
> I'll try to hang around on #django tomorrow if I don't get too busy.
> But, seriously, just come up with some good names and make people pick
> one. Stop letting people push back on your ideas and become The
> Enforcer.

Well then ... some new ideas I like:

- trusted / untrusted
- processed / unprocessed
- resolved / unresolved (with a musical connotation:
progressing from dissonance to consonance)
- developed / undeveloped
- fixed / unfixed (like in photo processing)
- treated / untreated
- finalized / original
- trusted / tainted (why not?)
- geared / bare
- furnished / ?
- malcomized / unmalcomized (ok, just joking ;-)

And we already have:
- safe / unsafe
- escaped / raw or unescaped

I currently have a taste for the first two, the rest is more an
invitation for others ... Now come on, native speakers, you
should be able to bring in more ideas!

Michael


adurdin

Jul 17, 2006, 10:15:06 AM7/17/06
to Django developers
Malcolm Tredinnick wrote:
> When a variable is evaluated in a context in a template, it is
> considered to be either "safe" or not (Simon used the term "escaped",
> but that seemed less universally true than "safe").

As long as we're discussing terminology, might as well enumerate the
situations where we'd want the terms to be applicable:

Sources:
1. Ordinary string, not intended to have HTML in it, but may have &s or
<s
2. HTML code string (obviously contains markup)

Actions taken:
A. String is output without processing
B. String is output by substituting entity references for special
characters

Examples:

1A: We'd use this if our template was for a text/plain document, or
some other non-HTML non-XML document.

1B: The normal case for strings as the content of an HTML page.

2A: This is the case we use when the string contains markup, say the
output from textile.

2B: This is the case when we're outputting an HTML-formatted RSS feed,
or outputting the content for a textarea that's used for editing HTML.

As far as the terms "safe" and "unsafe" go, they really only describe
1B, and make very little sense for 1A and 2B. "escaped" and
"raw"/"unescaped" apply in all four, as they're stating that the string
has passed through an "escaping" process.
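
The four combinations can be spelled out in a short sketch. The emit helper and the sample strings are hypothetical, and html.escape from the standard library stands in for "substituting entity references".

```python
from html import escape


def emit(text, substitute_entities):
    """Output text as-is (action A) or with entity references (action B)."""
    return escape(text, quote=False) if substitute_entities else text


plain = "Tom & Jerry <3"             # source 1: ordinary string
markup = "<em>Tom &amp; Jerry</em>"  # source 2: HTML code string

cases = {
    "1A": emit(plain, False),   # Tom & Jerry <3
    "1B": emit(plain, True),    # Tom &amp; Jerry &lt;3
    "2A": emit(markup, False),  # <em>Tom &amp; Jerry</em>
    "2B": emit(markup, True),   # &lt;em&gt;Tom &amp;amp; Jerry&lt;/em&gt;
}
for label, out in cases.items():
    print(label, out)
```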

The names for the string classes are a different situation again; e.g.
a SafeString is not really safe, but will be made so -- the name
UnescapedString isn't much better. RawString?

Andrew

Daniel Poelzleithner

Jul 18, 2006, 12:06:28 PM7/18/06
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:

> Damn. Your phrasing tipped me off to a case we need this more: RSS feeds
> and Atom content elements with type="html". :-(
>
> We might need a "mark as unsafe" filter for these cases (so that
> {{ var|escape|unsafe|escape }} works), or just make "escape" not mark
> the string as safe, but I suspect that will have unintended annoying
> side-effects.

I suggest {{var|escape|escape:force}}

force as an optional argument to escape already escaped strings.

But an unsafe filter may be useful, too.

kindly regards
Daniel

Michael Radziej

Jul 18, 2006, 5:05:09 PM7/18/06
to django-d...@googlegroups.com

On 18.07.2006 at 18:06, Daniel Poelzleithner wrote:

>
> Malcolm Tredinnick wrote:
>
>> Damn. Your phrasing tipped me off to a case we need this more: RSS
>> feeds and Atom content elements with type="html". :-(
>>
>> We might need a "mark as unsafe" filter for these cases (so that
>> {{ var|escape|unsafe|escape }} works), or just make "escape" not mark
>> the string as safe, but I suspect that will have unintended annoying
>> side-effects.
>
> I suggest {{var|escape|escape:force}}
>
> force as an optional argument to escape already escaped strings.

'force' is nice! But a string as an argument that modifies the
function feels a bit dirty.

How about calling it 'forced_escape'? Like

{{ var | escape | forced_escape }}

That's much better to read than

{{ var | escape | mark_unsafe | escape }}

... which is just mind-twisting (huh? They escape? Mark it as unsafe?
Escape again? What was this crazy programmer smoking, man ...)
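
A sketch of how the two filters might differ, assuming the proposed behaviour where "escape" leaves marked strings alone. The function names escape_filter and forced_escape are illustrative only, and html.escape is used in place of Django's own escaping.

```python
from html import escape


class SafeString(str):
    """Hypothetical marker: content that needs no further HTML escaping."""


def escape_filter(value):
    # Proposed behaviour: do nothing to a string that is already marked.
    if isinstance(value, SafeString):
        return value
    return SafeString(escape(value, quote=True))


def forced_escape(value):
    # Escape unconditionally, whether or not the input is marked.
    return SafeString(escape(value, quote=True))


once = escape_filter("a & b")
print(once)                 # a &amp; b
print(escape_filter(once))  # a &amp; b
print(forced_escape(once))  # a &amp;amp; b
```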


>
> But an unsafe filter may be useful, too.

Not sure. And you can still write one for your own usage, it's trivial.

Michael

Michael Radziej

Jul 18, 2006, 5:18:21 PM7/18/06
to django-d...@googlegroups.com
Hi,

I made up my mind and I think I have the solution (for the is_safe
terminology, django world domination, and all the rest :-)

* finalized *

So it's :

FinalizedString (replaces SafeString)
mark_finalized() (replaces mark_safe)
preserves_finalized (replaces is_safe as a function attribute)

It seems there's no need for the opposite.

Rationale for this choice
--------------------------------

* 'finalized' is a more neutral term than 'safe'.
* it does not make any wrong promises.
* it does not suggest that it has actually been escaped
or processed in any way, like 'escaped' did.
* 'finalized' means that its form is considered final, and that
exactly defines the concept.


Using these terms, I've edited Malcolm's introduction of his patch.
Some comments are in double brackets [[ ]].

******** Snip *****

A summary of the points I'd like opinions on is at the end of the email.

What does this add?
-------------------
(1) An "autoescape" template tag that turns automatic escaping on or off
throughout its scope.

(2) A "noescape" filter that marks its result as finalized for use without
further escaping (see the description of "finalized strings" below).

(3) Safe Context ...

[[remark: I think the consensus is now that SafeContext won't make the
cut, so this paragraph would be removed anyway ]]

(4) A "mark_finalized()" method to mark strings as not requiring further
escaping.

How does it work?
-----------------
When a variable is evaluated in a context in a template, it is
considered to be either "finalized" or not. By default, strings
are not marked as finalized.

When automatic escaping is enabled, because {% autoescape on %}
is in effect, all strings that are not marked as finalized are
escaped at rendering time (here, "escaped" means "conservative HTML
escaping": &, <, >, " and ' are converted to entities always).

Any string marked as finalized (or passed through the "noescape"
filter) is not automatically escaped.

When automatic escaping is disabled in the template, all variable
results are output without further escaping, unless the "escape" filter
is applied to them (this is the same behaviour as currently in Django).

Because some filters are designed to return raw markup, the
mark_finalized() function exists so that the returned strings can be
designated as finalized. For example, {{ var|markdown }} returns raw
HTML and the result is not subject to any further auto-escaping.

Some filters (e.g. unordered_list) wrap HTML around raw content. If
auto-escaping is enabled, these filters will escape the content before
wrapping it in the HTML tags (the returned result is a finalized
string in all cases). So such filters are auto-escaping-aware.

Filters that accept text strings as input and return text strings are
marked (with the "preserves_finalized" attribute) as to whether or not
they return a finalized string whenever they are passed a finalized
string as input. This has the effect of preserving "finalizedness" for
those filters. Note that this attribute is a *guarantee* of preserving
finalizedness, so a filter like "cut" has preserves_finalized = False:
{{ var|cut:"&" }} could turn an escaped string into a monster. If a
finalized string is passed into a filter that is not marked as
finalized and auto-escaping is enabled, the resulting string will be
escaped. If the preserves_finalized attribute is not attached to a
function, it is assumed to be not preserving.

Because of the preserves_finalized attribute, it will be possible to
change the automatically generated documentation in the admin interface
to annotate each filter whether it guarantees to preserve finalizedness.

[[ I took the liberty of editing the preceding sentence a bit more
freely; 'with its guarantee of preserving finalizedness' sounds like
too many nouns to my ears ]]

The "noescape" filter acts as a way to annotate the result of a filter
chain as finalized. So, although "cut" is not a filter preserving
finalizedness, we know that cut:"x" is preserving (it can't harm our
HTML-escaped strings) and thus {{ var|cut:"x"|noescape }} will prevent
further escaping of the result, even in auto-escaping-enabled
situations. The "noescape" filter does nothing except mark the result
as finalized -- the string output is identical to the input.

Is it backwards compatible?
---------------------------
Mostly.

Auto-escaping is not turned on by default (Adrian made a statement in
[2] and I'm going with that preference at the moment). If you do not use
the autoescape tag, it will be very close to what happens today.

[2] http://groups.google.com/group/django-developers/msg/5a57f37667e1e941?


Four filters had their behaviour slightly changed.

Three of these are: linebreaks, linebreaksbr and linenumbers. All three
now respect the current auto-escaping setting on their input content
(before applying breaks or numbering). Previously, linenumbers would
always escape and linebreaks and linebreaksbr would never escape. So,
the main change for somebody not enabling auto-escaping here is that the
linenumbers filter will not escape the output any longer.

The fourth filter is "escape". To make forward porting easier and so
that template designers do not have to feel restricted in their use of
the escape filter, I implemented it so that applying "escape" to a
finalized string has no effect. Since "escape" itself makes the result
finalized, applying escape multiple times in a chain has the same
effect as applying it exactly once.

Previously (var = "&"): {{ var|escape|escape }} => &amp;amp;
Now: {{ var|escape|escape }} => &amp;

This particular case of chaining "escape" is obviously not common (I
would hope).

All of the default filters (template/defaultfilters.py) and the markup
filters have been ported in this patch. I have not done anything else
under contrib/ or the i18n filters.

Points to note
--------------
(1) Because "preserves_finalized" has to be valid for all arguments,
the "pluralize" filter is not preserving at the moment. The bizarre
{{ var|pluralize:"&" }} is an example of the problem case. I'm thinking
of fixing this so that we check for unescaped characters in the
argument(s) and then return finalized strings on finalized input and no
unescaped characters in the args.
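
One way the argument check could look is sketched below. This is only a guess at the shape of the fix, not the patch itself: it uses the original SafeString name for brevity, assumes the input is a plain number so that only the suffix argument needs checking, and the HTML_SPECIAL set is a hypothetical name.

```python
class SafeString(str):
    """Hypothetical marker: content that needs no further HTML escaping."""


HTML_SPECIAL = set("&<>\"'")


def pluralize(count, suffix="s"):
    """Return the suffix for counts other than 1, marked only when harmless."""
    result = suffix if count != 1 else ""
    # Mark the result only if the argument contains nothing that would
    # need escaping; otherwise leave it unmarked so it gets escaped later.
    if HTML_SPECIAL.isdisjoint(suffix):
        return SafeString(result)
    return result


print(type(pluralize(2)).__name__)       # SafeString
print(type(pluralize(2, "&")).__name__)  # str
```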

(2) Because of the way "noescape" works, it was not really possible to
make {% filter noescape %} work in an auto-escaping block (the contents
were escaped long before the filter tag was applied). Fortunately, this
particular filter tag construct is equivalent to {% autoescape off %},
so there's no functionality loss. A TemplateSyntaxError is raised if the
illegal construct is used.

(3) Filters that take non-string arguments (e.g. "join") or return
something other than a string (e.g. "length") have
preserves_finalized = False. This is convention more than requirement,
but it makes things explicit.

Performance impact
-------------------
Obviously we are doing a little bit more work here, even in the
non-auto-escaping paths (just testing what the auto-escape setting is,
for example). As far as I can work out, the performance impact is very
minor, but I do not have a really good performance test suite for this
at the moment.

One simple test: running the tests/othertests/template.py file 100 times
takes 21.0436 seconds before these changes and 21.1674 seconds
afterwards -- averaged over five runs on my desktop machine. This tests
the "no escaping" path. That's a slowdown of 0.6% (and that was almost
within the "noise" of the various runs). These tests aren't particularly
comprehensive, but they do test the templating code a reasonable amount
(although not the filters very much at all).
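
For reference, the 0.6% figure follows directly from the two timings quoted above; the variable names in this small check are just for illustration.

```python
before = 21.0436  # seconds for 100 runs without the patch
after = 21.1674   # seconds for 100 runs with the patch

slowdown = (after - before) / before
print(f"{slowdown:.2%}")  # 0.59%

# Extra time per single run of the test file, in milliseconds.
per_run_ms = (after - before) / 100 * 1000
print(f"{per_run_ms:.3f} ms")  # 1.238 ms
```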

What are the issues at the moment?
----------------------------------
Now we get to the things I want to sort out before going much further.

(1) Any violent (or even just passionate) objections to using terms like
"finalize" and mark_finalized()?
- Should we use Simon's original proposal of escaped and
mark_escaped()? I feel "safe" is a bit more consistent with the
behaviour (an opposite-but-similar term to Perl's "tainted").

(2) Is the new behaviour of "escape" reasonable (i.e. it does nothing on
finalized strings)?
- The only drawback of this is that there is no way to give an
escaped version of a finalized string in the templates. That is,
there is no opposite to the "noescape" filter.

- If we make "escape" apply to finalized strings as well, then views
must be very consistent about variables always having the same
"finalizedness" state. Otherwise, the template would have to escape
sometimes and not escape other times and it has no way of
knowing when. The current implementation lets you whack an
escape filter on there and it will work always.

- Current behaviour also makes forward porting easier (you don't
have to run around removing all the escape filters in your code
immediately).

(3) Auto-escaping inherits down through template inclusions. That is, if
you extend a template that has auto-escaping enabled, you get
auto-escaping enabled (obviously the autoescape template tag can control
this). Anybody have a strong reason not to do this?
- Personally, I think this is a no-brainer, but I've been wrong
plenty of times before.

(4) Should generic views use SafeContext by default? ...

[[ Should probably be removed ]]

(5) Adrian, Jacob: do you guys still want "off by default"?
- I *really* don't care what the answer is here, but I would
rather not have to change things after porting everything under
contrib/ .

- For people thinking it's auto-escaping or nothing,
{% autoescape on %} at the beginning of a template (and
{% endautoescape %} at the end) is not a huge imposition.

Feedback obviously welcome and appreciated.

********* end **************


jeremy bornstein

Jul 18, 2006, 7:02:52 PM7/18/06
to django-d...@googlegroups.com
In some circles, "finalization" is what happens to an object immediately
before it is GC'd, so this choice may end up being confusing. This is
the case with respect to Java, for example.

SmileyChris

Jul 18, 2006, 9:03:59 PM7/18/06
to Django developers

Malcolm Tredinnick wrote:
> On Mon, 2006-07-17 at 03:30 -0700, SmileyChris wrote:
> > A couple of points:
> > If a markup filter fails due to an import error, I don't think it
> > should be marked as safe.
>
> Why not? The returned result is the empty string in that case and
> there's certainly no danger of that being presented in the raw.

By the way, I just went and checked this for markup.
An unfinalized string is returned (not an empty string). So I still
think it shouldn't be marked as safe on an import error.

Malcolm Tredinnick

Jul 18, 2006, 9:19:27 PM7/18/06
to django-d...@googlegroups.com

That's just a bug in the patch. A couple of mark_safe() calls also need
escape() wrapped around them. A filter cannot be half-and-half for the
reasons I gave earlier: it would be no better than not having this patch
in at all.

Regards,
Malcolm


Malcolm Tredinnick

Jul 18, 2006, 9:25:02 PM7/18/06
to django-d...@googlegroups.com

Sorry, that was too categorical. "A filter that is primarily designed to
return pre-marked-up data should not be half-and-half," is a better way
of saying what I mean.

Malcolm


Michael Radziej

Jul 19, 2006, 8:40:49 AM7/19/06
to django-d...@googlegroups.com
jeremy bornstein wrote:
> In some circles, "finalization" is what happens to an object immediately
> before it is GC'd, so this choice may end up being confusing. This is
> the case with respect to Java, for example.

Doesn't keep me from liking it, and Java is not Python. Probably
each and every word is already taken in some other language.

'escape' and 'safe' have a different meaning for fireworkers, too ;-)

Michael

SmileyChris

Jul 19, 2006, 4:56:14 PM7/19/06
to Django developers
> 'escape' and 'safe' have a different meaning for fireworkers, too ;-)
Or bank robbers :-P

Back on topic, I like finalization too (even though I cringe at having to
write the American Z version).

Michael Radziej

Jul 19, 2006, 7:11:24 PM7/19/06
to django-d...@googlegroups.com
Hi Chris,

On 19.07.2006 at 22:56, SmileyChris wrote:
> Back on topic, I like finalization too (even though I cringe having to
> write the american Z version).

Yeah, but the default TZ is Chicago, so ... I chose zee. Be glad that
you're not forced to spell 'aluminum' somewhere!

The old British Empire had better not have put such high taxes on tea
in those days.

;-)

Michael

Todd O'Bryan

Jul 19, 2006, 10:25:31 PM7/19/06
to django-d...@googlegroups.com
Is xml_escaped just too verbose? Seems very descriptive and unambiguous.

Todd

Michael Radziej

Jul 20, 2006, 3:02:29 AM7/20/06
to django-d...@googlegroups.com

On 20.07.2006 at 04:25, Todd O'Bryan wrote:

>
> Is xml_escaped just too verbose? Seems very descriptive and
> unambiguous.

Do you mean

mark_xml_escaped for mark_safe,
XmlEscapedString for SafeString,
is_xml_escaped for is_safe (as function attribute)?

In the (long) discussion, this has already been rejected for
'escaped', and the reasons also apply to 'xml_escaped'.

Michael

Michael Radziej

Feb 7, 2007, 3:11:31 AM2/7/07
to django-d...@googlegroups.com
Hi,

I'd like to revive the discussion about autoescape (note that it is
*not* on by default). I have brought the patches up to date (see the
notes in the ticket, #2359), and I'm starting to use this now in my
own projects (with the exception of the admin patch which I have no
use for). I can only report that it works great--thanks, Malcolm!

It currently lacks newforms support, but I don't expect any problems
with it. You basically have to wrap mark_safe around the strings
that represent rendered HTML code. It's just that newforms is moving
around a bit too fast to get a proper hold of ;-)

--> http://code.djangoproject.com/ticket/2359

Cheers,

Michael


Malcolm Tredinnick:

> I have put an initial version of the auto-escaping patch I mentioned
> yesterday into ticket #2359. I'll briefly describe what it does below.
> The patch includes changes to the core and a test suite for the
> auto-escaping changes (which is about half the patch).
>
> My reason for posting this first pass is that there are a few issues
> that have come up that I would like to get some consensus on now, rather
> than after I have ported the rest of the core over to use this stuff. If
> I need to change some things, this is the easiest time to do it. There
> is no documentation patch in this pass because, again, I don't really
> feel like having to re-edit practically the whole doc change if we
> decide to rename some things. This email acts as a documentation proxy
> for the moment.
>
> The whole implementation is very close to Simon Willison's original
> proposal [1]. With one exception, I have only modified it where there
> were technical requirements to do so. I'm going to assume familiarity
> with that throughout. His proposal really is very good and it seems to
> meet all the sensible requirements brought up in the three threads
> listed at the bottom of that page (the two older threads are probably
> more informative than the most recent one, for those wanting to get up
> to speed here).
>
> [1] http://code.djangoproject.com/wiki/AutoEscaping
>
> A summary of the points I'd like opinions on is at the end of the email.
>
> What does this add?
> -------------------
> (1) An "autoescape" template tag that turns automatic escaping on or off
> throughout its scope.
>
> (2) A "noescape" filter that marks its result as safe for use without
> further escaping (see the description of "safe strings" below).
>
> (3) SafeContext and SafeRequestContext classes that act like Context and
> RequestContext, except that they automatically enable auto-escaping in
> the templates they are applied to (you can also set context.autoescape
> on any Context-derived instance, just as in Simon's proposal).
>
> (4) A "mark_safe()" method to mark strings as not requiring further
> escaping.
>
> How does it work?
> -----------------
> When a variable is evaluated in a context in a template, it is
> considered to be either "safe" or not (Simon used the term "escaped",
> but that seemed less universally true than "safe"). By default, strings
> are not marked as safe.
>
> When automatic escaping is enabled, either because {% autoescape on %}
> is in effect, or because a SafeContext class is being used, all strings
> that are not marked as safe are escaped at rendering time (here,
> "escaped" means "conservative HTML escaping": &, <, >, " and ' are
> converted to entities always).
>
> Any string marked as safe (or passed through the "noescape" filter) is
> not automatically escaped.
>
> When automatic escaping is disabled in the template, all variable
> results are output without further escaping, unless the "escape" filter
> is applied to them (this is the same behaviour as currently in Django).
>
> Because some filters are designed to return raw markup, the mark_safe()
> function exists so that the returned strings can be designated as safe.
> For example, {{ var|markdown }} returns raw HTML and the result is not
> subject to any further auto-escaping.
>
> Some filters (e.g. unordered_list) wrap HTML around raw content. If
> auto-escaping is enabled, these filters will escape the content before
> wrapping it in the HTML tags (the returned result is a safe string in
> all cases). So such filters are auto-escaping-aware.
>
> Filters that accept text strings as input and return text strings are
> marked (with the "is_safe" attribute) as to whether or not they return a
> safe string whenever they are passed a safe string as input. This has
> the effect of preserving "safeness" for those filters. Note that this
> attribute is a *guarantee* of preserving safeness, so a filter like
> "cut" has is_safe = False: {{ var|cut:"&" }} could turn an escaped
> string into a monster. If a safe string is passed into a filter that is
> not marked as safe and auto-escaping is enabled, the resulting string
> will be escaped. If the is_safe attribute is not attached to a function,
> it is assumed to be not safe.
>
> Because of the is_safe attribute, it will be possible to change the
> automatically generated documentation in the admin interface to annotate
> each filter with its "safeness" guarantee.
>
> The "noescape" filter acts as a way to annotate the result of a filter
> chain as safe. So, although "cut" is not a safe filter, we know that
> cut:"x" is safe (it can't harm our HTML-escaped strings) and thus
> {{ var|cut:"x"|noescape }} will prevent further escaping of the result,
> even in auto-escaping-enabled situations. The "noescape" filter does
> nothing except mark the result as safe -- the string output is identical
> to the input.
>
> Is it backwards compatible?
> ---------------------------
> Mostly.
>
> Auto-escaping is not turned on by default (Adrian made a statement in
> [2] and I'm going with that preference at the moment). If you pass a
> normal Context or Context-derived class into a template and do not use
> the autoescape tag, it will be very close to what happens today.
>
> [2] http://groups.google.com/group/django-developers/msg/5a57f37667e1e941?
>
> Four filters had their behaviour slightly changed.
>
> Three of these are: linebreaks, linebreaksbr and linenumbers. All three
> now respect the current auto-escaping setting on their input content
> (before applying breaks or numbering). Previously, linenumbers would
> always escape and linebreaks and linebreaksbr would never escape. So,
> the main change for somebody not enabling auto-escaping here is that the
> linenumbers filter will not escape the output any longer.
>
> The fourth filter is "escape". To make forward porting easier and so
> that template designers do not have to feel restricted in their use of
> the escape filter, I implemented it so that applying "escape" to a safe
> string has no effect. Since "escape" itself makes the result safe,
> applying escape multiple times in a chain has the same effect as
> applying it exactly once.
>
> Previously (var = "&"): {{ var|escape|escape }} => &amp;amp;
> Now: {{ var|escape|escape }} => &amp;
>
> This particular case of chaining "escape" is obviously not common (I
> would hope).
>
> All of the default filters (template/defaultfilters.py) and the markup
> filters have been ported in this patch. I have not done anything else
> under contrib/ or the i18n filters.
>
> Points to note
> --------------
> (1) Because "is_safe" has to be valid for all arguments, the "pluralize"
> filter is not safe at the moment. The bizarre {{ var|pluralize:"&" }} is
> an example of the problem case. I'm thinking of fixing this so that we
> check for unsafe characters in the argument(s) and then return safe
> strings on safe input and no unsafe args.
>
> (2) Because of the way "noescape" works, it was not really possible to
> make {% filter noescape %} work in an auto-escaping block (the contents
> were escaped long before the filter tag was applied). Fortunately, this
> particular filter tag construct is equivalent to {% autoescape off %},
> so there's no functionality loss. A TemplateSyntaxError is raised if the
> illegal construct is used.
>
> (3) Filters that take non-string arguments (e.g. "join") or return
> something other than a string (e.g. "length") have is_safe = False. This
> is convention more than requirement, but it makes things explicit.
>
> Performance impact
> -------------------
> Obviously we are doing a little bit more work here, even in the
> non-auto-escaping paths (just testing what the auto-escape setting is,
> for example). As far as I can work out, the performance impact is very
> minor, but I do not have a really good performance test suite for this
> at the moment.
>
> One simple test: running the tests/othertests/template.py file 100 times
> takes 21.0436 seconds before these changes and 21.1674 seconds
> afterwards -- averaged over five runs on my desktop machine. This tests
> the "no escaping" path. That's a slowdown of 0.6% (and that was almost
> within the "noise" of the various runs). These tests aren't particularly
> comprehensive, but they do test the templating code a reasonable amount
> (although not the filters very much at all).
>
> What are the issues at the moment?
> ----------------------------------
> Now we get to the things I want to sort out before going much further.
>
> (1) Any violent (or even just passionate) objections to using terms like
> "safe" and mark_safe()?
> - Should we use Simon's original proposal of escaped and
> mark_escaped()? I feel "safe" is a bit more consistent with the
> behaviour (an opposite-but-similar term to Perl's "tainted").
>
> (2) Is the new behaviour of "escape" reasonable (i.e. it does nothing on
> safe strings)?
> - The only drawback of this is that there is no way to give an
> escaped version of a safe string in the templates. That is,
> there is no opposite to the "noescape" filter.
>
> - If we make "escape" apply to safe strings as well, then views
> must be very consistent about variables always having the same
> "safeness" state. Otherwise, the template would have to escape
> sometimes and not escape other times and it has no way of
> knowing when. The current implementation lets you whack an
> escape filter on there and it will work always.
>
> - Current behaviour also makes forward porting easier (you don't
> have to run around removing all the escape filters in your code
> immediately).
>
> (3) Auto-escaping inherits down through template inclusions. That is, if
> you extend a template that has auto-escaping enabled, you get
> auto-escaping enabled (obviously the autoescape template tag can control
> this). Anybody have a strong reason not to do this?
> - Personally, I think this is a no-brainer, but I've been wrong
> plenty of times before.
>
> (4) Should generic views use SafeContext by default?
> - I haven't touched this yet, but it's not an insane idea. I
> guess most people will divide along the same lines as those
> wanting auto-escaping on or off by default. The waverers will be
> those favouring consistency over all. I'm not a big enough
> generic views user to really have a vote in this one.
>
> (5) Adrian, Jacob: do you guys still want "off by default"?
> - I *really* don't care what the answer is here, but I would
> rather not have to change things after porting everything under
> contrib/ .
>
> - For people thinking it's auto-escaping or nothing, {%
> autoescape on %} at the beginning of a template (and {%
> endautoescape %} at the end) is not a huge imposition.
>
> Feedback obviously welcome and appreciated.
>
> Regards,
> Malcolm


--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100

http://www.noris.de - The IT-Outsourcing Company

Malcolm Tredinnick

Feb 7, 2007, 3:55:32 AM2/7/07
to django-d...@googlegroups.com
Hey Michael,

On Wed, 2007-02-07 at 09:11 +0100, Michael Radziej wrote:
> Hi,
>
> I'd like to revive the discussion about autoescape (note that it is
> *not* on by default). I have brought the patches up to date (see the
> notes in the ticket, #2359), and I'm starting to use this now in my
> own projects (with the exception of the admin patch which I have no
> use for). I can only report that it works great--thanks, Malcolm!
>
> It currently lacks newforms support, but I don't expect any problems
> with it. You basically have to wrap mark_safe around the strings
> that represent rendered html code. newforms is just moving around
> a bit too fast to get a proper hold of ;-)
>
> --> http://code.djangoproject.com/ticket/2359

I just got back today from overseas, so after I've worked out which way
is up I'll have a look at your fixes and fill in the missing bits
(newforms + admin).

Any documentation confusion would be good to hear about, since I
remember that being a bit tricky to write the first time around --
finding the right mix between accurate, comprehensible and non-scary was
the problem.

Thanks for the testing.

Cheers,
Malcolm

Michael Radziej

Feb 7, 2007, 4:17:33 AM2/7/07
to django-d...@googlegroups.com
Malcolm Tredinnick:

> I just got back today from overseas, so after I've worked out which way
> is up I'll have a look at your fixes and fill in the missing bits
> (newforms + admin).

Hey, nice to hear you're back and safe!

It would certainly be good if you could look into the new patches,
there were a lot of conflicts.

But there's no real need to hurry. newforms and admin are heading
for lots of changes, and unless Adrian would like to commit this
patch soon, you'd only have to go through a lot of merge conflicts.

> Any documentation confusion would be good to hear about, since I
> remember that being a bit tricky to write the first time around --
> finding the right mix between accurate, comprehensible and non-scary was
> the problem.

I think a "when to use what" section about the differences between
escape, conditional_escape, the template filter `escape`, and
mark_for_escaping would be good. I needed this stuff mostly in
template filters and tags.

And I had a surprise with escape(), since the template filter does
conditional_escape, but the one from django.utils.html does not (I
would have expected that it did conditional_escape, too). But I
didn't really have to use the docs much since I was very involved in
the discussion.


> Thanks for the testing.

That's the least I can do! I am *very* interested in this patch.

BTW, I saw that you're using git too (at least the patches look like
it), would it make sense to share the repositories?


Cheers,

Michael

Malcolm Tredinnick

Feb 7, 2007, 4:48:25 AM2/7/07
to django-d...@googlegroups.com
On Wed, 2007-02-07 at 10:17 +0100, Michael Radziej wrote:
> Malcolm Tredinnick:
> > I just got back today from overseas, so after I've worked out which way
> > is up I'll have a look at your fixes and fill in the missing bits
> > (newforms + admin).
>
> Hey, nice to hear you're back and safe!
>
> It would certainly be good if you could look into the new patches,
> there were a lot of conflicts.

After I sent the last mail, I realised the tests would probably not
apply cleanly either (since I wrote the patch right before Russell
rewrote the test infrastructure).

> But there's no real need to hurry. newforms and admin are heading
> for lots of changes, and unless Adrian would like to commit this
> patch soon, you'd only have to go through a lot of merge conflicts.
>
> > Any documentation confusion would be good to hear about, since I
> > remember that being a bit tricky to write the first time around --
> > finding the right mix between accurate, comprehensible and non-scary was
> > the problem.
>
> I think a "when to use what" section about the differences between
> escape, conditional_escape, the template filter `escape`, and
> mark_for_escaping would be good. I needed this stuff mostly in
> template filters and tags.

Yeah, okay; I'll try to work that out. The trick is not to provide "too
much information" in the wrong place. Somebody working just at the
template level doesn't have to care about the auto-escaping mechanics
(as you discovered when you only had to really worry about custom
filters and tags), whereas somebody working at the lower level, such as
when writing template tags, does need to know that. I'd tried to make
some kind of summary in the "tag writers" section, but I'll give it
another go.

> And I had a surprise with escape(), since the template filter does
> conditional_escape, but the one from django.utils.html does not (I
> would have expected that it did conditional_escape, too). But I
> didn't really have to use the docs much since I was very involved in
> the discussion.

So the "escape" template filter using conditional_escape was intentional
for the reasons that were discussed on the list (basically, don't
inadvertently double-escape and make it so that template authors can
scatter "|escape" about with abandon for template fragments that might
be used in either auto-escaping or non-auto-escaping contexts).

I didn't change django.utils.html.escape() though, since I was trying to
avoid breaking existing code and that function can be called from
outside the templating system (in that sense, the naming is logical;
away from the templating system, escape() escapes always). The way it's
currently implemented, the auto-escaping additions are transparent and
fully backwards-compatible. If we change django.utils.html.escape() that
is no longer true, but we should make a decision about that before
applying the patch to the mainline.
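
To make the distinction concrete, here is a minimal toy model in plain
Python. It is a sketch, not the code from the patch: SafeText and the three
helper functions are simplified stand-ins that only borrow the names used in
this thread, and the standard library's html.escape is assumed as the
underlying escaper.

```python
import html


class SafeText(str):
    """Toy stand-in for a string already marked as safe for HTML output."""


def mark_safe(value):
    # Mark a string as needing no further escaping.
    return SafeText(value)


def escape(value):
    # Models django.utils.html.escape() as described above: it always
    # escapes, whether or not the input was already marked safe.
    return mark_safe(html.escape(str(value)))


def conditional_escape(value):
    # Models the "escape" template filter: safe strings pass through
    # untouched, so applying it repeatedly cannot double-escape.
    if isinstance(value, SafeText):
        return value
    return escape(value)


once = conditional_escape("<b>")
print(once)                      # &lt;b&gt;
print(conditional_escape(once))  # &lt;b&gt; (unchanged)
print(escape(once))              # &amp;lt;b&amp;gt; (double-escaped)
```

The last line is the surprise case: calling the unconditional function on
text that has already been escaped escapes it a second time.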

>
>
> > Thanks for the testing.
>
> That's the least I can do! I am *very* interested in this patch.
>
> BTW, I saw that you're using git too (at least the patches look like
> it), would it make sense to share the repositories?

Yes, I'm using git (well, cogito and stg, really, but it's all the same
storage on disk). I'll make my repo publicly accessible shortly (when
I fix up the patches) and you're quite welcome to pull from it if it
will make testing easier.

For Django core, I'm using "stacked git" (stg), since I want to be able
to manage things as a stack of patches (rather than just one patch). I
found I was often taking existing patches and modifying them slightly
during testing and evaluation and I have a lot of active branches at any
given time. I have it all down to a reasonable science now, but my dev
setup is very Unix/Linux-based, since git doesn't really work well on
Windows.

But it's all "git" under the covers. I wrote up a brief description when
I started using this a few months ago:
http://www.pointy-stick.com/blog/topics/software/version%20control/ .

Cheers,
Malcolm

Michael Radziej

Feb 7, 2007, 5:32:31 AM2/7/07
to django-d...@googlegroups.com
Malcolm Tredinnick:

> I didn't change django.utils.html.escape() though, since I was trying to
> avoid breaking existing code and that function can be called from
> outside the templating system (in that sense, the naming is logical;
> away from the templating system, escape() escapes always). The way it's
> currently implemented, the auto-escaping additions are transparent and
> fully backwards-compatible. If we change django.utils.html.escape() that
> is no longer true, but we should make a decision about that before
> applying the patch to the mainline.

OK, I'm +0 for keeping django.utils.html.escape() as is.

But when you use auto-escaping, you should probably never use
escape() directly, and this is a point for the documentation.
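
As a rough illustration of that point, the sketch below shows a
hypothetical filter function, bold(), that builds markup around a value. It
runs in plain Python with toy stand-ins (SafeText, mark_safe,
conditional_escape) modelled on the names discussed in this thread; it is
not code from the patch.

```python
import html


class SafeText(str):
    """Toy stand-in for a string marked as safe for HTML output."""


def mark_safe(value):
    return SafeText(value)


def conditional_escape(value):
    # Escape only values that have not been marked safe yet.
    if isinstance(value, SafeText):
        return value
    return mark_safe(html.escape(str(value)))


def bold(value):
    # Hypothetical filter: escape the incoming value at most once, then
    # mark the generated markup as safe so auto-escaping leaves it alone.
    return mark_safe("<b>%s</b>" % conditional_escape(value))


print(bold("a < b"))                 # <b>a &lt; b</b>
print(bold(mark_safe("<i>hi</i>")))  # <b><i>hi</i></b>
```

Using an unconditional escape() inside bold() would also escape input that
a view or an earlier filter had already marked safe.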


So long,

Nicola Larosa (tekNico)

Feb 7, 2007, 7:01:59 AM2/7/07
to Django developers
Malcolm Tredinnick wrote:
> But it's all "git" under the covers. I wrote up a brief description when
> I started using this a few months ago:
> http://www.pointy-stick.com/blog/topics/software/version%20control/ .

(I know, I should have directly commented on that page, and I would
have, if there had been a way to do so. ;-) )

"One of the talks I went to at OSCON was about using mq: patch queue
management on top of mercurial, a la quilt."
...
"Inspired by this talk, I checked out stgit, since I tend to use
cogito as my personal version control system of choice these days for
various reasons."

You don't say what those reasons are; presumably you were already
using git and cogito, so trying stgit was the most efficient path.

Nonetheless, I am interested in the reasons why you apparently did not
consider using Mercurial and mq, that seem to have the features you
need, and are mostly written in Python.

It's not that one always aspires to a wholly Pythonic world (well, not
when fully awake, at least ;-) ), but if the tools are Pythonic, it
should be easier hacking *on* them, instead of just with them, if and
when needed.


--
Nicola Larosa - http://www.tekNico.net/

I've heard that some people have a saying: "Pain is weakness leaving
the body." If that's true, then fear is also weakness leaving the
mind. So, go ahead and do what you are afraid you can't. It is not the
way to an easy life, only a worthwhile one. -- Phillip J. Eby, August
2006

Malcolm Tredinnick

Feb 7, 2007, 4:20:17 PM2/7/07
to django-d...@googlegroups.com
[This is very off-topic, so I'll make this my last post on this topic.
Email me directly if you want more info.]

On Wed, 2007-02-07 at 04:01 -0800, Nicola Larosa (tekNico) wrote:
> Malcolm Tredinnick wrote:
> > But it's all "git" under the covers. I wrote up a brief description when
> > I started using this a few months ago:
> > http://www.pointy-stick.com/blog/topics/software/version%20control/ .
>
> (I know, I should have directly commented on that page, and I would
> have, if there had been a way to do so. ;-) )
>
> "One of the talks I went to at OSCON was about using mq: patch queue
> management on top of mercurial, a la quilt."
> ...
> "Inspired by this talk, I checked out stgit, since I tend to use
> cogito as my personal version control system of choice these days for
> various reasons."
>
> You don't say what those reasons are; presumably you were already
> using git and cogito, so trying stgit was the most efficient path.

I didn't really seek out "most efficient". It wasn't needed. I needed
"sufficiently efficient", and since I was already using git and cogito,
something on top of those had a low barrier to entry for me.

> Nonetheless, I am interested in the reasons why you apparently did not
> consider using Mercurial and mq, that seem to have the features you
> need, and are mostly written in Python.

Well, stgit is written in Python, if that is one of your selection
criteria. It's not one of mine, though.

On a much more practical level, there are so many version control
systems available these days, that the decision ultimately comes down to
"pick one or two that are in wide use in the area you work in and use
them". I work on projects that use CVS, subversion and git, so they tend
to be the version control systems I use for my personal work as well.
Keeping the command sets for more tools inside my head and readily
available is more work than my old brain can handle. Mercurial just
isn't mainstream enough in the domains I work in for it to be something
I need to worry about at the moment.

> It's not that one always aspires to a wholly Pythonic world (well, not
> when fully awake, at least ;-) ), but if the tools are Pythonic, it
> should be easier hacking *on* them, instead of just with them, if and
> when needed.

There are assumptions in that paragraph that don't apply to my choices
here: (a) I am not really that interested in hacking on version control
systems unless there is some kind of show-stopper bug (unlikely with the
systems I'm using since they are all in production-level use on very
large projects), (b) not being in Python is not a problem if I want to
look at the code (maybe it would be if Python was the only language I
understood and I didn't want to branch out. But git and cogito are in C
and shell, for example; not exactly fringe languages and two languages I
can use comfortably if I wanted to work on these systems).

To be honest, (a) is the over-riding point here: I just don't care about
the language it's in providing it's something I already have installed
on my system (so Darcs is never going to get a look in, for example).
And once you get to some level of experience, all languages are pretty
much the same, so a sufficiently motivated (read "desperate") person
could work on anything with the right mindset. Although Python code is
pretty readable and easy to understand when you first approach it and is
my language of choice for almost everything these days, it isn't that
much of an advantage that I'm going to use it as an exclusion criterion
for a tool I primarily want to use (rather than develop).

Regards,
Malcolm
