On 6/16/06, Christopher Lenz <cml...@gmx.de> wrote:
> To reiterate: templates shouldn't need to care about escaping. Django *in particular* uses an intentionally dumbed down template system that is supposed to be easy for non-programmers, which includes the notion that little mistakes in templates shouldn't break the site or even introduce security holes.
The problem here, architecture-wise, is that the template is the thing that cares about what output looks like. Moving the decision of whether to escape or not into some other part of the stack breaks with that and introduces the possibility of frustrating inconsistency in the templating system. Explaining to a template author why {{ foo }} escapes in one case but not another, based on what is (to the template author) black magic happening in the backend, isn't something I particularly want to do.
> IMHO, a real solution for this problem is that any normal string inserted into template output is escaped by default. This does not necessarily mean that there needs to be an unescape filter, though.
Yes. Yes, it does.
> In fact, most of the time, Django components that generate a string *know* that they are generating text that must not be escaped, such as the output of the markdown filter, or form field render() results. Those places should flag the strings they are generating in some way (for example by wrapping them in a special class), thereby signaling to the template system that those strings should not be escaped again.
As someone who's followed various RSS-related discussions for a long time, I can say that having multiple layers of a system have to worry about whether the other layers have escaped or unescaped something is a very special kind of hell that I don't want Django to get mired in.
But beyond that, it feels like a violation of loose coupling; doing this would bind Django components to each other in ways that don't feel right.
My vote is for escaping being off unless explicitly turned on, and for it being turned on in the template.
-- "May the forces of evil become confused on the way to your house." -- George Carlin
An additional field type would be added, extending CharField, called say "HTMLSafeField". It would strip/escape/convert/reject invalid strings both when being set and when being read. Otherwise it would behave just like a CharField.
The key is not to think of it as an escaping mechanism, but simply as a data validity check. And there is ample precedent for this in Django. What are EmailFields, PhoneNumberFields and SlugFields if not simply CharFields that match a regex?
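To make the "validity check" framing concrete, here is a minimal sketch in plain Python. The `CharField` below is a stand-in written for this example, not Django's real class, and `HTMLSafeField` is the hypothetical field from the proposal. Of the strip/escape/convert/reject options, this sketch assumes "reject".

```python
import re


class CharField:
    """Stand-in for a model field; NOT Django's actual CharField."""

    def __init__(self, max_length):
        self.max_length = max_length

    def clean(self, value):
        # Basic validation shared by every character field.
        value = str(value)
        if len(value) > self.max_length:
            raise ValueError("value too long")
        return value


class HTMLSafeField(CharField):
    """Hypothetical field: a CharField that must not contain HTML-special characters."""

    _unsafe = re.compile(r"[<>&\"']")

    def clean(self, value):
        value = super().clean(value)
        # Assumption: invalid input is rejected rather than stripped or escaped.
        if self._unsafe.search(value):
            raise ValueError("value contains HTML-special characters")
        return value
```

Seen this way the field behaves like any other regex-backed field: `HTMLSafeField(50).clean("hello")` returns the value unchanged, while `HTMLSafeField(50).clean("<script>")` raises `ValueError`.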
"Intro" users who are not able to grok XSS can simply be told to always use HTMLSafeFields instead of CharFields. Converting existing apps would be simple model-only search-and-replace exercises. Folks who don't like wrapper tags around all variables in templates will be appeased. (as will those who don't want "escape=on" tags at the top of every template) And I (and my like-minded kin) who think both "breaking every template==bad" and "magic behind the scenes==worse" will not vomit at the addition.
Likewise XMLSafeField, JavascriptSafeField, MustMatchUserRegexField, etc. would be logical extensions.
The biggest downside is if you want valid HTML data stored for one output type and escaped for another. But this is not a scenario I've ever seen in the real world, and regardless it is easily worked around by simply returning to CharFields for that one attribute (and escaping manually, of course).
> What do you think?
I'm not keen on escaping being controlled by the model - escaping should be a template-level decision as that's when you decide what format is being output (plain text email / HTML / XML / LaTeX for PDF conversion etc).
I played around with some proof of concepts over the weekend and I think I have some ideas that should keep most people happy. I'll try to write them up on the wiki this evening.
that's why i suggest looking at this as a data validation issue. (not simply as escaping) we do lots of validation in the model already. (some argue that *all* data validation should be in the model) this would just be an additional type.
anyway, i suppose i will wait for you to elaborate on your reasoning in the wiki this evening. :)
pub...@kered.org wrote:
> that's why i suggest looking at this as a data validation issue. (not simply as escaping) we do lots of validation in the model already.
But it is an escaping issue. There's nothing wrong with allowing html to be entered in (for example) a comment field. It should be escaped in most templates, but sometimes not, for example if there was a plain-text email of comments that gets sent.
It incorporates stuff from a whole bunch of prior discussions. In my opinion the most important aspect is the use of special escapedstr and escapedunicode subclasses to mark a string as having already been escaped, so that the auto escaping mechanism knows whether it should kick in or not. This should also avoid double escaping, and allow a decent level of fine-grained control over the escaping mechanism.
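A minimal sketch of the marker-subclass idea, assuming Python 3's single `str` type and the standard library's `html.escape`. The names `EscapedStr` and `auto_escape` are made up for this example and are not taken from the proposal's code.

```python
from html import escape


class EscapedStr(str):
    """Marker subclass: the text is already safe for HTML output."""


def auto_escape(value):
    """Escape a value for HTML unless it is already marked as escaped."""
    if isinstance(value, EscapedStr):
        # Already escaped once; escaping again would double-encode entities.
        return value
    return EscapedStr(escape(str(value)))
```

Because the result is itself an `EscapedStr`, passing it through `auto_escape` a second time leaves it unchanged, which is how the marker avoids double escaping. A component such as a markdown filter could wrap its output in `EscapedStr` to opt out of escaping.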
I'd like to get a branch going to explore this stuff properly. From messing around with my own local code it seems like it should all work, but there's a bunch of work that needs to be done to make existing Django filters and templates auto escape compliant.
A very nice solution, with a good method of automatically flagging things as escaped or not; but it seems to me more complicated than is needed. And, of course, more than just HTML escaping is needed; URLs should be escaped differently, and other values intended to be used as attributes also need a different escape filter -- I'm not sure your proposal will allow these to be handled correctly and conveniently. So here's another idea to throw into the soup:
Having the context aware of the primary escaping needs of the output is a nice idea, but as James Bennett pointed out, the template is what should be making the decision. Suppose the template renderer had a "default filter" that would get applied to all otherwise unfiltered output? Obviously, the default value for this would be django.template.defaultfilters.escape -- but it could be set to another filter for JSON output, or to None for plain text. One possible mechanism for doing this would be a {% default_filter ... %} tag in the template...?
Assuming the default, then {{name}} would be the equivalent of {{name|escape}}, whereas <a href="{{myurl|urlencode}}"> would remain unchanged, and a new filter "raw" (just a pass-thru) could be used for situations like <script>{{myscript|raw}}</script>.
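The rule described above can be sketched roughly as follows. This is an illustration only: `render_variable` and `FILTERS` are invented names, and `html.escape` and `urllib.parse.quote` stand in for Django's own `escape` and `urlencode` filters.

```python
from html import escape
from urllib.parse import quote


def raw(value):
    """Pass-through filter: return the value untouched."""
    return value


# Stand-ins for the template filters named in the thread.
FILTERS = {"escape": escape, "urlencode": quote, "raw": raw}


def render_variable(value, filter_names=(), default_filter=escape):
    """Render one template variable.

    The default filter is applied only when the template author
    supplied no filters at all; None disables it (plain text output).
    """
    text = str(value)
    if not filter_names:
        return text if default_filter is None else default_filter(text)
    for name in filter_names:
        text = FILTERS[name](text)
    return text
```

With the default in place, `render_variable("<b>")` behaves like `{{name|escape}}`, `render_variable("<b>", ["raw"])` leaves the value alone, and `default_filter=None` corresponds to plain text output.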
The main drawback I see with this is that the behaviour of {{mylist|count}} is not obviously unescaped. Perhaps having all output piped through the default filter unless it is piped through the "raw" filter (which could perhaps be handled using Michael's escaped strings)?
Couldn't we do something less invasive/complicated?
How about
{{ var }}
by default escapes the contents (in other words, the very first filter called on a variable is escape, by default) and
{{ var|raw }}
skips the call to escape?
It breaks backwards compatibility, but maybe there's a way to avoid that with a setting of some sort. (Say AUTO_ESCAPE = False in settings.py for people who don't want the change.)
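As a rough sketch of how small this could be, here is a toy renderer that handles only {{ var }} and {{ var|raw }}. It is not Django's template engine; `render` and its `auto_escape` argument are assumptions made for this example, with the argument standing in for the proposed setting.

```python
import re
from html import escape

# Matches {{ name }} or {{ name|raw }}; no other filters are supported here.
VARIABLE = re.compile(r"\{\{\s*(\w+)(\|raw)?\s*\}\}")


def render(template, context, auto_escape=True):
    """Substitute variables, escaping them unless |raw is used
    or auto-escaping is switched off."""

    def substitute(match):
        name, raw_flag = match.group(1), match.group(2)
        text = str(context[name])
        if raw_flag or not auto_escape:
            return text
        return escape(text)

    return VARIABLE.sub(substitute, template)
```

Calling `render(template, context, auto_escape=False)` reproduces the old behaviour, so existing templates keep rendering as before for anyone who opts out.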