I read this from Jakob Vrana a few days ago ...
<http://php.vrana.cz/defense-against-xss-in-zend-framework.php>
... and it occurred to me that we could apply and automatic escaping system in Aura View, one that applies escaping when methods and properties are *accessed* (not when they are assigned). With that in mind, I have committed an "escaper" branch to Aura.View:
<https://github.com/auraphp/Aura.View/tree/escaper>
To try it out, go to your Aura.View clone and issue `git checkout escaper`.
With that branch, you don't need to issue $this->escape() on absolutely everything all the time. Instead, any time you access a variable assigned to a template, Aura View will escape it for you before handing it back. This applies to strings, array keys and values, object properties, method returns ... *and* it applies to all values returned by method returns. (It's done by using an escaper object decorator.)
I am absolutely certain there is a performance penalty for this approach; there is at least one additional method call per data access call. I think that having the automatic escaping is a nice tradeoff, given that we don't have to remember to call `$this->escape()` everywhere, *and* the lack of the `$this->escape()` calls should make the templates look cleaner.
The solution seems very thorough, but I want everyone to take a look and try to break it. Let me know what you think.
--
Paul M. Jones
http://paul-m-jones.com/
> A quick question is why the naming __raw ?
On the word "raw" itself: it's the difference between "escaped" data and "raw, unfiltered, possibly dangerous" data.
In the escaper object, the double-underscore prefix is to reduce the likelihood of conflicting with an underlying object method.
In the template object, we reuse __raw() as the name so you don't have to remember a different method name.
Does that make sense?
Following up on this after, what, six weeks?
I think making it immutable is the right thing, too. Do we want an exception proper, or would E_NOTICE be sufficient?
-- pmj
I've worked through this, and making it immutable makes it difficult/impossible to do things like:
$template->var = 'this & that';
You could still access that functionality through setData() and addData(), but then you have the same problem as described above.
Instead of making it immutable, perhaps the escaper should do two things:
1. Implement __set() as Jakub notes above, but instead of throwing ImmutableException, it works the same as offsetSet(), so that we can add properties to the object and have them escaped properly.
2. The __escape() method should set the $double_encode param on htmlspecialchars() to false. That way we can do operations on the properties (like the concatenation above) and not worry about double-escaping.
Thoughts?
-- pmj
> I know I'm late to the game on this discussion, but when you look at what is being proposed here, does it really make sense to do this given the context of the app? I mean, why should the view automatically escape data? Yes, I believe there is a facet of the view that should offer that functionality, but to make it an automatic process seems... not right.
This is a fair assessment. Honestly I only recently came to the idea that the data for output should be escaped automatically, after holding exactly your opinion for almost a decade.
(For reference, the codebase for the auto-escaping is at <https://github.com/auraphp/Aura.View/tree/escaper>.)
The short version of my change of heart is this: I had a good friend and skilled developer come to me and say that auditing Solar/Savant/Zend/PHP-based templates was very difficult, because he had to see if escape() was called on each individual output call. He had to check every case where values weren't escaped to see if it was accidental or intentional, over hundreds of template scripts, and track back through the codebase to see what the data was supposed to be. Most of the time the lack of escaping was intentional, but *occasionally* it was accidental, and it was tedious to decipher that to find potential security holes.
With that in mind, I realized that it's a real hassle to do that kind of thing. High-end enterprisey folks really do need to do audits of that sort.
Now, *most* of the time, output should be escaped. I'd say it's at least 80%, and probably closer to 98%. That is, only 2 out of 100 times you send output (or examine a data value) will you need it to be the raw unescaped data (such as user-entered HTML that has been otherwise sanitized). As such, escaping automatically removes the manual tedium for a major part of development work. You still have access to the raw values, but it becomes easy to grep the templates for calls to `__raw`. That makes it the default case that outputs are escaped, and makes it obvious that using unescaped values is intentional. Voila: audits are easier, and you have to *ask* for unescaped values. By default you are more-safe from accidental XSS vulnerabilities.
This is not to say that it's perfect; things like `count()` and `instanceof` are not going to work right in templates, because of the way the automatic escaping works. For things like that you need to ask for the raw original data, not the escaped object, but those things are very few and (I believe) infrequent or easily worked around. The automatic escaping is also guaranteed to be slower to execute than calling $this->escape() on things manually; however, that in turn is slower than calling htmlspecialchars() on everything too, so it's a not a question of absolute speed, but of whether you think the performance tradeoff is worth it.
> Honestly, what is so hard, as a programmer, about saying:
> <?php echo $this->escape($this->var) ?>
>
> as opposed to:
> <?php echo $this->var ?>
You're right; there's nothing hard about it, aside from being a bit tedious (but necessary). But the proper comparison under PHP 5.4 and the automatic escaping system is this:
<?php echo $this->escape($this->var); ?>
vs:
<?= $this->var; ?>
(The echo-tag is enabled in 5.4 along with long-form tags.) That second form looks like a win to me.
Now, we really see a difference when we compare this:
<?php echo $this->html; ?>
vs:
<?= $this->__raw()->html; ?>
Is the first one *intended* to be escaped, or was it accidentally left unescaped? Hard to tell. But in the second one, it's immediately clear what's going on: the developer wants the raw, unsafe, dangerous, be-very-careful-with-this unescaped value.
> Or am I just being a confrontational ass on this matter? :)
Not at all. These are valid and reasonable questions. Like I said, I only came to this approach after years, maybe a decade, of being a "you should manually escape in your PHP templates" guy.
Please let me know if that helps explain my thinking, even if you still don't agree. :-)
-- pmj
"You can write & in HTML as &."
Applying htmlspecialchars() with disabled $double_encode produces this:
"You can write & in HTML as &."
Which will render in browser as:
"You can write & in HTML as &."
--
Jakub Vrana
-----Original Message-----
From: Paul M. Jones [mailto:pmj...@paul-m-jones.com]
Sent: Wednesday, April 18, 2012 10:30 AM
To: aur...@googlegroups.com
Cc: Jakub Vrana
Subject: Re: [auraphp] Aura View and Auto-Escaping
On Mar 7, 2012, at 2:35 AM, Jakub Vrana wrote:
I've worked through this, and making it immutable makes it
> Disabling $double_encode is evil. I could want to say:
>
> "You can write & in HTML as &."
>
> Applying htmlspecialchars() with disabled $double_encode produces this:
>
> "You can write & in HTML as &."
>
> Which will render in browser as:
>
> "You can write & in HTML as &."
Good call Jakub.
While I'm concerned about *not* escaping things, I'm less concerned about property manipulations within the template that end up with double-escaping. So, per a modified version of the example you provided earlier, where no escaping is applied at all ...
$this->c = 'x';
$this->c .= ' > ';
$this->c .= '2';
echo $this->c; // "x > 2"
... is clearly one that has to be addressed immediately, at the very least by defining __set(). (The $this represents the idea that this is happening inside a template script.) That's a no-brainer; I've done it locally by making __set() mimic offsetSet().
However, I feel some resistance to preventing double-escaping by making the object immutable, especially after having tried it locally. It's too easy to get around, and if you prevent all the get-arounds, it becomes very difficult to use in practice.
A modified form of the example you gave earlier, where double-escaping occurs ...
$this->b = 'x';
$this->b .= ' > ';
$this->b .= '2';
echo $this->b; // "x &gt; 2"
... seems less troublesome when you realize that it's actually operating as designed. The revised version of that would be:
$this->b = 'x';
$this->__raw()->b .= ' > ';
$this->__raw()->b .= '2';
echo $this->b; // "x > 2"
...or even:
$this->b = 'x';
echo $this->b . $this->escape(' > 2');
I don't know how often other people do these kinds of manipulations within template scripts. For myself, they are quite rare, but I'm only a sample size of one.
It seems like the kind of thing we can address in the documentation quite easily. It's annoying, but it's not a security violation; I think the tradeoffs here are are reasonable.
Other thoughts?
-- pmj
--
Jakub Vrana
-----Original Message-----
From: Paul M. Jones [mailto:pmj...@paul-m-jones.com]
Sent: Wednesday, April 18, 2012 2:10 PM
To: Jakub Vrana
Cc: aur...@googlegroups.com
Subject: Re: [auraphp] Aura View and Auto-Escaping
On Apr 18, 2012, at 3:57 PM, Jakub Vrana wrote:
> Disabling $double_encode is evil. I could want to say:
>
> "You can write & in HTML as &."
I had intuited as much, but I am at a loss to come up with actual usage
where it would be unsafe.
Your example shows an unexpected result, but I think it's from writing it
incorrectly in the first place. It seems more-correct to write it as "You
can write & in HTML as &amp;." That would get translated to "You can
write & in HTML as &amp;." via htmlspecialchars, and would render in
the browser as "You can write & in HTML as &."
Are there other examples where it is either unsafe or produces something
dramatically unexpected?
-- pmj
> Your class should be able to handle plaintext. So requiring "You can write
> & in HTML as &amp;." is obviously wrong because it's not plaintext.
> Inputs to your class can come from all kinds of different sources.
I agree; I actually deleted this message and re-sent a different one. ;-)
-- pmj
> But do you still think Escaper should be with Aura View or separate package ?
For now I think it's still good to keep it with the View package. I'd prefer to make sure the View package doesn't have a dependency on anything else at this point.
-- pmj
Hi all,
I'm at the point where I'd like to merge the `escaper` branch to `master`, thus making auto-escaping the default. There will need to be one additional change in Aura.Framework, where the inner view is set into the outer view via `__raw()` (so that it doesn't get double-escaped) but aside from that it appears to have no ill effects.
Comments? Critique?
-- pmj
$template = new Template(
new EscaperFactory,
new TemplateFinder,
new HelperLocator
);
$template = new Template(
new TemplateFinder,
new HelperLocator,
new EscaperFactory
);