Element.innerHtml Sanitization

1,428 views
Skip to first unread message

Pete Blois

unread,
Aug 14, 2013, 6:07:10 PM8/14/13
to web...@dartlang.org, mi...@dartlang.org

Coming soon-


In order to decrease the vulnerability of Dart applications to cross-site scripting attacks Dart will start sanitizing Element.innerHtml and related use cases by default. The basic API changes are that everywhere a string is parsed into DOM nodes we’re adding default sanitization as well as the ability to customize the sanitization rules.


Existing APIs such as new Element.html() will have optional arguments for specifying the sanitization rules, if not specified then the default sanitization is applied.


Affected APIs:

  • Element.innerHtml (use Element.setInnerHtml to override default behavior)

  • new Element.html()

  • new DocumentFragment.html()

  • new DocumentFragment.svg()

  • new SvgElement.svg()


The primary changes under the default sanitization rules, all of which can be customized:

  • Elements or attributes which are not whitelisted will be automatically stripped.

  • Only same-origin URIs will be allowed by default.

  • Inline style attributes will not be allowed- we currently have no scheme for sanitizing the contents of inline styles.

  • Custom element tags and tag extensions need to be explicitly enabled.

  • SVG elements will not be allowed in HTML fragments by default.

  • HTML elements will not be allowed in SVG fragments by default.

  • Script tags and script event attributes will not be allowed.


In addition, the Element.html() constructor previously tried to guess the correct parsing context for the HTML fragment but no longer will. This was used primarily for building up <table> contents.  Use Element.createFragment to parse an HTML fragment within a specific context.


Please keep in mind that mitigating XSS vulnerabilities is more than preventing arbitrary script execution, it’s also ensuring that users cannot inject unexpected content into the page. The default sanitization aims to mitigate scripting vulnerabilities, but if your application is consuming untrusted HTML strongly consider using a validation scheme which only covers the syntax you expect people to be using (for example the new NodeValidationBuilder.allowTextElements API).

Pete Blois

unread,
Aug 28, 2013, 3:36:30 PM8/28/13
to web...@dartlang.org, mi...@dartlang.org
This is now in. Some additional notes:
  • There's a temporary unsafeInnerHtml setter which can be used while code is being migrated
  • Common gotcha's are:
    • Inline styles aren't supported by default
    • Custom elements are not supported by default (use NodeValidatorBuilder to allow them)
  • There's a console warning for when items are being sanitized away
Please speak up with any migration issues or questions you may encounter!

Pete Blois

unread,
Aug 30, 2013, 11:53:51 AM8/30/13
to web...@dartlang.org, mi...@dartlang.org
Updating title to make it clear that this is a (fairly impactful) breaking change on it's way.

Additionally, if you are dealing with 'known safe' HTML and need to bypass sanitization for performance reasons then a 'null sanitizer' can be used. An example of this is at: https://code.google.com/p/dart/source/browse/branches/bleeding_edge/dart/pkg/mdv/test/custom_element_bindings_test.dart#280

Ideally customization around what is and is not allowed is done via the NodeValidatorBuilder interface and custom NodeValidators- http://api.dartlang.org/docs/bleeding_edge/dart_html/NodeValidator.html

Tom Yeh

unread,
Sep 13, 2013, 11:10:28 AM9/13/13
to mi...@dartlang.org, web...@dartlang.org
Sanitizing is great, but, IMO, it shall not be the default behavior of Element.innerHtml/html/... It shall be as thin as possible. First, it won't surprise users in unpleasant way (especially if they master JavaScript or copy code from JS projects). Second, the code size will be smaller. After all, not everyone needs to sanitize the HTML fragment. Many, if not most, applications the fragment are predictable and not from end users -- end user's input actually needs to be encoded more restrictively.

It will be great if sanitization is only a set of utilities. Developers have the right to choose the right tool. It shall not be the purpose of Element API. Just my two cents.

Nik Graf

unread,
Sep 13, 2013, 12:26:17 PM9/13/13
to mi...@dartlang.org, mi...@dartlang.org, web...@dartlang.org
I'm totally with you that this change might surprise developers coming from JS. Which is a bad thing. Same for increased code size. Still I believe this change is a huge step forward in Dart. To me security matters a lot and although there are some disadvantages    I believe this change is very beneficial for everyone building web applications in Dart. It's too easy to miss out on escaping user input.

Sent from my iPhone
--
For other discussions, see https://groups.google.com/a/dartlang.org/
 
For HOWTO questions, visit http://stackoverflow.com/tags/dart
 
To file a bug report or feature request, go to http://www.dartbug.com/new

To unsubscribe from this group and stop receiving emails from it, send an email to misc+uns...@dartlang.org.

Alex Tatumizer

unread,
Sep 13, 2013, 5:13:31 PM9/13/13
to mi...@dartlang.org, web...@dartlang.org
> To me security matters a lot and although there are some disadvantages    I believe this change is very beneficial for everyone building web applications in > Dart. It's too easy to miss out on escaping user input.

The same can be said about the idea of wearing a helmet at all times. Who knows what can fall on your head from the skies.
Maybe deep see diving suit won't hurt either. :-)

 

Florian Loitsch

unread,
Sep 13, 2013, 7:43:39 PM9/13/13
to General Dart Discussion, web...@dartlang.org
It seems pretty simple: depending on the default a tiny mistake can either:
- show funny characters, or
- open an XSS attack.
And it's not even as if XSS attacks were something extremely rare. It's basically the buffer-overflows of the web.

 

--
For other discussions, see https://groups.google.com/a/dartlang.org/
 
For HOWTO questions, visit http://stackoverflow.com/tags/dart
 
To file a bug report or feature request, go to http://www.dartbug.com/new

To unsubscribe from this group and stop receiving emails from it, send an email to misc+uns...@dartlang.org.



--
Give a man a fire and he's warm for the whole day,
but set fire to him and he's warm for the rest of his life. - Terry Pratchett

Joao Pedrosa

unread,
Sep 13, 2013, 8:11:43 PM9/13/13
to mi...@dartlang.org
Hi,

It's a trade-off. I find it a little contrived. We're told that creating a custom Validator is a tough job. And creating a custom TreeSanitizer could work. But there's also the problem of skipping in-line style by default, not sure what to do about that. Dealing with this on per innerHtml call basis is quite tough.

In JavaScript folks are trained to at most call a escaping function. Which they probably do not even do. Because they might escape content server-side or something else.

I think the sanitizer may be quite clever by reusing methods provided by the browser itself for traversing nodes. I don't know all about it though.

It's as though they have tried to make skipping the sanitizer a harder thing to do, in order to encourage folks to give it a try and to use it more.

Right now though Dart users are being presented with all kinds of new APIs and have to deal with defaults that may be hard to grasp at the beginning. Eventually with more documentation and understanding by a greater community we could grow used to it all.

Many of us have seen a lot of the evolution from the very early days up until now. What frightens me is how new users have been dealing with these differences. I can tell by the feedback being given by the Angular folks that the surprises are too many already. :-)

There's a lot of "unlearning" being requested from previous JavaScript users. Luckily, Dart has tried to appease JavaScript users to some extent.

Cheers,
Joao

Tom Yeh

unread,
Sep 13, 2013, 10:46:05 PM9/13/13
to mi...@dartlang.org, web...@dartlang.org
Element.innerHtml is not the only API that HTML injection might take place. Unless you construct UI 100% in Dart, you still need to encode it properly. For instance, it is common to generate the content of a HTML page on the server. Furthermore, in additions to the default sanitizing, you still need to encode the user's input further to avoid, say, ruining the display (you don't want a user comment becoming a floating div).

With this default sanitizing, developers will either wonder what are actually sanitized, or assume everything is fine. Both are not good in term of security.

The security is better to be protected by good disciplines and testing tools. They are both doable and are done for years. Protecting by a mysterious API, IMO, is not a good idea, especially when it only reduces the possibility rather than a total cure.

IMO, a simple and intuitive API is the key.

Simon Pai

unread,
Sep 14, 2013, 1:56:54 AM9/14/13
to mi...@dartlang.org
I do recognize and appreciate the value of sanitization, but I also agree with this. In addition, I don't think the API surprise comes from JS culture -- even given anyone who doesn't have programming background, he would expect new Element.html() create exactly what he passed in.

It's a bit exaggerated, but this is what I experienced:
1. Figured out there are missing DOM attributes upon dart SDK update, found warning messages.
2. Googled the messages and figured out sanitization had kicked in.
3. Ask myself why the default rules don't like data attributes?
4. Googled but found no obvious documentation about sanitization.
5. Found a way to nullify ALL rules with NullTreeSanitizer.
6. Since I didn't know how to do that correctly, just stayed with it and marked a FIXME before there is a clear documentation about sanitization.

I guess the best way to work around user surprise is to enhance documentation to the maximal extents, including:
* What's sanitization & why
* APIs in effect
* What are the default rules & why
* How to build your own Sanitizer
* How to migrate old code

Also if a link to that document is provided in dartdoc (of all affected API) and warning messages, it would be helpful.

Cheers,
Simon

Alex Tatumizer

unread,
Sep 14, 2013, 7:23:38 PM9/14/13
to mi...@dartlang.org
I don't understand how this idea can work even in principle.
Sanitizer has no clue what came from user's input and what was written by programmer.
Interesting property: program becomes non-debuggable and non-testable.
Because you construct your string (to be passed to innerHtml) programmatically, it's affected by lots of conditions and parameters and variables.
In one run, it will be ok, in another - it will fail, and you never know in advance how it will work out. After a couple of new bug reports related to unforeseen combinations of data, you will have to drop your innerHtml manipulations and replace by something else, otherwise you will be "fixing" same line of code repeatedly. And how to you report the problem to user?
"Your program contains potentially dangerous material"? You scare away users most likely for no reason. 
Under these circumstances it's better not touch innerHtml with 6-foot pole to begin with. 

The problem is not specific to HTML -  it's exactly the same as SQL injection problem. 
There's it's solved with prepared statement. Whatever values you substitute for ?s in prepared statement don't even require sanitization, because they will never be interpreted as SQL code, by design.
Any sane method of HTML sanitization should be similar IMO. To support it, dart could introduce lightweight templating or something (e.g https://code.google.com/p/dart/issues/detail?id=9831 - many different variants are possible). I'm afraid the kind of sanitization you have now will just scare people away. 

John Messerly

unread,
Sep 16, 2013, 2:12:35 PM9/16/13
to General Dart Discussion
On Sat, Sep 14, 2013 at 4:23 PM, Alex Tatumizer <tatu...@gmail.com> wrote:
I don't understand how this idea can work even in principle.
Sanitizer has no clue what came from user's input and what was written by programmer.

Yeah, that's exactly the problem with the innerHtml setter. It has no idea where the HTML came from, which is what makes it unsafe.

For those hitting sanitization problems -- could you give additional info about what the default sanitizer is removing that you would like preserved? Maybe weak can tweak the default rules.

Alex Tatumizer

unread,
Sep 16, 2013, 3:36:16 PM9/16/13
to mi...@dartlang.org
> Yeah, that's exactly the problem with the innerHtml setter
John.
I'm afraid there's a kind of non sequitur in your response.
If you agree with a problem, the natural question would be: how to design the "builder" of innerhtml so that system can know what is what there. In a context of SQL, PreparedStatement is a trivial example of a builder. In case of HTML, situation is a bit more complicated, but still...  Problem clearly calls for more creative solution.




John Messerly

unread,
Sep 16, 2013, 3:49:52 PM9/16/13
to General Dart Discussion
Let me try and explain it better :)

innerHtml setter has no context, so it must assume the worst. It sanitizes the HTML according to what is safe even if attacker-controlled HTML is in there.

You can provide a custom NodeValidator or NodeTreeSanitizer. If you know something about your HTML structure, you may be able to apply a different set of validation rules.

You can also ignore innerHtml entirely and build HTML in other ways. For example, this can be easily done via the normal DOM constructors and setters (which are even easier to use with cascades). Another approach is to use data-binding.

IMHO, there are enough good alternatives for using HTML in Dart, that it is okay if innerHtml is secure-by-default.

Justin Fagnani

unread,
Sep 16, 2013, 4:14:36 PM9/16/13
to General Dart Discussion
And I assume that one could build a GWT SafeHtml-like abstraction on top of sanitization as well.

Alex Tatumizer

unread,
Sep 16, 2013, 4:21:39 PM9/16/13
to mi...@dartlang.org
> You can also ignore innerHtml entirely and build HTML in other ways. For example, this can be easily done via the > normal DOM constructors and setters (which are even easier to use with cascades). Another approach is to use data-binding.

I agree with that. The whole story can be reformulated simply as:
innerHtml: DO NOT USE THIS METHOD
:-)

John Messerly

unread,
Sep 16, 2013, 4:28:54 PM9/16/13
to General Dart Discussion
On Mon, Sep 16, 2013 at 1:21 PM, Alex Tatumizer <tatu...@gmail.com> wrote:
> You can also ignore innerHtml entirely and build HTML in other ways. For example, this can be easily done via the > normal DOM constructors and setters (which are even easier to use with cascades). Another approach is to use data-binding.

I agree with that. The whole story can be reformulated simply as:
innerHtml: DO NOT USE THIS METHOD
:-)

:-). yeah, personally I agree, and generally avoid innerHtml.

However, it is fine to use innerHtml, and it should be safe from XSS after Pete's change. Depending on what kind of HTML you're building, it might work just fine for you. The default validator allows a bunch of stuff:

Tom Yeh

unread,
Sep 17, 2013, 12:29:22 AM9/17/13
to mi...@dartlang.org
Why not call it innerSafeHtml and leave innerHtml without sanitizing?

There are other methods but innerHtml is the simplest (and well-known) way.

Eric Snellman

unread,
Sep 17, 2013, 7:03:14 AM9/17/13
to mi...@dartlang.org
Can unsafeInnerHtml be preserve for performance reasons?

--

Tom Yeh

unread,
Sep 17, 2013, 9:10:15 PM9/17/13
to mi...@dartlang.org
With sanitizing, the following statement is not always true!! The getter and setter are not symmetric!

node.innerHtml == (node.innerHtml = node.innerHtml)

John Messerly

unread,
Sep 17, 2013, 9:38:36 PM9/17/13
to General Dart Discussion
On Tue, Sep 17, 2013 at 6:10 PM, Tom Yeh <tom....@gmail.com> wrote:
With sanitizing, the following statement is not always true!! The getter and setter are not symmetric!

node.innerHtml == (node.innerHtml = node.innerHtml)

It is a bit odd, but was the case even before this change. DOM serialization prints the tree according to serialization spec, not necessarily the same text as was originally parsed. You can see this by firing up your favorite JavaScript console:

> var d = document.createElement('div')
undefined
> d.innerHTML = "<div id=foobar>"
"<div id=foobar>"
> d.innerHTML
"<div id="foobar"></div>"


Alex Tatumizer

unread,
Sep 17, 2013, 11:11:09 PM9/17/13
to mi...@dartlang.org
This effect is not specific to innerHtml. 
 var list=new Uint8List(1);
  list[0]=256;
  print(list[0]==256); // false
  print(list[0]==0);   // true


Paul Spychala

unread,
Oct 31, 2013, 11:31:14 AM10/31/13
to mi...@dartlang.org
Its gone :(

I've just locked our sdk version to 28990 for our project. We build a lot of widgets (so to speak) with innerhtml and have many attributes that get scrubbed even though we know they are safe. Building a custom sanitizer to disable a 'custom' default behavior seems yuck.

John Messerly

unread,
Oct 31, 2013, 2:58:25 PM10/31/13
to General Dart Discussion
On Thu, Oct 31, 2013 at 8:31 AM, Paul Spychala <tomatoa...@gmail.com> wrote:
Its gone :(

I've just locked our sdk version to 28990 for our project. We build a lot of widgets (so to speak) with innerhtml and have many attributes that get scrubbed even though we know they are safe. Building a custom sanitizer to disable a 'custom' default behavior seems yuck.

Hi Paul, you can still disable sanitizer for your project:

in some shared file:

class NullTreeSanitizer implements NodeTreeSanitizer {
  const NullTreeSanitizer();
  void sanitizeTree(Node node) {}
}
final allHtml = const NullTreeSanitizer();


everywhere else:

// elem.unsafeInnerHtml = html; becomes:
elem.setInnerHtml(html, allHtml);

... not too bad, eh?

On Tuesday, September 17, 2013 7:03:14 AM UTC-4, Eric Snellman wrote:
Can unsafeInnerHtml be preserve for performance reasons?

yeah, I'm worried about this too.
Message has been deleted
Message has been deleted

George Moschovitis

unread,
Jan 6, 2014, 4:14:46 AM1/6/14
to mi...@dartlang.org

Hi Paul, you can still disable sanitizer for your project:

in some shared file:

class NullTreeSanitizer implements NodeTreeSanitizer {
  const NullTreeSanitizer();
  void sanitizeTree(Node node) {}
}
final allHtml = const NullTreeSanitizer();


everywhere else:

// elem.unsafeInnerHtml = html; becomes:
elem.setInnerHtml(html, allHtml);

... not too bad, eh?


The API  seems to have changed. What's the different between a Validator and a Sanitizer?
Do I need to define a NullValidator also?
Can you update the above example?

-g. 
Reply all
Reply to author
Forward
0 new messages