HTML Sanitizer changes to text node handling

Mike Samuel

Dec 10, 2012, 1:01:34 PM
to owasp-java-html-s...@googlegroups.com
Please ignore if you do not define custom OWASP HTML sanitizer element policies.

As of today, 10 Dec. 2012, the latest Maven release, r133, and the
downloadable jars available from the project homepage contain an API
change and a related change to the semantics of an existing method.

Change log : http://owasp-java-html-sanitizer.googlecode.com/svn/trunk/CHANGE_LOG.html

Javadoc : http://owasp-java-html-sanitizer.googlecode.com/svn/trunk/distrib/javadoc/org/owasp/html/HtmlPolicyBuilder.html#allowTextIn(java.lang.String...)


----
API Changes
----

The new methods HtmlPolicyBuilder.allowTextIn and disallowTextIn allow
control over which elements can contain text.

By default, allowing an element (via allowElements(...)) allows text
in it only if that element can contain flow or block level elements,
or human-readable raw text (as <title> and <textarea> do).

This has the effect of disallowing text in <iframe>, <script>,
<style>, and similar elements by default.
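
As a rough sketch of how the new method might be used: a policy that
wants the old behavior for <iframe> would have to opt in explicitly.
The class and method names here (TextInIframeExample, defaultPolicy,
textAllowedPolicy) are made up for illustration; only
HtmlPolicyBuilder, allowElements, allowTextIn, toFactory and
PolicyFactory.sanitize come from the library.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

class TextInIframeExample {
  // The sample input from the BACKGROUND section of this post.
  static final String INPUT =
      "<iframe><script>alert(1337)</script></iframe>";

  // Allows <iframe> but relies on the new default, under which
  // text inside <iframe> is not allowed.
  static PolicyFactory defaultPolicy() {
    return new HtmlPolicyBuilder()
        .allowElements("iframe")
        .toFactory();
  }

  // Opts back in to text inside <iframe>, approximating the
  // behavior before this change.
  static PolicyFactory textAllowedPolicy() {
    return new HtmlPolicyBuilder()
        .allowElements("iframe")
        .allowTextIn("iframe")
        .toFactory();
  }

  public static void main(String[] args) {
    // Print both results so the two policies can be compared.
    System.out.println(defaultPolicy().sanitize(INPUT));
    System.out.println(textAllowedPolicy().sanitize(INPUT));
  }
}
```

disallowTextIn takes element names the same way and turns text off
for elements where it would otherwise be allowed.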


----
User Visible Changes
----

These changes may make existing policies overly restrictive in two cases:

  * If you have a custom element policy that whitelists <script> or
<style> nodes with text content, you must now explicitly allow text in
those node types.
  * If you have a custom element policy that converts elements of one
kind into elements of another AND you do not call allowElements with
the latter tag kind, you must explicitly allowTextIn the latter kind.
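
The second case might look something like the sketch below. The
<center> to <div> conversion and the names ConvertElementExample,
CENTER_TO_DIV and policy are hypothetical, chosen only to show where
the extra allowTextIn call would go; ElementPolicy and its apply
method are the library's.

```java
import java.util.List;

import org.owasp.html.ElementPolicy;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

class ConvertElementExample {
  // Hypothetical custom element policy: rewrite <center> as <div>,
  // leaving the attribute list untouched.
  static final ElementPolicy CENTER_TO_DIV = new ElementPolicy() {
    @Override
    public String apply(String elementName, List<String> attrs) {
      return "div";
    }
  };

  static PolicyFactory policy() {
    return new HtmlPolicyBuilder()
        // Only "center" is passed to allowElements, so "div" is
        // never registered as an allowed element by name.
        .allowElements(CENTER_TO_DIV, "center")
        // Explicitly allow text in the element kind that is
        // actually emitted.
        .allowTextIn("div")
        .toFactory();
  }

  public static void main(String[] args) {
    System.out.println(policy().sanitize("<center>hello</center>"));
  }
}
```

Calling allowElements("div") as well would make the explicit
allowTextIn unnecessary, at the cost of also accepting <div> in the
input.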


----
BACKGROUND
----

Previously, a custom sanitizer policy that allowed <iframe> elements
would allow iframe elements to contain text.  For example,

    <iframe><script>alert(1337)</script></iframe>

is an <iframe> element in HTML that contains a single text node whose
text looks like a <script> element.  This does not cause script
execution in browsers loading the document as HTML, so is not a
vulnerability.

This policy could interact badly, though, when combined with
tag-strippers or when the document is loaded as XHTML.  To reduce
the number of ways a custom element policy can fail, I decided to break
backwards compatibility in a way that should affect few if any existing
policies and that fails in the stricter direction.

Please report any problems on the issue tracker (
http://code.google.com/p/owasp-java-html-sanitizer/issues/list ).
Discuss at https://groups.google.com/forum/?fromgroups#!forum/owasp-java-html-sanitizer-support


----
PRIOR DISCUSSION
----

https://groups.google.com/forum/?fromgroups=#!topic/owasp-java-html-sanitizer-support/BImfwVN9wWs

2012/11/20 Mike Samuel <...>:
> Revision 132 changes the policy builder to elide <iframe> content by
> default.  More generally, it adds to the policy builder methods to
> allow/disallow text content in various elements, and changed
> allowElements so that text content is allowed in any allowed elements
> that can contain flow or block content or human readable raw text
> content (for title & textarea).
>
> This might change the behavior of some custom policies so I'm not
> going to push a release to maven just before the start of a major US
> holiday without a good reason, so you can patch and test if you like,
> but it won't replace the top maven version until there's less chance
> of causing regressions while people are on vacation.