Issue 8 in owasp-java-html-sanitizer: child elements are moved out of their parents


owasp-java-h...@googlecode.com

Feb 1, 2013, 7:10:49 AM
to owasp-java-html-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 8 by matzep...@gmail.com: child elements are moved out of their
parents
http://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=8

> What steps will reproduce the problem?
Execute the attached testcase

> What is the expected output? What do you see instead?
When sanitizing, the sanitizer moves inner elements out of their parents
under certain circumstances (see examples in testcase).

I want the sanitizer to leave the markup structure unchanged and only
remove content that is not allowed.

> What version of the product are you using? On what operating system?
r135 / linux


Attachments:
OwaspSanitizerBugs.java 400 bytes

owasp-java-h...@googlecode.com

Feb 1, 2013, 7:15:20 AM
to owasp-java-html-...@googlegroups.com

Comment #1 on issue 8 by matzep...@gmail.com: child elements are moved out
updated testcase

Attachments:
OwaspSanitizerBugs.java 407 bytes

owasp-java-h...@googlecode.com

Feb 2, 2013, 1:19:43 AM
to owasp-java-html-...@googlegroups.com
Updates:
Status: Invalid

Comment #2 on issue 8 by mikes...@gmail.com: child elements are moved out
Entering the first example into http://html5.validator.nu/
<p>123<p>abcdefg</p>456</p>
gives
Error: No p element in scope but a p end tag seen.
From line 1, column 24; to line 1, column 27
efg</p>456</p>↩

because the </p> at the end doesn't close a tag. The second <p> closes the
first <p> per HTML5 parsing rules.
http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody
says

"""
A start tag whose tag name is one
of: "address", "article", "aside", "blockquote", "center", "details", "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure", "footer", "header", "hgroup", "main", "menu", "nav", "ol", "p", "section", "summary", "ul"
If the stack of open elements has a p element in button scope, then act as
if an end tag with the tag name "p" had been seen.

Insert an HTML element for the token.
"""

which means that when a <p> is seen inside a <p>, an implicit </p> is seen,
so
<p>123<p>abcdefg</p>456</p>
is equivalent to
<p>123</p><p>abcdefg</p>456

which is what the HTML sanitizer produces.

By understanding browser tag nesting rules, the sanitizer avoids a lot of
ambiguity in HTML, and can produce output that will be consistently and
safely interpreted by a variety of browsers.
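
The implicit end tag rule quoted above can be illustrated with a small, self-contained sketch. This is not the sanitizer's code: the class name ImplicitParagraphClose and the method normalize are hypothetical, the tokenizer is a simple regex that only recognizes attribute-free tags, and every element other than <p> is passed through untouched. It also drops a stray </p> instead of creating an empty paragraph, which matches the output shown in this thread rather than the full HTML5 algorithm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of owasp-java-html-sanitizer.
final class ImplicitParagraphClose {
    // Matches either an attribute-free tag such as <p>, </p>, <br/> or a run of text.
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\s*/?>|[^<]+");

    static String normalize(String html) {
        StringBuilder out = new StringBuilder();
        boolean pOpen = false;  // is a <p> currently open?
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String name = m.group(2);
            if (name == null || !name.equalsIgnoreCase("p")) {
                out.append(m.group());  // text or non-<p> tag: copy as-is
                continue;
            }
            boolean isEndTag = !m.group(1).isEmpty();
            if (isEndTag) {
                if (pOpen) {
                    out.append("</p>");
                    pOpen = false;
                }
                // A </p> with no open <p> is dropped in this sketch.
            } else {
                if (pOpen) {
                    out.append("</p>");  // implicit end tag before the new <p>
                }
                out.append("<p>");
                pOpen = true;
            }
        }
        if (pOpen) {
            out.append("</p>");  // close a paragraph left open at end of input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <p>123</p><p>abcdefg</p>456
        System.out.println(normalize("<p>123<p>abcdefg</p>456</p>"));
    }
}
```

The sketch tracks a single flag instead of a full stack of open elements, so it is only meaningful for flat sequences of paragraphs like the one in the test case.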

----

Sanitizers.BLOCKS.sanitize("<div><meta/><p>abcdefg</p></div>")

should not produce

"<div><meta/><p>abcdefg</p></div>"

since <meta> is not a block tag, and is not even allowed in the body.

----

Marking this bug invalid. Please reopen if you feel this was in error.

owasp-java-h...@googlecode.com

Feb 4, 2013, 7:30:57 AM
to owasp-java-html-...@googlegroups.com

Comment #3 on issue 8 by matzep...@gmail.com: child elements are moved out
The paragraph example was only added to illustrate the sanitizer's
behaviour.

However, the meta tag is a real problem for us: Thunderbird generates
markup like "<blockquote><meta></blockquote>" all the time, and we have to
display it correctly for our users. This becomes hard because the sanitizer
restructures the markup when it removes the meta tag. I just want the
sanitizer to remove the meta tag and nothing else, which is currently not
possible.
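
One possible workaround, sketched here as an assumption and not as something the library offers, is to strip <meta> tags from the input before it reaches the sanitizer. The class MetaStripper and the method stripMeta are hypothetical names. The regex is deliberately naive (an attribute value containing ">" would confuse it), so the result must still be passed through the sanitizer afterwards and is not safe to display on its own.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; not part of owasp-java-html-sanitizer.
final class MetaStripper {
    // Matches <meta>, <meta/>, <META ...> but not, for example, <metadata>.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Removes every <meta> tag and leaves all other markup untouched.
    static String stripMeta(String html) {
        return META_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // prints <blockquote></blockquote>
        System.out.println(stripMeta("<blockquote><meta></blockquote>"));
    }
}
```

The stripped string would then be handed to the existing policy, for example Sanitizers.BLOCKS.sanitize(...), in place of the raw input.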

Please reopen as I'm not allowed to...

Kind regards
Matthias

owasp-java-h...@googlecode.com

Feb 5, 2013, 4:42:13 PM
to owasp-java-html-...@googlegroups.com
Updates:
Status: New

Comment #4 on issue 8 by mikes...@gmail.com: child elements are moved out
Reopened.

Is the problem that you're doing something like

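import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Custom policy: common block elements plus <meta>
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowCommonBlockElements()
    .allowElements("meta")
    .toFactory();
String htmlSnippet = "<blockquote><meta></blockquote>";
String sanitized = policy.sanitize(htmlSnippet);
System.out.println(sanitized);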

and you get

<blockquote></blockquote></body><meta />

?

owasp-java-h...@googlecode.com

Jul 24, 2013, 12:00:40 PM
to owasp-java-html-...@googlegroups.com
Updates:
Status: WontFix
Owner: mikes...@gmail.com

Comment #5 on issue 8 by mikes...@gmail.com: child elements are moved out
Closing for lack of response. Re the attached test case:

> assertEquals("<p>123<p>abcdefg</p>456</p>",
>
> Sanitizers.BLOCKS.sanitize("<p>123<p>abcdefg</p>456</p>"));

the test golden is invalid. <p> tags do not nest in HTML.

> assertEquals("<div><meta/><p>abcdefg</p></div>",
>
> Sanitizers.BLOCKS.sanitize("<div><meta/><p>abcdefg</p></div>"));

is also invalid, since <meta> elements cannot be direct children of <div>
elements. You can white-list <meta> elements if you like using a custom
policy, but <meta> is not a block element, so it should not be in
Sanitizers.BLOCKS.



owasp-java-h...@googlecode.com

Sep 29, 2014, 3:08:31 AM
to owasp-java-html-...@googlegroups.com

Comment #6 on issue 8 by a.chichi...@semrush.com: child elements are moved
out of their parents
https://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=8

Hello everyone!

We have a similar behaviour in this case:
assertEquals("<h1>TEXT</h1>",
Sanitizers.BLOCKS.sanitize("<H1><center>TEXT</H1>"));

For this one the result is:
<h1></h1>TEXT
instead of:
<h1>TEXT</h1>

But test case:
assertEquals("<h1>TEXT</h1>",
Sanitizers.BLOCKS.sanitize("<H1></center>TEXT</H1>"));

works as expected:
<h1>TEXT</h1>

What's wrong with the first one?

I would appreciate your feedback on this case.
