Issue 15 in owasp-java-html-sanitizer: Single and double quotes are being transformed

owasp-java-h...@googlecode.com

Jun 24, 2013, 12:55:31 PM
to owasp-java-html-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 15 by jcmathe...@gmail.com: Single and double quotes are being
transformed
http://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=15

I had hijacked another issue and was asked to create a new one :) After
writing several tests, it's simpler than I thought.

What steps will reproduce the problem?
1. Pass an input string with a ' or " in it
2. Comes back escaped as &#39; or &#34;

What is the expected output? What do you see instead?
I expect my input to come back with the ' or " in it.

What version of the product are you using? On what operating system?
Using version r164 on Mac OS X Mountain Lion

Please provide any additional information below.
The code is quite basic:

HtmlPolicyBuilder builder = new HtmlPolicyBuilder();
PolicyFactory factory = builder.toFactory();
String sanitized = factory.sanitize(input);
return sanitized;





owasp-java-h...@googlecode.com

Jun 24, 2013, 1:03:53 PM
to owasp-java-html-...@googlegroups.com

Comment #1 on issue 15 by mikes...@gmail.com: Single and double quotes
It is probably unnecessary to escape these characters in HTML text nodes,
though it is necessary in attribute bodies.

How is this causing problems though?

owasp-java-h...@googlecode.com

Jun 24, 2013, 2:14:45 PM
to owasp-java-html-...@googlegroups.com

Comment #2 on issue 15 by jcmathe...@gmail.com: Single and double quotes
It's causing problems because user-entered text is being returned to them
in a format different than they entered. I've put in a hack to unescape the
sanitized text which will fix this, but I'd like to not have to do that as
this might have unforeseen consequences.

owasp-java-h...@googlecode.com

Jun 24, 2013, 2:43:52 PM
to owasp-java-html-...@googlegroups.com

Comment #3 on issue 15 by mikes...@gmail.com: Single and double quotes
jcmather21, without more context, I really can't help. Can you give an
example of user-entered text and explain why a different, but semantically
equivalent form is problematic?

If your hack involves replacing `&#34;` with `"` and you white-list the "b"
element and an attribute like "title" then you might have problems with
inputs like

<b title='foo " onmouseover="alert(1337)'>Foo</b>

being sanitized to

<b title="foo &#34; onmouseover=&#34;alert(1337)">Foo</b>

and then you might hack that back to

<b title="foo " onmouseover="alert(1337)">Foo</b>

which executes script when the user's mouse passes over the text.
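
The hazard described above can be reproduced with plain string handling. This is a minimal sketch, not the reporter's actual code: the class name `UnescapeHazardDemo` and the helper `naiveUnescapeQuotes` are hypothetical, and the sanitized string is hard-coded from the example in this comment rather than produced by the sanitizer.

```java
public class UnescapeHazardDemo {
    // Hypothetical stand-in for the "unescape hack": naively turns the
    // numeric quote entities back into literal quote characters.
    static String naiveUnescapeQuotes(String sanitizedHtml) {
        return sanitizedHtml.replace("&#34;", "\"").replace("&#39;", "'");
    }

    public static void main(String[] args) {
        // Sanitizer output as quoted in the comment above (hard-coded here).
        String sanitized =
            "<b title=\"foo &#34; onmouseover=&#34;alert(1337)\">Foo</b>";

        // After the hack, the quote inside the title value closes the
        // attribute early and onmouseover becomes a real attribute.
        String hacked = naiveUnescapeQuotes(sanitized);
        System.out.println(hacked);
    }
}
```

The escaped quotes are what keep the attacker-supplied text inside the title value, so undoing the escaping on sanitized HTML undoes the protection.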

owasp-java-h...@googlecode.com

Jun 24, 2013, 3:03:10 PM
to owasp-java-html-...@googlegroups.com

Comment #4 on issue 15 by jcmathe...@gmail.com: Single and double quotes
Yes sorry. I'm thinking outside the scope of html in the input string. For
example, a sentence like this:
He said, "This is the best place ever!"

I want that to be returned exactly like that, but it comes back as:
He said, &#34;This is the best place ever!&#34;

For my use case, I am taking user-entered description text and I want all
html, script, css removed (we only support plaintext now). But I want
regular text, left as is. So in your above example:
<b title='foo " onmouseover="alert(1337)'>Foo</b>
I want back
Foo

owasp-java-h...@googlecode.com

Jun 24, 2013, 3:09:53 PM
to owasp-java-html-...@googlegroups.com

Comment #5 on issue 15 by mikes...@gmail.com: Single and double quotes
The HTML sanitizer takes messy-unsafe HTML and gives back well-formed-safe
HTML.

It sounds like you want to get back plain text -- the innerText or
textContent -- without any tags at all.

If so, that's doable using the HTML sanitizer, but not using a method that
is advertised as returning HTML.

If that's what you need, let me know and I can knock up some example code.

owasp-java-h...@googlecode.com

Jun 24, 2013, 3:14:23 PM
to owasp-java-html-...@googlegroups.com

Comment #6 on issue 15 by jcmathe...@gmail.com: Single and double quotes
Yes that does sound like what I need so example code would be awesome on
how to accomplish this!

owasp-java-h...@googlecode.com

Jun 24, 2013, 3:43:12 PM
to owasp-java-html-...@googlegroups.com
Updates:
Status: Fixed

Comment #7 on issue 15 by mikes...@gmail.com: Single and double quotes
If you have a policy builder called myPolicyBuilder, and a string of HTML
called myHtml then

StringBuilder sb = new StringBuilder();
HtmlSanitizer.policy = myPolicyBuilder.build(new HtmlStreamEventReceiver() {
  public void openDocument() {}
  public void closeDocument() {}
  public void openTag(String elementName, List<String> attribs) {
    if ("br".equals(elementName)) { sb.append('\n'); }
  }
  public void closeTag(String elementName) {}
  public void text(String text) { sb.append(text); }
});
HtmlSanitizer.sanitize(myHtml, policy);
// sb should now contain the plain text content of the page with <br>
// replaced by newlines.

owasp-java-h...@googlecode.com

Jun 24, 2013, 3:44:42 PM
to owasp-java-html-...@googlegroups.com

Comment #8 on issue 15 by mikes...@gmail.com: Single and double quotes
Sorry. There were typos. Try

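final StringBuilder sb = new StringBuilder();
HtmlSanitizer.Policy policy = myPolicyBuilder.build(new HtmlStreamEventReceiver() {
  public void openDocument() {}
  public void closeDocument() {}
  public void openTag(String elementName, List<String> attribs) {
    if ("br".equals(elementName)) { sb.append('\n'); }
  }
  public void closeTag(String elementName) {}
  public void text(String text) { sb.append(text); }
});
HtmlSanitizer.sanitize(myHtml, policy);
// sb should now contain the plain text content of the page with <br>
// replaced by newlines.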

owasp-java-h...@googlecode.com

Jun 25, 2013, 11:49:11 AM
to owasp-java-html-...@googlegroups.com

Comment #9 on issue 15 by jcmathe...@gmail.com: Single and double quotes
I haven't yet gotten that to work for all of my use cases but thank you.

owasp-java-h...@googlecode.com

Jul 5, 2014, 3:35:13 AM
to owasp-java-html-...@googlegroups.com

Comment #10 on issue 15 by rajkumar...@gmail.com: Single and double quotes
A solution is to use StringEscapeUtils.unescapeHtml() from apache.commons
before storing in database.

owasp-java-h...@googlecode.com

Sep 10, 2014, 5:59:05 PM
to owasp-java-html-...@googlegroups.com

Comment #11 on issue 15 by cantara...@gmail.com: Single and double quotes
are being transformed
https://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=15

StringEscapeUtils.unescapeHtml() is not a solution. A clever hacker can
then use &lt;script&gt; to inject XSS.

owasp-java-h...@googlecode.com

Sep 11, 2014, 6:29:37 PM
to owasp-java-html-...@googlegroups.com

Comment #12 on issue 15 by mikes...@gmail.com: Single and double quotes
Cantata, please see comments 5&6. The string is to be used as plain text,
not HTML.
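
To illustrate the plain-text point, here is a minimal sketch of what treating the stored string as plain text might look like if it is later written into an HTML page. It assumes the text is escaped at output time. The class `PlainTextOutputDemo` and the helper `escapeHtml` are hypothetical names for illustration, not part of the sanitizer's API.

```java
public class PlainTextOutputDemo {
    // Hypothetical helper: minimal escaping for HTML text nodes and
    // quoted attribute values. A real application would likely use an
    // established encoder instead of this hand-rolled loop.
    static String escapeHtml(String plainText) {
        StringBuilder out = new StringBuilder(plainText.length());
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&#34;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Plain text as it might be stored after unescaping: it can
        // contain markup-like characters, which is fine for plain text.
        String stored = "He said, \"<script>alert(1)</script>\"";

        // Escaping when the text is embedded in HTML keeps it inert.
        System.out.println(escapeHtml(stored));
    }
}
```

Under this assumption the stored value stays exactly what the user typed, and the escaping is applied only in the context that needs it.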