Re: [owasp-sanitizer] Accepting @ in email

Mike Samuel

Jul 26, 2012, 10:37:53 AM
to owasp-java-html-...@googlegroups.com
2012/7/25 <ashanair...@gmail.com>:
> I just started using Java HTML sanitizer yesterday. But one issue is, I am
> not sure how to keep the '@' intact when a user types in their emails. I am
> creating my own policy file using HTMLPolicyBuilder. However, I am not sure
> how to keep their emails intact. I see only ways to allow attributes and
> elements. But this does not fall into any category.

I'm not sure I understand your question.
Given

<a href=mailto:happy_s...@example.com>Send Me Email!!!</a>

the sanitizer produces

<a href="mailto:happy_sanitizer&#64;example.com">Send Me Email!!!</a>

Is this what is causing you confusion? If so, when is this a problem?
In all browsers that I'm aware of, these are equivalent HTML as
demonstrated by this JavaScript:

    var div = document.createElement('div');
    div.innerHTML = '<a href=mailto:happy_s...@example.com>Send Me Email!!!</a>';
    // href reflects the parsed attribute value, with any entities decoded.
    alert(div.getElementsByTagName('a')[0].href);

and clicking the link will cause the mail plugin to launch with the
correct email addy.

If that's not your problem, please provide an example input and an
example of what you would like to see instead.

cheers,
mike

AG

Jul 26, 2012, 10:55:28 AM
to owasp-java-html-...@googlegroups.com, mikes...@gmail.com
Thanks for the email Mike. Unfortunately, that is not my problem.
I have a webpage for adding new users that accepts their names and email addresses as input. We have a Java class that receives all these parameters, in this case user_email="a...@aaa.com". This value is then sanitized via the Java HTML sanitizer. I adapted the EBAYPolicy example to suit our needs. When the email value a...@aaa.com is sanitized, it comes out as aaa&#64;aaa, which is then validated by our program. It needs to be in the format a...@aaa.com. Is there a way to allow this in the sanitizer, or do I need to handle it differently in our program so that &#64; is valid?

thanks,

Asha

Mike Samuel

Jul 26, 2012, 10:56:17 AM
to owasp-java-html-...@googlegroups.com, mikes...@gmail.com


On Thursday, July 26, 2012 10:37:53 AM UTC-4, Mike Samuel wrote:
2012/7/25  <elided>:
> I just started using Java HTML sanitizer yesterday. But one issue is, I am
> not sure how to keep the '@' intact when a user types in their emails. I am
> creating my own policy file using HTMLPolicyBuilder. However, I am not sure
> how to keep their emails intact. I see only ways to allow attributes and
> elements. But this does not fall into any category.

I saw your other email.  If you have a situation like

    <span class="email">name@.domain.tld</span>

then the sanitizer will produce

    <span class="email">name&#64;.domain.tld</span>

The problem is that the contents of the span is HTML text, not plain text.

Just not escaping '@' won't fix the underlying problem.

    Muhammed "The Greatest" Ali <muhamm...@example.name>

is a valid email address (RFC 2822 mailbox) that contains multiple HTML special characters that need to be escaped.

When you later extract the email address, just convert it from HTML to plain text.  If you're using a DOM parser, this should happen for you.  If not, there's a function in the sanitizer that decodes HTML text nodes to plain text (see HTMLEntities.java), but it's not public.  I can probably expose that if it would help.
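
For readers who need the "HTML to plain text" step without a DOM parser, here is a minimal sketch. It is not the sanitizer's internal decoder: the class name HtmlTextDecoder and its methods are hypothetical, it uses only the JDK, and it handles just numeric character references plus five common named entities. Anything it does not recognize is left unchanged.

```java
/** Hypothetical helper for illustration; not part of the OWASP sanitizer API. */
final class HtmlTextDecoder {
  private HtmlTextDecoder() {}

  /** Decodes numeric character references and five common named entities. */
  static String decode(String html) {
    StringBuilder out = new StringBuilder(html.length());
    int i = 0;
    int n = html.length();
    while (i < n) {
      char c = html.charAt(i);
      // Only look for a terminating ';' when we are at an '&'.
      int semi = (c == '&') ? html.indexOf(';', i + 1) : -1;
      String decoded = (semi > 0 && semi - i <= 10)
          ? decodeEntity(html.substring(i + 1, semi))
          : null;
      if (decoded != null) {
        out.append(decoded);
        i = semi + 1;
      } else {
        // Not a recognized reference: copy the character through untouched.
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }

  /** Returns the decoded text for an entity body such as "amp" or "#64", or null. */
  private static String decodeEntity(String name) {
    switch (name) {
      case "amp": return "&";
      case "lt": return "<";
      case "gt": return ">";
      case "quot": return "\"";
      case "apos": return "'";
      default: break;
    }
    if (name.length() < 2 || name.charAt(0) != '#') {
      return null;
    }
    try {
      boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
      int codePoint = hex
          ? Integer.parseInt(name.substring(2), 16)
          : Integer.parseInt(name.substring(1), 10);
      if (!Character.isValidCodePoint(codePoint)) {
        return null;
      }
      return new String(Character.toChars(codePoint));
    } catch (NumberFormatException e) {
      return null;
    }
  }
}
```

With this sketch, HtmlTextDecoder.decode("name&#64;example.com") gives back the address with a literal '@', which can then be validated as plain text.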

Mike Samuel

Jul 26, 2012, 10:57:25 AM
to owasp-java-html-...@googlegroups.com, mikes...@gmail.com


On Thursday, July 26, 2012 10:55:28 AM UTC-4, AG wrote:
Thanks for the email Mike. Unfortunately, that is not my problem.
I have a webpage for adding new users that accepts their names and email addresses as input. We have a Java class that receives all these parameters, in this case user_email="a...@aaa.com". This value is then sanitized via the Java HTML sanitizer. I adapted the EBAYPolicy example to suit our needs. When the email value a...@aaa.com is sanitized, it comes out as aaa&#64;aaa, which is then validated by our program. It needs to be in the format a...@aaa.com. Is there a way to allow this in the sanitizer, or do I need to handle it differently in our program so that &#64; is valid?

Is the input HTML? 

AG

Jul 26, 2012, 11:02:07 AM
to owasp-java-html-...@googlegroups.com, mikes...@gmail.com
Yes, it is.

- Asha

AG

Jul 26, 2012, 11:08:09 AM
to owasp-java-html-...@googlegroups.com, mikes...@gmail.com
I think that would help. I have a way of getting the value alone, so it's not HTML anymore. I can extract the value and pass just a single string, 'a...@aaa.com', to the sanitizer. I get the output in a StringBuilder object. I just need it to return 'a...@aaa.com'. Is that possible?

thanks,

Asha

AG

Jul 26, 2012, 11:12:45 AM
to owasp-java-html-...@googlegroups.com, mikes...@gmail.com
Sorry Mike, I meant that I can extract the value of the email address alone and pass it to the sanitizer function. I just don't want the sanitizer to escape '@'. I want it preserved as is.

thanks,

Asha

Mike Samuel

Jul 26, 2012, 1:20:30 PM
to AG, owasp-java-html-...@googlegroups.com
2012/7/26 AG <ashanair...@gmail.com>:
> Sorry Mike, I meant that I can extract the value of the email address alone
> and pass it to the sanitizer function. I just don't want the sanitizer to
> escape '@'. I want it preserved as is.

What is your goal in passing an email address to an HTML sanitizer?

AG

Jul 30, 2012, 2:47:24 AM
to owasp-java-html-...@googlegroups.com, AG, mikes...@gmail.com
Hi Mike,

We do this to prevent malicious values from being processed and saved to the database, so we pass every value the server receives through the sanitizer before our code processes it. I am extracting the text values. However, I want characters like '@' and '=' to be preserved and not replaced by their escaped equivalents. Currently, I use a utility class to unescape them after I receive the value from the sanitizer if it happens to be an email address. Is it possible to tell the sanitizer itself not to escape special characters like '@' and '=', so that I don't have to unescape them later? I know we can allow them when they appear in an attribute of a particular element, but is it possible to do so for a value or text?

Appreciate the pointers.

thanks,

Asha

Mike Samuel

Jul 30, 2012, 10:37:17 AM
to AG, owasp-java-html-...@googlegroups.com
2012/7/30 AG <ashanair...@gmail.com>:
> Hi Mike,
>
> We do this to prevent malicious values from being processed and saved to
> database. So we are passing all the values received by the server to the
> sanitizer before being processed by our code. I am extracting the texts.
> However, I do want to allow values like '@' and '=' to be preserved and not
> be replaced by the equivalent escape characters. Currently, I am using a
> utility class to un-escape them once I receive the value from the sanitizer
> if it happens to be an email address. Is it possible to specify this in the
> sanitizer itself, not to escape special characters like '@' and '=' rather
> than me having to unescape these characters later? I know we can specify to
> allow it if it comes up as an attribute of a particular element, but is it
> possible to specify it for a value or text?

The HTML sanitizer outputs HTML, not plain text. If you need plain
text as a result, you need to decode entities in the output.

This escaping is not configurable. It happens in
HtmlStreamRenderer.escapeHtmlOnto which uses a lookup table
(http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/main/org/owasp/html/HtmlStreamRenderer.java#399)
to decide which characters to escape.

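Line 414 escapes '@':

    REPLACEMENTS['@'] = "&#" + ((int) '@') + ";"; // Conditional compilation.

because the '@' symbol is often used by browsers to enable special
extensions that should not be available to untrusted content. The code
comment refers to conditional compilation, which Internet Explorer's
JScript triggers with '@' directives such as @cc_on.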
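
To make the lookup-table idea concrete, here is a minimal sketch of a table-driven escaper. It is not a copy of HtmlStreamRenderer: the class name TableEscaper is hypothetical, and the set of escaped characters is an assumption for illustration. Only the '@' entry mirrors the line quoted above.

```java
/** Illustrative only: a table-driven escaper in the style described above. */
final class TableEscaper {
  // One slot per ASCII code unit; null means "emit the character unchanged".
  private static final String[] REPLACEMENTS = new String[0x80];

  static {
    // Assumed set of characters for this sketch, not the library's real table.
    REPLACEMENTS['&'] = "&amp;";
    REPLACEMENTS['<'] = "&lt;";
    REPLACEMENTS['>'] = "&gt;";
    REPLACEMENTS['"'] = "&#" + ((int) '"') + ";";
    REPLACEMENTS['\''] = "&#" + ((int) '\'') + ";";
    // Same shape as the '@' line quoted from the sanitizer source.
    REPLACEMENTS['@'] = "&#" + ((int) '@') + ";";
  }

  private TableEscaper() {}

  /** Escapes plain text so it can be emitted as HTML text or an attribute value. */
  static String escape(String plainText) {
    StringBuilder out = new StringBuilder(plainText.length() + 16);
    for (int i = 0; i < plainText.length(); i++) {
      char c = plainText.charAt(i);
      // Characters outside the table (non-ASCII) are passed through as-is.
      String replacement = c < REPLACEMENTS.length ? REPLACEMENTS[c] : null;
      if (replacement != null) {
        out.append(replacement);
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }
}
```

Because the table is consulted for every character of the output, there is no policy hook at this level, which is consistent with the escaping not being configurable.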

AG

Jul 30, 2012, 6:41:19 PM
to owasp-java-html-...@googlegroups.com
Thanks Mike. I will stick with the utility function approach then.

-Asha

On Wednesday, July 25, 2012 6:03:15 PM UTC-7, AG wrote:
Hello, 

I just started using Java HTML sanitizer yesterday. But one issue is, I am not sure how to keep the '@' intact when a user types in their emails. I am creating my own policy file using HTMLPolicyBuilder. However, I am not sure how to keep their emails intact. I see only ways to allow attributes and elements. But this does not fall into any category.

Any pointers are appreciated.

thanks,

Asha

AG

Aug 22, 2012, 4:42:41 PM
to owasp-java-html-...@googlegroups.com
Hi Mike,

On a similar note, our application allows text containing '<' and '>'. Users can enter these characters in a description, such as '<place name here>'. However, the sanitizer strips such text out, as expected. Currently, I convert only those characters to their escaped equivalents before passing the value to the sanitizer. Is it possible to achieve this in the policy definition itself?

thanks,

Asha

Mike Samuel

Aug 22, 2012, 4:47:55 PM
to owasp-java-html-...@googlegroups.com
2012/8/22 AG <ashanair...@gmail.com>:
> Hi Mike,
>
> On a similar note, we do allow text containing '<' and '>' in our
> application. User can put these characters for adding description like
> '<place name here>'. However, the sanitizer strips it off as expected. Is
> there a way to do this in the definition policy itself? Currently, I am
> converting only those values to their escape characters before passing it to
> the sanitizer itself. Is it possible to achieve it in the policy itself?

This sanitizer takes HTML of unknown provenance and returns HTML that
is safe to embed in a web-page.

If your input is not HTML, then you need to convert it to HTML (by
converting '&' to "&amp;", '<' to "&lt;", etc.) before passing it
through the sanitizer.

If you want an output that is not HTML, then you need to convert the
output from HTML by doing the reverse: replacing "&amp;" with "&",
etc.

If neither your input nor your output is HTML, then you don't need an
HTML sanitizer.
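
The two conversions described above can be sketched with the JDK alone. The class and method names below are hypothetical, and the sketch covers only the four named entities shown. Real sanitizer output also contains numeric references such as &#64;, which need a fuller decoder. The order of the replacements matters: '&' is encoded first and "&amp;" is decoded last, so text is never escaped or unescaped twice.

```java
/** Hypothetical helper names for illustration; only the JDK is used. */
final class PlainTextHtml {
  private PlainTextHtml() {}

  /** Plain text to HTML text. '&' goes first so later output is not re-escaped. */
  static String toHtml(String plainText) {
    return plainText
        .replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;");
  }

  /** HTML text to plain text for the same four entities. "&amp;" is decoded last. */
  static String toPlainText(String html) {
    return html
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&");
  }
}
```

For example, PlainTextHtml.toHtml("<place name here>") produces text that a sanitizer can treat as HTML text instead of an unknown tag, and toPlainText reverses the conversion on the way out.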