Value starting with & (like '&#x1') gets stripped from the string

sajid....@gmail.com

May 21, 2018, 9:04:22 AM

to OWASP Java HTML Sanitizer Support

Hi All,

I have a query string containing the value '&#x1' (which starts with &), and that value gets stripped when I sanitize the string.

I'm creating the policy factory like this:

public static final PolicyFactory POLICY_FACTORY = new HtmlPolicyBuilder().toFactory();

Please do the needful.

Thanks

Mike Samuel

May 21, 2018, 9:07:00 AM

to sajid....@gmail.com, OWASP Java HTML Sanitizer Support

What is your input?

What do you get as output?

What would you expect to get as output were everything functioning properly?

http://sscce.org/


sajid....@gmail.com

May 21, 2018, 11:36:39 AM

to OWASP Java HTML Sanitizer Support

Input: PASSING XMLTYPE(REPLACE(cr.Form_Data,'&#x1', ''))

Output: PASSING XMLTYPE(REPLACE(cr.Form_Data,'', ''))

As you can see, the string '&#x1' has been removed by sanitization.

Expected: the input should pass through unchanged.

sajid....@gmail.com

May 23, 2018, 11:43:23 AM

to OWASP Java HTML Sanitizer Support

Is this a bug, and is there any workaround for it?

Please do the needful.

Thanks

On Monday, May 21, 2018 at 6:34:22 PM UTC+5:30, sajid....@gmail.com wrote:

Mike Samuel

May 23, 2018, 11:53:06 AM

to OWASP Java HTML Sanitizer Support

On Wed, May 23, 2018 at 2:08 AM, <sajid....@gmail.com> wrote:

Is this a bug, and is there any workaround for it?

I can reproduce your test case, but it's working as intended.

Control characters have tickled a lot of parser bugs in the past.

This project made a conscious decision to emit HTML fragments that are as close to well-formed XML fragments as possible, to make sure that user agents parse them as intended.

Emitting control characters other than #x9, #xA, and #xD (tab, line feed, and carriage return) would violate that constraint.
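For reference, the set of characters XML 1.0 allows (its Char production) is small enough to write as a predicate. The following is a minimal Java sketch of that rule; XmlChars, isXmlChar and stripNonXmlChars are hypothetical names, and this is not the sanitizer's actual code, only an illustration of why a U+0001 gets dropped rather than emitted.

```java
/**
 * Hypothetical helper (not part of the OWASP sanitizer): models the XML 1.0
 * Char production and drops every code point that falls outside it.
 */
final class XmlChars {

    private XmlChars() {}

    /**
     * XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns s with every code point that is not a legal XML Char removed. */
    static String stripNonXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // codePoints() reports unpaired surrogates as their own values
        // (0xD800-0xDFFF), which isXmlChar rejects, so they are dropped too.
        s.codePoints()
                .filter(XmlChars::isXmlChar)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', U+0001, 'b', TAB, 'c'
        String input = "a" + (char) 0x1 + "b\tc";
        System.out.println(stripNonXmlChars(input).equals("ab\tc")); // true
    }
}
```

Tab, line feed and carriage return survive; U+0001, like every other C0 control, does not.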

XML defines which characters are legal in a document (its Char production), and https://www.w3.org/TR/xml/#dt-charref says

CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'    [WFC: Legal Character]
Well-formedness constraint: Legal Character

Characters referred to using character references must match the production for Char.

so a reference to U+0001 such as &#x1; is not a valid XML character reference, and not one that the sanitizer emits.
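The grammar and the Legal Character constraint can be checked together. Below is a minimal Java sketch, assuming the input is a single candidate reference; CharRefCheck and isWellFormedCharRef are hypothetical names, not sanitizer API. It shows that &#x1; satisfies the CharRef syntax but fails the constraint, while the '&#x1' from the question, lacking the ';', does not match the XML grammar at all.

```java
import java.math.BigInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical checker (not part of the OWASP sanitizer) for XML 1.0's
 *   CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
 * together with the "Legal Character" well-formedness constraint.
 */
final class CharRefCheck {

    private CharRefCheck() {}

    // Group 1 holds decimal digits, group 2 holds hex digits.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:([0-9]+)|x([0-9a-fA-F]+));");

    private static final BigInteger MAX_CODE_POINT = BigInteger.valueOf(0x10FFFF);

    /** XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True only if s is exactly one well-formed XML character reference. */
    static boolean isWellFormedCharRef(String s) {
        Matcher m = CHAR_REF.matcher(s);
        if (!m.matches()) {
            return false; // violates the CharRef grammar, e.g. no trailing ';'
        }
        boolean decimal = m.group(1) != null;
        // BigInteger avoids overflow on very long digit strings.
        BigInteger cp = new BigInteger(decimal ? m.group(1) : m.group(2), decimal ? 10 : 16);
        // Legal Character WFC: the referenced code point must match Char.
        return cp.compareTo(MAX_CODE_POINT) <= 0 && isXmlChar(cp.intValue());
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedCharRef("&#x41;")); // true  ('A')
        System.out.println(isWellFormedCharRef("&#x1;"));  // false (U+0001 is not a Char)
        System.out.println(isWellFormedCharRef("&#x1"));   // false (missing ';')
    }
}
```

Note that the XML grammar only accepts a lowercase 'x', so &#X41; is rejected here as well.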

If this is supposed to be code in some language, consider using that language's built-in mechanism to escape U+0001.
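For example, if the target language were Java, U+0001 would be written inside a string literal with Java's own escape \u0001 rather than as an HTML character reference. Below is a minimal sketch of a hypothetical helper, JavaLiteral.toJavaLiteral, that renders text that way; its output contains no raw control characters, so there is nothing of that kind left for an XML-oriented filter to drop. Which escaping mechanism applies depends on the actual language, which the thread does not pin down.

```java
/**
 * Hypothetical helper: quotes a string as a Java string literal, writing
 * control characters with Java's backslash-u escape syntax instead of
 * HTML character references.
 */
final class JavaLiteral {

    private JavaLiteral() {}

    static String toJavaLiteral(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                // Quote and backslash must themselves be escaped inside a literal.
                out.append('\\').append(c);
            } else if (c < 0x20 || c == 0x7F) {
                // Control character: emit a six-character escape (backslash, 'u', 4 hex digits).
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // 'A', U+0001, 'B'
        String value = "A" + (char) 0x1 + "B";
        System.out.println(toJavaLiteral(value));
    }
}
```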
