Value which starts with & like '&#x1' gets ignored in the string


sajid....@gmail.com

May 21, 2018, 9:04:22 AM
to OWASP Java HTML Sanitizer Support
Hi All,

I have a query string containing the value '&#x1' (which starts with &), and that value gets dropped when I sanitize the string.
I'm creating the policy factory like this:
   public static final PolicyFactory POLICY_FACTORY = new HtmlPolicyBuilder().toFactory();


    
Please do needful.


Thanks

Mike Samuel

May 21, 2018, 9:07:00 AM
to sajid....@gmail.com, OWASP Java HTML Sanitizer Support
What is your input?
What do you get as output?
What would you expect to get as output were everything functioning properly?


sajid....@gmail.com

May 21, 2018, 11:36:39 AM
to OWASP Java HTML Sanitizer Support
Input:  PASSING XMLTYPE(REPLACE(cr.Form_Data,'&#x1', ''))
Output: PASSING XMLTYPE(REPLACE(cr.Form_Data,'', ''))
As you can see, the string '&#x1' has been removed by the sanitizer.

Expectation:  There should not be any change in the above input.

sajid....@gmail.com

May 23, 2018, 11:43:23 AM
to OWASP Java HTML Sanitizer Support
Is this a bug, and is there any workaround to fix it?

Please do needful.
Thanks



Mike Samuel

May 23, 2018, 11:53:06 AM
to OWASP Java HTML Sanitizer Support
On Wed, May 23, 2018 at 2:08 AM, <sajid....@gmail.com> wrote:
Is this a bug, and is there any workaround to fix it?

I can reproduce your test case, but it's working as intended.

Control characters have tickled a lot of parser bugs in the past.

This project made a conscious decision to emit HTML fragments that are as close to
well-formed XML fragments as possible to make sure that user-agents parse them as intended.

Emitting control characters other than #x9, #xA, and #xD would violate that constraint.

XML defines

    Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]


    CharRef   ::=   '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'    [WFC: Legal Character]
    Well-formedness constraint: Legal Character
    Characters referred to using character references must match the production for Char.

so &#x1; is not a valid XML character reference and not one that the sanitizer emits.
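
As a rough illustration of the Char production quoted above, the range check can be written in plain Java as below. This is only a sketch and not the sanitizer's actual code; the class name `XmlCharCheck` and the method `isXmlChar` are made up for this example.

```java
public class XmlCharCheck {
    /**
     * Returns true if the code point matches the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     * (Hypothetical helper for illustration only.)
     */
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // U+0001 is a control character outside every allowed range.
        System.out.println(isXmlChar(0x1));   // false
        // Tab, line feed, and carriage return are the only allowed C0 controls.
        System.out.println(isXmlChar(0x9));   // true
        System.out.println(isXmlChar(0xA));   // true
        System.out.println(isXmlChar(0xD));   // true
        // Ordinary printable characters are fine.
        System.out.println(isXmlChar('A'));   // true
    }
}
```

Under this check, a character reference such as &#x1; fails the Legal Character constraint because its code point, 1, is not a Char.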

If this is supposed to be code in some language, maybe use that language's built-in mechanism to escape U+0001.
