When the list of open div tags is very large, the sanitizer removes other tags that actually have content.


Rasmita Mahapatra

Jun 22, 2020, 6:51:49 AM
to OWASP Java HTML Sanitizer Support
I have an HTML document that is wrapped in a long list of empty <div> tags, and the sanitizer is removing a portion of it. When I removed the empty <div> tags from the HTML, the sanitizer no longer stripped that portion.
Below is the HTML where the issue is seen. Please suggest what can be done about it.
<div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div dir="rtl"><div class="gmail_quote"><div class="gmail_attr" style="text-align:right" dir="ltr"><b><span lang="HE" style="font-family:&quot;David&quot;,&quot;sans-serif&quot;;font-size:14pt">,חבר יקר</span></b></div><div dir="rtl"><div dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div>Rasmita07<div>Rasmita08<div>Rasmita09<div>Rasmita010<div><div><div><div><div><div>Rasmita011<div><div>Rasmita012<div><div><div>Rasmita00<div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Rasmita000<div>Rasmita0000<div><div><div>Rasmita1 \r\n </div><div></div>Testing Sanitizer is removing divs when the number of open tags are very high<div></div><div><p class="MsoNormal" style="text-align:right;unicode-bidi:embed;direction:rtl" dir="RTL"><span lang="HE"> 
<u></u><u></u></span></p></div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></d
iv><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div lang="EN-US"></div><div lang="EN-US"></div><div lang="EN-US"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div 
class="gmail_quote"></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div> 

Post sanitization

<div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div 
class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div class="gmail_quote"><div dir="rtl"><div dir="rtl"><div class="gmail_quote"><div class="gmail_attr" style="text-align:right" dir="ltr"><b><span lang="HE" style="font-family:&#39;david&#39; , &#39;sans-serif&#39;;font-size:14pt">,חבר יקר</span></b></div><div dir="rtl"><div dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div class="gmail_quote" dir="rtl"><div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div><div><div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div 
lang="EN-US"></div><div lang="EN-US"></div><div lang="EN-US"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote" dir="rtl"></div></div><div class="gmail_quote"></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
</div></div></div></div></div> 
Result after sanitization:
Everything after Rasmita06 has been removed.

Next, I removed the unneeded divs from the top, along with their closing tags. The results are below.

<div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div>Rasmita07<div>Rasmita08<div>Rasmita09<div>Rasmita010<div><div><div><div><div><div>Rasmita011<div><div>Rasmita012<div><div><div>Rasmita00<div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Rasmita000<div>Rasmita0000<div><div><div>Rasmita1 \r\n </div><div></div>Testing Sanitizer is removing divs when the number of open tags are very high<div></div><div><p class="MsoNormal" style="text-align:right;unicode-bidi:embed;direction:rtl" dir="RTL"><span lang="HE"> <u></u><u></u></span></p></div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div>
</div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div> 

Post sanitization:
<div lang="EN-US">Rasmita01<div><div><div><div>Rasmita02<div><div><div><div><div>Rasmita03<div>Rasmita04<div><div><div><div>Rasmita05<div><div><div><div>Rasmita06<div>Rasmita07<div>Rasmita08<div>Rasmita09<div>Rasmita010<div><div><div><div><div><div>Rasmita011<div><div>Rasmita012<div><div><div>Rasmita00<div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Rasmita000<div>Rasmita0000<div><div><div>Rasmita1 \r\n </div><div></div>Testing Sanitizer is removing divs when the number of open tags are very high<div></div><div><p class="MsoNormal" style="text-align:right;unicode-bidi:embed;direction:rtl" dir="RTL"><span lang="HE"> <u></u><u></u></span></p></div><div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div>
</div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div><div></div><div></div><div></div></div> 
Here, the content after Rasmita06 is not removed.

Is this a bug in the sanitizer, or is something in the HTML causing the issue?


양봉수

Jun 23, 2020, 9:25:53 PM
to OWASP Java HTML Sanitizer Support
Hello, I am a sanitizer user.
In HtmlSanitizer, the nesting limit is set to 256:

```java
// Excerpt from HtmlSanitizer.java (surrounding code elided)
private static HtmlStreamEventReceiver initializePolicy(
    Policy policy, HtmlStreamEventProcessor preprocessor) {
  TagBalancingHtmlStreamEventReceiver balancer
      = new TagBalancingHtmlStreamEventReceiver(policy);
  // ...
  balancer.setNestingLimit(256);
  // ...
}
```

When the text Rasmita06 is processed, openElements.size() is 255. The next open tag brings it to the limit, so the text after that point is not added.

```java
// Excerpt from the tag balancer (surrounding code elided)
public void text(String text) {
  // ...
  if (openElements.size() < nestingLimit) {
    underlying.text(text);
  }
}
```
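To make the effect of that check concrete, here is a minimal, self-contained model of it. This is not the library's code: `NestingLimitModel` and its methods are hypothetical names, and the class only mimics the quoted `openElements.size() < nestingLimit` test by tracking depth with a counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a depth-limited text receiver; not the sanitizer's code. */
final class NestingLimitModel {
  private final int nestingLimit;
  private int openElements = 0;  // stands in for openElements.size()
  private final List<String> emitted = new ArrayList<>();

  NestingLimitModel(int nestingLimit) {
    this.nestingLimit = nestingLimit;
  }

  void openTag() {
    openElements++;
  }

  void closeTag() {
    if (openElements > 0) {
      openElements--;
    }
  }

  /** Mirrors the quoted check: text is forwarded only while below the limit. */
  void text(String text) {
    if (openElements < nestingLimit) {
      emitted.add(text);
    }
  }

  List<String> emitted() {
    return emitted;
  }

  public static void main(String[] args) {
    NestingLimitModel model = new NestingLimitModel(256);
    for (int i = 0; i < 255; i++) {
      model.openTag();
    }
    model.text("kept");     // depth 255 is below 256
    model.openTag();
    model.text("dropped");  // depth 256 is not below 256
    System.out.println(model.emitted());  // prints [kept]
  }
}
```

Under this model, text is lost only while the depth is at or above the limit, which matches the observation that removing the outer wrapper divs makes the missing text reappear.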
Attachment: Screenshot 2020-06-23 5.18.13 PM.png


On Monday, June 22, 2020 at 7:51:49 PM UTC+9, Rasmita Mahapatra wrote:

Jacob Shields

Jun 23, 2020, 9:26:07 PM
to OWASP Java HTML Sanitizer Support
Hi Rasmita,

I was able to recreate what you are seeing.

It appears to be due to a maximum tag depth of 256 that is hardcoded in HtmlSanitizer.java. If I modify HtmlSanitizer locally to increase the maximum tag depth, all of your divs appear to be preserved.

However, I don't see any way for consumers to modify this behavior without reflection (or forking the repo).
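Short of changing the limit, a caller can at least detect input that is likely to hit it before sanitizing. The following is a rough sketch under stated assumptions: `NestingDepthEstimator` and `maxDepth` are hypothetical names, the regex scan is not an HTML parser (attribute values containing `>` will confuse it), and the sanitizer's own tag balancer may count depth differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rough estimate of the deepest tag nesting in an HTML string. */
final class NestingDepthEstimator {
  // Group 1 is "/" for a closing tag, group 2 is the tag name,
  // group 3 is "/" for a self-closing tag such as <br/>.
  private static final Pattern TAG =
      Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

  // Elements that never have a closing tag, so they do not add depth.
  private static final Set<String> VOID_ELEMENTS = new HashSet<>(Arrays.asList(
      "area", "base", "br", "col", "embed", "hr", "img", "input",
      "link", "meta", "source", "track", "wbr"));

  static int maxDepth(String html) {
    int depth = 0;
    int max = 0;
    Matcher m = TAG.matcher(html);
    while (m.find()) {
      String name = m.group(2).toLowerCase(Locale.ROOT);
      if (VOID_ELEMENTS.contains(name) || !m.group(3).isEmpty()) {
        continue;  // void or self-closing: no change in depth
      }
      if (m.group(1).isEmpty()) {
        depth++;
        max = Math.max(max, depth);
      } else if (depth > 0) {
        depth--;
      }
    }
    return max;
  }

  public static void main(String[] args) {
    StringBuilder html = new StringBuilder();
    for (int i = 0; i < 300; i++) {
      html.append("<div>");
    }
    html.append("text");
    for (int i = 0; i < 300; i++) {
      html.append("</div>");
    }
    System.out.println(maxDepth(html.toString()));  // prints 300
  }
}
```

An application could log or flag any input whose estimated depth is 256 or more, so that truncated output is at least noticed rather than silently displayed.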

Maybe Mike can provide some additional insight. I'm curious why the limit in the HTML sanitizer is 256 when the colocated comment mentions that Webkit supports tag depths up to 512. Why not support 512 in the HTML sanitizer as well?

Cheers,

Jacob

Rasmita Mahapatra

Jun 24, 2020, 12:57:13 AM
to OWASP Java HTML Sanitizer Support
Hi Jacob,

Is there a workaround for this issue?

Thanks
Rasmita

Jim Manico

Jun 24, 2020, 7:14:59 AM
to owasp-java-html-...@googlegroups.com, Jacob Shields

> 256 when the colocated comment mentions that Webkit supports tag depths up to 512

This is server-side performance tuning. But I do not think it would hurt the library to bump it to 512 or make it a configurable argument or similar.  Let's give Mike a chance to weigh in first.

Aloha, Jim

--
You received this message because you are subscribed to the Google Groups "OWASP Java HTML Sanitizer Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to owasp-java-html-saniti...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/owasp-java-html-sanitizer-support/e29b7b74-d53e-40ba-b305-f23971e81cceo%40googlegroups.com.

Rasmita Mahapatra

Jun 25, 2020, 7:33:24 AM
to OWASP Java HTML Sanitizer Support

Please let me know how to file a bug for this issue.

Thanks
Rasmita

Jim Manico

Jun 25, 2020, 7:44:29 AM
to owasp-java-html-...@googlegroups.com, Rasmita Mahapatra
-- 
Jim Manico
Manicode Security
https://www.manicode.com

Rasmita Mahapatra

Jun 29, 2020, 12:42:42 AM
to OWASP Java HTML Sanitizer Support
Hi Jim,

I have created the bug https://github.com/OWASP/java-html-sanitizer/issues/205. Please look into it; it's Sev2, as it's data loss, and the customer is making a lot of noise about it. We are currently using the owasp-java-html-sanitizer-20171016.1 version of the jar.

Thanks
Rasmita

Rasmita Mahapatra

Jun 30, 2020, 2:57:05 AM
to OWASP Java HTML Sanitizer Support

Our customer has raised this as a Sev1 issue. Can you please fix it and release a patch? I will pick it up as soon as it is available.

Thanks
Rasmita

Rasmita Mahapatra

Jun 30, 2020, 10:06:01 AM
to OWASP Java HTML Sanitizer Support
Hi Jim,
I have already raised a bug for this issue. Our customer has raised it as a Sev1 issue, so please let me know when it will be fixed.
If you expect a delay in the fix, please suggest an alternative. We generally take the jar and use it in our code; we don't maintain the code.
If required, can I take the code, change the depth limit, and release a patch? Do you see any issue with that?

I also request Mike to please look into this on priority.
Please let me know how to mark https://github.com/OWASP/java-html-sanitizer/issues/205 as a Sev1 issue.

Thanks
Rasmita


On Wednesday, June 24, 2020 at 4:44:59 PM UTC+5:30, Jim Manico wrote:

Jim Manico

Jun 30, 2020, 10:38:57 AM
to owasp-java-html-...@googlegroups.com, Rasmita Mahapatra

Rasmita,

You can certainly build a version on your own that addresses this issue.

I'm sorry this has caused you stress. Mike is a very kind volunteer who has built and maintained this project and has spent countless hours working on it - but we cannot promise a fast fix.

Regards,

- Jim


Jim Manico

Jun 30, 2020, 12:10:34 PM
to owasp-java-html-...@googlegroups.com, Rasmita Mahapatra

Rasmita,

May I ask: can you send us a sample of the HTML that is failing and attach it to the bug? I cannot see your HTML in the submitted issue.

Thank you,

- Jim


Rasmita Mahapatra

Jul 1, 2020, 12:09:56 AM
to OWASP Java HTML Sanitizer Support
Hi Jim,

I have updated the bug with the sample html where the issue is seen. 

Thanks
Rasmita

Rasmita Mahapatra

Jul 1, 2020, 12:14:30 AM
to OWASP Java HTML Sanitizer Support
Hi Jim,

Can you please point me to the correct repository location from which I can clone the code, in case I have to work on the fix?

Thanks
Rasmita

Jim Manico

Jul 1, 2020, 12:34:01 AM
to owasp-java-html-...@googlegroups.com
https://github.com/OWASP/java-html-sanitizer

--
Jim Manico
@Manicode



Mike Samuel

Jul 3, 2020, 11:06:00 AM
to OWASP Java HTML Sanitizer Support


On Mon, Jun 29, 2020 at 12:42 AM Rasmita Mahapatra <rasm...@gmail.com> wrote:
> as its data loss

The HTML sanitizer makes no guarantees about data loss.  A sanitizer removes things by design, which is at odds with preserving things. This project does not try to thread that needle.  If you need to avoid data loss, store unsanitized content in your database and sanitize it on select.  We suggest sanitizing on receipt from the database anyway since updating to a new version of the sanitizer automatically affects all sanitized values, and you won't have to manually go through the database and sanitize values that were sanitized using a buggy version of the sanitizer.
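
A minimal sketch of the store-raw, sanitize-on-read pattern described above. The `CommentStore` class and its in-memory map are hypothetical stand-ins for a real database layer, and the sanitizer is passed in as a plain `UnaryOperator<String>` so the example has no dependencies. With this library the operator could plausibly be a method reference to `sanitize` on a configured `PolicyFactory`, but that wiring is an assumption and is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical store that keeps raw HTML and sanitizes only when reading. */
final class CommentStore {
    // Stand-in for a database table: id -> unsanitized HTML as received.
    private final Map<String, String> rawById = new HashMap<>();
    private final UnaryOperator<String> sanitizer;

    CommentStore(UnaryOperator<String> sanitizer) {
        this.sanitizer = sanitizer;
    }

    /** Persist the untouched input so nothing is lost at write time. */
    void save(String id, String rawHtml) {
        rawById.put(id, rawHtml);
    }

    /** Sanitize on the way out, so a sanitizer upgrade applies to every stored value. */
    String loadForDisplay(String id) {
        String raw = rawById.get(id);
        return raw == null ? null : sanitizer.apply(raw);
    }

    /** Raw access for audits or re-processing; never send this to a browser. */
    String loadRaw(String id) {
        return rawById.get(id);
    }
}
```

Because the raw value is never overwritten, swapping in a fixed or reconfigured sanitizer changes what `loadForDisplay` returns without any database migration.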


Rasmita Mahapatra

Jul 4, 2020, 12:35:20 PM
to OWASP Java HTML Sanitizer Support
Hi Mike,
I agree that the sanitizer removes things by design and that it should remove content which is vulnerable. Here it's removing content which is not vulnerable.

Moreover, it's very clear from the discussions that this bug is caused by the tag depth limitation. Ours is a web application; we don't sanitize the content and then store it, we sanitize the data on the fly when displaying it. I request you to fix this bug, as it's critical for us: the sanitizer is removing valid data, and there is no way to change the tag depth.

Thanks
Rasmita

Mike Samuel

Jul 7, 2020, 12:44:22 PM
to OWASP Java HTML Sanitizer Support
On Sat, Jul 4, 2020 at 12:35 PM Rasmita Mahapatra <rasm...@gmail.com> wrote:
> Hi Mike,
> I agree that the sanitizer removes things by design and that it should remove content which is vulnerable. Here it's removing content which is not vulnerable.

It biases towards removing content where something is not clearly safe.  How did you determine that deeply nested divs are clearly safe?

> Moreover, it's very clear from the discussions that this bug is caused by the tag depth limitation. Ours is a web application; we don't sanitize the content and then store it, we sanitize the data on the fly when displaying it. I request you to fix this bug, as it's critical for us: the sanitizer is removing valid data, and there is no way to change the tag depth.

There is a way to change the tag depth limit, as noted at https://github.com/OWASP/java-html-sanitizer/issues/205#issuecomment-653586510

> Thanks
> Rasmita


Jim Manico

Jul 25, 2020, 2:54:07 PM
to owasp-java-html-...@googlegroups.com, Mike Samuel

Rasmita,

Mike is a volunteer, and it's important that you treat him with respect. Please ask politely and make suggestions, but avoid making demands of him.

Your use case is very unusual. I recommend you pre-process your input and remove the hundreds of <div> tags before you sanitize.
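
One rough way to do that pre-processing, as a sketch only: the `DivUnwrapper` class and `stripDivTags` method below are hypothetical names, and the regex assumes that no attribute value inside a `<div>` tag contains a literal `>`. It drops every opening and closing `<div>` tag and keeps the content between them, which also discards any block layout the divs provided.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper; run it BEFORE the sanitizer, never after. */
final class DivUnwrapper {
    // Matches <div ...> and </div>, case-insensitively. The \b keeps tags such as
    // <divider> untouched. Assumes no '>' appears inside attribute values.
    private static final Pattern DIV_TAG =
            Pattern.compile("</?div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private DivUnwrapper() {}

    /** Removes all div tags but keeps whatever was nested inside them. */
    static String stripDivTags(String html) {
        return DIV_TAG.matcher(html).replaceAll("");
    }
}
```

The result should still go through the sanitizer afterwards. Deleting tags can splice the surrounding text into new markup, so the stripped string is not safe to display on its own.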

- Jim


Rasmita Mahapatra

Jul 30, 2020, 3:00:29 AM
to OWASP Java HTML Sanitizer Support
Hi Jim,
I made a request to the team to fix this issue in the upcoming release or provide a configuration option for it. If it sounded like a demand, my apologies. I request the OWASP team to consider this issue in the next patch.
Thanks
Rasmita



Jim Manico

Jul 30, 2020, 11:34:43 AM
to owasp-java-html-...@googlegroups.com
My suggestion is that you remove the hundreds of divs before you sanitize. The requested change is not a high priority for us since it would add a potential weakness to the library.

Mike may have a different opinion and we’ll wait to see what he says.

Respectfully,
--
Jim Manico
@Manicode
