On Thu, Jun 2, 2016 at 8:21 AM, Salman Khan <sakha...@gmail.com> wrote:
> URLs get encoded when passed through OWASP HTML Sanitizer. A sample policy
> (Ebay policy) was used.
> Input String <a href="mailto:x...@company.com" > Email </a> results in
> <a href="mailto:xxx@company.com"> Email </a>
How is this causing problems? This seems like a reasonable output for
that input.
> In case there is a space in front, between the quotes of href, the URL is
> not accepted. For example,
> Input String <a href = " http://www.google.com "> GOOGLE </a> results in
> <a> GOOGLE </a>
This looks like a bug.
https://www.w3.org/TR/html5/links.html#attr-hyperlink-href says
"""The href attribute on a and area elements must have a value that is
a valid URL potentially surrounded by spaces."""
https://www.w3.org/TR/html5/infrastructure.html#strip-leading-and-trailing-whitespace
explains what space means
"""When a user agent is to strip leading and trailing whitespace from
a string, the user agent must remove all space characters that are at
the start or end of the string."""
and
https://www.w3.org/TR/html5/infrastructure.html#space-character says
"""The space characters, for the purposes of this specification, are
U+0020 SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR"
(U+000D)."""
so I can try to get the URL sanitizing code to strip leading and
trailing space first.
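
A rough sketch of what that stripping step could look like, not the actual sanitizer code. The class and method names (HtmlSpaceStripper, isHtmlSpace, strip) are made up for illustration; the only thing taken from the spec is the list of five space characters quoted above.

```java
// Hypothetical helper, not part of the OWASP HTML Sanitizer API.
final class HtmlSpaceStripper {
    private HtmlSpaceStripper() {}

    /** True only for the five HTML5 "space characters". */
    static boolean isHtmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }

    /** Returns s without leading and trailing HTML space characters. */
    static String strip(String s) {
        int start = 0;
        int end = s.length();
        // Skip space characters at the start of the string.
        while (start < end && isHtmlSpace(s.charAt(start))) {
            start++;
        }
        // Skip space characters at the end of the string.
        while (end > start && isHtmlSpace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }
}
```

With something like this, strip(" http://www.google.com ") gives back the bare URL, which could then go through the usual protocol check. I spelled out the five characters instead of calling String.trim() because trim() removes every character at or below U+0020, which is a wider set than the spec lists.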