On 17 Feb 2018 19:44:26 GMT, Stefan Ram wrote:
> When I write
>
> <p>2&3<4</p>
>
> into the HTML5 source code, a recent browser will display this
> paragraph as
>
> 2&3<4
>
> . I would like to have some examples, where "escaping" is still
> needed. I found that
>
> <p>2&3<a</p>
>
> ("a" instead of "4") will need the "<" to be escaped, i.e.,
>
> <p>2&3&lt;a</p>
>
> . Can you show me other situations where escaping is required?
I think the reason a "<" character is sometimes displayed as is lies in how
the browser interprets HTML code. For example:
<0
<#
<"
<@
They will be displayed as is because an HTML tag name cannot start with a
digit or any other invalid character. IIRC, an HTML tag name can only start
with a letter "A"/"a" to "Z"/"z".
The code below, however, will not be displayed as is:
<!
<?
That is because "!" and "?" are special characters there: "<!" starts an
HTML comment or a document type declaration, and "<?" starts what XML calls a
processing instruction (or prolog), which an HTML5 parser swallows as a bogus
comment. The "/" character is special too, since it begins an end tag such as
"</p>". For the complete rules, you might want to check the HTML
specification.
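To make the "<" rules above concrete, here is a rough Python sketch. It is my
own simplification of the HTML5 tokenizer's "tag open" state, not code from
any browser, and it only looks at the single character following the "<". The
function name after_less_than is made up for this example.

```python
def after_less_than(next_char: str) -> str:
    """Simplified sketch of the HTML5 "tag open state": given the character
    that follows '<' in ordinary text, report how a parser treats it.
    An empty string stands for end of input."""
    if next_char == "":
        return "literal"                   # '<' at end of input stays text
    if next_char.isascii() and next_char.isalpha():
        return "start tag"                 # e.g. '<a' begins a tag name
    if next_char == "!":
        return "markup declaration"        # '<!--', '<!DOCTYPE', ...
    if next_char == "/":
        return "end tag or bogus comment"  # '</p>', or junk like '</3'
    if next_char == "?":
        return "bogus comment"             # '<?...>' is eaten as a comment
    return "literal"                       # anything else: '<' shown as is

for c in ["4", "0", "#", '"', "@", "a", "!", "?", "/"]:
    print(f"<{c}: {after_less_than(c)}")
```

Only ASCII letters start a tag, so something like "<é" is also shown as is.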
The same goes for the "&" character. It will be displayed as is if that
character and the characters following it don't form a valid HTML entity
(character reference), which can be (no quotes):
- "&A;", where "A" is a valid HTML entity name, e.g. "quot", "amp", etc.
- "&#N;", where "N" is a decimal number for a Unicode code point.
- "&#xH;", where "H" is a hexadecimal number for a Unicode code point.
Note that HTML5 parsers also accept numeric references and some legacy names
(such as "amp" and "lt") without the trailing semicolon.
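A similar sketch for "&": it checks only the three semicolon-terminated forms
listed above. The function name is_char_ref is hypothetical; the list of
valid names comes from Python's standard html.entities.html5 table. Real HTML5
parsers are somewhat more lenient than this strict check, e.g. about missing
semicolons.

```python
import re
from html.entities import html5  # HTML5 names, keys like "amp;" and "amp"

# The three forms listed above, each terminated by ';'.
NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
DECIMAL = re.compile(r"&#([0-9]+);")
HEX = re.compile(r"&#[xX]([0-9A-Fa-f]+);")

def is_char_ref(s: str) -> bool:
    """Return True if s begins with a strict (semicolon-terminated)
    character reference; named ones must exist in the HTML5 table."""
    m = NAMED.match(s)
    if m:
        return m.group(1) + ";" in html5
    return bool(DECIMAL.match(s) or HEX.match(s))

for s in ["&amp;", "&quot;", "&3<4", "&#60;", "&#x3C;", "&nosuchname;"]:
    print(s, is_char_ref(s))
```

For comparison, Python's html.unescape follows the lenient HTML5 rules, so
html.unescape("&amp") gives "&", while html.unescape("2&3<4") comes back
unchanged.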
While you can rely on this behavior, there's a possibility that in the
future strict HTML parsing will be enforced, where web browsers reject HTML
code that has invalid syntax, similar to how browsers reject JavaScript code
that contains invalid syntax.
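If you'd rather not depend on this leniency at all, the simplest approach is
to escape "&", "<" and ">" in all text content. Below is a minimal sketch
using Python's standard html.escape; the helper name to_html_paragraph is
made up for this example. quote=False leaves quote characters alone, which is
fine for element content but not for attribute values.

```python
import html

def to_html_paragraph(text: str) -> str:
    """Wrap plain text in <p>...</p>, escaping &, < and > so the result
    never depends on the parser's error recovery."""
    return "<p>" + html.escape(text, quote=False) + "</p>"

print(to_html_paragraph("2&3<4"))  # <p>2&amp;3&lt;4</p>
print(to_html_paragraph("2&3<a"))  # <p>2&amp;3&lt;a</p>
```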