Hi Joris,
I suspect it's just how the web has developed, where mixing JavaScript
with imperfect HTML is normal.
I quite like this video as a demo:
https://www.youtube.com/watch?v=lG7U3fuNw3A
That's where I think your point is raised, when comparing the different parsing of:
1) <div><script title="</div>">
2) <script><div title="</script">
My favourite exploit is very similar...
<script>
user_name = "Craig</script>Hello";
</script>
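The usual fix, if you really must inline user data, is to escape the "</"
sequence before it goes into the script block, so the HTML parser can never
see a "</script>" inside the string. A minimal sketch (the helper name is
my own, not from any library):

```javascript
// Escape "</" as "<\/" before embedding a value in an inline <script>
// string literal; "\/" is just "/" to the JS engine, but the HTML
// parser no longer sees a closing "</script>".
function escapeForInlineScript(value) {
  return value.replace(/<\//g, '<\\/');
}

var escaped = escapeForInlineScript('Craig</script>Hello');
// escaped === 'Craig<\/script>Hello' (with a literal backslash), which
// is safe to write into the generated page.
```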
Personally I'd like to be able to tell the browser, similar to the
old/obsolete <plaintext> element, that it won't find any JavaScript code
after this point (maybe it could block all scripts in the <body>?)... but
this only works because I load my JS files in the <head> and attach event
listeners after DOMContentLoaded, and so few developers do this that it
wouldn't be a useful addition.
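For what it's worth, the pattern I mean is just this (browser-only sketch;
the element id and the handler body are made up, and it's guarded so it does
nothing outside a browser):

```javascript
// Loaded from a file referenced in <head>; no inline code on the page.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    // "save_button" is a hypothetical id on the page.
    var button = document.getElementById('save_button');
    if (button) {
      button.addEventListener('click', function (e) {
        e.preventDefault();
        // ... handle the click here ...
      });
    }
  });
}
```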
I think this is the main reason Content Security Policy came into
existence, where I can omit "unsafe-inline" to block any inline JavaScript
and limit which JavaScript files can be included.
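For example, a response header along these lines (the CDN host is
illustrative) refuses every inline <script> block and on* attribute, because
"unsafe-inline" is absent from script-src, and only allows script files from
the listed origins:

```
Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com
```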
You can kind of get an idea of what the browser's parsing does by using
JavaScript to load the HTML into a <template> element... but that does
raise the question of how you get the unsafe variables to the JavaScript
in the first place.
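Something like this (a browser console sketch, guarded so it's a no-op
elsewhere) shows what the parser makes of the two snippets from earlier;
nothing in a <template> is executed, so it's safe to poke at:

```javascript
if (typeof document !== 'undefined') {
  var t = document.createElement('template');

  t.innerHTML = '<div><script title="</div>">';
  // The "</div>" stays inside the title attribute of the <script> tag.
  console.log(t.innerHTML);

  t.innerHTML = '<script><div title="</script>">';
  // Here the parser is already in script-data state, so the script ends
  // at the "</script>", even though it looks like it's inside an attribute.
  console.log(t.innerHTML);
}
```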
As an aside, I use <meta name="js_data" content="..." /> tags... sometimes
with JSON-encoded data in the content attribute, where I'd use something
like the following to get the content:
var my_data = document.querySelector('meta[name="js_data"]');
if (my_data) {
  try {
    my_data = JSON.parse(my_data.getAttribute('content'));
  } catch (e) {
    my_data = null;
  }
}
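On the page itself that looks something like this (the payload is made up;
it's single-quoted here for readability, but in practice the whole JSON
string should be HTML-encoded before it goes into the attribute):

```
<meta name="js_data" content='{"user":"Craig","id":123}' />
```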
But going forwards, the HTML5 spec does cover how browsers (and
third-party libraries) should parse imperfect HTML, so hopefully these
differences will reduce (though I don't imagine they will ever be perfectly
aligned, in the same way different browsers aren't).
Craig
https://lists.mozilla.org/listinfo/dev-security