Hi Joris,
I suspect it's just how the web has developed, where mixing JavaScript
with imperfect HTML is normal.
I quite like this video as a demo:
https://www.youtube.com/watch?v=lG7U3fuNw3A
That's where I think your point is raised, when comparing the different parsing of:
1) <div><script title="</div>">
2) <script><div title="</script">
My favourite exploit is very similar...
<script>
user_name = "Craig</script>Hello";
</script>
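The usual fix, if you really must inline user data, is to escape the "</"
sequence before it goes into the script block, so the HTML parser can never
see a "</script>" inside the string. A minimal sketch (the helper name is
my own, not from any library):

```javascript
// Escape "</" as "<\/" before embedding a value in an inline <script>
// string literal; "\/" is just "/" to the JS engine, but the HTML
// parser no longer sees a closing "</script>".
function escapeForInlineScript(value) {
  return value.replace(/<\//g, '<\\/');
}

var escaped = escapeForInlineScript('Craig</script>Hello');
// escaped === 'Craig<\/script>Hello' (with a literal backslash), which
// is safe to write into the generated page.
```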
Personally I'd like to be able to tell the browser, similar to the
old/obsolete <plaintext> element, that it won't find any JavaScript code
after this point (maybe it could block all scripts in the <body>?)... but
this only works because I load my JS files in the <head> and attach event
listeners after DOMContentLoaded, and so few developers do this that it
wouldn't be a useful addition.
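For what it's worth, the pattern I mean is just this (browser-only sketch;
the element id and the handler body are made up, and it's guarded so it does
nothing outside a browser):

```javascript
// Loaded from a file referenced in <head>; no inline code on the page.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    // "save_button" is a hypothetical id on the page.
    var button = document.getElementById('save_button');
    if (button) {
      button.addEventListener('click', function (e) {
        e.preventDefault();
        // ... handle the click here ...
      });
    }
  });
}
```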
I think this is the main reason Content Security Policy came into
existence, where I can omit "unsafe-inline" to block any inline JavaScript
and limit which JavaScript files can be included.
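For example, a response header along these lines (the CDN host is
illustrative) refuses every inline <script> block and on* attribute, because
"unsafe-inline" is absent from script-src, and only allows script files from
the listed origins:

```
Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com
```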
You can kind of get an idea of what the browser's parsing does by using
JavaScript to load the HTML into a <template> element... but that does
raise the question of how you get the unsafe variables to the JavaScript
in the first place.
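Something like this (a browser console sketch, guarded so it's a no-op
elsewhere) shows what the parser makes of the two snippets from earlier;
nothing in a <template> is executed, so it's safe to poke at:

```javascript
if (typeof document !== 'undefined') {
  var t = document.createElement('template');

  t.innerHTML = '<div><script title="</div>">';
  // The "</div>" stays inside the title attribute of the <script> tag.
  console.log(t.innerHTML);

  t.innerHTML = '<script><div title="</script>">';
  // Here the parser is already in script-data state, so the script ends
  // at the "</script>", even though it looks like it's inside an attribute.
  console.log(t.innerHTML);
}
```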
As an aside, I use <meta name="js_data" content="..." /> tags... sometimes
with JSON-encoded data in the content attribute, where I'd use something
like the following to get the content:
var my_data = document.querySelector('meta[name="js_data"]');
if (my_data) {
  try {
    my_data = JSON.parse(my_data.getAttribute('content'));
  } catch (e) {
    my_data = null;
  }
}
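On the page itself that looks something like this (the payload is made up;
it's single-quoted here for readability, but in practice the whole JSON
string should be HTML-encoded before it goes into the attribute):

```
<meta name="js_data" content='{"user":"Craig","id":123}' />
```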
But going forwards, the HTML5 spec does cover how browsers (and
third-party libraries) should parse imperfect HTML, so hopefully these
differences will reduce (though I don't imagine they will ever be perfectly
aligned, in the same way different browsers aren't).
Craig
https://lists.mozilla.org/listinfo/dev-security