For over a decade, the browser security model has been evolving in
spurts between periods of deathly stillness. The rules have stabilized
for same-origin sandboxing, at least for common DOM APIs. But a great
deal of the DOM was never standardized, including the whole of "Level
0", window objects included. XHR's infamous same-origin restriction
gives people fits and drives them to take chances with the hazards of
script src=.
In general, security mechanism and policy were never defined or agreed
upon using an open standardization process, so browser implementors
have had to reverse-engineer, read open source, and chase after XSS
bugs to achieve both interoperation and safety -- which may be at odds.
Moreover, for Mozilla with chrome and content windows, and XUL apps and
extensions built using chrome, we have a non-standard model. It's more
powerful and (of course) more vulnerable. It was unsound without
XPCNativeWrappers, from the get-go. Even with wrappers it seems
unsound to me.
Then there is GreaseMonkey, which also mixes trust labels within the
same sandbox, a risky proposition. We clearly need more degrees of
sandboxing than "runs as you" and "runs as a crippled mid-90s web app's
window-bound JS".
A lot of great research work has been done over the years, especially
in the last ten years, yielding results such as information flow type
systems and data-tainting compiler/virtual-machine combos that can
uphold important security properties such as confidentiality (no send
on socket after secret read from filesystem, e.g.), while allowing
useful browser-based computation and user interaction including
sanitization.
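To make the confidentiality property concrete, here is a toy sketch (all
names invented for illustration; real data tainting would live in the
compiler/VM, not in script): a value read from a secret source carries a
label, the label propagates through computation, and the send primitive
refuses labeled values.

```javascript
// Toy data-tainting sketch (illustrative only; Tainted, readSecretFile,
// concat, and send are made-up names, not a real API).
class Tainted {
  constructor(value, label) {
    this.value = value;
    this.label = label; // e.g. "secret"
  }
}

// Reading a secret returns a labeled wrapper instead of a bare string.
function readSecretFile(contents) {
  return new Tainted(contents, "secret");
}

// Taint propagates through computation on tainted inputs.
function concat(a, b) {
  const av = a instanceof Tainted ? a.value : a;
  const bv = b instanceof Tainted ? b.value : b;
  if (a instanceof Tainted || b instanceof Tainted) {
    return new Tainted(av + bv, "secret");
  }
  return av + bv;
}

// The send primitive enforces "no send on socket after secret read".
function send(value) {
  if (value instanceof Tainted) {
    throw new Error("confidentiality violation: tainted value on socket");
  }
  return "sent: " + value;
}
```

Untainted data flows out freely; anything derived from the secret is
rejected at the socket, rather than at every intermediate step.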
We are poised to benefit from this work. But if we keep wasting time
patching an unsound system, we will die the death of a thousand cuts.
We need to specify the system and then move our code to match the spec.
So I thought to write down some kind of semi-formal set of definitions
and rules, from which inductive or other proofs could be done.
This is hard, and it wants to turn into some kind of operational
semantics. The first rough cut is at
http://wiki.mozilla.org/Security:Strawman_Model. Comments and
questions welcome. I have had fruitful exchanges with bz over IRC, cut
short by our other work. It would be great to recap them here and
build on them.
/be
Agree.
> So I thought to write down some kind of semi-formal set of definitions
> and rules, from which inductive or other proofs could be done.
>
> This is hard, and it wants to turn into some kind of operational
> semantics. The first rough cut is at
> http://wiki.mozilla.org/Security:Strawman_Model. Comments and
> questions welcome.
I realize this is a rough cut, but should the XHTML subset include CSS
code? I suspect it should, since CSS can contain URIs.
<http://www.feedparser.org/docs/html-sanitization.html#advanced.sanitization.why>
> We clearly need more degrees of sandboxing than "runs as you" and
> "runs as a crippled mid-90s web app's window-bound JS".
I might be jumping ahead to the fun stuff, but I think I can see a
useful degree above and below crippled mid-90s web apps.
1.) Content served with mutual authentication (where the client also
checks the server's creds) should execute with higher privileges.
2.) We should create a container element that invokes sanitization code.
Let's call it <livejournal-comment>.
<livejournal-comment>
do whatever in here, the browser will elide the dangerous stuff
as the content sink receives it...
</livejournal-comment>
> A lot of great research work has been done over the years...
bibliography?
thanks,
Rob
Brendan and I have had some disagreement on this issue. ;) In my opinion, the
examples in that document should throw a security exception from the
window.location getter in a reasonably designed UA.
> 2.) We should create a container element that invokes sanitization code.
> Let's call it <livejournal-comment>.
I think any model that assumes that the only way we get content is by parsing it
is more or less doomed to failure. Unless <livejournal-comment> has the same
effect when someone clones nodes they got via XMLHttpRequest and then inserts
them as kids of it?
-Boris
I'm leaving that out for now. Also, as alluded to by bz, we have
anti-sanitization constraints on URIs in CSS that are "just so", which
could be relaxed in a data-tainting future architecture. But I'm not
ready to add these to the model. We should keep them in mind as
important restrictions to enforce in our code, and perhaps to remove in
future models.
> 1.) Content served with mutual authentication (where the client also
> checks the server's creds) should execute with higher privileges.
Right! Policy should uphold certain properties, but let's suppose that
when you have a non-server-signed cert authenticating the server for an
established, stepped-up SSL connection, and you have signed with help
from the password manager, the site should be allowed to do certain
things we don't let origins do by default.
An example would be popping up system notification messages (system
tray "toast" on Windows). This is something web apps need in order to
be competitive with desktop apps.
Other examples could include freedom from resource quotas of various
kinds imposed on random web JS.
More thoughts welcome, here or (better, eventually) in the wiki.
> 2.) We should create a container element that invokes sanitization code.
> Let's call it <livejournal-comment>.
>
> <livejournal-comment>
> do whatever in here, the browser will elide the dangerous stuff
> as the content sink receives it...
> </livejournal-comment>
This has been suggested, I believe there is a bug on file. Can someone
find and cite it?
> > A lot of great research work has been done over the years...
>
> bibliography?
http://wiki.mozilla.org/index.php?title=Security:Bibliography is under
construction.
/be
> So I thought to write down some kind of semi-formal set of definitions
> and rules, from which inductive or other proofs could be done.
>
> This is hard, and it wants to turn into some kind of operational
> semantics. The first rough cut is at
> http://wiki.mozilla.org/Security:Strawman_Model.
I had a look at this page but i have a hard time understanding the
model. would you mind defining some of the concepts in your formal
syntax? For example -- what is a Request and who do you expect to
be making such Requests? Could you state the English description
of each security property you're trying to enforce, next to the
formal rules that are intended to express that property? What is
the purpose of the XHTML-subset language you've defined?
I'm sorry that i lack the previous context of your design discussion,
but i hope these clarifications will be useful to others as well as
myself.
Thanks!
-- ?!ng
Ka-Ping Yee wrote:
> I had a look at this page but i have a hard time understanding the
> model.
Sorry, it was more of a brain-dump. Lacking the time to do it right
before imposing it on readers, I went with "release early and often."
> would you mind defining some of the concepts in your formal
> syntax? For example -- what is a Request and who do you expect to
> be making such Requests?
Principals make Requests of Objects. The model is basically Lampson's
[http://portal.acm.org/citation.cfm?doid=121133.121160,
http://research.microsoft.com/~lampson/09-protection/WebPage.html].
In the wacky browser world, ignoring signed scripts, principals are
"codebase principals" representing the origin (URL scheme://hostpart,
not including path but including any optional port number after a : in
hostpart) of the script making a request.
Objects are mappings from string keys to arbitrary values -- the (key,
value) pair, or sometimes just the value, is called a property.
A principal may get, set or call a property (only callable properties
can be successfully called, but since the system is dynamically typed
at present, you have to get and test, or just attempt the call, to find
out what's callable).
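The "get and test, or just attempt the call" point looks like this in
practice (tryCallChecked and tryCallOptimistic are made-up names for the
two idioms):

```javascript
// In a dynamically typed object model, callability is discovered at
// run time: either test the property's type first, or attempt the
// call and handle the failure.
const obj = { name: "window", focus: function () { return "focused"; } };

// Idiom 1: get the property and test before calling.
function tryCallChecked(o, key) {
  const prop = o[key];
  if (typeof prop === "function") {
    return prop.call(o);
  }
  return undefined; // not callable
}

// Idiom 2: just attempt the call and catch the TypeError.
function tryCallOptimistic(o, key) {
  try {
    return o[key]();
  } catch (e) {
    return undefined;
  }
}
```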
A script downloaded via <script src="http://evil.org/bar.js"/> from an
including web page at http://good.com/foo.html is given http://good.com
as its origin.
A script with origin "maps.good.com" can change its origin to
"good.com" or even "com" by assigning the new origin string to
document.domain. This allows other scripts from, e.g. "ads.good.com"
to change their origin to the same domain name and join principals.
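A sketch of codebase-principal comparison with document.domain joining,
as described above (function names are invented; this ignores the port
and other real-world subtleties):

```javascript
// "Origin" here is scheme://host; joinDomain models assigning a
// superdomain suffix to document.domain. Illustrative only.
function parseOrigin(origin) {
  const m = /^([^:]+):\/\/([^\/]+)$/.exec(origin);
  if (!m) throw new Error("bad origin: " + origin);
  return { scheme: m[1], host: m[2] };
}

function sameOrigin(a, b) {
  const pa = parseOrigin(a), pb = parseOrigin(b);
  return pa.scheme === pb.scheme && pa.host === pb.host;
}

// A script may shorten its host only to a suffix of itself, e.g.
// "maps.good.com" -> "good.com". Both sides must opt in by joining
// to the same suffix before their principals compare equal.
function joinDomain(origin, newHost) {
  const p = parseOrigin(origin);
  if (p.host !== newHost && !p.host.endsWith("." + newHost)) {
    throw new Error("not a superdomain: " + newHost);
  }
  return p.scheme + "://" + newHost;
}
```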
> Could you state the English description
> of each security property you're trying to enforce, next to the
> formal rules that are intended to express that property?
I will try, when I have more time; ideally by tomorrow.
> What is
> the purpose of the XHTML-subset language you've defined?
To help specify the principal for pseudo (about:, data:, and
javascript:) URLs loaded via static markup source. And to specify
event handler principals. Also to specify script principals, in a
future iteration that adds src=.
> I'm sorry that i lack the previous context of your design discussion,
> but i hope these clarifications will be useful to others as well as
> myself.
No need to apologize -- the unspecified world of browser security is
nothing but context: bug histories, patch histories, ideas implemented
9 years ago based on Java security models that died, etc.
Turning all of this context, too much of it in peoples' heads, into a
spec, is the task I claim we face in order to avoid wandering in the
wilderness of Hack-and-patch land. Help, including question/answer
exchange a la the Socratic method at "web speed", is more than welcome.
;-)
/be
* load/parse: when does parse return --- when the document is parsed,
when it is finished loading, or when the document is closed? If the
first, it is not clear to me why the stack is popped, or why the stack
is not needed for operations while the document is open.
* canAccess/mapMeet(stack): I don't understand what windows are on the
stack. If load pops the window when done loading, there normally
wouldn't be more than one window on stack. If load doesn't pop window
until document is closed, then stack would grow when either a link
accessed or a subsidiary dialog/window is opened. But in that case it
is not clear why access should be restricted by all the prior windows on
the stack. Even in the case where an application opens a dialog, the
dialog may have additional trust that the parent window does not have
(e.g., after entering password), so it seems like it should not
necessarily be restricted by parent.
Maybe rather than windows there is some notion of a library or component
that would be on stack. Maybe a javascript file would be such a
library. Then the stack would be extended with the principal of the
library whenever a call was made, and popped when that call returns.
That way the permission of a library does not normally exceed the
permission of its callers, though there needs to be a way to override
this within libraries that access particular resources. (This is more
like the current Java security architecture [1].)
* minor: it's a little confusing that ',' is used for both parameter
tuple separation and sequence separation, but ';' is used for the latter
in javascript. [At first I thought sequences were multiple value
tuples. Then I thought comma works like Javascript, but javascript
handling of comma is a little obscure to someone who hasn't used it:
var x = 2; x = ++x, 5*x; x
--> returns first value 3, not 15, though
var x = 2; x = (++x, 5*x); x
--> returns last value, 15.] So value of load(w,s) would be value of
first expression, stack.push(w), not last expression, stack.pop().
Maybe parentheses are in order around definition of load, even though in
this particular case they should return the same value, w, in either
case (if push and pop are like in javascript arrays).
* minor: need to specify what matches scheme and hostname, maybe with a
regex [some urls have embedded urls, so scheme should not be (.*)]
* minor: in grammar, missing quotes on '<', '>', '</', and '/>'
[1]
http://java.sun.com/j2se/1.5.0/docs/guide/security/spec/security-spec.doc4.html#24646
Thanks -- I will address minor comments by fixing them in a forthcoming
revision on the wiki.
> * load/parse: when does parse return --- when the document is parsed,
> when it is finished loading, or when the document is closed? If the
> first, it is not clear to me why the stack is popped, or why the stack
> is not needed for operations while the document is open.
This model is simplified to use synchronous iframe sub-document
fetching. So if the outer document contains an iframe whose src is
fetched and parsed, the stack needs to be pushed again before the
iframe content is parsed.
> * canAccess/mapMeet(stack): I don't understand what windows are on the
> stack. If load pops the window when done loading, there normally
> wouldn't be more than one window on stack.
In the model, script (inline only) tag contents are eval'ed using the
top of stack global, as the document containing the scripts is parsed.
You raise a good point. Today we parse an iframe src="http://bar.com"
included in a document loaded from "http://foo.com" without bounding
principals between the two origins. The stack is needed to handle
nested iframes, but the rule for canAccess would use only the top
window's principal, not mapMeet. This seems like a bug in the spec.
The model stack really is meant to compress the JS control stack, where
an arbitrary number of functions loaded in one window may nest, but
when control flows into a function from another trust domain (typically
cross-site by allAccess, e.g. location.href's setter or document.open
being called), the model stack must grow by one entry, the global that
owns the callee function (and its principal).
So there are two different access checking modes based on the stack:
loading, where only the top principal is used as subject principal; and
executing (whether from an inline script, a button onclick handler, or
an a href="javascript:..." link click). I will work to
separate the two modes. Our code today uses only the top principal in
both modes, but we believe this is unsound.
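A hypothetical rendering of the two modes (the names load and canAccess
follow the strawman, but the bodies here are my illustration, not the
spec):

```javascript
// Illustrative only: a model stack of windows, each with a principal.
const stack = [];

function load(win) {
  stack.push(win); // inline scripts parse/eval with this on top
  // ...fetch and parse the document, eval inline <script> contents...
  stack.pop();     // popped at end of load()
}

// Loading mode: only the top window's principal is the subject.
function canAccessLoading(targetPrincipal) {
  return stack[stack.length - 1].principal === targetPrincipal;
}

// Executing mode: the meet over all principals on the stack, so a
// caller from another trust domain cannot launder authority through
// a callee it invoked.
function canAccessExecuting(targetPrincipal) {
  return stack.every(function (w) {
    return w.principal === targetPrincipal;
  });
}
```

Using only the top principal in both modes, as our code does today, is
exactly the difference between the two functions above.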
> If load doesn't pop window
> until document is closed, then stack would grow when either a link
> accessed or a subsidiary dialog/window is opened.
Stack is popped at end of load() function.
> Then I thought comma works like Javascript, but javascript
> handling of comma is a little obscure to someone who hasn't used it:
> var x = 2; x = ++x, 5*x; x
> ---> returns first value 3, not 15, though
> var x = 2; x = (++x, 5*x); x
> --> returns last value, 15.]
JS inherited C's comma operator, with lowest precedence and
left-to-right evaluation. No tuples. I will work on
standardizing/clarifying the spec language.
/be
Unsound in the second (execution after load completes) mode, I mean.
/be
We could disallow DOM modification by non-chrome, but allow innerHTML.
That would go back through the parser. Seems like everyone uses
innerHTML anyway. I can't think of any non-religious objections to this
approach, other than click tracking. But the ping attribute should take
care of that.
-Rob
So when a page tried to access part of itself it would get an exception? Or what?
-Boris
Sure, if it's important to prevent the case where "someone clones nodes
they got via XMLHttpRequest and then inserts them".
I guess it depends on what you meant by "doomed to failure." I'm only
trying to solve one authoring problem. Consider your typical CGI or PHP
script. They're writing HTML (and XML) with string concatenation, like this:
echo("<div>" + user_submitted_text + "</div>");
A professional might have
echo("<div>" + strip_dangerous(user_submitted_text) + "</div>");
and the strip_dangerous function might be very good indeed at an
operation with lots of resources, like Bloglines or Google Reader. But
even they get tripped up regularly, because they haven't
reverse-engineered /parser/htmlparser, and the strip_dangerous function
tends to
execute separately from the code that writes to the wire (where the
encodings get mangled, etc). I want to give authors a replacement string
for the "<div>" portions of the examples, and remove the burden of
implementing an HTML parser from websites that want to include
user-submitted content.
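As a sketch of what the site-side helper has to get right today
(stripDangerous and renderComment are made-up names for illustration),
escaping every markup-significant character is the one approach that
doesn't require owning an HTML parser:

```javascript
// Escape the five characters that can change markup context, rather
// than trying to parse and filter HTML. Illustrative sketch only.
function stripDangerous(text) {
  return text.replace(/[&<>"']/g, function (ch) {
    return { "&": "&amp;", "<": "&lt;", ">": "&gt;",
             '"': "&quot;", "'": "&#39;" }[ch];
  });
}

function renderComment(userSubmittedText) {
  return "<div>" + stripDangerous(userSubmittedText) + "</div>";
}
```

The cost, of course, is that all user formatting is escaped away too,
which is why sites reach for filtering parsers in the first place.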
-Rob
If we just replace <div> with <strip_dangerous>, then it seems to me
that we've only moved the problem. Now, strip_dangerous has to be sure
to remove all possible insertions of "</strip_dangerous>" in
user_submitted_text, which, as you point out, is hard if you haven't
fully reverse-engineered parser/htmlparser.
I seem to recall dveditz asking about using HTML processing instructions
for this purpose (since they don't have any sort of end tag), but that
has interoperability problems since most browsers don't support them.
There is also the problem that the untrusted content *must* come after
all trusted content, since there's no way to turn the filtering off.
--
Blake Kaplan
This has been proposed several times in the past. Usually, the idea
founders on either the definition of "dangerous stuff", the difficulty
of making sure the content doesn't close the livejournal-comment tag, or
the difficulty of making sure the content inside doesn't affect the
content outside (e.g. by overlaying it using CSS absolute positioning).
Gerv
Hmm. We don't have this problem with innerHTML, since we can disallow
the element within the string, and we could also get some context by
doing <strip_dangerous src="">.
The other thing I can think of would be to allow some random stuff in
the tag name,
<strip_dangerous:737466a7bfc72727d95cc0c1aa9dc5e7>
</strip_dangerous:737466a7bfc72727d95cc0c1aa9dc5e7>
getting kind of ugly, I know.
-Rob
"Not provably secure" might be a better way to put it. Basically, I'm afraid of
solutions that give people a sense of security without actual security -- that's
a dangerous combination.
> I'm only trying to solve one authoring problem. Consider your typical CGI or PHP
> script. They're writing HTML (and XML) with string concatenation, like
> this:
>
> echo("<div>" + user_submitted_text + "</div>");
I would assume that you would first replace all '<' in said text with
'&lt;'.
Certainly anyone sane would do that... I suppose the problem is that you want
to preserve certain user formatting (say <b> and style attributes) but at the
same time disallow things like <script> tags and onclick attributes?
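To illustrate the tension (sanitize and ALLOWED are invented names, and
this is precisely the kind of string-level filter that fails without the
browser's own parser): keep a few formatting tags, strip their
attributes, and escape everything else.

```javascript
// Token-level allowlist sketch. Real safety requires sharing the
// browser's parser; this only shows the shape of the policy.
const ALLOWED = { b: true, i: true, em: true, strong: true };

function sanitize(html) {
  return html.replace(/<\/?([a-zA-Z]+)[^>]*>|[<>]/g, function (tok, name) {
    if (name && ALLOWED[name.toLowerCase()]) {
      // Allowed tag: re-emit bare, dropping onclick and all attributes.
      const close = tok.charAt(1) === "/" ? "/" : "";
      return "<" + close + name.toLowerCase() + ">";
    }
    // Everything else is escaped, not silently removed.
    return tok.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  });
}
```

Formatting like <b> survives attribute-free, while <script> and stray
angle brackets come out inert; the fragility lives in the tokenizing
regex, which does not match what the real parser does.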
> I want to give authors a replacement string
> for the "<div>" portions of the examples, and remove the burden of
> implementing an HTML parser from websites that want to include
> user-submitted content.
My problem is how to make sure that such a solution remains safe when some other
part of the same organization writes some script that rearranges things on the
page a bit.
-Boris
Boris Zbarsky wrote:
> Robert Sayre wrote:
>> I guess it depends on what you meant by "doomed to failure."
...
>
> My problem is how to make sure that such a solution remains safe when
> some other part of the same organization writes some script that
> rearranges things on the page a bit.
I am not sure what the threat is. If the page source is
<strip_dangerous>
<b>
<a onclick='javascript:doEvil()'>a</a>
</b>
</strip_dangerous>
the onclick attribute will not be in the DOM. The content sink will
never even see it. If the threat is that a script could insert such an
attribute, then I don't see how my proposal makes that problem worse. If
the threat is a script that re-authors the page on the server, then I
agree that my proposal does not solve that problem.
Gervase Markham wrote:
> [1] the definition of "dangerous stuff",
> [2] the difficulty of making sure the content doesn't close the
> livejournal-comment tag,
> [3] the difficulty of making sure the content inside doesn't affect the
> content outside (e.g. by overlaying it using CSS absolute positioning).
#2 seems solvable, though the solutions might not be pretty. #1 and #3
seem like social barriers to completion. I guess the answer is to cut
features until #3 is no longer a problem, reassure upset feature
advocates that strip_dangerous can be made more permissive by
identifying sanitization policies in an attribute, but place the
definition of any such policies out of scope for the initial effort.
I still suspect this is worth the effort, but everyone else seems to be
trying to talk me out of it :)
-Rob
Or else a pair of associated tags?
<strip_dangerous>
<strip_dangerous_seq_start id='737466a7bfc72727d95cc0c1aa9dc5e7' />
<!-- restricted content goes here -->
<strip_dangerous_seq_end id='737466a7bfc72727d95cc0c1aa9dc5e7' />
</strip_dangerous>