Scott Fleckenstein replied 1 hour ago
>I hate to be a crank, but if firefox is _allowing_ extensions to crash the browser, it really is firefox's fault.
>
>You control the operating environment, you should have an environment in place to catch and disable offending extensions individually.
I asked Lucy for a good response to this, she gave me some answers but
nothing backing it up (and, like on Wikipedia, [citation needed]), and
suggested instead I ask here for some good responses to that
statement. I assumed there might be a good blog post or something by
a developer already?
Thanks.
--
Tom on irc.mozilla.org
What response is needed? It's obviously stupid, Firefox would not
"allow" extensions to crash the browser.
John
Hi guys, I'm Scott Fleckenstein who was quoted above. 'allow' was
perhaps not the correct word, but it was not in any way meant as a
troll. I think that you are concentrating on the wrong part of the
statement.
Given that firefox owns the operating environment that extensions live
in and decides itself when and how to load extensions, it seems very
reasonable to believe that it could prevent crashes because of
extensions, or at least fail in a friendly fashion and provide
reliable information to the end user about what extension they should
disable. Over at Get Satisfaction (where the quote was pulled from),
the topic I posted in was the 17th topic pulled from the twittersphere
when people were venting about firefox continually crashing.
Correct me if I'm wrong, but if such a technical solution is possible,
it is firefox's fault for not putting in place such a tracing or
protection system. It is unreasonable to believe that extension
developers would all suddenly learn how to produce un-crashy code.
However, I think it is entirely reasonable to hope that the firefox
developers would be able to put into place a system that helps protect
the end-user experience in face of poor extension code.
I'd love to hear your thoughts about where my assumptions are wrong.
Thanks for your time,
Scott Fleckenstein
Firefox runs on Windows, Linux, MacOS. It's not the operating system ... yet.
> in and decides itself when and how to load extensions, it seems very
> reasonable to believe that it could prevent crashes because of
> extensions, or at least fail in a friendly fashion and provide
> reliable information to the end user about what extension they should
Sorry, just not correct. A crash means Firefox loses control and it's up
to the OS after that.
> disable. Over at Get Satisfaction (where the quote was pulled from),
> the topic I posted in was the 17th topic pulled from the twittersphere
> when people were venting about firefox continually crashing.
>
> Correct me if I'm wrong, but if such a technical solution is possible,
> it is firefox's fault for not putting in place such a tracing or
Firefox does not allow Javascript to perform operations that crash the
browser. Extensions can contain C++ code, though, and then there is no
protection.
> protection system. It is unreasonable to believe that extension
> developers would all suddenly learn how to produce un-crashy code.
In Javascript, any crash is Firefox's 'fault'. Report them, they get fixed.
> However, I think it is entirely reasonable to hope that the firefox
> developers would be able to put into place a system that helps protect
> the end-user experience in face of poor extension code.
It's called Javascript and it works wonderfully. FF3 is very solid for a
first release.
>
> I'd love to hear your thoughts about where my assumptions are wrong.
The number one assumption you have wrong is that Firefox can be bug
free. Number two is that the Firefox team can fix bugs that are not
reported. Since there are bugs that only extensions trip on, FF can
crash with extensions because of these bugs. So the only, *only* way to
get them fixed is for users to report them. Only, did I say that enough ;-)
That's a bit of a cop-out. Firebug used to be crashy as hell, and it
has not a single piece of native code in it. Besides, when you look
at the crash reports referenced in the topic at Get Satisfaction, they
are all EXCEPTION_ACCESS_VIOLATION crashes, which are perfectly
catchable. If you look at the stack frames, you can see it happens
inside the javascript interpreter.
It seems reasonable to keep a per-thread stack for extensions: push
when entering extension code, pop when returning. When an uncaught
exception occurs, save that stack and act upon it when there is a crash.
If there is no protection, how does the crash reporter get run?
It almost never is a hard crash. Am I wrong in assuming that there is
some hook out there that tells firefox to launch the crash reporter
and where it can find the crash report data?
> In Javascript, any crash is Firefox's 'fault'. Report them, they get fixed.
True, but like I mentioned in the topic at Get Satisfaction, that
solution is very mediocre, and breeds anger and frustration. Give me,
the user, the ability to improve my experience. Tell me what
extensions are triggering that code so I can remove the offending
add-ons. Right now, Firefox doesn't empower me at all; I'm at the mercy
of a volunteer dev team if I don't have the ability to diagnose a crash.
Not many do.
> Its called Javascript and it works wonderfully. FF3 is very solid for a first release.
But not well enough. I don't want to beat up on the firefox team
because they do great work and I'm very thankful. That said, to say
"disable your extensions because they cause most crashes" out of one
side of your mouth (the support team) and to say "Its called
Javascript and it works wonderfully [to prevent crashes]" out of the
other side is a conflicting story. What percentage of extensions out
there have native code in them? I'd like to do some more research
about the nature of the crashes in Firefox, but am unable to browse
around the crash stats site. Is there a place that end users can
download the crash data directly?
> The number one assumption you have wrong is that Firefox can be bug free.
I don't have that assumption at all. However, I do assume that
firefox can provide a better experience than what is in place.
> Number two is that Firefox team can fix bugs that are not reported.
Again, that is not what I'm assuming at all. It has nothing to do
with empowering users to better diagnose and solve their own problems.
> Since there are bugs that only extensions trip on, FF can crash with extensions because of these bugs. So the only, *only* way to get them fixed is for users to report them.
I'm not convinced that this is the only path. Surely, we can improve
the experience for at least _some_ crashes. Even if it is just
catching EXCEPTION_ACCESS_VIOLATION that are thrown underneath
XPC_WN_CallMethod (2 of the three crashes mentioned in the topic at
Get Satisfaction) and reporting on the nearest extension code it would
be an improvement.
I think you meant to say: Firefox 2 crashed frequently when using
Firebug and browsing sites with lots of AJAX and errors. The combination
of javascript debugging and AJAX exercised a code path in FF2 that had
not been seen in the past. This was fixed in FF3.
> at the crash reports referenced in the topic at Get Satisfaction, they
> are all EXCEPTION_ACCESS_VIOLATION crashes, which is perfectly
> catchable. If you look at the framestacks, you can see it happens
> when inside of the javascript interpreter.
>
> It seems reasonable to track on a thread a stack for extensions: push
> when entering extension code, pop when returning. When an uncaught
> exception occurs, save that and act upon when there is a crash.
What kind of act did you have in mind? With an access violation you know
for sure there is a bad address in the code. Now what?
>
> If there is no protection, how can does the crash reporter get run?
> It almost never is a hard crash. Am I wrong in assuming that there is
> some hook out there that tells firefox to launch the crash reporter
> and where it can find the crash report data?
OK then, problem solved: the crash reporter does trap it. Isn't that what
you want?
>
>> In Javascript, any crash is Firefox's 'fault'. Report them, they get fixed.
>
> True, but like I mentioned in the topic at Get Satisfaction, that
> solution is very mediocre, and breeds anger and frustration. Give me,
> the user, the ability to improve my experience. Tell me what
> extensions are triggering that code so I can remove the offending add-
> ons. Right now, Firefox doesn't empower me at all, I'm at the mercy
> of a volunteer dev team if I don't have the ability diagnose a crash.
> Not many do.
Excellent idea. But not very easy to implement. The C++ trap handler
sees the C++ stack, but the Javascript stack is needed to identify the
extension code. That stack is invisible to the C++ debugger, or rather it
is encoded in C++ data structures. So the trap handler would have to
look through the memory and analyze it to find the stack.
Maybe the trap handler could be built into the JS engine layer?
>
>> Its called Javascript and it works wonderfully. FF3 is very solid for a first release.
>
> But not well enough. I don't want to beat up on the firefox team
> because they do great work and I'm very thankful. That said, to say
> "disable your extensions because they cause most crashes" out of one
> side of your mouth (the support team) and to say "Its called
> Javascript and it works wonderfully [to prevent crashes]" out of the
> other side is a conflicting story. What percentage of extensions out
I think this is just primitive debugging by trial and error. So I
agree with you.
> there have native code in them? I'd like to do some more research
> about the nature of the crashes in Firefox, but am unable to browse
> around the crash stats site. Is there a place that end users can
> download the crash data directly?
I think the site is broken now. And when it is up, the info is very
obscure for most users. The Javascript stack is not available in this
site anyway, because of the above issues.
Yes, because of bugs in Firefox, or because of invariants violated by
Firebug (more the former than the latter, in Firebug's case).
> Besides, when you look
> at the crash reports referenced in the topic at Get Satisfaction, they
> are all EXCEPTION_ACCESS_VIOLATION crashes, which is perfectly
> catchable. If you look at the framestacks, you can see it happens
> when inside of the javascript interpreter.
That it's catchable doesn't mean it's recoverable. That a piece of
code is causing a crash means that it is in an unknown state, and that
means that it's not really possible to, in a general way, return it to
a known state. What other mistakes has it made?
I mean, you can say that the crash-reporter plus session restore are a
form of catching and recovery, but they're a limited one precisely
because of the difficulty of solving this problem at a deeper level.
> It seems reasonable to track on a thread a stack for extensions: push
> when entering extension code, pop when returning.
Your reasoning here is predicated on a mistaken understanding of how
extensions work. They are not isolated from the rest of Firefox once
they're loaded, and continuing after a crash is not a simple task.
They are much more like drivers being loaded into the operating system
than applications running on top of one.
One very common such crash pattern, for Firefox with extensions and
operating systems with drivers, is that the "loaded code" modifies the
state in some way that violates an invariant, and then at some point
later an unrelated piece of code trips on it.
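A toy Python sketch of that pattern, purely for illustration: the class
and function names are invented, and a Python TypeError stands in for
what would be an access violation in native code.

```python
import traceback


class BookmarkStore:
    """Toy shared state. Invariant: every entry is a (title, url) tuple."""

    def __init__(self):
        self.entries = [("Home", "https://example.org/")]


def careless_extension(store):
    # The "loaded code" breaks the invariant, but nothing fails here.
    store.entries.append(None)


def unrelated_core_code(store):
    # Trusts the invariant, so it trips over the damage left earlier.
    titles = []
    for title, _url in store.entries:
        titles.append(title)
    return titles


def where_it_failed(store):
    """Name of the function in which the failure surfaced, or None."""
    try:
        unrelated_core_code(store)
    except TypeError as exc:
        return traceback.extract_tb(exc.__traceback__)[-1].name
    return None
```

After careless_extension(store) has run, where_it_failed(store) names
unrelated_core_code: the failing stack points at the code that tripped,
and the code that did the damage is no longer on it.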
> When an uncaught
> exception occurs, save that and act upon when there is a crash.
I'm not sure what you're suggesting here.
> If there is no protection, how can does the crash reporter get run?
> It almost never is a hard crash. Am I wrong in assuming that there is
> some hook out there that tells firefox to launch the crash reporter
> and where it can find the crash report data?
The crash reporter is an extremely isolated piece of code, which
cannot -- by the very nature of its problem -- poke around inside the
application's space and chase pointers to gather more information. By
the time it is called, the jig is very largely up, and we have to rely
on reading only the data that the operating system guarantees is safe
to inspect. We have added some things (like installed-extension
lists) to the crash report in FF3 in order to help spot trends, but
they rely on human judgement applied to aggregate data, and we don't
yet have automated ways to make educated suggestions to users based on
their reports. If you would be interested in working on improving
that (you sound like you have some development experience, or at least
interest), I suspect we'd be quite happy to be able to say "this sort
of crash is commonly associated with this extension, which you have
installed; disabling it might help". ("And if so, please let us know
so that we can report it to the extension author.")
>> In Javascript, any crash is Firefox's 'fault'. Report them, they get fixed.
>
> True, but like I mentioned in the topic at Get Satisfaction, that
> solution is very mediocre, and breeds anger and frustration. Give me,
> the user, the ability to improve my experience. Tell me what
> extensions are triggering that code so I can remove the offending add-
> ons.
We cannot tell you that, though if you're seeing a frequent crash then
you are likely going to be able to tell if it helps to disable a given
extension (or extension(s); they can interact with each other, as can
different drivers in an operating system).
> Right now, Firefox doesn't empower me at all, I'm at the mercy
> of a volunteer dev team if I don't have the ability diagnose a crash.
I think our support volunteers are pretty good at talking people
through diagnosing and dealing with extension-triggered crashes, but
maybe you're choosing a different meaning of "empower". We do not
make it trivial, or necessarily pleasant, but I don't think that's the
same thing.
> Not many do.
No, and that's why we have people who look at the incoming crash data
and work to isolate and repair bugs, including contact with plugin
vendors, extension developers and even operating system groups, where
the data points in that direction.
>> Its called Javascript and it works wonderfully. FF3 is very solid for a first release.
>
> But not well enough.
If Firefox is not stable enough for you, then you should absolutely
not use it. We are always improving the stability of the product, and
hope that you'll be happier in the future, but it is 100% not our
intention that you feel compelled to use a product that you think is
not good enough. We work very hard to encourage (and in some cases,
through our market share, even get close to _forcing_) developers to
make sites that work with a wide range of browsers, because you having
that choice is important to us.
> That said, to say
> "disable your extensions because they cause most crashes" out of one
> side of your mouth (the support team) and to say "Its called
> Javascript and it works wonderfully [to prevent crashes]" out of the
> other side is a conflicting story.
In this case JavaScript is a bit of a red herring, I think, though
certainly I would expect the number of extension-induced crashes to be
much higher if those developers had to work in C++ all the time. (Or
perhaps much lower, because they simply wouldn't make the extensions,
I guess.)
There are definitely combinations of things that you can do with our
platform, manipulating it via JavaScript from extension code, that
will result in a crash. We reduce them over time, but it's
necessarily part of a prioritization exercise, and the fact that
extension authors can assist with that (while they can't assist
meaningfully with other important work) means that perfecting those
API interactions may not be the right way to spend a given amount of
time. Of course, we can't tell you how to spend *your* time, and we
try to be receptive to good work, so if you think it's a high priority
for you then nobody is going to tell you otherwise. And of course the
ability to do so is predicated on knowing that it needs doing, which
is what John's point about reporting comes from.
> What percentage of extensions out
> there have native code in them? I'd like to do some more research
> about the nature of the crashes in Firefox, but am unable to browse
> around the crash stats site. Is there a place that end users can
> download the crash data directly?
No, I don't believe there is. There's work being done to make the
crash-stats site more responsive, so hopefully you'll be able to
browse around it in the rather near future.
(It's also an enormous amount of data.)
>> The number one assumption you have wrong is that Firefox can be bug free.
>
> I don't have that assumption at all. However, I do assume that
> firefox can provide a better experience than what is in place.
I'm certain that we could, and we would love to. Would you like to
assist with that? It sounds like you've thought in detail about how
to recover from the generalized case of state corruption, and it's
definitely work that we'd welcome. I caution you that it's not a
simple thing to behave reliably after an unknown violation has
occurred, which is why you see operating systems panic and dump a log
rather than risking more later corruption. (When it was overwriting
memory, did it ding the bookmarks store, such that we'll overwrite it
with cache contents later if we keep going?) Crashing is not the
worst thing that can happen to a user, as it happens.
(Somewhat off-topic, I think that if software in general tried harder
to recover from such unknown-state access violations, it would become
much easier to exploit bugs in them reliably. Right now you have to
find a way to poke a defect that doesn't crash the application before
you get things into the right state to exploit it, but if you get to
kick at the can until you get it right, things are probably a little
easier. I'm not advocating software fragility as a security measure,
but I do think we need to be _pretty_sure_ before we decide that we
can just keep going after the processor and operating system tell us
that we've done something that we should definitely not be doing.)
>> Since there are bugs that only extensions trip on, FF can crash with extensions because of these bugs. So the only, *only* way to get them fixed is for users to report them.
>
> I'm not convinced that this is the only path. Surely, we can improve
> the experience for at least _some_ crashes.
Certainly we can. The question is to what extent we should divert
resources away from other activities (like fixing the bugs that are
reported, either shallowly through the crash reporter or with more
detail by sophisticated and motivated users) to make the crashing
experience better.
> Even if it is just
> catching EXCEPTION_ACCESS_VIOLATION that are thrown underneath
> XPC_WN_CallMethod (2 of the three crashes mentioned in the topic at
> Get Satisfaction) and reporting on the nearest extension code it would
> be an improvement.
Just as Firefox can be wrong-footed by extension code, extension code
can be wrong-footed by other extensions, or of course by bugs in
Firefox. I don't think we can report usefully on the "nearest
extension code" such that it helps users make an informed decision,
and I would be quite reluctant to bake that sort of "blamecasting" into
the product. But if you can see that it's XPC_WN_CallMethod, can't
you see what extension it's calling? You have all the information we
have, there.
An experiment to install SEH above those callouts might be
interesting, though I suspect that the performance impact would be
non-trivial, and again a lot of extensions mutate Firefox-internal
state in order to do their work, so I'm not sure it's actually going
to give a better experience. High-reliability systems tend to be
built around 'fail at the first hint of trouble, get back to
completely known state, and restart operation' as an architectural
model, but it's definitely a trade-off vs performance and other
engineering forces.
Can you give an example of a similar (driver-esque) crash reporting
and isolation model that you think does a good job providing this sort
of "empowerment" to users? That might be helpful in guiding future
work on this topic.
You want the experience to be better, which is unsurprising -- and
especially unsurprising if you've been experiencing a lot of crashes
-- and uncontroversial. But I also think your tone is more accusatory
than is needed to make your point: it would be better to have a
conversation about what might practically improve it, what the effects
of those choices would be, and how they should be prioritized against
other work. If you're interested in that conversation then I'm happy
to discuss in more detail, but if not then you have certainly made
your position clear. :)
Mike
And it would have to do that knowing that memory is almost certainly
corrupted somewhere, if we're getting an access violation. It can't
just start chasing pointers into the heap (or necessarily even
trusting the stack) to see what it finds, or we'll very quickly have a
crashing crash reporter.
It's a hard problem, even without the need to send the heap to where
the information about the meaning of the crash stack is actually held.
I don't know of any production systems that do anything equivalent,
but my research into the area has been casual.
Mike
That was my initial reaction as well. But maybe not, at least for most
cases.
Consider the entire FF memory. The access violation could be a single
word -- very likely -- a modest string of memory -- unlikely -- or a
large chunk, very unlikely. So in theory you can't go looking around in
a crash, but in practice I bet you could.
Consider the stack. From the point of the access violation to the trap
handler there will be good frames and at least one bad frame (meaning
something's amiss). Any info from the good frames that lead to the js
stack will give hints as to the code running at the crash. Since the JS
interpreter most likely did not crash (at least I've not seen this
often), the JS stack up to some call into a C++ API will be ok, and that
is exactly the info that would help diagnose the point of error.
Well this isn't true: these bugs are mostly caused by some C++ code that
ran in the past setting (or not setting) a value that later became the
crash site.
But the point raised by Scott still remains: users don't know which
extension to blame, and a binary search through new profile, install,
install, install is very painful.
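For reference, the manual search amounts to something like the Python
sketch below. It assumes a single culprit extension and a crash that
reproduces on demand, neither of which is guaranteed, since extensions
can interact with each other. crashes_with is a hypothetical callback
standing in for "restart with only these extensions enabled and see
whether it still crashes".

```python
def find_culprit(extensions, crashes_with):
    """Bisect a list of extensions down to the one that triggers a crash.

    crashes_with(enabled) is a hypothetical callback that returns True
    if the crash reproduces with exactly `enabled` turned on.
    Returns None if the crash does not reproduce with everything enabled.
    """
    candidates = list(extensions)
    if not candidates or not crashes_with(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the crash.
        candidates = half if crashes_with(half) else candidates[len(half):]
    return candidates[0]
```

Each round halves the candidates, so the number of restarts grows with
the logarithm of the number of extensions, but every one of those
restarts is a manual step for the user.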
John
Firebug uses a part of the Mozilla code base that Firefox itself does
not use (jsdIDebuggerService). This code is extremely well tested for
normal cases. But then some folks invented Web 2.0 and a whole new class
of applications was created. These applications use the jsd code in
different ways. So in FF2, the combination of Firebug, AJAX, and web
page errors crashed. Since Google ads use AJAX and they appear on lots
of pages with errors, the user experience was 'crashy as hell'
(especially if you set javascript.options.strict = true).
There isn't anything any developers could have done to prevent this,
except forbid AJAX, errors, or debugging.
John.
Whatever happened to having crash-reporter list the extensions
installed? That would be a huge advantage in crash analysis. It seems to
me well worth some developer time to have the crash-reporter site tell
us if a crash signature "only occurs with extension X present" or not.
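A minimal Python sketch of that kind of report, assuming each crash
report can be reduced to a (signature, installed extensions) pair. That
input format and the function name are made up for illustration; this is
not how the crash-reporter site stores its data.

```python
def extensions_always_present(reports):
    """Map each crash signature to the extensions seen in every report of it.

    `reports` is an iterable of (signature, extension_ids) pairs,
    a hypothetical format chosen for this sketch.
    """
    common = {}
    for signature, extensions in reports:
        present = set(extensions)
        if signature in common:
            # Keep only extensions that were present in all earlier reports.
            common[signature] &= present
        else:
            common[signature] = present
    return common
```

An empty set for a signature means no single extension was present in
every report. A non-empty set is only a hint: a very widely installed
extension will show up for many signatures, so human judgement is still
needed.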
I also think we should reconsider some attitude issues with respect to
extensions. I know with Firebug that when someone reports "Firebug breaks
extension X" my reaction is immediately "not my problem, I don't use
extension X". I think that is a common and natural reaction, but one
that devalues the critically valuable extension ecology. We need a
better approach.
John.
I thought we did collect it, but I don't see it in the crash-reporter
site. Might be that it's protected (like the email address and
UserID); I don't have a privileged login.
I'll ask around.
Mike
Not? But you wrote: " So in *FF2*, the combination of Firebug, AJAX,
and web page errors crashed." Clearly something must have changed in
Mozilla Firefox 3 to prevent these crashes, right?
I think FF3 fixed a number of bugs that were too small or too risky to
fix in FF2. So you are right, I should have said "...anything a fixed
set of developers could have done in a limited time...".
We did the work to make the list of add-ons included in the data sent to
the report server; however, I don't think the processor does anything with
it at the moment.
Dave
No problem, but it would have been great to have some workarounds to
please FF2 users, by not crashing, even if they *should* all use FF3 by now.
I think the bugs involved are very tough GC bugs and the code is no
one's baby. I wouldn't expect any fixes and I think it should be low
priority. Most developers can get enough work done by using FF3 mostly,
then testing on FF2, but for sure with javascript.options.strict off.
John.