Mapping crash signatures to bugs

Benjamin Smedberg

May 22, 2007, 3:26:05 PM

to alqa...@ardisson.org

In order to create useful topcrash reports, we need to be able to map crash
signatures to bug reports. I am trying to wrap my head around how people
actually use this feature currently, and how it can be improved.

For the moment, please assume that the crash signature is "perfect": that
is, it accurately represents a unique kind of crash. This obviously isn't
true yet, but I'll post separately about how I think we can accomplish it.

1) Can we map a crash signature to a single bug number?
2) Should the mapping be "automatic", by reading whiteboard information from
the bugzilla database?
2a) Or should we ask QA people to manually map crash signatures to bug numbers?
3) Beltzner's proposed reporting UI asks people to submit their email
address so that when the problem is fixed, we can email them and let them
know. What is the best way to semi-automate this process? In particular, we
obviously don't want to email them when the bug is marked FIXED, but rather
when a stable dot-release contains the fix.

In any case, this sounds like a lot of cooperation between separate datasets
on different databases. Suggestions welcome on how they can cooperate most
effectively.

Smokey, you mention this in
http://wiki.caminobrowser.org/User:Sardisson/Crash_Analysis_UI... how do you
envision this working?

--BDS

Smokey Ardisson

May 23, 2007, 1:19:42 AM

In my case, I mostly had issues with the way Talkback (half-)worked;
the general paradigm seemed OK to me.

On May 22, 3:26 pm, Benjamin Smedberg <benja...@smedbergs.us> wrote:

> 1) Can we map a crash signature to a single bug number?

Ideally, yes, but I think that depends on both absolutely perfect
signatures and absolutely perfect bugs.

My main issue with this right now is that Talkback's mapping just
randomly(?) chooses one open bug and one closed bug (if applicable) to
display, and those choices aren't usually very good in my experience.

E.g., http://talkback-public.mozilla.org/reports/camino/CM11x/index.html
The PL_DHashTableOperate signature is mapped to an old Windows-only
crash (bug 234169) when all of those incidents are almost assuredly
bug 349463.

What I'd like to see is some sort of "list" of (all?) the open bugs
that are possible matches, as well as a second column/whatever with a
set of recently-closed (1 month?) bugs that also match (for
verification purposes; either the signature stops appearing and you
see the bug in the closed column and can verify the bug, or the
signature continues to appear and you know that bug NNNNNN in the
closed column is the bug you probably want to re-open).
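
To make the two-column idea concrete, here is a minimal Python sketch. It assumes a simplified `Bug` record and a plain substring match on the summary; the field names, the `candidate_bugs` helper, and the 30-day default window are illustrative assumptions, not anything Talkback or Bugzilla provides.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Bug:
    number: int
    summary: str
    closed_on: Optional[date] = None  # None means the bug is still open


def candidate_bugs(signature: str, bugs: List[Bug], today: date,
                   window_days: int = 30) -> Tuple[List[int], List[int]]:
    """Return (open matches, recently closed matches) for one signature."""
    cutoff = today - timedelta(days=window_days)
    open_matches: List[int] = []
    closed_matches: List[int] = []
    for bug in bugs:
        if signature not in bug.summary:
            continue  # hypothetical matching rule: substring of the summary
        if bug.closed_on is None:
            open_matches.append(bug.number)
        elif bug.closed_on >= cutoff:
            closed_matches.append(bug.number)
    return open_matches, closed_matches
```

Bugs closed before the cutoff are dropped from both lists, which keeps the "closed" column short enough to be useful for verification.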

I'd prefer to err on the side of multiple possibilities over choosing
just one to display, particularly with multiple products and branches
in play. (I've probably violated the assumption of a perfect
signature here ;) but I've just gone through lots of pain recently
with objc_msg_send crashes and the signatures Talkback provides for
it.... I'm excited to see the plan for perfect signatures :) )

> 2) Should the mapping be "automatic", by reading whiteboard information from
> the bugzilla database?

Convention seems to be we add [@ signature:ofCrash][@
additionalSignature:ofCrash] to a bug's summary; I'd default to having
automatic matching based on that.

(I'm unclear what Talkback uses right now, but one of my big issues
with it is that it very often does not match bugs that *do* have the
signature in their titles and sometimes matches bugs that don't have
the signature anywhere in the bug content. Often it won't match bugs
filed in Camino, but I do see it enough on bugs filed in Core or
Firefox.

E.g., from the same Camino topcrash report, the DesktopServicesPriv.
67.0.0 crash signature is at least vaguely related to Core bug 335061,
but Talkback doesn't manage to match up the bug.)

Having a clear, predictable method the webtool uses for attempting to
find matches, and which we can use on the Bugzilla side, would be a
huge improvement here, IMO.
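
One possible "clear, predictable" rule, sketched in Python: pull every `[@ ...]` token out of the bug summary and require an exact match on the signature. The regular expression and the helper names are assumptions for illustration; this is not a description of what Talkback or Bugzilla actually does.

```python
import re
from typing import List

# Matches "[@ someSignature]" and captures the text between "[@" and "]".
# Signatures that themselves contain "]" are not handled by this sketch.
SIGNATURE_RE = re.compile(r"\[@\s*([^\]]+?)\s*\]")


def signatures_in_summary(summary: str) -> List[str]:
    """Return every signature annotated in a bug summary, in order."""
    return SIGNATURE_RE.findall(summary)


def bug_matches(signature: str, summary: str) -> bool:
    """True only when the summary carries exactly this signature."""
    return signature in signatures_in_summary(summary)
```

Because the same function can be run against a Bugzilla summary by hand, a person can predict whether the webtool will pick up a given bug.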

> 2a) Or should we ask QA people to manually map crash signatures to bug numbers?

This was one of Jesse's proposals; I'd certainly like to see the
ability for QA to at least add matches. I think I'd like the webtool
to start us off by automatic mapping, just because it saves us humans
time.

> 3) Beltzner's proposed reporting UI asks people to submit their email
> address so that when the problem is fixed, we can email them and let them
> know. What is the best way to semi-automate this process? In particular, we
> obviously don't want to email them when the bug is marked FIXED, but rather
> when a stable dot-release contains the fix.

Based on the signature-to-bug mapping, query Bugzilla for the bug and
the bug's fixed/verified1.x.x.x, and queue those people to be emailed
when Gecko 1.x.x.x ships in their product? That seems pretty complex
(and inaccurate unless there really is a perfect incident-to-bug
mapping) already, and more so if you mix in reports from people using
apps whose release schedules are not completely in sync with Firefox.

I'd throw my hands up in the air on that one ;)
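
For what it is worth, the lookup chain described above (signature to bug, bug to the releases that contain the fix, release to reporters) could be sketched roughly as follows. All three dictionaries are hypothetical inputs that would have to be exported from the crash database and from Bugzilla, and the sketch inherits every inaccuracy of the signature-to-bug mapping.

```python
from typing import Dict, List, Set


def reporters_to_notify(release: str,
                        signature_to_bug: Dict[str, int],
                        fixed_in: Dict[int, Set[str]],
                        reporters: Dict[str, List[str]]) -> List[str]:
    """Return the addresses to email once `release` has shipped.

    signature_to_bug: crash signature -> bug number (assumed mapping)
    fixed_in: bug number -> releases known to contain the fix
    reporters: crash signature -> addresses submitted with crash reports
    """
    emails: Set[str] = set()
    for signature, addresses in reporters.items():
        bug = signature_to_bug.get(signature)
        if bug is not None and release in fixed_in.get(bug, set()):
            emails.update(addresses)
    return sorted(emails)
```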

Just to add to the mix: we may (will) want to consult this list during
the "reproduce it" stage of some bugs in order to see if we can't get
more info from the people who are seeing the crash (within the
parameters of whatever privacy policy is in place).

Hope this helps.

Smokey

Ray Kiddy

May 23, 2007, 1:23:16 PM

Benjamin Smedberg wrote:
> In order to create useful topcrash reports, we need to be able to map crash
> signatures to bug reports. I am trying to wrap my head around how people
> actually use this feature currently, and how it can be improved.
>
> For the moment, please assume that the crash signature is "perfect": that
> is, it accurately represents a unique kind of crash. This obviously isn't
> true yet, but I'll post separately about how I think we can accomplish it.
>
> 1) Can we map a crash signature to a single bug number?

In theory, one can map crash signatures to a unique bug. Practically,
one can have a "stack expression" in a bug and match crashes to bugs
through that expression. The expression needs to be able to be refined,
and crashes need to be re-mapped to the bug based on that refinement.
Sometimes bugs for crashes need to be consolidated.

A "stack expression" may need to be multi-line, so it may be awkward to
put it in the whiteboard. Or maybe not.
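
As a rough illustration of a "stack expression", the Python sketch below treats it as a single line of glob patterns separated by `|`, one pattern per frame starting from the top of the stack. That syntax, and the `matches` and `remap` helpers, are assumptions made for the example; they do not describe CrashReporter, Radar, or any existing Mozilla tool.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def matches(expression: str, frames: List[str]) -> bool:
    """True if the top frames match the expression, pattern by pattern.

    Note: fnmatch treats "[" as the start of a character class, so frame
    names containing brackets would need escaping in a real tool.
    """
    patterns = [p.strip() for p in expression.split("|")]
    if len(frames) < len(patterns):
        return False
    return all(fnmatchcase(f, p) for f, p in zip(frames, patterns))


def remap(crashes: Dict[str, List[str]],
          expressions: Dict[int, str]) -> Dict[str, Optional[int]]:
    """Assign each crash to the first bug whose expression matches it.

    Re-run this whenever an expression is refined, so that crashes move
    to (or away from) a bug as the expression changes.
    """
    result: Dict[str, Optional[int]] = {}
    for crash_id, frames in crashes.items():
        result[crash_id] = next(
            (bug for bug, expr in expressions.items() if matches(expr, frames)),
            None,
        )
    return result
```

Consolidating two bugs then amounts to moving both expressions onto one bug number and calling `remap` again.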

What I found in a similar system (CrashReporter and Radar at Apple) is
that one usually wants a bug created just for each crash, as one is able
to find corresponding stack expressions for groups of crashes. That bug
should be created automatically and managed, as much as possible,
automatically. Another bug or bugs can be created to relate the human
tasks and discussion to the bug that manages the crash info.

> 2) Should the mapping be "automatic", by reading whiteboard information from
> the bugzilla database?

The bug that specifically matches the crashes to the "stack expression"
can be automatically managed and automatically created.

The fact that these bugs are automatically created is important, but the
fact that these bugs are easily differentiable from people-created bugs
is important also. People tend not to want to deal with large masses of
machine-generated bugs when they are looking for "human intel". The
fact that these bugs can be filtered out and managed makes their
automatic creation palatable.

> 2a) Or should we ask QA people to manually map crash signatures to bug numbers?

No. Nobody deserves to be sentenced to reading reams and reams of stack
traces, not even QA people.

> 3) Beltzner's proposed reporting UI asks people to submit their email
> address so that when the problem is fixed, we can email them and let them
> know. What is the best way to semi-automate this process? In particular, we
> obviously don't want to email them when the bug is marked FIXED, but rather
> when a stable dot-release contains the fix.

And how are the e-mails going to be attached to the bug, given that MoCo
will not want the e-mail addresses to be readable in the body of the
bug? Are there provisions for secure attachments, attached to
potentially non-secure bugs, in bugzilla?

> In any case, this sounds like a lot of cooperation between separate datasets
> on different databases. Suggestions welcome on how they can cooperate most
> effectively.

An important part of cooperating datasets is ensuring that things can be
uniquely identified. Do builds have a unique ID yet, since in the past
the build IDs were not actually unique if two builds came out on the
same day?

It seems as though it would be easy to generate a UUID with every single
build, done anywhere, on any machine, but I am sure there are lots of
complicated reasons not to do this....

Benjamin Smedberg

May 24, 2007, 11:55:56 AM

Ray Kiddy wrote:

>> 1) Can we map a crash signature to a single bug number?
>
> In theory, one can map crash signatures to a unique bug. Practically,
> one can have a "stack expression" in a bug and match crashes to bugs
> through that expression. The expression needs to be able to be refined,
> and crashes need to be re-mapped to the bug based on that refinement.
> Sometimes bugs for crashes need to be consolidated.
>
> A "stack expression" may need to be multi-line, so it may be awkward to
> put it in the whiteboard. Or maybe not.

Let's assume that we have to stick with the bugzilla metadata that's
currently available, so we have no choice but to make it a single-line
expression.

> What I found in a similar system (CrashReporter and Radar at Apple) is
> that one usually wants a bug created just for each crash, as one is able
> to find corresponding stack expressions for groups of crashes. That bug
> should be created automatically and managed, as much as possible,
> automatically. Another bug or bugs can be created to relate the human
> tasks and discussion to the bug that manages the crash info.

I'm having trouble understanding the workflow. Could you describe what you
mean more? In particular:

1) When does the automatic bug get created? Is there a threshold? I imagine
that we have a decent number of crashes that are specific to a particular
computer/user (which may be reported multiple times)... I don't think we
have the bandwidth to sort through every one of these. Or are you proposing
that we don't expect to sort through all the automatically-generated bugs?
If so, what is their value?

2) Assuming a good automatic bug report is created, how does a developer
indicate "I'm working on this crash"?

3) If the crash is fixed (especially if it happens to be fixed but not by a
particular checkin), how do we close out the automatic bug report?

As you can probably tell, I'm skeptical about the automatic system. I think
it might make more sense to provide tools that allow QA volunteers to
efficiently sift through the data to identify

1) top crashers
2) new crashers
3) patterns of common data for crashers
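
The first two reports in this list can be sketched in a few lines of Python, assuming each crash has already been reduced to a signature string. The function names are hypothetical, and "new" is taken here to mean "seen in the current period but not in the previous one".

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_crashers(signatures: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent signatures with their crash counts."""
    return Counter(signatures).most_common(n)


def new_crashers(current: Iterable[str], previous: Iterable[str]) -> Set[str]:
    """Return signatures present in the current period but not the previous."""
    return set(current) - set(previous)
```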

> And how are the e-mails going to be attached to the bug, given that MoCo
> will not want the e-mail addresses to be readable in the body of the
> bug? Are there provisions for secure attachments, attached to
> potentially non-secure bugs, in bugzilla?

I wouldn't attach the emails to the bug. Rather, when we release an update
that fixes a crasher, we have a form in Socorro for "let reporters of this
crash know it has been fixed".

> An important part of cooperating datasets is ensuring that things can be
> uniquely identified. Do builds have a unique ID yet, since in the past
> the build IDs were not actually unique if two builds came out on the
> same day?

The application buildid has been unique down to the hour for years. However,
we only rebuild the talkback client when we do a clobber (i.e. we don't
rebuild talkback for hourly builds). Breakpad does not have a separate "ID"
and simply reuses the app buildid.

I'm curious though... what does identifying the particular build have to do
with datasets crossing database boundaries? I see how it's important to have
unique identifiers for crash signatures and for individual crash reports,
but it seems unlikely that most crashes would be unique to a particular
build of Firefox, or that bugzilla would need to have that data.

> It seems as though it would be easy to generate a UUID with every single
> build, done anywhere, on any machine, but I am sure there are lots of
> complicated reasons not to do this....

It's a possibility... I kinda like this idea, except that we need to be able
to match builds against dates in order to do some kinds of aggregate reporting.

I've been toying with a buildid of the form YYYYMMDDHH-machinename-branchname

Perhaps we should expand that to
YYYYMMDDHH-machinename-branchname-UUID
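
A minimal Python sketch of generating and parsing such an ID follows. It assumes that machine and branch names contain no hyphens and writes the UUID in its 32-character hex form so that `-` stays usable as the field separator; both choices are assumptions of the sketch, not part of the proposal.

```python
import uuid
from datetime import datetime
from typing import NamedTuple


class BuildId(NamedTuple):
    built_at: datetime
    machine: str
    branch: str
    unique: uuid.UUID


def make_build_id(built_at: datetime, machine: str, branch: str) -> str:
    """Format YYYYMMDDHH-machinename-branchname-UUID (UUID as plain hex)."""
    return f"{built_at:%Y%m%d%H}-{machine}-{branch}-{uuid.uuid4().hex}"


def parse_build_id(text: str) -> BuildId:
    """Split an ID back into its fields; the date stays usable for reports."""
    parts = text.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 hyphen-separated fields, got {len(parts)}")
    stamp, machine, branch, unique = parts
    return BuildId(datetime.strptime(stamp, "%Y%m%d%H"), machine, branch,
                   uuid.UUID(hex=unique))
```

The date prefix keeps builds sortable and groupable for aggregate reporting, while the UUID suffix distinguishes two builds made on the same machine and branch within the same hour.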

--BDS

Axel Hecht

May 24, 2007, 12:42:06 PM

Benjamin Smedberg wrote:
> In order to create useful topcrash reports, we need to be able to map crash
> signatures to bug reports. I am trying to wrap my head around how people
> actually use this feature currently, and how it can be improved.
>
> For the moment, please assume that the crash signature is "perfect": that
> is, it accurately represents a unique kind of crash. This obviously isn't
> true yet, but I'll post separately about how I think we can accomplish it.
>
> 1) Can we map a crash signature to a single bug number?

How about using a cryptographic hash of the stack trace and put that
into the whiteboard in a known format?
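
A sketch of what that could look like in Python, assuming the hash covers only the top few function names and that the whiteboard tag takes the form `[crashhash:...]`. The tag format, the depth of 5, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
from typing import List


def stack_hash(frames: List[str], depth: int = 5) -> str:
    """Hash the top `depth` frames after trimming surrounding whitespace."""
    normalized = "\n".join(frame.strip() for frame in frames[:depth])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def whiteboard_tag(frames: List[str], depth: int = 5) -> str:
    """Render the hash in a fixed format (hypothetical) for the whiteboard."""
    return f"[crashhash:{stack_hash(frames, depth)}]"
```

One trade-off: a hash only matches stacks whose hashed frames are identical, so a one-frame difference within the chosen depth produces an unrelated value.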

Axel

Ray Kiddy

May 25, 2007, 2:51:28 AM

There can be a threshold. The point is to not have to sort through every
one of them.

The system I am familiar with stored the stack traces, symbolicated
them, and then scanned for patterns in the stack traces. If new crashes
were identifiable via a pattern from an existing bug, the new crash
information would be attached to that existing bug.

If a pattern is identified in stacks that are not tied to an existing
bug, a new bug can be automatically created to capture that pattern.

If you want to wait until some number of stack traces is identifiable
via a pattern, that is fine.
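
The flow described above can be sketched in Python as follows. To keep it short, each crash is assumed to have already been reduced to a pattern key (for example a signature string); the `triage` name, the input shapes, and the default threshold of 3 are assumptions, not details of the system I worked with.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def triage(crashes: List[Tuple[str, str]], known: Dict[str, int],
           threshold: int = 3) -> Tuple[Dict[int, List[str]], List[str]]:
    """Sort crashes into existing bugs and patterns that deserve a new bug.

    crashes: (crash id, pattern key) pairs
    known: pattern key -> existing bug number
    Returns (bug number -> crash ids to attach, patterns to file as new bugs).
    """
    attachments: Dict[int, List[str]] = defaultdict(list)
    unmatched: Dict[str, List[str]] = defaultdict(list)
    for crash_id, pattern in crashes:
        if pattern in known:
            attachments[known[pattern]].append(crash_id)
        else:
            unmatched[pattern].append(crash_id)
    # Only patterns seen at least `threshold` times get a new bug.
    to_file = sorted(p for p, ids in unmatched.items() if len(ids) >= threshold)
    return dict(attachments), to_file
```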

I am not sure why you use the phrase "specific to a particular user",
above. The symbolification process "generalizes" the stack. It turns
address references into offsets, for example. Then different stack
traces from different machines can be matched up. Is this not what is
already being envisioned?
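
A small Python sketch of the address-to-offset step, under the assumption that each report carries a list of loaded modules with their base address and size. Real symbolication goes further and resolves offsets to function names via symbol files; this shows only why two machines with different load addresses can still produce the same frame.

```python
from typing import List, Tuple

# (module name, load base address, size in bytes); shape assumed for the sketch
Module = Tuple[str, int, int]


def generalize(address: int, modules: List[Module]) -> str:
    """Rewrite an absolute address as module+offset when a module contains it."""
    for name, base, size in modules:
        if base <= address < base + size:
            return f"{name}+0x{address - base:x}"
    return f"0x{address:x}"  # not inside any known module; leave it absolute
```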

> 2) Assuming a good automatic bug report is created, how does a developer
> indicate "I'm working on this crash"?
>

The same way they do with any other bug. An automatically created bug is
just a bug. It just also has information added to it which would allow
it to be filtered in or out of a search.

Including this information does not suggest these bugs _should not_ be
looked at. It means that they do _not have_ to be dealt with by every
single person.

People have ways of working with bugzilla that take into account
bugzilla's traditional workflow, that being that bugs are expensive to
create, hard to manipulate in groups, and chatty, each one generating
lots of mail. If automatically generated bugs are not separable, people
will complain. I am just saying that criticism can be cut off at the
pass by making automatically generated bugs identifiable as such.

> 3) If the crash is fixed (especially if it happens to be fixed but not by a
> particular checkin), how do we close out the automatic bug report?
>
> As you can probably tell, I'm skeptical about the automatic system. I think
> it might make more sense to provide tools that allow QA volunteers to
> efficiently sift through the data to identify
>
> 1) top crashers
> 2) new crashers
> 3) patterns of common data for crashers

It makes sense to provide those tools to QA also. But if you are only
going to supply those tools to QA, then you have a system where a
chaotic stream of crash traces will need to be dealt with by a finite
number of people. If there are more crashes, it takes longer to search
through them and you get less information, and this is not when you want
less information.

I would also point out that the kind of pattern matching that will be
needed to compare stacks is much easier for software than for people. So
why not let the software do it?

>
>> And how are the e-mails going to be attached to the bug, given that MoCo
>> will not want the e-mail addresses to be readable in the body of the
>> bug? Are there provisions for secure attachments, attached to
>> potentially non-secure bugs, in bugzilla?
>
> I wouldn't attach the emails to the bug. Rather, when we release an update
> that fixes a crasher, we have a form in Socorro for "let reporters of this
> crash know it has been fixed".
>

Ok. I am not sure what Socorro is, but I'll assume it is covered.

>> An important part of cooperating datasets is ensuring that things can be
>> uniquely identified. Do builds have a unique ID yet, since in the past
>> the build IDs were not actually unique if two builds came out on the
>> same day?
>
> The application buildid has been unique down to the hour for years. However,
> we only rebuild the talkback client when we do a clobber (i.e. we don't
> rebuild talkback for hourly builds). Breakpad does not have a separate "ID"
> and simply reuses the app buildid.

Down to the hour is not unique. Unique should mean really unique.
Something that is "almost unique", such as SSNs for people, is asking
for trouble.

I may not have been clear but I was talking about the app buildid, which
would be needed to report exactly which version of the app was seeing
the crash.

> I'm curious though... what does identifying the particular build have to do
> with datasets crossing database boundaries? I see how it's important to have
> unique identifiers for crash signatures and for individual crash reports,
> but it seems unlikely that most crashes would be unique to a particular
> build of Firefox, or that bugzilla would need to have that data.

Crashes will probably start with a particular version of the browser,
especially after the low-hanging fruit is cleared out of the crash lists.

At least some of the time, a specific check-in will lead to crashes. One
needs to identify the build of the app uniquely to see this.

>> It seems as though it would be easy to generate a UUID with every single
>> build, done anywhere, on any machine, but I am sure there are lots of
>> complicated reasons not to do this....
>
> It's a possibility... I kinda like this idea, except that we need to be able
> to match builds against dates in order to do some kinds of aggregate reporting.
>
> I've been toying with a buildid of the form YYYYMMDDHH-machinename-branchname
>
> perhaps we should expand that to
> YYYYMMDDHH-machinename-branchname-UUID

Yes. After all, it is cheap to create a UUID. The universe is not going
to run out of them any time soon.

- ray

> --BDS

Benjamin Smedberg

May 25, 2007, 7:22:12 AM

Ray Kiddy wrote:

> I am not sure why you use the phrase "specific to a particular user",
> above. The symbolification process "generalizes" the stack. It turns
> address references into offsets, for example. Then different stack
> traces from different machines can be matched up. Is this not what is
> already being envisioned?

Sure... we get symbol information so we can identify the crashes by function
frame for the most part. What I mean is, if there is a single computer that
experiences 100s or 1000s of the same crash, but nobody else does, then I
wouldn't want a bug report for it.
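
One way to express that filter, sketched in Python: count distinct machines per signature rather than raw crash reports. The `client_id` field is hypothetical; it assumes reports carry some anonymous per-installation identifier, which is an assumption of this sketch and not something that has been decided.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def widespread_signatures(reports: Iterable[Tuple[str, str]],
                          min_clients: int = 2) -> Dict[str, int]:
    """Return signature -> number of distinct clients, dropping rare ones.

    reports: (signature, client_id) pairs, one per crash report.
    """
    clients: Dict[str, Set[str]] = defaultdict(set)
    for signature, client_id in reports:
        clients[signature].add(client_id)
    return {sig: len(ids) for sig, ids in clients.items()
            if len(ids) >= min_clients}
```

A signature reported a thousand times by one machine then counts as one client and falls below the cut, while a crash seen once each on two machines passes.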

>> 3) If the crash is fixed (especially if it happens to be fixed but not by a
>> particular checkin), how do we close out the automatic bug report?
>>
>> As you can probably tell, I'm skeptical about the automatic system. I think
>> it might make more sense to provide tools that allow QA volunteers to
>> efficiently sift through the data to identify
>>
>> 1) top crashers
>> 2) new crashers
>> 3) patterns of common data for crashers
>
> It makes sense to provide those tools to QA also. But if you are only
> going to supply those tools to QA, then you have a system where a
> chaotic stream of crash traces will need to be dealt with by a finite
> number of people. If there are more crashes, it takes longer to search
> through them and you get less information, and this is not when you want
> less information.
>
> I would also point out that the kind of pattern matching that will be
> needed to compare stacks is much easier for software than for people. So
> why not let the software do it?

Sure, I'm all about software doing it. Maybe the disconnect here is just
about *which* software should be doing it. We were planning on building most
of the aggregation/searching into the breakpad server itself. Therefore, you
don't need a bug until you want to track additional "human" information
(such as who's working on it, or that it blocks a particular release, or...)

Also, please note that "provide the tools to QA" means the mozilla community
QA resources: the tool is public, except for the private data we collect
which right now is just the email address.

The amount of sophisticated pattern-matching and sorting we can build into
socorro is much greater than we can add to bugzilla.

--BDS