Stacktrace hash


BoD

Feb 18, 2012, 8:38:06 AM
to acra-d...@googlegroups.com
Hi!
I have made a small modification to ACRA that adds a new reportable field: STACK_TRACE_HASH. Basically, it takes the stack trace, strips the line numbers (because they vary between Android versions), and applies Java's hashCode() to the result.

The idea is that this number can uniquely identify a 'class' of crashes (as opposed to the report id which identifies one instance of a crash).
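A minimal sketch of the idea, not the actual patch: it assumes the hash is built from the method names of each frame only, with line numbers left out, and that String.hashCode() is applied to the joined text. The class and method names (StackTraceHash, hash) are hypothetical.

```java
// Hypothetical helper, not part of ACRA's API.
final class StackTraceHash {
    // Builds a string from the method name of every frame (no line numbers)
    // and returns its String.hashCode().
    static int hash(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append(frame.getMethodName()).append('\n');
        }
        return sb.toString().hashCode();
    }
}
```

Two crashes that go through the same methods get the same value even if the line numbers differ between Android versions.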

The patch is very small, I will send it here later.

Tell me what you think :-)

--
BoD

BoD

Feb 18, 2012, 9:40:59 AM
to acra-d...@googlegroups.com
Here is the patch.

-- 
BoD
acra_stacktrace_hashpatch.txt.patch

Nikolay Elenkov

Feb 18, 2012, 9:59:55 AM
to acra-d...@googlegroups.com
On Sat, Feb 18, 2012 at 11:40 PM, BoD <bodl...@gmail.com> wrote:
> Here is the patch.

You are only using the method names, and that strips a lot of info.
You might want to add at least the class name too, and maybe the file
name. To get a unique identifier, a real hash function (e.g., SHA-1)
might be more appropriate. That would give you a 20-byte ID, as
opposed to the int's 4 bytes. It might actually be more efficient for
large stack traces too: String.hashCode() loops over all characters,
multiplying the running value by 31 and adding each one, and that
might take some time for a large string.
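A rough sketch of what that could look like, assuming each frame contributes "className.methodName" and the 20-byte digest is rendered as 40 hex characters. The class name Sha1StackTraceId and the frame format are made up for illustration; only MessageDigest is a real API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of ACRA's API.
final class Sha1StackTraceId {
    // Feeds "class.method" of every frame into SHA-1 and returns the digest as hex.
    static String id(Throwable t) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (StackTraceElement frame : t.getStackTrace()) {
                String line = frame.getClassName() + "." + frame.getMethodName() + "\n";
                md.update(line.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is expected to be available; treat its absence as a setup error.
            throw new IllegalStateException(e);
        }
    }
}
```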

Kevin Gaudin

Feb 18, 2012, 12:06:54 PM
to acra-d...@googlegroups.com
That's interesting. I know Jan Berkel had some work in progress on
hashing stack traces with fuzzy hashing, which allows grouping them
by how much two hash values differ.

Maybe he'll write about it over there some day ;-)

Kevin

BoD

Feb 18, 2012, 1:59:58 PM
to acra-d...@googlegroups.com
Thanks for your reply Nikolay,

I am not sure why you would want the hash to be more 'specific' by
adding the class names here, as opposed to just keeping the method
names. I have the impression the result would actually be the same,
meaning it would uniquely identify the crash in the vast majority of cases.

About md5: having a longer hash seems to me to be an inconvenience rather
than an advantage. For instance: "I just fixed crash 12345678" vs "I
just fixed crash 123456789abcdef01234".

OTOH speed is indeed important, but remember hashcode is a native method
(I don't know if the MessageDigest md5 apis are native). Not to mention
the code is also a bit simpler :)

--
BoD

Nikolay Elenkov

Feb 19, 2012, 1:21:38 AM
to acra-d...@googlegroups.com
On Sun, Feb 19, 2012 at 3:59 AM, BoD <bodl...@gmail.com> wrote:
> Thanks for your reply Nikolay,
>
> I am not sure why you would want the hash to be more 'specific' by adding
> the class names here, as opposed to just keeping the method names. I have the
> impression the result would actually be the same, meaning it would uniquely
> identify the crash in the vast majority of cases.

Methods with the same name in different classes are treated as identical
if you omit the class name. Considering inheritance and overriding, that
could be fairly common.

>
> About md5: having a longer hash seems to me to be an inconvenience rather
> than an advantage. For instance: "I just fixed crash 12345678" vs "I just
> fixed crash 123456789abcdef01234".
>

If you want the IDs to really be unique, you need to use a real hash
function. If you want shorter human-readable IDs, you can derive them
from the hash, add tags, etc.
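One possible way to derive a short ID, sketched under the assumption that the full digest is already available as a hex string: keep only a prefix for conversation and store the full value for lookups. The name ShortCrashId is hypothetical, and a shorter prefix is of course more likely to collide than the full digest.

```java
// Hypothetical helper: derives a short, human-friendly ID from a full hex digest.
final class ShortCrashId {
    // Returns the first `length` hex characters of the digest.
    static String shorten(String fullHexDigest, int length) {
        if (length < 1 || length > fullHexDigest.length()) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        return fullHexDigest.substring(0, length);
    }
}
```

For example, shortening "123456789abcdef01234" to 8 characters gives "12345678".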

> OTOH speed is indeed important, but remember hashcode is a native method (I
> don't know if the MessageDigest md5 apis are native). Not to mention the
> code is also a bit simpler :)
>

hashCode() is not native on Android; the digests, however, are. Simpler
is not always better: at the cost of a few more lines, it will be much
more reliable. As for speed, you might want to do some real benchmarking
with stacks of different sizes to get a definitive answer.

BTW, the fuzzy matching thing sounds interesting, I'd like to see how
it's done.

BoD

Mar 2, 2012, 8:31:00 AM
to acra-d...@googlegroups.com
Actually my implementation had another problem: the cause exceptions
were not processed.

Here (in attachment) is an updated version that fixes this problem and
also takes the name of the class into account as Nikolay suggested. I
didn't implement the md5 though because I'm not convinced of its
usefulness :)
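A sketch of what the updated approach might look like, not the attached patch itself: it assumes each frame contributes "className.methodName" and that the chain of causes is walked via getCause(). The names CauseAwareStackTraceHash, key and hash are invented for illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of ACRA's API.
final class CauseAwareStackTraceHash {
    // Joins "class.method" for every frame of the throwable and of each cause.
    static String key(Throwable t) {
        StringBuilder sb = new StringBuilder();
        // Identity set guards against a cyclic cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable current = t; current != null && seen.add(current); current = current.getCause()) {
            for (StackTraceElement frame : current.getStackTrace()) {
                sb.append(frame.getClassName()).append('.')
                  .append(frame.getMethodName()).append('\n');
            }
        }
        return sb.toString();
    }

    // Same int-sized result as before, now computed over the whole cause chain.
    static int hash(Throwable t) {
        return key(t).hashCode();
    }
}
```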

--
BoD

acra_stacktrace_hash.patch

Brian Sayatovic

Feb 13, 2015, 3:54:32 PM
to acra-d...@googlegroups.com, B...@jraf.org
Slightly off topic:

I independently arrived at this same desire.  I came across this thread while seeking a solution for my problem space: a Microsoft .NET enterprise application.

I found no existing solution, so I rolled my own.  I'm using xxHash (32).  It's fast, low on collisions, and has implementations in many languages.  It's not cryptographically secure, but I don't need that.

I'm hashing the exception type as well as the class name, method name and signature of every method in the stack.  I explicitly exclude source files and line numbers since they're not available in every stack.  I also exclude the exception message since it usually includes instance-specific information.
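For readers on the Java side, a loose sketch of the same recipe under two stated substitutions: the JDK has no built-in xxHash, so java.util.zip.CRC32 stands in as another fast, non-cryptographic 32-bit checksum, and StackTraceElement does not expose method signatures, so only class and method names are used. The class name TypedStackTraceHash is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper; CRC32 is a stand-in here, not xxHash.
final class TypedStackTraceHash {
    // Exception type first, then "class.method" per frame.
    // The exception message, source files and line numbers are deliberately left out.
    static String key(Throwable t) {
        StringBuilder sb = new StringBuilder(t.getClass().getName()).append('\n');
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append(frame.getClassName()).append('.')
              .append(frame.getMethodName()).append('\n');
        }
        return sb.toString();
    }

    // 32-bit checksum of the key, returned as a non-negative long.
    static long hash(Throwable t) {
        CRC32 crc = new CRC32();
        crc.update(key(t).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```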

So far, it's working out well.  I'm eager to see if this helps our support team, users, developers, etc. more easily recognize problem patterns.