Dumping Breakpad Symbols From PDB

Jake Shadle

Apr 23, 2013, 10:02:32 AM
to google-brea...@googlegroups.com
tl;dr PDB Dumping 1 hr -> 10 seconds

So I was really annoyed with how long it took to create symbols from a Microsoft PDB. With a ~500MB PDB it was taking up to an hour on my quite beefy (32GB, 32CPU, SSD) machine at work. Whenever I broke into it at random, it was ALWAYS doing an allocation or deallocation for string data of some sort, so the DIA2 DLL seemed to be basically string-bound. Fantastic.

So I did some digging around and found several sources for parsing the PDB format. The most helpful was http://ccimetadata.codeplex.com/, which is actually from Microsoft and has some PDB parsing code in it that gave me most of what I needed; the rest I found in https://code.google.com/p/pdbparse/.

So I coded up a custom implementation of the PDB parsing that is essentially just a drop-in replacement for PDBSourceLineWriter and has no dependencies on DIA2. Dumping the same exact PDB with the new method takes about 10 seconds, and uses less than 25% of the memory.

Is anyone interested in the code?  It is just 3 files, but it doesn't follow the Google coding standards at all, and has C++11 stuff, and uses the Microsoft concurrency runtime to thread some stuff, since this started off as an experiment.

*I haven't figured out how to read thunk records or trampolines properly, but as they are usually uninteresting anyways I considered it a small price to pay for the speedup.
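
For readers wondering what parsing a PDB without DIA2 starts with, here is a minimal sketch of reading the MSF 7.00 "superblock" at the front of the file. It follows the publicly documented layout (a 32-byte magic followed by six little-endian 32-bit fields); the names `MsfSuperBlock`, `ParseSuperBlock` and `DirectoryBlockCount` are hypothetical and are not taken from the code discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical names; layout per the documented MSF 7.00 superblock.
struct MsfSuperBlock {
    uint32_t block_size;           // bytes per block (commonly 4096)
    uint32_t free_block_map;       // index of the active free block map
    uint32_t num_blocks;           // total number of blocks in the file
    uint32_t num_directory_bytes;  // size of the stream directory in bytes
    uint32_t unknown;              // reserved
    uint32_t block_map_addr;       // block that lists the directory's blocks
};

// 32-byte magic. The literal is split so "\x1a" does not swallow the 'D'.
static const char kMsfMagic[] = "Microsoft C/C++ MSF 7.00\r\n\x1a" "DS\0\0";
static const size_t kMsfMagicLen = 32;

// Read a little-endian 32-bit value regardless of host byte order.
static uint32_t ReadU32LE(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

MsfSuperBlock ParseSuperBlock(const std::vector<uint8_t>& file) {
    if (file.size() < kMsfMagicLen + 6 * 4 ||
        std::memcmp(file.data(), kMsfMagic, kMsfMagicLen) != 0) {
        throw std::runtime_error("not an MSF 7.00 (PDB) file");
    }
    const uint8_t* p = file.data() + kMsfMagicLen;
    MsfSuperBlock sb;
    sb.block_size          = ReadU32LE(p + 0);
    sb.free_block_map      = ReadU32LE(p + 4);
    sb.num_blocks          = ReadU32LE(p + 8);
    sb.num_directory_bytes = ReadU32LE(p + 12);
    sb.unknown             = ReadU32LE(p + 16);
    sb.block_map_addr      = ReadU32LE(p + 20);
    if (sb.block_size == 0) {
        throw std::runtime_error("invalid block size");
    }
    return sb;
}

// Number of blocks needed to hold the stream directory (rounded up).
uint32_t DirectoryBlockCount(const MsfSuperBlock& sb) {
    return (sb.num_directory_bytes + sb.block_size - 1) / sb.block_size;
}
```

Everything else in the file (type info, symbols, line tables) is reached through the stream directory that this header points at, so a real parser would continue from `block_map_addr`.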

Bruce Dawson

Apr 23, 2013, 1:07:13 PM
to google-brea...@googlegroups.com
I'm interested. The time to convert PDB files to .sym files is a problem in our build process. No individual PDB files take an hour, but the sum total is certainly problematic.

Ivan Penkov

Apr 23, 2013, 6:59:26 PM
to google-brea...@googlegroups.com
I'm also interested.  I've seen problems with MS DIA in the past.  It was crashing on certain Chrome PDB files.  Microsoft claimed that the PDB files were corrupt (they were generated by Microsoft tools).  It will be nice to see whether this PDB parser can do a better job.


--
You received this message because you are subscribed to the Google Groups "google-breakpad-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-breakpad-d...@googlegroups.com.
To post to this group, send email to google-brea...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-breakpad-discuss?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Jake Shadle

Apr 24, 2013, 9:15:11 AM
to google-brea...@googlegroups.com
OK, so here is the code.  There are probably edge cases that I am missing, and depending on compiler/linker options a PDB could be wildly different than the one I tested against, but this should at least be a good start for anyone who wants to generalize this code to handle more cases.

One quick note because I forgot about it: it actually takes about 30 seconds to do the PDB dump. Part of the speedup I got came from linking against Intel's TBB malloc proxy library, which does much more intelligent heap allocations. The std:: containers that I used to store things like all of the type information for such a large PDB do quite a few same-sized allocations, and those get a significant performance boost from that library (mostly in the deallocation, amusingly).
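
The "many same-sized allocations, expensive deallocation" pattern described above is also the case a simple arena handles well. The sketch below is a different technique from the TBB malloc proxy used in the post, shown only to illustrate why the deallocation cost can disappear: objects are bump-allocated from large chunks and released all at once. `Arena` and `TypeRecord` are hypothetical names, and the sketch assumes the stored objects are trivially destructible.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical record type standing in for parsed type information.
struct TypeRecord {
    uint32_t index;
    uint32_t size;
};

// Bump allocator: allocation is a pointer increment, and all memory is
// released together when the Arena is destroyed (no per-object free).
class Arena {
public:
    explicit Arena(size_t chunk_bytes = 1 << 20)
        : chunk_bytes_(chunk_bytes), used_(0) {}

    // Construct a T inside the arena. Destructors are never run, so T
    // must be trivially destructible.
    template <typename T, typename... Args>
    T* Make(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "Arena never runs destructors");
        void* p = Allocate(sizeof(T), alignof(T));
        return new (p) T(std::forward<Args>(args)...);
    }

    size_t chunk_count() const { return chunks_.size(); }

private:
    void* Allocate(size_t size, size_t align) {
        if (size > chunk_bytes_) throw std::bad_alloc();
        // Round the current offset up to the requested alignment.
        size_t offset = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || offset + size > chunk_bytes_) {
            chunks_.push_back(std::unique_ptr<unsigned char[]>(
                new unsigned char[chunk_bytes_]));
            offset = 0;
        }
        used_ = offset + size;
        return chunks_.back().get() + offset;
    }

    size_t chunk_bytes_;
    size_t used_;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

Usage would look like `TypeRecord* r = arena.Make<TypeRecord>(TypeRecord{1, 8});`. The trade-off is that individual records cannot be freed early, which is usually acceptable for a tool that loads everything, writes a .sym file, and exits.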


PDBHeaders.h
PDBParser.cpp
PDBParser.h

Jake Shadle

Apr 24, 2013, 5:41:44 PM
to google-brea...@googlegroups.com
Oh, and this doesn't do any name demangling since I actually didn't encounter mangled symbols, but it would be trivial to add.

Ted Mielczarek

Sep 11, 2013, 7:16:57 AM
to google-brea...@googlegroups.com, Jake Shadle
On 4/24/2013 9:15 AM, Jake Shadle wrote:
> OK, so here is the code. There are probably edge cases that I am
> missing, and depending on compiler/linker options a PDB could be
> wildly different than the one I tested against, but this should at
> least be a good start for anyone who wants to generalize this code to
> handle more cases.
>
> One quick note because I forgot about it, but it actually takes about
> 30 seconds to do the PDB dump, part of the speedup I got was actually
> linking against Intel's TBB malloc proxy library so that it does much
> more intelligent heap allocations, since the std:: containers that I
> used to store things like all of the type information for such a large
> PDB makes the containers do quite a few same sized allocations that
> gets a significant performance boost from using that library (mostly
> in the deallocation amusingly).
>
>
Digging up an old thread, but I'm interested in using this. One sticking
point could be the license. Did you really mash up some Ms-PL code with
some GPL code? IANAL, but I have a strong suspicion that those licenses
are not actually compatible (The FSF seems to agree[1]). If you could
get that sorted out somehow so that you could distribute this code
legitimately it'd be a lot easier for us to use it.

-Ted

1. http://www.gnu.org/licenses/license-list.html#ms-pl

Jake Shadle

Sep 13, 2013, 11:30:07 AM
to google-brea...@googlegroups.com, Jake Shadle
Shit, didn't even think about that, normally everything I write is internal tools that don't have to worry about licenses since they aren't distributed, should have thought about that before I posted the code...

After reading some license related things, it seems like it might be possible to just put the BSD license that the rest of breakpad uses on the files?  I don't think it can be considered a 'derived' work since the only thing I used them for was to determine how to parse a file format that is itself not publicly licensed, in an entirely different language for an entirely different purpose.  But like I said, I never deal with this stuff usually, so I could be completely wrong.

Also, I have made some fixes/improvements in the code since I originally posted it; the Xbox 360 compiler had some divergent output from the standard VC++ compiler.

Ted Mielczarek

Sep 13, 2013, 11:54:21 AM
to google-brea...@googlegroups.com
On 9/13/2013 11:30 AM, Jake Shadle wrote:
> Shit, didn't even think about that, normally everything I write is
> internal tools that don't have to worry about licenses since they
> aren't distributed, should have thought about that before I posted the
> code...
>
> After reading some license related things, it seems like it might be
> possible to just put the BSD license that the rest of breakpad uses on
> the files? I don't think it can be considered a 'derived' work since
> the only thing I used them for was to determine how to parse a file
> format that is itself not publicly licensed, in an entirely different
> language for an entirely different purpose. But like I said, I never
> deal with this stuff usually, so I could be completely wrong.

IANAL, but it generally hinges on whether you actually copy/pasted code
from there vs. just reading their code and writing your own version. If
you included code verbatim then you're bound by their license. If not,
it's an original work and you can license it however you see fit.

>
> Also, I have made some fixes/improvements in the code since I
> originally posted it, the Xbox 360 compiler had some divergent output
> from the standard VC++ compiler.
>
You're the second person I've heard of using Breakpad on the Xbox360. :)

-Ted

Jake Shadle

Sep 13, 2013, 1:14:43 PM
to google-brea...@googlegroups.com
Here are the updated files. I just stuck an MIT license on since that was easiest. The caveats from the first drop still mostly apply, but I tried to get its output as close to the original as possible, other than how the function signatures are printed (the relevant info is still mostly there).

Yah, the reason we are starting to use Breakpad is that, with the new consoles and mobile stuff, we are now targeting an ungodly number of platforms, so moving to a single format for callstack resolution of asserts, memory traces, crashes, etc. just made sense for the long run.
PDBHeaders.h
PDBParser.h
utils.cpp
PDBParser.cpp
utils.h

Ted Mielczarek

Mar 22, 2014, 10:31:34 PM
to google-brea...@googlegroups.com
On 9/13/2013 1:14 PM, Jake Shadle wrote:
> Here are the updated files, I just stuck an MIT license on since
> that was easiest, the caveats from the first drop still mostly apply,
> but I tried to get its output as close to the original as possible,
> other than how the function signatures are printed (the relevant info
> is still mostly there).
>
>
Thanks for the code! I finally got around to playing with this, it is
way faster than the stock dump_syms. I took the liberty of sticking the
code up on github[1]. I stuck a VC++ project and a simple main method to
run it as a binary. I also fixed a bug that was preventing it from
parsing our (rather large) main PDB file. It takes 20 seconds to parse
it vs. 2 minutes 20 seconds for Breakpad's dump_syms. It's still missing
a few things (it's not printing any STACK WIN lines with FPO unwind
data, for example), but that seems very fixable.

-Ted

1. https://github.com/luser/dump_syms
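
For reference, the missing STACK WIN lines are single-line text records in the Breakpad symbol file: the literal `STACK WIN`, then type, rva, code_size, prologue_size, epilogue_size, parameter_size, saved_register_size, local_size, max_stack_size and has_program_string (all hexadecimal), followed by either the program string or the allocates_base_pointer flag. A minimal sketch of a formatter, assuming the frame data has already been pulled out of the PDB; `StackWinRecord` and `FormatStackWin` are hypothetical names, not part of dump_syms:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical holder for one frame-data entry read from the PDB.
struct StackWinRecord {
    uint32_t type;                 // 4 = frame data with a program string, 0 = FPO
    uint32_t rva;                  // start of the covered code, relative to image base
    uint32_t code_size;
    uint32_t prologue_size;
    uint32_t epilogue_size;
    uint32_t parameter_size;
    uint32_t saved_register_size;
    uint32_t local_size;
    uint32_t max_stack_size;
    bool has_program_string;
    std::string program_string;    // used when has_program_string is true
    bool allocates_base_pointer;   // used when has_program_string is false
};

// Produce one STACK WIN line (without a trailing newline).
std::string FormatStackWin(const StackWinRecord& r) {
    std::ostringstream out;
    out << std::hex << "STACK WIN " << r.type << ' ' << r.rva << ' '
        << r.code_size << ' ' << r.prologue_size << ' ' << r.epilogue_size
        << ' ' << r.parameter_size << ' ' << r.saved_register_size << ' '
        << r.local_size << ' ' << r.max_stack_size << ' '
        << (r.has_program_string ? 1 : 0) << ' ';
    if (r.has_program_string) {
        out << r.program_string;
    } else {
        out << (r.allocates_base_pointer ? 1 : 0);
    }
    return out.str();
}
```

The formatting is the easy half; the harder part, as noted later in the thread, is locating and decoding the frame data inside the PDB in the first place.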

Jake Shadle

Mar 23, 2014, 3:57:06 AM
to google-brea...@googlegroups.com
Cool, glad you found it helpful! Yah, sorry it is missing the FPO unwind data, we never run with the FPO flag on though, so I just didn't bother adding it myself.  Exercise for the reader I guess. :)

Bruce Dawson

Mar 23, 2014, 12:46:39 PM
to google-brea...@googlegroups.com
Is FPO data only generated when compiling with /Oy? I'm always surprised by developers who still compile with that option. Being able to quickly get call stacks in xperf traces and other tools is *so* valuable, and the benefits of /Oy so illusory, that it seems like a bad bet. We moved away from /Oy three years ago and it has been great. Every now and then I need to profile some external code built with /Oy and it is frustrating.



Jake Shadle

Mar 23, 2014, 4:02:59 PM
to google-brea...@googlegroups.com
Yes, I believe so, I never encountered it in our symbol files since none of our game executables or tools use /Oy, for the same reasons you mention.

Ted Mielczarek

Mar 23, 2014, 4:14:03 PM
to google-brea...@googlegroups.com
On 3/23/14 12:46 PM, Bruce Dawson wrote:
> Is FPO data only generated when compiling with /Oy? I'm always
> surprised by developers who still compile with that option. Being able
> to quickly get call stacks, in xperf traces and other tools is *so*
> valuable and the benefits of /Oy so illusory, it seems like a bad bet.
> We moved away from /Oy three years ago and it has been great. Every
> now and then I need to profile some external code built with /Oy and
> it is frustrating.
>
I'm not sure, to be honest, but we do build with /Oy and all the
resultant headaches. Life is tough when you have dozens of people
running dozens of benchmarks on your code at every release. :)

In any event, I doubt it'll be too hard to add that in. I'm also
interested in the possibility of porting this to run on Linux so I can
use it in my "fetch symbols from Microsoft's symbol server" script
without having to run the whole thing on Windows.

-Ted


Jake Shadle

Mar 23, 2014, 4:31:59 PM
to google-brea...@googlegroups.com
Should be pretty easy to get it on Linux, I tried to just use standard C++ stuff, the only thing you'll need to take out is the Concurrency:: stuff, but that should be it I think.

Jake Shadle

Mar 23, 2014, 4:37:15 PM
to google-brea...@googlegroups.com
Oh and the Windows API specific stuff of course...and redeclare the windows structures. Been a while since I looked at this code.
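
The two porting steps mentioned here can be sketched roughly as follows: redeclare the Windows structures with fixed-width types, and swap the Concurrency:: calls for standard threads. This is only an illustration under assumptions: `Guid` mirrors the well-known 16-byte GUID layout, while `ParallelFor` is a hypothetical stand-in for a `Concurrency::parallel_for`-style loop and is not code from the parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Portable stand-in for the Windows GUID struct (same field layout).
struct Guid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
};
static_assert(sizeof(Guid) == 16, "GUID must be 16 bytes to match the file");

// Hypothetical replacement for a parallel index loop over [begin, end).
// Each worker handles every workers-th index; fn must be safe to call
// concurrently for different indices.
void ParallelFor(size_t begin, size_t end,
                 const std::function<void(size_t)>& fn,
                 size_t max_threads = 0) {
    size_t count = end > begin ? end - begin : 0;
    size_t workers = max_threads ? max_threads
                                 : std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;          // hardware_concurrency may return 0
    if (workers > count) workers = count;   // never more threads than items

    std::vector<std::thread> threads;
    for (size_t w = 0; w < workers; ++w) {
        threads.emplace_back([=, &fn] {
            for (size_t i = begin + w; i < end; i += workers) fn(i);
        });
    }
    for (std::thread& t : threads) t.join();
}
```

On Linux this typically needs to be built with `-pthread`. A thread pool or `std::async` would work equally well; the strided loop is just the shortest thing that keeps the sketch self-contained.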

w...@chromium.org

Apr 14, 2014, 8:21:10 PM
to google-brea...@googlegroups.com, iva...@chromium.org
Hi,

I've been doing some recent work on breakpad symbol parsing for the Chromium Win64 project and we've also hit some issues parsing some of the larger 64-bit binaries.  This work here looks very promising!

It looks like to get this ready for general use, we would have to add the STACK WIN (for FPO) and STACK CFI (for Win64 calls).  Jake, how much more work are you planning to do on this tool - e.g. adding those two things, or would it be best for me to just continue from where you left off?

Regards,

Will

Jake Shadle

Apr 15, 2014, 7:44:25 AM
to google-brea...@googlegroups.com, iva...@chromium.org
I am still working on this code as the need arises (I submitted a crash fix in the git project Ted made), but I personally had no plans to add STACK WIN/CFI since we never run with FPO and don't use exceptions. I think the basis for that work would be fairly easy though, since the code itself is pretty simple. The only hard part would be figuring out where/how to read that data, since the PDB is essentially just a dumping ground for different data: some sections have alignment requirements, some don't, etc.
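
To make the alignment point concrete, here is a minimal sketch of walking a buffer of length-prefixed records where the caller says how the section is aligned. It assumes the CodeView-style header of a 16-bit length (counting the bytes after the length field) followed by a 16-bit kind; `RecordView`, `AlignUp` and `WalkRecords` are hypothetical names, and which sections need which alignment is exactly the part that has to be worked out per section.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical view of one record inside a section buffer.
struct RecordView {
    uint16_t kind;
    const uint8_t* payload;   // bytes after the 4-byte header
    size_t payload_size;
};

// Round offset up to a multiple of alignment (alignment must be >= 1).
inline size_t AlignUp(size_t offset, size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Walk records laid out as: u16 length, u16 kind, payload.
// Pass alignment = 4 for sections that pad records to 4 bytes,
// or alignment = 1 for tightly packed sections.
std::vector<RecordView> WalkRecords(const uint8_t* data, size_t size,
                                    size_t alignment) {
    std::vector<RecordView> records;
    size_t pos = 0;
    while (pos + 4 <= size) {
        uint16_t len  = static_cast<uint16_t>(data[pos] | (data[pos + 1] << 8));
        uint16_t kind = static_cast<uint16_t>(data[pos + 2] | (data[pos + 3] << 8));
        // len covers the kind field plus the payload, so it is at least 2.
        if (len < 2 || pos + 2 + len > size) {
            throw std::runtime_error("truncated or corrupt record");
        }
        records.push_back({kind, data + pos + 4, static_cast<size_t>(len) - 2});
        pos = AlignUp(pos + 2 + len, alignment);
    }
    return records;
}
```

Using the wrong alignment shows up quickly, because the walker lands on padding bytes and the length check fails, which makes this kind of helper handy while reverse engineering a section.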