pdb file lookup

Paweł Hajdan, Jr.

Jan 31, 2013, 11:07:45 AM
to chromium-dev
I'm working on https://code.google.com/p/chromium/issues/detail?id=168411 so that we can get meaningful stack traces on Windows - this is pretty important for debugging crashes only seen on bots (including trybots) etc.

Now everything works when the build is used on the same machine (as on trybots - I've confirmed that they get a correct symbolized stack trace), but when it is transferred from the builder to a tester (which is how most of the waterfall works) the symbol lookup fails.

I suspect some path information is written into the .exe/.dll files to point to the .pdbs, and it becomes wrong when the build is moved to the bot. Is there any way to make Windows find the .pdbs? I've noticed that the pdb for, say, chrome.exe is named chrome.exe.pdb, but I think it used to be chrome.pdb. Another possible reason I can see is some kind of timestamp mismatch (one file gets extracted slightly earlier than the other) that makes Windows discard a good pdb, "thinking" it is mismatched.

Do you have some ideas how to further debug it, and where to look for more info? I've tried MSDN, but didn't really learn anything new.

Paweł

Sigurður Ásgeirsson

Jan 31, 2013, 11:15:54 AM
to Paweł Hajdan, Jr., chromium-dev
The path and identifier of the PDB file (GUID and generation) are written into the executable. The simplest way to make this just work (tm) is to expose an HTTP symbol server someplace, and set the bots up with a symbol path that'll fetch the symbols automatically over HTTP on demand. Alternatively you can expose a symbol server on a file share.
See e.g. http://msdn.microsoft.com/en-us/library/windows/desktop/ms680693(v=vs.85).aspx for a description of the M$-provided tools for this. There are also some python scripts around that'll peek into the generated executables and extract the PDB's ID and path, to allow you to generate a symbol-server compliant directory hierarchy.
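
As a rough illustration of what such a script has to produce, here is a minimal Python sketch of the directory layout commonly used by symbol stores for PDB 7.0 files: `<pdb name>/<GUID as 32 uppercase hex digits><age in hex>/<pdb name>`. The function name, the builder path, and the GUID below are made up for the example; the layout itself should be checked against the symbol server documentation before relying on it.

```python
import ntpath
import uuid


def symbol_store_path(pdb_path, guid, age):
    """Return the relative path a PDB would have inside a symbol store.

    pdb_path: path recorded in the module (only its basename is used).
    guid:     uuid.UUID identifying the PDB.
    age:      integer "generation" counter stored next to the GUID.
    """
    pdb_name = ntpath.basename(pdb_path)
    # Identifier = GUID without dashes, upper case, followed by the age in hex.
    identifier = "%s%X" % (guid.hex.upper(), age)
    return "/".join([pdb_name, identifier, pdb_name])


# Hypothetical values, for illustration only.
example_guid = uuid.UUID("3844dbb9-2017-4967-be7a-a4a2c20430fa")
print(symbol_store_path(r"C:\b\build\src\build\Release\chrome.dll.pdb",
                        example_guid, 2))
```

A tester pointed at a share or HTTP server with this layout can then resolve the PDB by name and identifier, regardless of the absolute path recorded on the builder.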





--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Chris Hamilton

Jan 31, 2013, 11:16:06 AM
to phajd...@chromium.org, chromium-dev
The absolute path to the PDB on the builder is stored directly in the
.exe/.dll, as well as a GUID. When searching for PDBs the OS will
first look in the exact path specified, and then it will use the
basename (chrome.dll.pdb, for example), and start searching elsewhere.
It will look in the paths specified by the _NT_SYMBOL_PATH environment
variable, then the _NT_ALT_SYMBOL_PATH variable, and finally in the
directory alongside the module. It looks for files with the same
basename as the PDB specified in the module itself, and looks for a
matching GUID.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms680689(v=vs.85).aspx
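
The search order described above can be sketched in Python. This is only a model of the description in this message, not of DbgHelp itself: the function name and all paths are hypothetical, symbol-server entries (`srv*...`) are skipped because they need a network lookup, and the GUID check that each candidate must still pass is not modelled.

```python
import ntpath


def candidate_pdb_paths(embedded_pdb_path, module_path, env):
    """List the places a PDB would be looked for, in the order described above."""
    # 1. The exact absolute path stored in the module on the builder.
    candidates = [embedded_pdb_path]
    pdb_name = ntpath.basename(embedded_pdb_path)

    # 2. and 3. Directories from _NT_SYMBOL_PATH, then _NT_ALT_SYMBOL_PATH.
    for var in ("_NT_SYMBOL_PATH", "_NT_ALT_SYMBOL_PATH"):
        for entry in env.get(var, "").split(";"):
            entry = entry.strip()
            # Skip empty entries and symbol-server entries.
            if not entry or entry.lower().startswith(("srv*", "symsrv*")):
                continue
            candidates.append(ntpath.join(entry, pdb_name))

    # 4. The directory that contains the module itself.
    candidates.append(ntpath.join(ntpath.dirname(module_path), pdb_name))
    return candidates


# Hypothetical builder and tester paths.
env = {"_NT_SYMBOL_PATH": r"D:\symbols;srv*C:\cache*http://symbols.example.invalid"}
for path in candidate_pdb_paths(r"C:\b\build\src\build\Release\chrome.dll.pdb",
                                r"E:\tester\Release\chrome.dll", env):
    print(path)
```

On a tester, the first candidate normally does not exist, so the lookup only succeeds if one of the later locations holds a PDB with the right basename and a matching GUID.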

You can get information as to what PDB the module wants by using
"dumpbin /headers foo.dll" and looking at the resulting Debug
Directories information (the GUID and path will be printed there).
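
Where dumpbin is not at hand, the same information can be approximated by reading the module's bytes. The sketch below assumes the PDB 7.0 CodeView record layout (the signature `RSDS`, a 16-byte GUID, a 4-byte little-endian age, then a NUL-terminated path) and does a naive scan for the signature instead of walking the PE debug directory, so it is a debugging aid rather than a robust parser. The function name and sample data are invented for the example.

```python
import struct
import uuid


def find_pdb_reference(image_bytes):
    """Return (guid, age, pdb_path) from the first plausible RSDS record, or None."""
    offset = image_bytes.find(b"RSDS")
    while offset != -1:
        record = image_bytes[offset + 4:]
        # The path starts after the 16-byte GUID and 4-byte age.
        end = record.find(b"\x00", 20)
        if len(record) >= 21 and end != -1:
            guid = uuid.UUID(bytes_le=record[:16])
            (age,) = struct.unpack_from("<I", record, 16)
            path = record[20:end].decode("utf-8", errors="replace")
            # Crude sanity check to reject accidental matches of the signature.
            if path.lower().endswith(".pdb"):
                return guid, age, path
        offset = image_bytes.find(b"RSDS", offset + 4)
    return None


# Synthetic data standing in for a real module.
sample_guid = uuid.UUID("3844dbb9-2017-4967-be7a-a4a2c20430fa")
sample = (b"\x00" * 64 + b"RSDS" + sample_guid.bytes_le +
          struct.pack("<I", 2) + b"C:\\b\\chrome.dll.pdb\x00")
print(find_pdb_reference(sample))
```

For a real module, pass the file contents, for example `open("foo.dll", "rb").read()`, and compare the result with what dumpbin prints.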

The Syzygy project has a little 'pdbfind' utility which gives you
detailed information as the result of a PDB search, using the same
mechanism that the various debugging tools use (the DbgHelp library).
You can find that tool in our latest release archive here:
http://syzygy-archive.commondatastorage.googleapis.com/index.html?path=builds/official/1287/binaries.zip

Cheers,

Chris


Ryan Sleevi

Jan 31, 2013, 11:50:59 AM
to chr...@chromium.org, Chromium-dev, Paweł Hajdan, Jr.

Why do we not just use /PDBALTPATH:%_PDB%.%_EXT% to force the PDB path to be relative (no absolute paths at all)?

I thought the issue was more about getting PDBs from builders to testers.

Note that if you are running into ZIP file limits (I think xusydoc@ suggested this was an issue?), you can precompress the PDB using makecab to generate .PD_ files.
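
A minimal Python sketch of that precompression step follows. It only builds the makecab command line (`makecab source destination`) and the compressed file name, which by convention replaces the last character of the name with an underscore. The helper names and paths are hypothetical, and nothing is executed here.

```python
import ntpath


def compressed_name(pdb_name):
    """chrome.dll.pdb -> chrome.dll.pd_ (last character replaced by '_')."""
    return pdb_name[:-1] + "_"


def makecab_command(pdb_path, out_dir):
    """Build the argument list for compressing one PDB into out_dir."""
    dest = ntpath.join(out_dir, compressed_name(ntpath.basename(pdb_path)))
    return ["makecab", pdb_path, dest]


# Hypothetical paths; on a Windows builder the list could be passed to
# subprocess.check_call().
print(makecab_command(r"C:\b\build\src\build\Release\chrome.dll.pdb",
                      r"C:\b\staging"))
```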

Marc-Antoine Ruel

Jan 31, 2013, 1:16:35 PM
to Ryan Sleevi, Chris Hamilton, Chromium-dev, Paweł Hajdan, Jr.
A few notes here:
  • Re-enabling PDB support for try jobs could be done as an optional flag, if someone wants it badly. In practice it wouldn't be very hard to implement. Generating PDBs at all, even if they are not archived, has a significant performance cost, so it needs to be off by default.
  • For try jobs, archival is going to be done on isolateserver.appspot.com, as described in the test isolation design doc.
  • The PDB's content will need to be independent of the time of day, phase of the moon, etc. Chris forgot to say so, but he wrote a tool to do it. It's just not wired into the chromium build process yet.
  • Then it'll be possible to archive the PDB on isolateserver alongside the executable. This is separate from a standard symbol server, but it is better: the 7-day caching is done properly and no maintenance / cleanup overhead is needed.
  • My goal is to kill zip files, but it's not done yet, so sorry for the maintenance crap window still left.
Note that "I don't have a workstation <OS>" is a kinda lame reason for debugging an issue on the Try Server. Get over it and get one. The only good reason is "I can't reproduce locally", and this is usually (but not always) a side effect of a race condition because your workstation is too fast. Then anyway the answer is http://go/chrometryserver and taking a try slave off the network.

Pawel, do not enable PDBs unless strictly necessary. The performance cost vastly outweighs the occasional benefit. Please measure exactly the performance degradation before going forward with it.

Thanks,

M-A




Paweł Hajdan, Jr.

Feb 1, 2013, 8:31:08 AM
to Marc-Antoine Ruel, Ryan Sleevi, Chris Hamilton, Chromium-dev
FYI, PDB support is currently re-enabled, with the following notes:

1. With fastbuild=1 (which bots use), only the linker generates debug info, not the compiler. This is the minimum needed to get stack trace symbolization to work. See https://codereview.chromium.org/12038100 for implementation.

2. So far nobody has complained, so if possible I'd like to avoid "commit wars" with people reverting things without notice or anything like that. I can fix outstanding issues if there are complaints.

3. This is not so much about debugging issues on the try server or on the bots, but about getting meaningful data from, say, chromium-build-logs.appspot.com. If you are fixing a test that was disabled because of a crash, one look at the stack trace is often very helpful for diagnosing the issue, and it can lead to a simple and straightforward fix. We also have stack traces for assertion failures, and it's important to keep them symbolized as well. Compared to that, taking a trybot for debugging is much more work, and doesn't always result in a repro.

4. Performance measurements:

4.1. On "Win Builder" package_build went from 3 minutes to 6 minutes. Total cycle time of the bot is 21 minutes.
4.2. On "Win Builder (dbg)" package_build went from 2 minutes to 3 minutes. Total cycle time is 9 minutes.
4.3. On "Win 7 Tests (1)" extract_build went from 0.5 minute to 1 minute. Total cycle time is over 30 minutes.
4.4. On "Win 7 Tests (dbg)(1)" extract_build went from 1 minute to 2 minutes. Total cycle time is 30 minutes.

4.5. My conclusion is that this is a fair tradeoff. 1-2 minute build time differences are nothing compared to the time spent on retrying flaky browser tests. Two retries are all it takes to break even (and we often retry more). Having symbolized stack traces will help with fixing flakiness. I guess some people may disagree, but please take into account that disabling important features (symbolized stack traces in this case) for build speed is not necessarily the best idea. It is a bit like using unsafe compiler flags that may result in a broken program because it runs faster (the analogy is obviously far from perfect).

4.6. This can be switched off on buildbot using the package_pdb_files=False factory property. The cost seems higher on "Win Builder", so I'm fine with, for example, disabling pdbs for Release builds but preserving them for Debug as a middle-ground solution. I can make the necessary changes if needed.

5. I've fixed zip file issues, see https://code.google.com/p/chromium/issues/detail?id=168411. This will be a benefit to our build infrastructure anyway.

6. Saving .pdb files somewhere for reference is one thing, but for now my goal is to have symbolized stack traces in the logs.
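
For reference, the step-time deltas in items 4.1 to 4.4 can be tabulated with a few lines of Python. This is only arithmetic on the numbers quoted above, expressed as added minutes and as a share of the reported cycle time. The "over 30 minutes" cycle is treated as exactly 30, so that share is an upper bound.

```python
# (bot, step, minutes before, minutes after, total cycle minutes),
# copied from items 4.1-4.4.
MEASUREMENTS = [
    ("Win Builder", "package_build", 3.0, 6.0, 21.0),
    ("Win Builder (dbg)", "package_build", 2.0, 3.0, 9.0),
    # Cycle time reported as "over 30 minutes"; 30 is used as a lower bound.
    ("Win 7 Tests (1)", "extract_build", 0.5, 1.0, 30.0),
    ("Win 7 Tests (dbg)(1)", "extract_build", 1.0, 2.0, 30.0),
]


def added_minutes(before, after):
    """Extra time the step takes once PDBs are packaged or extracted."""
    return after - before


def share_of_cycle(before, after, cycle):
    """Added time as a percentage of the bot's total cycle time."""
    return 100.0 * added_minutes(before, after) / cycle


for bot, step, before, after, cycle in MEASUREMENTS:
    print("%-22s %-14s +%.1f min, %.1f%% of cycle" % (
        bot, step, added_minutes(before, after),
        share_of_cycle(before, after, cycle)))
```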

Paweł

Isaac Levy

Feb 1, 2013, 3:37:21 PM
to phajd...@chromium.org, Marc-Antoine Ruel, Ryan Sleevi, Chris Hamilton, Chromium-dev
On Friday, February 1, 2013, Paweł Hajdan, Jr. wrote:
> FYI, PDB support is currently re-enabled, with the following notes:
>
> 1. With fastbuild=1 (which bots use), only the linker generates debug info, not the compiler. This is the minimum needed to get stack trace symbolization to work. See https://codereview.chromium.org/12038100 for implementation.
>
> 2. So far nobody has complained, so if possible I'd like to avoid "commit wars" with people reverting things without notice or anything like that. I can fix outstanding issues if there are complaints.

People don't always complain about infra regressions. I don't think this is a good metric to rely on.

> 3. This is not so much about debugging issues on the try server or on the bots, but about getting meaningful data from, say, chromium-build-logs.appspot.com. If you are fixing a test that was disabled because of a crash, one look at the stack trace is often very helpful for diagnosing the issue, and it can lead to a simple and straightforward fix. We also have stack traces for assertion failures, and it's important to keep them symbolized as well. Compared to that, taking a trybot for debugging is much more work, and doesn't always result in a repro.
>
> 4. Performance measurements:
>
> 4.1. On "Win Builder" package_build went from 3 minutes to 6 minutes. Total cycle time of the bot is 21 minutes.
> 4.2. On "Win Builder (dbg)" package_build went from 2 minutes to 3 minutes. Total cycle time is 9 minutes.
> 4.3. On "Win 7 Tests (1)" extract_build went from 0.5 minute to 1 minute. Total cycle time is over 30 minutes.
> 4.4. On "Win 7 Tests (dbg)(1)" extract_build went from 1 minute to 2 minutes. Total cycle time is 30 minutes.
>
> 4.5. My conclusion is that this is a fair tradeoff. 1-2 minute build time differences are nothing compared to the time spent on retrying flaky browser tests. Two retries are all it takes to break even (and we often retry more). Having symbolized stack traces will help with fixing flakiness. I guess some people may disagree, but please take into account that disabling important features (symbolized stack traces in this case) for build speed is not necessarily the best idea. It is a bit like using unsafe compiler flags that may result in a broken program because it runs faster (the analogy is obviously far from perfect).

This might be the right conclusion, but it does not seem obvious to me. Symbolization will only help fix flakiness if people are actively looking at the traces. An extra 4 minutes in the overall time from CL landing to Windows tests finishing is significant.

> 4.6. This can be switched off on buildbot using the package_pdb_files=False factory property. The cost seems higher on "Win Builder", so I'm fine with, for example, disabling pdbs for Release builds but preserving them for Debug as a middle-ground solution. I can make the necessary changes if needed.

Most likely the speed difference is because the two builders use machines with different performance characteristics, not because of the build type. What's the total time delta on the builders? You mention the cost to package_build, but I'd expect slower compiles too.

I think a better approach would be to default pdb generation to off, turn it on for one builder, and evaluate the benefits over a period of time (weeks). Changing all the main Windows bots seems premature.

> 5. I've fixed zip file issues, see https://code.google.com/p/chromium/issues/detail?id=168411. This will be a benefit to our build infrastructure anyway.

Good fixes, but not relevant to this discussion, I think.

Alex Pakhunov

Feb 1, 2013, 3:50:58 PM
to il...@chromium.org, phajd...@chromium.org, Marc-Antoine Ruel, Ryan Sleevi, Chris Hamilton, Chromium-dev
> This might be the right conclusion, but it does not seem obvious to me. Symbolization will only help fix flakiness if people are actively looking at the traces.

Just my 2c. A meaningful stack trace is the first thing I look for when investigating why some test failed. If I didn't write the test (which is most of the time), a stack trace is key to understanding what the test does and how the changes that went in could affect it. I find that the output of the test is often irrelevant and rather obscure for a person who is not familiar with the test.

--
Alex.

Randy Smith

Feb 1, 2013, 6:50:29 PM
to maruel...@google.com, Ryan Sleevi, Chris Hamilton, Chromium-dev, Paweł Hajdan, Jr.
On Thu, Jan 31, 2013 at 1:16 PM, Marc-Antoine Ruel <mar...@chromium.org> wrote:
> A few notes here:
>   • Re-enabling PDB support for try jobs could be done as an optional flag, if someone wants it badly. In practice it wouldn't be very hard to implement. Generating PDBs at all, even if they are not archived, has a significant performance cost, so it needs to be off by default.
>   • For try jobs, archival is going to be done on isolateserver.appspot.com, as described in the test isolation design doc.
>   • The PDB's content will need to be independent of the time of day, phase of the moon, etc. Chris forgot to say so, but he wrote a tool to do it. It's just not wired into the chromium build process yet.
>   • Then it'll be possible to archive the PDB on isolateserver alongside the executable. This is separate from a standard symbol server, but it is better: the 7-day caching is done properly and no maintenance / cleanup overhead is needed.
>   • My goal is to kill zip files, but it's not done yet, so sorry for the maintenance crap window still left.
> Note that "I don't have a workstation <OS>" is a kinda lame reason for debugging an issue on the Try Server. Get over it and get one. The only good reason is "I can't reproduce locally", and this is usually (but not always) a side effect of a race condition because your workstation is too fast. Then anyway the answer is http://go/chrometryserver and taking a try slave off the network.

From my perspective, you've left out a possibility, which is flaky test failures that rarely reproduce, and then only on the bots. Maybe the answer there is go/chrometryserver + --gtest_repeat=-1, but that's a fair amount of hassle to go through, and most people will instead just let the flake keep happening. So if we can get symbolized stack dumps for failing tests without "too much" performance overhead, I'd really like to see it.

There's certainly a level of overhead on the bots that wouldn't be worth paying to get symbolized stack dumps. But techniques and tools that help us reduce baseline test flakiness are worth some pain.

-- Randy

Isaac Levy

Feb 4, 2013, 4:59:29 AM
to rds...@google.com, maruel...@google.com, Ryan Sleevi, Chris Hamilton, Chromium-dev, Paweł Hajdan, Jr.
There is general agreement on the value of symbolizing stacks.
But a lot of time has been invested into making the minimal Windows
build (and buildbots) fast, so changing the defaults should be done
carefully. Do we want symbolization when building with fastbuild=1?
[crrev.com/179129]

One of my colleagues says we can avoid the package build costs by
hosting a symbolization server on the builder, rather than
transmitting symbols to the testers as has been recently added.

-Isaac

Marc-Antoine Ruel

Mar 30, 2013, 8:37:01 AM
to Isaac Levy, Randy Smith, Ryan Sleevi, Chris Hamilton, Chromium-dev, Paweł Hajdan, Jr.
I'd like to revisit this decision. Linking on Windows try slaves is sadly very slow and it doesn't make sense to link debug symbols for a compile-only slave, like win_x64_rel. Look at the compile times for yourself;
As an example,
barely compiled and mostly linked, yet it took almost 17 minutes to complete. That's too long for a compile-only slave and needs to be optimized. A non-trivial compile quickly jumps to well over 30 minutes and ultimately into the 1:30 hour range for a full build on both win_rel and win_x64_rel.

Note that I'd be fine with a fastbuild=0,1,2 where 2 is no symbols at all. This could be dynamically set depending on the testfilter.

M-A



Paweł Hajdan, Jr.

Apr 1, 2013, 8:17:01 PM
to Marc-Antoine Ruel, Isaac Levy, Randy Smith, Ryan Sleevi, Chris Hamilton, Chromium-dev
On Sat, Mar 30, 2013 at 5:37 AM, Marc-Antoine Ruel <mar...@google.com> wrote:
> Note that I'd be fine with a fastbuild=0,1,2 where 2 is no symbols at all. This could be dynamically set depending on the testfilter.
