Missing symbols on XP waterfall bots


Gabriel Charette

Feb 19, 2015, 8:56:09 PM
to Chromium-dev
Hello folks,

looks like symbols are missing (or at least stacks are not symbolizing) on XP waterfall bots.

All we know is that this has been the case for at least 8 days (200th run @ http://build.chromium.org/p/chromium.win/builders/XP%20Tests%20%281%29?numbuilds=200 shows missing symbols)...

Any ideas?

Thanks,
Gab

Gabriel Charette

Feb 19, 2015, 8:58:46 PM
to Gabriel Charette, Chromium-dev, cpu, wit...@chromium.org, bruce...@chromium.org
Bruce confirmed that all the PDBs are dropped where they should be, so the issue is with the stacks somehow not being symbolized..?

Scott Graham

Feb 19, 2015, 10:07:47 PM
to Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Bruce Dawson
There was something about "or" not working in .isolate files; maybe https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser_tests.isolate&l=204 isn't actually shipping the PDB over?

(Or is that what was confirmed? I don't know how to see what files are actually packaged or copied to the swarming agent.)

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Scott Graham

Feb 19, 2015, 10:25:15 PM
to Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Bruce Dawson
I confirmed by running

python swarming.py reproduce -S https://chromium-swarm.appspot.com 25bbc9e581982d10

and then suspending all the processes, that browser_tests.exe.pdb is making it (at least to my machine). Debugging that browser_tests.exe locally in the same directory finds minimal symbols

...

Scott Graham

Feb 19, 2015, 10:28:29 PM
to Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Bruce Dawson
I don't have much context on the CL that added that, but incorrect or old versions of dbghelp would be my guess; the functionality from stock XP changed a lot over time.

Gabriel Charette

Feb 19, 2015, 10:35:27 PM
to Scott Graham, Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Bruce Dawson
Also noteworthy, from inspecting the PDBs in my local build, the two biggest are unit_tests.exe.pdb and browser_tests.exe.pdb both of which are slightly bigger than 1 GB (component+incremental+Release build).

cpu@ was saying that the 1GB PDB size threshold had been a problem in the past with some Windows tools.

Could it be that we use an old version of dbghelp that has such an issue (FWIW, the stacks on the Vista bots are fine).

Bruce had some ideas on how to strip down the PDBs we upload to the bots if we're willing to let go of some live debugging abilities.
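
For anyone who wants to repeat the PDB size check above on their own build, here is a minimal sketch. It assumes nothing about the bots: `find_large_pdbs` is a hypothetical helper name, the build directory is whatever you pass in, and the 1 GB default threshold is only the figure mentioned in this thread, not a documented tool limit.

```python
import os


def find_large_pdbs(build_dir, threshold_bytes=1 << 30):
    """Return (size, path) pairs for .pdb files of at least threshold_bytes.

    Results are sorted largest first. The default threshold (1 GiB) is an
    assumption taken from the discussion, not a known limit of any tool.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(build_dir):
        for name in filenames:
            if name.lower().endswith(".pdb"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size >= threshold_bytes:
                    hits.append((size, path))
    return sorted(hits, reverse=True)
```

Calling `find_large_pdbs(r"out\Release")` (path is illustrative) would list any PDBs at or above 1 GiB in that directory tree.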

Scott Graham

Feb 19, 2015, 11:27:15 PM
to Gabriel Charette, Paweł Hajdan, Jr., Chromium-dev, cpu, Mike Wittman, Bruce Dawson
On Thu, Feb 19, 2015 at 7:34 PM, Gabriel Charette <g...@chromium.org> wrote:
> Also noteworthy, from inspecting the PDBs in my local build, the two biggest are unit_tests.exe.pdb and browser_tests.exe.pdb both of which are slightly bigger than 1 GB (component+incremental+Release build).
>
> cpu@ was saying that the 1GB PDB size threshold had been a problem in the past with some Windows tools.

The pdb limit is 4G in current VS2013 (fixed after Update 2).

The fastbuild=1 pdb is only 260M in any case, so I don't think that's affecting the bots (none are fastbuild=0).
 

> Could it be that we use an old version of dbghelp that has such an issue (FWIW, the stacks on the Vista bots are fine).

I'm pretty sure the stacks on the XP bots aren't working because of the code in stack_trace_win.cc that I linked to.

Or do you mean why that code was added? An old dbghelp seems plausible. Maybe +phajdan.jr recalls what the symptoms/repro of the hang were. We could try either pulling a redist of dbghelp into the isolate, or installing a recent Debugging Tools for Windows on the XP VM image.

Bruce Dawson

Feb 20, 2015, 12:28:13 AM
to Scott Graham, Gabriel Charette, Paweł Hajdan, Jr., Chromium-dev, cpu, Mike Wittman
If that line of code is the problem (and it sure seems suspicious) then symbols have been broken on XP since April 2013 (CL link is https://codereview.chromium.org/13863008). Does that seem possible?

One interesting (maybe?) data point is that sometimes the symbol lookup fails and the code knows it:
Backtrace (build 35559):
(No symbol) [0x5001E406]
(No symbol) [0x0073008B]
(No symbol) [0x006C0061]
(No symbol) [0x8BC80304]

Other times the lookup gives rubbish results but the code thinks it succeeded:
Backtrace (build 35743):
(No symbol) [0x364A7002]
RelaunchChromeBrowserWithNewCommandLineIfNeeded [0x02DD0C98+33831016]
RelaunchChromeBrowserWithNewCommandLineIfNeeded [0x029F6B84+29792084]
RelaunchChromeBrowserWithNewCommandLineIfNeeded [0x02DD0BB1+33830785]
ovly_debug_event [0x00CA76EA+5261994]

In the second case it appears to have found an exported function and is doing all offsets off of there, which isn't helpful.
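
A rough way to spot this second failure mode in bot logs is to flag frames whose displacement is implausibly large for a single function. The sketch below assumes the `symbol [0xADDRESS+displacement]` line format shown above, with a decimal displacement; the 1 MiB cutoff and the helper name `suspicious_frames` are arbitrary choices for illustration, not anything in the Chromium tree.

```python
import re

# Matches lines such as "ovly_debug_event [0x00CA76EA+5261994]".
# "(No symbol) [0x...]" lines have no "+displacement" and do not match.
FRAME_RE = re.compile(
    r"^(?P<symbol>\S+) \[0x(?P<addr>[0-9A-Fa-f]+)\+(?P<disp>\d+)\]$"
)

# Assumption: no real function body spans more than 1 MiB.
MAX_PLAUSIBLE_DISPLACEMENT = 1 << 20


def suspicious_frames(lines, max_disp=MAX_PLAUSIBLE_DISPLACEMENT):
    """Return (symbol, address, displacement, symbol_start) for frames whose
    displacement exceeds max_disp, where symbol_start = address - displacement."""
    flagged = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if not match:
            continue
        address = int(match.group("addr"), 16)
        displacement = int(match.group("disp"))
        if displacement > max_disp:
            flagged.append(
                (match.group("symbol"), address, displacement, address - displacement)
            )
    return flagged
```

Run on the build 35743 backtrace above, this flags all four symbolized frames, and the three RelaunchChromeBrowserWithNewCommandLineIfNeeded frames all work out to the same symbol start (0x00D8D430), which fits the reading that every offset is being taken from one exported function.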

In both cases we could enhance the results by printing the module name and address. VirtualQuery and GetModuleFileName can give us that. That allows the functions to be decoded if necessary. Of course this is really just a poor man's crash dump.
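
To make the module-plus-address idea concrete, here is a portable sketch of just the formatting step. It assumes a table of loaded modules has already been gathered; on Windows that table would presumably come from VirtualQuery (the allocation base of the region containing the address) and GetModuleFileName, but that part is not shown. The module names, bases, and sizes in the usage note are made up, and `describe_address` is a hypothetical helper.

```python
from bisect import bisect_right


def describe_address(address, modules):
    """Format an address as "module+0xoffset".

    modules is an iterable of (base, size, name) tuples. This is a
    hypothetical stand-in for data that would be collected at runtime.
    Addresses outside every module are reported as unknown.
    """
    ordered = sorted(modules)
    bases = [base for base, _size, _name in ordered]
    index = bisect_right(bases, address) - 1
    if index >= 0:
        base, size, name = ordered[index]
        if address < base + size:
            return f"{name}+0x{address - base:x}"
    return f"<unknown module> 0x{address:08x}"
```

With a made-up table such as `[(0x10000000, 0x2000, "example_a.dll")]`, `describe_address(0x10000123, ...)` gives `example_a.dll+0x123`. An offset in that form can be resolved later against the matching PDB, even if the bot itself failed to symbolize.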


Paweł Hajdan, Jr.

Feb 20, 2015, 5:34:43 AM
to Bruce Dawson, infr...@chromium.org, Scott Graham, Gabriel Charette, Chromium-dev, cpu, Mike Wittman
+infra-dev

I think the hang was immediately obvious after committing the CL on main waterfall bots.

You can try removing the early exit for XP and see what happens these days.

Help would be welcome with this, I'm not a Windows expert, and the code works on recent Windows.

Note that we may be able to retire XP soon, so I'm not sure how worthwhile it is to invest more in it.

Paweł

Timur Iskhodzhanov

Feb 20, 2015, 8:00:34 AM
to phajd...@chromium.org, Bruce Dawson, infr...@chromium.org, Scott Graham, Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Nico Weber
Does the symbolization work without swarming on XP?

Not sure if this is related, but we're having other problems with PDBs on Windows when ASan component builds are swarmed:

Scott Graham

Feb 20, 2015, 12:31:59 PM
to Timur Iskhodzhanov, Paweł Hajdan, Jr., Bruce Dawson, infr...@chromium.org, Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Nico Weber

Carlos Pizano

Feb 20, 2015, 1:45:07 PM
to Scott Graham, Timur Iskhodzhanov, Paweł Hajdan, Jr., Bruce Dawson, infr...@chromium.org, Gabriel Charette, Chromium-dev, cpu, Mike Wittman, Nico Weber
// Work around a mysterious hang on Windows XP.

I think I might have fixed that while I was sheriffing like 6 months ago.

Scott Graham

Feb 23, 2015, 6:24:35 PM
to Carlos Pizano, Timur Iskhodzhanov, Paweł Hajdan, Jr., Bruce Dawson, infr...@chromium.org, Gabriel Charette, Chromium-dev, Mike Wittman, Nico Weber
XP symbolization on bots should work after https://chromium.googlesource.com/chromium/src/+/92d69f7733f4028cf8b3d5b4b97a31173e687594

Cross your fingers that you're not the one that gets to find out if it works properly.

Gabriel Charette

Feb 23, 2015, 8:00:26 PM
to Scott Graham, Carlos Pizano, Timur Iskhodzhanov, Paweł Hajdan, Jr., Bruce Dawson, infr...@chromium.org, Gabriel Charette, Chromium-dev, Mike Wittman, Nico Weber

Awesome, thanks Scott!
