Question about unwind tables with clang on Windows


Erik Chen

Nov 26, 2019, 6:53:49 PM
to Clang maintainers, Albert J. Wong (王重傑), Daniel Cheng, danakj chromium, Takuto Ikuta
Hi Clang experts,
I'm currently looking at speeding up the win10 builder on the Chromium CQ. Is there some flag I can pass to windows clang that will get us the equivalent of "-funwind-tables" while still using symbol_level = 0?

On Linux, we build with symbol_level=0. The resulting binaries can successfully run both StackTrace() and print backtraces on segfaults. This seems to be because we're setting -funwind-tables here.

On the other hand, we're building with symbol_level=1 on Windows, and even add line numbers since this CL. This produces binaries that are much larger, with longer isolate times, etc. I recently ran some numbers and it's quite bad compared to the numbers that tikuta@ ran for win7.

Thanks,
Erik

Reid Kleckner

Nov 26, 2019, 7:06:29 PM
to Erik Chen, Clang maintainers, Albert J. Wong (王重傑), Daniel Cheng, danakj chromium, Takuto Ikuta
Yes, it is possible.

I think what matters on Linux is whether the binary has been stripped or not, and it sounds like with symbol_level=0 no stripping occurs, so function names are available in .symtab. I think -funwind-tables is separable. I would expect clang to produce unwind tables in .eh_frame all the time by default, except perhaps on 32-bit x86. In any case, unwind tables are clearly always desirable whether the binary is stripped or not.

On Windows, symbol_level=1 used to get you the behavior that it sounds like Linux is getting at symbol_level=0. This was implemented by not emitting debug info in the compiler, and then asking the linker to produce a PDB. The only information it had was function names, so that's all the user (StackTrace()) would get. I changed this behavior some time ago to match Linux symbol_level=1 which gives you line info.

I had been assuming all along that line tables were 1. desirable in tryjobs and 2. small enough to isolate, but I'm not an infra expert. If these assumptions don't hold, maybe we want a new symbol_level for "function names only".

One last thing you could try to get the size down is the new flag Amy added, -gno-inline-line-tables. I think it's in the last clang roll. We haven't had time to gather metrics yet.

Sorry, I'm off corp so I couldn't click through on most of the links. =/

Good luck. :)


Albert J. Wong (王重傑)

Nov 26, 2019, 7:11:09 PM
to Reid Kleckner, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
Thanks Reid!

On linux, symbol_level=0 implies -g0 to Clang. I don't believe your build process has a strip step in general outside of android.

Do you have any ideas which rough flags we're looking to dump into clang to get the "function name only" output into the pdb?

`-g0 -gno-inline-line-tables`

sounds like the first thing to try?

-Albert

Bruce Dawson

Nov 26, 2019, 7:37:50 PM
to Albert J. Wong (王重傑), Reid Kleckner, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
It sounds like what you want is the old symbol_level=1 behavior. That is, leave the compiler alone (no debug flags) and then link with /DEBUG.

It is easy to get what you want, although it does make me wonder if we then need a symbol_level=-1 to indicate "no, seriously, I don't want any debug information", but maybe we never want that.

Alternately, if it turns out that the cost of having line numbers for symbol_level=1 is too great, then we could turn that off. It increases the sizes of the PDBs significantly and it greatly slows stack walking on Windows 7, but line numbers in stack walks and a limited ability to do source-level debugging are nice.
--
Bruce Dawson

Albert J. Wong (王重傑)

Nov 26, 2019, 7:40:36 PM
to Bruce Dawson, Reid Kleckner, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
To be clear, we're discussing doing this for CQ only. I'd want all the debug symbols I could get in everything else. :)

So the usage cost would be no line numbers in crashes on CQ runs.

-Albert

Bruce Dawson

Nov 26, 2019, 8:08:13 PM
to Albert J. Wong (王重傑), Reid Kleckner, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
Yep. Reid's change to the meaning of symbol_level=1 mostly just affects the bots - I think most humans select symbol_level=2. So, if the cost of those line numbers exceeds the benefit then we know what to do.

In hindsight we should have defined values from 0 to 3 as none, names-only, names-linenumbers-files, everything. I don't think it's worth renumbering things now.
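
The four-level numbering described above could be written down as a small enumeration. This is only a sketch of that hindsight scheme: the names `SymbolLevel`, `NAMES_ONLY` and so on are invented here for illustration, and nothing like this exists in the Chromium build, where the thread only discusses levels 0 to 2.

```python
from enum import IntEnum


class SymbolLevel(IntEnum):
    """Hypothetical 0-3 numbering; NOT the meaning of Chromium's symbol_level."""
    NONE = 0               # no symbol information at all
    NAMES_ONLY = 1         # function names in stack traces
    NAMES_LINES_FILES = 2  # names plus file names and line numbers
    EVERYTHING = 3         # full debug info for interactive debugging


def has_line_numbers(level: SymbolLevel) -> bool:
    """Line numbers are available from NAMES_LINES_FILES upward."""
    return level >= SymbolLevel.NAMES_LINES_FILES
```

With an ordering like this, "what the CQ wants" would simply be `SymbolLevel.NAMES_ONLY`, without redefining any existing level.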
--
Bruce Dawson

Reid Kleckner

Nov 26, 2019, 8:46:51 PM
to Bruce Dawson, Albert J. Wong (王重傑), Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
It sounds like we should:
1. move the bots to symbol_level=0 consistently on all platforms
2. pass /debug to the linker at symbol_level=0 on Windows in gn
This will make all the levels behave consistently across platforms, and get function-names-only on the CQ.

Now that we have LLD, I don't think we need to worry about the extra time cost of writing a PDB anymore. But, someone should measure. You can use win_linker_timing=true to see how much time PDB writing takes.

I had been assuming that line numbers on CQ were really helpful to developers, but it sounds like they are not worth the latency increase and infra costs.

I don't think there's any real use case for a new symbol_level=-1 that strips the binary.

Albert J. Wong (王重傑)

Nov 27, 2019, 1:12:56 AM
to Reid Kleckner, Bruce Dawson, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
On Tue, Nov 26, 2019 at 5:46 PM Reid Kleckner <r...@google.com> wrote:
It sounds like we should:
1. move the bots to symbol_level=0 consistently on all platforms
2. pass /debug to the linker at symbol_level=0 on Windows in gn
This will make all the levels behave consistently across platforms, and get function-names-only on the CQ.

Parroting this back, that means tomorrow I could add a GN change to put /debug into the link-line for windows while simultaneously setting symbol_level=0 and we'd have what we want?
 
Now that we have LLD, I don't think we need to worry about the extra time cost of writing a PDB anymore. But, someone should measure. You can use win_linker_timing=true to see how much time PDB writing takes.

I had been assuming that line numbers on CQ were really helpful to developers, but it sounds like they are not worth the latency increase and infra costs.

It's an open question... one of the reasons it wasn't an issue earlier is that there were many other factors causing CQ to be slow. Over the past month, a bunch of us have been beating down CQ runtimes across a number of builders such that compile on Windows (or really, zipping/uploading/downloading/unzipping of artifacts on Windows) starts to look like a significant enough time sink to evaluate. Hence the question.

If it's okay with y'all, I'd suggest we move to function-names-only for a bit and then ask if people are annoyed about it. We can always move the symbol_level back up.

-Albert

Nico Weber

Nov 27, 2019, 8:32:01 AM
to Albert J. Wong (王重傑), Reid Kleckner, Bruce Dawson, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
On Wed, Nov 27, 2019 at 1:12 AM Albert J. Wong (王重傑) <ajw...@chromium.org> wrote:
Parroting this back, that means tomorrow I could add a GN change to put /debug into the link-line for windows while simultaneously setting symbol_level=0 and we'd have what we want?

Yes. (/debug is already on the link line for symbol_levels > 0; we could just add it unconditionally.)
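
As a rough illustration of that change, the linker-flag selection can be modelled as a small function. This is a Python sketch, not the actual GN logic: the function name `windows_link_flags` and the `debug_unconditional` switch are made up for the example, and only the `/DEBUG` flag and the "symbol_level > 0" condition come from the thread.

```python
from typing import List


def windows_link_flags(symbol_level: int,
                       debug_unconditional: bool = False) -> List[str]:
    """Return illustrative Windows linker flags for a given symbol_level.

    debug_unconditional=False models the behavior described in the thread
    (/DEBUG only when symbol_level > 0); True models the proposed change.
    """
    if symbol_level not in (0, 1, 2):
        raise ValueError(f"unexpected symbol_level: {symbol_level}")
    flags: List[str] = []
    if debug_unconditional or symbol_level > 0:
        # Ask the linker to write a PDB; with no compiler debug info this
        # yields function names only.
        flags.append("/DEBUG")
    return flags
```

Under this model, `windows_link_flags(0)` is empty today, while `windows_link_flags(0, debug_unconditional=True)` contains `/DEBUG`, which is the function-names-only configuration being asked for.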

But if that's what we want, I'm confused what symbol_level=1 is supposed to be there for.

Historically, the lay of the land was:

a) symbol_level=2: Full symbols, and slow-ish compiles and links. For use by devs who like interactive debuggers.

b) symbol_level=1: On non-Win, enough symbols to get stacks with line numbers, and on Win enough to get stacks with function names without line numbers (and after rnk's change you link to, Win behaved like non-Win). The non-Win (and now Win) behavior still adds some amount of debug info to compiles, so compiles and links slow down a bit, but you get good stacks. The old Win behavior was that compiles didn't emit debug info, but the link step did (but since no type debug info needed linking, that was also fast). This was for use on bots, and devs regularly complained to us when we broke it, so there's some evidence that having working stacks (with line numbers) is considered useful by devs. Hence rnk's change -- the decision at the time was that this is a good machine time / dev time tradeoff.

c) symbol_level=0: No debug info at all. For local use by devs who don't like using interactive debuggers, very fast links. (But since bots use symbol_level=1, no shared goma cache with bots, only with other devs).
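
The three levels above can be summarized as a lookup. This is just a restatement of the list in Python, with invented names (`debug_info`, `after_rnk_change`); it makes no claim about how the build files implement it.

```python
def debug_info(symbol_level: int, windows: bool,
               after_rnk_change: bool = True) -> str:
    """Summarize what each symbol_level provides, per the list above."""
    if symbol_level == 2:
        return "full symbols"
    if symbol_level == 1:
        # Before rnk's change, Windows got names from the linker only.
        if windows and not after_rnk_change:
            return "function names only"
        return "function names and line numbers"
    if symbol_level == 0:
        return "no debug info"
    raise ValueError(f"unexpected symbol_level: {symbol_level}")
```

The only cell that differs by platform is symbol_level=1 on Windows before the change, which is the "function names only" behavior this thread wants back for the CQ.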

Using symbol_level=0 on Linux is a recent change, yes? Do you have more details on that? 

Takuto Ikuta

Nov 27, 2019, 8:41:54 AM
to Nico Weber, Albert J. Wong (王重傑), Reid Kleckner, Bruce Dawson, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium
On Wed, Nov 27, 2019 at 10:32 PM Nico Weber <tha...@chromium.org> wrote:
Using symbol_level=0 on Linux is a recent change, yes? Do you have more details on that? 

symbol_level for x86 Linux was reduced to 0 more than a year ago in crrev.com/c/938883. I think this thread was raised independently of that, with the goal of reducing the CQ cycle time of the win10 builder.

Nico Weber

Dec 3, 2019, 10:40:22 AM
to Albert J. Wong (王重傑), Reid Kleckner, Bruce Dawson, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
What's the status here? Did anything happen after this thread?

Bruce Dawson

Dec 19, 2019, 6:24:05 PM
to Nico Weber, Albert J. Wong (王重傑), Reid Kleckner, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
Nothing has happened. I got asked to look at crbug.com/985255 (the timeouts on Windows 7 debug tests when generating call stacks). I tested with -gno-inline-line-tables and this helps greatly with release builds (dropping the size of base_unittests.exe.pdb by 43%) but doesn't help at all for debug builds (due to the lack of inlining). And it is debug builds where we have the problem, so that's no use. I put details in the bug.

So, I think we have two choices:
1) Change symbol_level=0 on Windows to have function names (add /DEBUG everywhere) and switch the bots to use symbol_level=0 everywhere. This means that we will not get line numbers on bots, ever. I'm not sure if we currently get them on Linux.
2) Change symbol_level=1 to only have filename/line-number information for release builds, at least on the bots.

I'm inclined to prefer option #2 because it is more tightly scoped and gives us filename/line-number information in most cases. I guess we could change symbol_level=0 on Windows to always have function names and then just change the Windows builders, or debug Windows builders, to use symbol_level=0. So really there are lots of variant choices.
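
To make option #2 concrete, here is a minimal sketch of the decision it implies. The function name `wants_line_info` and its parameters are illustrative only, and the behavior for levels 0 and 2 is assumed to stay as described earlier in the thread.

```python
def wants_line_info(symbol_level: int, is_debug_build: bool) -> bool:
    """Option #2: at symbol_level=1, keep file/line info for release only."""
    if symbol_level >= 2:
        return True  # full symbols always include line info
    if symbol_level == 1:
        # Debug builds are where the Windows 7 stack-walking timeouts occur,
        # so they would drop to function names only.
        return not is_debug_build
    return False  # symbol_level=0: no line info
```

This keeps line numbers on release bots at symbol_level=1 and removes them only from the debug builds that hit the timeouts.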

crbug.com/985255 is assigned to me right now but if somebody else wants to work on it over the holidays that's fine with me. It should be a simple change or no change in Chromium and then a possible change in the build machine configurations.
--
Bruce Dawson

Bruce Dawson

Dec 20, 2019, 6:07:28 PM
to Nico Weber, Albert J. Wong (王重傑), Reid Kleckner, Erik Chen, Clang maintainers, Daniel Cheng, danakj chromium, Takuto Ikuta
Changing the meaning of symbol_level=0 on Windows is fairly easy, but changing all of the Windows builders that use symbol_level=1 to symbol_level=0 looks very messy. I put details in crbug.com/985255. It's probably best to continue discussion there.
--
Bruce Dawson
