Why are kConstants ending up in .data rather than .rodata?

273 views
Skip to first unread message

Andrew Grieve

unread,
Jul 20, 2017, 2:32:34 PM7/20/17
to c...@chromium.org
Wondering if anyone has insight into this:


Description copied for convenience:
Discovered that model_prefixes here:
https://cs.chromium.org/chromium/src/media/base/android/media_codec_util.cc?type=cs&q=IsSurfaceViewOutputSupported&sq=package:chromium&l=348

  const char* model_prefixes[] = {// Exynos 4 (Mali-400)
                                  "GT-I9300", "GT-I9305", "SHV-E210",
                                  // Snapdragon S4 (Adreno-225)
                                  "SCH-I535", "SCH-J201", "SCH-R530",
                                  "SCH-I960", "SCH-S968", "SGH-T999",
                                  "SGH-I747", "SGH-N064", 0};

is defined in .data:

>>> Print(size_info.symbols.WhereFullNameMatches('model_prefixes'))
Showing 1 symbols (1 unique) with total pss: 48 bytes
.text=0 bytes    .rodata=0 bytes    .data*=48 bytes   .bss=0 bytes    total=48 bytes
Number of unique paths: 1

Index | Running Total | Section@Address | PSS | Path
------------------------------------------------------------
0)         48 (100.0%) d@0x2cd23b0  48             media/base/android/media_codec_util.cc
             .L_ZZN5media14MediaCodecUtil28IsSurfaceViewOutputSupportedEvE14model_prefixes


None of these changes moved it to rodata:
 "const char* const",
 "static const char* const",
 "static constexpr const char* const", d


Here's a codesearch showing we have 540 such arrays of strings, and the ones I spot-checked are all in .data rather than .rodata.
https://cs.chromium.org/search/?q=const%5C+char%5C*%5C+%5Ba-zA-Z0-9_%5D%2B%5C%5B%5C%5D%5C+%3D%5C+%7B+-file:third_party&sq=package:chromium&type=cs


Using supersize on libmonochrome.so, searching for kFoo names show that many "constants" that are not in .rodata:
>>> len(size_info.symbols.WhereFullNameMatches(r'\bk[A-Z]').WhereInSection('r'))
10829
>>> len(size_info.symbols.WhereFullNameMatches(r'\bk[A-Z]').WhereInSection('d'))
3086

Figuring out how to move these into .rodata should cause a bit less per-process RAM to be used.

Chris Blume

unread,
Jul 20, 2017, 3:13:20 PM7/20/17
to Andrew Grieve, cxx
The thing in .data is the pointer, not the string.
There are 12 pointers there and the .data size is 48. That's 4 bytes per pointer.

Even though the strings can be shared between processes, they could be loaded into different areas of memory and thus the pointers can't be known at compile-time, I believe.

What happens if you store char[][]?


Chris Blume |
 Software Engineer | cbl...@google.com | +1-614-929-9221

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CABiQX1WMKCbKQ48LHoS49gDhfT2aYKqhGhqw8L946d6aq%2BNsjg%40mail.gmail.com.

Chris Blume

unread,
Jul 20, 2017, 3:21:33 PM7/20/17
to Andrew Grieve, cxx
Sorry, I meant "they could be loaded into different areas of memory on different app launches".
The point being it isn't known at compile-time.


Chris Blume |
 Software Engineer | cbl...@google.com | +1-614-929-9221

Andrew Grieve

unread,
Jul 20, 2017, 3:51:47 PM7/20/17
to Chris Blume, Andrew Grieve, cxx
Ah, that makes sense! Thanks!

And if anyone else needs it spelled out, this tip lead me to find a decent article on it here: http://nullprogram.com/blog/2016/10/27/

David Benjamin

unread,
Jul 20, 2017, 4:38:27 PM7/20/17
to Andrew Grieve, Chris Blume, cxx
Even if not building position-independent code, model_prefixes as written can't go in .rodata because the pointers themselves are not const. If you do const char* const, you at least get the benefits where the compiler stops you from modifying them. They'll also go in a .data.rel.ro section, rather than .data.rel, which makes the runtime linker mark the pages read-only after the relocations have run. It gets you a little more const enforcement, though either way the relocations prevent sharing between processes.

Mark Mentovai

unread,
Jul 20, 2017, 4:53:13 PM7/20/17
to David Benjamin, Andrew Grieve, Chris Blume, cxx
David Benjamin wrote:

Even if not building position-independent code, model_prefixes as written can't go in .rodata because the pointers themselves are not const.

It actually can go in .rodata in this case (or .data.rel.ro with -fPIC). model_prefixes is a local variable and the compiler can see all points of use, ascertain that it’s never written to, and stick it in the appropriate section. I see this happening even at -O0.

% cat t.cc
void F() {
  const char* s[] = {"1", "22"};
}
% clang++ -S t.cc -o-
[…]
        .type   .L.str,@object          # @.str
        .section        .rodata.str1.1,"aMS",@progbits,1
.L.str:
        .asciz  "1"
        .size   .L.str, 2

        .type   .L.str.1,@object        # @.str.1
.L.str.1:
        .asciz  "22"
        .size   .L.str.1, 3

        .type   .L_ZZ1FvE1s,@object     # @_ZZ1FvE1s
        .section        .rodata,"a",@progbits
        .p2align        4
.L_ZZ1FvE1s:
        .quad   .L.str
        .quad   .L.str.1
        .size   .L_ZZ1FvE1s, 16

Although, for proper protection, “const char* const” should really be used here at the C++ source level, to prevent accidentally modifying the pointers in model_prefixes.

Move “s” out to (not-anonymous) namespace scope and write extern const char* s[] = {"1", "22"}; instead (no extra “const”), and you’ll definitely get .data pointers (with or without -fPIC) pointing to .rodata characters. This is also what you’d get for similarly-declared static member variables.

bruce...@google.com

unread,
Jul 20, 2017, 11:33:17 PM7/20/17
to cxx, davi...@chromium.org, agr...@chromium.org, cbl...@google.com
Please declare them as const char* const so that both the pointers and the strings go into the read-only data segment. I have been tracking down violations of this best-practice and fixing them but there are always more. Having the pointers be const is pure win for so many reasons, across platforms.

You can also go constexpr const char* which is arguably even better, but a bit confusing. The constexpr applies to the pointers and the const applies to the char, but that's not obvious at first.

Obviously if the pointers need to be modified then this doesn't apply.

Andrew Grieve

unread,
Jul 21, 2017, 9:41:18 AM7/21/17
to Bruce Dawson, cxx, davi...@chromium.org, Andrew Grieve, Chris Blume
Are there any tools we could use (clang plugin?) that could enforce a rule of "all variables with the name kFoo must be const"?

Bruce Dawson

unread,
Jul 21, 2017, 1:27:31 PM7/21/17
to Andrew Grieve, cxx, David Benjamin, Chris Blume
I have a Windows tool that dumps data on global variables from PDBs and it reports on the size, name, and section of all globals. I look over the results occasionally and fix glitches. Doing it as a post-link process like this is advantageous because it detects cases that static analysis will miss. For instance, it found a large const array that was not in the read-only section because it was being initialized at run-time, so fixing that saved a lot of code as well as moving data to read-only. It also found a VC++ compiler quirk that can easily force const data into the read-write section (which we worked around in the most important cases).

Another case this tool detects is duplicate variables. This happens when a static/const variable of non-integral type* is declared in a header file and the compiler generates multiple instances, potentially one per translation unit.
​​
I think the low hanging fruit for these two inefficiencies are dealt with, but if somebody wanted to put in a build verification step to make sure things stay clean that would be great.

I've attached the reports for chrome.dll and chrome_child.dll from the latest 64-bit canary. The first section shows duplicated variables, with the second column being how many bytes are wasted. I just noticed that chrome_child.dll has two copies of ft_adobe_glyph_list which is wasting ~56 kB. This is because this array is defined in a header file. I guess I didn't fix it because it's in freetype so a bit more work.

Bruce

* Integral types declared as const in a header file generally don't cause storage to be allocated but if they have a constructor they sometimes do because software is hard
--
Bruce Dawson
chrome_globals.zip

Andrew Grieve

unread,
Jul 21, 2017, 2:12:56 PM7/21/17
to Bruce Dawson, Andrew Grieve, cxx, David Benjamin, Chris Blume
Nice (well... except for the double glyph thing :P). I've added this check to the wishlist for our binary size bot:

Bruce Dawson

unread,
Jul 21, 2017, 4:24:07 PM7/21/17
to Andrew Grieve, cxx, David Benjamin, Chris Blume
I checked my email and discovered that I reported the double-glyph problem in freetype in December 2016, and it was "fixed" then:

http://savannah.nongnu.org/bugs/?49949

Unfortunately the fix was incorrect and I failed to double-check it. The fix omits the "extern" on the array definition and the array declaration. This means that we end up with two definitions, one of which is initialized and the other which is not! I'm surprised this didn't break something. I'm talking to the author of the original fix about getting it fixed for real, which saves 67,200 bytes of data in chrome_child.dll.
--
Bruce Dawson
Reply all
Reply to author
Forward
0 new messages