Various strings leaking into the binary even when built against cap'n'proto lite


Vitali Lovich

Mar 21, 2020, 8:52:19 AM
to Cap'n Proto
I'm trying to figure out where these strings are coming from but I can't quite make it out.

I have the following test.capnp file:
@0x95058ce93b4e8e0a;
const name :Text = "Foo";

> capnp compile -oc++ test.capnp
> g++ -DCAPNP_LITE -c -std=c++17 -o test.capnp.o test.capnp.c++ -I <path to capnp headers>
> strings -o test.capnp.o | grep name
test.capnp:name

If test.capnp is in another folder, that folder name appears as well (e.g. if I run capnp compile -oc++ foo/bar/test.capnp, I get foo/bar/test.capnp:name in the compiled object file).

As far as I can tell I'm building in LITE mode, but I can't figure out where these strings are coming from. They're not plain ASCII strings in the generated code, nor can I find any obvious hex representation of this string in the C++. Is there a way to strip these strings? What's their purpose?

Thanks,
Vitali

Vitali Lovich

Mar 21, 2020, 9:11:12 AM
to Cap'n Proto
Bah. Still waking up. Was looking for hex instead of decimal. The text is in the b_e1f50d377b95ad12 variable & exported through the bp_ capnp::word pointer.

I'm still not really clear on the purpose this table is serving. Is it just for reflection? Is there an expectation that the compiler will strip these as dead symbols if I'm not using them?
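
To illustrate why a search for ASCII or hex comes up empty while `strings` still finds the text: the sketch below is a hypothetical stand-in for the generated array (the name kSchemaBytes and the 16-byte layout are assumptions, not actual capnp output). It stores the bytes of test.capnp:name as decimal literals, which are hard to spot in the source but end up as readable text in the object file.

```cpp
#include <string>

// Hypothetical stand-in for the generated schema array (not real capnp
// output): the bytes of "test.capnp:name" written as decimal literals.
const unsigned char kSchemaBytes[16] = {
    116, 101, 115, 116,  46,  99,  97, 112,   // "test.cap"
    110, 112,  58, 110,  97, 109, 101,   0,   // "np:name" + NUL
};

// Reads the array back as text, which is roughly the view that `strings`
// has of the compiled object file.
std::string schemaBytesAsText() {
    return std::string(reinterpret_cast<const char*>(kSchemaBytes));
}
```

Grepping the source for "name" or for 0x6e finds nothing here, yet the compiled bytes spell out the path and constant name.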

Vitali Lovich

Mar 21, 2020, 11:00:02 AM
to Cap'n Proto
And to answer my own question: if you don't use these symbols, they get stripped, provided you remember to build with -ffunction-sections and -fdata-sections and link with -Wl,--gc-sections. I'm still not sure I understand the purpose of putting the path to the file and the symbolic constant name into the .cpp in the first place.

Kenton Varda

Mar 21, 2020, 11:17:45 AM
to Vitali Lovich, Cap'n Proto
This is the encoded schema, in the format defined by `struct Node` in schema.capnp. Yes, it's used for reflection.

It's weird that they aren't omitted in lite mode. I can't remember if that was an intentional decision. That said, I think we should deprecate lite mode anyway, and instead recommend link-time GC. I think it's too hard to choose the right feature boundary for all use cases, and breaking things down into fine-grained feature flags creates too much combinatorial complexity.

-Kenton


Vitali Lovich

Mar 21, 2020, 4:41:38 PM
to Kenton Varda, Cap'n Proto
So what's interesting is that if I actually use the constant (maybe it has to be a struct? It's not clear whether this affects other types), link-time GC is insufficient: it doesn't get rid of the reflected schema even though I don't use that portion (at least AFAICT). I'm happy if link-time GC gets rid of this (and not having it on for our binary was our own oversight). Is there some adjustment needed so that just using the constant doesn't pull in the schema along with it? Or maybe I'm accessing it incorrectly? I have two spots in the code. One is:

fooStruct.setValue(SOME_CONSTANT)

the other is

SOME_CONSTANT.get().getField()

where fooStruct is an instance of a capnproto struct builder and SOME_CONSTANT is a constant struct that has the same type as fooStruct:value does. I believe both accesses cause the schema to be pulled in.

Kenton Varda

Mar 21, 2020, 7:14:58 PM
to Vitali Lovich, Cap'n Proto
Ah, yes. In the case of constants, the constant's schema is actually the backing store for the constant itself -- since the constant's value appears inside its own schema, and we didn't want to have redundant copies.

What is the issue with having these exactly? Is it bloat, or are you worried about reverse engineering?

-Kenton
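
A simplified model of what Kenton describes, assuming a made-up layout (kNodeBlob, kValueOffset and the 24-byte array are hypothetical and do not match the real `struct Node` encoding): the accessor for the constant points into the same array that holds the name, so any reference to the value keeps the whole array, name included, alive through link-time GC.

```cpp
#include <cstddef>
#include <string>

// Simplified model, not the real capnp layout: a single array holds both
// the display name and the constant's value.
const unsigned char kNodeBlob[24] = {
    116, 101, 115, 116,  46,  99,  97, 112,   // "test.cap"
    110, 112,  58, 110,  97, 109, 101,   0,   // "np:name" + NUL
     70, 111, 111,   0,   0,   0,   0,   0,   // "Foo" + NUL + padding
};

// Hypothetical offset of the value inside the array.
constexpr std::size_t kValueOffset = 16;

// The constant is served straight out of the schema bytes, so there is no
// second copy of "Foo" anywhere in the binary.
const char* constantValue() {
    return reinterpret_cast<const char*>(kNodeBlob + kValueOffset);
}
```

Because the linker keeps or drops an object as a whole, it cannot retain the last eight bytes of this array and discard the first sixteen.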

Vitali Lovich

Mar 21, 2020, 7:17:09 PM
to Kenton Varda, Cap'n Proto
Reverse engineering: code names of upcoming products leak just because of the filenames in our source tree. Bloat isn’t ideal but meh. It’s probably less than 1 KB total.

Kenton Varda

Mar 22, 2020, 8:59:33 PM
to Vitali Lovich, Cap'n Proto
As a quick hack, you could redact the names using a sed command on the generated code that overwrites them with X's (with the same length).

Otherwise we need to design and implement some feature to remove these, I guess, in particular deciding what to do about constants.

-Kenton
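
A minimal sketch of the same-length redaction idea, written as a C++ helper over a byte buffer rather than as the sed command suggested above (redactInPlace is a hypothetical name, and feeding it the schema bytes or an object file's contents is an assumption about how it would be used). Keeping the length unchanged means the size of the data and every offset into it stay the same.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Overwrites every occurrence of `needle` in `bytes` with 'X' characters of
// the same length and returns the number of occurrences replaced.
std::size_t redactInPlace(std::vector<unsigned char>& bytes,
                          const std::string& needle) {
    if (needle.empty()) return 0;
    const auto len = static_cast<std::ptrdiff_t>(needle.size());
    std::size_t count = 0;
    auto it = bytes.begin();
    while (true) {
        // Compare as unsigned bytes so non-ASCII needles also match.
        it = std::search(it, bytes.end(), needle.begin(), needle.end(),
                         [](unsigned char a, char b) {
                             return a == static_cast<unsigned char>(b);
                         });
        if (it == bytes.end()) break;
        std::fill(it, it + len, static_cast<unsigned char>('X'));
        it += len;
        ++count;
    }
    return count;
}
```

Note that a constant's value lives in the same bytes as its name, so the needle should be the path or code name only, never text that the program reads back as data.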