We've had several forms of these discussions, and I'd really like to avoid adding an escape hatch.
I think we should be investing in (5) in parallel with (4).
From my side, I've done two things:
I made a rewriting tool that automatically reverses the out-of-lined special member functions:
crrev.com/c/5784484
Unfortunately, Chrome /really/ loves forward declares, which means the current rewriter doesn't work: many types don't have a complete definition in the header, so the result just doesn't compile. I'm working on limiting the scope of the rewrites using some different criteria, but I've been a bit distracted by other things that came up in the last week. The goal here is to try to get an idea of how much binary size will be impacted, and if we feel it's useful, we can use it for whatever rewrites we deem appropriate.
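To make the forward-declaration problem concrete, here is a minimal sketch. `Owner`, `Widget`, and the file names in the comments are hypothetical and not taken from the Chromium tree. It shows the out-of-lined pattern that the rewriter would try to reverse, and why reversing it needs the complete type in the header.

```cpp
#include <cassert>
#include <memory>

// ---- owner.h (hypothetical) ----
class Widget;  // Forward declaration only; widget.h is not included here.

class Owner {
 public:
  Owner();
  // Out-of-lined: only declared in the header. If this were rewritten to
  // `~Owner() = default;`, a TU that destroys an Owner while Widget is still
  // incomplete would fail to compile, because std::unique_ptr<Widget> has to
  // delete a complete Widget.
  ~Owner();

  int value() const;

 private:
  std::unique_ptr<Widget> widget_;
};

// ---- widget.h (hypothetical) ----
class Widget {
 public:
  explicit Widget(int value) : value_(value) {}
  int value() const { return value_; }

 private:
  int value_;
};

// ---- owner.cc (hypothetical) ----
Owner::Owner() : widget_(std::make_unique<Widget>(42)) {}
Owner::~Owner() = default;  // Widget is complete at this point.
int Owner::value() const { return widget_->value(); }

// Usage: callers only need the declarations from owner.h.
int UseOwner() {
  Owner owner;
  assert(owner.value() == 42);
  return owner.value();
}
```

Moving the `= default` into the class body would only compile if owner.h included widget.h, which is the extra-include cost that comes up again further down.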
The second thing is to abuse Mojo code generation as a proxy for measuring the size impact. Since the plugin does not warn on generated code, we can use it to conduct some quick experiments. Some numbers measuring the effect when changing how the destructor is defined [1]:
| arm32 | arm64 | CL
------------------------+---------------+----------------+----------------------------
From these results:
- It's not exactly clear how clang decides to inline things when there's no ALWAYS_INLINE or NOINLINE attribute. The results for `= default` and for a destructor with an explicitly empty body (meaning that the destructor would never be trivial) both lie somewhere between the ALWAYS_INLINE and NOINLINE experiments. It would be really nice if someone could help shed light on this.
- Interestingly, the most straightforward options for defaulting destructors have minimal effects on arm32. It would be useful to understand which arm32 build flags affect that, and how.
- NOINLINE + `= default` looks somewhat viable, but the problem is NOINLINE overrides PGO inlining decisions, and that is probably not desirable. If we could fix it... maybe this could be an option?
- Can we somehow default to whatever codegen behavior arm32 is using for special member functions that are defaulted in the header?
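For reference, the destructor variants being compared look roughly like the sketch below. The struct names are hypothetical, and the `SKETCH_*` macros are local stand-ins that assume the usual clang/GCC attribute spellings; they are not the actual definitions of ALWAYS_INLINE and NOINLINE. The sketch does not reproduce the generated Mojo code or any size numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Stand-ins for ALWAYS_INLINE / NOINLINE (assumed attribute spellings).
#if defined(__clang__) || defined(__GNUC__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_ALWAYS_INLINE inline
#define SKETCH_NOINLINE
#endif

// 1. Defaulted in the class body; the compiler picks the inlining.
struct DefaultedDtor {
  std::string payload;
  ~DefaultedDtor() = default;
};

// 2. Explicitly empty body; user-provided, so never trivial.
struct EmptyBodyDtor {
  std::string payload;
  ~EmptyBodyDtor() {}
};

// 3. Defaulted, but forced inline at every call site.
struct AlwaysInlineDtor {
  std::string payload;
  SKETCH_ALWAYS_INLINE ~AlwaysInlineDtor() = default;
};

// 4. Defaulted, but never inlined (this also overrides PGO decisions).
struct NoInlineDtor {
  std::string payload;
  SKETCH_NOINLINE ~NoInlineDtor() = default;
};

// The triviality difference between (1) and (2) shows up once every member
// is itself trivially destructible.
struct TrivialWhenDefaulted {
  int x;
  ~TrivialWhenDefaulted() = default;
};
struct NeverTrivial {
  int x;
  ~NeverTrivial() {}
};
static_assert(std::is_trivially_destructible<TrivialWhenDefaulted>::value,
              "defaulted destructor stays trivial");
static_assert(!std::is_trivially_destructible<NeverTrivial>::value,
              "empty-body destructor is user-provided, hence non-trivial");

// Creates and destroys one object of each variant.
std::size_t ExerciseAll() {
  DefaultedDtor a{"a"};
  EmptyBodyDtor b{"b"};
  AlwaysInlineDtor c{"c"};
  NoInlineDtor d{"d"};
  std::size_t total = a.payload.size() + b.payload.size() +
                      c.payload.size() + d.payload.size();
  assert(total == 4);
  return total;
}
```

Comparing the symbol tables or disassembly of a TU like this across arm32 and arm64 builds might be one way to start answering the inlining questions above.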
We should also get some updated desktop numbers. Previously, the impact on Linux desktop was much higher (+600KB for a stripped binary with a similar Mojo change). That seems quite large, if those numbers still hold up :)
I haven't checked how many instances/instantiations were affected. Knowing that would be useful for getting a better estimate of the actual effect if this restriction were lifted. It is also important to note that this is only the effect of *destructors*; we do not measure the effect of using `= default` for constructors, copy operations, or move operations.
Being able to understand and (ideally) improve codegen would be nice.
There are a couple of other concerns that would have to be addressed/mitigated:
- more types need to be completely defined in the headers, which means more includes.
- ignoring the increase in textual includes, the compiler also needs to do more redundant work to instantiate the special member functions in each TU that includes a header with defaulted special operations.
These would both affect build times negatively. I would suggest that in most cases, we still want to default special members out-of-line in the .cc, and we would only allow explicit = default in headers (or omitting it altogether) for structs.
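A rough sketch of what that split could look like, assuming hypothetical `Options` and `Registry` types and file names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// ---- registry.h (hypothetical) ----
// Plain struct: special members are omitted (or could be `= default`) in the
// header, so every TU that uses them instantiates them itself.
struct Options {
  std::string label;
  int retries = 0;
};

// Class: special members are only declared in the header.
class Registry {
 public:
  Registry();
  Registry(const Registry&);
  Registry& operator=(const Registry&);
  Registry(Registry&&) noexcept;
  Registry& operator=(Registry&&) noexcept;
  ~Registry();

  void Add(Options options);
  std::size_t size() const;

 private:
  std::vector<Options> entries_;
};

// ---- registry.cc (hypothetical) ----
// Defaulted out-of-line: each definition is emitted once, in this TU.
Registry::Registry() = default;
Registry::Registry(const Registry&) = default;
Registry& Registry::operator=(const Registry&) = default;
Registry::Registry(Registry&&) noexcept = default;
Registry& Registry::operator=(Registry&&) noexcept = default;
Registry::~Registry() = default;

void Registry::Add(Options options) { entries_.push_back(std::move(options)); }
std::size_t Registry::size() const { return entries_.size(); }

// Usage: copies and moves go through the out-of-line definitions.
std::size_t UseRegistry() {
  Registry original;
  original.Add(Options{"first", 1});
  Registry copy = original;
  copy.Add(Options{"second", 2});
  assert(original.size() == 1);
  Registry moved = std::move(copy);
  return moved.size();
}
```

The class pays for a few extra declarations, but its header does not need complete definitions of everything its members own, and the special members are compiled in one place.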
Finally, if we decide that we can tolerate the size/build time increase without any further work, it would be nice to have tooling to measure how this is affecting the binary size/build times so we could determine if follow-up toolchain work is needed.
Daniel
[1] We do not measure other special members here, because it's harder and the underlying Mojo struct itself may not actually be intrinsically copyable or movable.