[llvm-dev] [RFC] LLVM Busybox Proposal

Leonard Chan via llvm-dev

Jun 21, 2021, 1:55:07 PM
to llvm-dev
Hello all,

When building LLVM tools, including Clang and lld, it's currently possible to use either static or shared linking for LLVM libraries. The latter can significantly reduce the size of the toolchain since we aren't duplicating the same code in every binary, but the dynamic relocations can affect performance. The former doesn't affect performance but significantly increases the size of our toolchain.

We would like to implement support for a third approach which we call, for lack of a better term, the "busybox" feature, where everything is compiled into a single binary which then dispatches to the appropriate tool depending on the first command. This approach can significantly reduce the size by deduplicating all of the shared code without affecting performance.

In terms of implementation, the build would produce a single binary called `llvm` and the first command would identify the tool. For example, instead of invoking `llvm-nm` you'd invoke `llvm nm`. Ideally we would also support creating an `llvm-nm` symlink which redirects to `llvm` for backwards compatibility.
This functionality would ideally be implemented as an option in the CMake build that toolchain vendors can opt into.

The implementation would have to replace the `main` function of each tool with a regular entrypoint function which is registered into a tool registry. This could be wrapped in a macro for convenience. When the "busybox" feature is disabled, the macro would expand to a `main` function as before and redirect to the entrypoint function. When the "busybox" feature is enabled, it would register the entrypoint function into the registry, which would be responsible for dispatching based on the tool name. Ideally, toolchain maintainers would also be able to control which tools are added to the "busybox" binary via CMake build options, so toolchains will only include the tools they use.
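
A minimal sketch of how such a registry and macro could fit together, under stated assumptions: the names `ToolEntry`, `toolRegistry`, `registerTool`, `dispatch`, `LLVM_TOOL_ENTRYPOINT` and `BUSYBOX_DISABLED` are all hypothetical, and the two tool bodies are stand-ins that only return a marker value.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical entrypoint signature shared by every tool.
using ToolEntry = int (*)(int argc, char **argv);

// A function-local static avoids static-initialization-order problems when
// tools register themselves from global initializers.
static std::map<std::string, ToolEntry> &toolRegistry() {
  static std::map<std::string, ToolEntry> Registry;
  return Registry;
}

// Returns false if a tool with this name was already registered.
static bool registerTool(const std::string &Name, ToolEntry Entry) {
  assert(Entry && "tool needs an entrypoint");
  return toolRegistry().emplace(Name, Entry).second;
}

// Hypothetical macro. With busybox disabled it expands to a real main() that
// forwards to the entrypoint; with busybox enabled it registers the entrypoint.
#ifdef BUSYBOX_DISABLED
#define LLVM_TOOL_ENTRYPOINT(Name, Fn)                                         \
  int main(int argc, char **argv) { return Fn(argc, argv); }
#else
#define LLVM_TOOL_ENTRYPOINT(Name, Fn)                                         \
  static const bool Fn##_registered = registerTool(Name, Fn);
#endif

// Stand-ins for two tools; real tools would do their work here.
static int nmMain(int, char **) { return 10; }
static int objdumpMain(int, char **) { return 20; }
LLVM_TOOL_ENTRYPOINT("nm", nmMain)
LLVM_TOOL_ENTRYPOINT("objdump", objdumpMain)

// Dispatch `llvm <tool> args...`: argv[1] names the tool, and the tool sees
// the remaining arguments with its own name in argv[0].
static int dispatch(int argc, char **argv) {
  if (argc < 2)
    return 1; // no tool named
  auto It = toolRegistry().find(argv[1]);
  if (It == toolRegistry().end())
    return 1; // unknown tool
  return It->second(argc - 1, argv + 1);
}
```

In this sketch the busybox `main` would simply `return dispatch(argc, argv);`. Dispatching on argv[0] for symlinks such as `llvm-nm` would need an extra lookup step that is not shown here.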

One implementation detail we think will be an issue is merging arguments in individual tools that use `cl::opt`. `cl::opt` works by maintaining a global state of flags, but we aren’t confident what the resulting behavior will be when merging them together in the dispatching `main`. What we would like to avoid is having flags used by one specific tool available in other tools. To address this issue, we would like to migrate all tools to use `OptTable`, which doesn't have this issue and is the direction most tools have already been moving in.

A second issue would be resolving symlinks. For example, llvm-objcopy will check argv[0] and behave as llvm-strip (i.e., use the right flags and configuration) if it is called via a symlink that “looks like” a strip tool, but in all other cases it will run in the default objcopy mode. The “looks like” function is usually an `Is` function copied into multiple tools that is essentially a substring check, so symlinks like `llvm-strip`, `strip.exe`, and `gnu-llvm-strip-10` all result in using the strip “mode” while all other names use the objcopy mode. To replicate the same behavior, we will need to take great care that symlinks to the busybox tool dispatch correctly to the appropriate LLVM tool, which might mean exposing and merging these `Is` functions.
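
To make the “looks like” check concrete, here is a rough standalone sketch of that kind of substring test. It is an approximation written for illustration, not the code from any LLVM tool; `lowerStem`, `looksLikeTool`, `Mode` and `selectMode` are hypothetical names, and the rule that the match must not be followed by a letter or digit is an assumption about what "looks like" should mean.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Return the file-name part of a path, lowercased, accepting '/' and '\\'.
static std::string lowerStem(const std::string &Path) {
  std::size_t Slash = Path.find_last_of("/\\");
  std::string Stem = Slash == std::string::npos ? Path : Path.substr(Slash + 1);
  std::transform(Stem.begin(), Stem.end(), Stem.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return Stem;
}

// True if Argv0 "looks like" Tool: the name occurs in the stem and is not
// immediately followed by another letter or digit.
static bool looksLikeTool(const std::string &Argv0, const std::string &Tool) {
  assert(!Tool.empty() && "tool name must not be empty");
  std::string Stem = lowerStem(Argv0);
  std::size_t I = Stem.rfind(Tool);
  if (I == std::string::npos)
    return false;
  std::size_t End = I + Tool.size();
  return End == Stem.size() ||
         !std::isalnum(static_cast<unsigned char>(Stem[End]));
}

enum class Mode { ObjCopy, Strip };

// Pick the strip mode for strip-like names and the objcopy mode otherwise.
static Mode selectMode(const std::string &Argv0) {
  return looksLikeTool(Argv0, "strip") ? Mode::Strip : Mode::ObjCopy;
}
```

A busybox dispatcher would have to run an equivalent check for every tool that has such aliases before falling back to the name given as the first command.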

Some open questions:
- People's initial thoughts/opinions?
- Are there existing tools in LLVM that already do this?
- Other implementation details/global states that we would also need to account for?

- Leonard

Tom Stellard via llvm-dev

Jun 21, 2021, 2:05:11 PM
to Leonard Chan, llvm-dev

I think it's an interesting idea. My main concern is that adding a new CMake
option for this is going to complicate the build system and make future CMake
improvements more difficult.

Do you have any idea of how much performance /
toolchain size gains you will get from this approach?

-Tom

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Leonard Chan via llvm-dev

Jun 21, 2021, 2:15:54 PM
to Tom Stellard, llvm-dev
> I think it's an interesting idea. My main concern is that adding a new CMake
> option for this is going to complicate the build system and make future CMake
> improvements more difficult.

That's fair. I'm working on a WIP version now and attempting to minimize the amount of CMake changes. Ideally, this would be controlled by a single CMake option that doesn't change end-user behavior, and we would have an upstream buildbot that could just enable this flag and ensure tools dispatched through busybox work as-is.

> Do you have any idea of how much performance /
> toolchain size gains you will get from this approach?

Locally we've found resolving dynamic relocations takes about 20% of the runtime for various dynamically linked LLVM tools. We'd have to double check if this is still the case because recently there have been some changes around semantic interposition that may help with this. I'm working on a WIP version that we can compare against for size (that is, size of separate tools + LLVM shared libs vs combined busybox size).

Fangrui Song via llvm-dev

Jun 21, 2021, 2:17:35 PM
to Leonard Chan, llvm-dev
On 2021-06-21, Leonard Chan via llvm-dev wrote:
>Hello all,
>
>When building LLVM tools, including Clang and lld, it's currently possible
>to use either static or shared linking for LLVM libraries. The latter can
>significantly reduce the size of the toolchain since we aren't duplicating
>the same code in every binary, but the dynamic relocations can affect
>performance. The former doesn't affect performance but significantly
>increases the size of our toolchain.

The dynamic relocation claim is not true.

A thin executable using just -Bsymbolic libLLVM-13git.so is almost
identical to a mostly statically linked PIE.

I added -Bsymbolic-functions to libLLVM.so and libclang-cpp.so which
has claimed most of the -Bsymbolic benefits.

The shared object approach *can be* inferior to static linking plus
-Wl,--gc-sections because with libLLVM.so and libclang-cpp.so we are
making a great many APIs dynamic, and that inhibits the --gc-sections
benefits. However, if clang and lld are shipped together with
llvm-objdump/llvm-readobj/llvm-objcopy/..., I expect the non-GCable
code due to shared objects will be significantly smaller.

I am conservative on adding yet another mechanism.

crunchgen. As you said, argv[0] checking code needs to be taken care of.
We should make these executables' main file not have colliding symbols.
I have cleaned up a lot of files.

John Criswell via llvm-dev

Jun 21, 2021, 4:00:46 PM
to Leonard Chan, llvm-dev
Dear Leonard et al.,

Will Dietz built a multiplexing tool using LLVM that does just this: it takes several programs and merges them together into one “busybox-esque” program that determines which main() function to call based on the argv[0] string.

The relevant paper is here: https://dl.acm.org/doi/abs/10.1145/3276524.

Will included the multiplexer code in the ALLVM code base. You can look at it here: https://publish.illinois.edu/allvm-project/software/. I believe the GitHub link is https://github.com/allvm/allvm-tools. I’ve been told that the code was built with LLVM 4.0, so it’d need to be updated to mainline.

I haven’t used it myself, but the idea of having LLVM multiplex itself seems cool, and it might make sense to give LLVM the ability to multiplex programs instead of expending effort doing it manually for LLVM and only getting the benefit in LLVM.

Regards,

John Criswell

--
John Criswell
Associate Professor
University of Rochester





Fangrui Song via llvm-dev

Jun 21, 2021, 4:42:32 PM
to Leonard Chan, llvm-dev

A few points.

In an ideal ELF world only external function calls need PLT entries.
Currently shared objects have PLT entries for in-dso function calls
because default visibility non-local symbols are preemptible by default
and the linker will produce PLT entries. -Bsymbolic-functions suppresses
PLT entries for in-dso symbols.

---

I have an approach for users whose libLLVM.so and libclang-cpp.so are closed
sets and who want to get GC benefits. I'll use libLLVM.so as an example.

* Identify the list of executables which link against libLLVM.so: exe0, exe1, exe2.
* For each exe, do a relocatable link of its own code (usually llvm/tools/llvm-foobar/*.o). Get the undefined symbol list.
* Take the union of the undefined symbol lists of all the executables. Create a version script file with these symbols under `global:` and with `local: *`.
* Re-link libLLVM.so with --version-script.

The resulting libLLVM.so only provides dynamic symbols needed by these executables.

This is still tricky and I am not sure how much it can decrease the size.
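
The union-and-emit step can be sketched as a small helper. This is an illustration only: `makeVersionScript` is a hypothetical name, the symbol lists are assumed to have been collected already (for example from the undefined-symbol output of an nm-style tool), and a real script would probably also need to drop symbols that libLLVM.so does not define, since some linkers can be configured to reject version-script names that match nothing.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Build the text of a linker version script that exports only the union of
// the given undefined-symbol lists and makes every other symbol local.
static std::string makeVersionScript(
    const std::vector<std::vector<std::string>> &UndefinedPerExe) {
  std::set<std::string> Union; // sorted and de-duplicated
  for (const auto &Symbols : UndefinedPerExe)
    Union.insert(Symbols.begin(), Symbols.end());

  std::string Script = "{\n";
  if (!Union.empty()) {
    Script += "  global:\n";
    for (const std::string &Sym : Union) {
      assert(!Sym.empty() && "empty symbol name");
      Script += "    " + Sym + ";\n";
    }
  }
  Script += "  local:\n    *;\n};\n";
  return Script;
}
```

The returned text would be written to a file and passed to the link of libLLVM.so with --version-script.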

Ben Craig via llvm-dev

Jun 21, 2021, 5:19:10 PM
to llvm...@lists.llvm.org

Do you have a plan for Windows? Symlinks on Windows are mostly limited to administrators and developer mode.

Alexandre Ganea via llvm-dev

Jun 22, 2021, 10:48:34 AM
to Ben Craig, llvm-dev, Leonard Chan

For pure compatibility purposes, in place of symlinks we could have facade executables on Windows. But that isn’t favorable in terms of performance: the cost of launching additional executables is quite high on Windows. I wonder if the LLVM installer could have a way to switch between both schemes: if admin mode is available, create symlinks; otherwise fall back to facades.

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Ben Craig via llvm-dev
Sent: June 21, 2021 5:19 PM
To: llvm...@lists.llvm.org
Subject: Re: [llvm-dev] [RFC] LLVM Busybox Proposal

Alexandre Ganea via llvm-dev

Jun 22, 2021, 11:51:48 AM
to Leonard Chan, llvm-dev

Hello Leonard,

 

That is a very interesting idea! This will particularly favor Windows, where the LLVM bin/ folder is huge (3.5 GiB) since we don’t have working symlinks out of the box. This also goes in the direction that we are pursuing, combining Clang and LLD into an embedded application as suggested by llvm-buildozer [1]; however, we’re also considering the multi-threading aspect. We took a different route for now, which is loading the existing executables as shared libraries inside our application, but our concern was less the binary size on disk and more the runtime performance (build time).

 

Regarding migrating every option to `OptTable`, are you suggesting removing `cl::opt` and `CommandLineParser` altogether? I can count 3,597 instances of `cl::opt` in the whole monorepo. This can be a tedious task even with automation, since it would need some level of classification into the appropriate .td file. What would be the approach for the migration? To alleviate the issue of having `cl::opt`s cross the tool domain, we could temporarily auto-generate a dictionary of `cl::opt`s available for each tool? That could be a quick intermediary step, while waiting for a complete migration.

 

One other issue I can see is symbols clashing at link time. Having everything in the same executable requires internal ABI compatibility throughout, i.e. compiling with the same #defines and linking with the same (system) libraries. I’m wondering if an analysis was done in that regard? But maybe that is not an issue.

 

Best,

Alex.

 

[1] https://reviews.llvm.org/D86351

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Leonard Chan via llvm-dev
Sent: June 21, 2021 1:55 PM
To: llvm-dev <llvm...@lists.llvm.org>
Subject: [llvm-dev] [RFC] LLVM Busybox Proposal

Leonard Chan via llvm-dev

Jun 22, 2021, 7:24:55 PM
to llvm-dev
Small update: I have a WIP prototype of the tool at https://reviews.llvm.org/D104686. The prototype only includes llvm-objcopy and llvm-objdump packed together, but we're seeing size benefits from busyboxing those two compared against having two separate tools. (More details in the prototype's description.) I don't plan on landing this as-is anytime soon, and there are still some things I'd like to improve/change and get feedback on.

To answer some replies:

- Ideally, we could start off with an incremental approach and not package large tools like clang/lld off the bat. The llvm-* tools seem like a good place to start since they're generally a bunch of relatively small binaries that all share a subset of functions in libLLVM, but don't necessarily use all of libLLVM, so statically linking them together (with --gc-sections) can help dedup a lot of shared components vs having separate statically compiled tools. In my measurements, the busybox tool containing llvm-objcopy+objdump is negligibly larger than llvm-objdump on its own (a couple KB difference) indicating a lot of shared code between objdump and objcopy.

- Will Dietz's multiplexing tool looks like a good place to start from. The only concern I can see though is mostly the amount of work needed to update it to LLVM 13.

- We don't have plans for Windows support now, but it's not off the table. (We've been mostly focusing on *nix for now.) Depending on overall traction for this idea, we could approach incrementally and add support for different platforms over time.

- I'm starting to think the `cl::opt` to `OptTable` issue might be orthogonal to the busybox implementation. The tool essentially dispatches to different "main" functions in different tools, but as long as we don't do anything within busybox after exiting that tool's main, then the global state issues we weren't sure of with `cl::opt` might not be of any concern now. It may be an issue down the line if, let's say, the tool flags moved from being "owned" by the tools themselves to instead being "owned" by busybox, and then we'd have to merge similarly-named flags together. In that case, migrating these tools to use `OptTable` may be necessary since (I think) `OptTable` should handle this. This may be a tedious task, but this is just to say that busybox won't need to be immediately blocked on it.

- I haven't seen any issues with colliding symbols when linking (although I've only merged two tools for now). I suspect that with small-ish llvm-* tools, the bulk of their code is shared from libLLVM, and they have their own distinct logic built on top of it, which could mean a low chance of conflicting internal ABIs.
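
To illustrate the `cl::opt` concern from the list above: options declared as global objects are registered at program startup, regardless of which tool is later dispatched to. The sketch below uses a simplified stand-in, not the real `cl::opt`; `GlobalOptions`, `FakeOpt`, `visibleFlags`, the tool names and the flag names are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for a process-wide option table; this is NOT the real cl::opt.
struct GlobalOptions {
  std::map<std::string, std::string> OwnerByName; // flag name -> declaring tool
  std::vector<std::string> Collisions;            // names declared twice
};

static GlobalOptions &globalOptions() {
  static GlobalOptions G;
  return G;
}

// Declaring an option as a global object registers it before main() runs,
// whether or not the tool that owns it is the one being dispatched to.
struct FakeOpt {
  FakeOpt(const std::string &Name, const std::string &Tool) {
    assert(!Name.empty() && "option needs a name");
    auto Result = globalOptions().OwnerByName.emplace(Name, Tool);
    if (!Result.second)
      globalOptions().Collisions.push_back(Name);
  }
};

// Options "owned" by two different tools that are linked into one binary.
static FakeOpt AOnly("a-only", "tool-a");
static FakeOpt BOnly("b-only", "tool-b");
static FakeOpt AVerbose("verbose", "tool-a");
static FakeOpt BVerbose("verbose", "tool-b"); // same spelling, different tool

// Names that every tool in the combined binary would see.
static std::vector<std::string> visibleFlags() {
  std::vector<std::string> Names;
  for (const auto &Entry : globalOptions().OwnerByName)
    Names.push_back(Entry.first);
  return Names;
}
```

Here both tools see `a-only` and `b-only`, and the second `verbose` declaration collides with the first. How the real library reacts to such a collision is outside the scope of this sketch.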

Fangrui Song via llvm-dev

Jun 22, 2021, 8:00:17 PM
to Leonard Chan, llvm-dev

-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on -DLLVM_TARGETS_TO_BUILD=X86 (custom1)
vs
-DLLVM_TARGETS_TO_BUILD=X86 (custom2)


# This is the lower bound for any multiplexing approach. clang is the largest executable.
% stat -c %s /tmp/out/custom2/bin/clang-13
102900408

I have built clang, lld and a bunch of ELF binary utilities.

% stat -c %s /tmp/out/custom1/lib/libLLVM-13git.so /tmp/out/custom1/lib/libclang-cpp.so.13git /tmp/out/custom1/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'
138896544

% stat -c %s /tmp/out/custom2/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'
209054440


The -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on build is doing a really good job.

A multiplexing approach can squeeze some bytes from 138896544 toward 102900408,
but how much can it do?
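
Taking the premise that clang alone is the lower bound, the headroom can be worked out from the three sizes above. This is only arithmetic on the reported numbers; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sizes in bytes, copied from the stat output above.
constexpr std::uint64_t ClangAlone = 102900408;  // custom2: clang-13 only
constexpr std::uint64_t DylibBuild = 138896544;  // custom1: dylibs + tools
constexpr std::uint64_t StaticBuild = 209054440; // custom2: all tools

// Largest possible saving over the dylib build, on the premise that a
// multiplexed binary containing clang cannot be smaller than clang alone.
constexpr std::uint64_t MaxSaving = DylibBuild - ClangAlone;

// Saving the dylib build already achieves over separate static tools.
constexpr std::uint64_t DylibSaving = StaticBuild - DylibBuild;

static double percentOf(std::uint64_t Part, std::uint64_t Whole) {
  assert(Whole != 0 && "cannot take a percentage of zero");
  return 100.0 * static_cast<double>(Part) / static_cast<double>(Whole);
}
```

`MaxSaving` is 35996136 bytes, a little under 26% of the dylib build, while the dylib build is already 70157896 bytes (about 33.6%) smaller than the separate static tools.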


>- I'm starting to think the `cl::opt` to `OptTable` issue might be
>orthogonal to the busybox implementation. The tool essentially dispatches
>to different "main" functions in different tools, but as long as we don't
>do anything within busybox after exiting that tool's main, then the global
>state issues we weren't sure of with `cl::opt` might not be of any concern
>now. It may be an issue down the line if, let's say, the tool flags moved
>from being "owned" by the tools themselves to instead being "owned" by
>busybox, and then we'd have to merge similarly-named flags together. In
>that case, migrating these tools to use `OptTable` may be necessary since
>(I think) `OptTable` should handle this. This may be a tedious task, but
>this is just to say that busybox won't need to be immediately blocked on it.

Such improvement is useful even if we don't do multiplexing.
I switched llvm-symbolizer. thakis switched llvm-objdump.
I can look at some binary utilities.


Petr Hosek via llvm-dev

Jun 23, 2021, 1:00:46 AM
to Fangrui Song, llvm-dev
From our perspective as a toolchain vendor, even if using shared libraries could get us closer to static linking in terms of performance, we'd still prefer static linking for the ease of distribution. Dealing with a single statically linked executable is much easier than dealing with multiple shared libraries. This is especially important in distributed compilation environments like Goma.

When comparing performance between static and dynamic linking, I'd also recommend doing a comparison between binaries built with PGO+LTO. Plain -O3 leaves a lot of performance on the table and as far as I'm aware, most toolchain vendors use PGO+LTO.

David Blaikie via llvm-dev

Jun 23, 2021, 1:10:00 AM
to Petr Hosek, llvm-dev
On Tue, Jun 22, 2021 at 10:00 PM Petr Hosek via llvm-dev <llvm...@lists.llvm.org> wrote:
> From our perspective as a toolchain vendor, even if using shared libraries could get us closer to static linking in terms of performance, we'd still prefer static linking for the ease of distribution. Dealing with a single statically linked executable is much easier than dealing with multiple shared libraries. This is especially important in distributed compilation environments like Goma.

What makes it especially complicated for distributed compilation environments? (I'd expect a toolchain contains so many files that whether it's one binary, or a binary and a handful of shared libraries wouldn't change the general implementation complexity of a distributed build system?)

Petr Hosek via llvm-dev

Jun 23, 2021, 1:20:15 AM
to David Blaikie, llvm-dev
I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.

David Blaikie via llvm-dev

Jun 23, 2021, 1:55:18 AM
to Petr Hosek, llvm-dev
On Tue, Jun 22, 2021 at 10:20 PM Petr Hosek <pho...@google.com> wrote:
> I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.

Curious/fair enough - good to know!
 

Fāng-ruì Sòng via llvm-dev

Jun 23, 2021, 2:08:57 AM
to Petr Hosek, llvm-dev
On Tue, Jun 22, 2021 at 10:20 PM Petr Hosek <pho...@google.com> wrote:
>
> I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.

For non-clang executables, -DLLVM_LINK_LLVM_DYLIB=on just adds one
more DT_NEEDED.
The DT_NEEDED entry can use a $ORIGIN based DT_RUNPATH. Can Goma
detect the libraries shipped with the tools?
I asked because I feel this could be an artificial limitation which
could be straightforwardly addressed in Goma.
A toolchain executable using an accompanying shared object is not rare
(thinking of plugins).

Multiplexing LLVM tools is one alternative but I am a bit concerned
with the extra complexity and the new configuration the build system
needs to support.

https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
mentioned another approach which doesn't require intrusive
modification to the tools.

As for PGO+LTO, you can apply them to libLLVM-13git.so as well.

--
宋方睿

David Chisnall via llvm-dev

Jun 23, 2021, 4:36:03 AM
to llvm...@lists.llvm.org
On 21/06/2021 22:18, Ben Craig via llvm-dev wrote:
> Do you have a plan for Windows?  Sym links on Windows are mostly limited
> to administrators and developer mode.

Is that a problem? Installers generally run with administrator rights
(choco, for example, requires running from an Administrator PowerShell
and that's how most folks I know install LLVM on Windows).

Developers generally need to enable developer mode if they want to run
things that they've built (and doing so is a single toggle switch in
Settings, so it's not a massive obstacle). It should be fairly easy to
try running mklink during CMake if this option is enabled and, if it
fails, error out and tell the person running the build to either enable
developer mode or switch to separate-program builds.

David

Ben Craig via llvm-dev

Jun 23, 2021, 9:28:30 AM
to llvm...@lists.llvm.org
I agree that the official installation case probably isn't an issue.

There are unofficial installation cases that are more annoying. I wouldn't be able to just zip up my llvm dir and hand it to someone else to unzip like I can today.

The just-built case is a bigger deal. I do most of my development on Windows from a standard account (non-admin, non-developer). That's largely by choice, but some IT departments are much more picky. If I need to install something, then I open a distinct admin command prompt.

Requiring development mode to be turned on for LLVM dev is similar to requiring Linux devs to build as root (or at least making a few new programs setuid root).

David Blaikie via llvm-dev

Jun 23, 2021, 9:57:52 AM
to Ben Craig, llvm...@lists.llvm.org
On Wed, Jun 23, 2021 at 6:28 AM Ben Craig via llvm-dev <llvm...@lists.llvm.org> wrote:
> I agree that the official installation case probably isn't an issue.
>
> There are unofficial installation cases that are more annoying. I wouldn't be able to just zip up my llvm dir and hand it to someone else to unzip like I can today.
>
> The just-built case is a bigger deal. I do most of my development on Windows from a standard account (non-admin, non-developer). That's largely by choice, but some IT departments are much more picky. If I need to install something, then I open a distinct admin command prompt.
>
> Requiring development mode to be turned on for LLVM dev is similar to requiring Linux devs to build as root (or at least making a few new programs setuid root).

None of this would be required - it looks like the discussion is only about an optional build mode that would be opt-in and beneficial to some folks.
 

Mehdi AMINI via llvm-dev

Jun 23, 2021, 6:43:52 PM
to Fāng-ruì Sòng, llvm-dev
On Tue, Jun 22, 2021 at 11:09 PM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
> On Tue, Jun 22, 2021 at 10:20 PM Petr Hosek <pho...@google.com> wrote:
> >
> > I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.
>
> For non-clang executables, -DLLVM_LINK_LLVM_DYLIB=on just adds one
> more DT_NEEDED.
> The DT_NEEDED entry can use a $ORIGIN based DT_RUNPATH. Can Goma
> detect the libraries shipped with the tools?
> I asked because I feel this could be an artificial limitation which
> could be straightforwardly addressed in Goma.
> A toolchain executable using an accompanying shared object is not rare
> (thinking of plugins).
>
> Multiplexing LLVM tools is one alternative but I am a bit concerned
> with the extra complexity and the new configuration the build system
> needs to support.
>
> https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
> mentioned another approach which doesn't require intrusive
> modification to the tools.
>
> As for PGO+LTO, you can apply them to libLLVM-13git.so as well.

Some thoughts: if we're getting into PGO+LTO territory, I feel that both methods presented here will be at a disadvantage compared to building clang and lld into their own binaries.
For example, I remember that on Mac an important optimization for clang builds was to order the functions in the binary roughly in the order in which they are first encountered during execution. Assuming the same behavior for lld, you can see the conflicting optimization goals. You can also think about how libSupport may be differently "hot" in a clang PGO profile compared to lld, which would result in different optimization.

LTO also benefits from "internalizing": building a static binary where only `main` is exported and everything else gets internal linkage is the best case, since pointer escape analysis, global analysis, etc. all become more powerful. Optimizing a shared library more or less makes every symbol public, and I suspect the busybox approach may be better in this respect (you get back to a single public main, though it can reach much more code).

-- 
Mehdi

Fāng-ruì Sòng via llvm-dev

Jun 23, 2021, 6:52:07 PM
to Mehdi AMINI, llvm-dev
On Wed, Jun 23, 2021 at 3:43 PM Mehdi AMINI <joke...@gmail.com> wrote:
>
>
>
> On Tue, Jun 22, 2021 at 11:09 PM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
>>
>> On Tue, Jun 22, 2021 at 10:20 PM Petr Hosek <pho...@google.com> wrote:
>> >
>> > I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.
>>
>> For non-clang executables, -DLLVM_LINK_LLVM_DYLIB=on just adds one
>> more DT_NEEDED.
>> The DT_NEEDED entry can use a $ORIGIN based DT_RUNPATH. Can Goma
>> detect the libraries shipped with the tools?
>> I asked because I feel this could be an artificial limitation which
>> could be straightforwardly addressed in Goma.
>> A toolchain executable using an accompanying shared object is not rare
>> (thinking of plugins).
>>
>> Multiplexing LLVM tools is one alternative but I am a bit concerned
>> with the extra complexity and the new configuration the build system
>> needs to support.
>>
>> https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
>> mentioned another approach which doesn't require intrusive
>> modification to the tools.
>>
>> As for PGO+LTO, you can apply them to libLLVM-13git.so as well.
>
>
> Some thoughts if we're getting into PGO+LTO territory, I feel that both methods presented here will be at a disadvantage compared to building clang and lld into their own binaries.
> For example I remember that on Mac an important optimization for clang builds was to order the functions in the binary roughly in the order in which they are first encountered during execution, assuming the same behavior for lld you can see the conflicting optimization goal... You can also think about how libSupport may be differently "hot" on a clang PGO profile compared to lld and would result in different optimization.

If PGO+LTO is desired, the executables can be split this way, assuming
the performance of
llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}
doesn't matter.

* clang (libLLVM*.a)
* lld + llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}
(libLLVM-13git.so)

> LTO also benefits from "internalizing", basically building a static binary where only `main` is exported and everything else becomes an internal linkage is the best case: pointer escaping, global analysis, etc all become more powerful. Optimizing a shared library kind of makes every symbol public, and I suspect the busybox approach may be better on this aspect (you get back to a single public main, but it can reach much more code though).

With --version-script we can internalize shared object symbols as
well. For example, this has been used to facilitate whole-program
devirtualization (https://reviews.llvm.org/D98686).
With https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
we can get a list of roots which need to be exported.
A thin executable plus a -fvisibility-inlines-hidden +
-Bsymbolic-functions shared object is almost identical to a PIE.

Mehdi AMINI via llvm-dev

Jun 23, 2021, 7:33:11 PM
to Fāng-ruì Sòng, llvm-dev
You can get closer to it but note that:

- You have some non-trivial and non-standard build setup and scripts to work around the problem (finding roots, etc.); the busybox solution is much more "clean" from this point of view if one can structure it in "normal" C++.
- How does it work on non-ELF platforms?
- It still isn't equivalent: you're still having a large surface API exported by the .so which limits what the optimizer can do (alias analysis, etc.). You won't be able to inject context from the callers there, or inline across the libLLVM.so boundary.

-- 
Mehdi

Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 1:15:01 PM
to llvm-dev
llvm/tools/ includes some binary utilities used as replacements for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).
Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.

* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330

llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.

The switch can fix some corner cases with lib/Support/CommandLine.cpp. Here is a summary:

* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (if (NoDemangle.getPosition() > Demangle.getPosition()) , see llvm-nm.cpp)
* grouped short options can be specified with one line `setGroupedShortOptions`, instead of adding cl::Grouping to every short option.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
* If we decide to support binary utility multiplexing (https://reviews.llvm.org/D104686), we will not get conflicting options. An option may have different meanings in different utilities (especially for one-letter options).

I expect that most users will not observe any difference.

There is a related topic whether we should disallow the single-dash `-long-option` form.
(Discussed in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html Accept --long-option but not -long-option for llvm binary utilities)
I'd like to disallow -long-option but may want to do this in a separate change.
The main points are that (1) grouped short options have a syntax conflict with one-dash long options, and (2) the GNU getopt_long style two-dash long option is much more popular.

I can think of potential pushback for some Mach-O specific options, e.g. nm -arch
http://www.manpagez.com/man/1/nm/osx-10.12.6.php says `-arch` has one dash.
If such options may have problems, we can keep supporting one dash forms.
With OptTable, allowing one-dash forms for a specific option is easy.
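
The position comparison mentioned in the summary for --demangle/--no-demangle can be replaced by a "last occurrence wins" scan, which is roughly what `ArgList::hasFlag` offers to OptTable-based tools. A minimal self-contained sketch in plain C++ follows; the helper `lastFlagWins` is hypothetical and is not LLVM API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not LLVM API): returns the state of a boolean option
// that has a positive and a negative spelling. The last occurrence on the
// command line wins; Default applies if neither spelling appears.
bool lastFlagWins(const std::vector<std::string> &Args, const std::string &Pos,
                  const std::string &Neg, bool Default) {
  bool Value = Default;
  for (const std::string &A : Args) {
    if (A == Pos)
      Value = true;
    else if (A == Neg)
      Value = false;
  }
  return Value;
}
```

With this shape the tool never needs to track option positions itself; it only asks for the final state of the pair.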

David Blaikie via llvm-dev

Jul 2, 2021, 2:17:23 PM7/2/21
to Fāng-ruì Sòng, llvm-dev
On Fri, Jul 2, 2021 at 10:15 AM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).

Would be good to describe some of the known drawbacks/expected benefits.

One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.
 
Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.

* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330

llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.

The switch can fix some corners with lib/Support/CommandLine.cpp. Here is a summary:

* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (if (NoDemangle.getPosition() > Demangle.getPosition()) , see llvm-nm.cpp)
* grouped short options can be specified with one line `setGroupedShortOptions`, instead of adding cl::Grouping to every short option.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
* If we decide to support binary utility multiplexing (https://reviews.llvm.org/D104686), we will not get conflicting options. An option may have different meanings in different utilities (especially for one-letter options).

I expect that most users will not observe any difference.

There is a related topic whether we should disallow the single-dash `-long-option` form.
(Discussed in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html Accept --long-option but not -long-option for llvm binary utilities)
I'd like to disallow -long-option but may want to do this in a separate change.

I'd say definitely do this as a separate change. I expect there'd be a long tail of users after this change ships in an LLVM release, etc, such that we may want to undo some amount of it a long time after the change is made.
 


Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 2:27:20 PM7/2/21
to David Blaikie, llvm-dev
On Fri, Jul 2, 2021 at 11:17 AM David Blaikie <dbla...@gmail.com> wrote:
On Fri, Jul 2, 2021 at 10:15 AM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).

Would be good to describe some of the known drawbacks/expected benefits.

The summary is the list of benefits:)
The drawback is some initial boilerplate (e.g. llvm-tblgen -gen-opt-parser-defs in CMakeLists.txt, class NmOptTable in code).
The handling of comma separated options -arch=x86_64,arm64 doesn't have direct OptTable support. llvm::SplitString is needed (just search for SplitString in https://reviews.llvm.org/D105330)
But this doesn't tend to increase complexity because the cl::list<std::string> will need per-value verification anyway.
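
The split-then-verify step for a value such as -arch=x86_64,arm64 could look like the sketch below. It is plain C++ rather than the code in D105330: `splitCommaList` is a hypothetical stand-in for llvm::SplitString, and `parseArchList` and the set of known names are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::SplitString: split on ',' and drop empty
// pieces.
std::vector<std::string> splitCommaList(const std::string &Value) {
  std::vector<std::string> Parts;
  std::string::size_type Begin = 0;
  while (Begin <= Value.size()) {
    std::string::size_type End = Value.find(',', Begin);
    if (End == std::string::npos)
      End = Value.size();
    if (End > Begin)
      Parts.push_back(Value.substr(Begin, End - Begin));
    Begin = End + 1;
  }
  return Parts;
}

// Splits the value of an option such as -arch=x86_64,arm64 and checks each
// piece against Known. On failure, returns false and stores the first
// unknown name in Bad.
bool parseArchList(const std::string &Value, const std::set<std::string> &Known,
                   std::vector<std::string> &Out, std::string &Bad) {
  for (const std::string &Arch : splitCommaList(Value)) {
    if (!Known.count(Arch)) {
      Bad = Arch;
      return false;
    }
    Out.push_back(Arch);
  }
  return true;
}
```

The per-value check is needed in either scheme, which is why the extra split does not add much complexity.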
 
One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.

OptTable is used as a local variable. So yes, it avoids global constructors, thus avoiding cl::opt option name collision.
"If we decide to support binary utility multiplexing" below mentioned this point.

Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.

* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330

llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.

The switch can fix some corners with lib/Support/CommandLine.cpp. Here is a summary:

* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (if (NoDemangle.getPosition() > Demangle.getPosition()) , see llvm-nm.cpp)
* grouped short options can be specified with one line `setGroupedShortOptions`, instead of adding cl::Grouping to every short option.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
* If we decide to support binary utility multiplexing (https://reviews.llvm.org/D104686), we will not get conflicting options. An option may have different meanings in different utilities (especially for one-letter options).

I expect that most users will not observe any difference.

There is a related topic whether we should disallow the single-dash `-long-option` form.
(Discussed in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html Accept --long-option but not -long-option for llvm binary utilities)
I'd like to disallow -long-option but may want to do this in a separate change.

I'd say definitely do this as a separate change. I expect there'd be a long tail of users after this change ships in an LLVM release, etc, such that we may want to undo some amount of it a long time after the change is made.

Thanks for chiming in. 

David Blaikie via llvm-dev

Jul 2, 2021, 2:40:59 PM7/2/21
to Fāng-ruì Sòng, llvm-dev
On Fri, Jul 2, 2021 at 11:27 AM Fāng-ruì Sòng <mas...@google.com> wrote:
On Fri, Jul 2, 2021 at 11:17 AM David Blaikie <dbla...@gmail.com> wrote:
On Fri, Jul 2, 2021 at 10:15 AM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).

Would be good to describe some of the known drawbacks/expected benefits.

The summary is the list of benefits:)

Ah, it looks more like a list of changes, not necessarily benefits - removing certain syntaxes seems generally like a cost to me (potential to break existing users), rather than an outright benefit.

The API benefits sound nice, though presumably some could be retrofitted to cl::opt if that was the only goal. Side benefits in addition to removing global ctors are nice to have.
 
The drawback is some initial boilerplate (e.g. llvm-tblgen -gen-opt-parser-defs in CMakeLists.txt, class NmOptTable in code).
The handling of comma separated options -arch=x86_64,arm64 doesn't have direct OptTable support. llvm::SplitString is needed (just search for SplitString in https://reviews.llvm.org/D105330)
But this doesn't tend to increase complexity because the cl::list<std::string> will need per-value verification anyway.
 
One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.

OptTable is used as a local variable. So yes, it avoids global constructors,

Nice :)

Mehdi AMINI via llvm-dev

Jul 2, 2021, 2:58:29 PM7/2/21
to David Blaikie, llvm-dev
On Fri, Jul 2, 2021 at 11:41 AM David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:
On Fri, Jul 2, 2021 at 11:27 AM Fāng-ruì Sòng <mas...@google.com> wrote:
On Fri, Jul 2, 2021 at 11:17 AM David Blaikie <dbla...@gmail.com> wrote:
On Fri, Jul 2, 2021 at 10:15 AM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).

Would be good to describe some of the known drawbacks/expected benefits.

The summary is the list of benefits:)

Ah, it looks more like a list of changes, not necessarily benefits - removing certain syntaxes seems generally like a cost to me (potential to break existing users), rather than an outright benefit.

Indeed: it isn't clear to me that these are outright "benefits".

 

The API benefits sound nice, though presumably some could be retrofitted to cl::opt if that was the only goal. Side benefits in addition to removing global ctors are nice to have.
 
The drawback is some initial boilerplate (e.g. llvm-tblgen -gen-opt-parser-defs in CMakeLists.txt, class NmOptTable in code).
The handling of comma separated options -arch=x86_64,arm64 doesn't have direct OptTable support. llvm::SplitString is needed (just search for SplitString in https://reviews.llvm.org/D105330)
But this doesn't tend to increase complexity because the cl::list<std::string> will need per-value verification anyway.
 
One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.

OptTable is used as a local variable. So yes, it avoids global constructors,

Nice :)

Note that MLIR is using cl::opt without global ctor (we build with `-Werror=global-constructors`).

The pattern we use to write a tool with cl::opt and avoid global ctor (and can be used to avoid collision) looks like: https://github.com/llvm/llvm-project/blob/main/mlir/lib/IR/MLIRContext.cpp#L57-L83

The tool that wants to expose the MLIRContext options to the command line calls registerMLIRContextCLOptions() before parsing the command line.
Wouldn't this translate directly to LLVM tools as well with some minor refactoring?

The same applies to all of the infrastructure in MLIR, passes are registered explicitly, etc. This decouples the "is this code linked in" from "options are loaded" annoying part of the global constructors.
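
The explicit-registration idea can be sketched without any LLVM headers. In the toy below, `optionRegistry` stands in for the global cl::opt registry, and `ContextOptions`, `registerContextOptions` and the option name are hypothetical; the point is only that nothing is registered until the tool asks for it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy registry standing in for the global cl::opt registry. It is a
// function-local static, so it is built on first use rather than by a
// global constructor at program startup.
std::map<std::string, std::string> &optionRegistry() {
  static std::map<std::string, std::string> Registry;
  return Registry;
}

// Options owned by a library. Constructing the struct registers them.
struct ContextOptions {
  ContextOptions() {
    // Hypothetical option name, for illustration only.
    optionRegistry()["--example-disable-threading"] = "Disable threading";
  }
};

// Hypothetical analogue of registerMLIRContextCLOptions(): a tool calls this
// before parsing its command line if it wants these options exposed.
void registerContextOptions() {
  static ContextOptions Options; // Built once, on the first call.
  (void)Options;
}
```

A tool that links the library but never calls `registerContextOptions()` sees an empty registry, so being linked in and being exposed on the command line are separate decisions.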

-- 
Mehdi





 
 

Alexandre Ganea via llvm-dev

Jul 2, 2021, 3:49:39 PM7/2/21
to Mehdi AMINI, David Blaikie, Fangrui Song, llvm-dev

The API benefits sound nice, though presumably some could be retrofitted to cl::opt if that was the only goal. Side benefits in addition to removing global ctors are nice to have.
 

The drawback is some initial boilerplate (e.g. llvm-tblgen -gen-opt-parser-defs in CMakeLists.txt, class NmOptTable in code).

The handling of comma separated options -arch=x86_64,arm64 doesn't have direct OptTable support. llvm::SplitString is needed (just search for SplitString in https://reviews.llvm.org/D105330)

But this doesn't tend to increase complexity because the cl::list<std::string> will need per-value verification anyway.

 

One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.

 

OptTable is used as a local variable. So yes, it avoids global constructors,

 

Nice :)

 

Note that MLIR is using cl::opt without global ctor (we build with `-Werror=global-constructors`).

 

The pattern we use to write a tool with cl::opt and avoid global ctor (and can be used to avoid collision) looks like: https://github.com/llvm/llvm-project/blob/main/mlir/lib/IR/MLIRContext.cpp#L57-L83

 

The tool that wants to expose the MLIRContext options to the command line calls registerMLIRContextCLOptions() before parsing the command line.

Wouldn't this translate directly to LLVM tools as well with some minor refactoring?

 

The same applies to all of the infrastructure in MLIR, passes are registered explicitly, etc. This decouples the "is this code linked in" from "options are loaded" annoying part of the global constructors.

 

-- 

Mehdi

 

[Alexandre Ganea] I think one other issue with cl::opt is that it aggregates the “command-line argument definition” and the “runtime parameter” de facto in a single object (unless cl::location is manually specified for every cl::opt). What MLIR does solves the issue mentioned by David: the fact that every tool pulls in and initializes every cl::opt out there. However, OptTable solves both problems, and makes the entry point thread-safe.

 

Mehdi AMINI via llvm-dev

Jul 2, 2021, 4:10:46 PM7/2/21
to Alexandre Ganea, llvm-dev
I agree that removing the global state would be great!
Right now what I see proposed with OptTable (like https://reviews.llvm.org/D104889) seems to just address the tools-specific options, and the value isn't clear to me for these cases, since these options aren't exposed through library entry points.
I don't quite get right now how OptTable would compose at the LLVM scale? Are there examples of libraries exposing pluggable hooks for a tool to aggregate multiple libraries' options and expose them on the command line?

Thanks,

-- 
Mehdi

Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 4:27:28 PM7/2/21
to Mehdi AMINI, llvm-dev
The first message listed:

* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (if (NoDemangle.getPosition() > Demangle.getPosition()) , see llvm-nm.cpp)
* grouped short options can be specified with one line `setGroupedShortOptions`, instead of adding cl::Grouping to every short options.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)

To me these points are all usability issues of cl::opt. I care about not exposing unnecessary interfaces, so cl::opt accepting the weird -t=d looks like a downside to me.

--demangle=0 is weird and some llvm/llvm/test tests do use cl::opt options this way, so we cannot just remove this usage. As a workaround, we could add a cl::foobar_toggle to a cl::opt to disallow =0.
We would end up with more customization for one option: cl::cat (for hiding unrelated options), cl::foobar_toggle (for disallowing =0), and potentially others for other ad-hoc tasks.

I can highlight another thing about the global state of cl::opt => library cl::opt and binary utility cl::opt share the same namespace.
So cl::opt options (usually for debugging or testing) in library code can end up in a tool's list of command line options.
This is usually undesired (e.g. llvm-objdump --x86-asm-syntax in https://reviews.llvm.org/D100433).
People may not notice this if they always use -DLLVM_LINK_LLVM_DYLIB=off together with linker garbage collection.

 

Mehdi AMINI via llvm-dev

Jul 2, 2021, 5:03:50 PM7/2/21
to Fāng-ruì Sòng, llvm-dev
You're not answering my question here, are you? Are you answering to what I mentioned 3 emails before in an answer to David when I wrote "Indeed: it isn't clear to me that these are outright "benefits"."?

Because I still don't see clearly how to build something like `opt` with all the passes and their options with OptTable: how does it all compose?

Thanks,

-- 
Mehdi

Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 5:10:04 PM7/2/21
to Mehdi AMINI, llvm-dev
OptTable doesn't have the listed usability issues. Not having the issues is a large benefit to me.
 
Because I still don't see clearly how to build something like `opt` with all the passes and their options with OptTable: how does it all compose?

The proposed changes are specific to binary utilities: llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}.
'opt', 'llc', etc are not in the scope.
(I guess I should have named the utilities more specifically to not cause confusion.)

For llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}, we don't want a random cl::opt in lib/Transforms, lib/MC, or lib/LTO to be accidentally specifiable on the command line.
(In the rare debugging case where such functionality is needed, it needs a -mllvm prefix like what ld.lld does.)
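
The -mllvm convention amounts to separating the tool's own arguments from values meant for library cl::opt options. A minimal sketch in plain C++ follows; `SplitArgs` and `splitMllvmArgs` are hypothetical names, and a real tool would hand the forwarded values to cl::ParseCommandLineOptions, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SplitArgs {
  std::vector<std::string> Tool; // Handled by the tool's own option table.
  std::vector<std::string> Llvm; // Values that followed -mllvm.
};

// Separates "-mllvm <value>" pairs from the remaining arguments. A trailing
// -mllvm without a value stays in Tool, where the tool's own parser can
// report it as an error.
SplitArgs splitMllvmArgs(const std::vector<std::string> &Args) {
  SplitArgs Result;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (Args[I] == "-mllvm" && I + 1 < Args.size())
      Result.Llvm.push_back(Args[++I]);
    else
      Result.Tool.push_back(Args[I]);
  }
  return Result;
}
```

Library options then remain reachable for debugging, but only behind an explicit prefix.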


Mehdi AMINI via llvm-dev

Jul 2, 2021, 6:21:46 PM7/2/21
to Fāng-ruì Sòng, llvm-dev
Maybe, these are subjective though.
(And the confusion above came from the fact that you just answered the wrong email)
 
 
Because I still don't see clearly how to build something like `opt` with all the passes and their options with OptTable: how does it all compose?

The proposed changes are specific to binary utilities: llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}.
'opt', 'llc', etc are not in the scope.
(I guess I should have named the utilities more specifically to not cause confusion.)

For llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}, we don't want a random cl::opt in lib/Transforms, lib/MC, or lib/LTO to be accidentally specifiable on the command line.
(In the rare debugging case where such functionality is needed, it needs a -mllvm prefix like what ld.lld does.)

Have you looked at what I mentioned with MLIR on how to use cl::opt without global constructor? This has exactly the behavior you're looking for.


-- 
Mehdi

 

Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 6:26:28 PM7/2/21
to Mehdi AMINI, llvm-dev
Unless it does something more special, I don't think it can avoid the issue I mentioned in a previous message:

> I can highlight another thing about the global state of cl::opt => library cl::opt and binary utility cl::opt share the same namespace.

This can be worked around with linker garbage collection by discarding unreferenced cl::opt.

Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 7:07:28 PM7/2/21
to Mehdi AMINI, llvm-dev
I re-read these messages. I think you probably meant something more generic - how to design a decentralized command line option registry where every library can register some options.
The binary utilities command line parsing issue I intended to address is fairly different:
We want a single registry of all options and don't want to inherit options from llvm/lib/A just because the tool happens to depend on llvm/lib/A directly or indirectly.
E.g. llvm-nm needs to depend on bitcode reader because it needs to handle LLVM bitcode files, however, I don't want a random cl::opt in bitcode reader to appear in llvm-nm's command line option list.
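
The "single registry per tool" idea can be sketched as an option table that is an ordinary value owned by the tool. Everything in the block below is hypothetical (the tools, the tables and `firstUnknownOption`); it only illustrates that options from linked-in libraries cannot leak in, and that two tools can give the same spelling different meanings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A tool-local option table: option spelling -> help text. It is owned by
// the tool's entry point, so nothing is registered globally.
using OptionTable = std::map<std::string, std::string>;

// Returns the first argument starting with '-' that is not in the table, or
// an empty string if every option is known to this tool.
std::string firstUnknownOption(const OptionTable &Table,
                               const std::vector<std::string> &Args) {
  for (const std::string &A : Args)
    if (!A.empty() && A[0] == '-' && !Table.count(A))
      return A;
  return "";
}

// Two hypothetical tools that both accept -t, with unrelated meanings.
OptionTable toolATable() {
  OptionTable T;
  T["-t"] = "select radix";
  T["--demangle"] = "demangle symbol names";
  return T;
}

OptionTable toolBTable() {
  OptionTable T;
  T["-t"] = "print totals";
  return T;
}
```

Because each table is consulted only by its own tool, a multiplexed binary could dispatch to either entry point without the two -t definitions colliding.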

So I just built mlir-opt and inspected its --help output. It has exactly the problem I called out in my first message:

* We don't need to add cl::cat to every option and call
`HideUnrelatedOptions` to hide unrelated options from --help. The issue
would happen with cl::opt tools if linker garbage collection is disabled or
libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
There are many unrelated options like --amdgpu-bypass-slow-div which is from llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Mehdi AMINI via llvm-dev

Jul 2, 2021, 7:08:39 PM7/2/21
to Fāng-ruì Sòng, llvm-dev
Sure, but that isn't the problem you were raising that I was answering to, let's not move the goal post here.

How do you solve this "namespacing" issue though? This was the sense of my question earlier in this thread: is OptTable providing me a solution to build library options that can be re-exported through tools command line interface? (bonus point if it manages some sort of namespacing).
The intent of cl::opt in libraries is that they can be exposed on the command line. It seems to me that this namespacing issue is quite intrinsic to the flat command line interface itself.
The way we work around this to avoid collision in mlir is through convention, making sure passes are prefixed according to their library (or dialects in MLIR).

Another thing we have in MLIR is cl::opt wrapped into external storage and custom parser, they aren't exposed in the global namespace.
For example, a pass class will define a member:

  ::mlir::Pass::Option<uint64_t> fastMemoryCapacity{*this, "fast-mem-capacity", ::llvm::cl::desc("Set fast memory space capacity in KiB (default: unlimited)"), ::llvm::cl::init(std::numeric_limits<uint64_t>::max())};

Here mlir::Pass::Option inherits from cl::opt: https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Pass/PassOptions.h#L111

In `mlir-opt --help` it shows up as:

  Compiler passes to run
    --pass-pipeline                                     -   A textual description of a pass pipeline to run
    Passes:
      --affine-data-copy-generate                       -   Generate explicit copying for affine memory operations
        --fast-mem-capacity=<ulong>                     - Set fast memory space capacity in KiB (default: unlimited)

Note how "fast-mem-capacity" is nested under "affine-data-copy-generate" (which is the pass name).
On the command line it won't be present at the top-level and you end up invoking it this way:

// RUN: mlir-opt %s -split-input-file -affine-data-copy-generate="generate-dma=false fast-mem-space=0 skip-non-unit-stride-loops" | FileCheck %s

It also has the advantage that you can invoke the same pass twice in the pipeline with a different value for the same cl::opt since the storage is private to a pass instance.
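
A toy version of per-instance option storage, in plain C++ with no MLIR headers, might look like the following. `CopyGeneratePass`, its two options and the parsing rules (space-separated `key=value` items, a bare key meaning true) are assumptions made for this sketch and are not MLIR's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical pass whose options live in the instance, not in globals.
struct CopyGeneratePass {
  std::uint64_t FastMemCapacity =
      std::numeric_limits<std::uint64_t>::max(); // "unlimited" by default
  bool GenerateDma = true;

  // Parses a space-separated list such as
  // "generate-dma=false fast-mem-capacity=16". Returns false on an unknown
  // key. std::stoull throws if a numeric value is malformed.
  bool parseOptions(const std::string &Text) {
    std::istringstream In(Text);
    std::string Item;
    while (In >> Item) {
      std::string::size_type Eq = Item.find('=');
      std::string Key = Item.substr(0, Eq);
      // A bare key (no '=') is treated as a boolean set to true.
      std::string Val = Eq == std::string::npos ? "true" : Item.substr(Eq + 1);
      if (Key == "fast-mem-capacity")
        FastMemCapacity = std::stoull(Val);
      else if (Key == "generate-dma")
        GenerateDma = (Val == "true");
      else
        return false;
    }
    return true;
  }
};
```

Two instances parsed from different option strings keep independent values, which is what allows the same pass to appear twice in one pipeline with different settings.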


 

This can be worked around with linker garbage collection by discarding unreferenced cl::opt.

I don't actually understand how this works: cl::opts that rely on global constructors are never referenced, so as long as the file is linked in they will be registered. Playing with linker semantics here is incredibly clunky: we end up relying on the way files are organized in static archives (and it breaks when you build libLLVM.so, as you mentioned before).

-- 
Mehdi

 

Mehdi AMINI via llvm-dev

Jul 2, 2021, 7:13:52 PM7/2/21
to Fāng-ruì Sòng, llvm-dev
Looks like our email crossed :)

Yes indeed!

I'm not against using OptTable for the sole purpose of binary tools, but that does not seem to provide us with a path forward. So I am afraid that this is a local optimum that is short-sighted.

 
The binary utilities command line parsing issue I intended to address is fairly different:
We want a single registry of all options and don't want to inherit options from llvm/lib/A just because the tool happens to depend on llvm/lib/A directly or indirectly.

I agree, what I pointed at in MLIR was an attempt to achieve this, the tool has to explicitly call a function at runtime to make the options available.
 
E.g. llvm-nm needs to depend on bitcode reader because it needs to handle LLVM bitcode files, however, I don't want a random cl::opt in bitcode reader to appear in llvm-nm's command line option list.

So I just built mlir-opt and inspected its --help output. It has exactly the problem I called out in my first message: 

* We don't need to add cl::cat to every option and call
`HideUnrelatedOptions` to hide unrelated options from --help. The issue
would happen with cl::opt tools if linker garbage collection is disabled or
libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
There are many unrelated options like --amdgpu-bypass-slow-div which is from llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

I think you're missing the point here: mlir-opt does not have this problem for mlir components. It pays the price of the LLVM components using global constructors, but I pointed to the solution we use for mlir components that aren't using global constructors and avoid this problem.


 

Fāng-ruì Sòng via llvm-dev

Jul 2, 2021, 7:49:47 PM7/2/21
to Mehdi AMINI, llvm-dev
[The topic of this subthread has shifted from binary utilities to generic library options]

> Sure, but that isn't the problem you were raising that I was answering
> to, let's not move the goal post here.
>
> How do you solve this "namespacing" issue though? This was the sense of my
> question earlier in this thread: is OptTable providing me a solution to
> build library options that can be re-exported through tools command line
> interface? (bonus point if it manages some sort of namespacing).

No, OptTable options are not composable.

> The intent of cl::opt in libraries is that they can be exposed on the
> command line. It seems to me that this namespacing issue is quite intrinsic
> to the flat command line interface itself.

Yes.

> The way we work around this to avoid collision in mlir is through
> convention, making sure passes are prefixed according to their library (or
> dialects in MLIR).

Yes. llvm-readobj has a similar pattern: calling a function which
defines cl::opt local variables.


OptTable is for tools like clang, flang, lld where

* they want high configurability, e.g. -long-option for some options while --long-option for others.
* they don't want library cl::opt options.[1]
* they want to avoid cl::opt's loose parsing behavior (-t=d and -demangle=0 are two examples I raised; hey, -enable-new-pm=0 :) It is convenient internally, but externally we'd better use -no- instead)

[1]: As you may have implied, this
is alternatively solvable by reorganizing llvm/lib/* options. This is
a huge task and the solution isn't particularly clear yet.
--riscv-no-aliases is an example of a cl::opt in library code affecting
multiple tools: llc/llvm-mc/llvm-objdump (llvm-mc/llvm-objdump usage was
unintentional). Someone signing up for the work needs to be careful about
letting utilities call the relevant register functions.

My proposal is like: binary utilities should move in the direction of
clang/lld as well.

> > This can be worked around with linker garbage collection by discarding
> > unreferenced cl::opt.
> >
>

> I don't understand how this works actually: cl::opt that rely on global
> constructors aren't referenced ever, as long as the file is linked in they
> will be involved. This is an incredibly clunky situation to play with
> linker semantics here, we end relying on the way files are organized in
> static archives (and it breaks when you build libLLVM.so as you mentioned
> before).

Yep, the reliance on linker garbage collection makes hiding unrelated options fragile.

>>>>>>>>> *[Alexandre Ganea]* I think one other issue with cl::opt is that
>>>>>>>>> it aggregates the “command-line argument definition” and the “runtime
>>>>>>>>> parameter” *de facto* in a single object (unless cl::location is

>>>>>>> To me *these points are all usability issues of cl::opt*. I care
>>>>>>> about not exposing unnecessary interfaces so cl::opt accepting the weird
>>>>>>> -t=d looks like a downside to me.
>>>>>>>
>>>>>>> --demangle=0 is weird and some llvm/llvm/test tests do use cl::opt
>>>>>>> options this way, so we cannot just remove this usage. As a workaround, we
>>>>>>> could add a cl::foobar_toggle to a cl::opt to disallow =0.
>>>>>>> We would end with more customization for one option, cl::cat (for
>>>>>>> hiding unrelated options), cl::foobar_toggle (for disallowing =0), and
>>>>>>> potentially others for other ad-hoc tasks.
>>>>>>>

>>>>>>> I can highlight another thing about the global state of cl::opt => *library
>>>>>>> cl::opt and binary utility cl::opt share the same namespace*.

>>>
>>> This can be worked around with linker garbage collection by discarding
>>> unreferenced cl::opt.
>>>
>>
>> I re-read these messages. I think you probably meant something more
>> generic - how to design decentralized command line option registry where
>> every library can register some options.
>>
>
>Yes indeed!
>
>I'm not against using OptTable for the sole purpose of binary tools, but
>that does not seem to provide us with a path forward. So I am afraid that
>this is a local optimum that is short-sighted.

As mentioned previously, I am not trying to alter current cl::opt usage in
library code and non-binary-utility tools (e.g. opt, llc).

Though I am not sure I agree that clang/lld/llvm-symbolizer/llvm-objdump
are at a local optimum. I think cl::opt just doesn't fit their use cases.
OptTable is simply the appropriate solution for them.

Bonus: since options are data with the OptTable approach, we can re-use
something like clang-tblgen -gen-opt-docs to help ensure the documentation
doesn't diverge from reality.

>> The binary utilities command line parsing issue I intended to address is
>> fairly different:
>> We want a single registry of all options and don't want to inherit options
>> from llvm/lib/A just because the tool happens to depend on llvm/lib/A
>> directly or indirectly.
>>
>
>I agree, what I pointed at in MLIR was an attempt to achieve this, the tool
>has to explicitly call a function at runtime to make the options available.
>
>
>> E.g. llvm-nm needs to depend on bitcode reader because it needs to handle
>> LLVM bitcode files, however, I don't want a random cl::opt in bitcode
>> reader to appear in llvm-nm's command line option list.
>>
>> So I just built mlir-opt and inspected its --help output. It has exactly
>> the problem I called out in my first message:
>>
>
>> * We don't need to add cl::cat to every option and call
>> `HideUnrelatedOptions` to hide unrelated options from --help. The issue
>> would happen with cl::opt tools if linker garbage collection is disabled or
>> libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
>>
>> There are many unrelated options like --amdgpu-bypass-slow-div which is from llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
>>
>>

>I think you're missing the point here: mlir-opt does not have this problem *for
>mlir components.* It pays the price of the LLVM components using global
>constructors, but I pointed to the solution we use for mlir components that
>aren't using global constructors and avoid this problem.

See [1] above

Mehdi AMINI via llvm-dev

Jul 2, 2021, 8:22:34 PM7/2/21
to Fāng-ruì Sòng, llvm-dev

I think part of the confusion on my side in this thread is that when I read "binary utilities" I thought first and foremost about `opt` and `lld`, while you're using "binary utilities" to refer to what I would call "end-user tools".
I agree with you that tools like clang and lld are in a different category than `opt`.

cl::opt may not be suitable as-is, but OptTable, being not composable and not offering any facility for someone building a tool to re-expose library options, is also quite limited. It seems to me that we need such a solution, and it also seems that if we had such a solution it would be suitable for all the tools we intend to build using LLVM-based libraries.
Without any plan to build this though, I'm not trying to block progress on your cleanup/improvement of these end-user tools :)

Cheers,

-- 
Mehdi

Philip Reames via llvm-dev

Jul 5, 2021, 12:43:38 PM7/5/21
to Fāng-ruì Sòng, llvm-dev


On 7/2/21 10:14 AM, Fāng-ruì Sòng via llvm-dev wrote:
llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).
Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.

* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330

llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.

The switch can fix some corners with lib/Support/CommandLine.cpp. Here is a summary:

* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.

To me, removing these would make the interface *worse*.  This is purely subjective, but I use the second item regularly when locally debugging to swap back and forth between two modes easily. 

* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (`if (NoDemangle.getPosition() > Demangle.getPosition())`, see llvm-nm.cpp)
* Grouped short options can be enabled with one line, `setGroupedShortOptions`, instead of adding cl::Grouping to every short option.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
* If we decide to support binary utility multiplexing (https://reviews.llvm.org/D104686), we will not get conflicting options. An option may have different meanings in different utilities (especially for one-letter options).
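
To make the first bullet concrete, here is a minimal sketch of "last occurrence wins" handling for a `--demangle` / `--no-demangle` pair. It is plain C++ rather than LLVM code: `resolveFlag` is a hypothetical helper invented for this illustration. As far as I know, OptTable-based tools get the same effect from a single `ArgList::hasFlag(Pos, Neg, Default)` query, which is why no position comparison is needed there.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): resolves a boolean option that can
// be spelled --<name> or --no-<name>. The last occurrence on the command
// line wins; Default is returned when neither spelling is present.
bool resolveFlag(const std::vector<std::string> &Args,
                 const std::string &Name, bool Default) {
  assert(!Name.empty() && "option name must not be empty");
  const std::string Pos = "--" + Name;
  const std::string Neg = "--no-" + Name;
  bool Value = Default;
  for (const std::string &A : Args) {
    if (A == Pos)
      Value = true;
    else if (A == Neg)
      Value = false;
  }
  return Value;
}
```

With this shape, `--demangle --no-demangle` resolves to false and `--no-demangle --demangle` resolves to true, without the tool tracking option positions itself.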

I expect that most users will not observe any difference.

There is a related topic whether we should disallow the single-dash `-long-option` form.
(Discussed in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html Accept --long-option but not -long-option for llvm binary utilities)
I'd like to disallow -long-option but may want to do this in a separate change.
The main points are that (1) grouped short options have a syntax conflict with one-dash long options, and (2) the GNU getopt_long style two-dash long option is much more popular.

I can think of potential pushback for some Mach-O specific options, e.g. nm -arch
http://www.manpagez.com/man/1/nm/osx-10.12.6.php says `-arch` has one dash.
If such options may have problems, we can keep supporting one dash forms.
With OptTable, allowing one-dash forms for a specific option is easy.
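
The syntax conflict in point (1) can be shown with a small sketch. This is plain C++ and not how OptTable is implemented: `isAmbiguous` and the option sets passed to it are assumptions made up for the example. An argument is ambiguous when it names a known long option and, at the same time, every one of its letters is a known short option.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical check (not LLVM code): returns true if a single-dash argument
// such as "-arch" can be read both as the long option "arch" and as a group
// of short options (-a -r -c -h).
bool isAmbiguous(const std::string &Arg, const std::set<char> &ShortOpts,
                 const std::set<std::string> &LongOpts) {
  assert(Arg.size() >= 2 && Arg[0] == '-' && Arg[1] != '-' &&
         "expects a single-dash argument such as -arch");
  const std::string Body = Arg.substr(1);
  const bool IsLong = LongOpts.count(Body) != 0;
  bool IsGroup = true;
  for (char C : Body)
    IsGroup = IsGroup && ShortOpts.count(C) != 0;
  return IsLong && IsGroup;
}
```

Requiring two dashes for long options removes the first reading entirely, so a parser never has to pick between the two.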

Fāng-ruì Sòng via llvm-dev

Jul 5, 2021, 1:14:29 PM7/5/21
to Philip Reames, llvm-dev
On Mon, Jul 5, 2021 at 9:43 AM Philip Reames <list...@philipreames.com> wrote:


To me, removing these would make the interface *worse*.  This is purely subjective, but I use the second item regularly when locally debugging to swap back and forth between two modes easily

See Mehdi's message: "I think part of the confusion on my side in this thread is that when I read "binary utilities" I thought first and foremost about `opt` and `lld`, while you're using "binary utilities" to refer to what I would call "end-user tools". I agree with you that tools like clang and lld are in a different category than `opt`."

The proposal is for llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}. The options mostly follow GNU, with a few LLVM extensions. There are very few options which default to true and may be toggled to false by users. When they have toggles, there are `--no-*` options.

It's not like opt or llc where you need something like -enable-new-pm=0 or -enable-lto-internalization=0.


--
宋方睿

Philip Reames via llvm-dev

Jul 5, 2021, 1:54:37 PM7/5/21
to Fāng-ruì Sòng, llvm-dev


On 7/5/21 10:14 AM, Fāng-ruì Sòng wrote:
See Mehdi's message: "I think part of the confusion on my side in this thread is that when I read "binary utilities" I thought first and foremost about `opt` and `lld`, while you're using "binary utilities" to refer to what I would call "end-user tools". I agree with you that tools like clang and lld are in a different category than `opt`."

The proposal is for llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}. The options mostly follow GNU, with a few LLVM extensions. There are really few options which default to true and may be toggled by users to false. When they have toggles, there are `--no-*` options.

It's not like opt or llc where you need something like -enable-new-pm=0 or -enable-lto-internalization=0

You're right, I had missed that, and it completely resolves my concern.  Sorry for the noise.

Reid Kleckner via llvm-dev

Jul 7, 2021, 3:45:12 PM7/7/21
to Mehdi AMINI, llvm-dev
On Fri, Jul 2, 2021 at 5:22 PM Mehdi AMINI via llvm-dev <llvm...@lists.llvm.org> wrote:
I think part of the confusion on my side in this thread is that when I read "binary utilities" I thought first and foremost about `opt` and `lld`, while you're using "binary utilities" to refer to what I would call "end-user tools".
I agree with you that tools like clang and lld are in a different category than `opt`.

cl::opt may not be suitable as-is, but OptTable, being not composable and not offering any facility for someone building a tool to re-expose library options, is also quite limited. It seems to me that we need such a solution, and it also seems that if we had such a solution it would be suitable for all the tools we intend to build using LLVM-based libraries.
Without any plan to build this though, I'm not trying to block progress on your cleanup/improvement of these end-user tools :)
 
I wanted to express my agreement:
  • OptTable is reasonable when attempting to implement a closed, compatible command line interface
  • We should not view cl::opt's library registration design as somehow "evil": There is much room for improvement
So, yeah, for all our GNU-compatible tools, go ahead and use the Option library.

I think cl::opt has a future in LLVM, but that's worth its own discussion thread. The MLIR conventions help a lot, but they still appear to use a global command line registry, which isn't ideal for some use cases.

Mehdi AMINI via llvm-dev

Jul 14, 2021, 1:44:00 AM7/14/21
to Reid Kleckner, llvm-dev
Indeed, getting rid of the global state would be ideal, but that also seems much more challenging from an API design point of view: do you have ideas on this?

In the meantime, to improve the situation with libSupport, I just wrote https://reviews.llvm.org/D105959
(we're reusing libSupport in various situations and get hit by the eager registration of cl::opt and other globals in some cases).

-- 
Mehdi



 

Chris Bieneman via llvm-dev

Sep 16, 2021, 6:40:18 PM9/16/21
to llvm-dev
Hi all,

Apologies for reviving a long-aged thread here. I hadn't followed this thread when it came up, and independently started playing with the same basic idea. I have a prototype implementation on my GitHub[1], which creates an llvm-driver that can execute clang, dsymutil, llvm-ar, llvm-cxxfilt, llvm-dwarfdump, and llvm-objcopy.

The pattern can be applied fairly simply to additional tools (with one _huge_ caveat that I'll go into below). For most tools to be built into the multicall binary, the only required changes are adding the `GENERATE_DRIVER` option to the `add_llvm_tool` CMake call, and changing the `main` function to be prefixed by the tool's name (e.g. llvm_objcopy_main, clang_main, etc.).

As an example, the full diffs for llvm-objcopy are:

```
diff --git a/llvm/tools/llvm-objcopy/CMakeLists.txt b/llvm/tools/llvm-objcopy/CMakeLists.txt
index d14d2135f5db..644dec79bc50 100644
--- a/llvm/tools/llvm-objcopy/CMakeLists.txt
+++ b/llvm/tools/llvm-objcopy/CMakeLists.txt
@@ -43,6 +43,7 @@ add_llvm_tool(llvm-objcopy
   ObjcopyOptsTableGen
   InstallNameToolOptsTableGen
   StripOptsTableGen
+  GENERATE_DRIVER
   )
 
 add_llvm_tool_symlink(llvm-install-name-tool llvm-objcopy)
diff --git a/llvm/tools/llvm-objcopy/llvm-objcopy.cpp b/llvm/tools/llvm-objcopy/llvm-objcopy.cpp
index ad166487eb78..bd5556f225b2 100644
--- a/llvm/tools/llvm-objcopy/llvm-objcopy.cpp
+++ b/llvm/tools/llvm-objcopy/llvm-objcopy.cpp
@@ -401,7 +401,7 @@ static Error executeObjcopy(ConfigManager &ConfigMgr) {
   return Error::success();
 }
 
-int main(int argc, char **argv) {
+int llvm_objcopy_main(int argc, char **argv) {
   InitLLVM X(argc, argv);
   ToolName = argv[0];
```

With some clever CMake goop, any tool that opts into being part of the merged driver gets a generated template main function, and much of the other boilerplate code is generated too. As implemented, the tools all get built into their own tools _and_ the llvm-driver tool. If this is a desirable route, part of making this patch "real" would be adding an option to disable building the standalone tool and instead generate a symlink from the tool name to llvm-driver.
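
For readers who want a picture of the dispatch step, here is a minimal sketch in plain C++. It is not the code from the prototype: `driverMain`, `toolNameFromPath`, the tool table, and the stub entry points with their placeholder return values are all assumptions made up for this illustration. It handles both invocation styles discussed in the thread, a symlink named after the tool and a subcommand passed as the first argument.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-tool entry points. In a real build these would be the
// renamed main functions (e.g. llvm_objcopy_main); the return values here
// are placeholders so the dispatch can be observed.
int nm_main(int, char **) { return 10; }
int objcopy_main(int, char **) { return 20; }

using ToolMain = int (*)(int, char **);

// Strip any directory prefix and an optional "llvm-" prefix from argv[0].
std::string toolNameFromPath(const std::string &Path) {
  std::string Name = Path.substr(Path.find_last_of("/\\") + 1);
  const std::string Prefix = "llvm-";
  if (Name.compare(0, Prefix.size(), Prefix) == 0)
    Name.erase(0, Prefix.size());
  return Name;
}

// Dispatch on argv[0] (symlink case) or on argv[1] (subcommand case).
int driverMain(int Argc, char **Argv) {
  assert(Argc >= 1 && "argv[0] is required");
  static const std::map<std::string, ToolMain> Tools = {
      {"nm", nm_main}, {"objcopy", objcopy_main}};
  auto It = Tools.find(toolNameFromPath(Argv[0]));
  if (It != Tools.end())
    return It->second(Argc, Argv);
  if (Argc > 1) {
    It = Tools.find(Argv[1]);
    if (It != Tools.end())
      return It->second(Argc - 1, Argv + 1); // Tool sees its name as argv[0].
  }
  return 1; // Unknown tool.
}
```

A real `main` would simply forward to `driverMain(argc, argv)`. Shifting `argv` in the subcommand case keeps each tool's own argument handling unchanged.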

This implementation does require CMake 3.12 or later, since CMake 3.12 allows linkage dependencies for object libraries, which the implementation depends on.

The _huge_ caveat is that cl::opt haunts all things I do in LLVM. I tried adding clang-tidy to the tools, and it will build fine, but crashes on launch because of duplicate command line options being registered (d'oh!). cl::opt's continued reliance on globals means that it is ill-suited for the construction of a mega-llvm-driver.

This is something that has come up time and time again in many different contexts, but we've never really had the community effort behind resolving it.

Using OptTable has been suggested, but one of the common complaints is that OptTable for tool options is unwieldy and overly complicated for small simple tools. Also, there isn't a good way to handle options that are buried inside libraries, and many of the cl::opt options are.

Many years ago I initiated lengthy discussions on llvm-dev and at llvm socials about an alternate approach [2], but it was a half measure at best. One of the challenges that it didn't solve was the need for registering command line options for all the debugging values buried in the passes. I don't mean to derail this effort by sending us all down a rabbit hole, but I also think that for a real tractable solution to an llvm Busybox/multicall binary solution, we really need to do something about cl::opt.

-Chris

Fāng-ruì Sòng via llvm-dev

Sep 16, 2021, 7:14:26 PM9/16/21
to Chris Bieneman, llvm-dev

Sadly agree :(

> Using OptTable has been suggested, but one of the common complaints is that OptTable for tool options is unwieldy and overly complicated for small simple tools. Also, there isn't a good way to handle options that are buried inside libraries, and many of the cl::opt options are.

Short options were a big problem of these user-facing binary utilities.
On the good side, most binary utilities are ready for the
crunchgen/busybox/multiplexer proposal now.
I have switched:

* llvm-readobj (https://reviews.llvm.org/D105532)
* llvm-size (https://reviews.llvm.org/D105598)
* llvm-cxxfilt (https://reviews.llvm.org/D105605)
* llvm-nm (https://reviews.llvm.org/D105330)
* llvm-strings (https://reviews.llvm.org/D104889)
* llvm-symbolizer (in the previous release)

The descriptions say more on why OptTable is better than cl::opt for
user-facing options.

> Many years ago I initiated lengthy discussions on llvm-dev and at llvm socials about an alternate approach [2], but it was a half measure at best. One of the challenges that it didn't solve was the need for registering command line options for all the debugging values buried in the passes. I don't mean to derail this effort by sending us all down a rabbit hole, but I also think that for a real tractable solution to an llvm Busybox/multicall binary solution, we really need to do something about cl::opt.

In mlir, mlir/lib/Support/Timing.cpp uses this style

namespace {
struct DefaultTimingManagerOptions {
  llvm::cl::opt<bool> timing{"mlir-timing",
                             llvm::cl::desc("Display execution times"),
                             llvm::cl::init(false)};
  llvm::cl::opt<DisplayMode> displayMode{
      "mlir-timing-display", llvm::cl::desc("Display method for timing data"),
      llvm::cl::init(DisplayMode::Tree),
      llvm::cl::values(
          clEnumValN(DisplayMode::List, "list",
                     "display the results in a list sorted by total time"),
          clEnumValN(DisplayMode::Tree, "tree",
                     "display the results ina with a nested tree view"))};
};
} // end anonymous namespace

static llvm::ManagedStatic<DefaultTimingManagerOptions> options;

void mlir::registerDefaultTimingManagerCLOptions() {
  // Make sure that the options struct has been constructed.
  *options;
}

Still cl::opt, but if register*CLOptions functions are well
controlled, the global option name space problem should be fine.
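
The idea of keeping registration under the tool's control can be sketched without any LLVM dependency. Everything below is hypothetical: `OptionRegistry` is a stand-in and not the cl::opt API, and the bitcode option name is invented. The point is only that an option exists in a tool when that tool calls the matching register function, and that a duplicate name is reported to the caller instead of crashing at startup.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a command line registry (not the cl::opt API).
class OptionRegistry {
  std::set<std::string> Names;

public:
  // Returns false if an option with this name is already registered.
  bool add(const std::string &Name) {
    assert(!Name.empty() && "option name must not be empty");
    return Names.insert(Name).second;
  }
  bool has(const std::string &Name) const { return Names.count(Name) != 0; }
};

// Each library exposes an explicit registration function instead of
// registering its options from a global constructor.
bool registerTimingOptions(OptionRegistry &R) {
  return R.add("mlir-timing") && R.add("mlir-timing-display");
}

// Invented option name, standing in for "a random cl::opt in the bitcode
// reader" that a tool like llvm-nm would rather not inherit.
bool registerBitcodeReaderOptions(OptionRegistry &R) {
  return R.add("hypothetical-bitcode-option");
}
```

A tool that links the bitcode reader but never calls `registerBitcodeReaderOptions` does not expose that option. Passing the registry in explicitly, rather than using a global, is a design choice of this sketch; the MLIR code above still registers into the global cl::opt state.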

Leonard Chan via llvm-dev

Sep 16, 2021, 7:19:27 PM9/16/21
to Chris Bieneman, llvm-dev
Thanks for sharing your prototype! Glad to see that other people are on board with this idea. For an incremental approach, it seems that Fangrui has migrated many llvm tools to use OptTable, so it shouldn't be a blocker for those at least. Do you also happen to be landing any of your code sometime soon? We have an intern who will be picking up this work and we should probably coordinate to make sure no work is duplicated. 


Chris Bieneman via llvm-dev

Sep 16, 2021, 8:09:05 PM9/16/21
to Leonard Chan, llvm-dev
I have a few bits of cleanup and fleshing out that I’d like to do for the llvm-driver tool in my patch. I can work on that tonight and tomorrow, and probably post a patch for review by Monday.

My general approach to this would be to add llvm-driver as a target that is excluded from the `all` target (CMake's `EXCLUDE_FROM_ALL`) but is always configured in the build.

Subsequent patches would add support for making it replace the tool builds with symlinks, and ensuring compatibility with important build system functionality like `LLVM_DISTRIBUTION_COMPONENTS`.

I’ll start working on the patch cleanup, and if that approach sounds reasonable we can move on from there.

-Chris




Chris Bieneman via llvm-dev

Sep 17, 2021, 12:38:02 PM9/17/21
to Chris Bieneman, llvm-dev
I've posted my prototype for review here:


Feedback is greatly appreciated :)

-Chris