I think it's an interesting idea. My main concern is that adding a new CMake
option for this is going to complicate the build system and make future CMake
improvements more difficult.
Do you have any idea of how much performance /
toolchain size gains you will get from this approach?
-Tom
> - Are there existing tools in LLVM that already do this?
> - Other implementation details/global states that we would also need to account for?
>
> - Leonard
>
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
The dynamic relocation claim is not true.
A thin executable using just -Bsymbolic libLLVM-13git.so is almost
identical to a mostly statically linked PIE.
I added -Bsymbolic-functions to libLLVM.so and libclang-cpp.so which
has claimed most of the -Bsymbolic benefits.
The shared object approach *can be* inferior to static linking plus
-Wl,--gc-sections because with libLLVM.so and libclang-cpp.so we are
making a great many APIs dynamic, and that inhibits the --gc-sections
benefits. However, if clang and lld are shipped together with
llvm-objdump/llvm-readobj/llvm-objcopy/..., I expect the non-GCable
code due to shared objects will be significantly smaller.
I am conservative on adding yet another mechanism.
crunchgen. As you said, argv[0] checking code needs to be taken care of.
We should make these executables' main file not have colliding symbols.
I have cleaned up a lot of files.
A few points.
In an ideal ELF world only external function calls need PLT entries.
Currently shared objects have PLT entries for in-dso function calls
because default visibility non-local symbols are preemptible by default
and the linker will produce PLT entries. -Bsymbolic-functions suppresses
PLT entries for in-dso symbols.
---
I have an approach for users whose libLLVM.so and libclang-cpp.so are closed
sets and who want to get GC benefits. I'll use libLLVM.so as an example.
* Identify the list of executables which link against libLLVM.so: exe0, exe1, exe2.
* For each exe, do a relocatable link of its own code (usually llvm/tools/llvm-foobar/*.o). Get the undefined symbol list.
* Take the union of the undefined symbol lists of all executables. Create a version script file with these symbols under `global:` and everything else under `local: *`.
* Re-link libLLVM.so with --version-script.
The resulting libLLVM.so only provides dynamic symbols needed by these executables.
This is still tricky and I am not sure how much it can decrease the size.
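The version-script step above can be sketched roughly as follows. This is only an illustration: the symbol names are placeholders, the per-executable lists are assumed to have been collected already (for example from the relocatable links), and `build_version_script` is a hypothetical helper, not part of any LLVM tooling.

```python
def build_version_script(undefined_by_exe):
    """Render a linker version script exporting the union of all symbol lists."""
    # Union of the undefined symbols across every executable, de-duplicated.
    roots = sorted(set().union(*undefined_by_exe.values()))
    lines = ["{", "  global:"]
    lines += [f"    {sym};" for sym in roots]
    # Everything not listed above becomes local to the shared object.
    lines += ["  local:", "    *;", "};"]
    return "\n".join(lines) + "\n"


# Hypothetical per-executable undefined-symbol lists (placeholder names).
undefined = {
    "exe0": ["sym_a", "sym_b"],
    "exe1": ["sym_b", "sym_c"],
    "exe2": ["sym_a"],
}
script = build_version_script(undefined)
print(script)
```

The resulting text would be written to a file and passed to the re-link with --version-script.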
Do you have a plan for Windows? Symlinks on Windows are mostly limited to administrators and developer mode.
For pure compatibility purposes, in place of symlinks we could have facade executables on Windows. But that isn’t favorable in terms of performance: the cost of launching additional executables is quite high on Windows. I wonder if the LLVM installer could have a way to switch between both schemes: if admin mode is available, create symlinks; otherwise fall back to facades.
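A minimal sketch of the "symlink if possible, otherwise fall back" idea, assuming the alias lives in the same directory as the multiplexed binary. `install_tool_alias` is a hypothetical helper, and a plain copy stands in for the facade executable here; a real facade would be a small launcher rather than a copy.

```python
import os
import shutil
import tempfile


def install_tool_alias(multiplexer, alias):
    """Create `alias` next to `multiplexer`; return which scheme was used."""
    try:
        # Relative target: assumes both files live in the same directory.
        os.symlink(os.path.basename(multiplexer), alias)
        return "symlink"
    except (OSError, NotImplementedError):
        # No symlink privilege: a copy stands in for the facade executable.
        shutil.copy2(multiplexer, alias)
        return "copy"


with tempfile.TemporaryDirectory() as bindir:
    main_binary = os.path.join(bindir, "llvm")
    with open(main_binary, "w") as f:
        f.write("placeholder")
    scheme = install_tool_alias(main_binary, os.path.join(bindir, "llvm-nm"))
    print(scheme)
```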
From: llvm-dev <llvm-dev...@lists.llvm.org>
On Behalf Of Ben Craig via llvm-dev
Sent: June 21, 2021 5:19 PM
To: llvm...@lists.llvm.org
Subject: Re: [llvm-dev] [RFC] LLVM Busybox Proposal
Hello Leonard,
That is a very interesting idea! This will particularly favor Windows where the LLVM bin/ folder is huge (3.5 GiB) since we don’t have working symlinks out-of-box. This is also going towards the direction that we are pursuing, having Clang and LLD together into an embedded application as suggested by llvm-buildozer [1], however we’re also considering the multi-threading aspect. We took a different route for now, which is loading the existing executables as shared libraries inside our application, but our concern was less the binary size on disk, and more about runtime performance (building time).
Regarding migrating every option to `OptTable`, are you suggesting removing `cl::opt` and `CommandLineParser` altogether? I can count 3,597 instances of `cl::opt` in the whole monorepo. This can be a tedious task even with automation, since it would need some level of classification into the appropriate .td file. What would be the approach for the migration? To alleviate the issue of having `cl::opt`s cross the tool domain, we could temporarily auto-generate a dictionary of `cl::opt`s available for each tool? That could be a quick intermediary step, while waiting for a complete migration.
One other issue I can see is symbols clashing at link time. Having everything in the same executable requires internal ABI compatibility throughout, i.e. compiling with the same #defines and linking with the same (system) libraries. I’m wondering if there was an analysis done in that regard? But maybe that is not an issue.
Best,
Alex.
[1] https://reviews.llvm.org/D86351
From: llvm-dev <llvm-dev...@lists.llvm.org>
On Behalf Of Leonard Chan via llvm-dev
Sent: June 21, 2021 1:55 PM
To: llvm-dev <llvm...@lists.llvm.org>
Subject: [llvm-dev] [RFC] LLVM Busybox Proposal
-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on -DLLVM_TARGETS_TO_BUILD=X86 (custom1)
vs
-DLLVM_TARGETS_TO_BUILD=X86 (custom2)
# This is the lower bound for any multiplexing approach. clang is the largest executable.
% stat -c %s /tmp/out/custom2/bin/clang-13
102900408
I have built clang, lld and a bunch of ELF binary utilities.
% stat -c %s /tmp/out/custom1/lib/libLLVM-13git.so /tmp/out/custom1/lib/libclang-cpp.so.13git /tmp/out/custom1/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'
138896544
% stat -c %s /tmp/out/custom2/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'
209054440
The -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on build is doing a really good job.
A multiplexing approach can squeeze some bytes from 138896544 toward 102900408,
but how much can it do?
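To put the three measurements above side by side, here is the arithmetic spelled out. The numbers are the byte counts quoted above; the variable names are only labels for this sketch.

```python
clang_static = 102900408   # custom2: clang-13 alone (lower bound)
dylib_total = 138896544    # custom1: shared objects plus thin executables
static_total = 209054440   # custom2: all listed executables, statically linked

# Bytes already saved by the DYLIB build relative to static linking.
saved_by_dylib = static_total - dylib_total
# Upper bound on what a multiplexing approach could still save on top of that.
headroom = dylib_total - clang_static

print(saved_by_dylib, f"{saved_by_dylib / static_total:.1%}")
print(headroom, f"{headroom / dylib_total:.1%}")
```

So the DYLIB build already removes 70157896 bytes (about a third of the static total), and multiplexing could remove at most another 35996136 bytes (about a quarter of the DYLIB total).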
>- I'm starting to think the `cl::opt` to `OptTable` issue might be
>orthogonal to the busybox implementation. The tool essentially dispatches
>to different "main" functions in different tools, but as long as we don't
>do anything within busybox after exiting that tool's main, then the global
>state issues we weren't sure of with `cl::opt` might not be of any concern
>now. It may be an issue down the line if, let's say, the tool flags moved
>from being "owned" by the tools themselves to instead being "owned" by
>busybox, and then we'd have to merge similarly-named flags together. In
>that case, migrating these tools to use `OptTable` may be necessary since
>(I think) `OptTable` should handle this. This may be a tedious task, but
>this is just to say that busybox won't need to be immediately blocked on it.
Such improvement is useful even if we don't do multiplexing.
I switched llvm-symbolizer. thakis switched llvm-objdump.
I can look at some binary utilities.
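The dispatching described in the quoted paragraph can be sketched like this. It is an illustration only: the tool table and the stand-in "main" functions are hypothetical, and the fallback to the first argument is an assumption borrowed from how busybox-style binaries are usually invoked.

```python
import os


def nm_main(args):
    return f"llvm-nm ran with {args}"


def objdump_main(args):
    return f"llvm-objdump ran with {args}"


# Hypothetical table mapping tool names to their entry points.
TOOLS = {"llvm-nm": nm_main, "llvm-objdump": objdump_main}


def dispatch(argv):
    """Pick a tool from argv[0], else from the first argument."""
    name = os.path.basename(argv[0])
    if name.lower().endswith(".exe"):
        name = name[:-4]
    if name in TOOLS:
        return TOOLS[name](argv[1:])
    if len(argv) > 1 and argv[1] in TOOLS:
        return TOOLS[argv[1]](argv[2:])
    raise SystemExit(f"unknown tool: {name}")


print(dispatch(["/usr/bin/llvm-nm", "-g", "a.o"]))
print(dispatch(["/usr/bin/llvm", "llvm-objdump", "-d", "a.o"]))
```

Since the dispatcher does nothing after the selected main returns, each tool still sees only its own options.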
From our perspective as a toolchain vendor, even if using shared libraries could get us closer to static linking in terms of performance, we'd still prefer static linking for the ease of distribution. Dealing with a single statically linked executable is much easier than dealing with multiple shared libraries. This is especially important in distributed compilation environments like Goma.
I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.
For non-clang executables, -DLLVM_LINK_LLVM_DYLIB=on just adds one
more DT_NEEDED.
The DT_NEEDED entry can use a $ORIGIN based DT_RUNPATH. Can Goma
detect the libraries shipped with the tools?
I asked because I feel this could be an artificial limitation which
could be straightforwardly addressed in Goma.
A toolchain executable using an accompanying shared object is not rare
(thinking of plugins).
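As a rough sketch of what "detecting the libraries shipped with the tools" could look like: expand $ORIGIN in the DT_RUNPATH relative to the executable and list where each DT_NEEDED entry would be looked up. This is not how Goma works; the helper names, the install prefix and the runpath value are assumptions for illustration, and only the plain `$ORIGIN` spelling is handled.

```python
import os


def resolve_runpath(exe_path, runpath):
    """Expand $ORIGIN in a colon-separated DT_RUNPATH for a given executable."""
    origin = os.path.dirname(os.path.abspath(exe_path))
    return [
        os.path.normpath(entry.replace("$ORIGIN", origin))
        for entry in runpath.split(":")
        if entry
    ]


def candidate_paths(exe_path, runpath, needed):
    """List where each DT_NEEDED library would be searched via the runpath."""
    dirs = resolve_runpath(exe_path, runpath)
    return [os.path.join(d, lib) for lib in needed for d in dirs]


# Hypothetical install prefix; the library name is the one from this thread.
paths = candidate_paths(
    "/opt/llvm/bin/llvm-nm", "$ORIGIN/../lib", ["libLLVM-13git.so"]
)
print(paths)
```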
Multiplexing LLVM tools is one alternative but I am a bit concerned
with the extra complexity and the new configuration the build system
needs to support.
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
mentioned another approach which doesn't require intrusive
modification to the tools.
As for PGO+LTO, you can apply them to libLLVM-13git.so as well.
--
宋方睿
Is that a problem? Installers generally run with administrator rights
(choco, for example, requires running from an Administrator PowerShell
and that's how most folks I know install LLVM on Windows).
Developers generally need to enable developer mode if they want to run
things that they've built (and doing so is a single toggle switch in
Settings, so it's not a massive obstacle). It should be fairly easy to
try running mklink during CMake if this option is enabled and, if it
fails, error out and tell the person running the build to either enable
developer mode or switch to separate-program builds.
David
I agree that the official installation case probably isn't an issue.
There are unofficial installation cases that are more annoying. I wouldn't be able to just zip up my llvm dir and hand it to someone else to unzip like I can today.
The just-built case is a bigger deal. I do most of my development on Windows from a standard account (non-admin, non-developer). That's largely by choice, but some IT departments are much more picky. If I need to install something, then I open a distinct admin command prompt.
Requiring development mode to be turned on for LLVM dev is similar to requiring Linux devs to build as root (or at least making a few new programs setuid root).
On Tue, Jun 22, 2021 at 10:20 PM Petr Hosek <pho...@google.com> wrote:
>
> I guess this depends on a particular implementation of the distributed build system. In the case of Goma, we only supply the compiler binary which was invoked as the command (that binary links glibc as a shared library but we assume that one is supplied by the host system), all other files like headers are passed together with the compiler invocation as inputs. If we used dynamic linking, Goma would need to figure out what other shared libraries need to be sent to the server. It's certainly doable but it's an extra complexity we would like to avoid.
As for PGO+LTO, you can apply them to libLLVM-13git.so as well.
If PGO+LTO is desired, the executables can be split this way, assuming
the performance of
llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}
doesn't matter.
* clang (libLLVM*.a)
* lld + llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}
(libLLVM-13git.so)
> LTO also benefits from "internalizing", basically building a static binary where only `main` is exported and everything else becomes an internal linkage is the best case: pointer escaping, global analysis, etc all become more powerful. Optimizing a shared library kind of makes every symbol public, and I suspect the busybox approach may be better on this aspect (you get back to a single public main, but it can reach much more code though).
With --version-script we can internalize shared object symbols as
well. For example, this has been used to facilitate whole-program
devirtualization (https://reviews.llvm.org/D98686).
With https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
we can get a list of roots which need to be exported.
A thin executable plus a -fvisibility-inlines-hidden +
-Bsymbolic-functions shared object is almost identical to a PIE.
llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).
Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.
* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330
llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.
The switch can fix some corner cases with lib/Support/CommandLine.cpp. Here is a summary:
* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (if (NoDemangle.getPosition() > Demangle.getPosition()) , see llvm-nm.cpp)
* Grouped short options can be specified with one line `setGroupedShortOptions`, instead of adding cl::Grouping to every short option.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
* If we decide to support binary utility multiplexing (https://reviews.llvm.org/D104686), we will not get conflicting options. An option may have different meanings in different utilities (especially for one-letter options).
I expect that most users will not observe any difference.
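For the boolean-option item in the summary, the intended behaviour is "the last occurrence wins", with no position comparison in the tool. A minimal sketch of that rule, independent of the real OptTable API (`resolve_flag` is a hypothetical helper):

```python
def resolve_flag(args, positive, negative, default=False):
    """Return the value selected by the last occurrence of either spelling."""
    value = default
    for arg in args:
        if arg == positive:
            value = True
        elif arg == negative:
            value = False
    return value


print(resolve_flag(["--demangle", "--no-demangle"], "--demangle", "--no-demangle"))
print(resolve_flag(["--no-demangle", "--demangle"], "--demangle", "--no-demangle"))
```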
There is a related topic: whether we should disallow the single-dash `-long-option` form.
(Discussed in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html Accept --long-option but not -long-option for llvm binary utilities)
I'd like to disallow -long-option but may want to do this in a separate change.
The main point is that (1) grouped short options have syntax conflict with one-dash long options. (2) the GNU getopt_long style two-dash long option is much more popular.
I can think of potential pushback for some Mach-O specific options, e.g. nm -arch
http://www.manpagez.com/man/1/nm/osx-10.12.6.php says `-arch` has one dash.
If such options may have problems, we can keep supporting one dash forms.
With OptTable, allowing one-dash forms for a specific option is easy.
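To make point (1) concrete, here is a small sketch of why a one-dash long option collides with grouped short options. The option sets are hypothetical and chosen so that the collision shows up; they are not the actual llvm-nm option list.

```python
def readings(arg, short_opts, long_opts):
    """Return every way a one-dash argument could be interpreted."""
    if not arg.startswith("-") or arg.startswith("--"):
        return []
    body = arg[1:]
    found = []
    if body in long_opts:
        found.append(("long", body))
    # Grouped reading: every character is itself a known short option.
    if body and all(ch in short_opts for ch in body):
        found.append(("grouped", list(body)))
    return found


# Hypothetical option sets chosen so that the two syntaxes collide.
SHORT = {"a", "c", "h", "r"}
LONG = {"arch", "demangle"}

print(readings("-arch", SHORT, LONG))
print(readings("-demangle", SHORT, LONG))
```

With these sets, `-arch` has two valid readings while `-demangle` has one, which is why the one-dash form has to be allowed per option rather than globally.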
On Fri, Jul 2, 2021 at 10:15 AM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
> llvm/tools/ include some binary utilities used as replacement for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
> In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).
Would be good to describe some of the known drawbacks/expected benefits.
One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.
> I'd like to disallow -long-option but may want to do this in a separate change.
I'd say definitely do this as a separate change. I expect there'd be a long tail of users after this change ships in an LLVM release, etc, such that we may want to undo some amount of it a long time after the change is made.
On Fri, Jul 2, 2021 at 11:17 AM David Blaikie <dbla...@gmail.com> wrote:
> Would be good to describe some of the known drawbacks/expected benefits.
The summary is the list of benefits :)
The drawback is some initial boilerplate (e.g. llvm-tblgen -gen-opt-parser-defs in CMakeLists.txt, class NmOptTable in code).
The handling of comma-separated options (-arch=x86_64,arm64) doesn't have direct OptTable support. llvm::SplitString is needed (just search for SplitString in https://reviews.llvm.org/D105330).
But this doesn't tend to increase complexity because the cl::list<std::string> will need per-value verification anyway.
> One potential one (though I don't recall it being discussed recently) would be that maybe this addresses the issue of global ctors in cl::opt? Does OptTable avoid/not use global constructors? That would be nice - it's an ongoing issue that LLVM library users pay for command line argument support they have no need for in the form of global ctor execution time.
OptTable is used as a local variable. So yes, it avoids global constructors, thus avoiding cl::opt option name collision. The "If we decide to support binary utility multiplexing" item mentioned this point.
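The comma-separated handling mentioned above amounts to "split, then verify each value". A minimal sketch of that shape, not the code from D105330: the accepted-value set and `parse_arch_list` are hypothetical, and empty items are skipped on the assumption that this matches what a delimiter-based split would do.

```python
KNOWN_ARCHS = {"x86_64", "arm64", "i386"}  # hypothetical accepted values


def parse_arch_list(value):
    """Split a comma-separated value and verify each item."""
    items = [item for item in value.split(",") if item]
    unknown = [item for item in items if item not in KNOWN_ARCHS]
    if unknown:
        raise ValueError(f"unknown architecture(s): {', '.join(unknown)}")
    return items


print(parse_arch_list("x86_64,arm64"))
```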
On Fri, Jul 2, 2021 at 11:27 AM Fāng-ruì Sòng <mas...@google.com> wrote:
> The summary is the list of benefits :)
Ah, it looks more like a list of changes, not necessarily benefits - removing certain syntaxes seems generally like a cost to me (potential to break existing users), rather than an outright benefit.
The API benefits sound nice, though presumably some could be retrofitted to cl::opt if that was the only goal. Side benefits in addition to removing global ctors are nice to have.
> OptTable is used as a local variable. So yes, it avoids global constructors,
Nice :)
> thus avoiding cl::opt option name collision. "If we decide to support binary utility multiplexing" below mentioned this point.
>> I'd say definitely do this as a separate change. I expect there'd be a long tail of users after this change ships in an LLVM release, etc, such that we may want to undo some amount of it a long time after the change is made.
> Thanks for chiming in.
Note that MLIR is using cl::opt without global ctor (we build with `-Werror=global-constructors`).
The pattern we use to write a tool with cl::opt and avoid global ctor (and can be used to avoid collision) looks like: https://github.com/llvm/llvm-project/blob/main/mlir/lib/IR/MLIRContext.cpp#L57-L83
The tool that wants to expose the MLIRContext options to the command line calls registerMLIRContextCLOptions() before parsing the command line.
Wouldn't this translate directly to LLVM tools as well with some minor refactoring?
The same applies to all of the infrastructure in MLIR: passes are registered explicitly, etc. This decouples "is this code linked in" from "are the options loaded", which is the annoying part of global constructors.
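The shape of that pattern, sketched outside of C++: options live behind a registration function, and nothing is visible on the command line until a tool calls it. The class, function and option names below are hypothetical stand-ins, not MLIR's actual API.

```python
class OptionRegistry:
    """Holds the options a tool has explicitly opted into."""

    def __init__(self):
        self.options = {}

    def add(self, name, default, help_text):
        if name in self.options:
            raise ValueError(f"option --{name} registered twice")
        self.options[name] = (default, help_text)


def register_context_options(registry):
    """Library-side hook; nothing is registered until a tool calls it."""
    registry.add("disable-threading", False, "run single-threaded")


def register_pass_options(registry):
    """Another library hook that this particular tool chooses not to call."""
    registry.add("print-after-all", False, "dump IR after each pass")


# A tool that only wants the context options calls just that hook.
registry = OptionRegistry()
register_context_options(registry)
print(sorted(registry.options))
```

Linking in the library that defines `register_pass_options` does not, by itself, add its options to the tool.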
--
Mehdi
[Alexandre Ganea] I think one other issue with cl::opt is that it aggregates the “command-line argument definition” and the “runtime parameter” de facto in a single object (unless cl::location is manually specified to every cl::opt). What MLIR does solves the issue mentioned by David, the fact that every tool pulls/initializes every cl::opt out there. However OptTable solves both problems, and makes the entry point thread-safe.
> Because I still don't see clearly how to build something like `opt` with all the pass and the options with OptTable, how does it all compose?
The proposed changes are specific to binary utilities: llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}.
'opt', 'llc', etc. are not in the scope.
(I guess I should have named the utilities more specifically to not cause confusion.)
For llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}, we don't want a random cl::opt in lib/Transforms, lib/MC, or lib/LTO to be accidentally specifiable on the command line.
(In the rare debugging case where such functionality is needed, it needs a -mllvm prefix like what ld.lld does.)
The binary utilities command line parsing issue I intended to address is fairly different:
We want a single registry of all options and don't want to inherit options from llvm/lib/A just because the tool happens to depend on llvm/lib/A directly or indirectly.
E.g. llvm-nm needs to depend on the bitcode reader because it needs to handle LLVM bitcode files; however, I don't want a random cl::opt in the bitcode reader to appear in llvm-nm's command line option list.
So I just built mlir-opt and inspected its --help output. It has exactly the problem I called out in my first message:
> * We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
There are many unrelated options like --amdgpu-bypass-slow-div, which is from llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp.
This can be worked around with linker garbage collection by discarding unreferenced cl::opt.
> Sure, but that isn't the problem you were raising that I was answering
> to, let's not move the goal post here.
>
> How do you solve this "namespacing" issue though? This was the sense of my
> question earlier in this thread: is OptTable providing me a solution to
> build library options that can be re-exported through tools command line
> interface? (bonus point if it manages some sort of namespacing).
No, OptTable options are not composable.
> The intent of cl::opt in libraries is that they can be exposed on the
> command line. It seems to me that this namespacing issue is quite intrinsic
> to the flat command line interface itself.
Yes.
> The way we work around this to avoid collision in mlir is through
> convention, making sure passes are prefixed according to their library (or
> dialects in MLIR).
Yes. llvm-readobj has a similar pattern: calling a function which
defines cl::opt local variables.
OptTable is for tools like clang, flang, lld where
* they want high configurability, e.g. -long-option for some options while --long-option for others.
* they don't want library cl::opt options.[1]
* they want to avoid cl::opt's loose behavior (-t=d and -demangle=0 are two examples I raised; hey, -enable-new-pm=0 :) it is convenient internally, but externally we'd better use -no- instead)
[1]: As you may have implied, this
is alternatively solvable by reorganizing llvm/lib/* options. This is
a huge task and the solution isn't particularly clear yet.
--riscv-no-aliases is an example that cl::opt in library code can affect
multiple tools: llc/llvm-mc/llvm-objdump (llvm-mc/llvm-objdump usage was
unintentional). Someone signing up for the work needs to be careful on
letting utilities call the relevant register functions.
My proposal is like: binary utilities should move in the direction of
clang/lld as well.
> > This can be worked around with linker garbage collection by discarding
> > unreferenced cl::opt.
> >
>
> I don't understand how this works actually: cl::opt that rely on global
> constructors aren't referenced ever, as long as the file is linked in they
> will be involved. This is an incredibly clunky situation to play with
> linker semantics here, we end relying on the way files are organized in
> static archives (and it breaks when you build libLLVM.so as you mentioned
> before).
Yep, the reliance on linker garbage collection makes hiding unrelated options unreliable.
>>>>>>>>> *[Alexandre Ganea] *I think one other issue with cl::opt is that
>>>>>>>>> it aggregates the “command-line argument definition” and the “runtime
>>>>>>>>> parameter” *de facto* in a single object (unless cl::location is
>>>>>>> To me *these points are all usability issues of cl::opt*. I care
>>>>>>> about not exposing unnecessary interfaces so cl::opt accepting the weird
>>>>>>> -t=d looks a downside to me.
>>>>>>>
>>>>>>> --demangle=0 is weird and some llvm/llvm/test tests do use cl::opt
>>>>>>> options this way, so we cannot just remove this usage. As a workaround, we
>>>>>>> could add a cl::foobar_toggle to a cl::opt to disallow =0.
>>>>>>> We would end with more customization for one option, cl::cat (for
>>>>>>> hiding unrelated options), cl::foobar_toggle (for disallowing =0), and
>>>>>>> potentially others for other ad-hoc tasks.
>>>>>>>
>>>>>>> I can highlight another thing about the global state of cl::opt => *library
>>>>>>> cl::opt and binary utility cl::opt share the same namespace*.
>>> > I can highlight another thing about the global state of cl::opt => *library
>>> cl::opt and binary utility cl::opt share the same namespace*.
>>>
>>> This can be worked around with linker garbage collection by discarding
>>> unreferenced cl::opt.
>>>
>>
>> I re-read these messages. I think you probably meant something more
>> generic - how to design decentralized command line option registry where
>> every library can register some options.
>>
>
>Yes indeed!
>
>I'm not against using OptTable for the sole purpose of binary tools, but
>that does not seem to provide us with a path forward. So I am afraid that
>this is a local optima that is short sighted.
As mentioned previously, I am not trying to alter current cl::opt usage in
library code and non-binary-utility tools (e.g. opt, llc).
Though I am not sure I agree that clang/lld/llvm-symbolizer/llvm-objdump
are using a local optimum. I think cl::opt just doesn't fit their use cases.
OptTable is just the appropriate solution for them.
Bonus: since options are data with the OptTable approach, we can re-use
something like clang-tblgen -gen-opt-docs to help ensure the documentation
doesn't diverge from the reality.
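A rough illustration of "options are data", in plain C++ rather than the real TableGen-generated OptTable: `OptionInfo` and `renderDocs` are hypothetical names, and the real tables carry more fields. The point is only that the same records a parser consumes can also drive the documentation text:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical option record (an assumption for illustration only).
struct OptionInfo {
  std::string name;    // spelling including dashes, e.g. "--demangle"
  std::string metavar; // value placeholder, empty for plain flags
  std::string help;    // one-line description
};

// Renders one documentation entry per option from the option table.
std::string renderDocs(const std::vector<OptionInfo> &table) {
  std::string out;
  for (const OptionInfo &opt : table) {
    out += opt.name;
    if (!opt.metavar.empty())
      out += " " + opt.metavar;
    out += "\n    " + opt.help + "\n";
  }
  return out;
}
```

Because the help output and the generated docs would both be derived from one table, there is no second copy of the option list to fall out of date.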
>> The binary utilities command line parsing issue I intended to address is
>> fairly different:
>> We want a single registry of all options and don't want to inherit options
>> from llvm/lib/A just because the tool happens to depend on llvm/lib/A
>> directly or indirectly.
>>
>
>I agree, what I pointed at in MLIR was an attempt to achieve this, the tool
>has to explicitly call a function at runtime to make the options available.
>
>
>> E.g. llvm-nm needs to depend on bitcode reader because it needs to handle
>> LLVM bitcode files, however, I don't want a random cl::opt in bitcode
>> reader to appear in llvm-nm's command line option list.
>>
>> So I just built mlir-opt and inspected its --help output. It has exactly
>> the problem I called out in my first message:
>>
>
>> * We don't need to add cl::cat to every option and call
>>
>> `HideUnrelatedOptions` to hide unrelated options from --help. The issue
>> would happen with cl::opt tools if linker garbage collection is disabled or
>> libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
>>
>> There are many unrelated options like --amdgpu-bypass-slow-div which is from llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
>>
>>
>I think you're missing the point here: mlir-opt does not have this problem *for
>mlir components.* It pays the price of the LLVM components using global
>constructors, but I pointed to the solution we use for mlir components that
>aren't using global constructors and avoid this problem.
See [1] above
llvm/tools/ includes some binary utilities used as replacements for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).
Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.
* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330
llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.
The switch can fix some corner cases with lib/Support/CommandLine.cpp. Here is a summary:
* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
To me, removing these would make the interface *worse*. This is
purely subjective, but I use the second item regularly when
locally debugging to swap back and forth between two modes
easily.
* To support boolean options (e.g. --demangle --no-demangle), we don't need to compare their positions (`if (NoDemangle.getPosition() > Demangle.getPosition())`, see llvm-nm.cpp).
* Grouped short options can be enabled with one line, `setGroupedShortOptions`, instead of adding cl::Grouping to every short option.
* We don't need to add cl::cat to every option and call `HideUnrelatedOptions` to hide unrelated options from --help. The issue would happen with cl::opt tools if linker garbage collection is disabled or libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
* If we decide to support binary utility multiplexing (https://reviews.llvm.org/D104686), we will not get conflicting options. An option may have different meanings in different utilities (especially for one-letter options).
I expect that most users will not observe any difference.
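For the boolean option bullet, the behaviour being asked for is just "last occurrence wins". A minimal sketch in plain C++ follows; `resolveDemangle` is a hypothetical helper, not an LLVM API. In OptTable-based tools I believe this is normally a single `hasFlag`-style query on the parsed argument list, but treat that as my assumption rather than a description of the patches:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Resolves a --demangle / --no-demangle pair: the last occurrence on the
// command line wins, and `defaultValue` is used if neither is present.
bool resolveDemangle(const std::vector<std::string> &args, bool defaultValue) {
  bool value = defaultValue;
  for (const std::string &arg : args) {
    if (arg == "--demangle")
      value = true;
    else if (arg == "--no-demangle")
      value = false;
  }
  return value;
}
```

With this shape there is no need to define two separate options and compare their positions by hand, and no need for a `--demangle=0` spelling.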
A related topic is whether we should disallow the single-dash `-long-option` form.
(Discussed in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html Accept --long-option but not -long-option for llvm binary utilities)
I'd like to disallow -long-option but may want to do this in a separate change.
The main points are that (1) grouped short options have a syntax conflict with one-dash long options, and (2) the GNU getopt_long style two-dash long option is much more popular.
I can think of potential pushback for some Mach-O specific options, e.g. nm -arch:
http://www.manpagez.com/man/1/nm/osx-10.12.6.php says `-arch` has one dash.
If such options may have problems, we can keep supporting one-dash forms.
With OptTable, allowing one-dash forms for a specific option is easy.
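The syntax conflict in point (1) can be shown with a toy expander in plain C++. `expandGrouped` and the set of short letters are hypothetical and not taken from any real tool; the sketch only illustrates that once grouping is on, a one-dash long spelling such as `-arch` may also be readable as a group of one-letter options:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Expands one argument the way a parser with grouped short options might:
// if every character after a single dash is a known short option, the
// argument is treated as a group; otherwise it is returned unchanged.
std::vector<std::string> expandGrouped(const std::string &arg,
                                       const std::set<char> &shortOpts) {
  // Not a single-dash argument (too short, or starts with "--").
  if (arg.size() < 2 || arg[0] != '-' || arg[1] == '-')
    return {arg};
  // Any unknown letter means this is not a valid group.
  for (std::size_t i = 1; i < arg.size(); ++i)
    if (!shortOpts.count(arg[i]))
      return {arg};
  std::vector<std::string> expanded;
  for (std::size_t i = 1; i < arg.size(); ++i)
    expanded.push_back(std::string("-") + arg[i]);
  return expanded;
}
```

With the hypothetical short options a, r, c and h all defined, `-arch` silently becomes `-a -r -c -h`, while `--arch` is never ambiguous. That is the argument for preferring two dashes and whitelisting one-dash spellings only where compatibility needs them.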
On Mon, Jul 5, 2021 at 9:43 AM Philip Reames <list...@philipreames.com> wrote:
On 7/2/21 10:14 AM, Fāng-ruì Sòng via llvm-dev wrote:
llvm/tools/ includes some binary utilities used as replacements for GNU binutils, e.g. llvm-objcopy, llvm-symbolizer, llvm-nm.
In some old threads people discussed some drawbacks of using cl::opt for user-facing utilities (I cannot find them now).
Switching to OptTable is an appealing solution. I have prepared two patches for two binary utilities: llvm-nm and llvm-strings.
* llvm-strings https://reviews.llvm.org/D104889
* llvm-nm https://reviews.llvm.org/D105330
llvm-symbolizer was switched last year. llvm-objdump was switched by thakis earlier this year.
The switch can fix some corners with lib/Support/CommandLine.cpp. Here is a summary:
* -t=d is removed (equal sign after a short option). Use -t d instead.
* --demangle=0 (=0 to disable a boolean option) is removed. Omit the option or use --no-demangle instead.
To me, removing these would make the interface *worse*. This is purely subjective, but I use the second item regularly when locally debugging to swap back and forth between two modes easily
See Mehdi's message: "I think part of the confusion on my side in this thread is that when I read "binary utilities" I thought first and foremost about `opt` and `lld`, while you're using "binary utilities" to refer to what I would call "end-user tools". I agree with you that tools like clang and lld are in a different category than `opt`."
The proposal is for llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}. The options mostly follow GNU, with a few LLVM extensions. There are very few options which default to true and may be toggled by users to false. When they have toggles, there are `--no-*` options.
It's not like opt or llc, where you need something like -enable-new-pm=0 or -enable-lto-internalization=0.
I think part of the confusion on my side in this thread is that when I read "binary utilities" I thought first and foremost about `opt` and `lld`, while you're using "binary utilities" to refer to what I would call "end-user tools".
I agree with you that tools like clang and lld are in a different category than `opt`.
cl::opt as it may not be suitable as-is, but OptTable being not composable and not offering any facility to someone building a tool to re-expose library options is also quite limited. It seems to me that we need such a solution, and it also seems that if we had such solutions it would be suitable for all the tools we intend to build using LLVM-based libraries.
Without any plan to build this though, I'm not trying to block progress on your cleanup/improvement of these end-user tools :)
Sadly agree :(
> Using OptTable has been suggested, but one of the common complaints is that OptTable for tool options is unwieldy and overly complicated for small simple tools. Also, there isn't a good way to handle options that are buried inside libraries, and many of the cl::opt options are.
Short options were a big problem of these user-facing binary utilities.
On the good side, most binary utilities are ready for the
crunchgen/busybox/multiplexer proposal now.
I have switched:
* llvm-readobj https://reviews.llvm.org/D105532
* llvm-size https://reviews.llvm.org/D105598
* llvm-cxxfilt https://reviews.llvm.org/D105605
* llvm-nm https://reviews.llvm.org/D105330
* llvm-strings https://reviews.llvm.org/D104889
(and from the previous release llvm-symbolizer).
The descriptions say more on why OptTable is better than cl::opt for
user-facing options.
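For readers who have not seen the busybox-style idea, a multiplexer boils down to dispatching on the basename of argv[0]. This is only a sketch in plain C++ under my own assumptions: `toolName`, `dispatch`, `nmMain` and `stringsMain` are hypothetical, the return values are arbitrary markers, and none of this is taken from D104686:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Strips any directory prefix from argv[0], e.g. "/usr/bin/llvm-nm".
std::string toolName(const std::string &argv0) {
  std::size_t slash = argv0.find_last_of('/');
  return slash == std::string::npos ? argv0 : argv0.substr(slash + 1);
}

using ToolMain = int (*)(int argc, char **argv);

// Hypothetical per-tool entry points. Each tool would parse argv with its
// own option table, so a one-letter option can mean different things in
// different tools without clashing in a shared global registry.
int nmMain(int, char **) { return 10; }
int stringsMain(int, char **) { return 20; }

// Picks the tool implementation based on the name the binary was run as.
int dispatch(int argc, char **argv) {
  static const std::map<std::string, ToolMain> tools = {
      {"llvm-nm", nmMain}, {"llvm-strings", stringsMain}};
  auto it = tools.find(toolName(argv[0]));
  if (it == tools.end())
    return 1; // unknown tool name
  return it->second(argc, argv);
}
```

This is why per-tool option tables matter for the proposal: with one process-wide option namespace, two tools linked into the same binary could not both define an option with the same spelling.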
> Many years ago I initiated lengthy discussions on llvm-dev and at llvm socials about an alternate approach [2], but it was a half measure at best. One of the challenges that it didn't solve was the need for registering command line options for all the debugging values buried in the passes. I don't mean to derail this effort by sending us all down a rabbit hole, but I also think that for a real tractable solution to an llvm Busybox/multicall binary solution, we really need to do something about cl::opt.
In mlir, mlir/lib/Support/Timing.cpp uses this style:

namespace {
struct DefaultTimingManagerOptions {
  llvm::cl::opt<bool> timing{"mlir-timing",
                             llvm::cl::desc("Display execution times"),
                             llvm::cl::init(false)};
  llvm::cl::opt<DisplayMode> displayMode{
      "mlir-timing-display", llvm::cl::desc("Display method for timing data"),
      llvm::cl::init(DisplayMode::Tree),
      llvm::cl::values(
          clEnumValN(DisplayMode::List, "list",
                     "display the results in a list sorted by total time"),
          clEnumValN(DisplayMode::Tree, "tree",
                     "display the results ina with a nested tree view"))};
};
} // end anonymous namespace

static llvm::ManagedStatic<DefaultTimingManagerOptions> options;

void mlir::registerDefaultTimingManagerCLOptions() {
  // Make sure that the options struct has been constructed.
  *options;
}
Still cl::opt, but if register*CLOptions functions are well
controlled, the global option name space problem should be fine.
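The same idea without any LLVM dependency, as a sketch: a function-local static plays the role that ManagedStatic plays in the MLIR code, so nothing is registered until the tool explicitly asks for it. `registeredOptions`, `TimingOptions` and `registerTimingOptions` are hypothetical names used only for this illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the process-wide option list.
std::vector<std::string> &registeredOptions() {
  static std::vector<std::string> names;
  return names;
}

// Options are members of a struct instead of free-standing globals, so
// merely linking this code in registers nothing.
struct TimingOptions {
  TimingOptions() {
    registeredOptions().push_back("mlir-timing");
    registeredOptions().push_back("mlir-timing-display");
  }
};

// The tool opts in by calling this. The function-local static is
// constructed on the first call only, so repeated calls are harmless.
void registerTimingOptions() {
  static TimingOptions options;
  (void)options;
}
```

A tool that never calls `registerTimingOptions()` never sees these options in --help, regardless of how it was linked. That is the property the global-constructor style cannot give.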
On Sep 16, 2021, at 6:19 PM, Leonard Chan <leona...@google.com> wrote: