Embedded Bitcode Design:
Embedded bitcode is designed to enable bitcode distribution without disturbing
the normal development flow. When a program is compiled with bitcode, clang
embeds the optimized bitcode in a special section of the object file, together
with the options that were used during the compilation. The object file still
has the normal TEXT and DATA sections for normal linking. During linking, the
linker checks that all the input object files have embedded bitcode and collects
the bitcode into an archive, which is embedded in the output. The archive also
contains all the information that is needed to rebuild the linked binary. All
compilation and linking stages can be replayed to generate the final binary.
There are mainly two parts we would like to upstream first:
1. Clang Driver:
Add a -fembed-bitcode option. When this new option is used, the compilation is
split into 2 stages. The first stage runs the frontend and all the
optimization passes, and the second stage embeds the bitcode from the first
stage and then runs the CodeGen passes. There is also a -fembed-bitcode-marker
option that doesn't split the compilation into 2 stages and only puts a 1-byte
marker into the object file. This is used to speed up debug builds, because
bitcode serialization and verification make -fembed-bitcode slower,
especially with -O0 -g. The linker can still check for the presence of the
section to provide feedback if any of the object files participating in the
link is missing bitcode in a full bitcode build.
2. Bitcode Embedding:
Several special sections are used to mark the presence of bitcode
in the MachO file.
"__LLVM, __bitcode" is used to store the optimized bitcode in the object file.
It can instead hold a 1-byte marker to provide diagnostics in debug builds.
"__LLVM, __cmdline" is used to store the clang command-line options. There are
a few options that are not reflected in the bitcode that we would like to replay
in the rebuild. For example, the '-O0' option makes us run FastISel during the
rebuild.
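As a rough illustration of the linker-side check described above, here is a minimal Python sketch. It is not the actual linker logic: the in-memory model (a dict of section name to bytes), the key spelling "__LLVM,__bitcode", and the function names are all assumptions made for the example.

```python
# Hypothetical model: each object file is a dict mapping "segment,section"
# names to raw section bytes. A real linker would read the sections from the
# MachO load commands instead.
BITCODE_SECTION = "__LLVM,__bitcode"


def classify(sections):
    """Return 'missing', 'marker', or 'full' for one object file."""
    data = sections.get(BITCODE_SECTION)
    if data is None:
        return "missing"
    # A 1-byte (or empty) section cannot hold real bitcode, so treat it as
    # the placeholder left by -fembed-bitcode-marker.
    if len(data) <= 1:
        return "marker"
    return "full"


def objects_missing_bitcode(objects):
    """Names of the inputs that have no bitcode section at all."""
    return [name for name, secs in objects.items() if classify(secs) == "missing"]


objects = {
    "a.o": {"__TEXT,__text": b"\x90", BITCODE_SECTION: b"BC\xc0\xde" + b"\x00" * 16},
    "b.o": {"__TEXT,__text": b"\x90", BITCODE_SECTION: b"\x00"},
    "c.o": {"__TEXT,__text": b"\x90"},
}
for name, secs in objects.items():
    print(name, classify(secs))
print("missing bitcode:", objects_missing_bitcode(objects))
```

In this model a marker-only object still counts as "has the section", which is what lets a debug build report inputs that lack bitcode without paying for serialization.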
Thanks
Steven
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Can you please explain how this relates to the existing .llvmbc section
feature?
Peter
It is not currently related, because we started the implementation before ThinLTO
was proposed in the community, but our "__LLVM, __bitcode" section is pretty much
the same as the ".llvmbc" section. Note that ".llvmbc" doesn't really follow the
section naming convention for MachO objects. I am hoping to unify them while
upstreaming the implementation.
Thanks
Steven
That would be my main request. Seems like a nice feature, but we
should have one implementation of it :-)
BTW, can you explain a bit why you need things like "-O0" recorded? In
case you want to go from bitcode back to object file one file at a
time (no LTO)? Is that orthogonal? That is, should the command line be
included in .bc files too? What is the command line option that is
included, the -cc1 or the driver one?
There was some discussion in the past about which options get run in
clang if given -flto. For example, it seems likely that a more
conservative inlining pass would be a good thing to not remove
opportunities for the link time inlining. What would happen with
"-flto -fembed-bitcode"? Would the bitcode be the same as with just
-flto and the object file less optimized?
Cheers,
Rafael
I would like to echo Rafael's comments.
My general understanding is that given an object file with embedded IR I should be able to reproduce the same object.
Everything else should be "supporting" that objective... which might include relevant flags and transformations leading _to_ this IR and _from_ this IR to the given object code.
Does my understanding match your overall goal?
Thanks.
Sergei
---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of Rafael
> Espíndola via llvm-dev
> Sent: Thursday, February 04, 2016 4:01 PM
> To: Steven Wu
> Cc: LLVM Developers Mailing List; cfe-dev
> Subject: Re: [llvm-dev] [cfe-dev] [RFC] Embedding Bitcode in Object Files
>
> On 3 February 2016 at 14:01, Steven Wu via llvm-dev <llvm-
Thanks for the comment!
In terms of the bitcode section, my plan is to make the "__LLVM, __bitcode" section the MachO version of the ".llvmbc" section. In the latest Darwin OS, the "__LLVM" segment is not loaded by dyld when you execute a binary with embedded bitcode, which is a plus for this feature.
And for the command line, Sergei has the correct idea about the motivation behind this. We want to have enough information to recreate the same binary from the embedded bitcode (at least when compiled with the same compiler). Here is an example:
$ clang -fembed-bitcode -O0 test.c -c -###
"clang" "-cc1" (...lots of options...) "-o" "test.bc" "-x" "c" "test.c" <--- First stage
"clang" "-cc1" "-triple" "x86_64-apple-macosx10.11.0" "-emit-obj" "-fembed-bitcode" "-O0" "-disable-llvm-optzns" "-o" "test.o" "-x" "ir" "test.bc" <--- Second stage
If we record all the options from the second stage, we can recreate the same object file using the exact same command. So, yes, they are cc1 flags. I understand they are not stable, but the second stage can only have a handful of options that should be recorded, namely those that: 1. affect codegen; 2. are not embedded in the bitcode. This list should shrink towards zero eventually (not sure about -O0 and other optimization options). If we have to rename them before removing them from the embedding option list, we can provide an upgrade path for them.
This feature is orthogonal to LTO. In my current implementation, "-flto -fembed-bitcode" is the same as "-flto". The linker needs to have the logic to handle an LLVM bitcode file (treated as LTO) and a MachO file with embedded bitcode (treated as a normal link) differently.
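To make that last distinction concrete, here is a small Python sketch of how an input could be told apart by its first four bytes. It is only an illustration, not the linker's real code: the function name and the "lto"/"normal"/"unknown" labels are made up for the example, and universal (fat) binaries and static archives are deliberately left out.

```python
import struct

# Well-known magic numbers found in the first four bytes of a file.
RAW_BITCODE_MAGIC = b"BC\xc0\xde"              # raw LLVM bitcode stream
WRAPPER_MAGIC = struct.pack("<I", 0x0B17C0DE)  # bitcode wrapper header
MACHO_MAGICS = {
    struct.pack(order + "I", magic)
    for order in ("<", ">")                    # either byte order on disk
    for magic in (0xFEEDFACE, 0xFEEDFACF)      # 32-bit and 64-bit MachO
}


def link_mode(header: bytes) -> str:
    """Classify an input by its leading bytes (hypothetical helper)."""
    magic = header[:4]
    if magic in (RAW_BITCODE_MAGIC, WRAPPER_MAGIC):
        return "lto"      # a bitcode file: goes through LTO
    if magic in MACHO_MAGICS:
        return "normal"   # a MachO object: linked normally, bitcode or not
    return "unknown"


print(link_mode(b"BC\xc0\xde" + b"\x00" * 4))
print(link_mode(bytes.fromhex("cffaedfe") + b"\x00" * 4))
```

Under this split, a MachO input is always linked from its machine code; any embedded bitcode it carries is only collected, never fed to LTO.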
Thanks
Steven
I don't think we need any path in the command line section. We only record the command-line options that will affect CodeGen. See my example in one of the previous replies:
$ clang -fembed-bitcode -O0 test.c -c -###
"clang" "-cc1" (...lots of options...) "-o" "test.bc" "-x" "c" "test.c" <--- First stage
"clang" "-cc1" "-triple" "x86_64-apple-macosx10.11.0" "-emit-obj" "-fembed-bitcode" "-O0" "-disable-llvm-optzns" "-o" "test.o" "-x" "ir" "test.bc" <--- Second stage
I can't think of any source path that can affect CodeGen. There should not be any paths in the second stage other than the bitcode input path and the binary output path, and they are excluded from the command line section as well. -fdebug-prefix-map is consumed by the front-end, and the prefixed paths are part of the debug info in the metadata. You don't need to encode -fdebug-prefix-map in the bitcode section to reproduce the object file with the same debug info. Did that answer your concern?
I don't know whether this is an issue in the current implementation, but I wanted to bring up a potential privacy issue.
In embedding the information, care should be taken to avoid embedding any information that contains personally identifiable information. This can certainly occur
if paths need to be embedded, as user names, or other private/confidential information may be present in the naming of directories and paths. Generally, I suspect
that it would be desirable to have an opt-in strategy for designating in the compiler which pieces of information/options need to be saved, and for all options marked
as needed, determine whether there is the possibility/likelihood that they may contain personally identifiable information.
Kevin B. Smith
From: "Kevin B via llvm-dev Smith" <llvm...@lists.llvm.org>
To: "James Y Knight" <jykn...@google.com>, "Steven Wu" <stev...@apple.com>
Cc: llvm...@lists.llvm.org, "Clang Dev" <cfe...@lists.llvm.org>
Sent: Saturday, February 6, 2016 4:30:20 PM
Subject: Re: [llvm-dev] [RFC] Embedding Bitcode in Object Files
From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of James Y Knight via llvm-dev
Sent: Friday, February 05, 2016 3:13 PM
To: Steven Wu <stev...@apple.com>
Cc: LLVM Developers Mailing List <llvm...@lists.llvm.org>; Clang Dev <cfe...@lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Embedding Bitcode in Object Files
On Fri, Feb 5, 2016 at 6:06 PM, Steven Wu <stev...@apple.com> wrote:
> I don't think we need any path in the command line section. We only record the command-line options that will affect CodeGen. See my example in one of the previous replies:
> $ clang -fembed-bitcode -O0 test.c -c -###
> "clang" "-cc1" (...lots of options...) "-o" "test.bc" "-x" "c" "test.c" <--- First stage
> "clang" "-cc1" "-triple" "x86_64-apple-macosx10.11.0" "-emit-obj" "-fembed-bitcode" "-O0" "-disable-llvm-optzns" "-o" "test.o" "-x" "ir" "test.bc" <--- Second stage
> I can't think of any source path that can affect CodeGen. There should not be any paths in the second stage other than the bitcode input path and the binary output path, and they are excluded from the command line section as well. -fdebug-prefix-map is consumed by the front-end, and the prefixed paths are part of the debug info in the metadata. You don't need to encode -fdebug-prefix-map in the bitcode section to reproduce the object file with the same debug info. Did that answer your concern?
Great -- it wasn't clear from the first message if you were just embedding the whole command line as is. If the plan is instead to embed only a few relevant options, I agree there should be no issue as far as paths go.
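The approach agreed on here (record only a few CodeGen-relevant options, never paths) could look roughly like the Python sketch below. It is an opt-in filter under stated assumptions: the thread only names -O0 as an option that must be replayed, so the other allowlist entries are simply lifted from the second-stage example and are hypothetical, as is the function name.

```python
# Hypothetical allowlists. Only -O0 is named in the thread as something that
# must be replayed; the remaining entries come from the second-stage example
# and are assumptions about what a real list might contain.
RECORDED_FLAGS = {"-O0", "-emit-obj", "-fembed-bitcode", "-disable-llvm-optzns"}
RECORDED_FLAGS_WITH_VALUE = {"-triple"}


def options_to_record(cc1_args):
    """Keep only allowlisted options; everything else is dropped (opt-in)."""
    recorded = []
    args = iter(cc1_args)
    for arg in args:
        if arg in RECORDED_FLAGS_WITH_VALUE:
            value = next(args, None)  # the option's value follows it
            if value is not None:
                recorded.extend([arg, value])
        elif arg in RECORDED_FLAGS:
            recorded.append(arg)
        # Anything not allowlisted, including "-o", "-x" and every path,
        # falls through here and is never stored.
    return recorded


second_stage = [
    "-cc1", "-triple", "x86_64-apple-macosx10.11.0", "-emit-obj",
    "-fembed-bitcode", "-O0", "-disable-llvm-optzns",
    "-o", "test.o", "-x", "ir", "test.bc",
]
print(options_to_record(second_stage))
```

Because the filter is opt-in, a new option that happens to carry a path stays out of the section until someone deliberately adds it to the list.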
Hal,
No, it is not more of a problem than with DWARF info. DWARF info definitely contains personally identifiable information. However, people usually realize that is the case,
and will turn off or strip debug info if they are worried about such issues, or make a specific plan to cleanse that information.
You really just want to attempt to eliminate such information to the greatest extent possible. The desirability of using embedded bitcode in libraries (which is a very
natural use model that I'm pretty sure this is intended to support) will be improved by taking this aspect into consideration in the implementation.
Kevin Smith
__FILE__ is a frontend issue, I still have to add some equivalent to my
remap patches for that into clang...
Joerg
I don't know what is involved in upstreaming that, but yes, it seems very useful/necessary to me.
My 2c…
Benefits of the feature clearly outweigh any potential privacy concerns from my point of view… and yes, there are multiple ways to deal with privacy even if it is an issue.
Sergei