[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

Yatsina, Marina via llvm-dev

unread,

Mar 29, 2017, 3:38:19 AM3/29/17

to llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, chan...@gmail.com, ma...@braunis.de

Hi,

I wanted to revive this issue of supporting asm goto (Bug 9295).

As was already proposed, the best way seems to be introducing new IR.

If we’re changing the IR, we should probably provide an infrastructure that solves or at least enables future support for things like:

1. MS-style inline asm jmps and goto (Bug 24529)

2. Analyzing symbols defined/references in the inline assembly (Bug 28970), taking into account module/file-scope inline assembly.

3. Provide some information about the cost of the inline assembly? (I’m not sure if we want to couple it with this issue and if the cost should be represented in this new IR or some other way)

This new IR should contain:

1. A list of symbols referenced by the inline assembly

a. Should we try to provide better analysis of how we use these symbols (e.g. in jmp instructions, in call instructions)? Can it help us do some error checking?

2. A list of symbols defines by the inline assembly

a. Should we provide additional information about how symbols defined (e.g. which symbols defined as globals)?.

At a later stage we also need to discuss other aspects of this new instruction (e.g. it being a block terminator and other behaviors we might want to be affected by it).

Any other thoughts and ideas we want to include in this change that I’m missing?

Thanks,

Marina

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Chandler Carruth via llvm-dev

unread,

Mar 30, 2017, 4:20:59 AM3/30/17

to Yatsina, Marina, llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, chan...@gmail.com, ma...@braunis.de

On Wed, Mar 29, 2017 at 9:38 AM Yatsina, Marina via llvm-dev <llvm...@lists.llvm.org> wrote:

Hi,

I wanted to revive this issue of supporting asm goto (Bug 9295).

FWIW, my opinion is that the use cases don't really justify the complexity this adds to the compiler. I feel like embedding inline assembly into normal C control flow constructs and writing the entire thing directly give pretty reasonable options.

But maybe others really see high-value reasons to add support for this... If they do...

As was already proposed, the best way seems to be introducing new IR.

If we’re changing the IR, we should probably provide an infrastructure that solves or at least enables future support for things like:

1. MS-style inline asm jmps and goto (Bug 24529)

I *strongly* agree with the 'WONTFIX' resolution here. More than GCC's "asm goto", this feature seems much more harmful to the compiler and much less well motivated.

2. Analyzing symbols defined/references in the inline assembly (Bug 28970), taking into account module/file-scope inline assembly.

Unless we decide to address #1 in some way, this seems completely orthogonal to "asm goto". While I'd love to see a good resolution to PR28970, I don't think it makes sense to couple the two together. Among other things, none of my concerns about "asm goto" apply to simply exposing the symbols defined.

3. Provide some information about the cost of the inline assembly? (I’m not sure if we want to couple it with this issue and if the cost should be represented in this new IR or some other way)

I suspect this too should be an orthogonal discussion.

I think we should expose a TTI-cost-analysis API and allow passing inline assembly to it. Then targets can, if they choose, actually analyze the assembly to compute a cost. But I don't think we need IR changes here and even if we do, likely orthogonal ones to "asm goto".

Yatsina, Marina via llvm-dev

unread,

Mar 30, 2017, 10:44:43 AM3/30/17

to Chandler Carruth, llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, ma...@braunis.de, Tayree, Coby

Linux kernel is using the “asm goto” feature, other projects probably use it as well. I think it provides motivation to support it in LLVM.

Regarding the complexity, I believe there is some infrastructure that we can at least partially reuse (the support for “indirectbr” instruction).

My focus is adding “asm goto” support, the other things are indeed completely orthogonal and came up in bugs related to Bug 9295.

The reason I mentioned them in this discussion is that they seem to require IR change as well.

I thought that if we’re doing this IR change for “asm goto” it’s worth checking if there is additional useful information we want to expose except the C labels used by inline assembly.

If you prefer to separate the IR changes for “asm goto” and the IR changes that will allow exposing defined symbols, then we can focus on the “asm goto” feature alone.

Thanks,

Marina

---------------------------------------------------------------------
Intel Israel (74) Limited

Joerg Sonnenberger via llvm-dev

unread,

Mar 30, 2017, 3:38:49 PM3/30/17

to llvm...@lists.llvm.org

On Thu, Mar 30, 2017 at 02:44:32PM +0000, Yatsina, Marina via llvm-dev wrote:
> Linux kernel is using the “asm goto” feature, other projects probably
> use it as well. I think it provides motivation to support it in LLVM.

I haven't seen anything in all of pkgsrc. Just as data point.

Joerg
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chandler Carruth via llvm-dev

unread,

Mar 30, 2017, 4:22:36 PM3/30/17

to Yatsina, Marina, llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, ma...@braunis.de, Tayree, Coby

Just responding to the motivation stuff as that remains an open question:

On Thu, Mar 30, 2017 at 4:44 PM Yatsina, Marina <marina....@intel.com> wrote:

Linux kernel is using the “asm goto” feature,

But your original email indicated they have an alternative code path for compilers that don't support it?

What might be compelling would be if there are serious performance problems when using the other code path that cannot be addressed by less invasive (and more general) improvements to LLVM. If this is the *only* way to get comparable performance from the Linux Kernel, then I think that might be an interesting discussion. But it would take a very careful and detailed analysis of why IMO.

other projects probably use it as well.

This is entirely possible, but I'd like to understand which projects and why they use it rather than any of the alternatives before we impose the implementation complexity on LLVM. At least that's my two cents.

-Chandler

Reid Kleckner via llvm-dev

unread,

Mar 30, 2017, 4:35:16 PM3/30/17

to Yatsina, Marina, llvm...@lists.llvm.org, ma...@braunis.de, eh...@mozilla.com

Hi Marina,

As you've highlighted, LLVM's support for inline assembly could definitely use work. It is not a well-loved area of the compiler. =/ These are some of my thoughts on what we could do.

I'd prefer to not support jumps into and out of MS inline asm blobs. I sketched out a design for doing it in http://llvm.org/pr24529#c4, but it creates more LLVM IR basic blocks that have no valid insertion point. I'd like to move away from that, not towards it. MSVC doesn't even support inline asm on non-x86, so over time there will be less demand for this esoteric feature, not more.

I actually think supporting a terminator asm instruction for "asm goto" wouldn't be that expensive. We already have terminators like indirectbr and catchswitch that create edges that can't be split. I suspect we will want to add more to support parallel IR. Adding one for "asm goto" doesn't seem that bad, but I'm not convinced it's worth it.

Digging the symbol uses out of inline asm and the symbol definitions out of module inline asm seems totally reasonable. Adding this to the IR representation seems nice.

Improving our cost estimates with the help of MC would also be nice.

Reid

Yatsina, Marina via llvm-dev

unread,

Apr 4, 2017, 9:07:29 AM4/4/17

to Chandler Carruth, llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, ma...@braunis.de, Tayree, Coby

Asm goto feature was introduces to GCC in order to optimize the support for tracepoints in Linux kernel (it can be used for other things that do nop patching).

GCC documentation describes their motivating example here:


https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html

     #define TRACE1(NUM)                         \

       do {                                      \

         asm goto ("0: nop;"                     \

                   ".pushsection trace_table;"   \

                   ".long 0b, %l0;"              \

                   ".popsection"                 \

                   : : : : trace#NUM);           \

         if (0) { trace#NUM: trace(); }          \

       } while (0)

     #define TRACE  TRACE1(__COUNTER__)

In this example (which in fact inspired the asm goto feature) we want on rare occasions to call the trace function; on other occasions we'd like to keep the overhead to the absolute minimum. The normal code path consists of a single nop instruction. However, we record the address of this nop together with the address of a label that calls the trace function. This allows the nop instruction to be patched at run time to be an unconditional branch to the stored label. It is assumed that an optimizing compiler moves the labeled block out of line, to optimize the fall through path from the asm.

Here is the Linux kernel RFC which discusses the old C way of implementing it and the performance issues that were noticed.

It also states some performance numbers of the old C code vs. the asm goto:

https://lwn.net/Articles/350714/

This LTTng (Linux Trace Toolkit Next Generation) presentation talks about using this feature as a way of optimize static tracepoints (slides 3-4)

https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf

This presentation also mentions that a lot of other Linux applications use this tracing mechanism.

I believe we already have much of the infrastructure in place (using the indirecbr instruction infrastructure).

We do need to make sure MachineBlockPlacement optimizes the fall through path to make sure we can gain the performance for the nop patching.

Thanks,

Marina

From: Chandler Carruth [mailto:chan...@gmail.com]
Sent: Thursday, March 30, 2017 23:22
To: Yatsina, Marina <marina....@intel.com>; llvm...@lists.llvm.org; r...@google.com; jykn...@google.com; eh...@mozilla.com; rjmc...@apple.com; mehdi...@apple.com; ma...@braunis.de; Tayree, Coby <coby....@intel.com>
Subject: Re: [llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

Just responding to the motivation stuff as that remains an open question:

---------------------------------------------------------------------
Intel Israel (74) Limited

Chandler Carruth via llvm-dev

unread,

Apr 4, 2017, 12:27:07 PM4/4/17

to Yatsina, Marina, Chandler Carruth, llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, ma...@braunis.de, Tayree, Coby

On Tue, Apr 4, 2017 at 6:07 AM Yatsina, Marina via llvm-dev <llvm...@lists.llvm.org> wrote:

Asm goto feature was introduces to GCC in order to optimize the support for tracepoints in Linux kernel (it can be used for other things that do nop patching).

GCC documentation describes their motivating example here:
https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html
 
     #define TRACE1(NUM)                         \
       do {                                      \
         asm goto ("0: nop;"                     \
                   ".pushsection trace_table;"   \
                   ".long 0b, %l0;"              \
                   ".popsection"                 \
                   : : : : trace#NUM);           \
         if (0) { trace#NUM: trace(); }          \
       } while (0)
     #define TRACE  TRACE1(__COUNTER__)
In this example (which in fact inspired the asm goto feature) we want on rare occasions to call the trace function; on other occasions we'd like to keep the overhead to the absolute minimum. The normal code path consists of a single nop instruction. However, we record the address of this nop together with the address of a label that calls the trace function. This allows the nop instruction to be patched at run time to be an unconditional branch to the stored label. It is assumed that an optimizing compiler moves the labeled block out of line, to optimize the fall through path from the asm.

Here is the Linux kernel RFC which discusses the old C way of implementing it and the performance issues that were noticed.

It also states some performance numbers of the old C code vs. the asm goto:

https://lwn.net/Articles/350714/
 
This LTTng (Linux Trace Toolkit Next Generation) presentation talks about using this feature as a way of optimize static tracepoints (slides 3-4)
https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf
This presentation also mentions that a lot of other Linux applications use this tracing mechanism.

Thanks, this is exactly the kind of discussion that I think will help make progress here.

I think this feature makes a lot of sense and is a really nice feature. However, I think implementing it with inline assembly imposes a lot of really unfortunate constraints on compilation -- it requires asm goto, pushsection and popsection, etc.

I would much rather provide a much more direct way to represent a patchable nop and the addresses of label within a function. For example, I could imagine something like:

```

if (0) { trace_call: /* code to call the trace function */ }

patch: __builtin_patchable_nop()

__builtin_save_labels(trace_call, patch)

```

But someone can probably design a much better way to represent this in Clang. The advantages I see here (admittedly, mostly for the implementation in Clang and LLVM):

1) It allows Clang and LLVM to model this with running an assembler over anything.

2) It doesn't require new terminators in LLVM's IR

3) We already have intrinsics in LLVM's IR that could easily be extended to produce a nop.

4) It would be portable -- each backend could select an appropriate sized nop to patch a jump into

Would this make sense?

I believe we already have much of the infrastructure in place (using the indirecbr instruction infrastructure).

We do need to make sure MachineBlockPlacement optimizes the fall through path to make sure we can gain the performance for the nop patching.

Thanks,

Marina

From: Chandler Carruth [mailto:chan...@gmail.com]
Sent: Thursday, March 30, 2017 23:22
To: Yatsina, Marina <marina....@intel.com>; llvm...@lists.llvm.org; r...@google.com; jykn...@google.com; eh...@mozilla.com; rjmc...@apple.com; mehdi...@apple.com; ma...@braunis.de; Tayree, Coby <coby....@intel.com>
Subject: Re: [llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

Just responding to the motivation stuff as that remains an open question:

On Thu, Mar 30, 2017 at 4:44 PM Yatsina, Marina <marina....@intel.com> wrote:

Linux kernel is using the “asm goto” feature,

But your original email indicated they have an alternative code path for compilers that don't support it?

What might be compelling would be if there are serious performance problems when using the other code path that cannot be addressed by less invasive (and more general) improvements to LLVM. If this is the *only* way to get comparable performance from the Linux Kernel, then I think that might be an interesting discussion. But it would take a very careful and detailed analysis of why IMO.

other projects probably use it as well.

This is entirely possible, but I'd like to understand which projects and why they use it rather than any of the alternatives before we impose the implementation complexity on LLVM. At least that's my two cents.

-Chandler

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________

Matthias Braun via llvm-dev

unread,

Apr 4, 2017, 2:13:01 PM4/4/17

to Chandler Carruth, Yatsina, Marina, llvm...@lists.llvm.org, Tayree, Coby, eh...@mozilla.com

My two cents:

- I think inline assembly should work even if the compiler cannot parse the contents. This would rule out msvc inline assembly (or alternatively put all the parsing and interpretation burden on the frontend), but would work with gcc asm goto which specifies possible targets separately.

- Supporting control flow in inline assembly by allowing jumps out of an assembly block seems natural to me.

- Jumping into an inline assembly block seems like an unnecessary feature to me.

- To have this working in lib/CodeGen we would need an alternative opcode with the terminator flag set. (There should also be opportunities to remodel some instruction flags in the backend, to be part of the MachineInstr instead of the opcode, but that is an orthogonal discussion to this)

- I don't foresee big problems in CodeGen, we should take a look on how computed goto is implementation to find ways to reference arbitrary basic blocks.

- The register allocator fails when the terminator instruction also writes a register which is subsequently spilled (none of the existing targets does that, but you could specify this situation in inline assembly).

- I'd always prefer intrinsics over inline assembly. Hey, why don't we add a -Wassembly that warns on inline assembly usage and is enabled by default...

- I still think inline assembly is valuable for new architecture bringup/experimentation situations.

- Matthias

On Apr 4, 2017, at 9:26 AM, Chandler Carruth via llvm-dev <llvm...@lists.llvm.org> wrote:

On Tue, Apr 4, 2017 at 6:07 AM Yatsina, Marina via llvm-dev <llvm...@lists.llvm.org> wrote:

Asm goto feature was introduces to GCC in order to optimize the support for tracepoints in Linux kernel (it can be used for other things that do nop patching).

GCC documentation describes their motivating example here:


https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html

     #define TRACE1(NUM)                         \

       do {                                      \

         asm goto ("0: nop;"                     \

                   ".pushsection trace_table;"   \

                   ".long 0b, %l0;"              \

                   ".popsection"                 \

                   : : : : trace#NUM);           \

         if (0) { trace#NUM: trace(); }          \

       } while (0)

     #define TRACE  TRACE1(__COUNTER__)

In this example (which in fact inspired the asm goto feature) we want on rare occasions to call the trace function; on other occasions we'd like to keep the overhead to the absolute minimum. The normal code path consists of a single nop instruction. However, we record the address of this nop together with the address of a label that calls the tracefunction. This allows the nop instruction to be patched at run time to be an unconditional branch to the stored label. It is assumed that an optimizing compiler moves the labeled block out of line, to optimize the fall through path from the asm.

John McCall via llvm-dev

unread,

Apr 4, 2017, 2:44:14 PM4/4/17

to Matthias Braun, Yatsina, Marina, llvm...@lists.llvm.org, Tayree, Coby, eh...@mozilla.com

On Apr 4, 2017, at 2:12 PM, Matthias Braun <ma...@braunis.de> wrote:
My two cents:

- I think inline assembly should work even if the compiler cannot parse the contents. This would rule out msvc inline assembly (or alternatively put all the parsing and interpretation burden on the frontend), but would work with gcc asm goto which specifies possible targets separately.
- Supporting control flow in inline assembly by allowing jumps out of an assembly block seems natural to me.
- Jumping into an inline assembly block seems like an unnecessary feature to me.
- To have this working in lib/CodeGen we would need an alternative opcode with the terminator flag set. (There should also be opportunities to remodel some instruction flags in the backend, to be part of the MachineInstr instead of the opcode, but that is an orthogonal discussion to this)
- I don't foresee big problems in CodeGen, we should take a look on how computed goto is implementation to find ways to reference arbitrary basic blocks.
- The register allocator fails when the terminator instruction also writes a register which is subsequently spilled (none of the existing targets does that, but you could specify this situation in inline assembly).
- I'd always prefer intrinsics over inline assembly. Hey, why don't we add a -Wassembly that warns on inline assembly usage and is enabled by default...
- I still think inline assembly is valuable for new architecture bringup/experimentation situations.

To me, this feels like a great example of "we really wanted a language feature, but we figured out that we could hack it in using inline assembly in a way that's ultimately significantly harder for the compiler to support than a language feature, and now it's your problem." I agree with Chandler that we should just design and implement the language feature.

I would recommend:

if (__builtin_patchable_branch("section name")) {

trace();

}

==>

%0 = call i1 @llvm.patchable_branch(i8* @sectionNameString)

br %0, ...

where @llvm.patchable_branch has the semantics of appending whatever patching information is necessary to the given section such that, if you apply the patch, it will change the result of the call from 0 to 1. That can then typically be pattern-matched in the backend to get the optimal codegen.

If I might recommend a better ABI for the patching information: consider using a pair of relative pointers, one from the patching information to the patchable instruction, and one from the patchable instruction to the new target. That would allow the patching information to be relocated at zero cost.

The actual details of how to apply the patch, and what the inline patchable-instruction sequence needs to be in order to accept the patch, would be target-specific. The documented motivating example seems to assume that a single nop is always big enough, which is pretty questionable.

This feature could be made potentially interesting to e.g. JIT authors by allowing the patching information to be embellished with additional information to identify the source branch.

John.

James Y Knight via llvm-dev

unread,

Apr 4, 2017, 3:27:49 PM4/4/17

to Matthias Braun, Yatsina, Marina, llvm...@lists.llvm.org, eh...@mozilla.com, Tayree, Coby

On Tue, Apr 4, 2017 at 2:12 PM, Matthias Braun <ma...@braunis.de> wrote:

- The register allocator fails when the terminator instruction also writes a register which is subsequently spilled (none of the existing targets does that, but you could specify this situation in inline assembly).

You can't actually have outputs from an asm goto in the GCC implementation (and I'd suggest leaving that restriction in the LLVM implementation too if it makes the implementation easier).

From GCC docs:

"This form of asm is restricted to not have outputs. This is due to a internal restriction in the compiler that control transfer instructions cannot have outputs. This restriction on asm goto may be lifted in some future version of the compiler. In the meantime, asm goto may include a memory clobber, and so leave outputs in memory."

You can still have register clobbers, which I suppose might trigger the same failure case?

Matthias Braun via llvm-dev

unread,

Apr 4, 2017, 4:10:09 PM4/4/17

to James Y Knight, llvm...@lists.llvm.org, Tayree, Coby, Yatsina, Marina, eh...@mozilla.com

Ah that is convenient :)

You can still have register clobbers, which I suppose might trigger the same failure case?

clobbers are less of a problem as there is no reason to spill a register that is just clobbered by a regmask.

- Matthias

Matthias Braun via llvm-dev

unread,

Apr 4, 2017, 4:13:27 PM4/4/17

to John McCall, llvm...@lists.llvm.org, eh...@mozilla.com, Yatsina, Marina, Tayree, Coby

On Apr 4, 2017, at 11:44 AM, John McCall via llvm-dev <llvm...@lists.llvm.org> wrote:

On Apr 4, 2017, at 2:12 PM, Matthias Braun <ma...@braunis.de> wrote:
My two cents:

- I think inline assembly should work even if the compiler cannot parse the contents. This would rule out msvc inline assembly (or alternatively put all the parsing and interpretation burden on the frontend), but would work with gcc asm goto which specifies possible targets separately.
- Supporting control flow in inline assembly by allowing jumps out of an assembly block seems natural to me.
- Jumping into an inline assembly block seems like an unnecessary feature to me.
- To have this working in lib/CodeGen we would need an alternative opcode with the terminator flag set. (There should also be opportunities to remodel some instruction flags in the backend, to be part of the MachineInstr instead of the opcode, but that is an orthogonal discussion to this)
- I don't foresee big problems in CodeGen, we should take a look on how computed goto is implementation to find ways to reference arbitrary basic blocks.
- The register allocator fails when the terminator instruction also writes a register which is subsequently spilled (none of the existing targets does that, but you could specify this situation in inline assembly).
- I'd always prefer intrinsics over inline assembly. Hey, why don't we add a -Wassembly that warns on inline assembly usage and is enabled by default...
- I still think inline assembly is valuable for new architecture bringup/experimentation situations.

To me, this feels like a great example of "we really wanted a language feature, but we figured out that we could hack it in using inline assembly in a way that's ultimately significantly harder for the compiler to support than a language feature, and now it's your problem." I agree with Chandler that we should just design and implement the language feature.

I would recommend:

if (__builtin_patchable_branch("section name")) {
trace();
}

==>

%0 = call i1 @llvm.patchable_branch(i8* @sectionNameString)
br %0, ...

where @llvm.patchable_branch has the semantics of appending whatever patching information is necessary to the given section such that, if you apply the patch, it will change the result of the call from 0 to 1. That can then typically be pattern-matched in the backend to get the optimal codegen.

If I might recommend a better ABI for the patching information: consider using a pair of relative pointers, one from the patching information to the patchable instruction, and one from the patchable instruction to the new target. That would allow the patching information to be relocated at zero cost.

The actual details of how to apply the patch, and what the inline patchable-instruction sequence needs to be in order to accept the patch, would be target-specific. The documented motivating example seems to assume that a single nop is always big enough, which is pretty questionable.

This feature could be made potentially interesting to e.g. JIT authors by allowing the patching information to be embellished with additional information to identify the source branch.

I completely agree that for this example we rather want a proper intrinsic. As a matter of fact we have similar mechanism in CodeGen already to support the XRay feature.

- Matthias

Dean Michael Berris via llvm-dev

unread,

Apr 6, 2017, 10:05:39 PM4/6/17

to Matthias Braun, LLVM Developers, Tayree, Coby, Yatsina, Marina, eh...@mozilla.com

I for one would really like that intrinsic. I have something similar under review, which wraps a function call for XRay's custom event logging feature (I should've sent an RFC on this I realise, I'll do that next). Patch doing this in particular for XRay's requirements are in

https://reviews.llvm.org/D27503 -- wherein we do the following:

- In LLVM IR, lower calls to the @llvm.xray.customevent(...) intrinsic into something like:

# align to 2 byte address

.xray_sled_N

jmp +NN

# calling convention setup

call <XRay's trampoline>

We also mark the point where this sled is in the instrumentation map.

- At runtime we overwrite the jump to become nops.

If we get this patchable branch intrinsic, then we can just certainly use that in lowering the XRay built-in we're trying to add to Clang as well.

/me goes writing up the RFC for the custom event logging.

Cheers

-- Dean

David Woodhouse via llvm-dev

unread,

Feb 12, 2018, 11:26:14 AM2/12/18

to Matthias Braun, John McCall, Peter Zijlstra, llvm...@lists.llvm.org, Tayree, Coby, Yatsina, Marina, eh...@mozilla.com

FYI there is now serious talk of the Linux kernel dropping support for compilers that *don't* support asm goto.

https://lkml.org/lkml/2018/2/12/78

Peter Zijlstra via llvm-dev

unread,

Feb 12, 2018, 12:50:53 PM2/12/18

to David Woodhouse, Yatsina, Marina, llvm...@lists.llvm.org, Matthias Braun, eh...@mozilla.com, Tayree, Coby

On Mon, Feb 12, 2018 at 04:19:30PM +0000, David Woodhouse wrote:
> FYI there is now serious talk of the Linux kernel dropping support for
> compilers that *don't* support asm goto.

> > I completely agree that for this example we rather want a proper

> > intrinsic. As a matter of fact we have similar mechanism in CodeGen
> > already to support the XRay feature.

Also, we're very much _NOT_ going to support anything 'similar' but
completely dfferent. It's asm-goto or nothing.

David Woodhouse via llvm-dev

unread,

Feb 14, 2018, 5:34:47 AM2/14/18

to Chandler Carruth, Yatsina, Marina, llvm...@lists.llvm.org, r...@google.com, jykn...@google.com, eh...@mozilla.com, rjmc...@apple.com, mehdi...@apple.com, ma...@braunis.de, Tayree, Coby

Let's not conflate the asm-goto part with the .pushsection/.popsection.

The latter ("0: .pushsection foo; .long 0b; .popsection") is used *all*
over the kernel to build up tables of code locations — for exception
handling of instructions which might fault, as well as for runtime
patching of instructions like the above. It's not always a nop vs. call
alternative.

It would be nice to have the compiler assist with that. We currently
have code to trawl through all the built object files and find calls to
__fentry__ so we can patch them in/out at runtime, for example. And we
might considered doing the same for calls to the retpoline thunks.

But I think we would be best served right now by considering that out
of scope, and looking *only* at the part which is handled by 'asm
goto'.

Reid Kleckner via llvm-dev

unread,

Feb 14, 2018, 12:47:04 PM2/14/18

to David Woodhouse, Yatsina, Marina, llvm...@lists.llvm.org, Tayree, Coby, ma...@braunis.de, eh...@mozilla.com

On Wed, Feb 14, 2018 at 2:34 AM, David Woodhouse <dw...@infradead.org> wrote:

Let's not conflate the asm-goto part with the .pushsection/.popsection.

The latter ("0: .pushsection foo; .long 0b; .popsection") is used *all*
over the kernel to build up tables of code locations — for exception
handling of instructions which might fault, as well as for runtime
patching of instructions like the above. It's not always a nop vs. call
alternative.

It would be nice to have the compiler assist with that. We currently
have code to trawl through all the built object files and find calls to
__fentry__ so we can patch them in/out at runtime, for example. And we
might considered doing the same for calls to the retpoline thunks.

But I think we would be best served right now by considering that out
of scope, and looking *only* at the part which is handled by 'asm
goto'.

Collecting labels in code is a pretty interesting and reasonable use case. I'd be really interested in a separate discussion or document describing the kernel use cases for this. In particular, I'm worried about what things LLVM might hoist across the label. We might want to end up designing something more like real asynchronous exception handling (__try) than all this assembly blob stuff.

But I agree, asm-goto is probably higher priority.

Reply all

Reply to author

Forward