[llvm-dev] Metadata in LLVM back-end


Son Tuan VU via llvm-dev

Jul 22, 2020, 8:31:35 AM
to llvm-dev
Hi all,

Currently metadata (other than debug info) can be attached to IR instructions but disappears during DAG selection.

My question is why we do not keep the metadata during code lowering and then attach to MachineInstr, just as for IR instructions? Is there any technical challenge, or is it only because nobody wants to do so?
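To make the question concrete, here is a small IR sketch (the `!mytag` metadata kind is made up): mid-level passes can carry the tag along on a best-effort basis, but SelectionDAG nodes have no metadata slot, so it is silently dropped before a MachineInstr ever exists.

```llvm
; A store tagged with a custom metadata kind. Mid-level IR passes can
; propagate !mytag (best-effort), but SelectionDAG has no metadata slot
; on its nodes, so the tag is gone by the time a MachineInstr exists.
define void @f(i32 %v, i32* %p) {
  store i32 %v, i32* %p, !mytag !0
  ret void
}
!0 = !{!"sensitive"}
```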

Thanks for your help,

Best,

Jinyan via llvm-dev

Jul 22, 2020, 10:26:56 AM
to llvm...@lists.llvm.org

I ran into the same question, and I finally chose to use an intrinsic to tag the instruction.
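A sketch of that workaround (the intrinsic name is invented; a real one would be declared in the target's Intrinsics.td): unlike metadata, an intrinsic call is a real instruction, so it cannot be silently dropped and must be handled explicitly during lowering.

```llvm
; Hypothetical target intrinsic that passes the tagged value through,
; so the tag travels on the use-def chain instead of as metadata.
declare i32 @llvm.mytag(i32)

define i32 @f(i32 %a, i32 %b) {
  %sum = add i32 %a, %b
  ; The backend pattern-matches this call during instruction selection
  ; and emits the annotated machine instruction.
  %tagged = call i32 @llvm.mytag(i32 %sum)
  ret i32 %tagged
}
```

The trade-off is that such a call is opaque to generic combines, a point discussed at length in this thread.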

-----Original Messages-----
From:"Son Tuan VU via llvm-dev" <llvm...@lists.llvm.org>
Sent Time:2020-07-22 20:31:18 (Wednesday)
To: llvm-dev <llvm...@lists.llvm.org>
Cc:
Subject: [llvm-dev] Metadata in LLVM back-end

David Greene via llvm-dev

Jul 27, 2020, 1:11:55 PM
to Son Tuan VU, llvm-dev

I have wanted codegen metadata for a very long time so I'm interested to
hear the history behind this choice, and more importantly, whether
adding such capability would be generally acceptable to the community.

-David
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chris Lattner via llvm-dev

Jul 27, 2020, 9:00:20 PM
to David Greene, llvm-dev

> On Jul 27, 2020, at 10:11 AM, David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Son Tuan VU via llvm-dev <llvm...@lists.llvm.org> writes:
>
>> Currently metadata (other than debug info) can be attached to IR
>> instructions but disappears during DAG selection.
>>
>> My question is why we do not keep the metadata during code lowering and
>> then attach to MachineInstr, just as for IR instructions? Is there any
>> technical challenge, or is it only because nobody wants to do so?
>
> I have wanted codegen metadata for a very long time so I'm interested to
> hear the history behind this choice, and more importantly, whether
> adding such capability would be generally acceptable to the community.

The first questions need to be “what does it mean?”, “how does it work?”, and “what is it useful for?”. It is hard to evaluate a proposal without that.

Metadata isn’t free - it must be maintained or invalidated for it to be useful. The details on that dramatically shape whether it can be used for any given purpose.

-Chris

Lorenzo Casalino via llvm-dev

Jul 29, 2020, 3:33:34 AM
to llvm...@lists.llvm.org
> On Jul 27, 2020, at 10:11 AM, David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
>
>> Son Tuan VU via llvm-dev <llvm...@lists.llvm.org> writes:
>>
>>> Currently metadata (other than debug info) can be attached to IR
>>> instructions but disappears during DAG selection.
>>>
>>> My question is why we do not keep the metadata during code lowering and
>>> then attach to MachineInstr, just as for IR instructions? Is there any
>>> technical challenge, or is it only because nobody wants to do so?
>>
>> I have wanted codegen metadata for a very long time so I'm interested to
>> hear the history behind this choice, and more importantly, whether
>> adding such capability would be generally acceptable to the community.
>
> The first questions need to be “what does it mean?”, “how does it work?”, and “what is it useful for?”.  It is hard to evaluate a proposal without that.

Hi everyone,

I'll try to answer each of these questions; the answers likely won't be
exhaustive, but I hope they will serve as a starting point for an interesting
proposal (from my point of view and that of Son Tuan VU and David Greene):

- "What does it mean?": it means preserving specific information, represented as
  metadata assigned to instructions, from the IR level down to the codegen phases.

- "How does it work?": metadata should be preserved across the several
   back-end transformations; for instance, during the lowering phase, DAGCombine
   performs several optimizations on the IR, potentially combining several
   instructions. The new instruction should then be assigned metadata obtained
   as a proper combination of the original ones (e.g., a union of the metadata
   information).

   It might be possible to have a dedicated data structure for such metadata info,
   and an instance of such structure assigned to each instruction.

- "What is it useful for?": I think it is quite context-specific; but,
  in general, it is useful when some "higher-level" information (e.g.,
  information that can be discovered only before the back-end stage of
  the compiler) is required in the back-end to perform "semantic"-related
  optimizations.

To give a (quite generic) example where such codegen metadata may be useful: in the field
of "secure compilation", preservation of security properties during the compilation
phases is essential; such properties are specified in the high-level specification of
the program, and may be expressed with IR metadata. The possibility to keep such IR
metadata in the codegen phases may allow preservation of properties that would otherwise
be invalidated by codegen phases.
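A sketch of the merging rule described under "How does it work?" (the `!mytag` metadata kind is hypothetical): if a combine folds the `trunc` into the store as a truncating store, a metadata-aware combiner would attach the union of the two tags to the resulting operation.

```llvm
define void @g(i32 %v, i16* %p) {
  ; Two tagged instructions that a combine may fold together:
  %t = trunc i32 %v to i16, !mytag !0
  store i16 %t, i16* %p, !mytag !1
  ret void
}
!0 = !{!"A"}
!1 = !{!"B"}
; After folding into a single truncating store (an ISel/MIR-level node),
; the combined operation would carry the union: !{!"A", !"B"}.
```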


Cheers,
-- Lorenzo

David Greene via llvm-dev

Jul 31, 2020, 4:48:10 PM
to Lorenzo Casalino, llvm...@lists.llvm.org
Thanks for keeping this going, Lorenzo.

Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:

>> The first questions need to be “what does it mean?”, “how does it
>> work?”, and “what is it useful for?”. It is hard to evaluate a
>> proposal without that.
>
> Hi everyone,
>

> - "What does it mean?": it means to preserve specific information,
> represented as   metadata assigned to instructions, from the IR level,
> down to the codegen phases.

An important part of the definition is "how late?" For my particular
uses it would be right up until lowering of asm pseudo-instructions,
even after regalloc and scheduling. I don't know whether someone might
need metadata even later than that (at asm/obj emission time?) but if
metadata is supported on Machine IR then it shouldn't be an issue.

As with IR-level metadata, there should be no guarantee that metadata is
preserved; it's a best-effort thing. In other words, relying on
metadata for correctness is probably not the thing to do.

> - "How does it work?": metadata should be preserved during the several
>    back-end transformations; for instance, during the lowering phase,
>    DAGCombine performs several optimization to the IR, potentially
>    combining several instructions. The new instruction should, then,
>    assigned with metadata obtained as a proper combination of the
>    original ones (e.g., a union of metadata information).

I want to make it clear that this is expensive to do, in that the number
of changes to the codegen pipeline is quite extensive and widespread. I
know because I've done it*. :) It will help if there are utilities
people can use to merge metadata during DAG transformation and the more
we make such transfers and combinations "automatic" the easier it will
be to preserve metadata.

Once the mechanisms are there it also takes effort to keep them going.
For example if a new DAG transformation is done people need to think
about metadata. This is where "automatic" help makes a real difference.

* By "it" I mean communicate information down to late phases of codegen.
I don't have a "metadata in codegen" patch as such. I simply cobbled
something together in our downstream fork that works for some very
specific use-cases.

>    It might be possible to have a dedicated data-structure for such
>    metadata info, and an instance of such structure assigned to each
>    instruction.

I'm not entirely sure what you mean by this.

> - "What is it useful for?": I think it is quite context-specific; but,
>   in general, it is useful when some "higher-level" information
>   (e.g., that can be discovered only before the back-end stage of the
>   compiler) are required in the back-end to perform "semantic"-related
>   optimizations.

That's my use-case. There's semantic information codegen would like to
know but is really much more practical to discover at the LLVM IR level
or even passed from the frontend. Much information is lost by the time
codegen is hit and it's often impractical or impossible for codegen to
derive it from first principles.

> To give an (quite generic) example where such codegen metadata may be
> useful: in the field of "secure compilation", preservation of security
> properties during the compilation phases is essential; such properties
> are specified in the high-level specifications of the program, and may
> be expressed with IR metadata. The possibility to keep such IR
> metadata in the codegen phases may allow preservation of properties
> that may be invalidated by codegen phases.

That's a great use-case. I do wonder about your use of "essential"
though. Is it needed for correctness? If so an intrinsics-based
solution may be better.

My use-cases mostly revolve around communication with a proprietary
frontend and thus aren't useful to the community, which is why I haven't
pursued this with any great vigor before this.

I do have uses that convey information from LLVM analyses but
unfortunately I can't share them for now.

All of my use-cases are related to optimization. No "metadata" is
needed for correctness.

I have pondered whether intrinsics might work for my use-cases. My fear
with intrinsics is that they will interfere with other codegen analyses
and transformations. For example they could be a scheduling barrier.

I also have wondered about how intrinsics work within SelectionDAG. Do
they impact dagcombine and other transformations? The reason I call out
SelectionDAG specifically is that most of our downstream changes related
to conveying information are in DAG-related files (dagcombine, legalize,
etc.). Perhaps intrinsics could suffice for the purposes of getting
metadata through SelectionDAG with conversion to "first-class" metadata
at the Machine IR level. Maybe this is even an intermediate step toward
"full metadata" throughout the compilation.

-David

Chris Lattner via llvm-dev

Aug 2, 2020, 3:38:03 PM
to Lorenzo Casalino, llvm...@lists.llvm.org
Thanks Lorenzo,

I was looking for a ‘one level deeper’ analysis of how this works.

The issue is this: either information is preserved across certain sorts of transformations or it is not.  If not, it either goes stale (problematic for anything that looks at it later) or is invalidated/removed.

The fundamental issue in IR design is factoring the representation of information from the code that needs to inspect and update it.  “Metadata” designs try to make it easy to add out of band information to the IR in various ways, with a goal of reducing the impact on the rest of the compiler.

However, I’ve never seen them work out well.  Either the data becomes stale, or you end up changing a lot of the compiler to support it.  Look at debug info metadata in LLVM for example, it has both problems :-).  This is why MLIR has moved to make source location information and attributes a first class part of the IR.

-Chris



Lorenzo Casalino via llvm-dev

Aug 6, 2020, 10:47:29 AM
to David Greene, llvm...@lists.llvm.org, clat...@nondot.org

On 31/07/20 at 22:47, David Greene wrote:

@David


> Thanks for keeping this going, Lorenzo.
>
> Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:
>
>>> The first questions need to be “what does it mean?”, “how does it
>>> work?”, and “what is it useful for?”. It is hard to evaluate a
>>> proposal without that.
>> Hi everyone,
>>
>> - "What does it mean?": it means to preserve specific information,
>> represented as   metadata assigned to instructions, from the IR level,
>> down to the codegen phases.
> An important part of the definition is "how late?" For my particular
> uses it would be right up until lowering of asm pseudo-instructions,
> even after regalloc and scheduling. I don't know whether someone might
> need metadata even later than that (at asm/obj emission time?) but if
> metadata is supported on Machine IR then it shouldn't be an issue.

"How late" is context-specific: even in my case, I required such information
to be preserved until pseudo-instruction expansion. Conservatively, it could
be preserved until the last pass of the codegen pipeline.

Regarding their employment in the later steps, I would not say they are not
required: I worked on a specific topic of secure compilation, and I do not
have the whole picture in mind; nonetheless, it would be possible to test how
things work out with the codegen and later reason on future developments.

> As with IR-level metadata, there should be no guarantee that metadata is
> preserved and that it's a best-effort thing. In other words, relying on
> metadata for correctness is probably not the thing to do.

Ok, I made a mistake stating that metadata should be *preserved*; what
I really meant is to preserve the *information* that such metadata
represents.


>> - "How does it work?": metadata should be preserved during the several
>>    back-end transformations; for instance, during the lowering phase,
>>    DAGCombine performs several optimization to the IR, potentially
>>    combining several instructions. The new instruction should, then,
>>    assigned with metadata obtained as a proper combination of the
>>    original ones (e.g., a union of metadata information).
> I want to make it clear that this is expensive to do, in that the number
> of changes to the codegen pipeline is quite extensive and widespread. I
> know because I've done it*. :) It will help if there are utilities
> people can use to merge metadata during DAG transformation and the more
> we make such transfers and combinations "automatic" the easier it will
> be to preserve metadata.
>
> Once the mechanisms are there it also takes effort to keep them going.
> For example if a new DAG transformation is done people need to think
> about metadata. This is where "automatic" help makes a real difference.
>
> * By "it" I mean communicate information down to late phases of codegen.
> I don't have a "metadata in codegen" patch as such. I simply cobbled
> something together in our downstream fork that works for some very
> specific use-cases.

I know what you have been through, and I can only agree with you: for the
project I mentioned above, I had to perform several changes to the whole IR
lowering phase in order to correctly propagate high-level information; it
wasn't cheap and required a lot of effort.


>>    It might be possible to have a dedicated data-structure for such
>>    metadata info, and an instance of such structure assigned to each
>>    instruction.
> I'm not entirely sure what you mean by this.

I was imagining a per-instruction data-structure collecting metadata info
related to that specific instruction, instead of having several metadata info
directly embedded in each instruction.

>> - "What is it useful for?": I think it is quite context-specific; but,
>>   in general, it is useful when some "higher-level" information
>>   (e.g., that can be discovered only before the back-end stage of the
>>   compiler) are required in the back-end to perform "semantic"-related
>>   optimizations.
> That's my use-case. There's semantic information codegen would like to
> know but is really much more practical to discover at the LLVM IR level
> or even passed from the frontend. Much information is lost by the time
> codegen is hit and it's often impractical or impossible for codegen to
> derive it from first principles.
>
>> To give an (quite generic) example where such codegen metadata may be
>> useful: in the field of "secure compilation", preservation of security
>> properties during the compilation phases is essential; such properties
>> are specified in the high-level specifications of the program, and may
>> be expressed with IR metadata. The possibility to keep such IR
>> metadata in the codegen phases may allow preservation of properties
>> that may be invalidated by codegen phases.
> That's a great use-case. I do wonder about your use of "essential"
> though.

By *essential* I mean fundamental for satisfying a specific target
security property.


> Is it needed for correctness? If so an intrinsics-based
> solution may be better.

Uhm... it might sound like a naive question, but what do you mean by
*correctness*?


> My use-cases mostly revolve around communication with a proprietary
> frontend and thus aren't useful to the community, which is why I haven't
> pursued this with any great vigor before this.
>
> I do have uses that convey information from LLVM analyses but
> unfortunately I can't share them for now.
>
> All of my use-cases are related to optimization. No "metadata" is
> needed for correctness.

> I have pondered whether intrinsics might work for my use-cases. My fear
> with intrinsics is that they will interfere with other codegen analyses
> and transformations. For example they could be a scheduling barrier.
>
> I also have wondered about how intrinsics work within SelectionDAG. Do
> they impact dagcombine and other transformations? The reason I call out
> SelectionDAG specifically is that most of our downstream changes related
> to conveying information are in DAG-related files (dagcombine, legalize,
> etc.). Perhaps intrinsics could suffice for the purposes of getting
> metadata through SelectionDAG with conversion to "first-class" metadata
> at the Machine IR level. Maybe this is even an intermediate step toward
> "full metadata" throughout the compilation.

I employed intrinsics as a means of carrying metadata but, in my experience,
I am not sure they can serve as a valid alternative:

 - For each llvm-ir instruction employed in my project (e.g., store), a
   semantically equivalent intrinsic is declared, with particular parameters
   representing metadata (i.e., first-class metadata are represented by
   specific intrinsic parameters).

 - During the lowering, each ad-hoc intrinsic must be properly handled,
   manually adding the proper legalization operations, DAG combinations and
   so on.

 - During MIR conversion of the llvm-ir (i.e., mapping intrinsics to
   pseudo-instructions), metadata are passed to the MIR representation of
   the program.

In particular, the second point raises a critical problem in terms of
optimizations (e.g., intrinsic store + intrinsic trunc are not automatically
converted into an intrinsic truncated store). The backend must then be
instructed to perform such optimizations, which are already performed on
non-intrinsic instructions (e.g., store + trunc is already converted into a
truncated store).
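A sketch of that missed combine (all intrinsic names below are hypothetical): written as a plain `trunc` + `store`, the backend folds the pair into a truncating store for free; written as mirrored intrinsic calls, nothing matches, and every such fold has to be re-taught by hand.

```llvm
; Plain IR: the generic combiner already folds this pair into a single
; truncating store during legalization/ISel.
define void @plain(i32 %v, i16* %p) {
  %t = trunc i32 %v to i16
  store i16 %t, i16* %p
  ret void
}

; Mirrored-intrinsic IR: semantically the same, but no generic fold
; matches these calls, so the trunc + store combine must be re-taught.
declare i16 @llvm.my.trunc(i32, metadata)
declare void @llvm.my.store(i16, i16*, metadata)

define void @mirrored(i32 %v, i16* %p) {
  %t = call i16 @llvm.my.trunc(i32 %v, metadata !0)
  call void @llvm.my.store(i16 %t, i16* %p, metadata !0)
  ret void
}
!0 = !{!"secret"}
```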

Instead of re-inventing the wheel, and since the backend would nonetheless
have to be modified to support optimizations on intrinsics, I would rather
add some mechanism to support metadata attachment as a first-class element
of the IR/MIR, with automatic merging of metadata, for instance.

----

@Chris

I may be wrong (in that case, please correct me), but if I understood
correctly, source-level debugging metadata are "external" (i.e., not a
first-class element of the llvm-ir), and their management involves a
great effort.

As described above, in my project I used metadata as first-class
elements of the IR/MIR; I found this approach more immediate and simpler
to handle, although some passes and transformations must be modified.

I therefore agree with you that metadata info should be a first-class
element of the IR/MIR (or, at least, "packed" into a structure that is a
first-class part of the IR/MIR).

----

In any case, I wonder whether metadata at codegen level is actually
something the community at large would benefit from (thus justifying a
potentially huge and/or long series of patches), or something in which
only a small group would be interested.


Cheers
-- Lorenzo

David Greene via llvm-dev

Aug 7, 2020, 4:54:45 PM
to Lorenzo Casalino, llvm...@lists.llvm.org, clat...@nondot.org
Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:

>> As with IR-level metadata, there should be no guarantee that metadata is
>> preserved and that it's a best-effort thing. In other words, relying on
>> metadata for correctness is probably not the thing to do.

> Ok, I made a mistake stating that metadata should be *preserved*; what
> I really meant is to preserve the *information* that such metadata
> represent.

We do have one way of doing that now that's nearly foolproof in terms of
accidental loss: intrinsics. Intrinsics AFAIK are never just deleted
and have to be explicitly handled at some point. Intrinsics may not
work well for your use-case for a variety of reasons but they are an
option.

I'm mostly just writing this to get thoughts in my head organized. :)

>> * By "it" I mean communicate information down to late phases of codegen.
>> I don't have a "metadata in codegen" patch as such. I simply cobbled
>> something together in our downstream fork that works for some very
>> specific use-cases.

> I know what you have been through, and I can only agree with you: for
> the project I mentioned above, I had to perform several changes to the
> whole IR lowering phase in order to correctly propagate high-level
> information; it wasn't cheap and required a lot of effort.

I know your pain. :)

>>>    It might be possible to have a dedicated data-structure for such
>>>    metadata info, and an instance of such structure assigned to each
>>>    instruction.
>> I'm not entirely sure what you mean by this.
>
> I was imagining a per-instruction data-structure collecting metadata info
> related to that specific instruction, instead of having several metadata info
> directly embedded in each instruction.

Interesting. At the IR level metadata isn't necessarily unique, though
it can be made so. If multiple pieces of information were amalgamated
into one structure that might reduce the ability to share the in-memory
representation, which has a cost. I like the ability of IR metadata to
be very flexible while at the same time being relatively cheap in terms
of resource utilization.

I don't always like that IR metadata is not scoped. It makes it more
difficult to process the IR for a Function in isolation. But that's a
relatively minor quibble for me. It's a tradeoff between convenience
and resource utilization.

>> That's a great use-case. I do wonder about your use of "essential"
>> though.

> With *essential* I mean fundamental for satisfying a specific target
> security property.

>> Is it needed for correctness? If so an intrinsics-based solution
>> may be better.

> Uhm...it might sound as a naive question, but what do you mean with
> *correctness*?

I mean will the compiler generate incorrect code or otherwise violate
some contract. In your secure compilation example, if the compiler
*promises* that the generated code will be "secure" then that's a
contract that would be violated if the metadata were lost.

> I employed intrinsics as a mean for carrying metadata, but, by my
> experience, I am not sure they can be resorted as a valid alternative:
>
>  - For each llvm-ir instruction employed in my project (e.g., store),
>    a semantically equivalent intrinsic is declared, with particular
>    parameters representing metadata (i.e., first-class metadata are
>    represented by specific intrinsic's parameters).
>
>  - During the lowering, each ad-hoc intrinsic must be properly
>    handled, manually adding the proper legalization operations, DAG
>    combinations and so on.
>
>  - During MIR conversion of the llvm-ir (i.e., mapping intrinsics to
>    pseudo-instructions), metadata are passed to the MIR representation
>    of the program.
>
> In particular, the second point raises a critical problem in terms of
> optimizations (e.g., intrinsic store + intrinsic trunc are not
> automatically converted into an intrinsic truncated store). Then, the
> backend must be instructed to perform such optimizations, which are
> actually already performed on non-intrinsic instructions (e.g., store
> + trunc is already converted into a truncated store).

Gotcha. That certainly is a lot of burden. Do the intrinsics *have to*
mirror the existing instructions exactly or could a more generic
intrinsic be defined that took some data as an argument, for example a
pointer to a static string? Then each intrinsic instance could
reference a static string unique to its context.

I have not really thought this through, just throwing out ideas in a
devil's advocate sort of way.
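For what it's worth, upstream LLVM already has intrinsics in roughly this spirit: `llvm.annotation` and `llvm.ptr.annotation` pass the annotated value through together with pointers to static strings. A sketch of the static-string idea using that shape (check the LangRef for the exact current signature):

```llvm
; llvm.annotation passes the value through together with a pointer to a
; static string (plus a filename string and line number); the string can
; encode arbitrary context-specific data.
@.tag = private unnamed_addr constant [7 x i8] c"secret\00"

declare i32 @llvm.annotation.i32(i32, i8*, i8*, i32)

define i32 @f(i32 %x) {
  %tagged = call i32 @llvm.annotation.i32(i32 %x,
              i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.tag, i32 0, i32 0),
              i8* null, i32 0)
  ret i32 %tagged
}
```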

In my case using intrinsics would have to tie the intrinsic to the
instruction it is annotating. This seems similar to your use-case.
This is straightforward to do if everything is SSA but once we've gone
beyond that things get a lot more complicated. The mapping of
information to specific instructions really does seem like the most
difficult bit.

> Instead of re-inventing the wheel, and since the backend should be
> nonetheless modified in order to support optimizations on intrinsics,
> I would rather prefer to insert some sort of mechanism to support
> metadata attachment as first-class elements of the IR/MIR, and
> automatic merging of metadata, for instance.

Can you explain a bit more what you mean by "first-class?"

> In any case, I wonder if metadata at codegen level is actually a thing
> that the community would benefit (then, justifying a potentially huge
> and/or long serie of patches), or it is something in which only a
> small group would be interested in.

I would also like to know this. Have others found the need to convey
information down to codegen and if so, what approaches were considered
and tried?

Maybe this is a niche requirement but I really don't think it is. I
think it more likely that various hacks/modifications have been made
over the years to sufficiently approximate a desired outcome and that
this has led to not insignificant technical debt.

Or maybe I just think that because I've worked on a 40-year-old compiler
for my entire career. :)

-David

David Greene via llvm-dev

Aug 7, 2020, 5:09:23 PM
to Chris Lattner, Lorenzo Casalino, llvm...@lists.llvm.org
Chris Lattner via llvm-dev <llvm...@lists.llvm.org> writes:

> The issue is this: either information is preserved across certain
> sorts of transformations or it is not. If not, it either goes stale
> (problematic for anything that looks at it later) or is
> invalidated/removed.
>
> The fundamental issue in IR design is factoring the representation of
> information from the code that needs to inspect and update it.
> “Metadata” designs try to make it easy to add out of band information
> to the IR in various ways, with a goal of reducing the impact on the
> rest of the compiler.
>
> However, I’ve never seen them work out well. Either the data becomes
> stale, or you end up changing a lot of the compiler to support it.
> Look at debug info metadata in LLVM for example, it has both problems
> :-). This is why MLIR has moved to make source location information
> and attributes a first class part of the IR.

I basically agree with your analysis. Some information is so pervasive
that it really should be a part of the IR proper. But other information
may not be. The kind of information I'm thinking of basically boils
down to optimization hints. It's fine and semantically sound to drop
it, though not ideal if it can be avoided.

I see debug info as being in a quite different class. With the -g
option we are making a promise to our users. So using a mechanism that
by design doesn't make promises seems a poor fit.

A long long time ago in the dark ages before git and Phabricator I
submitted a patch for review that would have added comment information
to machine instructions. It was basically a string member on every
MachineInstr. At the time it was deemed too expensive and rightly so.
Instead I ended up adding some flag values that the AsmPrinter uses as a
hint to generate various comments. I'm still not very happy with that
"solution" and a more general-purpose mechanism for annotating
IR/SelectionDAG/MIR objects would be quite welcome.

A generic first-class annotation construct would cover both use-cases.
If you and the wider community are open to adding first-class generic
information annotation, I'm eager to work on it!

-David

Lorenzo Casalino via llvm-dev

Aug 18, 2020, 2:28:06 AM
to David Greene, llvm...@lists.llvm.org, clat...@nondot.org

On 07/08/20 at 22:54, David Greene wrote:

> Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:
>
>>> As with IR-level metadata, there should be no guarantee that metadata is
>>> preserved and that it's a best-effort thing. In other words, relying on
>>> metadata for correctness is probably not the thing to do.
>> Ok, I made a mistake stating that metadata should be *preserved*; what
>> I really meant is to preserve the *information* that such metadata
>> represent.
> We do have one way of doing that now that's nearly foolproof in terms of
> accidental loss: intrinsics. Intrinsics AFAIK are never just deleted
> and have to be explicitly handled at some point. Intrinsics may not
> work well for your use-case for a variety of reasons but they are an
> option.
>
> I'm mostly just writing this to get thoughts in my head organized. :)
The only problem with intrinsics, for me, was the need to mirror the
already existing instructions. As you pointed out, if there's a way to map
intrinsics to instructions, there would be no reason to mirror the latter,
and one could just use the former to carry metadata.

>>>>    It might be possible to have a dedicated data-structure for such
>>>>    metadata info, and an instance of such structure assigned to each
>>>>    instruction.
>>> I'm not entirely sure what you mean by this.
>> I was imagining a per-instruction data-structure collecting metadata info
>> related to that specific instruction, instead of having several metadata info
>> directly embedded in each instruction.
> Interesting. At the IR level metadata isn't necessarily unique, though
> it can be made so. If multiple pieces of information were amalgamated
> into one structure that might reduce the ability to share the in-memory
> representation, which has a cost. I like the ability of IR metadata to
> be very flexible while at the same time being relatively cheap in terms
> of resource utilization.
>
> I don't always like that IR metadata is not scoped. It makes it more
> difficult to process the IR for a Function in isolation. But that's a
> relatively minor quibble for me. It's a tradeoff between convenience
> and resource utilization.
>
Uhm... could I ask you to elaborate a bit more on the "limitation on
in-memory representation sharing"? It is not clear to me how this would
cause a problem.

>>> That's a great use-case. I do wonder about your use of "essential"
>>> though.
>> With *essential* I mean fundamental for satisfying a specific target
>> security property.
>>> Is it needed for correctness? If so an intrinsics-based solution
>>> may be better.
>> Uhm...it might sound as a naive question, but what do you mean with
>> *correctness*?
> I mean will the compiler generate incorrect code or otherwise violate
> some contract. In your secure compilation example, if the compiler
> *promises* that the generated code will be "secure" then that's a
> contract that would be violated if the metadata were lost.
You got the point: if metadata are not provided or are lost, the codegen
phase is not able to fulfill the contract (in my use case, generating
code that is "secure").

I like brainstorming ;)


>
> In my case using intrinsics would have to tie the intrinsic to the
> instruction it is annotating. This seems similar to your use-case.
> This is straightforward to do if everything is SSA but once we've gone
> beyond that things get a lot more complicated. The mapping of
> information to specific instructions really does seem like the most
> difficult bit.

No, intrinsics do not have to mirror existing instructions; yes, they
can be used just to carry around specific data as arguments. Nonetheless,
there we have our (implementation) problem: how to map info (e.g.,
intrinsics) to instructions, and vice versa?

I am really curious about how you would perform it in the pre-RA phase :)

>> Instead of re-inventing the wheel, and since the backend should be
>> nonetheless modified in order to support optimizations on intrinsics,
>> I would rather prefer to insert some sort of mechanism to support
>> metadata attachment as first-class elements of the IR/MIR, and
>> automatic merging of metadata, for instance.
> Can you explain a bit more what you mean by "first-class?"

Never mind, I used the wrong terminology: I just meant to directly
embed metadata in the IR/MIR.


>> In any case, I wonder if metadata at codegen level is actually something
>> that the community would benefit from (thus justifying a potentially huge
>> and/or long series of patches), or something in which only a
>> small group would be interested.
> I would also like to know this. Have others found the need to convey
> information down to codegen and if so, what approaches were considered
> and tried?
>
> Maybe this is a niche requirement but I really don't think it is. I
> think it more likely that various hacks/modifications have been made
> over the years to sufficiently approximate a desired outcome and that
> this has led to not insignificant technical debt.
>
> Or maybe I just think that because I've worked on a 40-year-old compiler
> for my entire career. :)
>
> -David


Best regards,
Lorenzo

David Greene via llvm-dev

Aug 19, 2020, 4:37:54 PM
to Lorenzo Casalino, llvm...@lists.llvm.org, clat...@nondot.org
Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:

>>> I was imagining a per-instruction data-structure collecting metadata info
>>> related to that specific instruction, instead of having several metadata info
>>> directly embedded in each instruction.

>> Interesting. At the IR level metadata isn't necessarily unique, though
>> it can be made so. If multiple pieces of information were amalgamated
>> into one structure that might reduce the ability to share the in-memory
>> representation, which has a cost.
>>

> Uhm...could I ask you to elaborate a bit more on the "limitation on
> in-memory representation sharing"? It is not clear to me how this
> would cause a problem.

I just mean that at the IR level, if you have a metadata node with, say,
a string "foo bar" and another one with "foo" and put one on an
instruction and the other on another instruction, they won't share an
in-memory representation, whereas if you had separate nodes with "foo"
and "bar" and put both on a single instruction and just "foo" on another
instruction the "foo" metadata would be shared.
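As a concrete sketch (the `!mynote` attachment kind and the node contents are hypothetical, purely for illustration):

```llvm
; Amalgamated: each instruction carries one combined node, and the two
; nodes ("foo bar" vs. "foo") are distinct in-memory objects.
%x = add i32 %a, %b, !mynote !0
%y = mul i32 %a, %b, !mynote !1
!0 = !{!"foo bar"}
!1 = !{!"foo"}

; Factored: both attachments reference the uniqued node !2, so the "foo"
; part is a single in-memory object shared by %p and %q.
%p = add i32 %a, %b, !mynote !3
%q = mul i32 %a, %b, !mynote !4
!2 = !{!"foo"}
!3 = !{!2, !"bar"}
!4 = !{!2}
```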

>> In my case using intrinsics would have to tie the intrinsic to the
>> instruction it is annotating. This seems similar to your use-case.
>> This is straightforward to do if everything is SSA but once we've gone
>> beyond that things get a lot more complicated. The mapping of
>> information to specific instructions really does seem like the most
>> difficult bit.

> No, intrinsics do not have to mirror existing instructions; yes,
> they can be used just to carry around specific data as arguments.
> Nonetheless, there we have our (implementation) problem: how to map
> info (e.g., intrinsics) to instructions, and vice versa?
>
> I am really curious about how you would perform it in the pre-RA phase :)

Pre-RA it's relatively easy as long as we're still in SSA. The
intrinsic would simply take the instruction it should annotate as an
operand. After SSA it obviously becomes more difficult. I don't have a
lot of good answers for that right now. The live range for the value
defined by the annotated instruction and used by the intrinsic would
contain both instructions, so maybe that could be used to connect them.
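In IR terms, that SSA association might look like this (the intrinsic name and signature are invented for illustration; it is not an existing LLVM intrinsic):

```llvm
; The operand %v ties the annotation to its defining instruction: while
; the code is in SSA form, following the use-def chain from the call's
; first operand recovers the annotated instruction.
%v = add i32 %a, %b
call void @llvm.annotation.hypothetical(i32 %v, metadata !0)

!0 = !{!"secure"}
```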

If the annotated instruction doesn't have an output value (like a store
on machine architectures) you would use the chain output in SelectionDAG
but there's no analogue in the MachineInstr representation.

-David

Lorenzo Casalino via llvm-dev

Aug 31, 2020, 4:01:22 AM
to David Greene, llvm...@lists.llvm.org, clat...@nondot.org

On 19/08/20 at 22:37, David Greene wrote:

> Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:
>
>>>> I was imagining a per-instruction data-structure collecting metadata info
>>>> related to that specific instruction, instead of having several metadata info
>>>> directly embedded in each instruction.
>>> Interesting. At the IR level metadata isn't necessarily unique, though
>>> it can be made so. If multiple pieces of information were amalgamated
>>> into one structure that might reduce the ability to share the in-memory
>>> representation, which has a cost.
>>>
>> Uhm...could I ask you to elaborate a bit more on the "limitation on
>> in-memory representation sharing"? It is not clear to me how this
>> would cause a problem.
> I just mean that at the IR level, if you have a metadata node with, say,
> a string "foo bar" and another one with "foo" and put one on an
> instruction and the other on another instruction, they won't share an
> in-memory representation, whereas if you had separate nodes with "foo"
> and "bar" and put both on a single instruction and just "foo" on another
> instruction the "foo" metadata would be shared.
>
But isn't that an implementation aspect? I mean, you can have metadata
nodes whose members are pointers; if two nodes have to share the same
member instance, they can share the same pointer.

After all, even when two instructions refer to a structurally equivalent
Constant object
(https://llvm.org/doxygen/classllvm_1_1Constant.html#details),
they actually share a pointer to the same Constant object.
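For example, uniquing of constants means the operands below are literally the same object in memory (a minimal sketch):

```llvm
; Both instructions use the structurally equivalent constant 42; LLVM
; uniques ConstantInt, so both operands point to one ConstantInt object.
%x = add i32 %a, 42
%y = mul i32 %b, 42
```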

> Pre-RA it's relatively easy as long as we're still in SSA. The
> intrinsic would simply take the instruction it should annotate as an
> operand. After SSA it obviously becomes more difficult. I don't have a
> lot of good answers for that right now. The live range for the value
> defined by the annotated instruction and used by the intrinsic would
> contain both instructions so maybe that could be used to connect them.
>
> If the annotated instruction doesn't have an output value (like a store
> on machine architectures) you would use the chain output in SelectionDAG
> but there's no analogue in the MachineInstr representation.

The usage of intrinsics as wrappers for instructions to be annotated is a
really nice idea! Although this would require instructing almost all
passes of the codegen pipeline to skip them (which, for instance, is
already done for llvm.dbg.* intrinsics).

Nonetheless, although I like the idea, without a strategy to track
output-less MachineInstructions, it won't go really far :(

Furthermore, after register allocation there is a non-negligible effort
to properly annotate instructions which share the same output register...

Concerning the usage of live ranges to tie annotated instruction and
intrinsic, I have some doubts:

 1. After register allocation, since metadata intrinsics are skipped
    (otherwise, they would be involved in the register allocation process,
    increasing the register pressure), the instruction stream would present
    both virtual and physical registers, which I am not sure is totally ok.

 2. Is liveness information still available after register allocation?
    Assuming a positive answer, live intervals may be split due to register
    allocation, making the connection between intrinsic and annotated
    instruction really difficult.

An enumeration of the MachineInstructions, preserved through the codegen
passes, would allow the creation of a 1:1 map between intrinsic and
annotated instruction; but, unfortunately, there seems to be no such
enumeration in LLVM (maybe SlotIndexes might be used in a creative way).


Sorry for the long delay!

-- Lorenzo

David Greene via llvm-dev

Aug 31, 2020, 8:11:03 AM
to Lorenzo Casalino, llvm...@lists.llvm.org, clat...@nondot.org
Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:

>> If the annotated instruction doesn't have an output value (like a store
>> on machine architectures) you would use the chain output in SelectionDAG
>> but there's no analogue in the MachineInstr representation.

> The usage of intrinsics as wrappers for instructions to be annotated is
> a really nice idea! Although this would require instructing almost all
> passes of the codegen pipeline to skip them (which, for instance, is
> already done for llvm.dbg.* intrinsics).

It's not free, certainly.

> Nonetheless, although I like the idea, without a strategy to track
> output-less MachineInstructions, it won't go really far :(

Agreed. There are probably ways to hack it in, but true metadata would
be much better.

> Furthermore, after register allocation there is a non-negligible effort
> to properly annotate instructions which share the same output register...
>
> Concerning the usage of the live ranges to tie annotated instruction and
> intrinsic, I have some doubts:
>
>  1. After register allocation, since metadata intrinsics are skipped
> (otherwise, they would be involved in the register allocation process,
> increasing the register pressure), the instruction stream would present
> both virtual and physical registers, which I am not sure is totally ok.

They would have to participate in register allocation. I think the only
downside would be an intrinsic that artificially extends the live range
of a value by using it past its true dead point, either because the use
really is the "last" one or because it fills a "hole" in the live range
that otherwise would exist (for example a use in one of the if-then-else
branches that would otherwise not exist).

If the intrinsics really shadow "real" instructions then it should be
possible to place them such that this is not an issue; for example, you
could place them immediately before the "real" instruction.

It's possible they could introduce extra spills and reloads, in that if
a value is spilled it would be reloaded before the intrinsic. If the
intrinsic were placed immediately before the "real" instruction then the
reload would very likely be re-used for the "real" instruction so this
is probably not an issue in practice.

>  2. Is liveness information still available after register allocation?
> Assuming a positive answer, live intervals may be split due to register
> allocation, making the connection between intrinsic and annotated
> instruction really difficult.

Intervals are available post-RA. They still contain information about
defs, so it is *possible* to track things back, though the information
tends to degrade.

> An enumeration of the MachineInstructions, which is preserved through
> the codegen passes, would allow the creation of a 1:1 map between
> intrinsic and annotated instruction; but, unfortunately, there seems
> to be no such enumeration in LLVM (maybe SlotIndexes might be used in
> a creative way).

Yeah, SlotIndexes are what is used in the live ranges.

> Sorry for the long delay!

No problem. It's good to hash these things out and identify areas of
weakness that metadata could fill.

Lorenzo Casalino via llvm-dev

Sep 7, 2020, 4:26:15 AM
to David Greene, llvm...@lists.llvm.org, clat...@nondot.org

On 31/08/20 at 14:10, David Greene wrote:

> Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:
>
>> Furthermore, after register allocation there is a non-negligible effort
>> to properly annotate instructions which share the same output register...
>>
>> Concerning the usage of the live ranges to tie annotated instruction and
>> intrinsic, I have some doubts:
>>
>>  1. After register allocation, since metadata intrinsics are skipped
>> (otherwise, they would be involved in the register allocation process,
>> increasing the register pressure), the instruction stream would present
>> both virtual and physical registers, which I am not sure is totally ok.
> They would have to participate in register allocation.
Should they? I mean: the register allocation "simply" creates a map
(VirtReg -> PhysReg), and actual register rewriting takes place in a
subsequent machine pass.

So, we could avoid their participation in register allocation, reducing
register pressure and spill/reload work. As a downside, we would have
intrinsics with virtual registers as outputs, but that is not a problem,
since they do not perform any real computation.


> I think the only
> downside would be an intrinsic that artificially extends the live range
> of a value by using it past its true dead point, either because the use
> really is the "last" one or because it fills a "hole" in the live range
> that otherwise would exist (for example a use in one of the if-then-else
> branches that would otherwise not exist).
>
> If the intrinsics really shadow "real" instructions then it should be
> possible to place them such that this is not an issue; for example, you
> could place them immediately before the "real" instruction.

I do not think this would be possible: before register allocation, code is
in SSA form, thus the annotated instruction *must* precede the intrinsic
annotating it. An alternative is to place the annotating intrinsic before
the instruction that ends the specific live-range (not necessarily as an
immediate predecessor).

Just to point out a problem to cope with: instruction scheduling must be
aware of this particular positioning of annotation intrinsics.

> It's possible they could introduce extra spills and reloads, in that if
> a value is spilled it would be reloaded before the intrinsic. If the
> intrinsic were placed immediately before the "real" instruction then the
> reload would very likely be re-used for the "real" instruction so this
> is probably not an issue in practice.

Yes, I agree


Kind regards,
-- Lorenzo

David Greene via llvm-dev

Sep 8, 2020, 11:58:56 AM
to Lorenzo Casalino, llvm...@lists.llvm.org, clat...@nondot.org
Lorenzo Casalino <lorenzo.c...@gmail.com> writes:

> On 31/08/20 at 14:10, David Greene wrote:
>> Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:
>>
>>> Furthermore, after register allocation there is a non-negligible effort
>>> to properly annotate instructions which share the same output register...
>>>
>>> Concerning the usage of the live ranges to tie annotated instruction and
>>> intrinsic, I have some doubts:
>>>
>>>  1. After register allocation, since metadata intrinsics are skipped
>>> (otherwise, they would be involved in the register allocation process,
>>> increasing the register pressure), the instruction stream would present
>>> both virtual and physical registers, which I am not sure is totally ok.

>> They would have to participate in register allocation.

> Should they? I mean: the register allocation "simply" creates a map
> (VirtReg -> PhysReg), and actual register re-writing takes place in a
> subsequent machine pass.

Maybe they could be skipped? I don't know if there's any precedent for
that.

> So, we could avoid their participation in register allocation,
> reducing register pressure and spill/reload work. As a downside, we
> would have intrinsics with virtual registers as outputs, but it is not
> a problem, since they do not perform any real computation.

If we can get that to work, yes I guess having no-op intrinsics with
virtual registers would be ok. I don't know how the backend post-RA
would cope with that though. There might be lots of asserts that assume
physical registers.

>> If the intrinsics really shadow "real" instructions then it should be
>> possible to place them such that this is not an issue; for example, you
>> could place them immediately before the "real" instruction.
>
> I do not think this would be possible: before register allocation, code is
> in SSA form, thus the annotated instruction *must* precede the intrinsic
> annotating it.

Oh yes of course. Duh. :)

> An alternative is to place the annotating intrinsic before the
> instruction that ends the specific live-range (not necessarily an
> immediate predecessor).

I'm not sure exactly what you mean, but it strikes me just now that if
the intrinsic is connected to the target instruction via the target
instruction's output value, then putting the intrinsic right after the
target instruction should not have any live range issues, unless the
target instruction were truly dead, in which case the intrinsic would
keep it alive. But since the intrinsic would eventually go away, I
assume we could eliminate the target instruction at the same time.

If the target instruction output is used *somewhere* it has a live range
and adding another use just after the def should not affect register
allocation appreciably. It could of course affect spill choice
heuristics like number of uses of a value but that's probably in the
noise.
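In (pseudo-)MIR, the placement sketched above might look like this (ANNOTATE is a hypothetical annotation pseudo-instruction, shown only to illustrate the idea):

```llvm
%res = t2ANDrr %a, %b      ; annotated instruction (def of %res)
ANNOTATE %res, !0          ; hypothetical: an extra use right after the def
; ...
t2STRi12 %res, %slot       ; original last use; the live range of %res
                           ; already covers the ANNOTATE, so allocation
                           ; is barely affected
```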

It could, however, affect folding (e.g. mem operands) because a single
use of a load would turn into two uses, preventing folding. It's not
clear to me whether you would *want* folding in your use-case since you
apparently need to do something special with the load anyway.

> Just to point out a problem to cope with: instruction scheduling must be
> aware of this particular positioning of annotation intrinsics.

Probably true. This is a difficult problem, one I have dealt with. If
you want to keep two instructions "close" during scheduling it is a real
pain. ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
and difficult to maintain in the presence of upstream churn. My initial
attempt to avoid the need for codegen metadata took this approach and it
was quite infeasible. My second approach to hack in the information in
other ways wasn't much more successful. :(

I think we've uncovered a number of tricky issues when trying to encode
metadata via intrinsics. To me, at least, they clearly point to the
need for a first-class solution and I think you agree with that too.
Chris also seemed to at least give tentative support to the idea.

I wonder if we're at the point of drafting an initial RFC for review.

-David

Lorenzo Casalino via llvm-dev

Sep 15, 2020, 5:31:49 AM
to David Greene, llvm...@lists.llvm.org, clat...@nondot.org

On 08/09/20 at 17:57, David Greene wrote:

> Lorenzo Casalino <lorenzo.c...@gmail.com> writes:
>
>> On 31/08/20 at 14:10, David Greene wrote:
>>> Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:
>>>
>>>> Furthermore, after register allocation there is a non-negligible effort
>>>> to properly annotate instructions which share the same output register...
>>>>
>>>> Concerning the usage of the live ranges to tie annotated instruction and
>>>> intrinsic, I have some doubts:
>>>>
>>>>  1. After register allocation, since metadata intrinsics are skipped
>>>> (otherwise, they would be involved in the register allocation process,
>>>> increasing the register pressure), the instruction stream would present
>>>> both virtual and physical registers, which I am not sure is totally ok.
>>> They would have to participate in register allocation.
>> Should they? I mean: the register allocation "simply" creates a map
>> (VirtReg -> PhysReg), and actual register re-writing takes place in a
>> subsequent machine pass.
> Maybe they could be skipped? I don't know if there's any precedent for
> that.
I think that they could be neglected, since they just carry information;
there's no point in allocating physical registers for their unused output.

>> So, we could avoid their participation in register allocation,
>> reducing register pressure and spill/reload work. As a downside, we
>> would have intrinsics with virtual registers as outputs, but it is not
>> a problem, since they do not perform any real computation.
> If we can get that to work, yes I guess having no-op intrinsics with
> virtual registers would be ok. I don't know how the backend post-RA
> would cope with that though. There might be lots of asserts that assume
> physical registers.

Yes, if I recall correctly, there are a lot of checks of the register
type.

>>> If the intrinsics really shadow "real" instructions then it should be
>>> possible to place them such that this is not an issue; for example, you
>>> could place them immediately before the "real" instruction.
>> I do not think this would be possible: before register allocation, code is
>> in SSA form, thus the annotated instruction *must* precede the intrinsic
>> annotating it.
> Oh yes of course. Duh. :)
>
>> An alternative is to place the annotating intrinsic before the
>> instruction that ends the specific live-range (not necessarily an
>> immediate predecessor).
> I'm not sure exactly what you mean

I mean, to avoid artificial extension of the live-range, place the
annotating intrinsic (I) before the instruction (K) that kills the
live-range (but the intrinsic (I) does not have to be an *immediate*
predecessor of (K) in the instruction stream).

For instance, assume we have the following SSA stream (I am using ARM
Thumb2 MIR, since I've been working mainly on that backend):
 #i %res = t2ANDrr %src_1_i, %src_2_i
    ...
 #j %null = llvm.metadata %a, (some metadata)
    ...
 #l %c = t2STRi12 %res, %stack_slot_res

Where instruction #l kills the live-range representing %res, and
instruction #j is covered by the live-range of %res, which spans from
#i to #l.

Giving a total ordering to the stream of instructions, #i <= #j <= #l.
As you can infer, the intrinsic represented by instruction #j does not
have to be an immediate predecessor of #l (that is, there can exist an
instruction #k such that #j < #k < #l).

In this way, the live-range won't be extended (at least, in this trivial
case...)

> but it strikes me just now that if
> the intrinsic is connected to the target instruction via the target
> instruction's output value, then putting the intrinsic right after the
> target instruction should not have any live range issues, unless the
> target instruction were truly dead, in which case the intrinsic would
> keep it alive. But since the intrinsic would eventually go away, I
> assume we could eliminate the target instruction at the same time.
>
> If the target instruction output is used *somewhere* it has a live range
> and adding another use just after the def should not affect register
> allocation appreciably.

Yes! :D


> It could of course affect spill choice
> heuristics like number of uses of a value but that's probably in the
> noise.
>
> It could, however, affect folding (e.g. mem operands) because a single
> use of a load would turn into two uses, preventing folding. It's not
> clear to me whether you would *want* folding in your use-case since you
> apparently need to do something special with the load anyway.

Uhm... yes, folding requires particular attention; but, in my project, I
avoided the problem by "disabling" folding, so I didn't really care much
about that aspect.


>> Just to point out a problem to cope with: instruction scheduling must be
>> aware of this particular positioning of annotation intrinsics.
> Probably true. This is a difficult problem, one I have dealt with. If
> you want to keep two instructions "close" during scheduling it is a real
> pain. ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
> and difficult to maintain in the presence of upstream churn. My initial
> attempt to avoid the need for codegen metadata took this approach and it
> was quite infeasible. My second approach to hack in the information in
> other ways wasn't much more successful. :(

It is just an idea, but could MI Bundles be profitably employed?


> I think we've uncovered a number of tricky issues when trying to encode
> metadata via intrinsics. To me, at least, they clearly point to the
> need for a first-class solution and I think you agree with that too.
> Chris also seemed to at least give tentative support to the idea.

Yep!


> I wonder if we're at the point of drafting an initial RFC for review.

Uh, this is a good question. To be honest, it would be the first time for
me. For sure, we could start by pinpointing the main problems and
challenges -- the ones we identified -- that the employment of intrinsics
would face.

David Greene via llvm-dev

Sep 15, 2020, 10:58:22 AM
to Lorenzo Casalino, llvm...@lists.llvm.org, clat...@nondot.org
Lorenzo Casalino <lorenzo.c...@gmail.com> writes:

>>> An alternative is to place the annotating intrinsic before the
>>> instruction that ends the specific live-range (not necessarily an
>>> immediate predecessor).
>>
>> I'm not sure exactly what you mean
>
> I mean, to avoid artificial extension of the live-range, place the
> annotating intrinsic (I) before the instruction (K) that kills the
> live-range (but the intrinsic (I) does not have to be an *immediate*
> predecessor of (K) in the instruction stream).

Ok, got it. Thanks!

>> It could, however, affect folding (e.g. mem operands) because a single
>> use of a load would turn into two uses, preventing folding. It's not
>> clear to me whether you would *want* folding in your use-case since you
>> apparently need to do something special with the load anyway.
>
> Uhm... yes, folding requires particular attention; but, in my project, I
> avoided the problem by "disabling" folding, so I didn't really care much
> about that aspect.

That makes sense for your project but it is another case of intrinsics
causing problems for general use.

>>> Just to point out a problem to cope with: instruction scheduling must be
>>> aware of this particular positioning of annotation intrinsics.
>>
>> Probably true. This is a difficult problem, one I have dealt with. If
>> you want to keep two instructions "close" during scheduling it is a real
>> pain. ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
>> and difficult to maintain in the presence of upstream churn. My initial
>> attempt to avoid the need for codegen metadata took this approach and it
>> was quite infeasible. My second approach to hack in the information in
>> other ways wasn't much more successful. :(
>
> It is just an idea, but could MI Bundles be profitably employed?

Possibly. Those didn't exist when I did my work.

>> I wonder if we're at the point of drafting an initial RFC for review.
>
> Uh, this is a good question. To be honest, it would be the first time
> for me. For sure, we could start by pinpointing the main problems and
> challenges -- the ones we identified -- that the employment of intrinsics
> would face.

That's the place to start, I think. Gather a list of requirements/use
cases along with the challenges we've discussed. Then it's a matter of
engineering a solution that fulfills the requirements while hitting as
few of the challenges as possible. Let's start by simply gathering some
lists. I'll take a quick stab and you and others can add to/edit it.

Requirements
------------
- Convey information not readily available in existing IR constructs to
very late-stage codegen (after regalloc/scheduling, right through
asm/object emission)

- Flexible format - it should be as simple as possible to express the
desired information while minimizing changes to APIs

- Preserve information by default, only drop if explicitly told (I'm
trying to capture the requirements for your use-case here and this
differs from IR-level metadata)

- No bifurcation between "well-known"/"built-in" information and things
added later/locally

- Should not impact compile time excessively (what is "excessive"?)

Challenges of using intrinsics and other alternatives
-----------------------------------------------------
- Post-SSA annotation/how to associate intrinsics with
instructions/registers/types

- Instruction selection fallout (inhibiting folding, etc.)

- Register allocation impacts (extending live ranges, etc.)

- Scheduling challenges (ensuring intrinsics can be found
post-scheduling, etc.)

- Extending existing constructs (which ones?) requires hard-coding
aspects of information, reducing flexibility

This is currently rather weaselly worded, because I didn't want to impose
too many restrictions right off the bat.

Lorenzo Casalino via llvm-dev

Oct 10, 2020, 7:13:23 AM
to David Greene, llvm...@lists.llvm.org, clat...@nondot.org
> That's the place to start, I think. Gather a list of requirements/use
> cases along with the challenges we've discussed. Then it's a matter of
> engineering a solution that fulfills the requirements while hitting as
> few of the challenges as possible. Let's start by simply gathering some
> lists. I'll take a quick stab and you and others can add to/edit it.
>
> Requirements
> ------------
> - Convey information not readily available in existing IR constructs to
> very late-stage codegen (after regalloc/scheduling, right through
> asm/object emission)

I see this more as the GOAL of the RFC, rather than a requirement.

> - Flexible format - it should be as simple as possible to express the
> desired information while minimizing changes to APIs

I do not want to raise a philosophical discussion (although I would
find it quite interesting), but "flexible" does not necessarily mean
"simple".

We could split this requirement as:

- Flexible format - the format should be expressive enough to enable
  modeling of *virtually* any kind of information.

- Simple interface - expressing information and attaching it to MIR
  elements (e.g., instructions) should be "easy" (what does *easy* mean?)

> - Preserve information by default, only drop if explicitly told (I'm
> trying to capture the requirements for your use-case here and this
> differs from IR-level metadata)

What about giving end-users the possibility to define a custom default
policy, as well as the possibility to define different types of policies?

Further, we must cope with the combination of instructions: how should
the information associated with two instructions eligible for combination
be combined?

- Information transformation - the information associated with two
  instructions A and B, which are combined into an instruction C, should
  be properly transformed according to a user-specified policy.

  A default policy may be "assign both A's and B's information to C"
  (a gather-all/assign-all policy?)

> - No bifurcation between "well-known"/"built-in" information and things
> added later/locally

May I ask you to elaborate a bit more about this point?


> - Should not impact compile time excessively (what is "excessive"?)

Probably, such estimation should be performed on
 

What about the granularity level?

- Granularity level - metadata information should be attachable at
  different levels of granularity:

  - *Coarse*: MachineFunction level
  - *Medium*: MachineBasicBlock level
  - *Fine*:   MachineInstruction level

Clearly, there are other degrees of granularity and/or dimensions to be
considered (e.g., LiveInterval, MIBundles, Loops, ...).

> Challenges of using intrinsics and other alternatives
> -----------------------------------------------------
> - Post-SSA annotation/how to associate intrinsics with
> instructions/registers/types
>
> - Instruction selection fallout (inhibiting folding, etc.)
>
> - Register allocation impacts (extending live ranges, etc.)
>
> - Scheduling challenges (ensuring intrinsics can be found
> post-scheduling, etc.)
>
> - Extending existing constructs (which ones?) requires hard-coding
> aspects of information, reducing flexibility
>
> This is currently rather weaselly worded, because I didn't want to impose
> too many restrictions right off the bat.
>
> -David

Sorry for the long delay!

-- Lorenzo

David Greene via llvm-dev

unread,
Oct 20, 2020, 12:37:34 PM10/20/20
to Lorenzo Casalino, llvm...@lists.llvm.org, clat...@nondot.org
Lorenzo Casalino <lorenzo.c...@gmail.com> writes:

>> Requirements
>> ------------
>> - Convey information not readily available in existing IR constructs to
>> very late-stage codegen (after regalloc/scheduling, right through
>> asm/object emission)
>
> I see this more as the GOAL of the RFC, rather than a requirement.

Fair enough.

>> - Flexible format - it should be as simple as possible to express the
>> desired information while minimizing changes to APIs
> I do not want to raise a philosophical discussion (although, I would
> find it quite interesting), but "flexible" does not necessarily mean
> "simple".
>
> We could split this requirement as:

Good idea to separate these.

> - Flexible format - the format should be expressive enough to enable
> modelization
>   of *virtually* any kind of information type.
>
> - Simple interface - expressing information and attaching them to MIR
> elements (e.g.,
>   instructions) should be "easy" (what does *easy* mean?)

I would say "easy" means:

- Utilities are available to make maintaining information as transparent
(automatic) as possible.

- When not automatic, it is straightforward to apply the necessary APIs
to keep information updated.

>> - Preserve information by default, only drop if explicitly told (I'm
>> trying to capture the requirements for your use-case here and this
>> differs from IR-level metadata)

> What about giving end users the possibility to define a custom
> default policy, as well as the possibility to define different types
> of policies?

Possibly, though that might be overkill. We don't want to bog this down
so much that it doesn't make progress. I would lean toward picking a
policy and then incrementally adding features as needed.

> Further, we must cope with the combination of instructions: how is the
> information associated with two instructions eligible for combination
> combined?
>
> - Information transformation - the information associated with two
>   instructions A and B, which are combined into an instruction C, should
>   be properly transformed according to a user-specific policy.
>
>   A default policy may be "assign both A's and B's information to C"
>   (a gather-all/assign-all policy?)

Again, I would lean toward just assigning both pieces of information and
providing utilities to scrub the result if necessary. If it turns out
that other cases are common, we can add other default policies.

>> - No bifurcation between "well-known"/"built-in" information and things
>> added later/locally

> May I ask you to elaborate a bit more about this point?

Sure. The current IR metadata is bifurcated. Some pieces of
information are more "first-class" than others. For example there are
specialized metadata nodes
(https://llvm.org/docs/LangRef.html#specialized-metadata-nodes) while
other pieces of metadata are simple strings or numbers.

It would be simplest/easiest if metadata were handled uniformly.

>> - Should not impact compile time excessively (what is "excessive?")
>
> Probably, such estimation should be performed on

Did something get cut off here?

> What about the granularity level?
>
> - Granularity level - metadata information should be attachable with
> different
>   level of granularity:
>
>   - *Coarse*: MachineFunction level
>   - *Medium*: MachineBasicBlock level
>   - *Fine*:   MachineInstruction level
>
> Clearly, there are other degrees of granularity and/or dimensions to be
> considered
> (e.g., LiveInterval, MIBundles, Loops, ...).

It's probably a good idea to list at least the levels of granularity we
expect to need. I'd start with function/block/instruction as I can
imagine uses for all three. I am less sure about the other levels you
mention. We can add more capability later if needed.

> Sorry for the long delay!

No problem! I know I'm extremely busy as I'm sure we all are. :)

Since you initially raised the topic, do you want to take the lead in
writing up a RFC? I can certainly do it too but I want to give you
right of first refusal. :)

-David

Lorenzo Casalino via llvm-dev

unread,
Oct 21, 2020, 4:49:28 AM10/21/20
to David Greene, llvm...@lists.llvm.org

> Le 20 oct. 2020 à 6:37 PM, David Greene <d...@hpe.com> a écrit :


>
> Lorenzo Casalino <lorenzo.c...@gmail.com> writes:
>
>>> - Flexible format - it should be as simple as possible to express the
>>> desired information while minimizing changes to APIs
>> I do not want to raise a philosophical discussion (although, I would
>> find it quite interesting), but "flexible" does not necessarily mean
>> "simple".
>>
>> We could split this requirement as:
>
> Good idea to separate these.
>
>> - Flexible format - the format should be expressive enough to enable
>> modelization
>> of *virtually* any kind of information type.
>>
>> - Simple interface - expressing information and attaching them to MIR
>> elements (e.g.,
>> instructions) should be "easy" (what does it mean *easy*?)
>
> I would say "easy" means:
>
> - Utilities are available to make maintaining information as transparent
> (automatic) as possible.
>
> - When not automatic, it is straightforward to apply the necessary APIs
> to keep information updated.
>

Ok, perfect!

>>> - Preserve information by default, only drop if explicitly told (I'm
>>> trying to capture the requirements for your use-case here and this
>>> differs from IR-level metadata)
>
>> What about giving end users the possibility to define a custom
>> default policy, as well as the possibility to define different types
>> of policies?
>
> Possibly, though that might be overkill. We don't want to bog this down
> so much that it doesn't make progress. I would lean toward picking a
> policy and then incrementally adding features as needed.
>
>> Further, we must cope with the combination of instructions: how is the
>> information associated with two instructions eligible for combination
>> combined?
>>
>> - Information transformation - the information associated with two
>>   instructions A and B, which are combined into an instruction C, should
>>   be properly transformed according to a user-specific policy.
>>
>>   A default policy may be "assign both A's and B's information to C"
>>   (a gather-all/assign-all policy?)
>
> Again, I would lean toward just assigning both pieces of information and
> providing utilities to scrub the result if necessary. If it turns out
> that other cases are common, we can add other default policies.
>

I agree!

>>> - No bifurcation between "well-known"/"built-in" information and things
>>> added later/locally
>
>> May I ask you to elaborate a bit more about this point?
>
> Sure. The current IR metadata is bifurcated. Some pieces of
> information are more "first-class" than others. For example there are
> specialized metadata nodes
> (https://llvm.org/docs/LangRef.html#specialized-metadata-nodes) while
> other pieces of metadata are simple strings or numbers.
>
> It would be simplest/easiest if metadata were handled uniformly.
>

Ok, so this boils down to a uniform usage of the metadata.

>>> - Should not impact compile time excessively (what is "excessive?")
>>
>> Probably, such estimation should be performed on
>
> Did something get cut off here?

Oops. Yep, I removed a paragraph but apparently left its first fragment
behind. In any case, we should discuss how to quantitatively determine an
acceptable upper bound on the compilation-time overhead and give a
motivation for it. For instance, a maximum n% overhead on compilation
time must be guaranteed, because ** list of reasons **.

Of course, first we should identify the worst-case scenario; probably
the case where all the MIR elements are decorated with metadata and all
the API functionalities are employed?

>
>> What about the granularity level?
>>
>> - Granularity level - metadata information should be attachable with
>> different
>> level of granularity:
>>
>> - *Coarse*: MachineFunction level
>> - *Medium*: MachineBasicBlock level
>> - *Fine*: MachineInstruction level
>>
>> Clearly, there are other degrees of granularity and/or dimensions to be
>> considered
>> (e.g., LiveInterval, MIBundles, Loops, ...).
>
> It's probably a good idea to list at least the levels of granularity we
> expect to need. I'd start with function/block/instruction as I can
> imagine uses for all three. I am less sure about the other levels you
> mention. We can add more capability later if needed.
>
>> Sorry for the long delay!
>
> No problem! I know I'm extremely busy as I'm sure we all are. :)
>
> Since you initially raised the topic, do you want to take the lead in
> writing up a RFC? I can certainly do it too but I want to give you
> right of first refusal. :)
> -David

Uhm...actually, it wasn't me but Son Tuan, so the right of refusal
should be granted to him :) And I noticed now that he wasn't included in
CC of all our mails; I hope he was able to follow our discussion
anyway. I am adding him to this mail; let us wait and see if he has any
critical feature or point to discuss.

Thank you, David :)

-- Lorenzo

David Greene via llvm-dev

unread,
Nov 4, 2020, 11:41:20 AM11/4/20
to Lorenzo Casalino, llvm...@lists.llvm.org
Sorry about the late reply.

Lorenzo Casalino <lorenzo.c...@gmail.com> writes:

>>>> - Should not impact compile time excessively (what is "excessive?")
>>>
>>> Probably, such estimation should be performed on
>>
>> Did something get cut off here?
>
> Oops. Yep, I removed a paragraph but apparently left its first fragment
> behind. In any case, we should discuss how to quantitatively determine
> an acceptable upper bound on the compilation-time overhead and give a
> motivation for it. For instance, a maximum n% overhead on compilation
> time must be guaranteed, because ** list of reasons **.

I am not sure how we'd arrive at such a number or motivate/defend it.
Do we have any sense of the impact of the existing metadata
infrastructure? If not I'm not sure we can do it for something
completely new. I think we can set a goal but we'd have to revise it as
we gain experience.

>> Since you initially raised the topic, do you want to take the lead in
>> writing up a RFC? I can certainly do it too but I want to give you
>> right of first refusal. :)
>> -David
>
> Uhm...actually, it wasn't me but Son Tuan, so the right of refusal
> should be granted to him :) And I noticed now that he wasn't included in
> CC of all our mails; I hope he was able to follow our discussion
> anyways. I am adding him in this mail and let us wait if he has any
> critical feature or point to discuss.

Fair enough! I have recently taken on a lot more work so unfortunately
I can't devote a lot of time to this at the moment. I've got to clear
out my pipeline first. I'd be very happy to help review text, etc.

-David

Lorenzo Casalino via llvm-dev

unread,
Nov 4, 2020, 12:30:18 PM11/4/20
to David Greene, llvm...@lists.llvm.org

Le 04/11/20 à 17:40, David Greene a écrit :

> Sorry about the late reply.
>
> Lorenzo Casalino <lorenzo.c...@gmail.com> writes:
>
>>>>> - Should not impact compile time excessively (what is "excessive?")
>>>> Probably, such estimation should be performed on
>>> Did something get cut off here?
>> Oops. Yep, I removed a paragraph but apparently left its first fragment
>> behind. In any case, we should discuss how to quantitatively determine
>> an acceptable upper bound on the compilation-time overhead and give a
>> motivation for it. For instance, a maximum n% overhead on compilation
>> time must be guaranteed, because ** list of reasons **.
> I am not sure how we'd arrive at such a number or motivate/defend it.
> Do we have any sense of the impact of the existing metadata
> infrastructure? If not I'm not sure we can do it for something
> completely new. I think we can set a goal but we'd have to revise it as
> we gain experience.
I think it is the best approach to employ :)

>>> Since you initially raised the topic, do you want to take the lead in
>>> writing up a RFC? I can certainly do it too but I want to give you
>>> right of first refusal. :)
>>> -David
>> Uhm...actually, it wasn't me but Son Tuan, so the right of refusal
>> should be granted to him :) And I noticed now that he wasn't included in
>> CC of all our mails; I hope he was able to follow our discussion
>> anyways. I am adding him in this mail and let us wait if he has any
>> critical feature or point to discuss.
> Fair enough! I have recently taken on a lot more work so unfortunately
> I can't devote a lot of time to this at the moment. I've got to clear
> out my pipeline first. I'd be very happy to help review text, etc.
Do not worry, it is ok ;) Meanwhile we wait for any feedback/input from Son,
I'll try to prepare a draft of RFC and publish it here.

Thank you David, and have a nice day :)

-- Lorenzo

Son Tuan VU via llvm-dev

unread,
Nov 8, 2020, 6:31:07 PM11/8/20
to Lorenzo Casalino, llvm-dev, David Greene
Hi,

Thank you all for keeping this going. Indeed I was not aware that the discussion was going on, I am really sorry for this late reply.

I understand Chris' point about metadata design. Either the metadata becomes stale or is removed (if we do not teach transformations to preserve it), or we end up modifying many (if not all) transformations to keep the data intact.
Currently in the IR, I feel like the default behavior is to ignore/remove the metadata, and only a limited number of transformations know how to maintain and update it, which is a best-effort approach.
That being said, my initial thought was to adopt this approach to the MIR, so that we can at least have a minimal mechanism to communicate additional information to various transformations, or even dump it to the asm/object file.
In other words, it is the responsibility of the users who introduce/use the metadata in the MIR to teach the transformations they selected how to preserve their metadata. A common API to abstract this would definitely help, just as combineMetadata() from lib/Transforms/Utils/Local.cpp does.

As for my use case, it is also security-related. However, I do not consider the metadata to be a compilation "correctness" criterion: metadata, by definition (from the LLVM IR), can be safely removed without affecting the program's correctness.
If possible, I would like to have more details on Lorenzo's use case in order to see how metadata would interfere with program's correctness.

As for the RFC, I can definitely try to write one, but this would be my first time doing so. But maybe it is better to start with Lorenzo's proposal, as you have already been working on this? Please tell me if you prefer me to start the RFC though.

Thank you again for keeping this going.

Sincerely,

- Son

Lorenzo Casalino via llvm-dev

unread,
Nov 10, 2020, 3:27:36 AM11/10/20
to Son Tuan VU, llvm-dev, David Greene


Le 09/11/20 à 00:30, Son Tuan VU a écrit :
Hi,

Thank you all for keeping this going. Indeed I was not aware that the discussion was going on, I am really sorry for this late reply.

Nice to hear from you again! Thank you for starting this thread ;)

I understand Chris' point about metadata design. Either the metadata becomes stale or is removed (if we do not teach transformations to preserve it), or we end up modifying many (if not all) transformations to keep the data intact.
Currently in the IR, I feel like the default behavior is to ignore/remove the metadata, and only a limited number of transformations know how to maintain and update it, which is a best-effort approach.
That being said, my initial thought was to adopt this approach to the MIR, so that we can at least have a minimal mechanism to communicate additional information to various transformations, or even dump it to the asm/object file.
In other words, it is the responsibility of the users who introduce/use the metadata in the MIR to teach the transformations they selected how to preserve their metadata. A common API to abstract this would definitely help, just as combineMetadata() from lib/Transforms/Utils/Local.cpp does.

Unfortunately, I have never worked with the LLVM IR metadata (I have focused almost exclusively
on the back-end and have only scratched the surface of LLVM's middle-end), but I see your point.

Clearly, applying the needed modifications to all the back-end transformations/optimizations
is infeasible and probably not worth it -- different users may have different requirements/needs
regarding a specific pass.

I like the idea of a common API to handle the MIR metadata and of letting the end user handle
such data. Of course, if the community encounters common cases while handling the metadata, such
cases may be integrated into the upstream project.

Nonetheless, the main point of this thread is to preserve middle-end metadata down to the
back-end, right after the Instruction Selection phase. Hence, regardless of the end user's needs, a
"preserve-all" policy during the lowering stage is required, which will involve quite a few changes,
in particular in the DAGCombine pass.


As for my use case, it is also security-related. However, I do not consider the metadata to be a compilation "correctness" criterion: metadata, by definition (from the LLVM IR), can be safely removed without affecting the program's correctness.
If possible, I would like to have more details on Lorenzo's use case in order to see how metadata would interfere with program's correctness.

I would really like to discuss the details here, but, unfortunately, I am working on a publication
and thus cannot disclose them :(

However, by "correctness" I do not mean "I/O correctness", but the preservation of a
security property expressed in the front-end (e.g., specified in the source code) or in the
middle-end (e.g., specified in the LLVM IR, for instance by a transformation pass).

From a security point of view, removing or altering metadata does not interfere with the I/O
functionality of the code (although it may impact performance), but it may introduce
vulnerabilities.

As for the RFC, I can definitely try to write one, but this would be my first time doing so. But maybe it is better to start with Lorenzo's proposal, as you have already been working on this? Please tell me if you prefer me to start the RFC though.

It is the first time for me too, do not worry!

We could just use any other RFC as a template to get started :D

I think that a structure like the following would be fine:

  1. Background
     1.1 Motivation
     1.2 Use-cases
     1.3 Other approaches
  2. Goal(s)
  3. Requirements
  4. Drawbacks and main bottlenecks
  5. Design sketch
  6. Roadmap sketch
  7. Potential future development

It may be a bit overkill; you are warmly invited to cut/refine these points!

And...no, I still have no sketch of the RFC; sorry, I have had a heavy workload these past
days.

Yes, you can start the write up of the RFC.

Quoting David:

  "Since you first raised the topic [...] I want to give you right of first refusal."


Have a nice day!

-- Lorenzo

Lorenzo Casalino via llvm-dev

unread,
Jan 6, 2021, 8:56:24 AM1/6/21
to Son Tuan VU, llvm-dev, David Greene

Dear Tuan,

How are you doing? Did you manage to start the draft for the RFC?


I take this opportunity to wish you all the best for this new year :)

Best regards,
Lorenzo Casalino

Le 10/11/20 à 09:27, Lorenzo Casalino a écrit :

Matt Morehouse via llvm-dev

unread,
Jun 15, 2021, 7:33:02 PM6/15/21
to Lorenzo Casalino, Marco Elver, David Greene, llvm-dev, Necip Yildiran, Dmitry Vyukov, David Blaikie
Did anyone send an RFC for this?

First-class metadata would be exceptionally useful for sanitizers and other dynamic tools.  For
example, we want to construct PC-keyed metadata tables in the binary (without affecting the 
generated code), to inform program behavior at runtime or to allow offline analysis.  A 
prerequisite is to actually propagate the metadata we need from the Clang frontend or LLVM 
middle-end down to the assembly printer.

Our team has brainstormed many use cases:

GWP-TSan:  storing PCs of accesses lowered from C++ atomics, to filter them out from race
  detection.
  *  List<atomic access PC>

Stack trace compression:  storing a conservative call graph, for use in decompressing stack
  traces offline.
  * Map[callsite PC] -> List<callee PC>

no_sanitize attributes:  storing a map of functions that have the no_sanitize("...")
  attribute to the associated sanitizer, for filtering out from GWP-*San. Ideally we do not
  introduce new no_sanitize string literals, but simply rely on existing ones (e.g. a
  no_sanitize("thread") works for both TSan but also GWP-TSan).
  *  Map[Func] -> SanitizerKind

Fuzzing aid/CFG reconstruction:  marking coverage PCs as function entry/exit or # of
  outgoing edges from BB (allows to find gaps in coverage frontier).

Type-aware malloc and heap profiling:  enable the allocator to get the type for a given new
  call, to optimize for expected usage of the allocation.
  *  Map[new callsite PC] -> object type

Other:  potential use cases for future bug-finding tools (GWP-assert, GWP-MSan,
  GWP-DFSan, GWP-UBSan).

First-class metadata would open the door to some really cool things.

Thanks,
Matt Morehouse



Reid Kleckner via llvm-dev

unread,
Jun 16, 2021, 2:47:49 PM6/16/21
to Matt Morehouse, Marco Elver, David Greene, llvm-dev, Necip Yildiran, Lorenzo Casalino, David Blaikie, Dmitry Vyukov
If you need PCs of certain key instructions, I suggest you take a look at MachineInstr::setPostInstrSymbol:

This is used to track setjmp return addresses in CFG, for example. The feature isn't really designed to put labels on arbitrary instructions, just things like calls or atomics that aren't likely to be rewritten or replaced by later codegen passes. However, most of your use cases seem to just need return addresses, which is what this feature was made for.

Lorenzo Casalino via llvm-dev

unread,
Jun 16, 2021, 3:02:49 PM6/16/21
to Matt Morehouse, Marco Elver, David Greene, llvm-dev, Necip Yildiran, Dmitry Vyukov, David Blaikie
Hello Matt,

I think that the RFC drafting went stale some months ago due to the heavy workload to which all the participants were subject.

As of now, I do not know when the RFC will be actually drafted and sent.

Cheers,
Lorenzo

Le 16 juin 2021 à 1:32 AM, Matt Morehouse <mas...@google.com> a écrit :



Matt Morehouse via llvm-dev

unread,
Jun 16, 2021, 4:27:54 PM6/16/21
to Reid Kleckner, Marco Elver, David Greene, llvm-dev, Necip Yildiran, Lorenzo Casalino, David Blaikie, Dmitry Vyukov
Thanks Reid.  setPostInstrSymbol is useful for getting PCs, but we still need a way to propagate the metadata we need down to the point where we can use setPostInstrSymbol (and further to the assembly printer, so we can actually encode the metadata in the binary).  Things like function types, C++ object types, etc. that aren't normally available in the backend.

Matt Morehouse via llvm-dev

unread,
Jun 16, 2021, 4:42:52 PM6/16/21
to Lorenzo Casalino, Marco Elver, Necip Yildiran, llvm-dev, Dmitry Vyukov, David Blaikie
Thanks for the update, Lorenzo.

I have some free time to work on an RFC, but I'm unfamiliar with how the implementation details would work.

If I dig through this thread and try to draft something, would you and/or Son be willing to contribute?

Thanks,
Matt

Son Tuan VU via llvm-dev

unread,
Jun 16, 2021, 7:50:36 PM6/16/21
to Matt Morehouse, Marco Elver, Necip Yildiran, llvm-dev, Lorenzo Casalino, Dmitry Vyukov, David Blaikie
Hi all,

Thanks for resuscitating this discussion. 

@Lorenzo please pardon me for dropping this for quite a while. It was indeed a tense period for me.

@Matt yes it'd be awesome if you can sketch an RFC, we can definitely iterate over to come up with more polished versions. I'd be more than happy to help in any way I can.
  
Son Tuan Vu

Lorenzo Casalino via llvm-dev

unread,
Jun 17, 2021, 1:44:36 PM6/17/21
to Son Tuan VU, Marco Elver, Necip Yildiran, llvm-dev, Dmitry Vyukov, David Blaikie


Le 17 juin 2021 à 1:50 AM, Son Tuan VU <sontua...@gmail.com> a écrit :


Hi all,

Thanks for resuscitating this discussion. 

@Lorenzo please pardon me for dropping this for quite a while. It was indeed a tense period for me.

No problem, I know! ( Karine told me ;-) )


@Matt yes it'd be awesome if you can sketch an RFC, we can definitely iterate over to come up with more polished versions. I'd be more than happy to help in any way I can.
  
Son Tuan Vu


I agree with Son! If you need any help, do not hesitate!

Thank you,
Lorenzo

David Greene via llvm-dev

unread,
Jun 23, 2021, 4:30:00 PM6/23/21
to Lorenzo Casalino, Matt Morehouse, Marco Elver, David Greene, llvm-dev, Necip Yildiran, David Blaikie, Dmitry Vyukov
Lorenzo Casalino via llvm-dev <llvm...@lists.llvm.org> writes:

> I think that the RFC drafting went stale some months ago due to the
> heavy workload to which all the participants were subject.

Indeed. In the interim I switched jobs and have been ramping up. I
am still very interested in this topic and will be happy to look over
an RFC.
