[RFC] Arbitrary Regions in MLIR


Mehdi AMINI

May 16, 2019, 8:45:53 PM
to MLIR
Hi all,

I worked with River on two proposals that are related, here is the first one. We'll send another RFC for the second. We believe this first one shouldn't be a very intrusive change at this point and we think this will open-up the design field for MLIR users. We're looking forward to some feedback here.


Arbitrary Regions in MLIR

Overview

Historically, MLIR was designed as an SSA/CFG IR, with a special case for mlfunc, which encapsulated in a function body what is now the affine dialect. As part of removing this special case and unifying the Operation class as a generic container, the concept of Region was introduced.

A very simplistic definition is that a Region is a list of blocks nested under an operation. The first use-cases for Region were affine.for and affine.if, and more recently inlined GPU kernels. In all of these cases the attached region(s) are assumed to contain a single block. A generalization/formalization of the Region concept has not been implemented; one of the original motivations was that “the nested (lists of) blocks should be black-boxable from the perspective of code generation and traditional control- and data-flow analyses”.

However, this raises the question of what common semantics this list of blocks has, and how far the “black-boxable” property extends. This RFC advocates for pushing the "black-boxable" aspect further: regions are nothing more than a list of blocks whose semantics can only be inferred with knowledge of the region (either through attributes or through the enclosing Operation itself); in particular, the list of blocks should not be guaranteed to form a CFG.

Motivation and Use-Cases for Restrictions on Regions

We can explore the “black-boxable” aspect of a Region through the anticipated use-cases we know about, and speculate on useful properties to express. The current state of region includes the ability for dialects to specify ”attribute verification hooks [...] to verify the additional constraints they impose on the regions”. Such constraints can be, for example, about a region limited to affine properties, or about a Region expressing a computation offloaded to a device like a TPU or a GPU.

The consequence of having dialect specific semantics on a Region limits the kind of transformations that can be conservatively applied on a region generically (in other words: without specific knowledge about this region).

Some arguments in favor of preserving CFGs are that passes could process regions as isolated CFGs (like a function body), and that having to verify whether a region is a CFG or not is unnecessary boilerplate.

On the other hand, even regions that look like CFGs on the surface may not be compatible with traditional CFG transformations in subtle ways. A good example is GPU kernels, and in general the SPMD/SIMT programming model, where multiple concurrent threads execute the region simultaneously and implicitly. This model offers a simplified view of the device to the programmer, which seems to fit a traditional CFG and is represented as such in LLVM. However this is incorrect in general, and LLVM has a history of subtle mis-compiles of GPU kernels. LLVM introduced the convergent attribute in 2015 to try to retrofit some of these properties on a CFG, but it is known not to be enough, and even today LLVM's jump-threading pass is still reported to be incorrect with respect to GPU kernels. The same reference indicates that HSAIL “attempts to define restrictions on program transformations in terms of immediate dominators and post-dominators”, basically restricting the definition of a kernel Region to something that does not hold the usual properties of a CFG. Embracing the "GPU extensions" to CFGs as a first-class concept, like SPIR-V/HSAIL, to make MLIR more convenient for supporting GPU programming models does not seem like a path compatible with the generality and extensibility promised by MLIR.


In the absence of the ability to process a region in a generic way by any transformation or analysis, there isn’t much left to lose by giving up on forcing every region to be a “CFG”. The main aspect that is likely a core concept is the dominance property and its queries. Not having a CFG in every region implies that dominance cannot be queried in arbitrary regions. One possible way to handle queries about dominance between two operations (or blocks) could be to make it undefined behavior (an assertion in the compiler) to query dominance between two operations whose closest enclosing region isn’t a CFG. This seems reasonable since any consumer of dominance should already check that the Region has the right set of properties before processing it. Without dominance, a region (or the enclosing op) should implement a dialect-specific verifier to ensure properties such as that a value is “visible” where it is used.


As for the aspects motivating making the CFG properties pluggable: on top of removing the false sense of security about working on a CFG (with the aforementioned lessons learned from LLVM and GPU kernels, for example), we open up MLIR to more accurately represent data-flow graphs like TensorFlow and/or XLA, to explore a high-level state-machine IR, and finally to revisit our representation of Module/Function in MLIR (RFC coming soon).

Example


"foo"() : () -> () { // CFG Region

 ^bar:

   %bar = "bar"()

   br ^non_cfg


 ^non_cfg:

   "non_cfg"() {std.noncfg} : () -> () {

     ^baz:

       %baz = "baz"()


     ^nested:

       "nested"() : () -> () { // CFG Region

         ^foobar:

             %foobar = "foobar"(%bar)

             br ^foobaz


         ^foobaz:

             "foobaz"(%baz, %foobar)

             return

     }

     return

  }

}


In this example, there are three levels of nesting: the outer region is a CFG; the ^non_cfg block in this region has a “non_cfg”() operation which encloses a non-CFG region. This non-CFG region contains two blocks which don’t have any dominance relationship with each other. However, any block or operation in this region is dominated by “bar”(), since “bar”() dominates the whole region by dominating the enclosing operation in a CFG region.

The “nested”() operation has a CFG region attached which contains two blocks (and two operations). The "foobar"(%bar) operation dominates "foobaz"(%baz, %foobar) in this region. For the same reason as before, the whole region is dominated by “bar”(). However, the tricky question is the dominance status of %baz = "baz"() with respect to "foobaz"(%baz, %foobar). In this proposal we consider such a question ill-formed, as the closest enclosing region is not a CFG.

Regarding the use of the value %baz, we view the implicit capture mechanism as a convenience over passing explicit arguments to an operation. The case above is therefore treated identically to the situation where %baz is explicitly passed to the “nested” operation (see the sketch below), and it is the responsibility of the verifier of the "non_cfg" region to ensure that %baz is visible to the “nested” operation.
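As an illustration, here is a sketch of the explicit-operand form, in the same illustrative pseudo-IR style as the example above (types elided; the ^foobar entry block argument %baz_arg is hypothetical):

"non_cfg"() {std.noncfg} : () -> () {
  ^baz:
    %baz = "baz"()

  ^nested:
    // %baz is passed as an explicit operand instead of being implicitly captured;
    // %bar is still implicitly captured from the enclosing CFG region.
    "nested"(%baz) { // CFG Region
      ^foobar(%baz_arg):
        %foobar = "foobar"(%bar)
        br ^foobaz

      ^foobaz:
        "foobaz"(%baz_arg, %foobar)
        return
    }
    return
}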

Region Kind

We propose to introduce a new unit-type attribute in the standard dialect: `std.noncfg`. In the absence of this attribute on the enclosing operation, the region is assumed to be a CFG. We consider this a pragmatic choice: CFG-based representations are pervasive and making them the default seems sensible. In the future we may consider adding an enum attribute (or a string attribute) to record specific types of regions as needed. For instance, if/when we get to a point where the GPU kernel representation requires different semantics (around convergence, for example) than the traditional CFG, we could introduce a `SIMT` or `SPMD` region kind.
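To make the syntax concrete, here is a minimal sketch in the generic form, assuming illustrative op names and eliding the bodies:

// No attribute: the region is assumed to be a CFG.
"some.op"() : () -> () {
  ^entry:
    ...
}

// With the attribute: the region is not guaranteed to form a CFG.
"other.op"() {std.noncfg} : () -> () {
  ^b0:
    ...
}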

Specification Changes

LangRef will be updated such that Regions are defined as “a list of MLIR Blocks”, as opposed to “a CFG of MLIR Blocks”. A new section is introduced to specify CFG regions as a special case of region, defining the properties of CFG regions.

Blocks are redefined to be: “a list of operations without any defined execution order.” The CFG region specification adds the existing constraints on block: “In a CFG Region, a block is a sequential list of operations without control flow (calls are not considered control flow for this purpose) that are executed from top to bottom. The last operation in a block is a terminator operation, which ends the block.”

The successor-list part of an operation is renamed to block-operands, the semantic is specific to the operation.

We expect other changes, the list above is non-exhaustive.

Practical Aspects (Walkers, Traversing the IR, etc.)

The potential restrictions on a Region (which include restrictions expressed by any attribute, independently of the CFG aspect, like a SIMT GPU kernel for instance) affect many pieces of the infrastructure: for example, it is no longer clear that the canonicalization patterns and the pattern rewriter are always legal regardless of the dialect restrictions on a Region. Similarly, the use of a walker can't be as simple in general, so we need to provide convenient helpers for passes to traverse the nested IR and filter for CFG Regions. That said, most dialect-specific traversals filter for dialect-specific operations which are known to appear exclusively in CFG Regions, and thus can get away with the current walkers.

It is likely that some added complexity will be warranted by the fact that traversals must express their intent with respect to regions. We plan to keep convenient walkers for the common case.



-- 

Mehdi



Chris Lattner

May 17, 2019, 12:49:39 AM
to Mehdi AMINI, MLIR
On May 16, 2019, at 5:45 PM, Mehdi AMINI <joke...@gmail.com> wrote:
I worked with River on two proposals that are related, here is the first one. We'll send another RFC for the second. We believe this first one shouldn't be a very intrusive change at this point and we think this will open-up the design field for MLIR users. We're looking forward to some feedback here.

Fantastic, thank you for putting this together, I think this is a really important step to move the conversation forward!

However, this raises the question of what common semantics this list of blocks has, and how far the “black-boxable” property extends. This RFC advocates for pushing the "black-boxable" aspect further: regions are nothing more than a list of blocks whose semantics can only be inferred with knowledge of the region (either through attributes or through the enclosing Operation itself); in particular, the list of blocks should not be guaranteed to form a CFG.

+1, this is great.

Motivation and Use-Cases for Restrictions on Regions

We can explore the “black-boxable” aspect of a Region through the anticipated use-cases we know about, and speculate on useful properties to express. The current state of region includes the ability for dialects to specify ”attribute verification hooks [...] to verify the additional constraints they impose on the regions”. Such constraints can be, for example, about a region limited to affine properties, or about a Region expressing a computation offloaded to a device like a TPU or a GPU.
The consequence of having dialect specific semantics on a Region limits the kind of transformations that can be conservatively applied on a region generically (in other words: without specific knowledge about this region).

Right.

In the absence of the ability to process a region in a generic way by any transformation or analysis, there isn’t much left to lose by giving up on forcing every region to be a “CFG”. The main aspect that is likely a core concept is the dominance property and its queries. Not having a CFG in every region implies that dominance cannot be queried in arbitrary regions. One possible way to handle queries about dominance between two operations (or blocks) could be to make it undefined behavior (an assertion in the compiler) to query dominance between two operations whose closest enclosing region isn’t a CFG. This seems reasonable since any consumer of dominance should already check that the Region has the right set of properties before processing it. Without dominance, a region (or the enclosing op) should implement a dialect-specific verifier to ensure properties such as that a value is “visible” where it is used.

I don’t think we necessarily have to give up this far.  

I agree with you that we need an ‘open’ ecosystem of ops that have regions, and given that, generic code should make no assumptions about the semantics or behavior of regions.  I’ve had conversations with at least one hardware team where regions are a good way to express things like VLIW instruction bundles, which have clearly different semantics than your usual CFG.  Similar, as you’re aware, regions are also a great way to represent generic DAGs like a TensorFlow graph, even when it doesn’t have normal control flow semantics.

That said, even though I think we want the ability for a region to interpret its contents in an arbitrary way, we can still impose restrictions on the form of the operations and blocks in the region, and I think that control flow semantics (through extensible terminators, which we are already set for) and therefore dominance is a good thing to preserve.  We don’t need to define the semantics of the operations and behavior of the code, but we can still require that any values are defined before use in a CFG or block if they exist.

The reason I raise this is that we do have CFG-like semantics baked into our IR due to block arguments, and I pretty strongly feel that block arguments are the right way to go.

Do you see any problems with that?

As for the aspects motivating making the CFG properties pluggable: on top of removing the false sense of security about working on a CFG (with the aforementioned lessons learned from LLVM and GPU kernels, for example), we open up MLIR to more accurately represent data-flow graphs like TensorFlow and/or XLA, to explore a high-level state-machine IR, and finally to revisit our representation of Module/Function in MLIR (RFC coming soon).

Yes, absolutely.  In addition to the open op/region ecosystem, I think it is really important that we standardize a few “properties” that these operations can buy into, because the affine.if/for, the std.func operations, and many others *will* have sensible LLVM-like semantics, and we want dialect independent passes like LICM, CSE, and others to be able to apply to these sorts of regions.

I (still) think that this is a problem for another day, and will become very clear when we have more sorts of users and resolve some of the other more pressing issues.

Region Kind

We propose to introduce a new unit-type attribute in the standard dialect: `std.noncfg`. In the absence of this attribute on the enclosing operation, the region is assumed to be a CFG. We consider this a pragmatic choice: CFG-based representations are pervasive and making them the default seems sensible.

Why encode this as an attribute?  These properties seem like something that is true for all instances of a given operation, so they should be a bit on AbstractOperation, not an attribute.  This allows you to invert the sense of the bit.  We generally want the absence of a set bit on AbstractOperation to be the conservative interpretation of something, so this will work out better.

That said, it isn’t clear what “CFG semantics” means.  Can you define that better?  What sorts of transformations are allowed and prohibited?  You gave a large number of GPU issues from LLVM - are those sorts of use cases not “CFG like”?

Would it make more sense to have a small family of orthogonal bits to model the independent properties that various CFG-like regions have?

For example, we could have a bit for things like “unreachable blocks are dead”.

Specification Changes
LangRef will be updated such that Regions are defined as “a list of MLIR Blocks”, as opposed to “a CFG of MLIR Blocks”. A new section is introduced to specify CFG regions as a special case of region, defining the properties of CFG regions.
Blocks are redefined to be: “a list of operations without any defined execution order.” The CFG region specification adds the existing constraints on block: “In a CFG Region, a block is a sequential list of operations without control flow (calls are not considered control flow for this purpose) that are executed from top to bottom. The last operation in a block is a terminator operation, which ends the block.”
The successor-list part of an operation is renamed to block-operands, the semantic is specific to the operation.
We expect other changes, the list above is non-exhaustive.

I’m ok with removing CFG from this description if necessary, but also don’t consider it essential depending on what your sense of the bit/naming discussion above is.  

Practical Aspects (Walkers, Traversing the IR, etc.)

The potential restrictions on a Region (which include restrictions expressed by any attribute, independently of the CFG aspect, like a SIMT GPU kernel for instance) affect many pieces of the infrastructure: for example, it is no longer clear that the canonicalization patterns and the pattern rewriter are always legal regardless of the dialect restrictions on a Region. Similarly, the use of a walker can't be as simple in general, so we need to provide convenient helpers for passes to traverse the nested IR and filter for CFG Regions. That said, most dialect-specific traversals filter for dialect-specific operations which are known to appear exclusively in CFG Regions, and thus can get away with the current walkers.
It is likely that some added complexity will be warranted by the fact that traversals must express their intent with respect to regions. We plan to keep convenient walkers for the common case.

I agree that we should audit these things, but I don’t expect widespread problems.  For example, the generic walker and rewriter should be fine given that they are defined on structural properties.  You can’t put a tf.Add into a region that doesn’t support data flow or CFG context, so any patterns defined on a tf.Add should be fine no matter where it shows up.

Similarly, walkers and visitors are defined over structural properties of the IR, so they should be ok as well.

-Chris

Mehdi AMINI

May 17, 2019, 1:18:46 AM
to Chris Lattner, MLIR
On Thu, May 16, 2019 at 9:49 PM Chris Lattner <clat...@google.com> wrote:
On May 16, 2019, at 5:45 PM, Mehdi AMINI <joke...@gmail.com> wrote:
I worked with River on two proposals that are related, here is the first one. We'll send another RFC for the second. We believe this first one shouldn't be a very intrusive change at this point and we think this will open-up the design field for MLIR users. We're looking forward to some feedback here.

Fantastic, thank you for putting this together, I think this is a really important step to move the conversation forward!

However, this raises the question of what common semantics this list of blocks has, and how far the “black-boxable” property extends. This RFC advocates for pushing the "black-boxable" aspect further: regions are nothing more than a list of blocks whose semantics can only be inferred with knowledge of the region (either through attributes or through the enclosing Operation itself); in particular, the list of blocks should not be guaranteed to form a CFG.

+1, this is great.

Motivation and Use-Cases for Restrictions on Regions

We can explore the “black-boxable” aspect of a Region through the anticipated use-cases we know about, and speculate on useful properties to express. The current state of region includes the ability for dialects to specify ”attribute verification hooks [...] to verify the additional constraints they impose on the regions”. Such constraints can be, for example, about a region limited to affine properties, or about a Region expressing a computation offloaded to a device like a TPU or a GPU.
The consequence of having dialect specific semantics on a Region limits the kind of transformations that can be conservatively applied on a region generically (in other words: without specific knowledge about this region).

Right.

In the absence of the ability to process a region in a generic way by any transformation or analysis, there isn’t much left to lose by giving up on forcing every region to be a “CFG”. The main aspect that is likely a core concept is the dominance property and its queries. Not having a CFG in every region implies that dominance cannot be queried in arbitrary regions. One possible way to handle queries about dominance between two operations (or blocks) could be to make it undefined behavior (an assertion in the compiler) to query dominance between two operations whose closest enclosing region isn’t a CFG. This seems reasonable since any consumer of dominance should already check that the Region has the right set of properties before processing it. Without dominance, a region (or the enclosing op) should implement a dialect-specific verifier to ensure properties such as that a value is “visible” where it is used.

I don’t think we necessarily have to give up this far.  

I agree with you that we need an ‘open’ ecosystem of ops that have regions, and given that, generic code should make no assumptions about the semantics or behavior of regions.  I’ve had conversations with at least one hardware team where regions are a good way to express things like VLIW instruction bundles, which have clearly different semantics than your usual CFG.  Similar, as you’re aware, regions are also a great way to represent generic DAGs like a TensorFlow graph, even when it doesn’t have normal control flow semantics.

That said, even though I think we want the ability for a region to interpret its contents in an arbitrary way, we can still impose restrictions on the form of the operations and blocks in the region, and I think that control flow semantics (through extensible terminators, which we are already set for) and therefore dominance is a good thing to preserve.  We don’t need to define the semantics of the operations and behavior of the code, but we can still require that any values are defined before use in a CFG or block if they exist. 

The reason I raise this is that we do have CFG-like semantics baked into our IR due to block arguments, and I pretty strongly feel that block arguments are the right way to go.

Do you see any problems with that?

I don't see a problem with not changing anything in the way we model CFGs: block arguments are nice and successors on terminator seem also like a well understood design.

I'm not sure I understand your point about "dominance is a good thing to preserve". If a region is non-CFG the concept of dominance may not be applicable.
If we talk about a data flow graph (like TensorFlow), we may want to express a non-CFG region with the sequence:

  %add1 = tf.Add(%0, %1)
  %add2 = tf.Add(%2, %3)
  %mul = tf.Mul(%add1, %add2)

%add1 and %add2 are not ordered one with respect to each other, and there isn't any dominance between the two (%add2 does not post-dominate %add1 here): we would be able to have a block with unordered operations.

Similarly a TensorFlow loop could be expressed in a single block without breaking up the "back-edge":

   %loop.body.init = tf_executor.Merge(%loop.init, %next_count)
   %dec_count = "tf.Add"(%loop.body.init, -1)
   %loop_cond = "tf.NotEqual"(%dec_count, 0)
   %true, %false = tf_executor.Switch(%loop_cond, %dec_count)
   %next_count = tf_executor.NextIteration(%false)

If we must preserve dominance, then it seems we always have a CFG somehow?


 
As for the aspects motivating making the CFG properties pluggable: on top of removing the false sense of security about working on a CFG (with the aforementioned lessons learned from LLVM and GPU kernels, for example), we open up MLIR to more accurately represent data-flow graphs like TensorFlow and/or XLA, to explore a high-level state-machine IR, and finally to revisit our representation of Module/Function in MLIR (RFC coming soon).

Yes, absolutely.  In addition to the open op/region ecosystem, I think it is really important that we standardize a few “properties” that these operations can buy into, because the affine.if/for, the std.func operations, and many others *will* have sensible LLVM-like semantics, and we want dialect independent passes like LICM, CSE, and others to be able to apply to these sorts of regions.

I (still) think that this is a problem for another day, and will become very clear when we have more sorts of users and resolve some of the other more pressing issues.

Region Kind

We propose to introduce a new unit-type attribute in the standard dialect: `std.noncfg`. In the absence of this attribute on the enclosing operation, the region is assumed to be a CFG. We consider this a pragmatic choice: CFG-based representations are pervasive and making them the default seems sensible.

Why encode this as an attribute?  These properties seem like something that is true for all instances of a given operation, so they should be a bit on AbstractOperation, not an attribute.  This allows you to invert the sense of the bit.  We generally want the absence of a set bit on AbstractOperation to be the conservative interpretation of something, so this will work out better.

I just didn't think about using a bit on the AbstractOperation: I intuitively try to design things in a generic way through the dialect interface: stealing a bit on AbstractOperation requires hard-coding the concept in MLIR right? Using attributes is the only way for dialects to express such concepts (without patching MLIR itself).

This is likely enough of a "core" concept in this case that I'm fine with baking it inside AbstractOperation though.
We went with "default to CFG" motivated by a prettier syntax and expecting the overall majority of operation to hold a CFG-region, but in retrospect it wasn't justified since registered op can omit such attribute in their custom printer/parser.

 
That said, it isn’t clear what “CFG semantics” means.  Can you define that better?  What sorts of transformations are allowed and prohibited?  You gave a large number of GPU issues from LLVM - are those sorts of use cases not “CFG like”?

Would it make more sense to have a small family of orthogonal bits to model the independent properties that various CFG-like regions have?

For example, we could have a bit for things like “unreachable blocks are dead”.

So far we were thinking about starting with a single "bit" expressing that we're in a CFG (in the LLVM sense) and so dominance queries are valid, and add more properties/traits as we encounter use-cases (SIMT/SIMD for GPU kernels could be one). We haven't done the work of expressing the various "bits" we would need to slice the useful properties of CFG we would want to express independently.

-- 
Mehdi

Stephen Neuendorffer

May 17, 2019, 1:42:49 AM
to Mehdi AMINI, MLIR

I'd be very interested in understanding how this proposal could allow the modeling of more concurrent semantics. One case is process network/dataflow models with feedback.  This would be somewhat similar to tensorflow, presumably.   The key challenge is that variables would represent streams of elements, rather than single elements, which would not imply a simple ordering between operations in processes. (Related to this, do you have a document for how tensorflow-like models would be represented in MLIR?  If there is one, I seem to have missed it.)    

The second case is synchronous HDL-style semantics.  The key difference here is that operations in a block would not execute strictly sequentially, but would execute simultaneously, resulting in a simultaneous update of all variables in the block.

Steve



Chris Lattner

May 17, 2019, 12:28:13 PM
to Mehdi AMINI, MLIR
On May 16, 2019, at 10:18 PM, Mehdi AMINI <joke...@gmail.com> wrote:
The reason I raise this is that we do have CFG-like semantics baked into our IR due to block arguments, and I pretty strongly feel that block arguments are the right way to go.

Do you see any problems with that?

I don't see a problem with not changing anything in the way we model CFGs: block arguments are nice and successors on terminator seem also like a well understood design.

I'm not sure I understand your point about "dominance is a good thing to preserve". If a region is non-CFG the concept of dominance may not be applicable.
If we talk about a data flow graph (like TensorFlow), we may want to express a non-CFG region with the sequence:

  %add1 = tf.Add(%0, %1)
  %add2 = tf.Add(%2, %3)
  %mul = tf.Mul(%add1, %add2)

%add1 and %add2 are not ordered one with respect to each other, and there isn't any dominance between the two (%add2 does not post-dominate %add1 here): we would be able to have a block with unordered operations.

Right, but we can already model this case.

To be clear, what I mean by “dominance is a good thing to preserve” are structural properties of the IR.  Regardless of this discussion, it sounds like we agree that regions can contain a graph of blocks, and the edges of that graph are defined by terminators.  If so, a dominator tree may be formed, because a dominator tree can be formed from any graph.  Similarly, I assume we agree that operands to terminators are expected to line up with the bbargs of the corresponding successor.

Given this (which I think we agree on) then these are the sorts of structural properties that emerge:

1. Operations in the same block can only refer to each other in lexical order (lower ones can refer to earlier ones)
2. An operation in a block is allowed to refer to a bb argument defined on its block.
3. An operation is only allowed to refer to a value defined in another block (by a bbarg or operation) when it is dominated (in the domtree sense) by it.
4. An operation is only allowed to refer to an SSA value in a different region if it is lexically enclosed and dominated by it (though the inverse is not always true: you can’t always refer to values outside your region, it depends on region semantics).
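For concreteness, a small sketch of properties 1-3 in the same illustrative pseudo-IR style used elsewhere in the thread (types elided, op names hypothetical):

^entry(%arg0):
    %a = "produce"()           // (1) later ops in this block may refer to %a
    %b = "combine"(%a, %arg0)  // (2) ops may refer to the block argument %arg0
    %c = "cond"()
    cond_br %c, ^then, ^exit

^then:
    "use"(%b)                  // (3) ok: ^entry dominates ^then
    br ^exit

^exit:
    "use"(%b)                  // (3) also ok: ^entry dominates ^exit
    return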

These seem like fundamental properties that prove convergence of algorithms in a simple way, and are fundamental assumptions of SSA-based optimizers.  While I agree that we could throw these things overboard, I would really prefer not to unless there is an overwhelming “whole system” reason why it is better to do so.

As an analogy, very early in the design of LLVM, there was a contingent that argued strongly against an “SSA only” IR, stating (correctly) that source languages aren’t in SSA, that this makes certain transformations more awkward, etc.  Despite those challenges, the “alloca + mem2reg” design has served LLVM very well, and I’m glad that we did the harder but better thing.

Similarly a TensorFlow loop could be expressed in a single block without breaking up the "back-edge":

   %loop.body.init = tf_executor.Merge(%loop.init, %next_count)
   %dec_count = "tf.Add"(%loop.body.init, -1)
   %loop_cond = "tf.NotEqual"(%dec_count, 0)
   %true, %false = tf_executor.Switch(%loop_cond, %dec_count)
   %next_count = tf_executor.NextIteration(%false)

I agree that we “could” do that, but it isn’t clear to me why that is better.  What does it make easier?  Given that NextIteration, Switch, and Merge are going away over time, why is it best to erode whole-system design points in order to make them better?


That said, I’ll note that dominance is undefined for unreachable blocks.  I have mixed feelings about this (lots of infinite compile bugs are caused by it), but code like this is valid in LLVM and MLIR today:

func @foo() {
   return

bb1:
   %x = add %x, 1
   br bb1
}

If the design above was super important, you could [ab]use it to achieve the same result.


If we must preserve dominance, then it seems we always have a CFG somehow?

I don’t know what you mean by this.

Region Kind

We propose to introduce a new unit-type attribute in the standard dialect: `std.noncfg`. In the absence of this attribute on the enclosing operation, the region is assumed to be a CFG. We consider this a pragmatic choice: CFG-based representations are pervasive and making them the default seems sensible.

Why encode this as an attribute?  These properties seem like something that is true for all instances of a given operation, so they should be a bit on AbstractOperation, not an attribute.  This allows you to invert the sense of the bit.  We generally want the absence of a set bit on AbstractOperation to be the conservative interpretation of something, so this will work out better.

I just didn't think about using a bit on the AbstractOperation: I intuitively try to design things in a generic way through the dialect interface: stealing a bit on AbstractOperation requires hard-coding the concept in MLIR right? Using attributes is the only way for dialects to express such concepts (without patching MLIR itself).

Currently yes, but I see this as a temporary limitation.  We have a conceptual design for dialect specific bits on AbstractOperation that we discussed a week or two ago, it just hasn’t been implemented yet.

That said, “cfg semantics” is a pretty core thing, so it makes sense to burn some amount of that into the IR, just like we’ve decided that folding is such an invariant in the compiler universe that we’ve put it into AbstractOperation directly.

This is likely enough of a "core" concept in this case that I'm fine with baking it inside AbstractOperation though.
We went with "default to CFG" motivated by a prettier syntax and expecting the overall majority of operation to hold a CFG-region, but in retrospect it wasn't justified since registered op can omit such attribute in their custom printer/parser.

 
That said, it isn’t clear what “CFG semantics” means.  Can you define that better?  What sorts of transformations are allowed and prohibited?  You gave a large number of GPU issues from LLVM - are those sorts of use cases not “CFG like”?

Would it make more sense to have a small family of orthogonal bits to model the independent properties that various CFG-like regions have?

For example, we could have a bit for things like “unreachable blocks are dead”.

So far we were thinking about starting with a single "bit" expressing that we're in a CFG (in the LLVM sense) and so dominance queries are valid, and add more properties/traits as we encounter use-cases (SIMT/SIMD for GPU kernels could be one). We haven't done the work of expressing the various "bits" we would need to slice the useful properties of CFG we would want to express independently.

The design point I’m arguing for is what we currently have: 1) dominance queries are always fully defined based on structural properties.  2) dialects can choose the best way to [ab]use the structural restrictions of the system to model their problems. 3) regions by themselves do not imply any specific interpretation to the blocks or operations within them. 4) passes that know about specific regions (e.g. cfg.func, affine.for, etc) can use fully domain specific information to do good things. 5) We carefully define some systemic bits to capture the properties we need to make important basic passes (CSE, DCE, etc) dialect independent.

-Chris


Mehdi AMINI

May 17, 2019, 1:47:07 PM
to Chris Lattner, MLIR
On Fri, May 17, 2019 at 9:28 AM Chris Lattner <clat...@google.com> wrote:
On May 16, 2019, at 10:18 PM, Mehdi AMINI <joke...@gmail.com> wrote:
The reason I raise this is that we do have CFG-like semantics baked into our IR due to block arguments, and I pretty strongly feel that block arguments are the right way to go.

Do you see any problems with that?

I don't see a problem with not changing anything in the way we model CFGs: block arguments are nice and successors on terminator seem also like a well understood design.

I'm not sure I understand your point about "dominance is a good thing to preserve". If a region is non-CFG the concept of dominance may not be applicable.
If we talk about a data flow graph (like TensorFlow), we may want to express a non-CFG region with the sequence:

  %add1 = tf.Add(%0, %1)
  %add2 = tf.Add(%2, %3)
  %mul = tf.Mul(%add1, %add2)

%add1 and %add2 are not ordered one with respect to each other, and there isn't any dominance between the two (%add2 does not post-dominate %add1 here): we would be able to have a block with unordered operations.

Right, but we can already model this case.

To be clear, what I mean by “dominance is a good thing to preserve” are structural properties of the IR.  Regardless of this discussion, it sounds like we agree that regions can contain a graph of blocks,

The key for me here, to know whether we have the same thing in mind, is "can": I agree that a region can contain a graph of blocks, but that does not imply that a region containing multiple blocks necessarily chains them like a CFG.
 
and the edges of that graph are defined by terminators.  If so, a dominator tree may be formed, because a dominator tree can be formed from any graph.

Right, but let's make sure we don't over assume: "any graph" to me implies that there can be multiple entries for instance (two different parallel entry blocks with side-effects that can't be ordered for instance).
 
 Similarly, I assume we agree that operands to terminators are expected to line up with the bbargs of the corresponding successor.

The point of non-CFG regions is to not make this assumption about edges between blocks and terminators. This leaves everything up to the verifier. As such, nothing would prevent a modeling like:

my_streaming_region() ({
  ^b0():
      %val1 = "op"()
      "send"(%val1) [^b1] // Non terminator here.
      %val2" =  "op"()
      "send"(%val2) [^b2]
      yield
      
  ^b1(%arg1):
      ...

  ^b2(%arg2):
     ...
})


Here, every block conceptually executes in parallel and waits for its arguments to be "ready" before executing.

Of course you could model this with separate functions instead of blocks, for instance, but then we force an inter-procedural design. One could also use futures and an explicit loop to model the runtime behavior, but that obfuscates the high-level semantics of the program here.
I'm arguing for leaving the possibility of the IR to generalize here, without losing anything that exists today in the current (CFG) regions.


 

Given this (which I think we agree on) then these are the sorts of structural properties that emerge:

1. Operations in the same block can only refer to each other in lexical order (lower ones can refer to earlier ones)

Just a nit: this is already broken in unreachable blocks in LLVM, which means you can't make this assumption locally in a block.
We're back to the notion of dominance and control flow: if the region isn't a CFG, you don't (necessarily) have an entry block (at least you don't know opaquely which one it is), or you can have multiple.

2. An operation in a block is allowed to refer to a bb argument defined on its block.
3. An operation is only allowed to refer to a value defined in another block (by a bbarg or operation) when it is dominated (in the domtree sense) by it.

This ties into CFG semantics as far as I can tell.
The whole point of non-CFG region is to escape these constraints and allow: 

non_cfg_region() ({
  ^b0():
      %val1 = "op"()
      yield      
  ^b1():
      %val2 = "op"()
      yield
  ^b2():
       op(%val2, %val1)
      yield

...
})
To summarize: you are advocating entirely against the non-CFG regions proposal (most of what you're saying is exactly what Albert has been arguing about).
 
1) dominance queries are always fully defined based on structural properties.  2) dialects can choose the best way to [ab]use the structural restrictions of the system to model their problems. 3) regions by themselves do not imply any specific interpretation to the blocks or operations within them. 4) passes that know about specific regions (e.g. cfg.func, affine.for, etc) can use fully domain specific information to do good things. 5) We carefully define some systemic bits to capture the properties we need to make important basic passes (CSE, DCE, etc) dialect independent.


I'm advocating for leaving open the possibility for the IR to generalize, without losing anything that exists today in the current (CFG) regions.
I believe this makes MLIR more generic and applicable to a larger range of domains. It embraces the "multi" level part of the goal. I believe that restricting to CFGs and SSA implies that representations outside of the middle-end (close to source languages) will have more impedance mismatch and will likely have to build another infrastructure because MLIR won't fit their needs.

On the other hand, I don't see any downside to *allowing* non-CFG regions: the only "cost" is that analyses and transformations need to bail on non-CFG regions. The whole argument is that bailing because of a non-CFG region isn't different from bailing because you are in a SIMT/SIMD region, for instance. Also, this seems "cheap" to me at this point of the project, but won't be tractable in a few years.

The rules for using a value become different; I prefer to talk about "visible" instead of "dominate":
- if the current region is a CFG, just follow the usual dominance rules to infer visibility.
- if the region isn't a CFG, then opaquely you don't verify uses of values defined inside the region (other than that they can't escape). Dialects provide verifiers to enforce visibility rules.
- if the region has implicit capture, any value that is visible to the operation holding the region is visible in the region (if it is valid to use a value as an operand of an operation, it can be used inside the region attached to that operation). This isn't different from the current situation (see the sketch below).
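A minimal pseudo-IR sketch of this last rule, with hypothetical op names:

%outer = "def"()                  // %outer is visible to "holder" below
"holder"() {std.noncfg} : () -> () {
  ^b0:
    "use"(%outer)                 // allowed: %outer is visible to the op holding the region
}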


-- 
Mehdi

Chris Lattner

May 17, 2019, 8:52:19 PM
to Mehdi AMINI, MLIR
On May 17, 2019, at 10:46 AM, Mehdi AMINI <joke...@gmail.com> wrote:

%add1 and %add2 are not ordered one with respect to each other, and there isn't any dominance between the two (%add2 does not post-dominate %add1 here): we would be able to have a block with unordered operations.

Right, but we can already model this case.

To be clear, what I mean by “dominance is a good thing to preserve” are structural properties of the IR.  Regardless of this discussion, it sounds like we agree that regions can contain a graph of blocks,

The key for me here, to know whether we have the same thing in mind, is "can": I agree that a region can contain a graph of blocks, but that does not imply that a region containing multiple blocks necessarily chains them like a CFG.

Right, unreachable blocks are already a thing in our current design.

and the edges of that graph are defined by terminators.  If so, a dominator tree may be formed, because a dominator tree can be formed from any graph.

Right, but let's make sure we don't over assume: "any graph" to me implies that there can be multiple entries for instance (two different parallel entry blocks with side-effects that can't be ordered for instance).

I would prefer not to allow that.  This breaks core assumptions about dominance (which is probably why you bring it up) and makes various other algorithms more awkward.  Are there specific use-cases you’d like to cover that can’t be handled other ways?  Multiple entry points can be modeled/emulated in lots of ways with a single entry point design.

 Similarly, I assume we agree that operands to terminators are expected to line up with the bbargs of the corresponding successor.

The point of non-CFG regions is to not make this assumption about edges between blocks and terminators. This leaves everything up to the verifier. As such, nothing would prevent a modeling like:

my_streaming_region() ({
  ^b0():
      %val1 = "op"()
      "send"(%val1) [^b1] // Non terminator here.
      %val2" =  "op"()
      "send"(%val2) [^b2]
      yield
      
  ^b1(%arg1):
      ...

  ^b2(%arg2):
     ...
})


Here, every block conceptually executes in parallel and waits for its arguments to be "ready" before executing.

Of course you could model this with separate functions instead of blocks, for instance, but then we force an inter-procedural design. One could also use futures and an explicit loop to model the runtime behavior, but that obfuscates the high-level semantics of the program here.
I'm arguing for leaving the possibility of the IR to generalize here, without losing anything that exists today in the current (CFG) regions.

Ok, this is squarely in the “I agree we could do that, but I don’t understand why we would”.  This is eroding a lot of fundamental invariants that we would otherwise have.  Do you have specific large use cases in mind that can’t be solved in other ways?

Even if this were the right long term direction to go, I’d feel much better about taking things one step at a time and feature creeping towards it.


Given this (which I think we agree on) then these are the sorts of structural properties that emerge:

1. Operations in the same block can only refer to each other in lexical order (lower ones can refer to earlier ones)

Just a nit: this is already broken in unreachable blocks in LLVM, which means you can't make this assumption locally in a block.
We're back to the notion of dominance and control flow: if the region isn't a CFG, you don't (necessarily) have an entry block (at least you don't know opaquely which one it is), or you can have multiple.

Yes, I know and I mentioned that below as well.  Just so you know, this is a source of unending bugs in LLVM, and it is really unfortunate.  If I knew how to solve this problem I would (suggestions welcome :-).

2. An operation in a block is allowed to refer to a bb argument defined on its block.
3. An operation is only allowed to refer to a value defined in another block (by a bbarg or operation) when it is dominated (in the domtree sense) by it.

This ties into CFG semantics as far as I can tell.
The whole point of non-CFG region is to escape these constraints and allow: 

non_cfg_region() ({
  ^b0():
      %val1 = "op"()
      yield      
  ^b1():
      %val2 = "op"()
      yield
  ^b2():
       op(%val2, %val1)
      yield

...
})

This is completely valid in our current design.  You don’t need to break anything about our current invariants to have unreachable blocks that can’t be deleted.

So far we were thinking about starting with a single "bit" expressing that we're in a CFG (in the LLVM sense) and so dominance queries are valid, and add more properties/traits as we encounter use-cases (SIMT/SIMD for GPU kernels could be one). We haven't done the work of expressing the various "bits" we would need to slice the useful properties of CFG we would want to express independently.

The design point I’m arguing for is what we currently have:

To summarize: you are advocating entirely against the non-CFG regions proposal (most of what you're saying is exactly what Albert has been arguing about).

I haven’t followed Albert’s discussions, so I can’t comment on that.

I’m arguing that we keep the structural properties of the IR, and define a set of bits that provide semantic properties needed by target-independent passes.


1) dominance queries are always fully defined based on structural properties.  2) dialects can choose the best way to [ab]use the structural restrictions of the system to model their problems. 3) regions by themselves do not imply any specific interpretation to the blocks or operations within them. 4) passes that know about specific regions (e.g. cfg.func, affine.for, etc) can use fully domain specific information to do good things. 5) We carefully define some systemic bits to capture the properties we need to make important basic passes (CSE, DCE, etc) dialect independent.


I'm advocating for leaving open the possibility for the IR to generalize, without losing anything that exists today in the current (CFG) regions.
I believe this makes MLIR more generic and applicable to a larger range of domains. It embraces the "multi" level part of the goal. I believe that restricting to CFGs and SSA implies that representations outside of the middle-end (close to source languages) will have more impedance mismatch and will likely have to build another infrastructure because MLIR won't fit their needs.

I think you’re potentially overly dramatizing the situation here.  :-)

I’m not saying that we should never do this.  I’m saying that I would like to see large evidence that we “have to” do this, e.g. that any problems encountered by a narrower design cannot be solved or worked around in a reasonable way.  With that understanding, I’d understand the cost/benefit tradeoff better.

We are progressively generalizing the MLIR design, and nothing is set in stone.  I think it makes sense to take things incrementally here, and the next logical step is to start nailing down the existing region semantic bits that specify the dialect independent properties that do actually hold for certain of our existing regions.

-Chris


Mehdi AMINI

May 17, 2019, 9:07:58 PM
to Chris Lattner, MLIR
The point is that these aren't unreachable. Relying on these being "unreachable" yet not deletable seems quite strange to me; it looks like a hack and I'm not sure what the practical consequences are. For example, if I summarize the effects of this region, can the side effects of the operations in such blocks be ignored?




 

So far we were thinking about starting with a single "bit" expressing that we're in a CFG (in the LLVM sense) and so dominance queries are valid, and add more properties/traits as we encounter use-cases (SIMT/SIMD for GPU kernels could be one). We haven't done the work of expressing the various "bits" we would need to slice the useful properties of CFG we would want to express independently.

The design point I’m arguing for is what we currently have:

To summarize: you are advocating entirely against the non-CFG regions proposal (most of what you're saying is exactly what Albert has been arguing about).

I haven’t followed Albert’s discussions, so I can’t comment on that.

I’m arguing that we keep the structural properties of the IR, and define a set of bits that provide semantic properties needed by target-independent passes.


1) dominance queries are always fully defined based on structural properties.  2) dialects can choose the best way to [ab]use the structural restrictions of the system to model their problems. 3) regions by themselves do not imply any specific interpretation to the blocks or operations within them. 4) passes that know about specific regions (e.g. cfg.func, affine.for, etc) can use fully domain specific information to do good things. 5) We carefully define some systemic bits to capture the properties we need to make important basic passes (CSE, DCE, etc) dialect independent.


I'm advocating for leaving open the possibility for the IR to generalize, without losing anything that exists today in the current (CFG) regions.
I believe this makes MLIR more generic and applicable to a larger range of domains. It embraces the "multi" level part of the goal. I believe that restricting to CFGs and SSA implies that representations outside of the middle-end (close to source languages) will have more impedance mismatch and will likely have to build another infrastructure because MLIR won't fit their needs.

I think you’re potentially overly dramatizing the situation here.  :-)

I’m not saying that we should never do this.  I’m saying that I would like to see large evidence that we “have to” do this, e.g. that any problems encountered by a narrower design cannot be solved or worked around in a reasonable way.  With that understanding, I’d understand the cost/benefit tradeoff better.

Right now it seems more conservative and future-proof to me to leave the design open. It seems much easier to constrain the design later than the opposite: making this change in a year will be much harder when the code relies on these implicit assumptions instead of being written with an explicit check.
Also, having this possibility in the design means that users we can't foresee or anticipate right now will be able to take advantage of it in their design. There is a chicken-and-egg situation if the framework hardcodes this assumption everywhere and no one can even experiment with an alternative design.

Overall, if there is little cost to do it now, why not?

 

We are progressively generalizing the MLIR design, and nothing is set in stone.  I think it makes sense to take things incrementally here, and the next logical step is to start nailing down the existing region semantic bits that specify the dialect independent properties that do actually hold for certain of our existing regions.

-Chris



Eric Schweitz US

May 21, 2019, 12:51:04 PM
to MLIR

On Friday, May 17, 2019 at 10:47:07 AM UTC-7, Mehdi AMINI wrote:

On Fri, May 17, 2019 at 9:28 AM Chris Lattner <clat...@google.com> wrote:
On May 16, 2019, at 10:18 PM, Mehdi AMINI <joke...@gmail.com> wrote:

2. An operation in a block is allowed to refer to a bb argument defined on its block.
3. An operation is only allowed to refer to a value defined in another block (by a bbarg or operation) when it is dominated (in the domtree sense) by it.

This ties into CFG semantics as far as I can tell.
The whole point of non-CFG region is to escape these constraints and allow: 

non_cfg_region() ({
  ^b0():
      %val1 = "op"()
      yield      
  ^b1():
      %val2 = "op"()
      yield
  ^b2():
       op(%val2, %val1)
      yield

...
})


Can you clarify this further? Is the intended semantics of a "non-CFG" region that every block executes in parallel? Earlier in this thread, it seemed like another implication was that operations within a specific block would be understood to be an unordered sea of ops.

--
Eric
 

Mehdi AMINI

May 21, 2019, 2:02:25 PM
to Eric Schweitz US, MLIR
On Tue, May 21, 2019 at 9:51 AM Eric Schweitz US <esch...@nvidia.com> wrote:

On Friday, May 17, 2019 at 10:47:07 AM UTC-7, Mehdi AMINI wrote:

On Fri, May 17, 2019 at 9:28 AM Chris Lattner <clat...@google.com> wrote:
On May 16, 2019, at 10:18 PM, Mehdi AMINI <joke...@gmail.com> wrote:

2. An operation in a block is allowed to refer to a bb argument defined on its block.
3. An operation is only allowed to refer to a value defined in another block (by a bbarg or operation) when it is dominated (in the domtree sense) by it.

This ties into CFG semantics as far as I can tell.
The whole point of non-CFG region is to escape these constraints and allow: 

non_cfg_region() ({
  ^b0():
      %val1 = "op"()
      yield      
  ^b1():
      %val2 = "op"()
      yield
  ^b2():
       op(%val2, %val1)
      yield

...
})


Can you clarify this further? Is the intended semantics of a "non-CFG" region that every block executes in parallel?

The intention is to leave the semantics entirely up to the operation holding the region. You can't make assumptions about execution for such a region opaquely (by "opaquely" I mean without knowing the operation holding the region, or a specific attribute on this operation specifying something about the region).
 
Earlier in this thread, it seemed like another implication was that operations within a specific block would be understood to be an unordered sea of ops.

This is one possibility for a non-CFG region: it may or may not be the case, on a case-by-case basis; there is no enforced behavior by default.

-- 
Mehdi

Sean Silva

May 28, 2019, 4:29:56 PM
to MLIR
Ops have always represented some sort of "execution". The functions-as-ops proposal seems to generalize this to ops being "declarative" rather than representing something with runtime execution semantics. That seems like a major shift. Is there precedent for that? What are the ramifications of "declarative ops" and "declarative regions"?


-- Sean Silva

Mehdi AMINI

May 28, 2019, 4:37:38 PM
to Sean Silva, MLIR
Maybe in practice this is what most operations resolve to, because that's what most of the program is?
However, I'm not sure this is an inherent aspect of an Operation: for instance, we discussed having an "assume" operation which is not intended to have any execution aspect but would communicate some assumptions to the compiler analyses.
 
The functions-as-ops proposal seems to generalize this to ops being "declarative" rather than representing something with runtime execution semantics. That seems like a major shift. Is there precedent for that? What are the ramifications of "declarative ops" and "declarative regions"?

I don't think this proposal is changing anything here: non-CFG regions seem orthogonal to having "declarative operations".
For instance, we should be able to move forward with the current proposal to turn modules and functions into operations without the non-CFG regions, and at that point a function operation inside a module would be declarative.
Another thing that will be possible (independently of non-CFG regions) with the "function as operations" proposal is the ability to have declarative operations for module-level metadata.

Best,

-- 
Mehdi

 



Sean Silva

unread,
May 28, 2019, 5:51:03 PM5/28/19
to MLIR
I think it is a bit trickier than that. "assume" does have runtime execution semantics - it is a nop/identity function when it "executes". A std.func op never "executes" (unless we move to a model where a "function declaration" executes in the Python/JavaScript sense, which I don't think we want).
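
Spelling it out with a made-up op name and signature, just to make the distinction concrete:

```
// Hypothetical op: at runtime this is a no-op, but it still occupies a point in the
// execution order, and it communicates to analyses that %cond holds here.
"assume"(%cond) : (i1) -> ()
```

A func op, by contrast, has no point in any execution order at all.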


One interesting observation is that std.func seems to want to use std.symbol_name for references between std.func ops, rather than returning a Value. But logically it seems that your example non_cfg_region() with unordered basic blocks and yield terminators is similar -- there are a bunch of ops that want to reference each other's return value with no ordering relationship.


Concretely, this:

```
non_cfg_region() ({
  ^b0():
      %val1 = "op"()
      yield      
  ^b1():
      %val2 = "op"()
      yield
  ^b2():
       op(%val2, %val1)
      yield

...
})
```

could become: --->

```
non_cfg_region() ({
  ^b0():
      "op"() {std.symbol_name: "val1"}
      yield      
  ^b1():
      "op"() {std.symbol_name: "val2"}
      yield
  ^b2():
      %val1 = std.get_value() { val: #std.ref<"val1"> }
      %val2 = std.get_value() { val: #std.ref<"val2"> }
       op(%val2, %val1)
      yield

...
})
```
Obviously that's much uglier, but semantically it seems to be representing the same thing as what is needed by "function as operation"?

The only difference motivating std.symbol_name seems to be multithreading the pass manager / not maintaining use lists across functions, rather than an essential semantic difference in what the IR is representing. That seems like a smell to me. Thoughts?

 
 
The functions-as-ops proposal seems to generalize this to ops being "declarative" rather than representing something with runtime execution semantics. That seems like a major shift. Is there precedent for that? What are the ramifications of "declarative ops" and "declarative regions"?

I don't think this proposal is changing anything here: non-CFG regions seem orthogonal to having "declarative operations".


As this RFC is proposing a strict generalization of what we have, it's trivially true that it doesn't change anything here. So I don't think that is a valuable criterion by which to judge the RFC (any generalization would trivially satisfy that property). I think the point that needs more elaboration is what constraints still remain in place and why.

For example, is the following op / non-cfg region permitted? Here, "my_dialect.change_op_meaning" encloses a region where the "add" op should be interpreted as having the semantics of the "sub" op:

```
"my_dialect.change_op_meaning"() { old_op: "std.add", new_op: "std.sub", std.noncfg } : () -> () {
^foo:
  %val = "std.add"(%0, %1) : (f32, f32) -> f32
  print(%val) // Should print the numerical value `%0 - %1`, not `%0 + %1`
}
```

Currently, our IR has both structural restrictions (e.g. a block holds a list of ops) and semantic restrictions (e.g. SSA, CFG). The way I'm interpreting this proposal is that by putting {std.noncfg} on the op/region, one opts out of all (?) the semantic restrictions. The RFC says that these become "dialect defined" but doesn't seem to bound or define what that means.

Partially, I think that our current situation is not great, since if one looks at LangRef now it seems to rely a lot on one having an intuitive idea of what an SSA CFG IR is. I would like to understand more what a "dialect-defined" definition for our current SSA CFG semantics would look like.


Sorry for the unorganized thoughts on this. I'm trying to dig into some gut feelings about this proposal (and the "functions as operations" one, which I think is intimately related) and so I don't have a fully coherent statement on it yet.


-- Sean Silva



James McCartney

unread,
May 28, 2019, 6:47:08 PM5/28/19
to Sean Silva, MLIR
On Tue, May 28, 2019 at 1:29 PM 'Sean Silva' via MLIR <ml...@tensorflow.org> wrote:


On Thursday, May 16, 2019 at 5:45:53 PM UTC-7, Mehdi AMINI wrote:
Specification Changes

LangRef will be updated such that Regions are defined as “a list of MLIR Blocks,'' as opposed to “a CFG of MLIR Blocks”. A new section is introduced to specify CFG regions as a special case of region, defining the properties of CFG regions.

Blocks are redefined to be: “a list of operations without any defined execution order.”


Ops have always represented some sort of "execution". The functions-as-ops proposal seems to generalize this to ops being "declarative" rather than representing something with runtime execution semantics. That seems like a major shift. Is there precedent for that? What are the ramifications of "declarative ops" and "declarative regions"?


-- Sean Silva
 



de-lurking..

I have been looking at MLIR as a way to express operations on multi-rate, multi-dimensional audio signal processing data flow graphs. It would, for example, be convenient to have an operation such as a 2:1 down sampler. This would mean that operations upstream would execute twice as often as those downstream. These different rates would have to be separated when lowering to the next level. The advantage to not separating them early is that the multi-rate graph more closely expresses the intent of the signal flow, and type and rate inference can be done on that graph more conveniently.
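
To give a rough idea of what I mean (an entirely made-up dialect, ops, and stream types), such a multi-rate graph could be a region whose enclosing op declares dataflow rather than CFG semantics:

```
// Hypothetical "audio" dialect: the enclosing op declares a dataflow-graph region,
// so ops are connected by stream values instead of being ordered by a CFG.
"audio.graph"() ({
  %in   = "audio.input"()              : () -> !audio.stream<48000>
  %filt = "audio.biquad"(%in)          : (!audio.stream<48000>) -> !audio.stream<48000>
  %half = "audio.downsample_2x"(%filt) : (!audio.stream<48000>) -> !audio.stream<24000>
  "audio.output"(%half)                : (!audio.stream<24000>) -> ()
}) : () -> ()
```

Rate and type inference could run on this graph directly, and the 48000 -> 24000 relationship would only be split into separately scheduled pieces when lowering to the next level.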


-- 
--- james mccartney

Mehdi AMINI

unread,
May 28, 2019, 9:37:22 PM5/28/19
to Sean Silva, MLIR
Returning values instead of using strings was actually our original design (I think it is still mentioned in the alternatives at the end). Multithreading was one of the motivations that led us to look for explicit captures for these values, but the effect was that every function call would have been akin to an "indirect call" instead. We couldn't find an adequate design for this.

You are correct that we could model everything with string references to side-step the SSA value restriction, but that isn't friendly to work with. It seemed OK to us for global symbol references to use strings, as there is little coupling in general: most passes are intra-procedural and don't need to introspect a callee. On the other hand, using string references inside a function would make transformations much harder and would fall outside the value-traversal infrastructure (you're losing replaceAllUsesWith, etc.).
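
To illustrate the split (a sketch only; the exact spelling is part of the functions-as-ops discussion), a call site naming its callee by symbol doesn't create a use-list edge between functions, while values inside the function body remain ordinary SSA uses that replaceAllUsesWith can rewrite:

```
// Cross-function reference by symbol: no def-use edge from @caller into @callee,
// which is what lets the pass manager process functions independently.
func @callee(f32) -> f32

func @caller(%arg0: f32) -> f32 {
  // %arg0 and %0 are regular SSA values: intra-function rewrites keep working.
  %0 = call @callee(%arg0) : (f32) -> f32
  return %0 : f32
}
```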

 

 
 
The functions-as-ops proposal seems to generalize this to ops being "declarative" rather than representing something with runtime execution semantics. That seems like a major shift. Is there precedent for that? What are the ramifications of "declarative ops" and "declarative regions"?

I don't think this proposal is changing anything here: non-CFG regions seem orthogonal to having "declarative operations".


As this RFC is proposing a strict generalization of what we have, it's trivially true that it doesn't change anything here. So I don't think that is a valuable criterion by which to judge the RFC (any generalization would trivially satisfy that property). I think the point that needs more elaboration is what constraints still remain in place and why.

For example, is the following op / non-cfg region permitted? here, "my_dialect.change_op_meaning" encloses a region where the "add" op should be interpreted as having the semantics of the "sub" op:

```
"my_dialect.change_op_meaning"() { old_op: "std.add", new_op: "std.sub", std.noncfg } : () -> () {
^foo:
  %val = "std.add"(%0, %1) : (f32, f32) -> f32
  print(%val) // Should print the numerical value `%0 - %1`, not `%0 + %1`
}
```

Currently, our IR has both structural restrictions (e.g. a block holds a list of ops) and semantic restrictions (e.g. SSA, CFG). The way I'm interpreting this proposal is that by putting {std.noncfg} on the op/region, one opts out of all (?) the semantic restrictions. The RFC says that these become "dialect defined" but doesn't seem to bound or define what that means.


The proposal is really about the execution model (the control/scheduling), not about the semantics of individual ops.
If this proposal were to be pursued, LangRef would be made clear about this.
 

Partially, I think that our current situation is not great, since if one looks at LangRef now it seems to rely a lot on one having an intuitive idea of what an SSA CFG IR is. I would like to understand more what a "dialect-defined" definition for our current SSA CFG semantics would look like.


Sorry for the unorganized thoughts on this. I'm trying to dig into some gut feelings about this proposal (and the "functions as operations" one, which I think is intimately related) and so I don't have a fully coherent statement on it yet.

"Function as operations" use to rely on this proposal in an early design, but not anymore. We can move forward with it without the current one I believe.
I think the current proposal (non-CFG regions) can be put on back-burner until we get more motivating use-cases.

-- 
Mehdi

 