Roadmap for adding dynamic shape support in the MLIR HLO dialect


Jun Yang

Jan 2, 2020, 10:44:11 AM
to MLIR
Hi,

In this doc, it is mentioned that there is a plan to add dynamic shape support in XLA/HLO for the TF/XLA bridge based on MLIR. Is there any concrete plan or roadmap that can be shared? We are currently also working on a PoC to support dynamic shapes for code generation and would like to stay in sync with the MLIR community.

At present we think there are two possible options for adding dynamic shape support to XLA/HLO within MLIR:
1. Add another dialect (maybe named DHLO?) and embed the dynamic shape information in the newly introduced dialect, then leverage the Stencil/Linalg dialect for the fusion & codegen workflow (unlike with the HLO dialect, we could not directly reuse the existing XLA fusion & codegen infrastructure).
The pro of this approach is that we believe the best performance can only be obtained with fixed shapes, so by keeping the original static-shaped HLO dialect we leave the high-performance fixed-shape codegen untouched. During JIT compilation, if the dynamic shape issue is too severe, we could switch the IR graph from the HLO dialect to the DHLO dialect (this should not be difficult to achieve) and fall back to DHLO for dynamic-shape codegen.

2. Directly extend the HLO dialect to support dynamic shapes. For the fusion & codegen part, we could either leverage the Stencil/Linalg dialect or follow the existing XLA fusion & codegen implementation to add the corresponding support.
The pro of this approach is that we avoid the cost of introducing another dialect; the con is that we need to take care not to lose the performance benefits of static-shaped codegen.

Personally I prefer option 1, so in the following I only describe further thoughts on option 1.

Another thing that deserves attention: for dynamic and static shapes, I think some optimizations should be shared, and we may need some extra work to provide such support (for example, optimization passes supporting both the DHLO and HLO dialects; maybe we could introduce another dialect called XHLOOpt and place the corresponding shape-agnostic optimization passes there).

In summary, there are several potential execution flows after adding dynamic shape support (a rough sketch of the representational difference follows the list):

a) DHLO --> XHLOOpt --> Linalg/Stencil --> LLVM
b) HLO --> XHLOOpt --> (HLO dialect optimization) --> Linalg/Stencil --> LLVM
c) HLO --> (round trip to the XLA world and directly use XLA's existing optimizations)
d) HLO --> (if dynamic shapes are too severe, fall back) --> DHLO --> XHLOOpt --> Linalg/Stencil --> LLVM
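To make the representational difference concrete, here is a minimal sketch in MLIR's type system (the dhlo.add op name is purely hypothetical, following the DHLO naming above; the assembly form is illustrative, not authoritative):

// Static-shaped form: every dimension is known at compile time.
%0 = "xla_hlo.add"(%a, %b) : (tensor<4x16xf32>, tensor<4x16xf32>) -> tensor<4x16xf32>

// Hypothetical dynamic-shaped form: the leading (batch) dimension is only known
// at runtime and is spelled `?`; tensor<*xf32> would additionally leave the rank unknown.
%1 = "dhlo.add"(%c, %d) : (tensor<?x16xf32>, tensor<?x16xf32>) -> tensor<?x16xf32>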

Any comments and suggestions are highly welcome.

Thanks



Mehdi AMINI

Jan 2, 2020, 10:52:10 AM
to Jun Yang, MLIR
Hi Jun,


There are already two dialects prototyped: HLO and LHLO (late-HLO). The latter uses memref instead of tensor to model explicitly allocated buffers.
Both of them support dynamic shapes, and the lowering from the TensorFlow dialect to the HLO dialect is written to support dynamic shapes (see the work going on here: https://github.com/tensorflow/tensorflow/commits/master/tensorflow/compiler/mlir/xla/transforms )
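To illustrate the tensor vs. memref distinction (a rough sketch; the op spellings are illustrative rather than authoritative), here is an elementwise add in the value-based HLO form versus the buffer-based LHLO form:

// xla_hlo: operates on tensor values; a `?` marks a dimension only known at runtime.
%sum = "xla_hlo.add"(%lhs, %rhs) : (tensor<?x16xf32>, tensor<?x16xf32>) -> tensor<?x16xf32>

// xla_lhlo: operates on pre-allocated memref buffers; the result buffer is passed
// in as an operand instead of being returned.
"xla_lhlo.add"(%lhs_buf, %rhs_buf, %out_buf) : (memref<?x16xf32>, memref<?x16xf32>, memref<?x16xf32>) -> ()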

We're missing the HLO->LHLO transforms, but we already have some GPU codegen using LHLO->Linalg (see here https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/xla/transforms/lhlo_legalize_to_gpu.cc ).

Best,

-- 
Mehdi



Nicolas Vasilache

Jan 2, 2020, 11:03:05 AM
to Mehdi AMINI, Jun Yang, MLIR
Hello Jun and Mehdi,

To complement what Mehdi wrote, I'd like to draw your attention to the following point.

One thing to note is that Linalg is adding first class support for tensors in https://reviews.llvm.org/D72022

This has a number of implications for some of the transformations that are traditionally done at the level of the HLO dialect.
In particular, everything related to trivial fusion of pointwise operators can be done immediately using regions.
This avoids the need for the current, more cumbersome and phase-ordered flow that does:
1. mark fusion with XLA fusion nodes,
2. allocate buffers for everything
3. convert to Linalg
4. apply fusion in Linalg
5. perform an analysis and remove temporary buffers that have been fused.

Note that step 4 may not necessarily do what one intended at step 1, since we are talking about different systems that are not really designed to talk to each other.

Instead, this can be replaced by:
1. apply fusion of ops using regions

Temporary buffers never get materialized at all.
This becomes especially handy when implicit or explicit broadcast semantics are involved: some things are trivial to fuse at the level of Linalg on tensors, and the unnecessary intermediate memory is never allocated.
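As a rough sketch (the exact linalg.generic spelling has shifted across MLIR revisions, so treat the syntax as illustrative), two pointwise ops fused into a single region on tensors, with dynamic sizes and no intermediate buffer:

#id = affine_map<(d0) -> (d0)>

// add followed by tanh, expressed as one linalg.generic on tensors: the
// "fusion" is simply the region body, and no temporary tensor/buffer exists.
func @add_tanh(%a: tensor<?xf32>, %b: tensor<?xf32>,
               %init: tensor<?xf32>) -> tensor<?xf32> {
  %0 = linalg.generic
         {indexing_maps = [#id, #id, #id], iterator_types = ["parallel"]}
         ins(%a, %b : tensor<?xf32>, tensor<?xf32>)
         outs(%init : tensor<?xf32>) {
  ^bb0(%x: f32, %y: f32, %out: f32):
    %sum = addf %x, %y : f32
    %t = tanh %sum : f32
    linalg.yield %t : f32
  } -> tensor<?xf32>
  return %0 : tensor<?xf32>
}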

There are many other implications on the type of transforms that become available at this level (hint: look at the TASO compiler) but I only listed the most obvious one.

In my mind the codegen path where things are the most natural is:

User
-> Language / Framework 
-> HLO + Linalg on tensors 
-> LHLO + Linalg on buffers 
(note that buffer allocation in Linalg on tensors -> Linalg on buffers can be very progressive intermixing ops with both tensor and buffers arbitrarily)
-> Affine/StructuredControlFlow (still named Loops atm ..)
-> backends

Different transformations apply at each level. 



--
N

Jun Yang

Jan 2, 2020, 3:42:05 PM
to MLIR
Hi Mehdi,

Thanks for the prompt reply.

Regarding what you said, "Both of them support dynamic shapes, and the lowering from the TensorFlow dialect to the HLO dialect is written to support dynamic shapes":
I am a little curious what "support dynamic shapes" means here, since in my observation the current TF-to-HLO dialect conversion behaves the same as the original HLO path. Let me give one example from the code:

// talk is cheap, show the code----begins
PatternMatchResult matchAndRewrite(TF::Conv2DBackpropInputOp op,
                                     PatternRewriter &rewriter) const override {
    // Unpack all of the attributes.
    tensorflow::TensorFormat data_format;
    if (!FormatFromString(op.data_format().str(), &data_format)) {
      return matchFailure();
    }
    tensorflow::Padding padding;
    if (!GetPaddingFromString(op.padding().str(), &padding).ok())
      return Pattern::matchFailure();

    auto out_backprop_ty =
        op.out_backprop()->getType().dyn_cast<RankedTensorType>();
    // NOTE: the lowering bails out unless the out_backprop shape is fully static.
    if (!out_backprop_ty || !out_backprop_ty.hasStaticShape())
      return matchFailure();
    ArrayRef<int64_t> out_backprop_shape = out_backprop_ty.getShape();
    // NOTE: the same static-shape requirement applies to the filter.
    auto filter_ty = op.filter()->getType().dyn_cast<RankedTensorType>();
    if (!filter_ty || !filter_ty.hasStaticShape()) return matchFailure();
    ArrayRef<int64_t> filter_shape = filter_ty.getShape();
    int num_spatial_dims = 2;
    Location loc = op.getLoc();

    int num_dims = num_spatial_dims + 2;
    int batch_dim = tensorflow::GetTensorBatchDimIndex(num_dims, data_format);
    int feature_dim =
        tensorflow::GetTensorFeatureDimIndex(num_dims, data_format);

    DenseIntElementsAttr input_shape_attr;
    if (!matchPattern(op.input_sizes(), m_Constant(&input_shape_attr)) ||
        input_shape_attr.getType().getRank() != 1) {
      return matchFailure();
// talk is cheap, show the code----ends

From the above code snippet, it can be seen that when converting from the TF dialect to the HLO dialect there is still an inherent static-shape constraint (see the hasStaticShape() checks marked above), at least for some (or many) TF operations.
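For instance (the operand order and attribute spelling of the TF dialect's textual form are approximate here), an op like the following would simply be rejected by the pattern above, because its shapes are not fully static:

// out_backprop has a dynamic batch dimension, so hasStaticShape() fails and
// the pattern returns matchFailure().
%grad = "tf.Conv2DBackpropInput"(%input_sizes, %filter, %out_backprop)
          {strides = [1, 1, 1, 1], padding = "SAME", data_format = "NHWC"}
        : (tensor<4xi32>, tensor<3x3x16x32xf32>, tensor<?x28x28x32xf32>)
        -> tensor<?x28x28x16xf32>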

However, from the dialect representation perspective, I also think the HLO dialect might be capable of representing its inputs/outputs with dynamic shapes rather than just static ones, since they are declared as HLO_Tensor-like types:
// talk is cheap, show the code----begins
// Any integer tensor types
def HLO_IntTensor : TensorOf<[HLO_Int]>;

// Any floating-point tensor types
def HLO_FpTensor : TensorOf<[AnyFloat]>;

def HLO_PredTensor : TensorOf<[HLO_Pred]>;

def HLO_Tensor : TensorOf<[AnyFloat, AnyInteger, AnyComplex]>;

def HLO_ComplexTensor : TensorOf<[AnyComplex]>;

def HLO_Tuple : NestedTupleOf<[HLO_Tensor, HLO_Token]>;

def HLO_TensorOrTuple : AnyTypeOf<[HLO_Tensor, HLO_Tuple]>;
// talk is cheap, show the code----ends
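As far as I can tell, these TensorOf<...> constraints already admit dynamically shaped tensors. For example, all of the following types should satisfy HLO_FpTensor / HLO_Tensor (a small illustrative snippet):

func @shapes(%static   : tensor<4x8xf32>,   // fully static
             %dynamic  : tensor<?x8xf32>,   // ranked, with a dynamic leading dim
             %unranked : tensor<*xf32>) {   // unranked
  return
}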
So from my point of view, there is an inconsistency here. 
Could you please help elaborate a little bit more?

Thanks


Jun Yang

Jan 2, 2020, 3:57:02 PM
to MLIR
Hi Nicolas,

Nice to see the discussion in the LLVM thread.

It is interesting that tensor support is being added to the Linalg dialect to ease the fusion work.

One more thing I think deserves further discussion.

As to the codegen path you mentioned:

-> Language / Framework 
-> HLO + Linalg on tensors 
-> LHLO + Linalg on buffers 
(note that buffer allocation in Linalg on tensors -> Linalg on buffers can be very progressive intermixing ops with both tensor and buffers arbitrarily)
-> Affine/StructuredControlFlow (still named Loops atm ..)
-> backends

There is an intermix of the HLO + Linalg and LHLO + Linalg dialects during the conversion process.

I think one possible reason we need this intermix is that, due to current limitations of Linalg, it may not support all the fusion-related functionality yet, so for some sub-graphs we could directly leverage Linalg, while for the remaining sub-graphs we have to resort to HLO/LHLO's
own optimization support. What is your point of view?

Thanks 
Jun


Nicolas Vasilache

Jan 2, 2020, 4:04:40 PM
to Jun Yang, MLIR
On Thu, Jan 2, 2020 at 3:57 PM Jun Yang <yangj...@gmail.com> wrote:
I think one possible reason we need this intermix is that, due to current limitations of Linalg, it may not support all the fusion-related functionality yet, so for some sub-graphs we could directly leverage Linalg, while for the remaining sub-graphs we have to resort to HLO/LHLO's own optimization support. What is your point of view?

Indeed, this is conservative because HLO is already well established, expressive and has had a lot of engineering support.
On the other hand, it does not have true custom ops or dynamic shape support, and it breaks everything into small pieces.
As things progress and we have more data it will become clearer what type of representation / algorithms will be the most useful. 

 


--
N

Jun Yang

Jan 2, 2020, 5:23:48 PM
to Nicolas Vasilache, MLIR
Got it, and this makes sense to me.

Actually we share the same design objective. In our mind, we would like to add dynamic shape support only for those scenarios that genuinely require it; for other scenarios, we will rely on
the existing HLO static-shape support as much as possible. I also think this is one of the major benefits brought by MLIR, since it lets newly introduced pieces and legacy pieces cooperate in a more coherent way.

Jun

--

Jun Yang

Jan 3, 2020, 10:08:48 PM
to Mehdi AMINI, MLIR
Another question regarding "the HLO dialect already supports dynamic shapes".

In my understanding, even from the representation perspective, the HLO dialect still doesn't support full dynamic shape semantics.

Let me give one concrete example.
Here is the TableGen definition of the HLO slice op:
def HLO_SliceOp: HLO_Op<
      "slice",
      [NoSideEffect, SameOperandsAndResultElementType,
       AllTypesMatch<["start_indices", "limit_indices", "strides"]>]> {
  let arguments = (ins
    HLO_Tensor:$operand,
    I64ElementsAttr:$start_indices,
    I64ElementsAttr:$limit_indices,
    I64ElementsAttr:$strides
  );

  let results = (outs HLO_Tensor);

  let builders = [OpBuilder<
    "Builder *builder, OperationState &result, Value operand, "
    "DenseIntElementsAttr start_indices, DenseIntElementsAttr limit_indices, "
    "DenseIntElementsAttr strides"
  >];

  let extraClassDeclaration = [{
    // Infers output type for given operand and attributes. Result type is
    // unranked if any of the attributes is illegal.
    static Type InferOutputTypes(Builder *builder, Value operand,
                                 DenseIntElementsAttr start_indices,
                                 DenseIntElementsAttr limit_indices,
                                 DenseIntElementsAttr strides);
  }];
}


And the corresponding TensorFlow Slice definition is as follows:
op {
  name: "Slice"
  input_arg {
    name: "input"
    type_attr: "T"
  }
  input_arg {
    name: "begin"
    type_attr: "Index"
  }
  input_arg {
    name: "size"
    type_attr: "Index"
  }
  output_arg {
    name: "output"
    type_attr: "T"
  }
  attr {
    name: "T"
    type: "type"
  }
  attr {
    name: "Index"
    type: "type"
    allowed_values {
      list {
        type: DT_INT32
        type: DT_INT64
      }
    }
  }
}


From the above code snippets, it can be seen that for the Slice op the HLO dialect has limited representation capability compared with the TF operation semantics.

Since TF specifies begin/size as tensors, while the HLO dialect specifies them as concrete scalar values, dynamic shape representation capability is lost
to a certain extent.

I haven't gone through all the op definitions in the HLO dialect, but I suspect there may be other operations with the same issue.

Correct me if my understanding is wrong.
--

Jack John

Jan 7, 2020, 3:36:57 AM
to MLIR
Since TF specifies begin/size as tensors, while the HLO dialect specifies them as concrete scalar values, dynamic shape representation capability is lost
to a certain extent.

TF defines the begin and size arguments as int32/int64, while HLO_SliceOp in MLIR defines start_indices and limit_indices as I64ElementsAttr; an I64ElementsAttr in MLIR represents a vector or a tensor value.
I don't quite understand where the so-called "concrete scalar value" comes from; maybe a misunderstanding?


Jun Yang

Jan 8, 2020, 4:40:42 AM
to Jack John, MLIR
Hi Jack,

Yep, I think you are right and I just missed the details.

Thanks



--

Kai Zhu

Jan 9, 2020, 1:00:19 AM
to MLIR
Hi Jack,

In my understanding, I64ElementsAttr is a kind of "Attribute", which is "for specifying constant data on Ops". This means the slice sizes must be compile-time constants.
A "dynamic shape representation" means the slice sizes can be computed by other ops at runtime, which is quite common in a TensorFlow graph.
After checking the rationale for "Attributes" in MLIR, I still believe the HLO dialect cannot support dynamic shapes here.
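To illustrate (schematically; I am not claiming this is the exact assembly syntax), the slice amounts live in attributes and are therefore fixed when the IR is built, whereas a runtime-computed slice size would have to arrive as an SSA operand, for which this op has no place:

// start/limit/strides are attributes, i.e. compile-time constants baked into the IR.
%0 = "xla_hlo.slice"(%arg) {
       start_indices = dense<0> : tensor<1xi64>,
       limit_indices = dense<4> : tensor<1xi64>,
       strides = dense<1> : tensor<1xi64>
     } : (tensor<8xf32>) -> tensor<4xf32>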




Lei Zhang

Jan 9, 2020, 9:07:08 PM
to Kai Zhu, MLIR
For slice specifically, there are both the xla_hlo.slice op and the xla_hlo.dynamic-slice op. The latter can support dynamic beginning indices. Generally, proper dynamic shape support requires changes to many components in the whole compilation pipeline.

For the HLO dialect, we make sure ops are no longer required to be statically shaped in verification wherever possible; this lays the foundation for dynamic shape support. You can actually find tests of ops taking in unranked tensors or ranked tensors with dynamic dimensions, e.g., for transpose: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/xla/tests/ops.mlir#L494. But yes, AFAIU op definitions and semantics are not radically adjusted specifically to cater to dynamic shapes; meaning that if the existing HLO op's definition takes a compile-time constant, the corresponding op definition in the dialect will respect that for now. This makes it easier to bring up the dialect and evaluate parity.

On the lowering side, we are also paying attention to dynamic shapes, and many patterns already support dynamically shaped TensorFlow ops instead of directly rejecting them. You can find all the tests with shapes containing * or ?, for example, unpack: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/xla/tests/legalize-tf.mlir#L2780. Dynamic shape support is certainly in the works, as are the HLO dialect and its lowering themselves. :)

Thanks,
Lei



Kai Zhu

Jan 10, 2020, 4:29:41 AM
to MLIR
Hi Lei,

Are you saying that you are already working on a fully dynamic shape solution?
We are also planning the same thing, since dynamic shapes have been a critical issue in our scenario for quite a long time.
We agree that this affects almost every stage of the whole flow.

Are there any existing RFCs or docs that describe your overall solution?



Lei Zhang

Jan 10, 2020, 10:56:03 AM
to Kai Zhu, MLIR
Sorry for the confusion! To be clear, no, I myself am not working on a full dynamic shape solution. My point was that the HLO dialect and its lowering from the TensorFlow dialect are already taking dynamic shapes into consideration; it will take time to have the dialect and its lowering fully developed.


Xiaoyong Liu

Jan 10, 2020, 4:16:03 PM
to Lei Zhang, Kai Zhu, MLIR
Just want to clarify "lowering fully developed". When we are talking about "the HLO dialect supports dynamic shapes for certain ops", does "HLO dialect" here mean exporting the HLO dialect? If yes, you would have to enhance the current XLA. If not, what is your proposal for HLO dialect lowering?

In my understanding, "HLO dialect" means the definition, the conversion, and the XLA implementation.

-Xiaoyong

Stella Laurenzo

Jan 10, 2020, 4:30:01 PM
to Xiaoyong Liu, Jacques Pienaar, Lei Zhang, Kai Zhu, MLIR
I am not aware of any existing public RFC for this, although there has been a fair amount of internal planning and laying of foundations for some time. I know this isn't helpful to the community, and we should probably try to rebase any assets and plans we have about this to be publicly available (this is one of those things where it was easy to fall into unhelpful communication patterns, given that the topic is neither wholly MLIR related nor wholly TensorFlow related).

Purely at the level of making it possible to express HLO-based computations with dynamic shapes, I expect that there are a handful of ops which need to be added. Following the pattern of Slice vs DynamicSlice, I would prefer to see "dynamic" versions of these added as needed (versus changing the semantics of existing static ops).

My team (IREE) will be pushing on this over the next few months as well, and I am definitely +1 on finding a way to flesh the design out publicly. It would be great if we could start with an RFC that outlines the state of where things are at now and what we can see that needs to change. I suspect that +Jacques Pienaar needs to be fairly involved in that, and I know that he has been very busy with the migration of MLIR->LLVM.

Xiaoyong Liu

Jan 10, 2020, 6:11:39 PM
to Stella Laurenzo, Jacques Pienaar, Lei Zhang, Kai Zhu, MLIR
I'd like to work with you and Jacques on this. The dynamic shape topic has been around for a while, first in XLA and now in MLIR, and the unclear roadmap and opaque definitions make the discussion difficult to bring to closure.
We'd better use a public RFC to clarify items such as:
1. the design principles
2. the scope: which ops and types are included, which are not, and why
3. the implementation method: in exported HLO, or lowering through MLIR dialects
4. the performance and functionality trade-offs for CPU, GPU, and maybe TPU



Stella Laurenzo

Jan 10, 2020, 6:15:42 PM
to Xiaoyong Liu, Jacques Pienaar, Lei Zhang, Kai Zhu, MLIR
On Fri, Jan 10, 2020 at 3:11 PM Xiaoyong Liu <xyli...@gmail.com> wrote:
We'd better use a public RFC to clarify the design principles, the scope (which ops and types are in and out, and why), the implementation method, and the performance/functionality trade-offs for CPU, GPU, and maybe TPU.


+1 - Let's see what Jacques has to say...

Geoffrey Martin-Noble

Jan 11, 2020, 1:14:53 AM
to Stella Laurenzo, Xiaoyong Liu, Jacques Pienaar, Lei Zhang, Kai Zhu, MLIR, Ace
I just wanted to close the loop, because there was a related discussion in another thread recently (+Ace from that thread). A lot of the existing MLIR HLO dialect op definitions were first written with the assumption that everything has static shapes, because that was the case for the HLO proto representation. So you may very well find existing evidence of things that assume static shapes. The type verifiers were relaxed, but there may still be some pieces left over from before that.

In some places that is a necessary part of the op definition, like the particular slice op mentioned, and I'm +1 on keeping that separate when it's a core part of the op definition and having a separate dynamic alternative. As Jun said before, we can probably achieve better optimization in the static-shape case where it's possible. But there may be some places (and I wrote some of them, sorry!) where a verification or something assumes that an operand has a static shape and doesn't really need to. I think it's fair to say that those are essentially "historical artifacts" at this point and should be fixed :-)

Xiaoyong Liu

Jan 12, 2020, 1:02:06 AM
to Geoffrey Martin-Noble, Stella Laurenzo, Jacques Pienaar, Lei Zhang, Kai Zhu, MLIR, Ace
Thank you Geoffrey. I'm not sure whether I'm understanding this correctly: are you saying that the HLO dialect is designed, and will be maintained, such that all shapes are static, or can be inferred to static values with a certain amount of re-inference in an application? Please kindly comment on this before we close the loop. A few things I think we agree on: 1. static shapes may give more optimization opportunities; 2. the related discussion you referred to only talks about reshape, with a few things left over.

The discussion behind the dynamic shape question is not all about performance; it's about the scope that HLO, or XLA, will cover. If dynamic shapes are not going to be covered well, I think that's fine as a design choice, since static shapes have their advantages.
But a fully dynamic-shape-friendly code generation path is really important. We may consider other ways to implement it, such as using MLIR to build a new dynamic-shape-friendly representation, perhaps not perfect for performance, and lowering directly to LLVM, maybe through a kind of late-HLO, Linalg, and Affine, then to LLVM. Let's make this clear; otherwise, I believe this is not the end of this kind of discussion.

The question here is how the HLO dialect will cover dynamic shape scenarios, and to what level. If fully dynamic shapes will be supported, what is the mechanism for the existing transformation passes, buffer allocation, runtime, etc.? I know this is very important for GPUs, TPUs, and accelerators of that kind.
With this answered, we can deliver an RFC to support dynamic shape compilation, either by enhancing the existing HLO dialect or by bringing in a new one to work alongside the HLO dialect.

Mehdi AMINI

Jan 12, 2020, 2:18:12 PM
to Xiaoyong Liu, Lei Zhang, Kai Zhu, MLIR
On Fri, Jan 10, 2020 at 1:16 PM Xiaoyong Liu <xyli...@gmail.com> wrote:
Just want to clarify "lowering fully developed". When we are talking about "the HLO dialect supports dynamic shapes for certain ops", does "HLO dialect" here mean exporting the HLO dialect? If yes, you would have to enhance the current XLA. If not, what is your proposal for HLO dialect lowering?

Lowering refers to the conversion from TF dialect operations to HLO dialect operations.
 

In my understanding, "HLO dialect" means the definition, the conversion, and the XLA implementation.

XLA is not part of the HLO dialect definition. "HLO dialect" refers purely to the MLIR side, independently of XLA; we don't plan to change XLA itself.
You can imagine a codegen path that is fully independent of XLA, using only MLIR components. The HLO dialect is a stepping stone that allows us to rely on the proven XLA techniques and experience to build an independent MLIR codegen path.

There is no detailed plan at the moment because our first milestones have been focused on reusing the XLA codegen path and reaching parity with the existing bridge. The first components we're replacing are 1) the GraphTransformation passes that implement the TensorFlow graph transformations to extract a cluster of computation to be compiled with XLA, and 2) the set of kernels that emit HLO for each of the TensorFlow ops.
As such, we haven't prioritized an end-to-end path that supports dynamic shapes at the moment, as none of the existing use cases using XLA requires it and we're limited by XLA anyway to reach our current milestones. However, some experiments have been conducted, using LHLO -> Linalg conversion for now, and many folks are actively playing with alternatives in this domain (mostly targeting CPUs and GPUs right now).

-- 
Mehdi

 

Stella Laurenzo

Jan 13, 2020, 1:49:21 PM
to Mehdi AMINI, Xiaoyong Liu, Lei Zhang, Kai Zhu, MLIR
On Sun, Jan 12, 2020 at 11:18 AM Mehdi AMINI <joke...@gmail.com> wrote:


However, some experiments have been conducted, using LHLO -> Linalg conversion for now, and many folks are actively playing with alternatives in this domain (mostly targeting CPUs and GPUs right now).

As you say in the last statement, I think it is important to be explicit that there are multiple "we's" here, even within Google. For our/IREE CPU and GPU work, we will want to take some measured steps soon to relax the static-shape assumptions in the xla_hlo dialect. This will be in combination with working out both the frontend issues (how to plumb dynamic dimensions from the call sites) and the backend issues (how to lower through Linalg to do codegen).

From my side, we're just starting to talk about next steps. I was thinking about starting with a basic CNN and sequence model (say ResNet and a simplified Transformer or LSTM), making the batch and/or sequence dimensions dynamic, and starting to thread that through at each level. I don't have a holistic list of the issues I anticipate encountering along the way, but I do have suspicions. Primarily, I would like to see the frontend/e2e case set up well so that we can see the rest. I expect that even taking this measured step is going to be quite a large amount of work, requiring us to take positions on runtime, allocation, etc., which may not generalize to all applications.
 

Mehdi AMINI

Jan 13, 2020, 2:43:54 PM
to Stella Laurenzo, Kai Zhu, Lei Zhang, MLIR, Xiaoyong Liu
On Mon, Jan 13, 2020 at 10:49 AM Stella Laurenzo <laur...@google.com> wrote:


As you say in the last statement, I think it is important to be explicit that there are multiple "we's" here, even within Google.

Right, since we are on the TensorFlow mailing list here, I am overfitting to the work happening under the TensorFlow umbrella :)
The work happening in satellite projects is relevant though; we should probably improve our roadmap sharing to stay in closer sync!