[RFC] Proposal to add layout attribute to MLIR Tensor Type.


Andy Davis

Jun 18, 2019, 4:21:49 PM
to MLIR
MLIR Tensor Layout Proposal

Goal
Propose an extensible way to specify the physical memory layout for the MLIR Tensor type.

Motivation
Compiler systems like XLA and MLIR include a tensor type and progressively lower a computation (typically a data flow graph with operations on tensor types) towards a target backend. At some point in this sequence of progressive lowering steps, the tensor type is assigned a layout to support the following use cases (this is not an exhaustive list):

1. XLA host/TPU-device layout agreement
TPU/accelerator devices and their attached host need to agree on the layout of data transferred between the host and the device. One might ask: why not use operations to specify layout and padding transformations before transfer? When compiling a module for TPU, it is not always possible to access the host-side graph to see the sequence of ops that specified the layout. In addition, subsequent transformations on the host-side graph may change (or make it hard to infer) the layout. Instead, it is preferable to specify the layout as part of the tensor type, since tensors are the data types passed across these host/device boundaries.

2. XLA Layout assignment
XLA performs a global layout assignment pass on a given XLA computation. The purpose of this pass is to assign the best-performing layouts to the computation, while also ensuring that any layout constraints on operations (or on backend kernel implementations of XLA ops) are compatible with the assigned layouts.
XLA CPU adds layout constraints on the matmul operation to satisfy the Eigen library’s preferred GEMM kernel layout.
XLA GPU layout assignment assigns layouts that satisfy the layout constraints on operands and results of library (cuDNN) calls.

3. Logical dims/physical layout separation
XLA tensors specify the logical dimension numbers separately from the permutation of these numbers (from fastest varying to slowest varying) which represents the physical data layout. This enables optimizations like eliminating unnecessary transposes which operate on the logical dimensions, as well as enabling layout changes by simply updating the layout permutation of the associated tensor.
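
To make this separation concrete, here is a minimal sketch (in Python, purely illustrative; the function name and shapes are mine, not XLA's API) of how a layout permutation, listed from slowest- to fastest-varying dimension as in this proposal, determines the physical strides while leaving the logical dimensions untouched:

# Illustrative only: compute physical strides from a logical shape and a
# layout permutation listed from slowest- to fastest-varying dimension.
def physical_strides(shape, perm):
    strides = [0] * len(shape)
    stride = 1
    # Walk the permutation from fastest- to slowest-varying dimension.
    for dim in reversed(perm):
        strides[dim] = stride
        stride *= shape[dim]
    return strides

# Row-major layout of a 2x3 tensor: dim 1 is fastest varying.
assert physical_strides([2, 3], [0, 1]) == [3, 1]
# Column-major layout of the same tensor: only the permutation changes.
assert physical_strides([2, 3], [1, 0]) == [1, 2]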

4. Intel MKL custom layout
Intel MKL kernels use a custom internal layout that needs to be passed between their custom TensorFlow kernels. A TF graph transformation pass enforces this invariant, adding layout transformation operations where needed.

Proposal
1. Add an optional single attribute-value to the MLIR tensor type for specifying the layout. This layout attribute can be one of the predefined attribute values (e.g. a string attribute can be used by MKL to specify its layout). If the layout attribute is unspecified, the tensor has the default layout. The default layout and the list of allowable layout attribute values are defined by the tensor type (e.g. string and xla-tiled-layout).
2. Add a new “TiledLayoutAttr” attribute value to MLIR which represents XLA’s tiled layout. One change from XLA’s tiled layout will be to list the layout (which is a permutation of logical dimensions) from slowest-varying to fastest-varying dimensions (this ordering is more intuitive to users accustomed to thinking in row-major layouts). The memory space in which the tensor resides is also specified as part of TiledLayoutAttr, as a target-specific integer index. If no memory space is specified, the target-specific default memory space (index 0) is used.

Tensor Type Syntax:

tensor-type ::= `tensor` `<` dimension-list tensor-memref-element-type (`,` attribute-value)? `>`

TiledLayoutAttr Syntax:
Layout permutation: {0, 1}
Tile specification: (128), (2, 1)
Memory space: 2

Example: tiled_layout<{0, 1}, (128), (2, 1), 2>
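
To illustrate what a tile specification does, here is a minimal single-level sketch (my reading of the proposal, not XLA's exact multi-level scheme; the function name and the divisibility assumption are mine):

# Illustrative only: linear offset of a logical index under one level of
# tiling. The tensor is partitioned into tile-shaped blocks, blocks are laid
# out in row-major order, and elements within a block are contiguous.
def tiled_offset(index, shape, tile):
    # Number of blocks per dimension (shape assumed divisible by tile).
    blocks = [s // t for s, t in zip(shape, tile)]
    block_index = [i // t for i, t in zip(index, tile)]
    intra_index = [i % t for i, t in zip(index, tile)]
    # Linearize the block index, then the index within the block.
    offset = 0
    for b, n in zip(block_index, blocks):
        offset = offset * n + b
    for i, t in zip(intra_index, tile):
        offset = offset * t + i
    return offset

# Element (3, 0) of a 4x4 tensor tiled by (2, 1) lands in block (1, 0),
# at position (1, 0) inside the block.
assert tiled_offset((3, 0), (4, 4), (2, 1)) == (1 * 4 + 0) * 2 + 1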

Examples:

// XLA tiled layout attribute as a new attribute value.
tensor<1024x10240xf16, tiled_layout<{0, 1}, (128), (2, 1), 2>>

// Custom Intel MKL layout
tensor<256x128xf32, "intel-mkl">

// GPU layouts
tensor<?x?x?x?xf32, "NCHW">

Future Directions
Use the tensor layout attribute to express alternative dense layouts, as well as sparse layouts.

Jun Qi

Jun 18, 2019, 11:13:14 PM
to MLIR
I have one question here: what's the difference between the memory layout in Tensor and the layout map in memref? These two concepts seem to describe the same thing.


Volodymyr Arbatov

Jun 19, 2019, 1:00:54 AM
to Jun Qi, andy...@google.com, MLIR
Is this attribute converted into an index mapping when the tensor is lowered into a memref?

We use various memory layouts and have a similar concept of a layout format on our tensor shape descriptor. This attribute is carried all the way down to the codegen stage and helps simplify pattern matching and lowering. Eventually it is used in setting up DMA and the parameters of custom hardware.

Strings like "WHD" or "DWH" can describe simple formats. Others are more complex and parameterized.

In our compiler, the index mapping functions that describe data access (from tiling and other transformations) are composed with the index mapping that represents the layout format only when scalar reference code is generated.
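
For illustration, a small sketch of that composition (in Python; the maps are made up, not our compiler's actual representation):

# Illustrative only: the layout map is composed with the data-access map
# produced by tiling only when scalar reference code is generated.
def compose(f, g):
    return lambda idx: f(g(idx))

# Layout format as an index mapping: logical (row, col) -> physical (col, row).
layout = lambda idx: (idx[1], idx[0])
# Access map from tiling: iteration coords (i, j, ii) -> logical coords.
access = lambda it: (it[0] * 2 + it[2], it[1])

lowered = compose(layout, access)
assert lowered((1, 3, 0)) == (3, 2)  # logical (2, 3) -> physical (3, 2)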

Looking from this perspective, I was wondering: is recovering the layout format from a memref index mapping function easy? Is this attribute translated into an index mapping function that describes the layout, and is it listed in the composition of mappings in the memref? If so, do you see a need to recover the layout format from the index mapping, or should a custom lowering take care of retaining and propagating such details?

Thank you,
Volodymyr.



Andy Davis

Jun 19, 2019, 10:14:00 AM
to Jun Qi, MLIR
We are also working on changing the layout map in memref.  We will provide more details soon, but the short story is that we will remove AffineMap from the memref, and either have a more general way to specify layout in memref, or make memref more of a flat buffer and use a "view" instruction to specify layout. Stay tuned...

Andy Davis

Jun 19, 2019, 10:23:04 AM
to Volodymyr Arbatov, Jun Qi, MLIR
The tensor layout attribute can be converted to the memref's layout descriptor during the conversion. Note that, as I mentioned earlier in this thread, we will also be proposing a change to the memref layout descriptor (layout map); stay tuned.

Note that with the tensor layout proposal, we are adding an optional layout attribute to the tensor type, so you are free to use your own layout descriptor...

Sana Damani

Jun 19, 2019, 2:18:32 PM
to MLIR
Will these tensor layouts have to be predefined or will there be dialect-specific custom layouts (as with types)?

Andy Davis

Jun 19, 2019, 3:39:18 PM
to Sana Damani, MLIR
The tensor layout attribute does not have to be builtin. Dialect-specific attributes for layout can be used.


Sana Damani

Jun 19, 2019, 4:38:31 PM
to MLIR
Thank you for your response, Andy. Can the number of attributes for a type also be customized per dialect? For instance, if MKL requires information such as "isMklTensor", the tensor may need additional attributes. Or would you have to define an entirely new custom type?

Sana

Andy Davis

Jun 19, 2019, 5:40:08 PM
to Sana Damani, MLIR
We have support for array and dictionary attributes, which can contain other attributes.



Sana Damani

Jun 19, 2019, 5:53:41 PM
to MLIR
In your proposal, what kind of attribute do you expect TiledLayoutAttr to be? I see that it can be either a custom tiled_layout type or a string. Can this also be extended to an array type if required by the dialect?

Andy Davis

Jun 19, 2019, 6:07:54 PM
to Sana Damani, MLIR
In the proposal, the MLIR tensor type will take a "generic" optional layout attribute. You can specify a simple string attribute if you like. Alternatively, if you would like to use the TiledLayoutAttr, you could specify that instead. TiledLayoutAttr itself is a specialization.


Mehdi AMINI

Jul 18, 2019, 3:33:21 PM
to Andy Davis, MLIR
On Tue, Jun 18, 2019 at 1:21 PM 'Andy Davis' via MLIR <ml...@tensorflow.org> wrote:
Proposal

This really seems like two proposals, the first enabling the second, but each can be evaluated on its own, I believe.
 
1. Add an optional single attribute-value to the MLIR tensor type for specifying the layout. This layout attribute can be one of the predefined attribute values (e.g. a string attribute can be used by MKL to specify its layout). If the layout attribute is unspecified, the tensor has the default layout. The default layout and the list of allowable layout attribute values are defined by the tensor type (e.g. string and xla-tiled-layout).

It isn't clear whether the layout is allowed to introduce aliasing. It may not matter for the tensor type because of its immutability, but I'm assuming we'll want the design to work with memref (or buffers, or ...) in the future?
Basically: is a valid layout always injective?
 
2. Add a new “TiledLayoutAttr” attribute value to MLIR which represents XLA’s tiled layout. One change from XLA’s tiled layout will be to list the layout (which is a permutation of logical dimensions) from slowest-varying to fastest-varying dimensions (this ordering is more intuitive to users accustomed to thinking in row-major layouts). The memory space in which the tensor resides is also specified as part of TiledLayoutAttr, as a target-specific integer index. If no memory space is specified, the target-specific default memory space (index 0) is used.

I am confused about why we are conflating the tiling with the memory space: these seem like orthogonal constraints to apply?

Also, can we use a string attribute for memory spaces? It seems much more future-proof in terms of avoiding collisions between heterogeneous components in the system. LLVM is a bit limited by this.

-- 
Mehdi

 



Sana Damani

Jul 19, 2019, 1:02:28 PM
to MLIR
Hi Andy,

Where can I find your slides from yesterday's discussion on tiled layouts?

Sana



Andy Davis

Jul 19, 2019, 1:10:46 PM
to Mehdi AMINI, MLIR
inline...

On Thu, Jul 18, 2019 at 12:33 PM Mehdi AMINI <joke...@gmail.com> wrote:



This really seems like two proposals, the first enabling the second, but each can be evaluated on its own, I believe.
Agreed. But we'd like to have one concrete layout attribute to move forward with as part of this initial work. 
 
It isn't clear whether the layout is allowed to introduce aliasing. It may not matter for the tensor type because of its immutability, but I'm assuming we'll want the design to work with memref (or buffers, or ...) in the future?
Basically: is a valid layout always injective?
 

With tensor layout, no. With memrefs/buffers, it's under discussion. W.r.t. memrefs/buffers, I think there were "clipping" cases that Nicolas was interested in, where out-of-bounds accesses would map back and access the same element.
 

I am confused about why we are conflating the tiling with the memory space: these seem like orthogonal constraints to apply?

Also, can we use a string attribute for memory spaces? It seems much more future-proof in terms of avoiding collisions between heterogeneous components in the system. LLVM is a bit limited by this.


I agree on this point. I think there was pushback on "adding another thing" to the tensor type, so I think adding memory space as its own attribute would be another proposal/design. We added it as part of the tiled layout attribute here because XLA has already added it to their layout type, which made it convenient to do the same in this case and postpone the larger discussion of tensor having its own memory space attribute until a later time.

Mehdi AMINI

Jul 19, 2019, 1:14:01 PM
to Andy Davis, MLIR
If the tiled layout attribute goes in the XLA dialect, that can make sense; in the core MLIR repo, I am not convinced the XLA justification should drive adding this to a generic TiledLayout attribute.

Andy Davis

Jul 19, 2019, 1:19:35 PM
to Mehdi AMINI, MLIR
That seems fair to me, but we'd like to use that layout in other places (i.e. lower-level XLA dialects). Would putting that layout in the XLA dialect restrict its use in other places?

River Riddle

Jul 19, 2019, 1:23:57 PM
to MLIR


On Friday, July 19, 2019 at 10:19:35 AM UTC-7, Andy Davis wrote:
That seems fair to me, but we'd like to use that layout in other places (i.e. lower-level XLA dialects). Would putting that layout in the XLA dialect restrict its use in other places?

Nothing restricts dialects from using the types/attributes of another; it is just a dependency question (i.e. whether you want to depend on things from another dialect). For this, it seems perfectly fine for lower-level XLA dialects to rely on other XLA dialects. As an example, the TFLite dialect currently reuses some of the same types from the TF dialect.

-- River
 


Bruestle, Jeremy

Jul 19, 2019, 1:29:28 PM
to Andy Davis, Mehdi AMINI, MLIR

I’m currently working on a dialect to replace PlaidML’s ‘Stripe’ IR. We’ve defined our own tensor datatype, since the existing tensor datatype doesn’t support controlling layout, which we fundamentally need. The ‘TiledLayoutAttr’ as proposed is slightly different from the approach we’ve taken, but I think it’s sufficiently general that we could replace our mechanism with TiledLayoutAttr, which would allow us to use the standard tensor type. To me, layouts are fairly core to any concrete implementation of tensors on real machines, and tiling is used in multiple implementations, so it doesn’t feel like an XLA-specific thing. We currently depend only on the main MLIR repo, and not on any of the TF-specific stuff, so from my perspective putting TiledLayoutAttr into the core types would be a big win.

 

-Jeremy

Mehdi AMINI

Jul 19, 2019, 1:34:24 PM
to Bruestle, Jeremy, Andy Davis, MLIR
To be clear: I am fully on board with such a facility in MLIR, and it should be done in a generic enough way that it can model what PlaidML, XLA, and others expect from tiling.
I was mostly pushing back on having design quirks in a generic MLIR attribute just for the sake of matching exactly what XLA is using.

-- 
Mehdi

Andy Davis

Jul 19, 2019, 1:45:37 PM
to Mehdi AMINI, Bruestle, Jeremy, MLIR
I think we could address Mehdi and Jeremy's concerns by:

1) Keeping the TiledLayoutAttr in the core
2) Moving the memory space field into its own attribute on the Tensor type (as is done with memref).

I am OK with this. Do other people have concerns with extending the Tensor type with one more attribute to accommodate a separate memory space attr?

Bruestle, Jeremy

Jul 19, 2019, 2:09:07 PM
to Andy Davis, Mehdi AMINI, MLIR

That sounds great to me. I think memory space and layout are independent concerns, but both seem relevant to enough use cases (and in fact PlaidML will use both) to add them to Tensor.

 

-Jeremy

Sana Damani

Jul 19, 2019, 5:26:46 PM
to MLIR
Tile specification: (128), (2, 1)
From what I understand, the (2, 1) specifies the dimensions of the tile. What does the 128 refer to? I do not see it in the XLA tiled layout documentation. Does the 128 specify the size of the block that is stored contiguously in memory?

Sana


Mamy Ratsimbazafy

Jul 21, 2019, 8:41:45 AM
to MLIR
1) Beyond the XLA tiled layout, there is the MKL-DNN tiled layout described in this webpage and paper that would be nice to support.
Most libraries use a custom "MKL-DNN" tensor type at the moment and have difficulty integrating it with classic functions; see the PyTorch thread.

2) The Morton layout might be useful as well.

There are a couple of research articles on Morton layout, the most recent being


3) In practical use, Morton layout is the first transformation done in BLAS for both CPU and GPU GEMM implementations.

For reference, you can read on CPU:
  - the pack_A and pack_B, in this high-performance matrix multiplication course
  - my packing implementation in Nim (it reaches similar speed as OpenBLAS and MKL-DNN GEMM implementation without assembly)
  - Halide BLAS swizzling (also reaches OpenBLAS speed without assembly)

And on GPU:
  - NVIDIA's CUTLASS swizzling. Cutlass reaches CuBLAS speed except on Tensor Cores without assembly.

While inside a BLAS implementation the repacking/swizzling into Morton layout could be thought of as an implementation detail,
it actually became useful to expose it for repeated matrix multiplication with an invariant input, to amortize the cost of packing.
One immediate use case of prepacking is convolution via im2col + GEMM, to avoid the overhead of repeatedly packing the convolution filter.

Unfortunately, when implementing packed GEMM, the packing is machine-specific:
  - it depends on the SIMD extension supported
  - it depends on the number of cores, as we want each core to work on a different panel of the input matrices; see the BLIS parallelism explanation.
  - it depends on the tile size, which may be adjusted at runtime to fit in L1 caches, L2 caches, or to avoid TLB misses.
Link to my full implementation of GEMM with prepacking support; I'm not aware of an alternative open-source implementation.

In summary:

1) Beyond TPU support, if supporting MLIR tiled layouts meant supporting MKL-DNN, it would have a lot of value for deep learning frameworks.
2) Tensors in Morton layout (and other space-filling-curve-based layouts) have been explored in academic research. It's probably too experimental for MLIR at the moment, though.
3) In practice, the Morton layout is used as an optimization for temporary buffers; it depends on several runtime parameters and should probably be seen as an opaque type in such cases.

--
Mamy

Uday Bondhugula

Jul 21, 2019, 11:54:23 AM
to MLIR


On Sunday, July 21, 2019 at 6:11:45 PM UTC+5:30, Mamy Ratsimbazafy wrote:
1) Beyond the XLA tiled layout, there is the MKL-DNN tiled layout described in this webpage and paper that would be nice to support.
Most libraries use a custom "MKL-DNN" tensor type at the moment and have difficulty integrating it with classic functions; see the PyTorch thread.

2) The Morton layout might be useful as well.

There are a couple of research articles on Morton layout, the most recent being

A Morton layout is conceptually a block-recursive layout with the block size set to half of the extent at each level. As such, the current XLA tiled layout attr proposal will not be able to represent such a layout for dynamically shaped tensors (since the tile sizes needed here will also be unknown at compile time), AFAIU (@AndyDavis, let me know if this isn't the case).

On another note, the current memref layout (an affine map) is already able to represent such tiled layouts with dynamic / compile-time-unknown sizes (tile sizes encoded in the map can bind to SSA values). Here's a 1-level Morton-style layout with a memref:

// Creates a 2-d dynamically shaped memref of size N x N with a tiled layout of N/2 x N/2. 
%B = affine.apply (d0) -> (d0 floordiv 2) (%N)
%M = alloc(%N, %N) [%B] : memref<?x?xf32, (d0, d1)[s0] -> (d0 floordiv s0, d0 mod s0, d1 floordiv s0, d1 mod s0)>
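
For readers less familiar with affine-map layouts, here is the map above evaluated in plain Python for N = 8 (so B = 4); a sketch of the arithmetic only, not of MLIR semantics:

# Each logical coordinate splits into a block coordinate and an
# intra-block coordinate: (d0, d1) -> (d0 // B, d0 % B, d1 // B, d1 % B).
B = 4
layout = lambda d0, d1: (d0 // B, d0 % B, d1 // B, d1 % B)
assert layout(5, 2) == (1, 1, 0, 2)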

I'm in general a bit concerned about introducing new layout mechanisms with their own rigid "string" syntax that isn't easily extensible the moment you want something outside of it: one will wind up handling a potpourri of layout representations (even two or three may be painful for various utilities to deal with), and all of this when there are already existing abstractions in MLIR that support a much larger superset, albeit in a lower-level form that is closer to what is needed when calculating effective load/store addresses.

Jun Qi

Jul 22, 2019, 2:17:26 AM
to MLIR
Agreed. MKL-DNN uses an enum type to represent layout, and this "string" syntax would introduce many concrete layout values. Please refer to the MKL-DNN documentation.

Here are some layout examples from MKL-DNN:


mkldnn_a          plain 1D tensor
mkldnn_ab         plain 2D tensor
mkldnn_abc        plain 3D tensor
mkldnn_abcd       plain 4D tensor
mkldnn_abcde      plain 5D tensor
mkldnn_abcdef     plain 6D tensor
mkldnn_abdec      permuted 5D tensor
mkldnn_acb        permuted 3D tensor
mkldnn_acbde      permuted 5D tensor
mkldnn_acdb       permuted 4D tensor
mkldnn_acdeb      permuted 5D tensor
mkldnn_ba         permuted 2D tensor
mkldnn_bac        permuted 3D tensor
mkldnn_bacd       permuted 4D tensor
mkldnn_bca        permuted 3D tensor
mkldnn_bcda       permuted 4D tensor
mkldnn_bcdea      permuted 5D tensor
mkldnn_cba        permuted 3D tensor
mkldnn_cdba       permuted 4D tensor
mkldnn_cdeba      permuted 5D tensor
mkldnn_decab      permuted 5D tensor
mkldnn_aBc16b     3D tensor blocked by 2nd dimension with block size 16
mkldnn_aBc4b      3D tensor blocked by 2nd dimension with block size 4
mkldnn_aBc8b      3D tensor blocked by 2nd dimension with block size 8
mkldnn_aBcd16b    4D tensor blocked by 2nd dimension with block size 16
mkldnn_aBcd4b     4D tensor blocked by 2nd dimension with block size 4
mkldnn_aBcd8b     4D tensor blocked by 2nd dimension with block size 8
mkldnn_ABcd8b8a   4D tensor blocked by 1st and 2nd dimensions with block size 8
mkldnn_aBcde16b   5D tensor blocked by 2nd dimension with block size 16
mkldnn_aBcde4b    5D tensor blocked by 2nd dimension with block size 4
mkldnn_aBcde8b    5D tensor blocked by 2nd dimension with block size 8
mkldnn_aBcdef16b  6D tensor blocked by 2nd dimension with block size 16
mkldnn_aBcdef4b   6D tensor blocked by 2nd dimension with block size 4


It's hard to handle these layout values.


Mamy Ratsimbazafy

Jul 22, 2019, 5:06:49 AM
to MLIR
For the layout in an actual implementation we can introduce an extra field to describe blocking.
For example, this is how I would define a Tensor generic over any type T (written in Nim pseudocode, as that's what I'm most familiar with).

type Tensor[T] = object
  rank: int
  shape: SmallVector[int]
  strides: SmallVector[int]
  offset: int
  blocking: SmallVector[int]
  data: ptr T


The strides and offset fields allow for zero-copy slices (and dimension broadcasting by setting strides to 0).
Without blocking, converting a logical index to a physical index is just sum(index[i] * strides[i]) + offset over i.

I haven't figured out the conversion scheme with blocking yet, but storing it as an array should be the most general.
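
A minimal sketch of the strided addressing described above (in Python, no blocking; illustrative only):

# Physical index of a logical index under strides and offset:
# sum(index[i] * strides[i]) + offset.
def physical_index(index, strides, offset=0):
    return sum(i * s for i, s in zip(index, strides)) + offset

# A 2x3 row-major tensor has strides (3, 1): element (1, 2) is at offset 5.
assert physical_index((1, 2), (3, 1)) == 5
# A zero-copy transpose swaps the strides instead of moving data.
assert physical_index((2, 1), (1, 3)) == 5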

The main issues are:
- For MLIR: it's quite verbose, but we could have a short memref declaration that just indicates the shape
  and defaults to row-major strides and no blocking.
- For implementers: the index bookkeeping to convert from canonical indexing to the physical layout becomes quite large
  and would benefit a lot from the compiler optimizing away unit strides and "unit tiles" on accesses.