--
You received this message because you are subscribed to the Google Groups "MLIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mlir+uns...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/mlir/8095c9ee-2878-4402-97aa-d5d7bbf40a43%40tensorflow.org.
MLIR Tensor Layout Proposal

Goal

Propose an extensible way to specify the physical memory layout for the MLIR Tensor type.

Motivation

Compiler systems like XLA and MLIR include a tensor type and progressively lower a computation (typically a data flow graph with operations on tensor types) towards a target backend. At some point in this sequence of progressive lowering steps, the tensor type is assigned a layout to support the following use cases (this is not an exhaustive list):

1. XLA host/TPU-device layout agreement

TPU/accelerator devices and their attached host need to agree on the layout for data that is transferred between the host and the device. One might ask: why not use operations to specify layout and padding transformations before transfer? When compiling a module for TPU, it is not always possible to get access to the host-side graph to see the sequence of ops which specified the layout. In addition, subsequent transformations on the host-side graph may change (or make it hard to infer) the layout. Instead, it is preferred that the layout is specified as part of the tensor type, since tensors are the data types passed across these host/device boundaries.

2. XLA layout assignment

XLA performs a global layout assignment pass on a given XLA computation. The purpose of this pass is to assign the best-performing layouts to an XLA computation, but also to ensure that any layout constraints on operations (or on backend kernel implementations of XLA) are compatible with the assigned layout. XLA CPU adds layout constraints on the matmul operation to satisfy the Eigen library’s preferred gemm kernel layout. XLA GPU layout assignment assigns layouts to satisfy layout constraints for operands and results of library (cuDNN) calls.

3. Logical dims/physical layout separation

XLA tensors specify the logical dimension numbers separately from the permutation of these numbers (from fastest-varying to slowest-varying) which represents the physical data layout. This enables optimizations like eliminating unnecessary transposes, which operate on the logical dimensions, as well as enabling layout changes by simply updating the layout permutation of the associated tensor.

4. Intel MKL custom layout

Intel MKL kernels have a custom internal layout that needs to be passed between their custom TensorFlow kernels. A TF graph transformation pass enforces this invariant, adding layout transformation operations where needed.

Proposal
1. Add an optional single attribute-value to the MLIR tensor type for specifying the layout. This layout attribute can be one of the predefined attribute values (e.g. a string attribute can be used by MKL to specify layout). If the layout attribute is unspecified, the tensor has the default layout. The default layout and the list of allowable layout attribute values are defined by the tensor type (e.g. string and xla-tiled-layout).
2. Add a new “TiledLayoutAttr” attribute value to MLIR which represents XLA’s tiled layout. One change from XLA’s tiled layout will be to list the layout (which is a permutation of logical dimensions) from slowest-varying to fastest-varying dimensions (this ordering is more intuitive to users accustomed to thinking in row-major layouts). The memory space in which the tensor resides is also specified as part of TiledLayoutAttr, as a target-specific integer index. If no memory space is specified, the default memory space (0) is used. The default space is target-specific, but it is always at index 0.
Tensor type syntax:

tensor-type ::= `tensor` `<` dimension-list tensor-memref-element-type (`,` attribute-value)? `>`

TiledLayoutAttr syntax:

Layout permutation: {0, 1}
Tile specification: (128), (2, 1)
Memory space: 2
Example: tiled_layout<{0, 1}, (128), (2, 1), 2>

Examples:

// XLA tiled layout attribute as a new attribute value.
tensor<1024x10240xf16, tiled_layout<{0, 1}, (128), (2,1), 2>>
// Custom Intel MKL layout.
tensor<256x128xf32, "intel-mkl">
// GPU layouts.
tensor<?x?x?x?xf32, "NCHW">

Future Directions

Use the tensor layout attribute to express alternative dense layouts, as well as sparse layouts.
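Not part of the proposal itself, but the untiled core of the permutation semantics can be sketched in a few lines of Python (function names are illustrative; tiling and memory space are deliberately ignored here):

```python
def strides_for_layout(shape, layout):
    """Compute element strides for a layout permutation.

    `layout` lists logical dimension numbers from slowest-varying to
    fastest-varying, matching the ordering proposed for TiledLayoutAttr.
    Tiling and memory space are out of scope for this sketch.
    """
    strides = [0] * len(shape)
    running = 1
    for dim in reversed(layout):  # walk fastest-varying dimension first
        strides[dim] = running
        running *= shape[dim]
    return strides

def linear_offset(index, strides):
    """Map a logical index tuple to a flat element offset."""
    return sum(i * s for i, s in zip(index, strides))

# Row-major layout {0, 1} of a 1024x10240 tensor: dim 1 is fastest-varying.
assert strides_for_layout([1024, 10240], [0, 1]) == [10240, 1]
# Column-major layout {1, 0}: dim 0 is fastest-varying.
assert strides_for_layout([1024, 10240], [1, 0]) == [1, 1024]
```

Note how a "transpose" under this scheme is just a change of the permutation: the strides change, but no data moves.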
On Tue, Jun 18, 2019 at 1:21 PM 'Andy Davis' via MLIR <ml...@tensorflow.org> wrote:
> MLIR Tensor Layout Proposal
> [...]
> Proposal

Seems like there are really two proposals here; the first one enables the second, but it can be evaluated on its own, I believe.
> 1. Add an optional single attribute-value to the MLIR tensor type for specifying the layout. [...]

It isn't clear whether the layout is allowed to introduce aliasing. It may not matter for the tensor type because of its immutability, but I assume we'll want the design to work with memref (or buffers, or ...) in the future? Basically: is a valid layout always injective?
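The injectivity question can be made concrete with a brute-force check (a Python sketch with illustrative names, not anything from the proposal): a strided layout is injective exactly when distinct logical indices always map to distinct offsets.

```python
from itertools import product

def is_injective(shape, strides, offset=0):
    """Brute-force check that a strided layout maps distinct logical
    indices to distinct memory offsets, i.e. introduces no aliasing."""
    seen = set()
    for idx in product(*(range(d) for d in shape)):
        addr = offset + sum(i * s for i, s in zip(idx, strides))
        if addr in seen:
            return False  # two logical indices alias the same element
        seen.add(addr)
    return True

# A contiguous row-major 4x4 layout is injective...
assert is_injective([4, 4], [4, 1])
# ...but a broadcast-style stride of 0 aliases every row onto one.
assert not is_injective([4, 4], [0, 1])
```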
> 2. Add a new “TiledLayoutAttr” attribute value to MLIR which represents XLA’s tiled layout. [...] The memory space of a Tensor is specified by a target-specific integer index. [...]

I am confused about why we are conflating the tiling with the memory space: these seem like orthogonal constraints to apply? Also, can we use a string attribute for memory spaces? It seems much more future-proof in terms of avoiding collisions between heterogeneous components in the system. LLVM is a bit limited by this.
That seems fair to me, but we'd like to use that layout in other places (e.g. lower-level XLA dialects). Would putting that layout in the XLA dialect restrict its use in other places?
I’m currently working on a dialect to replace PlaidML’s ‘Stripe’ IR. We’ve defined our own tensor datatype since the existing tensor datatype doesn’t support controlling layout, which we fundamentally need. The ‘TiledLayoutAttr’ as proposed is slightly different from the approach we’ve taken, but I think it’s sufficiently general that we could replace our mechanism with TiledLayoutAttr, which would allow us to use the standard tensor type. To me, layouts are fairly core to any concrete implementation of tensors on real machines, and tiling is used in multiple implementations, so it doesn’t feel like an XLA-specific thing. We currently only depend on the main MLIR repo, and not on any of the TF-specific stuff, so from my perspective putting TiledLayoutAttr into the core types would be a big win.
-Jeremy
That sounds great to me. I think memory space and layout are independent concerns, but both seem relevant to enough use cases (and in fact PlaidML will use both) to add them to Tensor.
-Jeremy
> Tile specification: (128), (2, 1)
1) Beyond the XLA tiled layout, there is the MKL-DNN tiled layout, described in this webpage and paper, that would be nice to support. Most libraries use a custom "MKL-DNN" tensor type at the moment and have difficulty integrating it with classic functions; see the PyTorch thread.

2) A Morton layout might be useful as well. There are a couple of research articles on Morton layout.
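For readers unfamiliar with it, a Morton (Z-order) layout interleaves the bits of the coordinates so that nearby 2D indices land at nearby linear addresses; a minimal 2D sketch (illustrative, not from any of the cited articles):

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y to form a Z-order (Morton) index:
    bit b of x goes to position 2b, bit b of y to position 2b+1."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code

# The first 2x2 block occupies the contiguous Morton indices 0..3.
assert [morton_encode(x, y) for y in (0, 1) for x in (0, 1)] == [0, 1, 2, 3]
```

The locality of small blocks under this curve is what makes it attractive as a tensor layout for cache-friendly tiled access.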
type Tensor[T] = object
  rank: int                   # number of dimensions
  shape: SmallVector[int]     # extent of each dimension
  strides: SmallVector[int]   # element stride per dimension
  offset: int                 # offset of the first element into `data`
  blocking: SmallVector[int]  # tile/block sizes for blocked layouts
  data: ptr T                 # underlying buffer
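The address computation implied by the struct above (base offset plus stride-weighted index) can be exercised with a small companion sketch; the Python below mirrors the field names, and omits `blocking` and `data` since their exact semantics are not spelled out here:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tensor:
    # Mirrors the shape/strides/offset fields of the struct above;
    # `blocking` and `data` are omitted as their semantics are unspecified.
    shape: List[int]
    strides: List[int]
    offset: int = 0

    def address(self, index: Tuple[int, ...]) -> int:
        """Flat element address: base offset plus stride-weighted index."""
        assert len(index) == len(self.shape)
        return self.offset + sum(i * s for i, s in zip(index, self.strides))

# A 3x4 row-major view starting at element 5 of its buffer.
t = Tensor(shape=[3, 4], strides=[4, 1], offset=5)
assert t.address((0, 0)) == 5
assert t.address((2, 3)) == 5 + 2 * 4 + 3
```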