Splat op to create tensor/vector vs insertelement + shufflevector


Uday Bondhugula

unread,
Sep 20, 2019, 3:31:07 AM9/20/19
to MLIR
Hi,

I was wondering whether it would be meaningful to have a splat op that creates either a tensor or a vector from an elemental (non-vector) type. LLVM IR has no splat/broadcast op; instead, the builder for a splat emits an insertelement + shufflevector pair. insertelement and shufflevector are lower-level ops that provide more flexibility, and since they needed to exist anyway, I assume there was no need for a separate splat op in LLVM --- although vector hardware often has a broadcast instruction that matches the splat. The special case of insertelement + shufflevector that corresponds to a broadcast/splat is, I assume, easily pattern-matched during LLVM's target codegen.
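For reference, the idiom LLVM's builder emits for a splat looks roughly like this (a sketch; function and value names are illustrative):

```llvm
define <4 x i32> @splat_i32(i32 %x) {
  ; Insert the scalar into lane 0 of an undef vector.
  %ins = insertelement <4 x i32> undef, i32 %x, i32 0
  ; Shuffle with an all-zeros mask to broadcast lane 0 to every lane.
  %v = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  ret <4 x i32> %v
}
```

Backends recognize this insert + zero-mask-shuffle pair and select the target's broadcast instruction where one exists.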

But for MLIR, it appears to make sense to add such a splat op because:
1) MLIR has a multi-dimensional vector type, so the shufflevector arguments in an insertelement + shufflevector sequence would have to be multi-dimensional coordinates (clumsy?),
2) MLIR has both vector and tensor types, and a splat op can support either as its result type,
3) MLIR is in general meant to be higher-level, and given that most hardware has a vector broadcast/splat, it seems meaningful to have a splat in addition to insertelement + shufflevector.

I already have an implementation of the splat along with its LLVM lowering, and thus mlir-cpu-runner execution working for it. The syntax looks like this:

func @foo(%0 : i32) {
  %v = splat %0 : vector<4 x 8 x i32>
  %t = splat %0 : tensor<8 x 8 x i32>
  return
}

This could be part of StandardOps since it can create either a vector or a tensor.

PS: MLIR already has an op to splat constants to tensors or vectors (the standard constant op with the dense keyword), but not a general splat.
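For instance, the constant-only form looks roughly like this (a sketch; the exact dense-attribute syntax depends on the MLIR revision):

```mlir
// Constant splat via a dense elements attribute -- constants only,
// no SSA-value operand is possible here.
%c = constant dense<1.0> : tensor<8 x 8 x f32>
```

The proposed splat op generalizes this to an arbitrary SSA value as the splatted operand.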

~ Uday

Renato Golin

unread,
Sep 20, 2019, 6:50:19 AM9/20/19
to Uday Bondhugula, MLIR
On Fri, 20 Sep 2019 at 08:31, 'Uday Bondhugula' via MLIR
<ml...@tensorflow.org> wrote:
> I was wondering whether it was meaningful to have a splat op that creates either a tensor or a vector from an elemental type (non-vector). In the LLVM IR, there is no splat/broadcast op, but the builder for a splat is written using an insertelement + shufflevector. insertelement and shufflevector are lower level ops that provide more flexiblity, and since they needed to exist anyway, I assume there wasn't a need to create a splat op in LLVM --- although vector hardware often has a broadcast instruction that matches the splat. The special case of insertelement + shufflevector that matches a broadcast/splat is I assume easily pattern matched during LLVM's target codegen.

Hi Uday,

There were a few discussions on the LLVM list about having a splat
instruction, mostly related to scalable vector extensions (like SVE
and RISC-V). The main reason not to have a specific instruction for
splat is that it can get complicated in the edge cases, but also, as
you mention, insert+shuffle patterns are "easy" to match in the
back-end. So, while there is no strong push against having it, there's
also no strong consensus on *how* to have it.

Here's the SVE discussion:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/123780.html

The main issues were:
- A splat can be a constant, a scalar, or a scalar evolution (X, X+n,
X+2n, ...), which not every piece of hardware supports in the same
way, or at all
- Variable-length vectors have no known iteration length for the
induction variable, making a generic evolution syntax very hard to get
right

I haven't thought much about multi-dimensional splats, but I guess it
has implications when, in theory, you can splat at any given dimension
and the implementation can be radically different (row/column major,
NUMA regions, tiling shapes, etc.).

There is an argument that we could get an instruction for the base
case of splat: a constant value with compile-time-known bounds. But
that's a very boring case and the simplest insert+shuffle pattern, so
we don't gain much.

The risk of introducing ever more complex splat instruction types is
that the front/middle end can generate instructions that the back-end
cannot handle, so we'd have to implement lowering of all unsupported
types in the back-end for every target. With plain IR, albeit complex
patterns, only known instructions are used, and the worst case is
merely inefficient code.

> But for MLIR, it appears to make sense to add such a splat op because:
> 3) MLIR is in general meant to be higher-level, and given that most hardware have vector broadcast/splat, it appears meaningful to have a splat in addition to an insertelement + shufflevector.

I think this is the most important reason to have an actual splat in
MLIR. As long as you're able to lower the splat instruction correctly
into whatever dialect comes later, it should make things a lot easier
to reason about while working at such a high level. "LLVM IR is a
compiler IR", and as such, it needs to be safe and complete. MLIR is
meant to allow very high-level transformations, and we shouldn't have
to worry about hardware implementations at this level.

> 1) MLIR has a multi-dimensional vector type and so the shufflevector args when using the insertelement + shufflevector coordinates are going to be multi-dimensional coordinates (clumsy?),

It would probably look horrendous, yes. :)

What about unknown dimensions (<4 x ? x ? x f32>)? A constant splat
would probably be easy, but any other kind could have the same issues
as SVE.

> 2) MLIR has both vector and tensor types, and the splat op can support either as a result type,

If the element values are not all the same, then the dimensions you
start with, and whether they are multiples of each other, also become
important.

What about a splat from tensor to tensor? Seems like a natural
evolution from vector->tensor.

cheers,
--renato

Uday Bondhugula

unread,
Sep 20, 2019, 8:19:46 AM9/20/19
to MLIR
Hi Renato,

Thanks very much for your comments - I was indeed looking forward to hearing about the experience and thinking on the LLVM side, so your input is very useful. Some comments below.
My thinking on this has been that, given insertelement + shufflevector, and with a vector type as the result of a splat, it should always be possible to lower the splat to insertelt + shufflevec (if nothing else), and if the latter two can be handled, we won't be stuck. Splatting an input type that is already a vector or tensor type risks making the op heavier.

If we aren't up for the risk/complexity of that, we could just restrict it.

> But for MLIR, it appears to make sense to add such a splat op because:
> 3) MLIR is in general meant to be higher-level, and given that most hardware have vector broadcast/splat, it appears meaningful to have a splat in addition to an insertelement + shufflevector.

> I think this is the most important reason to have an actual splat in
> MLIR. As long as you're able to lower the splat instruction correctly
> into whatever dialect comes later, it should make things a lot easier
> to reason about while working at such a high level. "LLVM IR is a
> compiler IR", and as such, it needs to be safe and complete. MLIR is
> meant to allow very high-level transformations, and we shouldn't have
> to worry about hardware implementations at this level.

> 1) MLIR has a multi-dimensional vector type and so the shufflevector args when using the insertelement + shufflevector coordinates are going to be multi-dimensional coordinates (clumsy?),

> It would probably look horrendous, yes. :)

> What about unknown dimensions (<4 x ? x ? x f32>)? A constant splat
> would probably be easy, but any other kind could have the same issues
> as SVE.

A vector type in MLIR currently can't be dynamically shaped, but a tensor type can be. I actually only had static shapes in mind, but a splat to dynamically sized dimensions can be handled in a way consistent with how dynamic tensor/memref shapes are dealt with in the rest of MLIR. MLIR has the notion of an SSA value of 'index' type binding to each dynamically sized dimension. So such a splat would look like:

func @foo(%v : f32, %s : index) -> tensor<? x f32> {
  // %s binds to the '?'; creates a 1-d dynamic tensor here.
  %T = splat [%s] %v : tensor<? x f32>
  return %T : tensor<? x f32>
}

If the size has to be recovered later from %T, e.g., in a non-dominated/escaped context, the 'dim' op is used:

// Gives you back the thing that was bound to '?'.
%size = dim %T, 0 : tensor<? x f32>

This is consistent with how dynamically shaped tensors and memrefs work in the rest of MLIR.
 

> 2) MLIR has both vector and tensor types, and the splat op can support either as a result type,

> If the element values are not all the same, then the dimensions you
> start with, and whether they are multiples of each other, also become
> important.

> What about a splat from tensor to tensor? Seems like a natural
> evolution from vector->tensor.

This would be more complex. There is no tensor-of-tensor type in MLIR (the elemental type of a tensor can't be a tensor), and similarly, there is no vector-of-vector type. But if you meant splatting a tensor to a larger tensor, that would be interesting and powerful, albeit with fewer use cases -- it's sort of creating a hierarchical/tiled tensor. I was only thinking of int or float types as the input type, but I think tensor of vector as the result type will have use cases as well.
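Purely to illustrate the idea, a tiled tensor-to-tensor splat might look something like this (hypothetical syntax, not implemented; the 'to' form is invented for this sketch):

```mlir
// Hypothetical: replicate a 2x2 tile to fill an 8x8 result,
// i.e., a 4x4 grid of copies of %tile.
%T = splat %tile : tensor<2 x 2 x f32> to tensor<8 x 8 x f32>
```

The result dimensions would have to be verified as multiples of the operand's dimensions, which is part of the added complexity.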

Thanks,
Uday


Renato Golin

unread,
Sep 20, 2019, 12:48:24 PM9/20/19
to Uday Bondhugula, MLIR
On Fri, 20 Sep 2019 at 13:20, 'Uday Bondhugula' via MLIR
<ml...@tensorflow.org> wrote:
> My thinking on this has been that given insertelement + shufflevector and with a vector type as the result of a splat, it should always be possible to lower the splat to insertelt + shufflevec (if not anything else), and if the latter two can be handled, we won't be stuck. Splatting an input type that is already a vector or tensor type risks making the op heavier.

Right, in MLIR that makes more sense than in LLVM IR, because the
"back-end" is another, slightly lower-level representation, which will
likely apply further canonicalisation and simplification to the code
before emitting machine code.

Also, I totally agree that, if we keep the semantics as "a combination
of existing MLIR concepts", we guarantee that we can always represent
the new nodes in all the existing dialects, even if we have to do the
conversion in two steps.

Further enhancing the syntax can be done later, if/when necessary.


> A vector type in MLIR currently can't be dynamically shaped, but a tensor type can be. I actually only had static shapes in mind, but splat to dynamically sized dimensions can be handled in a way consistent with how dynamic tensor/memref shapes are dealt with in the rest of MLIR. MLIR has the notion of an SSA value of 'index' type binding to each dynamically sized dimension. So, such a splat would look like:
>
> func @foo(%v : f32, %s : index) -> tensor<? x f32> {
> // %s binds to the '?'; creates a 1-d dynamic tensor here.
> %T = splat [%s] %v : tensor<? x f32>
> return %T : tensor<? x f32>
> }
>
> If the size has to be recovered later from %T, for eg., in a non-dominated/escaped context, the 'dim' op is used:
>
> // Gives you back the thing that was bound to '?'.
> %size = dim %T, 0 : tensor<? x f32>
>
> This is consistent with how dynamically shaped tensors and memrefs work in the rest of MLIR.

I just read the MLIR basics doc, so my knowledge is *very* limited,
but this looks like a very sane implementation. :)

The part about making sure the dims are compatible can be done either
statically (if we know them at compile time) or dynamically using the
dim operation.

Also, I guess one could specify directly which dimension we're
splatting at (and how that works) by constructing the right memref /
index notation.


> This would be more complex. There isn't a tensor of tensor type in MLIR (elemental type of a tensor can't be tensor), and similarly, there is no vector of vector type. But if you meant splatting a tensor to a larger tensor, that would be interesting and powerful, albeit with fewer use cases -- it's sort of creating a hierarchical/tiled tensor. I was only thinking of int or float types as the input type, but I think tensor of vector as the result type will have use cases as well.

I meant splatting a tensor into a larger tensor for tiling, yes. But
as you say, that's an increase in complexity with diminishing returns,
so it can be deferred until it's actually useful. :)

cheers,
--renato

Mehdi AMINI

unread,
Sep 21, 2019, 12:27:24 AM9/21/19
to Uday Bondhugula, MLIR
Hi Uday,

This looks all reasonable to me.

Thanks,

-- 
Mehdi

