On Fri, 20 Sep 2019 at 08:31, 'Uday Bondhugula' via MLIR
<ml...@tensorflow.org> wrote:
> I was wondering whether it was meaningful to have a splat op that creates either a tensor or a vector from an elemental type (non-vector). In the LLVM IR, there is no splat/broadcast op, but the builder for a splat is written using an insertelement + shufflevector. insertelement and shufflevector are lower level ops that provide more flexibility, and since they needed to exist anyway, I assume there wasn't a need to create a splat op in LLVM --- although vector hardware often has a broadcast instruction that matches the splat. The special case of insertelement + shufflevector that matches a broadcast/splat is I assume easily pattern matched during LLVM's target codegen.
Hi Uday,
There were a few discussions on the LLVM list about having a splat
instruction, mostly related to scalable vector instructions (like SVE
and RISCV). The main reason to not have a specific instruction for
splat is that it can get complicated in the edge cases, but also, as
you mention, insert+shuffle patterns are "easy" to match in the
back-end. So, while there is no strong push to not have it, there's
also no strong consensus on *how* to have it.
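To make the insert+shuffle idiom concrete, here is a minimal Python sketch. It is a toy model, not LLVM's API: vectors are plain lists, None stands in for an undef lane, and the helper names (splat, is_splat_pattern) are made up for illustration. The matcher is also much simpler than what a real back-end does.

```python
# Toy model of the LLVM IR idiom, not LLVM's API: vectors are Python lists
# and None stands in for an undef lane. All names here are illustrative.
UNDEF = None


def insertelement(vec, value, index):
    """Return a copy of vec with lane `index` set to `value`."""
    out = list(vec)
    out[index] = value
    return out


def shufflevector(a, b, mask):
    """Select lanes from the concatenation of a and b according to mask."""
    lanes = list(a) + list(b)
    return [lanes[i] for i in mask]


def splat(value, n):
    """Broadcast `value` to n lanes: insert into lane 0, shuffle with an all-zero mask."""
    undef = [UNDEF] * n
    seeded = insertelement(undef, value, 0)
    return shufflevector(seeded, undef, [0] * n)


def is_splat_pattern(mask):
    """Simplified stand-in for the back-end check: every mask entry selects lane 0."""
    return len(mask) > 0 and all(i == 0 for i in mask)
```

The point of the sketch is that the splat is fully described by the shuffle mask, which is why matching it later is cheap.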
Here's the SVE discussion:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/123780.html
Main key issues were:
- Splat can be constant, scalar or scalar evolution: (X, X+n, X+2n,
...) which not every hardware supports in the same way, or at all
- Variable length vectors have no known iteration length of the
induction variable, making a generic evolution syntax very hard to get
right
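The two issues above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, and the runtime multiplier is called vscale here purely as a stand-in for "a lane count not known at compile time".

```python
def splat_series(start, step, lanes):
    """Model (X, X+n, X+2n, ...); step == 0 degenerates to a plain splat."""
    return [start + i * step for i in range(lanes)]


def scalable_series(start, step, min_lanes, vscale):
    """Same series, but the lane count is min_lanes * vscale.

    `vscale` is assumed to be known only at run time, so the number of
    steps of the series cannot be written down as a compile-time constant.
    """
    return splat_series(start, step, min_lanes * vscale)
```

With a fixed lane count the whole series could be a constant; with a runtime multiplier it has to be computed, which is where a generic syntax gets hard.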
I haven't thought much about multi-dimension splats, but I guess it
has implications when in theory you can splat at any given dimension
and the implementation can be radically different (row/column major,
NUMA regions, tiling shapes, etc).
There is an argument that we can get an instruction for the base type
of splat: constant value, compile-time known boundaries, but that's a
very boring one and the simplest insert+shuffle pattern, so we don't
gain much.
The risk of introducing more and more complex splat instruction types
is that the front/middle end can generate instructions that the
back-end cannot handle, so we'd have to implement the lowering of all
unsupported types in the back-end for all different targets. With
plain IR, even in complex patterns, only known instructions are used,
and the worst case is to generate inefficient code.
> But for MLIR, it appears to make sense to add such a splat op because:
> 3) MLIR is in general meant to be higher-level, and given that most hardware have vector broadcast/splat, it appears meaningful to have a splat in addition to an insertelement + shufflevector.
I think this is the most important reason to have an actual splat in
MLIR. As long as you're able to lower the splat instruction correctly
into whatever dialect later, it should make things a lot easier to
reason about while working at such a high level. "LLVM IR is a compiler
IR", and as such, it needs to be safe and complete. MLIR is meant to
allow for very high level transformations, and we shouldn't have to
worry about hardware implementations at this level.
> 1) MLIR has a multi-dimensional vector type and so the shufflevector args when using the insertelement + shufflevector coordinates are going to be multi-dimensional coordinates (clumsy?),
It would probably look horrendous, yes. :)
What about unknown dimensions (<4 x ? x ? x f32>)? A constant splat
would probably be easy, but any other kind could have the same issues
as SVE.
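A rough Python sketch of the constant case with unknown dimensions, assuming the '?' sizes are supplied at run time. Nested lists model the multi-dimensional vector, None models a '?' dimension, and both helper names are hypothetical.

```python
def resolve_shape(static_shape, runtime_dims):
    """Replace each None (a '?' dimension) with the next runtime-provided size."""
    dims = iter(runtime_dims)
    return [next(dims) if d is None else d for d in static_shape]


def splat_nd(value, shape):
    """Build a nested list of the given, fully resolved shape, filled with value."""
    if not shape:
        return value
    return [splat_nd(value, shape[1:]) for _ in range(shape[0])]
```

For a constant value nothing depends on the position of an element, so resolving the shape late is enough; anything position-dependent would need the sizes earlier.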
> 2) MLIR has both vector and tensor types, and the splat op can support either as a result type,
If the element values are not all the same, then the dimensions you
start with, and whether they are multiples of each other, also
matter.
What about a splat from tensor to tensor? Seems like a natural
evolution from vector->tensor.
cheers,
--renato