We're lowering from the XLA dialect (not open source yet) to target our runtime, which has a set of max operations. This felt like one of those things that should be in standard. I noticed that the documentation for the standard select operation mentions that max can be implemented as cmp + select, which makes sense, but isn't ideal if our backend has a max op. Since MLIR is trying to avoid raising operations, this feels like a premature lowering.

What are people's thoughts on having a first-class set of max operations in std ops?
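Concretely, the expansion the select documentation suggests looks roughly like this (2019-era standard dialect syntax, approximate; the dedicated max op is the hypothetical one being proposed here):

    // What we'd like to emit if std had a first-class max op (hypothetical):
    %m = max %a, %b : f32

    // What the select documentation suggests instead (cmp + select):
    %pred = cmpf "ogt", %a, %b : f32
    %m2 = select %pred, %a, %b : f32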
On May 10, 2019, at 1:39 PM, 'Geoffrey Martin-Noble' via MLIR <ml...@tensorflow.org> wrote:

> [...] What are people's thoughts on having a first-class set of max operations in std ops?

Hi Geoffrey,

Standard ops is still rapidly evolving, and isn’t really well designed or defined yet. As we get further along, I’d love for us to do a more detailed survey of the prior art in ONNX, nGraph core, and other communities to help define and design it. Lots of smart people have thought about this problem.
That said, there are some principles that are likely to inform the design. When standard ops exists, our goal isn’t to “avoid lowering” in it - such a goal could only be achieved by ’standardizing’ all of the ops that all frontends have, which isn’t practical.

Consider something like relu: it is cleanly lowerable to max, which is cleanly lowerable to cmp/select. My view is that we shouldn’t have relu (or max) as standard ops because of that “cleanly” lowerable aspect: once lowered, it is very simple for backends to pattern match max or relu from the primitive operations. This is why we’re investing in powerful graph pattern matching infrastructure, to make this easy to do.
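To make that chain concrete (syntax approximate; the relu and max ops shown are the hypothetical ones under discussion, not existing std ops):

    // Hypothetical high-level op:
    %y = relu %x : f32

    // ...cleanly lowers to a (hypothetical) max against zero:
    %zero = constant 0.0 : f32
    %y2 = max %x, %zero : f32

    // ...which cleanly lowers to existing primitives:
    %pred = cmpf "ogt", %x, %zero : f32
    %y3 = select %pred, %x, %zero : f32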
From the LLVM perspective, and the vectorizer in particular, we investigated a similar problem with vector idioms: how to retrieve or preserve that information in the optimizer all the way down to the backend. Even though we are not talking about the same level of abstraction, that experience might be useful here in some way.
We found that for very simple idioms, like min/max/abs, it was feasible to use an LLVM-IR canonical form for them (cmp + select). However, for idioms only a bit more complex (5-10 instructions), a canonical form was infeasible due to the high number of variants of the same idiom and the difficulty of preserving them intact until the backend. For those, intrinsics were the suggested way to go.
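To sketch why the variants add up, even for a two-instruction idiom like max (MLIR-style syntax, approximate): a matcher already has to recognize several equivalent spellings, and any pass may rewrite one into another:

    // Variant A:
    %p = cmpf "ogt", %a, %b : f32
    %m = select %p, %a, %b : f32

    // Variant B: complementary predicate with swapped select operands,
    // equivalent to variant A for all inputs:
    %q = cmpf "ule", %a, %b : f32
    %m2 = select %q, %b, %a : f32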
> My view is that we shouldn’t have relu (or max) as standard ops because of that “cleanly” lowerable aspect: once lowered, it is very simple for backends to pattern match max or relu from the primitive operations. This is why we’re investing in powerful graph pattern matching infrastructure, to make this easy to do.
This makes sense to me. Maybe, for the reasons I mentioned before, we need to evaluate this approach case by case to make sure that we will be able to pattern match the high-level op after the lowering. For complex ops that we can pattern match today, we may also want to consider the impact of assuming that the pattern match will always succeed. That may impose some constraints on future optimizations that could potentially change the expected patterns: we would have to make them aware of the patterns, which might not be ideal.
> My view is that the backend will be able to declare any ops they want as supported (including frontend specific ops like tf.FusedBatchNorm) and if the backend supports it, then those ops will not be lowered. If a backend does not support it, then the compiler will apply a standard series of expansions to produce the finer grained “standard” ops.
This makes a lot of sense to me. Something to consider here is how keeping an unknown high-level op might prevent other optimizations implemented in the standard dialect. We had the same questions wrt using intrinsics for vector idioms.
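As a small illustration of that concern (syntax approximate; "mybackend.max" is a made-up op name): once the idiom is expanded into standard ops, generic canonicalizations apply to it, while an opaque op hides the same semantics:

    // Expanded form: a canonicalizer can fold select(%p, %x, %x) to %x.
    %p = cmpf "ogt", %x, %x : f32
    %m = select %p, %x, %x : f32

    // Opaque form: the standard dialect knows nothing about this op, so
    // std-level folding and canonicalization cannot touch it:
    %m2 = "mybackend.max"(%x, %x) : (f32, f32) -> f32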
Thanks!
Diego Caballero
nGraph
> One case I remember was related to LLVM's "CreateVectorSplat" pattern (insertelement + shufflevector). InstCombine commuted some operations with the shufflevector, and would break the vector splat pattern.
Yeah, exactly.
Diego
From: 'Sean Silva' via MLIR [mailto:ml...@tensorflow.org]
Sent: Friday, May 10, 2019 3:41 PM
To: MLIR <ml...@tensorflow.org>
Subject: Re: [mlir] Max Op in Standard
I generally agree, but I've given one caveat inline that people should be aware of.
> Does this make sense?
>
> tl;dr: I’d prefer to *not* have a max op :-)
>
> -Chris
> Consider something like relu: it is cleanly lowerable to max, which is cleanly lowerable to cmp/select. My view is that we shouldn’t have relu (or max) as standard ops because of that “cleanly” lowerable aspect: once lowered, it is very simple for backends to pattern match max or relu from the primitive operations. This is why we’re investing in powerful graph pattern matching infrastructure, to make this easy to do.

One caveat is that this can be broken by transformations mangling your pattern. One failure mode I remember from working on an LLVM backend on my previous project is that sometimes when we merged from upstream, some new InstCombine transformation would break a pattern we were lowering, causing ISel failures. One case I remember was related to LLVM's "CreateVectorSplat" pattern (insertelement + shufflevector). InstCombine commuted some operations with the shufflevector, and would break the vector splat pattern.
On May 10, 2019, at 5:32 PM, Stella Laurenzo <laur...@google.com> wrote:

> I suspect that there will be an unending list of higher level operations that various backends will want - some will want special support for very complex operations like fused batch norm. My view is that the backend will be able to declare any ops they want as supported (including frontend specific ops like tf.FusedBatchNorm), and if the backend supports it, then those ops will not be lowered. If a backend does not support it, then the compiler will apply a standard series of expansions to produce the finer grained “standard” ops.
>
> I expect that this will provide a sweet spot where a backend can choose to just implement the fine grained atoms if they want, but they can also choose to implement completely custom high level ops if there is a need or desire to do so. This infra is still extremely early, but we’ve built it before in the LLVM instruction selection framework for the scalar domain, so I have pretty high confidence that it will come together very nicely.

This makes sense to me. Internally, I've found myself explaining this 1:1 several times, and I think people would benefit from having a FAQ or design doc (if not an initial implementation) to reason about soon. I'd like to start putting some practical pressure on it to see how it works. I think +River Riddle has this on his radar, but we don't have an issue for it yet.
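To sketch the supported-vs-expanded behavior concretely (op names illustrative, syntax approximate):

    // A backend that declares max as supported keeps the high-level op intact:
    %m = "xla.max"(%a, %b) : (f32, f32) -> f32

    // For a backend that doesn't, the compiler applies the standard
    // expansion to finer grained ops instead:
    %pred = cmpf "ogt", %a, %b : f32
    %m2 = select %pred, %a, %b : f32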
For now, I just lowered directly from XLA max.
I believe the improvements River is making to the lowering framework will make all this much easier. It's nice to be able to take advantage of as much of the core infrastructure as possible.

> the backend will be able to declare any ops they want as supported (including frontend specific ops like tf.FusedBatchNorm) and if the backend supports it, then those ops will not be lowered. If a backend does not support it, then the compiler will apply a standard series of expansions to produce the finer grained “standard” ops.
Thank you for your response, Geoffrey.

> I believe the improvements River is making to the lowering framework will make all this much easier.

Do you know where I can read more about these improvements?
> It's nice to be able to take advantage of as much of the core infrastructure as possible.

I agree, but this would require the backend-specific ops to be part of the standard dialect. Do you know why this might be a problem? And do you think being able to inherit from the standard dialect would help (I have created a separate thread on sub-dialects)? This would allow backends to add supported ops to specialized versions of the standard dialect while still having access to the general ops and optimizations that belong to the standard dialect.
> the backend will be able to declare any ops they want as supported (including frontend specific ops like tf.FusedBatchNorm) and if the backend supports it, then those ops will not be lowered. If a backend does not support it, then the compiler will apply a standard series of expansions to produce the finer grained “standard” ops.

I believe this approach would also result in mixed dialects (high-level ops + standard ops instead of low-level ops + standard ops), thereby preventing optimizations.