[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Quentin Colombet via llvm-dev

unread,

Mar 29, 2017, 7:27:20 PM3/29/17

to llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar

Hi,

GlobalISel, the SelectionDAG replacement, has come a long way since initially discussed on the mailing list and its last discussion at the EuroLLVM BoF (https://etherpad.net/p/GlobalISel).

We believe we are close to the point of enabling it by default on AArch64 at O0. We now would like to enlist your help to get there.

*** Quick Status ***

On iOS we are at 100% pass rate in 00 g for the LLVM test suite, standard benchmarks and unit tests. In about 5% of all functions GlobalIsel falls back to SDIsel.

(Kristof Beyls would have the linux numbers.)

The self host compiler correctly builds and runs the LLVM test suite in O0.

*** We Need Your Help ***

Please try GlobalISel for AArch64 at O0 (preferably O0 g) and file PR for:

- Performance problems (compile time, runtime, code size)

- Miscompile

- Crashers

- Poor debug information

- etc.

There is a component GlobalISel in llvm.org/bugs for that.

GlobalISel cannot handle some inputs (e.g., some vector construction), but it falls back to SDISel when it hits such cases. We are also interested to know whether or not GlobalISel fallback on your favorite workloads and why (see next section for the actual options to run). So please file PRs for that too.

As always, patches are welcome!

*** Concretely What Do I Run ***

Please test your favorite workload/scenario on AArch64 at O0 using one of the following additional category of options:

Recommended: (-llvm) -global-isel (-mllvm) -global-isel-abort=2

OkToFallBack:: (-llvm) -global-isel (-mllvm) -global-isel-abort=0

AbortOnFallBack: (-llvm) -global-isel

The Recommended way will issue a warning if it falls back to SelectionDAG, the OkToFallBack will silently fallback to SDISel if it needs to, and the AbortOnFallBack will kill the compiler if GlobalISel cannot handle the input completely.

*** What Is Next? ***

We would like to turn on GobalISel by default in the next couple of weeks. Please help identify any critical issue that needs to be resolved before that can happen. We expect that minor hiccups and outliers in any metrics can be fixed quickly, but are worried about harmful run-time (miscompile) or compile-time crashes.

Many thanks,

-Quentin

Renato Golin via llvm-dev

unread,

Mar 30, 2017, 4:54:18 AM3/30/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar

On 30 March 2017 at 00:27, Quentin Colombet <qcol...@apple.com> wrote:
> On iOS we are at 100% pass rate in 00 g for the LLVM test suite, standard
> benchmarks and unit tests. In about 5% of all functions GlobalIsel falls
> back to SDIsel.
> (Kristof Beyls would have the linux numbers.)
> The self host compiler correctly builds and runs the LLVM test suite in O0.

Having done no tests at all on my side, I think we need to have
similar numbers on Linux to be able to flip across the board.

I don't want to flip it only for Darwin and not Linux, as that will
fragment the effort too much.

I'll check with Diana and Kristof to know what's the best way forward,
but it should be reasonably quick.

cheers,
--renato
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Evgeny Astigeevich via llvm-dev

unread,

Mar 30, 2017, 10:13:43 AM3/30/17

to Renato Golin, llvm-dev, nd

Hi Renato,

If Kristof is busy I can make runs on AArch64 Linux (Cortex-A53 and Cortex-57).

Thanks,
Evgeny Astigeevich
Senior Compiler Engineer
Compilation Tools
ARM

Diana Picus via llvm-dev

unread,

Mar 30, 2017, 11:44:45 AM3/30/17

to Evgeny Astigeevich, llvm-dev, nd

Hi,

Just a heads-up for anyone building things with CMake: if you use the
default build type (Release), it will put CMAKE_CXX_FLAGS_RELEASE
after the CMAKE_CXX_FLAGS, so you'll end up running -O3 instead of
-O0. You might want to check your command lines.

Kristof has also found an issue in the test-suite where it actually
runs -O2 due to some weirdness in the build system, we'll try to
commit a fix for that.

Cheers,
Diana

On 30 March 2017 at 16:13, Evgeny Astigeevich via llvm-dev

Evgeny Astigeevich via llvm-dev

unread,

Mar 30, 2017, 11:51:29 AM3/30/17

to Diana Picus, llvm-dev, nd

Hi Diana,

Thank you for information. Yes, I know about the problem with O options.
I've been hit by this whilst I was testing my changes for Oz.

Thanks,
Evgeny

> -----Original Message-----
> From: Diana Picus [mailto:diana...@linaro.org]
> Sent: Thursday, March 30, 2017 4:44 PM
> To: Evgeny Astigeevich
> Cc: Renato Golin; llvm-dev; nd
> Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for
> O0: Please give it a try!
>

> Hi,
>
> Just a heads-up for anyone building things with CMake: if you use the default
> build type (Release), it will put CMAKE_CXX_FLAGS_RELEASE after the
> CMAKE_CXX_FLAGS, so you'll end up running -O3 instead of -O0. You might
> want to check your command lines.
>
> Kristof has also found an issue in the test-suite where it actually runs -O2 due
> to some weirdness in the build system, we'll try to commit a fix for that.
>
> Cheers,
> Diana
>

> On 30 March 2017 at 16:13, Evgeny Astigeevich via llvm-dev <llvm-

Kristof Beyls via llvm-dev

unread,

Apr 3, 2017, 11:10:36 AM4/3/17

to Renato Golin, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

I've kicked off a run to compare "-O0 -g" versus "-O0 -g -mllvm -global-isel -mllvm -global-isel-abort=2".

I've selected the test-suite (albeit a version which is a couple of months old now) and a few short-running proprietary benchmarks to get data back quickly for an initial feel of where things are.

This was running on Cortex-A57 AArch64 Linux.

I saw one assertion failure in GlobalISel, see http://bugs.llvm.org/show_bug.cgi?id=32471. This is in a program compiled at -O2 (my out-dated test-suite still overrides -O0 and instead uses -O for that program). The root cause of the failure seems to be due to LowLevelType not supporting vectors of pointers. I think this demonstrates that for correctness, we should be trying to test more than -O0, or even more than just LLVM-IR produced by clang, as other front-ends could run into this even at -O0.

Due to this assertion failure and the infrastructure I used, the numbers below do not include test-suite/MultiSource/Benchmarks results.

On the non-correctness aspects, LNT tells me that:

- The programs that report execution time, on geomean are about 17% slower.

- The programs that report scores, on geomean are about 21% slower.

- Code size is up on geomean about 11%.

I'm afraid I don't have compile time numbers, nor any feel for debug info quality.

I'll need quite a bit more time to dig into the details to come up with something actionable, although the fact that LowLevelType doesn't support vectors of pointers is already actionable.

Nevertheless, I thought to share what I see as is, to see if others see similar results so far.

I thought Diana was going to look into fallback rate on the test-suite on AArch64 linux?

Thanks,

Kristof

Tim Northover via llvm-dev

unread,

Apr 3, 2017, 12:24:35 PM4/3/17

to Diana Picus, llvm-dev, nd

On 30 March 2017 at 08:44, Diana Picus via llvm-dev

<llvm...@lists.llvm.org> wrote:
> Just a heads-up for anyone building things with CMake: if you use the
> default build type (Release), it will put CMAKE_CXX_FLAGS_RELEASE
> after the CMAKE_CXX_FLAGS, so you'll end up running -O3 instead of
> -O0. You might want to check your command lines.

It does actually work if you add "-fno-vectorize -fno-slp-vectorize"
to CMAKE_CXX_FLAGS_RELASE (and C). Not a standard configuration, but
probably throwing more varied IR at it than just doing -O0.

Cheers.

Tim.

Diana Picus via llvm-dev

unread,

Apr 4, 2017, 9:55:40 AM4/4/17

to Kristof Beyls, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

Hi,

Here's my results so far:

On the test-suite, we get 2 timeouts during execution (paq8p and
scimark2). Other than that, everything seems to run just fine. We have
about 7410 unique fallbacks out of 52367 unique functions. Assuming I
counted right (please let me know if this looks fishy, I personally
find the total number of functions terrifyingly small).

On a stage 2 build of clang, we run check-all successfully. We have
about 64784 unique fallbacks out of 661461 functions.

I'm currently trying to run a stage 3 build to compare binaries (just
for kicks) and I'll also try to do some runs without fallbacks and
count what problems we run into most often.

The way I've been counting the total number of functions was to run
objdump -t on all the .o files in the build directory, grab everything
with an "F" flag and remove duplicates. If anyone knows a better way
to do this I'm all ears.

Thanks,
Diana

Tim Northover via llvm-dev

unread,

Apr 4, 2017, 12:55:31 PM4/4/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 4 April 2017 at 06:55, Diana Picus via llvm-dev

<llvm...@lists.llvm.org> wrote:
> On the test-suite, we get 2 timeouts during execution (paq8p and
> scimark2).

Interesting. I'd not seen those failures in the configurations I'd
run. I'll look into them (other than that my best bet for debugging is
a kernel panic, this has to be easier!).

> On the test-suite, we get 2 timeouts during execution (paq8p and
> scimark2). Other than that, everything seems to run just fine. We have
> about 7410 unique fallbacks out of 52367 unique functions. Assuming I
> counted right (please let me know if this looks fishy, I personally
> find the total number of functions terrifyingly small).

It's about 275000 before uniqueing (which roughly matches an earlier
measurement I did). The duplication seems to be dominated by the
halide tests, though you have to be a little careful with "main"
(which occurs about 500 times). At a glance I'd put the total closer
to 53000 (52420 + 492 extra copies of main), but that's a small
difference to your figure.

Tim.

Tim Northover via llvm-dev

unread,

Apr 4, 2017, 1:23:14 PM4/4/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 4 April 2017 at 09:55, Tim Northover <t.p.no...@gmail.com> wrote:
> On 4 April 2017 at 06:55, Diana Picus via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>> On the test-suite, we get 2 timeouts during execution (paq8p and
>> scimark2).
>
> Interesting. I'd not seen those failures in the configurations I'd
> run. I'll look into them (other than that my best bet for debugging is
> a kernel panic, this has to be easier!).

I just ran the tests on Linux. Those two were last to finish, but they
did terminate eventually for me. Bother, back to the kernel.

Eric Christopher via llvm-dev

unread,

Apr 4, 2017, 3:47:50 PM4/4/17

to Tim Northover, Diana Picus, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

I'll try a run here shortly, it's just going to take a while.

-eric

Matthias Braun via llvm-dev

unread,

Apr 4, 2017, 4:21:42 PM4/4/17

to Quentin Colombet, llvm-dev, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

Would this completely replace FastISel (on AArch64) then? So that we have to look for bugs only in GlobalISel and SelectionDAG?

- Matthias

Kristof Beyls via llvm-dev

unread,

Apr 6, 2017, 9:53:11 AM4/6/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

I've been digging a little bit deeper into the biggest performance regressions I've observed.

What I've observed so far is:

* A lot of the biggest regressions are caused by unnecessarily moving floating point values through general purpose registers. I've raised http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one definitely needs fixing before enabling GlobalISel by default at -O0.

* FastISel seems to transform division-by-constant-power-of-2 into right shift (see https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468). GlobalISel does not. It seems to me that at -O0 there may be reasons not perform this transformation, but maybe there is a good reason why FastISel does this?

* FastISel doesn't seem to handle functions with switch statements, so it falls back to DAGISel. DAGISel produces code that's a lot better than GlobalISel for switch statement at -O0. I'm not sure if we need to do something here before enabling GlobalISel by default. I'm thinking we may need to add a smarter way to lower switch statements rather than just a cascaded sequence of conditional branches.

I'll try to add the above content to the document Diana created at https://goo.gl/IS2Bdw too.

Thanks,

Kristof

Ahmed Bougacha via llvm-dev

unread,

Apr 6, 2017, 3:07:22 PM4/6/17

to Kristof Beyls, llvm-dev, nd, Justin Bogner, Aditya Nandakumar

On Thu, Apr 6, 2017 at 6:53 AM, Kristof Beyls via llvm-dev
<llvm...@lists.llvm.org> wrote:
> I've been digging a little bit deeper into the biggest performance
> regressions I've observed.
>
> What I've observed so far is:
> * A lot of the biggest regressions are caused by unnecessarily moving
> floating point values through general purpose registers. I've raised
> http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one
> definitely needs fixing before enabling GlobalISel by default at -O0.
> * FastISel seems to transform division-by-constant-power-of-2 into right
> shift (see
> https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
> GlobalISel does not. It seems to me that at -O0 there may be reasons not
> perform this transformation, but maybe there is a good reason why FastISel
> does this?

So, FastISel on AArch64 isn't really an "O0" selector: it has a lot
of smarts and peepholes, because some JIT users had it as the main
optimizing selector for a while.

In that sense, it's a pretty aggressive target that IMO we don't have to match.

> * FastISel doesn't seem to handle functions with switch statements, so it
> falls back to DAGISel. DAGISel produces code that's a lot better than
> GlobalISel for switch statement at -O0. I'm not sure if we need to do
> something here before enabling GlobalISel by default. I'm thinking we may
> need to add a smarter way to lower switch statements rather than just a
> cascaded sequence of conditional branches.

D31080 seems promising, I've been wanting to take a look, hoping we
can use that to emit an optimized lowering. I'm not sure we want that
at O0 though (even if only for FastISel+DAGISel parity).

> I'll try to add the above content to the document Diana created at
> https://goo.gl/IS2Bdw too.

Thanks for the investigation! These are also some of the biggest
problems I've seen (in particular the FP regbanks).

I'll make sure I find the time to file bugs for all the other issues
I'm aware of. (sorry I haven't done that earlier!)

-Ahmed

Gerolf Hoflehner via llvm-dev

unread,

Apr 7, 2017, 3:34:35 AM4/7/17

to Tim Northover, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

> On Apr 4, 2017, at 10:23 AM, Tim Northover via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On 4 April 2017 at 09:55, Tim Northover <t.p.no...@gmail.com> wrote:
>> On 4 April 2017 at 06:55, Diana Picus via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>>> On the test-suite, we get 2 timeouts during execution (paq8p and
>>> scimark2).
>>
>> Interesting. I'd not seen those failures in the configurations I'd
>> run. I'll look into them (other than that my best bet for debugging is
>> a kernel panic, this has to be easier!).
>
> I just ran the tests on Linux. Those two were last to finish, but they
> did terminate eventually for me. Bother, back to the kernel.

But how do the compile-times compare vs FastIsel?

Kristof Beyls via llvm-dev

unread,

Apr 7, 2017, 4:14:44 AM4/7/17

to Ahmed Bougacha, Justin Bogner, llvm-dev, Aditya Nandakumar, nd

On 6 Apr 2017, at 21:06, Ahmed Bougacha <ahmed.b...@gmail.com> wrote:

On Thu, Apr 6, 2017 at 6:53 AM, Kristof Beyls via llvm-dev
<llvm...@lists.llvm.org> wrote:

I've been digging a little bit deeper into the biggest performance
regressions I've observed.

What I've observed so far is:
* A lot of the biggest regressions are caused by unnecessarily moving
floating point values through general purpose registers. I've raised
http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one
definitely needs fixing before enabling GlobalISel by default at -O0.
* FastISel seems to transform division-by-constant-power-of-2 into right
shift (see
https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
GlobalISel does not. It seems to me that at -O0 there may be reasons not
perform this transformation, but maybe there is a good reason why FastISel
does this?

So, FastISel on AArch64 isn't really an "O0" selector: it has a lot
of smarts and peepholes, because some JIT users had it as the main
optimizing selector for a while.

In that sense, it's a pretty aggressive target that IMO we don't have to match.

OK, that makes sense to me now, and indeed it doesn't seem a good idea to try and do lots

of peepholes at -O0.

* FastISel doesn't seem to handle functions with switch statements, so it
falls back to DAGISel. DAGISel produces code that's a lot better than
GlobalISel for switch statement at -O0. I'm not sure if we need to do
something here before enabling GlobalISel by default. I'm thinking we may
need to add a smarter way to lower switch statements rather than just a
cascaded sequence of conditional branches.

D31080 seems promising, I've been wanting to take a look, hoping we
can use that to emit an optimized lowering. I'm not sure we want that
at O0 though (even if only for FastISel+DAGISel parity).

I wasn't aware of D31080: good to know!

My thinking here is that one of the reasons people use -O0 is they want a pretty straightforward

mapping between source code and the generated assembly code. For switch statements,

mapping to a cascaded sequence of conditional branches, a jump table, a binary search tree,

or any of the other ways to lower switch statements is equally good from this

perspective, I think. So if one of these other lowering schemes is as good as a cascaded sequence

of branches for aspects such as compile time and debug info quality, I think it's best to choose

one of these alternative lowering schemes.

It seems to me that e.g. on MultiSource/Applications/sqlite3/sqlite3, this may be the cause

of the almost 2x slowdown compared to non-globalisel -O0.

I'll try to add the above content to the document Diana created at
https://goo.gl/IS2Bdw too.

Thanks for the investigation! These are also some of the biggest
problems I've seen (in particular the FP regbanks).

I'll make sure I find the time to file bugs for all the other issues
I'm aware of. (sorry I haven't done that earlier!)

I've seen you added 2 bugs so far. I've slotted them in to https://goo.gl/IS2Bdw.

I'm starting to think that it may be easiest if we had a "Meta bug" in bugzilla that combines

all the issues we think should be fixed before GlobalISel can be enabled by default at -O0

for AArch64. In the same style as e.g. http://bugs.llvm.org/show_bug.cgi?id=32061.

What do you think?

Thanks!

Kristof

Diana Picus via llvm-dev

unread,

Apr 7, 2017, 4:56:24 AM4/7/17

to Kristof Beyls, Justin Bogner, llvm-dev, Aditya Nandakumar, nd

On 7 April 2017 at 10:14, Kristof Beyls <Kristo...@arm.com> wrote:
> I'm starting to think that it may be easiest if we had a "Meta bug" in
> bugzilla that combines
> all the issues we think should be fixed before GlobalISel can be enabled by
> default at -O0
> for AArch64. In the same style as e.g.
> http://bugs.llvm.org/show_bug.cgi?id=32061.

+1, we already do this for lots of other things (inline assembly bugs,
bugs building the Linux kernel etc) and it's a good way to keep things
organized.

Quentin Colombet via llvm-dev

unread,

Apr 26, 2017, 7:48:54 PM4/26/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

On Apr 6, 2017, at 6:53 AM, Kristof Beyls <kristo...@arm.com> wrote:

I've been digging a little bit deeper into the biggest performance regressions I've observed.

What I've observed so far is:

* A lot of the biggest regressions are caused by unnecessarily moving floating point values through general purpose registers. I've raised http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one definitely needs fixing before enabling GlobalISel by default at -O0.

I commented in the PR. This is a known problem and we have a solution. Given this is an optimization in the sense that it does not affect the correctness of the program, we didn’t push for fixing it now.

For O0 we wanted to focus ourselves on generating correct code. Unless the regressions you are seeing are preventing debugging/running of the program, I wouldn’t block the flip of the switch on that.

What do you think?

* FastISel seems to transform division-by-constant-power-of-2 into right shift (see https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468). GlobalISel does not. It seems to me that at -O0 there may be reasons not perform this transformation, but maybe there is a good reason why FastISel does this?

I think FastISel tries to generate the best code it can no matter what. For GISel O0 however, not doing this optimization sounds sensible to me.

Now, I would say that the same remark as the previous bullet point apply: we shouldn’t do it unless it gets in the way of running/debugging the program.

* FastISel doesn’t\ seem to handle functions with switch statements, so it falls back to DAGISel. DAGISel produces code that's a lot better than GlobalISel for switch statement at -O0. I'm not sure if we need to do something here before enabling GlobalISel by default. I'm thinking we may need to add a smarter way to lower switch statements rather than just a cascaded sequence of conditional branches.

Sounds optimization-ish to me. Same remark.

Kristof Beyls via llvm-dev

unread,

Apr 27, 2017, 12:48:08 PM4/27/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Quentin,

On 27 Apr 2017, at 00:48, Quentin Colombet <qcol...@apple.com> wrote:

Hi Kristof,

On Apr 6, 2017, at 6:53 AM, Kristof Beyls <kristo...@arm.com> wrote:

I've been digging a little bit deeper into the biggest performance regressions I've observed.

What I've observed so far is:

* A lot of the biggest regressions are caused by unnecessarily moving floating point values through general purpose registers. I've raised http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one definitely needs fixing before enabling GlobalISel by default at -O0.

I commented in the PR. This is a known problem and we have a solution. Given this is an optimization in the sense that it does not affect the correctness of the program, we didn’t push for fixing it now.

For O0 we wanted to focus ourselves on generating correct code. Unless the regressions you are seeing are preventing debugging/running of the program, I wouldn’t block the flip of the switch on that.

What do you think?

For O0, I think most users care about fast compile time and an excellent debug experience.

Next to that, I am told that some users also use -O0 to have a straightforward, simple, mapping between source code and the generated assembly.

Out of the issues I've seen so far, I think this is the single "optimization" issue that I feel gives a negative first impression of GlobalISel.

If users look at the generated assembly for floating point code it looks more like active de-optimization rather than "no optimization".

My guess is also that this may be the biggest reason for the about 20% performance slow-down and 11% code size increase I measured earlier.

OTOH, I clearly agree this is an optimization issue, not a correctness issue.

Combining the above, I think it would be best if this issue got fixed before we turn on GlobalISel by default at -O0, or we may end up hearing quite a few (non-critical) complaints about this from users.

Basically I think this is a tradeoff between giving a better first impression of GlobalISel vs getting more people to use and test it earlier.

Thanks for the write-up on the PR, that is very useful.

Do you have any rough idea how much effort would be involved in getting this fixed? I got the impression Daniel is making good progress on the tablegen generation, which is key to getting this issue fixed?

I think that no matter whether we decide to switch the default before or after fixing this issue, this is one of the most urgent issues to fix as far as I can see.

If there is some way I can help contribute to fixing PR32550, I would like to help; but with the dependency on tablegen generation, I'm not sure what the best way is to help make that PR get fixed faster?

* FastISel seems to transform division-by-constant-power-of-2 into right shift (see https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468). GlobalISel does not. It seems to me that at -O0 there may be reasons not perform this transformation, but maybe there is a good reason why FastISel does this?

I think FastISel tries to generate the best code it can no matter what. For GISel O0 however, not doing this optimization sounds sensible to me.

Now, I would say that the same remark as the previous bullet point apply: we shouldn’t do it unless it gets in the way of running/debugging the program.

I agree that these optimizations should not be done at -O0. I think not doing them is actually an improvement: you give the user what they asked, i.e. "no optimization", and an as-straigthforward-as-possible mapping from source to assembly.

* FastISel doesn’t\ seem to handle functions with switch statements, so it falls back to DAGISel. DAGISel produces code that's a lot better than GlobalISel for switch statement at -O0. I'm not sure if we need to do something here before enabling GlobalISel by default. I'm thinking we may need to add a smarter way to lower switch statements rather than just a cascaded sequence of conditional branches.

Sounds optimization-ish to me. Same remark.

Agreed, optimization-ish. In comparison to the above point on FastISel "peepholes", I think that lowering switch statements to something else than a cascaded sequence of conditional branches doesn't make the generated code harder to map to the source. So, in comparison to the above point on FastISel "peepholes", I'm not actively against being smarter about lowering switch statements at -O0.

But I agree, this shouldn't hold up turning on GlobalISel by default at -O0.

Other than the above remarks, before turning on GlobalISel by default, we'd better test/verify that debug info quality remains good.

I haven't looked into that at all, but am hoping to start looking into that soon, with the help of the DIVA tool (https://github.com/SNSystems/DIVA) presented at last EuroLLVM (http://llvm.org/devmtg/2017-03//assets/slides/diva_debug_information_visual_analyzer.pdf). I don't recall anyone so far making any statements about the quality of the debug info they've measured or experienced with GlobalISel enabled?

Other than the all of the above, I just wanted to mention that Oliver recently started running csmith on AArch64 GlobalISel and found one issue so far that already got fixed (https://bugs.llvm.org//show_bug.cgi?id=32733).

If he finds other correctness issues, I'm sure he'll keep on reporting them.

Quentin Colombet via llvm-dev

unread,

Apr 27, 2017, 1:40:11 PM4/27/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

On Apr 27, 2017, at 9:47 AM, Kristof Beyls <kristo...@arm.com> wrote:

Hi Quentin,

On 27 Apr 2017, at 00:48, Quentin Colombet <qcol...@apple.com> wrote:

Hi Kristof,

On Apr 6, 2017, at 6:53 AM, Kristof Beyls <kristo...@arm.com> wrote:

I've been digging a little bit deeper into the biggest performance regressions I've observed.

What I've observed so far is:
* A lot of the biggest regressions are caused by unnecessarily moving floating point values through general purpose registers. I've raised http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one definitely needs fixing before enabling GlobalISel by default at -O0.

I commented in the PR. This is a known problem and we have a solution. Given this is an optimization in the sense that it does not affect the correctness of the program, we didn’t push for fixing it now.

For O0 we wanted to focus ourselves on generating correct code. Unless the regressions you are seeing are preventing debugging/running of the program, I wouldn’t block the flip of the switch on that.

What do you think?

For O0, I think most users care about fast compile time and an excellent debug experience.
Next to that, I am told that some users also use -O0 to have a straightforward, simple, mapping between source code and the generated assembly.

Out of the issues I've seen so far, I think this is the single "optimization" issue that I feel gives a negative first impression of GlobalISel.
If users look at the generated assembly for floating point code it looks more like active de-optimization rather than "no optimization".
My guess is also that this may be the biggest reason for the about 20% performance slow-down and 11% code size increase I measured earlier.
OTOH, I clearly agree this is an optimization issue, not a correctness issue.
Combining the above, I think it would be best if this issue got fixed before we turn on GlobalISel by default at -O0, or we may end up hearing quite a few (non-critical) complaints about this from users.
Basically I think this is a tradeoff between giving a better first impression of GlobalISel vs getting more people to use and test it earlier.

Thanks for the write-up on the PR, that is very useful.
Do you have any rough idea how much effort would be involved in getting this fixed?

I’d say 2-3 weeks. Could actually be shorter if we don’t do all the refactoring I have in mind to make the table for the alternative mappings smaller, but I don’t think it is worth taking any shortcut here.

I got the impression Daniel is making good progress on the tablegen generation, which is key to getting this issue fixed?

To be accurate, Daniel’s work on table gen is for the select phase, not regbankselect. In other words, right now, all the table for mappings for regbankselect are hand written and Daniel’s work is not changing that. The (weak) rationale is that it is not on the critical path :).

I think that no matter whether we decide to switch the default before or after fixing this issue, this is one of the most urgent issues to fix as far as I can see.

Agree. This is one of the top items of my todo list.

If there is some way I can help contribute to fixing PR32550, I would like to help; but with the dependency on tablegen generation, I'm not sure what the best way is to help make that PR get fixed faster?

Thanks for offering to help. I agree that with the dependency on the tabelgen generation in the way, this is probably not a good use of your time. Depending when I can spare some time on this, I’ll either do it or explain the refactoring in the google doc.

In the meantime, I’d suggest to focus on validating the debug info on your side.

* FastISel seems to transform division-by-constant-power-of-2 into right shift (see https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468). GlobalISel does not. It seems to me that at -O0 there may be reasons not perform this transformation, but maybe there is a good reason why FastISel does this?

I think FastISel tries to generate the best code it can no matter what. For GISel O0 however, not doing this optimization sounds sensible to me.
Now, I would say that the same remark as the previous bullet point apply: we shouldn’t do it unless it gets in the way of running/debugging the program.

I agree that these optimizations should not be done at -O0. I think not doing them is actually an improvement: you give the user what they asked, i.e. "no optimization", and an as-straigthforward-as-possible mapping from source to assembly.

* FastISel doesn’t\ seem to handle functions with switch statements, so it falls back to DAGISel. DAGISel produces code that's a lot better than GlobalISel for switch statement at -O0. I'm not sure if we need to do something here before enabling GlobalISel by default. I'm thinking we may need to add a smarter way to lower switch statements rather than just a cascaded sequence of conditional branches.

Sounds optimization-ish to me. Same remark.

Agreed, optimization-ish. In comparison to the above point on FastISel "peepholes", I think that lowering switch statements to something else than a cascaded sequence of conditional branches doesn't make the generated code harder to map to the source. So, in comparison to the above point on FastISel "peepholes", I'm not actively against being smarter about lowering switch statements at -O0.
But I agree, this shouldn't hold up turning on GlobalISel by default at -O0.

Other than the above remarks, before turning on GlobalISel by default, we'd better test/verify that debug info quality remains good.
I haven't looked into that at all, but am hoping to start looking into that soon, with the help of the DIVA tool (https://github.com/SNSystems/DIVA) presented at last EuroLLVM (http://llvm.org/devmtg/2017-03//assets/slides/diva_debug_information_visual_analyzer.pdf). I don't recall anyone so far making any statements about the quality of the debug info they've measured or experienced with GlobalISel enabled?

We ran the lldb test suite with GISel. IIRC, Tim has the details, GISel was on part with SDISel.

Other than the all of the above, I just wanted to mention that Oliver recently started running csmith on AArch64 GlobalISel and found one issue so far that already got fixed (https://bugs.llvm.org//show_bug.cgi?id=32733).
If he finds other correctness issues, I'm sure he'll keep on reporting them.

Great!

Thanks,

-Quentin

Quentin Colombet via llvm-dev

unread,

May 8, 2017, 2:38:36 PM5/8/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

I made a fix for PR32550, that is much less involved in r302453.

In particular, it does not rely on the greedy mode of the regbankselect pass, thus there is no compile time implication.

Could you try it and check where we stand?

Thanks,

-Quentin

Kristof Beyls via llvm-dev

unread,

May 9, 2017, 6:41:30 AM5/9/17

to Quentin Colombet, philli...@sony.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Great Quentin :).

I've rerun the benchmarks comparing "-O0 -g" with "-O0 -g -mllvm -global-isel -mllvm -global-isel-abort=2" on r302453, on AArch64 Cortex-A57.

I indeed see almost no moves between GPR and FPR registers anymore (see details below for where I still see some).

On geomean, I see 13% slow down (down from 17% on my previous run).

On geomean, code size increase is about 3% (down from 11% on my previous run).

I also checked debug info quality with the DIVA tool on the test-suite; and asked one of our DS-5 validation guys to test debug experience for GlobalISel in combination with the ARM DS-5 Development Studio debugger. Both of us didn't find any issues with the debug info produced at -O0 -g with GlobalISel.

I didn't check compile time, nor do I intend to check that as I don't have a good infrastructure set up for that.

Overall, my impression is that GlobalISel is ready to be enabled by default for AArch64 at -O0, if others also believe it's ready.

Thanks,

Kristof

Detailed observations:

Debug info analysis with DIVA:
I used DIVA options "-a --show-generated --show-level --show-location" to dump and diff debug info differences for the test-suite for "-O0 -g" comparing GlobalISel with non-GlobalISel.
The only difference I found was on MultiSource/Benchmarks/tramp3d-v4:
001 37577 {Function} "operator*<double>" -> "Zero<double>"
- No declaration
- Template
002 {TemplateParameter} "T" <- "double"
- 002 37577 {Parameter} <- "Zero<double>"
002 37577 {Parameter} <- "const double &"
Which indicates that for non-GlobalISel, the Zero<double> parameter isn't present in the debug info for the instantiation of the below template function with T=double:
template<class T>
inline Zero<T> operator*(Zero<T>, const T&) { return Zero<T>(); }
In other words, the debug info looks ever so slightly better with GlobalISel.
Quick analysis of reasons for slow down for the 10 programs that regress the most when enabling GlobalISel at -O0:
- SingleSource/Benchmarks/BenchmarkGame/nsieve-bits (114% slowdown): no conversion of divide by power-of-2 to shift right. I think this is an improvement for -O0: no such optimization should be done when not requesting optimizations.
- MultiSource/Benchmarks/MallocBench/gs/gs (88%): "interp" function: switch statement lowered as cascaded-sequence-of-conditional-branches.
  Same issue causes MultiSource/Applications/sqlite3/sqlite3 (71%).
  Same issue causes MultiSource/Applications/lua/lua (46%).
- SingleSource/Benchmarks/Misc/flops-2 (75%): Poor lowering of fneg:
  - FastISel:
    ldur d0, [x29,#-16]
    fneg d0, d0
    stur d0, [x29,#-16]
  - GlobalISel:
    ldur d0, [x29,#-64]
    orr x8, xzr, #0x8000000000000000
    fmov d1, x8
    fsub d0, d1, d0
    fmov x8, d0
    stur x8, [x29,#-64]
- MultiSource/Benchmarks/Prolangs-C++/city/city (74%): a call to memcpy for copying 4 bytes is present with GlobalISel that isn't present with FastISel, in function vehicle::select_move().
  Same issue causes SingleSource/Benchmarks/Shootout-C++/Shootout-C++-moments (58.5%)
- MultiSource/Applications/siod/siod (72%): Seems to be mainly due to loading constants in the entry BB, but I'm not sure that indeed is the biggest cause. (https://bugs.llvm.org//show_bug.cgi?id=32561)
- MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset (48%): Due to creating all constants in the entry BB, see function CopyObject. (https://bugs.llvm.org//show_bug.cgi?id=32561)
- MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): Function Reference_IDCT: Probably due to creating all constants in the entry BB + spilling floating point data through an X register:
  - FastISel:
    fadd d0, d1, d0
    str d0, [sp,#528]
  - GlobalISel:
    fadd d0, d1, d0
    fmov x9, d0
    stur x9, [x29,#-48]
Other overall impression when comparing code generation: The code size increase is probably mainly due to creating constants always in the entry BB of the function. For functions with many constants, this leads to lots and lots of constant creation and then immediately spilling it to the stack. (https://bugs.llvm.org//show_bug.cgi?id=32561)

Quentin Colombet via llvm-dev

unread,

May 9, 2017, 2:47:36 PM5/9/17

to Kristof Beyls, philli...@sony.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

On May 9, 2017, at 3:41 AM, Kristof Beyls <kristo...@arm.com> wrote:

Great Quentin :).

I've rerun the benchmarks comparing "-O0 -g" with "-O0 -g -mllvm -global-isel -mllvm -global-isel-abort=2" on r302453, on AArch64 Cortex-A57.

I indeed see almost no moves between GPR and FPR registers anymore (see details below for where I still see some).

On geomean, I see 13% slow down (down from 17% on my previous run).

On geomean, code size increase is about 3% (down from 11% on my previous run).

I also checked debug info quality with the DIVA tool on the test-suite; and asked one of our DS-5 validation guys to test debug experience for GlobalISel in combination with the ARM DS-5 Development Studio debugger. Both of us didn't find any issues with the debug info produced at -O0 -g with GlobalISel.

I didn't check compile time, nor do I intend to check that as I don't have a good infrastructure set up for that.

Overall, my impression is that GlobalISel is ready to be enabled by default for AArch64 at -O0, if others also believe it's ready.

Sounds good to me!

Renato, Diana, how does that sound?

Good finding, I forgot to do stores in my previous fix. I’ll do them shortly.

Other overall impression when comparing code generation: The code size increase is probably mainly due to creating constants always in the entry BB of the function. For functions with many constants, this leads to lots and lots of constant creation and then immediately spilling it to the stack. (https://bugs.llvm.org//show_bug.cgi?id=32561)

About a month (r299283), I mistakenly committed a prototype I was working on to solve this kind of issue. I reverted it in r299287, but you can give it a try.

Basically, that’s an additional generic pass that runs only at O0. It localizes (as in basic block local) the definition of the constants to workaround the limitations of O0 regalloc.

Cheers,

-Quentin

Eric Christopher via llvm-dev

unread,

May 9, 2017, 2:55:19 PM5/9/17

to Quentin Colombet, Kristof Beyls, philli...@sony.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Quentin,

On Tue, May 9, 2017 at 11:47 AM Quentin Colombet via llvm-dev <llvm...@lists.llvm.org> wrote:

Hi Kristof,

On May 9, 2017, at 3:41 AM, Kristof Beyls <kristo...@arm.com> wrote:

Great Quentin :).

I've rerun the benchmarks comparing "-O0 -g" with "-O0 -g -mllvm -global-isel -mllvm -global-isel-abort=2" on r302453, on AArch64 Cortex-A57.

I indeed see almost no moves between GPR and FPR registers anymore (see details below for where I still see some).

On geomean, I see 13% slow down (down from 17% on my previous run).

On geomean, code size increase is about 3% (down from 11% on my previous run).

I also checked debug info quality with the DIVA tool on the test-suite; and asked one of our DS-5 validation guys to test debug experience for GlobalISel in combination with the ARM DS-5 Development Studio debugger. Both of us didn't find any issues with the debug info produced at -O0 -g with GlobalISel.

I didn't check compile time, nor do I intend to check that as I don't have a good infrastructure set up for that.

Overall, my impression is that GlobalISel is ready to be enabled by default for AArch64 at -O0, if others also believe it's ready.

Sounds good to me!

Renato, Diana, how does that sound?

I'd definitely like to give this a run through on my side as well before you flip the switch. I'll try to do that this week.

-eric

Quentin Colombet via llvm-dev

unread,

May 9, 2017, 3:03:42 PM5/9/17

to Eric Christopher, nd, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, philli...@sony.com

Hi Eric,

On May 9, 2017, at 11:55 AM, Eric Christopher <echr...@gmail.com> wrote:

Hi Quentin,

On Tue, May 9, 2017 at 11:47 AM Quentin Colombet via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi Kristof,

On May 9, 2017, at 3:41 AM, Kristof Beyls <kristo...@arm.com> wrote:

Great Quentin :).

I've rerun the benchmarks comparing "-O0 -g" with "-O0 -g -mllvm -global-isel -mllvm -global-isel-abort=2" on r302453, on AArch64 Cortex-A57.
I indeed see almost no moves between GPR and FPR registers anymore (see details below for where I still see some).
On geomean, I see 13% slow down (down from 17% on my previous run).
On geomean, code size increase is about 3% (down from 11% on my previous run).
I also checked debug info quality with the DIVA tool on the test-suite; and asked one of our DS-5 validation guys to test debug experience for GlobalISel in combination with the ARM DS-5 Development Studio debugger. Both of us didn't find any issues with the debug info produced at -O0 -g with GlobalISel.
I didn't check compile time, nor do I intend to check that as I don't have a good infrastructure set up for that.

Overall, my impression is that GlobalISel is ready to be enabled by default for AArch64 at -O0, if others also believe it's ready.

Sounds good to me!

Renato, Diana, how does that sound?

I'd definitely like to give this a run through on my side as well before you flip the switch. I'll try to do that this week.

Thanks for doing this. Waiting for your inputs then.

Cheers,

-Quentin

Renato Golin via llvm-dev

unread,

May 9, 2017, 3:34:17 PM5/9/17

to Quentin Colombet, philli...@sony.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 9 May 2017 at 19:47, Quentin Colombet <qcol...@apple.com> wrote:
> Sounds good to me!
>
> Renato, Diana, how does that sound?

Hi Quentin,

I'll defer to Diana, as she was looking into this. If she's happy, I'm happy. :)

Quentin Colombet via llvm-dev

unread,

May 10, 2017, 11:37:28 AM5/10/17

to Kristof Beyls, philli...@sony.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On May 9, 2017, at 11:47 AM, Quentin Colombet via llvm-dev <llvm...@lists.llvm.org> wrote:

Hi Kristof,

Should be fixed by r302679

Kristof Beyls via llvm-dev

unread,

May 11, 2017, 2:44:35 AM5/11/17

to Quentin Colombet, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

On 10 May 2017, at 17:36, Quentin Colombet <qcol...@apple.com> wrote:

MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): Function Reference_IDCT: Probably due to creating all constants in the entry BB + spilling floating point data through an X register:

FastISel:
fadd d0, d1, d0
str d0, [sp,#528]
GlobalISel:
fadd d0, d1, d0
fmov x9, d0
stur x9, [x29,#-48]

Good finding, I forgot to do stores in my previous fix. I’ll do them shortly.

Should be fixed by r302679

Thanks Quentin,

That reduces the slow-down when enabling globalisel at -O0 from 13% (on r302453) to 9.5% (on r302679) in my experiments.

The code size increase also reduces from just over 3% to 2.8%.

Kristof

Diana Picus via llvm-dev

unread,

May 11, 2017, 7:02:14 AM5/11/17

to Kristof Beyls, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

Hi all,

I'm still running some validation on this, I'll send an email when
it's done. If that goes well I don't have anything against making the
switch.

For the record, here's a summary of issues that were deferred for
later on (some of which are optimization-ish and we might decide to
never do at -O0 at all):
* Crash in RegBankSelect for half fp types:
https://bugs.llvm.org/show_bug.cgi?id=32560
* Improving constant placement: http://bugs.llvm.org/show_bug.cgi?id=32561
* Fancy switch lowering
* Transforming division-by-constant-power-of-2 into right shift

AFAICT all the other issues that were brought up were fixed (yay!).

Cheers,
Diana

On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev

Quentin Colombet via llvm-dev

unread,

May 11, 2017, 12:40:22 PM5/11/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Diana,

Thanks for the summary.

On May 11, 2017, at 4:01 AM, Diana Picus <diana...@linaro.org> wrote:

Hi all,

I'm still running some validation on this, I'll send an email when
it's done. If that goes well I don't have anything against making the
switch.

For the record, here's a summary of issues that were deferred for
later on (some of which are optimization-ish and we might decide to
never do at -O0 at all):
* Crash in RegBankSelect for half fp types:
https://bugs.llvm.org/show_bug.cgi?id=32560

I’ll have a look.

* Improving constant placement: http://bugs.llvm.org/show_bug.cgi?id=32561

I’ve commented in the PR to mention the localizer technic I was playing with, if someone wants to give it a try.

* Fancy switch lowering
* Transforming division-by-constant-power-of-2 into right shift

AFAICT all the other issues that were brought up were fixed (yay!).

Cheers,
Diana

Cheers,

-Quentin

Diana Picus via llvm-dev

unread,

May 12, 2017, 7:45:10 AM5/12/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi,

I ran into a little snag on the test-suite:
https://bugs.llvm.org/show_bug.cgi?id=33021
It boils down to GlobalISel generating calls to fabs instead of using
FABSDr (so we get undefined references).

Cheers,
Diana

Diana Picus via llvm-dev

unread,

May 12, 2017, 8:22:46 AM5/12/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

It turns out that can be fixed by adding -lm to the link line, so I
will probably convert it into a test-suite bug.

I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0.

Eric Christopher via llvm-dev

unread,

May 12, 2017, 2:07:14 PM5/12/17

to Diana Picus, Quentin Colombet, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

Agreed. That was probably just luck before :)

-eric

Diana Picus via llvm-dev

unread,

May 15, 2017, 3:38:58 AM5/15/17

to Eric Christopher, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036

I'm still investigating whether this is an actual GlobalISel issue or
something else (I'll also start a build on a more recent revision to
see how that behaves).

Diana Picus via llvm-dev

unread,

May 16, 2017, 8:06:36 AM5/16/17

to Eric Christopher, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

Turns out it really is a GlobalISel issue - we eat up too much stack
space because all the constants used in the switches are stored on the
stack. We need to fix this somehow before changing the default. I'll
try to give it a run with Quentin's r299283 on top to see if it helps.

Cheers,
Diana

Kristof Beyls via llvm-dev

unread,

May 18, 2017, 3:06:48 AM5/18/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

FWIW, I ran Quentin's r299283 Localizer patch (see also http://bugs.llvm.org/show_bug.cgi?id=32561) on my benchmark set.

This is taking the commit in r299283 + adding the pass to the pipeline right after RegBankSelect.

It reduces the slow-down when enabling globalisel at -O0 from 9.5% (on r302453) to 6.3% (on r302679) in my experiments.

The code size increase also reduces from just over 2.8% to 1.8%.

I haven't measured the impact on compile-time.

I think those are nice improvements; but I wouldn't hold up enabling-by-default for those improvements.

However, the increase in stack size usage being so huge that a bootstrap fails seems like something that should be addressed before enabling-by-default.

I think Diana found that when enabling r299283, the bootstrap failed with llvm-tblgen segfaulting.

So there clearly is some work required there.

Thanks,

Kristof

Diana Picus via llvm-dev

unread,

May 18, 2017, 4:15:45 AM5/18/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 18 May 2017 at 09:06, Kristof Beyls <Kristo...@arm.com> wrote:
> I think Diana found that when enabling r299283, the bootstrap failed with
> llvm-tblgen segfaulting.
> So there clearly is some work required there.

Indeed.

@Quentin, what is the status of that patch? Have you been working on
it since then? Should I investigate the segfault more and send you a
reproducer? Is this patch the way forward, or do you have other ideas
for reducing the stack usage?

Thanks,
Diana

Quentin Colombet via llvm-dev

unread,

May 18, 2017, 1:09:57 PM5/18/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Diana,

> On May 18, 2017, at 1:15 AM, Diana Picus <diana...@linaro.org> wrote:
>
> On 18 May 2017 at 09:06, Kristof Beyls <Kristo...@arm.com> wrote:
>> I think Diana found that when enabling r299283, the bootstrap failed with
>> llvm-tblgen segfaulting.
>> So there clearly is some work required there.
>
> Indeed.
>
> @Quentin, what is the status of that patch?

Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it.

> Have you been working on
> it since then?

No, I haven’t touched it and I honestly didn’t plan to do that.

> Should I investigate the segfault more and send you a
> reproducer?

Depends, do you guys want to pursue in that direction?
My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy).

> Is this patch the way forward, or do you have other ideas
> for reducing the stack usage?

Better reg alloc scheme for O0 :).

Diana Picus via llvm-dev

unread,

May 19, 2017, 4:34:07 AM5/19/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 18 May 2017 at 19:09, Quentin Colombet <qcol...@apple.com> wrote:
> Hi Diana,
>

>> On May 18, 2017, at 1:15 AM, Diana Picus <diana...@linaro.org> wrote:
>>
>> On 18 May 2017 at 09:06, Kristof Beyls <Kristo...@arm.com> wrote:
>>> I think Diana found that when enabling r299283, the bootstrap failed with
>>> llvm-tblgen segfaulting.
>>> So there clearly is some work required there.
>>
>> Indeed.
>>
>> @Quentin, what is the status of that patch?
>
> Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it.
>
>> Have you been working on
>> it since then?
>
> No, I haven’t touched it and I honestly didn’t plan to do that.
>
>> Should I investigate the segfault more and send you a
>> reproducer?
>
> Depends, do you guys want to pursue in that direction?

Not necessarily, if you think a better reg alloc scheme will do the
trick then I see no point in complicating the pass pipeline for now.

> My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy).
>
>> Is this patch the way forward, or do you have other ideas
>> for reducing the stack usage?
>
> Better reg alloc scheme for O0 :).

Ok, let's go with that then :)

Quentin Colombet via llvm-dev

unread,

May 19, 2017, 1:06:29 PM5/19/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Diana,

On May 19, 2017, at 1:33 AM, Diana Picus <diana...@linaro.org> wrote:

On 18 May 2017 at 19:09, Quentin Colombet <qcol...@apple.com> wrote:
Hi Diana,

On May 18, 2017, at 1:15 AM, Diana Picus <diana...@linaro.org> wrote:

On 18 May 2017 at 09:06, Kristof Beyls <Kristo...@arm.com> wrote:
I think Diana found that when enabling r299283, the bootstrap failed with
llvm-tblgen segfaulting.
So there clearly is some work required there.

Indeed.

@Quentin, what is the status of that patch?

Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it.

Have you been working on
it since then?

No, I haven’t touched it and I honestly didn’t plan to do that.

Should I investigate the segfault more and send you a
reproducer?

Depends, do you guys want to pursue in that direction?

Not necessarily, if you think a better reg alloc scheme will do the
trick then I see no point in complicating the pass pipeline for now.

My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy).

Is this patch the way forward, or do you have other ideas
for reducing the stack usage?

Better reg alloc scheme for O0 :).

Ok, let's go with that then :)

Sounds good, let me push a patch that would allow to use the greedy allocator at O0 and see if that’s sufficient. Then, we will look at the compile time that change will imply.

Cheers,

-Quentin

Quentin Colombet via llvm-dev

unread,

May 19, 2017, 1:23:40 PM5/19/17

to Eric Christopher, philli...@sony.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Eric,

How is this looking on your side?

Just curious if you have additional feedbacks, we are still fixing the latest concerns Diana and Krystof reported.

Cheers,

-Quentin

Diana Picus via llvm-dev

unread,

May 22, 2017, 3:10:23 AM5/22/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Quentin,

I actually did a run with -mllvm -optimize-regalloc -mllvm
-regalloc=greedy over the weekend and the test does pass with that.
Haven't measured the compile time though.

Cheers,
Diana

Kristof Beyls via llvm-dev

unread,

May 22, 2017, 3:51:13 AM5/22/17

to Diana Picus, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

> On 22 May 2017, at 09:09, Diana Picus <diana...@linaro.org> wrote:
>
> Hi Quentin,
>
> I actually did a run with -mllvm -optimize-regalloc -mllvm
> -regalloc=greedy over the weekend and the test does pass with that.
> Haven't measured the compile time though.
>
> Cheers,
> Diana

I also did my usual benchmarking run with the same options as Diana did above:
- Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.
- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measure the impact on compile time.

Doing a few quick spot checks on the generated code, I still see some constants being created in the entry BB and stored on the stack.
I haven't tried to investigate if the entry BB seems like a good place to materialize those remaining constants.

Thanks,

Kristof

Quentin Colombet via llvm-dev

unread,

May 23, 2017, 3:49:07 PM5/23/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Great!

I thought I had to look at our pipeline at O0 to make sure optimized regalloc was supported (https://bugs.llvm.org/show_bug.cgi?id=33022 in mind). Glad I was wrong, it saves me some time.

On May 22, 2017, at 12:51 AM, Kristof Beyls <kristo...@arm.com> wrote:

On 22 May 2017, at 09:09, Diana Picus <diana...@linaro.org> wrote:

Hi Quentin,

I actually did a run with -mllvm -optimize-regalloc -mllvm
-regalloc=greedy over the weekend and the test does pass with that.
Haven't measured the compile time though.

Cheers,
Diana

I also did my usual benchmarking run with the same options as Diana did above:
- Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.

That’s compared to 9.5% performance drop and 2.8% code size regression, without that regalloc scheme, right?

- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measure the impact on compile time.

Do you have a mean to make this measurement?

Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).

Doing a few quick spot checks on the generated code, I still see some constants being created in the entry BB and stored on the stack.

Feel free to file PRs if you think that shouldn’t happen.

@Eric, how does it look on your side?

Cheers,

-Quentin

Kristof Beyls via llvm-dev

unread,

May 24, 2017, 9:00:48 AM5/24/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 23 May 2017, at 21:48, Quentin Colombet <qcol...@apple.com> wrote:

Great!
I thought I had to look at our pipeline at O0 to make sure optimized regalloc was supported (https://bugs.llvm.org/show_bug.cgi?id=33022 in mind). Glad I was wrong, it saves me some time.

On May 22, 2017, at 12:51 AM, Kristof Beyls <kristo...@arm.com> wrote:

On 22 May 2017, at 09:09, Diana Picus <diana...@linaro.org> wrote:

Hi Quentin,

I actually did a run with -mllvm -optimize-regalloc -mllvm
-regalloc=greedy over the weekend and the test does pass with that.
Haven't measured the compile time though.

Cheers,
Diana

I also did my usual benchmarking run with the same options as Diana did above:
- Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.

That’s compared to 9.5% performance drop and 2.8% code size regression, without that regalloc scheme, right?

Indeed.

- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measure the impact on compile time.

Do you have a mean to make this measurement?

Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).

I did a quick setup with CTMark (part of the test-suite). I ran each of

* '-O0 -g',

* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and

* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'

5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.

In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.

With enabling the greedy register allocator, this increases to 28%.

28% is probably too high? At the moment I can't think of an alternative to having a "constant materialization localizer" pass at -O0 to hit all the metrics we thought of as necessary before enabling GISel by default.

It would be good if someone else could also do a compilation time experiment - just to make sure I didn't make any silly mistakes in my experiment.

Here are the details I see:

	gisel	gisel+greedy
CTMark/7zip/7zip-benchmark	102.8%	106.5%
CTMark/Bullet/bullet	100.5%	105.1%
CTMark/ClamAV/clamscan	101.6%	130.8%
CTMark/SPASS/SPASS	101.2%	120.0%
CTMark/consumer-typeset/consumer-typeset	105.7%	138.2%
CTMark/kimwitu++/kc	103.1%	122.6%
CTMark/lencod/lencod	106.2%	143.4%
CTMark/mafft/pairlocalalign	96.2%	135.4%
CTMark/sqlite3/sqlite3	109.1%	155.1%
CTMark/tramp3d-v4/tramp3d-v4	109.1%	132.0%
GEOMEAN	103.5%	128.0%

Thanks,

Kristof

Quentin Colombet via llvm-dev

unread,

May 24, 2017, 1:32:03 PM5/24/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

Thanks for the measurements.

I think it is yes.

I have attached a quick hack to the greedy allocator to feature a fast mode.

Could you give it a try?

To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).

Thanks,

-Quentin

regalloc-fastmode.diff

Kristof Beyls via llvm-dev

unread,

May 24, 2017, 3:57:23 PM5/24/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 24 May 2017, at 19:31, Quentin Colombet <qcol...@apple.com> wrote:

Hi Kristof,

Thanks for the measurements.

On May 24, 2017, at 6:00 AM, Kristof Beyls <kristo...@arm.com> wrote:

- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measure the impact on compile time.

Do you have a mean to make this measurement?

Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).

I did a quick setup with CTMark (part of the test-suite). I ran each of

* '-O0 -g',

* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and

* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'

5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.

In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.

With enabling the greedy register allocator, this increases to 28%.

28% is probably too high?

I think it is yes.

I have attached a quick hack to the greedy allocator to feature a fast mode.

Could you give it a try?

To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).

I'm afraid it doesn't seem to save much compile time. On geomean, I see about 26% compile time increase against the current -O0 default (compared to 28% increase for regalloc greedy without your patch).

<regalloc-fastmode.diff>

Quentin Colombet via llvm-dev

unread,

May 24, 2017, 4:02:07 PM5/24/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

Thanks for going back so fast!

On May 24, 2017, at 12:57 PM, Kristof Beyls <kristo...@arm.com> wrote:

On 24 May 2017, at 19:31, Quentin Colombet <qcol...@apple.com> wrote:

Hi Kristof,

Thanks for the measurements.

On May 24, 2017, at 6:00 AM, Kristof Beyls <kristo...@arm.com> wrote:

- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measure the impact on compile time.

Do you have a mean to make this measurement?
Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).

I did a quick setup with CTMark (part of the test-suite). I ran each of
* '-O0 -g',
* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
* '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
With enabling the greedy register allocator, this increases to 28%.
28% is probably too high?

I think it is yes.
I have attached a quick hack to the greedy allocator to feature a fast mode.
Could you give it a try?

To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).

I'm afraid it doesn't seem to save much compile time. On geomean, I see about 26% compile time increase against the current -O0 default (compared to 28% increase for regalloc greedy without your patch).

Interesting, I guess a lot of time is spent in the coalescer. Could you give a try with -join-liveintervals=false?

Do you know where the time is spent (-time-passes)?

Anyhow, fixing all of those, although this is I think the right approach, will take time, so we can go with the localizer.

Cheers,

-Quentin

Kristof Beyls via llvm-dev

unread,

May 25, 2017, 5:09:15 AM5/25/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

With adding -join-liveintervals=false, I see the compile time increase going up to 28% again.

Do you know where the time is spent (-time-passes)?

I'm afraid I won't have time to have a closer look in the next couple of days - I don't know where the time is spent at the moment.

Anyhow, fixing all of those, although this is I think the right approach, will take time, so we can go with the localizer.

Right, I don't understand the register allocator well enough to know if that compile time overhead can be fixed, while still getting the necessary codegen benefits the greedy allocator gives.

Is there any specific help you're looking for with getting the localizer work well enough for production use?

Thanks,

Kristof

Quentin Colombet via llvm-dev

unread,

May 25, 2017, 4:53:53 PM5/25/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

Heh, I am mildly surprised we hand much more live-ranges to the allocator when we do that.

Do you know where the time is spent (-time-passes)?

I'm afraid I won't have time to have a closer look in the next couple of days - I don't know where the time is spent at the moment.

Fair enough, will investigate later.

Anyhow, fixing all of those, although this is I think the right approach, will take time, so we can go with the localizer.

Right, I don't understand the register allocator well enough to know if that compile time overhead can be fixed, while still getting the necessary codegen benefits the greedy allocator gives.
Is there any specific help you're looking for with getting the localizer work well enough for production use?

I’ll clean-up the WIP patch for the localizer, then you guys can fix the bug that you found.

I’ll do that tomorrow.

Cheers,

-Quentin

Thanks,

Kristof

Quentin Colombet via llvm-dev

unread,

May 26, 2017, 9:36:39 PM5/26/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

I’ve pushed the localizer in r304051 and added it in the AArch64 O0 pipeline in r304052.

I let Diana investigate the seg fault she was seeing.

@Diana, let me know if you need help.

Cheers,

-Quentin

Diana Picus via llvm-dev

unread,

May 29, 2017, 4:07:00 AM5/29/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Thanks Quentin, it's in progress now, I'll let you know how it goes.

Cheers,
Diana

Diana Picus via llvm-dev

unread,

May 30, 2017, 9:56:57 AM5/30/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Quentin,

I've attached a reproducer for the problem.

I've described what I think the problem is in the file, but the short
version is that the localizer shouldn't assume that the iteration
order for the uses corresponds to the logical order of instructions in
a basic block (we're now localizing before the first use that we find,
but that may be later in the basic block, so we'd end up with uses
before the def).

I'm not sure it's possible to test this without running a couple of
passes. You might be able to trigger it only with reg bank select +
localize, but I haven't tried. Using only the localizer would mean
that the iteration order for the uses would be the order in which
they're read in, so you wouldn't have this problem.

Hope that helps,
Diana

localizer-mo-order.mir

Quentin Colombet via llvm-dev

unread,

May 30, 2017, 10:42:51 AM5/30/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Thanks Diana.

That is indeed the assumption in the code and this is obviously wrong.

Could you try the attached patch?

(I haven’t even tried to compile it though)

Cheers,
-Quentin

localizer_tentative_fix.diff

Quentin Colombet via llvm-dev

unread,

May 30, 2017, 4:57:16 PM5/30/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Diana,

I’ve actually gone ahead and pushed the fix as I was able to produce a small reproducer.

This is r304244

Let me know if you encounter any other problem.

Cheers,

-Quentin

<localizer_tentative_fix.diff>

<localizer-mo-order.mir>

Diana Picus via llvm-dev

unread,

May 31, 2017, 9:33:58 AM5/31/17

to Quentin Colombet, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

Cool test :)

It seems to work fine now, I don't see any new failures. IIUC, Kristof is also giving it another run.

Cheers,

Diana

Kristof Beyls via llvm-dev

unread,

May 31, 2017, 10:46:06 AM5/31/17

to Diana Picus, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

On 31 May 2017, at 15:33, Diana Picus via llvm-dev <llvm...@lists.llvm.org> wrote:

Cool test :)
It seems to work fine now, I don't see any new failures. IIUC, Kristof is also giving it another run.

Cheers,

Diana

Latest comparisons on my side, after picking up r304244, i.e. the correct Localizer pass.

* CTMark compile time, comparing "-O0 -g" vs '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0': about 6% increase with globalisel. This was about 3.5% before the Localizer pass landed.

* My usual performance benchmarking run: 8.5% slow-down. This was about 9.5% before the Localizer pass landed, so a slight improvement.

* Code size: 3.14% larger. This was about 2.8% before the Localizer pass landed, so a slight regression.

* Debug info quality: I didn't do another recheck, trusting that the Localizer pass wouldn't change debug info quality.

* Stack size usage: I don't know of a good way to measure this, but Diana's experiments show that at least for bootstrapping it went from "problematically bad" to "OK".

Thanks,

Kristof

Quentin Colombet via llvm-dev

unread,

May 31, 2017, 11:07:16 AM5/31/17

to Kristof Beyls, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Kristof,

Thanks for the updated numbers.

On May 31, 2017, at 7:45 AM, Kristof Beyls <kristo...@arm.com> wrote:

On 31 May 2017, at 15:33, Diana Picus via llvm-dev <llvm...@lists.llvm.org> wrote:

Cool test :)
It seems to work fine now, I don't see any new failures. IIUC, Kristof is also giving it another run.

Cheers,
Diana

Latest comparisons on my side, after picking up r304244, i.e. the correct Localizer pass.
* CTMark compile time, comparing "-O0 -g" vs '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0': about 6% increase with globalisel. This was about 3.5% before the Localizer pass landed.

That one is surprising too. I wouldn’t have expected this pass to show up in the compile time profile. At least not to this extend.

What is the biggest offender?

* My usual performance benchmarking run: 8.5% slow-down. This was about 9.5% before the Localizer pass landed, so a slight improvement.
* Code size: 3.14% larger. This was about 2.8% before the Localizer pass landed, so a slight regression.

That one is surprising. Do you have an idea of what is happening?

Alternatively if you can point me to the biggest offender, I can have a look.

The only thing I can think of is that we duplicate constants that are expensive to materialize. If that’s the case, we were discussing with Ahmed an alternative to the localizer pass that would operate during InstructionSelect so may be worth pursuing.

* Debug info quality: I didn't do another recheck, trusting that the Localizer pass wouldn't change debug info quality.
* Stack size usage: I don't know of a good way to measure this, but Diana's experiments show that at least for bootstrapping it went from "problematically bad" to "OK".

Thanks,

Thanks,

-Quentin

Kristof Beyls via llvm-dev

unread,

Jun 1, 2017, 10:46:59 AM6/1/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 31 May 2017, at 17:07, Quentin Colombet <qcol...@apple.com> wrote:

Latest comparisons on my side, after picking up r304244, i.e. the correct Localizer pass.

* CTMark compile time, comparing "-O0 -g" vs '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0': about 6% increase with globalisel. This was about 3.5% before the Localizer pass landed.

That one is surprising too. I wouldn’t have expected this pass to show up in the compile time profile. At least not to this extend.

What is the biggest offender?

Hmmm. So I took the 3.5% compile time overhead from my last measurement before the localizer landed, from around 24th of May.

When using -ftime-report, I see the Localizer pass typically taking very roughly about 1% of compile time.

Maybe another part of GlobalISel became a bit slower since I did that 3.5% measurement?

Or maybe the Localizer pass changes the structure of the program so that another later pass gets a different compile time profile?

Basically, I'd have to do more experiments to figure that one out.

As far as where time is spent in the gisel-passes itself, on average, I saw the following on the latest CTMark experiment I ran:

Avg compile time spent in IRTranslator: 4.61%

Avg compile time spent in InstructionSelect: 7.51%

Avg compile time spent in Legalizer: 1.06%

Avg compile time spent in Localizer: 0.76%

Avg compile time spent in RegBankSelect: 2.12%

* My usual performance benchmarking run: 8.5% slow-down. This was about 9.5% before the Localizer pass landed, so a slight improvement.

* Code size: 3.14% larger. This was about 2.8% before the Localizer pass landed, so a slight regression.

That one is surprising. Do you have an idea of what is happening?

Alternatively if you can point me to the biggest offender, I can have a look.

So the biggest offenders on the mem_bytes metric in LNT are:

	O0 -g	O0 -g gisel-with-localizer	O0 -g gisel-without-localizer
SingleSource/Benchmarks/Misc/perlin	14272	14640	18344	25.95%
SingleSource/Benchmarks/Dhrystone/dry	16560	17144	20160	18.21%
SingleSource/Benchmarks/Stanford/QueensProfile	13912	14192	15136	6.79%
MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url	71400	72272	75504	4.53%

I haven't had time to investigate what exact changes make the code size go up that much with the localizer pass in those cases...

Quentin Colombet via llvm-dev

unread,

Jun 6, 2017, 1:12:00 PM6/6/17

to Kristof Beyls, echr...@gmail.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Thanks Kristof.

Sounds like we'll need to investigate though I'd say it is not blocking the switch.

At this point I think everybody is on board to flip the switch.

@Eric, how does that sound to you?

Thanks,

Q

Diana Picus via llvm-dev

unread,

Jun 12, 2017, 12:55:07 PM6/12/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi all,

I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.

At the moment, it lives in an internal buildmaster that I've setup for this purpose. If we fix it and it proves to be stable for a week or two, I'll move it to the public master.

Cheers,

Diana

[1] http://master2.llvm.validation.linaro.org/builders/clang-cmake-aarch64-global-isel

Diana Picus via llvm-dev

unread,

Jun 14, 2017, 10:28:27 AM6/14/17

to Quentin Colombet, Kristof Beyls, Matthias Braun, chris.m...@apple.com, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 12 June 2017 at 18:54, Diana Picus <diana...@linaro.org> wrote:

Hi all,

I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.

I did some more investigations on a machine similar to the one running the buildbot. For paq8p and scimark2, I get these results for O0:

PAQ8p:

Fast isel: 666.344

Global isel: 731.384

SciMark2-C:

Fast isel: 463.908

Global isel: 496.22

The current timeout is 500s (so in this particular case we didn't hit it for scimark2, and it ran successfully to completion). I don't think the difference between FastISel and GlobalISel is too atrocious, so I would propose increasing the timeout for these 2 benchmarks. I'm not sure if we can do this on a per-bot basis, but I see some precedent for setting custom timeout thresholds for various benchmarks on different architectures (sometimes with comments that it's done so we can run O0 on that particular benchmark).

Something along these lines works:

https://reviews.llvm.org/differential/diff/102547/

What do you guys think about this approach?

Thanks,

Diana

PS: The buildbot is using the Makefiles because that's what our other AArch64 test-suite bots use. Moving all of them to CMake is a transition for another time.

Quentin Colombet via llvm-dev

unread,

Jun 16, 2017, 6:06:47 PM6/16/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On Jun 14, 2017, at 7:27 AM, Diana Picus <diana...@linaro.org> wrote:

On 12 June 2017 at 18:54, Diana Picus <diana...@linaro.org> wrote:
Hi all,

I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.

I did some more investigations on a machine similar to the one running the buildbot. For paq8p and scimark2, I get these results for O0:

PAQ8p:
Fast isel: 666.344
Global isel: 731.384

SciMark2-C:
Fast isel: 463.908
Global isel: 496.22

The current timeout is 500s (so in this particular case we didn't hit it for scimark2, and it ran successfully to completion). I don't think the difference between FastISel and GlobalISel is too atrocious, so I would propose increasing the timeout for these 2 benchmarks. I'm not sure if we can do this on a per-bot basis, but I see some precedent for setting custom timeout thresholds for various benchmarks on different architectures (sometimes with comments that it's done so we can run O0 on that particular benchmark).

Something along these lines works:
https://reviews.llvm.org/differential/diff/102547/

What do you guys think about this approach?

Looks reasonable to me.

Quentin Colombet via llvm-dev

unread,

Jun 16, 2017, 7:43:44 PM6/16/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.

*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.

In particular,

1. The APIs are still evolving and can still possibly change significantly

2. The TableGen backend to reuse the existing SD patterns is still at its early stage

3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!

*** Short-Term Proposal ***

What we would like to do instead short-term is:

A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)

B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems

C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic

What do people think?

*** Your Help Is Needed ***

- Please share your experience in using the GISel APIs and how we can make them better. Moving forward we’ll have those conversations on open source instead of internally/with a narrower audience.

- Report any performance problem you identify

- Propose patches!

Cheers,

-Quentin

Eric Christopher via llvm-dev

unread,

Jun 16, 2017, 7:58:38 PM6/16/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On Fri, Jun 16, 2017 at 4:43 PM Quentin Colombet <qcol...@apple.com> wrote:

Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.

*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
In particular,
1. The APIs are still evolving and can still possibly change significantly
2. The TableGen backend to reuse the existing SD patterns is still at its early stage
3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!

*** Short-Term Proposal ***

What we would like to do instead short-term is:
A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)
B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems
C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic

What do people think?

How about -fexperimental-global-isel as a flag to clang?

-eric

Quentin Colombet via llvm-dev

unread,

Jun 16, 2017, 8:11:32 PM6/16/17

to Eric Christopher, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On Jun 16, 2017, at 4:58 PM, Eric Christopher <echr...@gmail.com> wrote:

On Fri, Jun 16, 2017 at 4:43 PM Quentin Colombet <qcol...@apple.com> wrote:
Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.

*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
In particular,
1. The APIs are still evolving and can still possibly change significantly
2. The TableGen backend to reuse the existing SD patterns is still at its early stage
3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!

*** Short-Term Proposal ***

What we would like to do instead short-term is:
A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)
B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems
C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic

What do people think?

How about -fexperimental-global-isel as a flag to clang?

I thought about that and that would work, but I thought we had an implicit contract that clang options are not going away.

If that’s not the case, then, yes, we should do that!

Eric Christopher via llvm-dev

unread,

Jun 16, 2017, 8:17:59 PM6/16/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On Fri, Jun 16, 2017 at 5:11 PM Quentin Colombet <qcol...@apple.com> wrote:

On Jun 16, 2017, at 4:58 PM, Eric Christopher <echr...@gmail.com> wrote:

On Fri, Jun 16, 2017 at 4:43 PM Quentin Colombet <qcol...@apple.com> wrote:
Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.

*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
In particular,
1. The APIs are still evolving and can still possibly change significantly
2. The TableGen backend to reuse the existing SD patterns is still at its early stage
3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!

*** Short-Term Proposal ***

What we would like to do instead short-term is:
A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)
B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems
C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic

What do people think?

How about -fexperimental-global-isel as a flag to clang?

I thought about that and that would work, but I thought we had an implicit contract that clang options are not going away.
If that’s not the case, then, yes, we should do that!

*shrug* we have one for the new pass manager to make testing easier, this seems reasonable as well for something that's going to be tested over an extended period of time.

-eric

Quentin Colombet via llvm-dev

unread,

Jun 16, 2017, 8:26:39 PM6/16/17

to Eric Christopher, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On Jun 16, 2017, at 5:17 PM, Eric Christopher <echr...@gmail.com> wrote:

On Fri, Jun 16, 2017 at 5:11 PM Quentin Colombet <qcol...@apple.com> wrote:
On Jun 16, 2017, at 4:58 PM, Eric Christopher <echr...@gmail.com> wrote:

On Fri, Jun 16, 2017 at 4:43 PM Quentin Colombet <qcol...@apple.com> wrote:
Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.

*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
In particular,
1. The APIs are still evolving and can still possibly change significantly
2. The TableGen backend to reuse the existing SD patterns is still at its early stage
3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!

*** Short-Term Proposal ***

What we would like to do instead short-term is:
A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)
B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems
C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic

What do people think?

How about -fexperimental-global-isel as a flag to clang?

I thought about that and that would work, but I thought we had an implicit contract that clang options are not going away.
If that’s not the case, then, yes, we should do that!

*shrug* we have one for the new pass manager to make testing easier, this seems reasonable as well for something that's going to be tested over an extended period of time.

Sounds good to me. We’ll do that.

Thanks!

Diana Picus via llvm-dev

unread,

Jun 19, 2017, 4:44:02 AM6/19/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 17 June 2017 at 01:43, Quentin Colombet <qcol...@apple.com> wrote:
> *** Your Help Is Needed ***
>
> - Please share your experience in using the GISel APIs and how we can make
> them better. Moving forward we’ll have those conversations on open source
> instead of internally/with a narrower audience.
> - Report any performance problem you identify
> - Propose patches!

While we're at it, these patches could use some review:
* [GlobalISel] combine not symmetric merge/unmerge nodes [1]
* [GlobalISel] Make multi-step legalization work [2]
* [GlobalISel] Enable specifying how to legalize non-power-of-2 size
types. [NFC-ish] [3]

Note that [2] is a spin-off from [3], which has been in review for a
long, long time.

[1] https://reviews.llvm.org/D33626
[2] https://reviews.llvm.org/D32529
[3] https://reviews.llvm.org/D30529

Cheers,
Diana

Kristof Beyls via llvm-dev

unread,

Jun 19, 2017, 8:28:26 AM6/19/17

to Diana Picus, Justin Bogner, llvm-dev, Ahmed Bougacha, Aditya Nandakumar, nd

On 19 Jun 2017, at 10:43, Diana Picus via llvm-dev <llvm...@lists.llvm.org> wrote:

On 17 June 2017 at 01:43, Quentin Colombet <qcol...@apple.com> wrote:

*** Your Help Is Needed ***

- Please share your experience in using the GISel APIs and how we can make
them better. Moving forward we’ll have those conversations on open source
instead of internally/with a narrower audience.
- Report any performance problem you identify
- Propose patches!

While we're at it, these patches could use some review:
* [GlobalISel] combine not symmetric merge/unmerge nodes [1]

* [GlobalISel] Make multi-step legalization work [2] (https://reviews.llvm.org/D32529)

As the author of the above patch, I think this one should be relatively easy to review.

* [GlobalISel] Enable specifying how to legalize non-power-of-2 size

types. [NFC-ish] [3] (https://reviews.llvm.org/D30529)

I'm working on rebasing this one on top of the above patch ("Make multi-step legalization work").

Most of that is done, but I'm still trying to write a better description of what this large patch actually

does, so that it becomes easier to review. Unfortunately, I haven't found a way to split that

patch up further in smaller incremental pieces.

I had put this on the back-burner and instead prioritized what was needed to enable GlobalISel

by default. Now that that is not being pushed as hard, I'll start looking again at it.

I hope to have an updated patch with a description that makes review easier towards the end of the week.

Thanks,

Kristof

Diana Picus via llvm-dev

unread,

Jul 3, 2017, 8:09:26 AM7/3/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On 12 June 2017 at 18:54, Diana Picus <diana...@linaro.org> wrote:
>

> Hi all,
>
> I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.
>
> At the moment, it lives in an internal buildmaster that I've setup for this purpose. If we fix it and it proves to be stable for a week or two, I'll move it to the public master.
>

FYI, this is now live on the public master:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-global-isel

I hope people will find it useful.

Quentin Colombet via llvm-dev

unread,

Jul 5, 2017, 12:13:21 PM7/5/17

to Diana Picus, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

On Jul 3, 2017, at 5:08 AM, Diana Picus <diana...@linaro.org> wrote:

On 12 June 2017 at 18:54, Diana Picus <diana...@linaro.org> wrote:

Hi all,

I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.

At the moment, it lives in an internal buildmaster that I've setup for this purpose. If we fix it and it proves to be stable for a week or two, I'll move it to the public master.

FYI, this is now live on the public master:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-global-isel

Sweet!

I hope people will find it useful.

Thanks for doing this.

Quentin Colombet via llvm-dev

unread,

Nov 7, 2017, 7:42:24 PM11/7/17

to Quentin Colombet, llvm-dev, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi all,

I’d like to resurrect this thread and ask if people are on board for enabling this by default for AArch64 O0.

*** What Changed Since June? ***

- We added a way to describe the legalization actions for non-power-of-2

- We gave a tutorial that covers the best practices to target GlobalISel

- We improved the TableGen backend to reuse existing SDISel patterns

- We built and ran huge internal software with GISel

- We evaluated the performance of GISel and are confident things are in a good shape (with https://reviews.llvm.org/D39034) and moving forward would look even better (see the last LLVM Dev talk: GlobalISel: Present, Past, and Future when it is available)

*** So What’s he Plan? ***

- Switch the default instruction selector to GISel for AArch64 at O0

- Enable the fallback path by default for AArch64 (with warnings enabled when that path is hit)

- Provide a clang option to turn GISel off

What do you think?

Thanks,

-Quentin

via llvm-dev

unread,

Nov 10, 2017, 3:36:57 PM11/10/17

to Quentin Colombet, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

SGTM, Quentin.

On 2017-11-07 19:42, Quentin Colombet via llvm-dev wrote:
> Hi all,
>
> I’d like to resurrect this thread and ask if people are on board for
> enabling this by default for AArch64 O0.
>
> *** What Changed Since June? ***
>
> - We added a way to describe the legalization actions for
> non-power-of-2
> - We gave a tutorial that covers the best practices to target
> GlobalISel
> - We improved the TableGen backend to reuse existing SDISel patterns
> - We built and ran huge internal software with GISel
> - We evaluated the performance of GISel and are confident things are
> in a good shape (with https://reviews.llvm.org/D39034) and moving
> forward would look even better (see the last LLVM Dev talk:

> _GlobalISel: Present, Past, and Future_ when it is available)

>> https://reviews.llvm.org/differential/diff/102547/ [1]

>>
>> What do you guys think about this approach?
>
> Looks reasonable to me.
>
>> Thanks,
>> Diana
>>
>> PS: The buildbot is using the Makefiles because that's what our
>> other AArch64 test-suite bots use. Moving all of them to CMake is a
>> transition for another time.
>
>
>

> Links:
> ------
> [1] https://reviews.llvm.org/differential/diff/102547/

Kristof Beyls via llvm-dev

unread,

Nov 13, 2017, 12:10:39 PM11/13/17

to Quentin Colombet, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

Hi Quentin,

My only remaining concern is around ABI compatibility.

The following commit seems to indicate that in the previous round of evaluation, we didn’t find an existing ABI compatibility issue:

http://llvm.org/viewvc/llvm-project?view=revision&revision=311388.

I haven’t looked into the details of this issue - so maybe I’m worried over nothing?

I’m wondering if since then on your side you did any testing around ABI compatibility?

E.g. building software where you semi-randomly build some functions through GlobalISel and some functions through DAGISel?

Thanks,

Kristof

Quentin Colombet via llvm-dev

unread,

Nov 13, 2017, 1:26:58 PM11/13/17

to Kristof Beyls, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

Hi Kristof,

On Nov 13, 2017, at 9:10 AM, Kristof Beyls <Kristo...@arm.com> wrote:

Hi Quentin,

My only remaining concern is around ABI compatibility.

The following commit seems to indicate that in the previous round of evaluation, we didn’t find an existing ABI compatibility issue:

http://llvm.org/viewvc/llvm-project?view=revision&revision=311388.

I haven’t looked into the details of this issue - so maybe I’m worried over nothing?

No, you’re right. The problem with ABI is if you are consistently wrong, then you won’t notice :).

I’m wondering if since then on your side you did any testing around ABI compatibility?

E.g. building software where you semi-randomly build some functions through GlobalISel and some functions through DAGISel?

Justin will look into that. Clang has utility script for that utils/ABITest.

Given we will only be able to check iOS ABI, you may want to follow the same kind of validation on your side.

I let you sync up with Justin for the method.

Cheers,

-Quentin

Oliver Stannard via llvm-dev

unread,

Nov 14, 2017, 5:27:56 AM11/14/17

to Quentin Colombet, llvm...@lists.llvm.org, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Hi Quentin,

I’ve started running an ABI test suite with global isel on AArch64, and while it hasn’t found any ABI issues it has hit an assertion in clang when using the __fp16 type. Here’s a reproducer:

__fp16 pass_f16(__fp16 p) {

return p;

}

$ /work/llvm/build/bin/clang --target=aarch64-arm-none-eabi -march=armv8-a -c test.c -O0 -mllvm -global-isel -mllvm -global-isel-abort=0

clang-6.0: /work/llvm/llvm/lib/CodeGen/GlobalISel/RegisterBankInfo.cpp:446: static void llvm::RegisterBankInfo::applyDefaultMapping(const llvm::RegisterBankInfo::OperandsMapper &): Assertion `OrigTy.getSizeInBits() == NewTy.getSizeInBits() && "Types with difference size cannot be handled by the default " "mapping"' failed.

#0 0x000000000362a764 PrintStackTraceSignalHandler(void*) (/work/llvm/build/bin/clang-6.0+0x362a764)

#1 0x000000000362aac6 SignalHandler(int) (/work/llvm/build/bin/clang-6.0+0x362aac6)

#2 0x00007f9193b78330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)

#3 0x00007f919276bc37 gsignal /build/eglibc-oGUzwX/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0

#4 0x00007f919276f028 abort /build/eglibc-oGUzwX/eglibc-2.19/stdlib/abort.c:91:0

#5 0x00007f9192764bf6 __assert_fail_base /build/eglibc-oGUzwX/eglibc-2.19/assert/assert.c:92:0

#6 0x00007f9192764ca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)

#7 0x0000000003d70eb9 (/work/llvm/build/bin/clang-6.0+0x3d70eb9)

#8 0x0000000003d6b00c llvm::RegBankSelect::applyMapping(llvm::MachineInstr&, llvm::RegisterBankInfo::InstructionMapping const&, llvm::SmallVectorImpl<llvm::RegBankSelect::RepairingPlacement>&) (/work/llvm/build/bin/clang-6.0+0x3d6b00c)

#9 0x0000000003d6b366 llvm::RegBankSelect::assignInstr(llvm::MachineInstr&) (/work/llvm/build/bin/clang-6.0+0x3d6b366)

#10 0x0000000003d6b7f1 llvm::RegBankSelect::runOnMachineFunction(llvm::MachineFunction&) (/work/llvm/build/bin/clang-6.0+0x3d6b7f1)

#11 0x0000000002d934c8 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/work/llvm/build/bin/clang-6.0+0x2d934c8)

#12 0x00000000030c998f llvm::FPPassManager::runOnFunction(llvm::Function&) (/work/llvm/build/bin/clang-6.0+0x30c998f)

#13 0x00000000030c9c53 llvm::FPPassManager::runOnModule(llvm::Module&) (/work/llvm/build/bin/clang-6.0+0x30c9c53)

#14 0x00000000030ca136 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/llvm/build/bin/clang-6.0+0x30ca136)

#15 0x00000000037c3dcf clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/work/llvm/build/bin/clang-6.0+0x37c3dcf)

#16 0x0000000003d421a0 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/work/llvm/build/bin/clang-6.0+0x3d421a0)

#17 0x0000000004457376 clang::ParseAST(clang::Sema&, bool, bool) (/work/llvm/build/bin/clang-6.0+0x4457376)

#18 0x0000000003ca6ea0 clang::FrontendAction::Execute() (/work/llvm/build/bin/clang-6.0+0x3ca6ea0)

#19 0x0000000003c1fa31 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/work/llvm/build/bin/clang-6.0+0x3c1fa31)

#20 0x0000000003d3bf4b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/work/llvm/build/bin/clang-6.0+0x3d3bf4b)

#21 0x0000000001f85629 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/work/llvm/build/bin/clang-6.0+0x1f85629)

#22 0x0000000001f83096 main (/work/llvm/build/bin/clang-6.0+0x1f83096)

#23 0x00007f9192756f45 __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:321:0

#24 0x0000000001f80029 _start (/work/llvm/build/bin/clang-6.0+0x1f80029)

Stack dump:

0. Program arguments: /work/llvm/build/bin/clang-6.0 -cc1 -triple aarch64-arm-none-eabi -emit-obj -mrelax-all -disable-free -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -target-feature +neon -target-abi aapcs -fallow-half-arguments-and-returns -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /work/innovation/cctest/test.gcno -resource-dir /work/llvm/build/lib/clang/6.0.0 -O0 -fdebug-compilation-dir /work/innovation/cctest -ferror-limit 19 -fmessage-length 226 -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -mllvm -global-isel -mllvm -global-isel-abort=0 -o test.o -x c test.c

1. <eof> parser at end of file

2. Code generation

3. Running pass 'Function Pass Manager' on module 'test.c'.

4. Running pass 'RegBankSelect' on function '@pass_f16'

clang-6.0: error: unable to execute command: Aborted (core dumped)

clang-6.0: error: clang frontend command failed due to signal (use -v to see invocation)

clang version 6.0.0 (ssh://olis...@ds-gerrit.euhpc.arm.com:29418/armcompiler/clang aa2b9952ef98a5fe2d47384ef17106855b8bae51) (ssh://olis...@ds-gerrit.euhpc.arm.com:29418/armcompiler/llvm 29f89772107a79b5f2a816d4748ed9c19416c1b6)

Target: aarch64-arm-none-eabi

Thread model: posix

InstalledDir: /work/llvm/build/bin

clang-6.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.

clang-6.0: note: diagnostic msg:

********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:

Preprocessed source(s) and associated run script(s) are located at:

clang-6.0: note: diagnostic msg: /tmp/test-e06964.c

clang-6.0: note: diagnostic msg: /tmp/test-e06964.sh

clang-6.0: note: diagnostic msg:

********************

Oliver

Kristof Beyls via llvm-dev

unread,

Nov 14, 2017, 10:17:50 AM11/14/17

to Quentin Colombet, llvm-dev, nd, Ahmed Bougacha, Justin Bogner, Aditya Nandakumar

On 13 Nov 2017, at 18:26, Quentin Colombet <qcol...@apple.com> wrote:

Hi Kristof,

On Nov 13, 2017, at 9:10 AM, Kristof Beyls <Kristo...@arm.com> wrote:

Hi Quentin,

My only remaining concern is around ABI compatibility.

The following commit seems to indicate that in the previous round of evaluation, we didn’t find an existing ABI compatibility issue:

http://llvm.org/viewvc/llvm-project?view=revision&revision=311388.

I haven’t looked into the details of this issue - so maybe I’m worried over nothing?

No, you’re right. The problem with ABI is if you are consistently wrong, then you won’t notice :).

I’m wondering if since then on your side you did any testing around ABI compatibility?

E.g. building software where you semi-randomly build some functions through GlobalISel and some functions through DAGISel?

Justin will look into that. Clang has utility script for that utils/ABITest.

Given we will only be able to check iOS ABI, you may want to follow the same kind of validation on your side.

I let you sync up with Justin for the method.

Thanks Quentin & Justin!

I had a brief look at utils/ABITest. It seems a bit light on documentation on how to best run the test.

I’m happy to try and run it on linux if you can share some info on the best way to run that test, Justin?

Thanks!

Kristof

Quentin Colombet via llvm-dev

unread,

Nov 14, 2017, 12:29:23 PM11/14/17

to Oliver Stannard, llvm...@lists.llvm.org, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Thanks Oliver.

I’ll have a look. This typically means that we miss a mapping for this type/instruction, which is not surprising given how little code we have we fp16.

Quentin Colombet via llvm-dev

unread,

Nov 14, 2017, 6:12:19 PM11/14/17

to Quentin Colombet, llvm...@lists.llvm.org, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

To give an update here, we actually are not missing a mapping. The code complains because we are copying around a fp16 into a gpr32 and that shouldn’t be done with a copy (default mapping).

I extended the repairing code to issue G_ANYEXT in those cases instead of asserting.

However, now, I have to teach instruction select about those ANYEXT otherwise we’ll fallback in that case. But that’s a different story.

I’ll try to commit today or tomorrow (I have to strengthen the tests).

Justin Bogner via llvm-dev

unread,

Nov 14, 2017, 8:34:03 PM11/14/17

to Kristof Beyls via llvm-dev, Ahmed Bougacha, nd, Aditya Nandakumar

A bit light is putting it lightly ;)

> I’m happy to try and run it on linux if you can share some info on the best
> way to run that test, Justin?

I played around with utils/ABITest a bit and I managed to get it to run
with some minor changes, though it is not at all obvious how this thing
works.

The main thing is to change from trying to compare gcc to clang to
compare clang and clang -mllvm -global-isel instead:

diff --git a/utils/ABITest/Makefile.test.common b/utils/ABITest/Makefile.test.common
index 3c208adf0c..419d7cfab1 100644
--- a/utils/ABITest/Makefile.test.common
+++ b/utils/ABITest/Makefile.test.common
@@ -11,13 +11,14 @@ TESTARGS := --no-unsigned --no-vector --no-complex --no-bool
COUNT := 1
TIMEOUT := 5

-CFLAGS := -std=gnu99
+CFLAGS := -std=gnu99 -target arm64-apple-ios

-X_COMPILER := gcc
-X_LL_CFLAGS := -emit-llvm -S
-Y_COMPILER := clang
-Y_LL_CFLAGS := -emit-llvm -S
-CC := gcc
+CC := ${HOME}/build/llvm/bin/clang
+X_COMPILER := ${CC}
+Y_COMPILER := ${CC}
+
+X_CFLAGS :=
+Y_CFLAGS := -mllvm -global-isel

###

I also made a change to the "temps/test.%.out" target to copy the binary
onto a device before trying to run it, since I was cross compiling, and
removed the 32-bit cases from build-and-summarize-all.sh since we're
only dealing with AArch64 here.

To see if it works, `cd` into one of the test directories (like
single-args-64) and run `make VERBOSE=1 test.0.report`. If it finds a
bug make will fail with something like "TEST 0: temps/test.0.yy.diff
failed".

Then, once things work, you can run ./build-and-summarize-all.sh 100
(or any number you like) from the ABITest directory and it will tell
you if it finds anything.

Oliver Stannard via llvm-dev

unread,

Nov 17, 2017, 8:32:19 AM11/17/17

to qcol...@apple.com, llvm...@lists.llvm.org, nd

Hi Quentin,

At Kristof’s suggestion, I tried running our ABI test suite for a big-endian AArch64 target, and this found an ABI mismatch between global-isel and regular -O0. Here’s a reproducer for the first one I’ve investigated:

struct foo {

float first;

float second;

};

float get_first(foo p) {

return p.first;

}

This is the code that global-isel currently generates:

// /work/llvm/build/bin/clang --target=aarch64--none-eabi -march=armv8-a -c callees.cpp -O0 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S

_Z9get_first3foo: // @_Z9get_first3foo

// BB#0: // %entry

sub sp, sp, #16 // =16

// implicit-def: %X8

fmov w9, s0

mov w10, w9

bfxil x8, x10, #0, #32

fmov w9, s1

mov w10, w9

bfi x8, x10, #32, #32

add x10, sp, #8 // =8

str x8, [sp, #8]

ldr w9, [x10]

fmov s0, w9

add sp, sp, #16 // =16

ret

When run on a big-endian target, this incorrectly returns the second member of the struct, instead of the first.

Oliver

Oliver Stannard via llvm-dev

unread,

Nov 17, 2017, 9:56:56 AM11/17/17

to qcol...@apple.com, llvm...@lists.llvm.org, nd

Hi Quentin,

It seems that we also get the calling convention wrong for vector types on big-endian:

#include <arm_neon.h>

int32x2_t load_vector(int32x2_t *p) {

return *p;

}

Global-isel generates this:

// armclang --target=aarch64-arm-none-eabi -march=armv8-a -c callees.cpp -O0 -Wall -std=c++11 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S

_Z11load_vectorP11__Int32x2_t: // @_Z11load_vectorP11__Int32x2_t

// BB#0: // %entry

sub sp, sp, #16 // =16

str x0, [sp, #8]

ldr x0, [sp, #8]

ld1 { v0.2s }, [x0]

add sp, sp, #16 // =16

ret

With global-isel off, there is a rev64 instruction between the ld1 and the add, which fixes up the endianness of the vector.

Oliver

From: qcol...@apple.com [mailto:qcol...@apple.com]

Oliver Stannard via llvm-dev

unread,

Nov 17, 2017, 12:14:42 PM11/17/17

to qcol...@apple.com, llvm...@lists.llvm.org, nd

Hi Quentin,

One more reproducer, this time with small (<64bit) values being passed on the stack:

int foo(int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7,

int stack1) {

return stack1;

}

int main() {

int ret = foo(0,1,2,3,4,5,6,7,8);

printf("%d\n", ret);

}

Global isel thinks that the incoming value of stack1 is stored in bytes [0,4) above SP, but for big-endian targets this should be in bytes [4,8):

// /work/llvm/build/bin/clang --target=aarch64-arm-none-eabi -march=armv8-a -c callees.cpp -O0 -Wall -std=c++11 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S

_Z3fooiiiiiiiii: // @_Z3fooiiiiiiiii

// BB#0: // %entry

sub sp, sp, #48 // =48

ldr w8, [sp, #48] // <= Should be [sp, #52]

str w0, [sp, #44]

str w1, [sp, #40]

str w2, [sp, #36]

str w3, [sp, #32]

str w4, [sp, #28]

str w5, [sp, #24]

str w6, [sp, #20]

str w7, [sp, #16]

str w8, [sp, #12]

ldr w0, [sp, #12]

add sp, sp, #48 // =48

ret

Oliver

From: qcol...@apple.com [mailto:qcol...@apple.com]

Daniel Sanders via llvm-dev

unread,

Nov 17, 2017, 4:38:37 PM11/17/17

to Oliver Stannard, llvm...@lists.llvm.org, nd

It seems we've hit the lists 100KB size limit. Re-sending my email with the quotes trimmed.

On 17 Nov 2017, at 10:34, Daniel Sanders <daniel_l...@apple.com> wrote:

Does the MIR for this one have a G_BITCAST from one vector type to another? It sounds like we haven't implemented the fact that bitcasts are sometimes shuffles on big-endian.

Justin Bogner via llvm-dev

unread,

Nov 17, 2017, 4:51:43 PM11/17/17

to Kristof Beyls via llvm-dev, Ahmed Bougacha, nd, Aditya Nandakumar

Justin Bogner <ma...@justinbogner.com> writes:
> Kristof Beyls via llvm-dev <llvm...@lists.llvm.org> writes:
>> On 13 Nov 2017, at 18:26, Quentin Colombet <qcol...@apple.com> wrote:
>>> Hi Kristof,
>>>
>>> On Nov 13, 2017, at 9:10 AM, Kristof Beyls <Kristo...@arm.com> wrote:
>>>> Hi Quentin,
>>>>
>>>> My only remaining concern is around ABI compatibility.
>>>> The following commit seems to indicate that in the previous round of
>>>> evaluation, we didn’t find an existing ABI compatibility issue:
>>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=311388.
>>>> I haven’t looked into the details of this issue - so maybe I’m worried
>>>> over nothing?
>>>
>>> No, you’re right. The problem with ABI is if you are consistently wrong,
>>> then you won’t notice :).
>>>
>>>> I’m wondering if since then on your side you did any testing around
>>>> ABI compatibility?
>>>> E.g. building software where you semi-randomly build some functions
>>>> through GlobalISel and some functions through DAGISel?
>>>
>>> Justin will look into that. Clang has utility script for that utils/
>>> ABITest.
>>>
>>> Given we will only be able to check iOS ABI, you may want to follow the
>>> same kind of validation on your side.

Looping back here, I generated and ran 1000000 test cases with the
ABITest utility and hit some fallbacks and an "invalid zero-sized type"
assert that need looking at, but it didn't uncover any ABI issues.

Quentin Colombet via llvm-dev

unread,

Nov 17, 2017, 11:40:28 PM11/17/17

to Quentin Colombet, llvm...@lists.llvm.org, Justin Bogner, Ahmed Bougacha, Aditya Nandakumar, nd

Fixed the fp16 mapping in r318586-318589.

For the record, I could go with the G_ANYEXT in the end, because that would mean that the instruction select pass would have to deal with a truncated store (store s32 with a memory operand of 2 bytes) and it is not quite ready for that.

Given how we do legalization now, I decide to relax the assert so that strictly matching the size of the type is not required anymore (the mapping should be at least as big of the type now). We may want to revisit this by either widening all the types to existing storage size or by tracking the size of the container and the types separately.

Quentin Colombet via llvm-dev

unread,

Nov 19, 2017, 1:34:21 PM11/19/17

to Oliver Stannard, llvm...@lists.llvm.org, nd

Hi Oliver,

Thanks for trying this.

Could you file a different PR for each of the problem you found and reference the umbrella PR: http://llvm.org/PR35347?

Thanks,

-Quentin

Kristof Beyls via llvm-dev

unread,

Nov 20, 2017, 5:56:46 AM11/20/17

to Justin Bogner, llvm-dev, Ahmed Bougacha, nd, Aditya Nandakumar

Hi Justin,

I followed the instructions you provided earlier in the thread to get these tests running on linux.

I also had to change the “#!/bin/sh” lines in the script to “#!/bin/bash”, as apparently the shell scripts in there rely on some bash-isms.

I ran 10000 test cases and didn’t find ABI issues in those; but also found the same “invalid zero-size type” assert.

I’ve file a PR at https://bugs.llvm.org/show_bug.cgi?id=35358 for that assert and linked it to the umbrella issue https://bugs.llvm.org/show_bug.cgi?id=35347.

Thanks!

Kristof

Oliver Stannard via llvm-dev

unread,

Nov 21, 2017, 3:13:45 AM11/21/17

to qcol...@apple.com, llvm...@lists.llvm.org, nd

Hi Quentin,

I’ve raised:

https://bugs.llvm.org/show_bug.cgi?id=35359

https://bugs.llvm.org/show_bug.cgi?id=35360

https://bugs.llvm.org/show_bug.cgi?id=35361

I also left the test suite running over the weekend in little-endian mode with the __fp16 type disabled, and it didn’t find any bugs there.

Quentin Colombet via llvm-dev

unread,

Nov 27, 2017, 1:43:58 PM11/27/17

to Amara Emerson, llvm...@lists.llvm.org, nd

Thanks all.

Amara, could you take a look?

Amara Emerson via llvm-dev

unread,

Dec 11, 2017, 12:12:15 PM12/11/17

to Quentin Colombet, llvm...@lists.llvm.org, nd

As of r320388 we’ve either fixed the blocker bugs or disabled big-endian on GISel, falling back to SDAG. Fixing at least one of the big-endian issues will need use to change the way we handle aggregates, which will take a bit longer (it’s next on my list of things to do).

Do we have any other issues preventing us flipping the switch?

Amara

Kristof Beyls via llvm-dev

unread,

Dec 15, 2017, 4:56:07 AM12/15/17

to Amara Emerson, llvm-dev, nd

I don’t know of any further issues preventing us flipping the switch.
At this point, I’d aim to flip the switch shortly after the creation of the 6.0.0 release branch, so that GlobalISel can harden a bit more enabled-by-default on trunk before it goes into an LLVM release (presumably 7.0.0 then).

Thanks,

Kristof

> On 11 Dec 2017, at 17:08, Amara Emerson <aeme...@apple.com> wrote:
>
> As of r320388 we’ve either fixed the blocker bugs or disabled big-endian on GISel, falling back to SDAG. Fixing at least one of the big-endian issues will need use to change the way we handle aggregates, which will take a bit longer (it’s next on my list of things to do).
>
> Do we have any other issues preventing us flipping the switch?
>
> Amara

Quentin Colombet via llvm-dev

unread,

Dec 15, 2017, 12:30:13 PM12/15/17

to Kristof Beyls, llvm-dev, nd

Sounds good to me.

Amara Emerson via llvm-dev

unread,

Dec 15, 2017, 1:33:50 PM12/15/17

to Kristof Beyls, llvm-dev, nd

Hi Kristof,

We’ve had some more internal discussion and we think that we can, and should, still aim to enable this by default before the release. We have a month of testing time available to shake out critical issues which is normally enough. After the release branch is taken we can also merge in any fixes required during the stabilization period.

Thanks,
Amara

Kristof Beyls via llvm-dev

unread,

Dec 18, 2017, 7:37:50 AM12/18/17

to Amara Emerson, llvm-dev, nd

Hi Amara,

My reasons for preferring the switch to happen after the release branch is based on the following observations:

As far as I can see, the projects and products following top-of-trunk tend to test much more extensively than the testing that happens for llvm.org releases. I expect that the llvm.org release testing won’t find potential remaining issues, whereas the projects and products following trunk are far more likely to find remaining issues with GlobalISel. Therefore, I think it’s best to give as much time as possible for these projects/products following top-of-trunk to find issues before there’s an llvm.org release with GlobalISel enabled-by-default.
For the projects and products that follow top-of-trunk that I know off, if globalisel-by-default would really break things, it’s possible to disable it within that project/product without end users of it needing to know about this. If for the llvm.org release it would turn out that globalisel has a big issue after release, we’d need to somehow let all users of the release know to disable it by adding a convoluted command line option (‘-mllvm -global-isel=0’?)

Combining the 2 observations above, I think it’s better to do the switch shortly after a release branch is taken, rather than just before it.

I hope that makes sense?

Thanks,

Kristof

Amara Emerson via llvm-dev

unread,

Dec 18, 2017, 9:11:17 AM12/18/17

to Kristof Beyls, llvm-dev, nd

On Dec 18, 2017, at 12:37 PM, Kristof Beyls <Kristo...@arm.com> wrote:

Hi Amara,

My reasons for preferring the switch to happen after the release branch is based on the following observations:

As far as I can see, the projects and products following top-of-trunk tend to test much more extensively than the testing that happens for llvm.org releases. I expect that the llvm.org release testing won’t find potential remaining issues, whereas the projects and products following trunk are far more likely to find remaining issues with GlobalISel. Therefore, I think it’s best to give as much time as possible for these projects/products following top-of-trunk to find issues before there’s an llvm.org release with GlobalISel enabled-by-default.
For the projects and products that follow top-of-trunk that I know off, if globalisel-by-default would really break things, it’s possible to disable it within that project/product without end users of it needing to know about this. If for the llvm.org release it would turn out that globalisel has a big issue after release, we’d need to somehow let all users of the release know to disable it by adding a convoluted command line option (‘-mllvm -global-isel=0’?)

Combining the 2 observations above, I think it’s better to do the switch shortly after a release branch is taken, rather than just before it.

I wasn’t suggesting that we proceed with the full release with GISel enabled by default if there are serious issues, I was more saying that we have additional time during the RC period to test further on trunk, and then merge in fixes into the RC. Worst case is that we back out the change completely but I doubt it’ll get to that stage. I do agree that trunk sees more of the testing coverage. If one month isn’t enough, what would have been the cut off point?

Thanks,

Amara

Kristof Beyls via llvm-dev

unread,

Dec 18, 2017, 10:25:41 AM12/18/17

to Amara Emerson, llvm-dev, nd

On 18 Dec 2017, at 15:11, Amara Emerson <aeme...@apple.com> wrote:

On Dec 18, 2017, at 12:37 PM, Kristof Beyls <Kristo...@arm.com> wrote:

Hi Amara,

My reasons for preferring the switch to happen after the release branch is based on the following observations:

As far as I can see, the projects and products following top-of-trunk tend to test much more extensively than the testing that happens for llvm.org releases. I expect that the llvm.org release testing won’t find potential remaining issues, whereas the projects and products following trunk are far more likely to find remaining issues with GlobalISel. Therefore, I think it’s best to give as much time as possible for these projects/products following top-of-trunk to find issues before there’s an llvm.org release with GlobalISel enabled-by-default.
For the projects and products that follow top-of-trunk that I know off, if globalisel-by-default would really break things, it’s possible to disable it within that project/product without end users of it needing to know about this. If for the llvm.org release it would turn out that globalisel has a big issue after release, we’d need to somehow let all users of the release know to disable it by adding a convoluted command line option (‘-mllvm -global-isel=0’?)

Combining the 2 observations above, I think it’s better to do the switch shortly after a release branch is taken, rather than just before it.

I wasn’t suggesting that we proceed with the full release with GISel enabled by default if there are serious issues, I was more saying that we have additional time during the RC period to test further on trunk, and then merge in fixes into the RC. Worst case is that we back out the change completely but I doubt it’ll get to that stage. I do agree that trunk sees more of the testing coverage. If one month isn’t enough, what would have been the cut off point?

Thanks,

Amara

Yeah, I agree that the time needed before the next release happens when switching is debatable.

After looking it up, I see that the final release tag is planned for no earlier than 21 February 2018, so we’d indeed have a bit of time to revert it on the release branch if serious issues were discovered.

Seeing the actual release date is that far out, I don’t have strong objections to flipping the switch now if others want to press on with it.

Thanks,

Kristof

Amara Emerson via llvm-dev

unread,

Dec 18, 2017, 12:44:19 PM12/18/17

to Kristof Beyls, llvm-dev, nd

Ok. We’ll look at what we can do to further stress test it in the next two months, additional suggestions from the community is welcome. Patch should be incoming to enable it later today.

Thanks,

Amara

Amara Emerson via llvm-dev

unread,

Dec 18, 2017, 2:14:45 PM12/18/17

to llvm-dev, nd

FYI all: the patch to enable it is available to review here: https://reviews.llvm.org/D41362

Thanks,

Amara