[llvm-dev] New pass manager for optimization pipeline status and questions

Arthur Eubanks via llvm-dev

Jul 22, 2020, 5:39:31 PM

to llvm-dev

Hi all,

I wanted to give a quick update on the status of NPM for the IR optimization pipeline and ask some questions.

In the past I believe there were thoughts that NPM was basically ready because all of check-llvm and check-clang passed when -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=ON was specified. But that CMake flag did not apply to opt and any tests running something like `opt -foo-pass -bar-pass` (which is the vast majority of check-llvm tests) were still using the legacy PM. The intended way to use NPM was to use the -passes flag, e.g. `opt -passes='foo,bar'`.

I've added a -enable-new-pm flag to opt to force running NPM passes even when `opt -foo-pass` is used. This is because I didn't want to go through every single test and figure out which ones should be using both -foo-pass and -passes=foo. Switching on -enable-new-pm currently leads to ~1800 check-llvm failures. I've documented the failed tests count per directory in https://bugs.llvm.org/show_bug.cgi?id=46651 (some have been fixed since that was posted).

This has led to real bugs in NPM being discovered and fixed (e.g. some optnone issues).

But a large portion of the remaining failures are because codegen-only passes haven't been ported to NPM yet. That's fine for the optimization pipeline NPM transition since it doesn't affect the optimization pipeline, but it does present an issue with the approach of the -enable-new-pm flag (which would by default become true alongside the NPM transition). Lots of tests are testing codegen-specific passes via opt (e.g. `opt -amdgpu-lower-intrinsics`) and they can't use NPM (yet).

I was thinking either we have a way of identifying codegen-only passes and revert back to the legacy PM in opt whenever we see one, or we go back to considering the originally intended approach of adding an equivalent `-passes=` RUN to all tests that should be also running under NPM.

I'm not sure of a nice and clean solution to identify codegen-only passes. We could go and update every instance of INITIALIZE_PASS to take another parameter indicating if it's codegen-only. Or we could just have a central list somewhere where we check if the pass is in some hardcoded list or has some prefix (e.g. "x86-").
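A minimal sketch of the "central list or prefix" idea, written in Python for brevity rather than as code for opt itself. The list contents, the "amdgpu-" prefix, and the function names are assumptions for illustration; only `amdgpu-lower-intrinsics` and the "x86-" prefix come from the message above.

```python
# Hypothetical examples only; a real list would have to come from the targets.
CODEGEN_ONLY_PASSES = {"amdgpu-lower-intrinsics"}
# "x86-" is the prefix mentioned above; "amdgpu-" is an added assumption.
CODEGEN_ONLY_PREFIXES = ("x86-", "amdgpu-")


def is_codegen_only(pass_name):
    """Return True if the pass is on the hardcoded list or has a target prefix."""
    return (pass_name in CODEGEN_ONLY_PASSES
            or pass_name.startswith(CODEGEN_ONLY_PREFIXES))


def choose_pass_manager(pass_names):
    """Fall back to the legacy PM as soon as one codegen-only pass is requested."""
    if any(is_codegen_only(name) for name in pass_names):
        return "legacy"
    return "new"
```

With this shape, a pipeline such as `instcombine,gvn` would stay on NPM, while anything that mentions a codegen-only pass would revert to the legacy PM as a whole.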

The approach of adding equivalent `-passes=` RUN lines to all relevant tests seems daunting, but not exactly sure how daunting. Maybe it's possible to script something and see what fails? We'd still need some way to identify codegen-only passes to make sure we don't miss anything, and we'd need to distinguish between analyses and normal passes. Also, it would slow down test execution since we'd run a lot more tests twice, but maybe that's not such a big deal? Maybe it's good to have most tests running against the legacy PM even when NPM is on by default?
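As a rough sketch of what such a script might look like, the Python below adds a `-passes=` RUN line after each single-line `; RUN: opt -foo ...` line. It assumes that a legacy flag `-foo` maps to an NPM pass with the same name `foo`, which is not true in general, and the `PORTED_PASSES` set is a hypothetical stand-in for a real list of ported passes. Multi-line RUN lines and analyses are not handled.

```python
import re

# Hypothetical set of passes assumed to exist under the same name in both PMs.
PORTED_PASSES = {"instcombine", "gvn", "sroa"}

# Matches single-line RUN lines that invoke opt directly.
RUN_OPT = re.compile(r"^;\s*RUN:\s+opt\s")


def npm_run_line(line):
    """Return an equivalent `-passes=` RUN line, or None if not applicable."""
    if not RUN_OPT.match(line) or "-passes=" in line:
        return None
    kept, passes = [], []
    for token in line.split():
        name = token.lstrip("-")
        if token.startswith("-") and name in PORTED_PASSES:
            passes.append(name)
        else:
            kept.append(token)
    if not passes:
        return None
    # Place the combined -passes= argument right after `opt`.
    kept.insert(kept.index("opt") + 1, "-passes=" + ",".join(passes))
    return " ".join(kept)


def add_npm_run_lines(text):
    """Insert an NPM RUN line after every legacy opt RUN line that qualifies."""
    out = []
    for line in text.splitlines():
        out.append(line)
        extra = npm_run_line(line)
        if extra is not None:
            out.append(extra)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Running the rewritten tests and looking at what fails would then give a first estimate of how daunting the approach is.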

Thoughts?

This is split off from http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html.

Philip Reames via llvm-dev

Jul 22, 2020, 6:15:00 PM

to Arthur Eubanks, llvm-dev

(I'm probably going to derail your thread, sorry about that.)

I think at this point, we should just bite the bullet and make the switch to NPM by default for Clang's optimization pipeline. Today.

Why? Because many of our downstream consumers have already switched. Google has. We (Azul) have. I think I've heard the same for a couple other major contributors. Why does this matter? Testing. At the current moment, the vast majority of testing the project gets is exercising NPM, not LPM.

NPM is functionally complete for Clang optimization. There might be a few missing cases around the sanitizers, but last I heard those were on the edge of being fixed.

I think we should make the switch, and deal with any fallout as regressions. If we made the change immediately after a release branch, we'd have several months to address any major issues before the next release.

Philip

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chen, Yuanfang via llvm-dev

Jul 22, 2020, 7:05:19 PM

to Arthur Eubanks, LLVM Developers' List

> But a large portion of the remaining failures are because codegen-only passes haven't been ported to NPM yet. That's fine for the optimization pipeline NPM transition since it doesn't affect the optimization pipeline, but it does present an issue with the approach of the -enable-new-pm flag (which would by default become true alongside the NPM transition). Lots of tests are testing codegen-specific passes via opt (e.g. `opt -amdgpu-lower-intrinsics`) and they can't use NPM (yet).

I think the ideal way is just to port these to NPM. The problem is if the opt pipeline NPM switch is blocked on this, we’re forcing the targets to start porting which I’m not sure if target owners want to do.

On the other hand, we’re (almost) ready to make these target IR passes use NPM for testing purposes with the `opt` tool, except that there needs to be a way to expose these passes to `opt` through llvm/lib/Target and llvm/lib/CodeGen rather than llvm/lib/Passes. As part of the codegen-using-NPM work, this is almost done.

> I was thinking either we have a way of identifying codegen-only passes and revert back to the legacy PM in opt whenever we see one, or we go back to considering the originally intended approach of adding an equivalent `-passes=` RUN to all tests that should be also running under NPM.

I would prefer the former since it sounds less pervasive.

> I'm not sure of a nice and clean solution to identify codegen-only passes. We could go and update every instance of INITIALIZE_PASS to take another parameter indicating if it's codegen-only. Or we could just have a central list somewhere where we check if the pass is in some hardcoded list or has some prefix (e.g. "x86-").

The latter seems in line with the progress that has been made on the codegen-using-NPM work, where each target would maintain its own pass registry like PassRegistry.def. For x86, it is (tentatively) called X86PassRegistry.def (https://reviews.llvm.org/D83613). If it is useful for the opt pipeline NPM switch, we could find a way to factor it out.

Eric Christopher via llvm-dev

Jul 22, 2020, 9:06:20 PM

to Philip Reames, Alina Sbirlea, Chandler Carruth, llvm-dev

FWIW I'm in favor of this direction while making sure that we keep focus on removing the vestiges of the old pass manager for the code health impact to the project.

-eric

Jay Foad via llvm-dev

Jul 23, 2020, 5:41:48 AM

to Chen, Yuanfang, LLVM Developers' List

On Thu, 23 Jul 2020 at 00:05, Chen, Yuanfang via llvm-dev
<llvm...@lists.llvm.org> wrote:
> > But a large portion of the remaining failures are because codegen-only passes haven't been ported to NPM yet. That's fine for the optimization pipeline NPM transition since it doesn't affect the optimization pipeline, but it does present an issue with the approach of the -enable-new-pm flag (which would by default become true alongside the NPM transition). Lots of tests are testing codegen-specific passes via opt (e.g. `opt -amdgpu-lower-intrinsics`) and they can't use NPM (yet).
>
> I think the ideal way is just to port these to NPM. The problem is if the opt pipeline NPM switch is blocked on this, we’re forcing the targets to start porting which I’m not sure if target owners want to do.

I can't speak for other targets but I'd love to get AMDGPU passes
converted to the NPM. Is there a howto somewhere?

Thanks,
Jay.

Sjoerd Meijer via llvm-dev

Jul 23, 2020, 5:59:49 AM

to Philip Reames, Alina Sbirlea, Chandler Carruth, Eric Christopher, llvm-dev

I am not in favour of just flipping the switch and then dealing with all the fall-out, because we see major regressions that would be unacceptable for our users. Not only would this be very disruptive; our releases are also based on certain trunk versions, so we would need to revert to the legacy PM downstream and thus diverge from upstream, which wouldn't be ideal for us. About the regressions, see the message/thread that I kicked off earlier (http://lists.llvm.org/pipermail/llvm-dev/2020-July/143646.html) which was quickly followed up by this thread.

I would like to see here if we are interested in defining a few criteria that must be met before we switch:

1) Correctness, which obviously always must come first: looks like this is covered by bots that are running with the NPM, and by downstream users. From the latest messages, I gather we are there, or nearly there.
2) Performance (i.e. optimising for speed),
3) Code-size.

With the 1) correctness box covered and ticked, is now the time to look at codegen quality: 2) performance and 3) code-size? Would it be reasonable that we create a plan or timeline to address this, and thus allow time for these issues to be addressed?

We are now ready to start tuning the NPM for code-size. Perhaps we are late to the NPM party (but that was a priority and bandwidth issue), but perhaps with correctness fixed this is actually the right time. I only ran numbers for code-size, and haven't even looked at performance numbers yet, which we would also need to do and takes time.

Cheers,

Sjoerd.

Hans Wennborg via llvm-dev

Jul 23, 2020, 12:27:53 PM

to Philip Reames, llvm-dev

While I'm also excited about moving to the new pass manager, I do
think you may be overestimating how many users have switched when you
write that the vast majority of testing is now using the NPM.

My guess would instead be that the vast majority of projects using
LLVM are using the default pass manager. I know Chrome uses the default. I
know that's how the upstream releases are built and tested. Defaults
tend to be popular. All our docs about writing passes and such seem to
assume the default pm.

I guess what I'm saying is I think the transition has to be more
careful than biting the bullet and throwing the switch today.

Arthur Eubanks via llvm-dev

Jul 23, 2020, 1:58:37 PM

to Chen, Yuanfang, LLVM Developers' List

On Wed, Jul 22, 2020 at 4:05 PM Chen, Yuanfang <Yuanfa...@sony.com> wrote:

> I think the ideal way is just to port these to NPM. The problem is if the opt pipeline NPM switch is blocked on this, we’re forcing the targets to start porting which I’m not sure if target owners want to do.
>
> On the other hand, we’re (almost) ready to make these target IR passes use NPM for testing purposes with the `opt` tool, except that there needs to be a way to expose these passes to `opt` through llvm/lib/Target and llvm/lib/CodeGen rather than llvm/lib/Passes. As part of the codegen-using-NPM work, this is almost done.

Grepping for INITIALIZE_PASS in llvm/lib/Target, there are 220 files with that. That means there are ~200 target specific passes. There's no way we'll be able to port all of those in a reasonable amount of time.

Plus it's not a blocker if we can detect that passes are target specific and simply revert to the legacy PM in opt. Then as codegen passes are ported to NPM we can start running those on NPM instead of reverting back to the legacy PM. The exact mechanism for this is TBD, although probably something similar to below.

> > I'm not sure of a nice and clean solution to identify codegen-only passes. We could go and update every instance of INITIALIZE_PASS to take another parameter indicating if it's codegen-only. Or we could just have a central list somewhere where we check if the pass is in some hardcoded list or has some prefix (e.g. "x86-").
>
> The latter seems in line with the progress that has been made on the codegen-using-NPM work, where each target would maintain its own pass registry like PassRegistry.def. For x86, it is (tentatively) called X86PassRegistry.def (https://reviews.llvm.org/D83613). If it is useful for the opt pipeline NPM switch, we could find a way to factor it out.

If we can add an equivalent of something like PassBuilder::isAAPassName() to https://reviews.llvm.org/D83613 (X86CodeGenPassBuilder::isX86CodeGenPassName()?) that would definitely work.
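To illustrate the shape of such a name check, here is a small Python sketch that derives the set of pass names from registry text written in the `MACRO("name", ...)` style of a .def file. The registry entries, macro names, and function names below are hypothetical; they are not taken from D83613.

```python
import re

# Hypothetical registry contents, imitating the MACRO("name", ...) style of a .def file.
REGISTRY_DEF = '''
FUNCTION_PASS("x86-example-ir-pass", ExampleIRPass())
MACHINE_FUNCTION_PASS("x86-example-machine-pass", ExampleMachinePass())
'''

# Captures the quoted pass name that follows an upper-case macro name.
ENTRY = re.compile(r'^\s*[A-Z_]+\(\s*"([^"]+)"')


def registered_names(def_text):
    """Collect every pass name that appears as the first macro argument."""
    names = set()
    for line in def_text.splitlines():
        match = ENTRY.match(line)
        if match:
            names.add(match.group(1))
    return names


X86_CODEGEN_PASSES = registered_names(REGISTRY_DEF)


def is_x86_codegen_pass_name(name):
    """Return True if the name is listed in the (hypothetical) target registry."""
    return name in X86_CODEGEN_PASSES
```

The point of deriving the check from the registry is that the list of codegen pass names would not have to be maintained separately from the place where the passes are registered.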

> I can't speak for other targets but I'd love to get AMDGPU passes converted to the NPM. Is there a howto somewhere?

I don't think so, but it's usually fairly straightforward; looking at other passes that exist in both PMs is usually a good start. I should rewrite https://llvm.org/docs/WritingAnLLVMPass.html to reference NPM.

But specifically for codegen passes we're currently blocked on ychen's work for codegen passes infra in NPM: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html.

Chen, Yuanfang via llvm-dev

Jul 23, 2020, 2:23:28 PM

to Arthur Eubanks, LLVM Developers' List

> Grepping for INITIALIZE_PASS in llvm/lib/Target, there are 220 files with that. That means there are ~200 target specific passes. There's no way we'll be able to port all of those in a reasonable amount of time.
>
> Plus it's not a blocker if we can detect that passes are target specific and simply revert to the legacy PM in opt. Then as codegen passes are ported to NPM we can start running those on NPM instead of reverting back to the legacy PM. The exact mechanism for this is TBD, although probably something similar to below.

220 should include machine passes. If you grep for “public FunctionPass”, “public ModulePass”, etc., we should be looking at ~60, and not every one of them is registered (tested by `opt`). A lot of them are from AMDGPU. Other than AMDGPU, each target has 2 or 3 on average. If AMDGPU is willing to be actively involved in this and the other targets could migrate theirs, this could help not only the opt pipeline switch but also the codegen pipeline switch. Again, I meant to say these are all “ideal” situations.
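The two greps being compared can be approximated with a short Python sketch. It assumes passes live in .cpp files and that IR-level legacy passes derive publicly from one of the base classes named in the regex; the function name is made up for illustration, and the counts it produces on a real checkout are not claimed here.

```python
import os
import re

# Legacy IR-level base classes; `public MachineFunctionPass` does not match.
IR_PASS_BASE = re.compile(
    r"public\s+(FunctionPass|ModulePass|LoopPass|CallGraphSCCPass)\b")


def count_pass_files(root):
    """Return (files using INITIALIZE_PASS, files deriving an IR-level pass)."""
    with_initialize, with_ir_pass = 0, 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".cpp"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                source = f.read()
            # Substring match, so INITIALIZE_PASS_BEGIN counts as well.
            if "INITIALIZE_PASS" in source:
                with_initialize += 1
            if IR_PASS_BASE.search(source):
                with_ir_pass += 1
    return with_initialize, with_ir_pass
```

Calling `count_pass_files("llvm/lib/Target")` from the root of a checkout would give both numbers side by side, which makes the gap between "all passes" and "IR passes that opt could run" easier to see.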

> If we can add an equivalent of something like PassBuilder::isAAPassName() to https://reviews.llvm.org/D83613 (X86CodeGenPassBuilder::isX86CodeGenPassName()?) that would definitely work.

That should be straightforward. I’ll see if I could split it out from the patch.

Alina Sbirlea via llvm-dev

Jul 24, 2020, 2:51:46 PM

to Sjoerd Meijer, llvm-dev

Hi all,

The current plan is to prioritize enabling the NPM as soon as possible, and that includes addressing any blockers that are known or arise. This means prioritizing those blockers over other LLVM work. The current umbrella bug is PR46649.

Philip's point is spot on that we are deficient now in the testing of the LegacyPassManager, because so many have already made the switch (FWIW Google switched more than 2 years ago).

It's not constructive for the LLVM community to just flip the switch and break current LPM users. The purpose of these communications to llvm-dev and the bug tracking is to be informative as to the planned direction and to make progress as quickly as possible.

Please keep in mind that the work on the NPM has been going on for many years and many customers have switched years ago, and delaying this for even an additional year is not acceptable for the code health and stability of LLVM.

My point is that we want and should work with users to make the transition smooth, but we do very much need user (meaning companies using LLVM) involvement here in order to not delay the switch further.

Best,

Alina

Sjoerd Meijer via llvm-dev

Jul 24, 2020, 3:55:04 PM

to Alina Sbirlea, llvm-dev

Hi Alina,

I think this is an excellent direction, this is the direction we should take here. Just a somewhat irrelevant disagreement on this though:

> Philip's point is spot on that we are deficient now in the testing of the LegacyPassManager,

I disagree because the LPM is still the default and I appreciated Hans' reply: "Defaults tend to be popular". But this is the direction I like:

> This means prioritizing those blockers over other LLVM work. The current umbrella bug is PR46649.

Just checking: do you accept both performance and code-size regressions as blockers here?

> My point is that we want and should work with users to make the transition smooth, but we do very much need user (meaning companies using LLVM) involvement here in order to not delay the switch further.

That's clear, and agreed.

I would like to remark here that currently, when a commit regresses one benchmark that is important for someone, that is enough justification most of the time for a revert of that commit. That's why I was surprised that it looked like we were not setting code-quality goals and requirements before switching. And what I would like to ask here is to provide reasonable enough time for people to look into switching to the NPM, to evaluate this, and then file bugs under PR46649. Just collecting data, evaluating problems, and filing bugs can already be time-consuming, and then I guess they need fixing too. This also needs to fit in people's plans right now.

But it sounds reasonable to me that this is time-boxed. Given that switching is quite some work I think, switching before the clang-12 release would be unreasonable, and if clang-13 is in half a year from now, that already sounds perhaps somewhat reasonable, but might be tight.

Thanks,

Sjoerd.

Arthur Eubanks via llvm-dev

Jul 27, 2020, 5:03:45 PM

to Sjoerd Meijer, llvm-dev, Alina Sbirlea

> Just checking: do you accept both performance and code-size regressions as blockers here?

That seems reasonable (with the knowledge that there are always tradeoffs between fixing things and shipping things).

I think there has been a lot of performance work targeted toward NPM that the legacy PM doesn't have, so performance issues might be less likely than code size issues. But either way, please file bugs (with nice repro instructions) blocking the umbrella bug.

Alina Sbirlea via llvm-dev

Jul 28, 2020, 1:23:51 PM

to Sjoerd Meijer, llvm-dev

On Fri, Jul 24, 2020 at 12:54 PM Sjoerd Meijer <Sjoerd...@arm.com> wrote:

> Just checking: do you accept both performance and code-size regressions as blockers here?

Yes, I think big performance and code-size regressions need to be investigated. It will be hard to quantify what "big" is though, but we should definitely track such regressions.

Due to the time-boxed approach we may need to discuss moving forward with the switch with some regressions deemed acceptable, but those considered blockers should be prioritized of course.

> But it sounds reasonable to me that this is time-boxed. Given that switching is quite some work I think, switching before the clang-12 release would be unreasonable, and if clang-13 is in half a year from now, that already sounds perhaps somewhat reasonable, but might be tight.

I think it's reasonable to add a firm switch before clang-13 and before the end of this year, with intermediary milestones (e.g. filing blockers for user regressions in the next 1-2 months).

I'm inclined to favor a tighter deadline, the motivation here being to ensure that working on potential blockers is prioritized with plenty of time to spare, so the switch remains time-boxed.

Best,

Alina

Sjoerd Meijer via llvm-dev

Jul 29, 2020, 6:37:53 AM

to Alina Sbirlea, llvm-dev

Hi,

> Yes, I think big performance and code-size regressions need to be investigated. It will be hard to quantify what "big" is though, but we should definitely track such regressions.

> Due to the time-boxed approach we may need to discuss moving forward with the switch with some regressions deemed acceptable, but those considered blockers should be prioritized of course.

Agreed, SGTM.

> I think it's reasonable to add a firm switch before clang-13 and before the end of this year, with intermediary milestones (e.g. filing blockers for user regressions in the next 1-2 months).

> I'm inclined to favor a tighter deadline, the motivation here being to ensure that working on potential blockers is prioritized with plenty of time to spare, so the switch remains time-boxed.

I am not entirely sure yet about the tighter deadline, but I am optimistic things might start falling into place soon. Let me explain.

I think you've noticed the code-size issue that I raised. Switching before that is fixed would be a real non-starter for us, but if there's consensus that the community wants to switch then we'll deal with that of course. Because some of these regressions are so big, I am hoping we are missing something obvious and that with a few things fixed we solve most of these cases, which is why I am optimistic, and also because of the next point:

After code-size, I'm now looking at performance, and things look a lot better there for us. I do see some up and down behaviour: some wins in some benchmarks overall, but also some regressions. I will need to do some more homework here, as I need to check if small downstream divergences play a role here, and need to eliminate that. In some benchmarks where we win overall, there are a few cases that we really don't want to regress, so I need to look into this. I think these are problems that we need to fix, but I will start raising issues for them just for visibility. Overall, there are few to no non-starters in this area I think.

But because these are time-consuming exercises, and it's the summer months, I am getting slightly nervous about the tight(er) deadline. :-)

Cheers,

Sjoerd.

Alina Sbirlea via llvm-dev

Jul 30, 2020, 1:13:31 AM

to Sjoerd Meijer, llvm-dev

On Wed, Jul 29, 2020 at 3:37 AM Sjoerd Meijer <Sjoerd...@arm.com> wrote:

But because these are time-consuming exercises, and it's the summer months, I am getting slightly nervous about the tight(er) deadline. :-)

I think it's great to make it clear which items are critical for you, and having those as blockers makes perfect sense. The code-size issue you added to the umbrella bug makes sense to me, thank you for filing that. I think it's also fair to ask for reasonable delays or help addressing such issues.

I completely agree that discovering and addressing these is a time-consuming endeavour, so I think there's some room for flexibility here. Again, the main reason I'm advocating for a preliminary tighter deadline is so we can ensure prioritization of discovering and addressing such blockers early. As long as the extended deadline (end of this year and clang-13) remains firm I think we'd be in a good position.

A tighter deadline is useful as a checkpoint with the community on: "who is regressed if we switched today", "are the regressions at this point acceptable to make the switch", "what is the status of the investigations into these regressions", "what resources are needed to make progress on the remaining blockers". I propose setting the tighter deadline half-way through (middle of October), evaluating the status of the switch then, and deciding on extensions as needed. Does this sound like a reasonable plan?

The end goal is for the flag flip to be an overall benefit, not to cause disruptions :-).

Best,

Alina

Alina Sbirlea via llvm-dev

Jan 8, 2021, 6:55:55 PM

to llvm-dev, Matt Arsenault

Hello,

Reviving this thread as the switch to the new pass manager is getting very close now.

Arthur has been resolving most of the issues, and the remaining ones are tracked under the umbrella bug here: https://bugs.llvm.org/show_bug.cgi?id=46649.

Regarding the opt failures, the two remaining ones (AMD related) were posted in this discussion: https://lists.llvm.org/pipermail/llvm-dev/2020-December/147130.html.

Thank you to everyone who has done offline testing, opened bugs to track issues, and sent and reviewed patches to move the switch forward.

With the switch getting so close, it would help ensure a smooth transition if any new issues you encounter were filed under the umbrella bug now.

Thank you,

Alina

Sjoerd Meijer via llvm-dev

Jan 10, 2021, 6:36:34 AM

to Alina Sbirlea, llvm-dev, Matt Arsenault

Hello,

Our main showstopper PR46858 (code-size) has been resolved. Many thanks to Arthur for fixing this.

We have seen performance problems too, but first needed to port things to the NPM to get a better picture of this, which is what we've done. Now I need to reassess where we are with these performance problems, and I hope to do that before the end of this week.

Cheers,
Sjoerd.

From: Alina Sbirlea <alina....@gmail.com>
Sent: 08 January 2021 23:55
To: llvm-dev <llvm...@lists.llvm.org>
Cc: Arthur Eubanks <aeub...@google.com>; Sharma, Reshabh Kumar <Reshabhku...@amd.com>; Matt Arsenault <ars...@gmail.com>; nhae...@gmail.com <nhae...@gmail.com>; Sjoerd Meijer <Sjoerd...@arm.com>; Philip Reames <list...@philipreames.com>; Chen, Yuanfang <Yuanfa...@sony.com>

Sjoerd Meijer via llvm-dev

Jan 11, 2021, 6:45:24 PM

to Alina Sbirlea, llvm-dev, Arthur Eubanks

Hi Alina & Arthur,

I've investigated the performance impact for us and can now say a little more about where we are. Switching now would lead to performance regressions (*) in the initial set of 4 benchmarks we care about. One benchmark is overall neutral but shows regressions where we don't really want them. Hopefully, your fix for PR48715 that I raised today will solve that; many thanks for the amazingly speedy reply! A second benchmark is a bit of a disaster, and I still need to look into that. A third shows a relatively small regression, but it's significant for that benchmark and I am looking into it now. I still need to look into the fourth benchmark.

Code generation is *very* different for the cases I am looking at, and I am profiling and analysing things, for which I need some time. This leads to my question: can you remind me about the timelines? I hope we can work in tandem to have at least the major issues resolved before we switch.

Thanks for working on this, and for your help and speedy replies,

Sjoerd.

(*) I am mostly looking at smaller cores and very tight loops (baremetal), and I guess that most people look at bigger cores, where some of these codegen differences have no impact or a different one.

From: Sjoerd Meijer <Sjoerd...@arm.com>
Sent: 10 January 2021 11:36
To: Alina Sbirlea <alina....@gmail.com>; llvm-dev <llvm...@lists.llvm.org>
Cc: Arthur Eubanks <aeub...@google.com>; Sharma, Reshabh Kumar <Reshabhku...@amd.com>; Matt Arsenault <ars...@gmail.com>; nhae...@gmail.com <nhae...@gmail.com>; Philip Reames <list...@philipreames.com>; Chen, Yuanfang <Yuanfa...@sony.com>

Arthur Eubanks via llvm-dev

Jan 12, 2021, 2:20:46 PM

to Sjoerd Meijer, llvm-dev

The timeline is not set in stone, but I'd say it's "ASAP" assuming all major blockers are resolved. At this point it's mainly just AMDGPU-specific issues. People can investigate turning on the new PM for their codebases and file bugs (like you are doing, thanks!) for me to look into before flipping the default. But when those are resolved I'll assume all major blockers are fixed and send out an RFC to change the default PM.

Moving to one pass manager (at least for the optimization pipeline) is fairly important for LLVM's code health, and the compile time improvements are nice too.

Of course, people can pin to the legacy PM after the switch if there are regressions that come up, then file bugs that I can look at. If there is a major common regression then we'll have to revert the switch, but for more isolated ones I'd rather keep the new PM as the default PM and ask people to use the legacy PM.

Reid Kleckner via llvm-dev

Jan 12, 2021, 3:51:28 PM

to Sjoerd Meijer, llvm-dev

Keep in mind that LLVM will continue to support the old pass manager via a cmake option, so if you are a vendor, you can fall back to the old pass manager and migrate on your own timeline. However, there are costs to diverging from upstream. As the community migrates to the NPM, the old pass manager will become less tested over time, and it may accumulate bugs or performance regressions.

Philip Reames via llvm-dev

Jan 12, 2021, 4:29:28 PM

to Arthur Eubanks, Sjoerd Meijer, llvm-dev

Just wanted to comment that I'm thrilled to see us at this point finally. This is long overdue. Arthur, thank you for all the work making this happen!

Philip

Alina Sbirlea via llvm-dev

Jan 12, 2021, 5:37:04 PM

to Reid Kleckner, llvm-dev

To echo the others' replies, the timeline is not fixed, but we're looking at flipping the default in `opt` as soon as the AMDGPU issues are addressed, and the CMake option shortly after that, within about a week. Variations on the timeline depend on the bugs being filed in the meantime, hence the importance of testing and bug filing in the next couple of weeks. The bugs that are not addressed before the switch would still be worked on after.

As others have said, the cmake option to continue to use the legacy pass manager will remain in place, and the legacy pass manager code will not be removed from the tree for months after, while the transition stabilises. There's also the option to revert the switch once made for major regressions that may turn up, but the hope is to not get too many last minute surprises.

Sjoerd Meijer via llvm-dev

Jan 13, 2021, 8:17:14 AM

to Alina Sbirlea, Reid Kleckner, llvm-dev

Hello,

Thanks for the update and elaborating on this. I am generally okay with the direction of switching when the blockers are resolved (and the timeline).

After the code-size problems, we have now increased our efforts to look at performance and are filing our reports. I just wanted to reiterate that we are looking at severe and generic problems/regressions that should affect all targets, which I think need to be solved first. We have made quite some progress and are looking at the last code base, though that one has the biggest regressions. I hope that we will have filed all our blockers by next week, so that we can take stock of the blockers in a week's time.

It's understood we could fall back to the legacy pass manager, but I hope that such a divergence is a last resort that we don't need to use.

Cheers,
Sjoerd.

From: Alina Sbirlea <alina....@gmail.com>
Sent: 12 January 2021 22:36
To: Reid Kleckner <r...@google.com>
Cc: Sjoerd Meijer <Sjoerd...@arm.com>; llvm-dev <llvm...@lists.llvm.org>; Arthur Eubanks <aeub...@google.com>

Subject: Re: [llvm-dev] New pass manager for optimization pipeline status and questions

Sjoerd Meijer via llvm-dev

Jan 18, 2021, 6:27:08 AM

to Alina Sbirlea, Reid Kleckner, llvm-dev

Hi guys,

Just to let you know that we are happy to switch to the NPM as all our issues have been analysed, raised, or fixed.

Great stuff. Many thanks for working on this, and also for your help!

I am sure some things will pop up, but we will just deal with it then.

Cheers,
Sjoerd.

From: Sjoerd Meijer <Sjoerd...@arm.com>
Sent: 13 January 2021 13:17
To: Alina Sbirlea <alina....@gmail.com>; Reid Kleckner <r...@google.com>
Cc: llvm-dev <llvm...@lists.llvm.org>; Arthur Eubanks <aeub...@google.com>

Subject: Re: [llvm-dev] New pass manager for optimization pipeline status and questions
