[llvm-dev] Improved jump-threading in LLVM for finite state automata

Phipps, Alan via llvm-dev

Sep 23, 2020, 10:33:42 AM

to llvm...@lists.llvm.org

It is my understanding that the implementation for jump-threading in LLVM is not presently able to effectively optimize code containing a state-machine implemented using a loop + switch. This is the case, for example, with the Coremark benchmark function core_state_transition(). Bug 42313 was filed to address this in 2019:

https://bugs.llvm.org/show_bug.cgi?id=42313
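
For readers unfamiliar with the pattern, here is a minimal sketch of a loop + switch state machine next to a hand-threaded equivalent. It is not taken from Coremark; the word-counting task and the names `count_words_switch` and `count_words_threaded` are made up for illustration. The second function shows roughly the control flow one would hope jump threading could derive from the first: each state jumps straight to the code of its successor instead of going back through the switch.

```c
#include <stddef.h>

enum state { OUTSIDE, INSIDE };

/* Loop + switch form: every iteration re-dispatches on `st`,
 * even though each case already knows which state comes next. */
size_t count_words_switch(const char *s) {
    enum state st = OUTSIDE;
    size_t words = 0;
    for (; *s != '\0'; ++s) {
        switch (st) {
        case OUTSIDE:
            if (*s != ' ') { ++words; st = INSIDE; }
            break;
        case INSIDE:
            if (*s == ' ') st = OUTSIDE;
            break;
        }
    }
    return words;
}

/* Hand-threaded form (illustrative only): the state variable is gone
 * and each state branches directly to its successor. */
size_t count_words_threaded(const char *s) {
    size_t words = 0;
outside:
    if (*s == '\0') return words;
    if (*s++ != ' ') { ++words; goto inside; }
    goto outside;
inside:
    if (*s == '\0') return words;
    if (*s++ == ' ') goto outside;
    goto inside;
}
```

Both functions return the same count for any NUL-terminated string; only the shape of the control flow differs.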

It appears that GCC improved support for jump threading in 2015 along the same lines:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

Is anyone aware of any plan to improve LLVM's jump threading along the same lines?

Thanks!

Alan Phipps

Eli Friedman via llvm-dev

Sep 23, 2020, 2:16:31 PM

to Phipps, Alan, llvm...@lists.llvm.org

Nobody is currently working on this, as far as I know. If you’re interested in looking into it, I’ll try to answer any questions.

-Eli

Sjoerd Meijer via llvm-dev

Sep 23, 2020, 3:04:17 PM

to Phipps, Alan, llvm...@lists.llvm.org, Eli Friedman, Evgeny Astigeevich

+ Evgeny

We have a jump threading pass downstream for this that we would love to upstream. I believe Evgeny was working on exactly this, i.e. preparing it for upstreaming.


Sjoerd Meijer via llvm-dev

Sep 23, 2020, 3:14:05 PM

to Phipps, Alan, llvm...@lists.llvm.org, Eli Friedman, Evgeny Astigeevich, Sjoerd Meijer, David Green

Related, while we are on the subject of Coremark-specific optimizations: we have this sitting in upstream review: https://reviews.llvm.org/D42365

That should help a bit too. It needs a little bit of work, but I believe Dave wouldn't mind if someone commandeered it and finished it.
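
Assuming the review above is the loop-flattening pass mentioned later in this thread, the following is a rough sketch of what loop flattening means in general. The functions `sum_nested` and `sum_flat` are hypothetical and are not taken from the patch; the flattened version is written by hand and assumes `rows * cols` does not overflow.

```c
#include <stddef.h>

/* Nested form: two induction variables and an index computed from both. */
long sum_nested(const int *a, size_t rows, size_t cols) {
    long total = 0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            total += a[i * cols + j];
    return total;
}

/* Flattened form: a single loop over rows * cols elements.
 * Assumes the product does not overflow size_t. */
long sum_flat(const int *a, size_t rows, size_t cols) {
    long total = 0;
    size_t n = rows * cols;
    for (size_t k = 0; k < n; ++k)
        total += a[k];
    return total;
}
```

The two functions visit the same elements in the same order; the flattened one simply has less loop overhead.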


Philip Reames via llvm-dev

Sep 23, 2020, 3:48:06 PM

to Phipps, Alan, llvm...@lists.llvm.org

No active work on my side, but I have given the topic of threaded interpreters (which is what I think you're wanting to produce) a good amount of thought.

I'm really not sure that switch is the right canonical form. The main reason is that a loop over a large switch is very likely to encourage code motion, which is generally profitable but harmful in this particular context.

I had been thinking along the lines of representing the interpreter as a family of mutually recursive functions with a calling convention optimized for this case and using a musttail call through a lookup table for the dispatch.

I've played with the notion of extending clang with a custom attribute for guaranteed tail calls. I think this is pretty much the only extension needed to be able to natively write out a threaded interpreter as a set of mutually recursive functions.
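
To make the shape of that idea concrete, here is a minimal sketch in plain C of an interpreter written as mutually recursive handlers that dispatch through a lookup table. The opcodes, the handler names, and `run` are invented for illustration. Nothing in this sketch guarantees that the calls in return position become jumps; that guarantee is exactly what the attribute discussed above would have to provide, so stack use here depends on the optimizer.

```c
#include <stddef.h>

enum opcode { OP_INC, OP_DOUBLE, OP_HALT };

/* Every handler shares one signature so it can live in one table. */
typedef long (*handler_fn)(const unsigned char *pc, long acc);

static long op_inc(const unsigned char *pc, long acc);
static long op_double(const unsigned char *pc, long acc);
static long op_halt(const unsigned char *pc, long acc);

static const handler_fn dispatch_table[] = {
    [OP_INC]    = op_inc,
    [OP_DOUBLE] = op_double,
    [OP_HALT]   = op_halt,
};

/* Each handler does its work, then calls the handler of the next
 * opcode in return position (a tail call only if the compiler
 * chooses to emit one). */
static long op_inc(const unsigned char *pc, long acc) {
    return dispatch_table[pc[1]](pc + 1, acc + 1);
}

static long op_double(const unsigned char *pc, long acc) {
    return dispatch_table[pc[1]](pc + 1, acc * 2);
}

static long op_halt(const unsigned char *pc, long acc) {
    (void)pc;
    return acc;
}

/* The program must contain only valid opcodes and end with OP_HALT;
 * this sketch does no bounds or opcode checking. */
long run(const unsigned char *program) {
    return dispatch_table[program[0]](program, 0);
}
```

For example, running the program `{ OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT }` starts from an accumulator of 0 and computes ((0 + 1 + 1) * 2) + 1.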

This is all thought experiment from my side; I haven't had time to sit down and actually prototype any of this.

Philip

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Phipps, Alan via llvm-dev

Sep 24, 2020, 4:43:56 PM

to Sjoerd Meijer, llvm...@lists.llvm.org, Eli Friedman, Evgeny Astigeevich, David Green

This is great to hear, thanks for the update! Can you provide a timeline for when you expect to have the jump-threading update in review?

The loop-flattening pass also looks very useful. I’m sure we could help with the review -- I don’t know about taking ownership of it, at least right now.

-Alan

Ehsan Amiri via llvm-dev

Sep 28, 2020, 9:41:14 PM

to Sjoerd Meijer, llvm...@lists.llvm.org

Hi Sjoerd

We (at Huawei) also have a pass for this. We originally implemented it back in 2018 and meant to upstream it, but there were some issues with the implementation that required changes to the code. We started revising it a few weeks ago.

Now that there are multiple options, maybe we can discuss our approaches and see whether the community prefers one approach over the other? What do you think?

Thanks

Ehsan


Sjoerd Meijer via llvm-dev

Sep 29, 2020, 4:23:06 AM

to ehsan...@gmail.com, llvm...@lists.llvm.org

Hi Ehsan,

Evgeny uploaded our version here: https://reviews.llvm.org/D88307. He wants to address a few issues.

But yeah, if you've got one ready too, perhaps best to upload it if that's what you want to do. I guess that makes it easier to look at both and see which one would have our preference?

Cheers,

Sjoerd.


Paweł Bylica via llvm-dev

Sep 29, 2020, 6:40:32 AM

to Philip Reames, llvm...@lists.llvm.org

On Wed, Sep 23, 2020 at 9:48 PM Philip Reames via llvm-dev <llvm...@lists.llvm.org> wrote:

No active work on my side, but I have given the topic of threaded interpreters (which is what I think you're wanting to produce) a good amount of thought.

I'm really not sure that switch is the right canonical form. The main reason being that having a loop over a large switch is very likely to encourage code motion which is generally profitable, but harmful in this particular context.

I had been thinking down the lines of representing the interpreter as a family of mutually recursive functions with a calling convention optimized for this case and using a musttail call through a lookup table for the dispatch.

I believe the Wasm3 project (https://github.com/wasm3/wasm3), which is a WebAssembly interpreter, uses the dispatch technique described by Philip.

I don't know how exactly it is guaranteed that the indirect calls are converted to tail calls (maybe it's not). But the performance is quite impressive.

// Paweł

Philip Reames via llvm-dev

Sep 29, 2020, 12:33:13 PM

to Paweł Bylica, llvm...@lists.llvm.org

On 9/29/20 3:39 AM, Paweł Bylica wrote:


I believe the Wasm3 project (https://github.com/wasm3/wasm3) which is a WebAssembly interpreter is using this dispatch technique described by Philip.

I don't know how exactly it is guaranteed that the indirect calls are converted to tail calls (maybe it's not). But the performance is quite impressive.

Interesting, I hadn't seen this. Reading through the docs, it looks like M3 (the wasm3 interpreter) isn't able to guarantee tail-call dispatch, which isn't surprising. There's a whole section on managing stack space, with some special tricks around loops. I will note there's at least one key misstatement in the description: branches do not fundamentally need stack space; calls do, as correctly noted. Still, it's cool to see someone playing with this in modern times. Most of the usage I'm familiar with is in older papers (e.g. the threaded code context referenced at the bottom of the M3 description).

Ehsan Amiri via llvm-dev

Sep 29, 2020, 11:40:48 PM

to Sjoerd Meijer, llvm...@lists.llvm.org

That's great. Our patch won't be ready for at least 2-3 weeks. In the next few days we will spend some time reviewing your patch and will let you know if we have any comments.