[LLVMdev] Creating an LLVM backend for a very small stack machine

Wesley J. Landaker

unread,

Feb 22, 2009, 6:25:55 PM2/22/09

to llv...@cs.uiuc.edu

Hi folks,

I am interesting in creating an LLVM backend for a very small stack machine.

Before I start in on the actual implementation, I'd like to make sure that
I'm on the right track. Any comments, suggestions, warnings, tips, etc
would be greatly appreciated.

Background
----------

There are a number of small as other embedded microprocessors that are often
used in FPGAs such as Xilinx's Picoblaze, Lattice's Mico8, or Bernd
Paysan's b16-small.

I am developping a very small stack machine primarily for use inside FPGAs
and ASICs. This serves roughly the same niche as the above processors,
however from my perpsective, my design has a number of advantages. For
example, mine has a much smaller minimal configuration, has higher code
density, is much more parameterizable, fully supports indirect branches, is
FPGA-architecture independent, etc.

The main goal, however, is to not just have a shiny new architecture, but to
have an entire optimized supporting toolchain. I am hoping to build on LLVM
as infrastructure. I don't think I need to sell you all on the reasons why
this would be handy.

I am already pretty "familiar" with LLVM, in that I have successfully made
my own front-end DSL for another project by basically following the
Kalidescope tutorial. I have a good grasp of the LLVM assembly language,
and have a lot of notes (just a sanity-check paper excercize) about how I
would lower just about every instruction into my architecture if I were
doing it by hand. I've read through all the LLVM documentation available
online, and (superficially) looked at a lot of the other backend targets. I
have written custom assemblers and peephole optimizers outside of LLVM in
the past. I am familiar academically with stack-machine specific "register
allocation" optimizations. I'm basically ready to get started on an LLVM
backend for my processor.

My main concerns are that in my searching online, I've gleaned a number of
things that urge me to caution. I don't know how much of this is "true",
but it's my impression after spending a lot of time googling:

* A Picoblaze backend was apparently attempted both for LLVM and GCC, but
apparently didn't go anywhere on either due to missing support for register
indirect jumps, and possibly for other reasons. Of course my processor does
support this, but ...

* There were a number of threads in various GCC lists and other places
implying that, at least for GCC, targeting a stack machine would be very
very difficult because of it's backend assumptions. Of course LLVM is not
GCC, but TableGen seems register-biased ...

* There seems to be a (to me) strange negative vibe whenever someone
brings up targeting a compiler to a very small microprocessor, usually
including an argument that it's not worth it. Of course, I'm the one doing
the work, and I already have decided it's worth it ...

Anyway, hopefully that gives you an idea of where I'm coming from. Before I
get too knee-deep into writing a backend, I have a few questions that I
hope someone can help me with.

Questions
---------

* Has anyone else out there targeted (or tried to target) a stack machine
before? Was it successfull? What problems did you have?

* What parts of the LLVM backend code generator infrastructure would be
usable for targeting a stack machine? e.g. Is it even possible to use
TableGen to target a stack machine?

* When/where/how do things like big integer (iXXXXX), phi nodes, llvm.*
instrincs get lowered; e.g. does my target have to do that, or is it done
generically?

Ultimtely, I'm wondering if targeting a stack machine with the current LLVM
infrastructure is going to be somewhat straightforward even if it's not
totally optimal (desirable), or if it's going to be so problematic that I'd
be better off implementing an entire new code-generator myself
(undesirable).

Any other comments or discussion is welcome.

All of my work (hardware design and all software) will be publicly and
freely available.

--
Wesley J. Landaker <w...@icecavern.net> <xmpp:w...@icecavern.net>
OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2

signature.asc

Eli Friedman

unread,

Feb 22, 2009, 7:06:06 PM2/22/09

to LLVM Developers Mailing List

On Sun, Feb 22, 2009 at 3:25 PM, Wesley J. Landaker <w...@icecavern.net> wrote:
> * Has anyone else out there targeted (or tried to target) a stack machine
> before? Was it successfull? What problems did you have?

Haven't done that, and I don't think there are any existing backends
like this. It should be feasible, though; the backend code is pretty
flexible.

> * What parts of the LLVM backend code generator infrastructure would be
> usable for targeting a stack machine? e.g. Is it even possible to use
> TableGen to target a stack machine?

You should be able to use existing LLVM backend code and TableGen at
least through instruction selection; I'm not sure whether you'll want
register allocation or not, but it should be easy to choose either
way. The whole thing is quite flexible; see
LLVMTargetMachine::addCommonCodeGenPasses in
lib/CodeGen/LLVMTargetMachine.cpp for a high-level overview of how
CodeGen works. It might also be useful to look at LLVM handles x87
floating-point; the relevant code is in
lib/Target/X86/X86FloatingPoint.cpp.

> * When/where/how do things like big integer (iXXXXX), phi nodes, llvm.*
> instrincs get lowered; e.g. does my target have to do that, or is it done
> generically?

Aribitrary-width integers, vectors, llvm.*, etc. are lowered
generically by the Legalize infrastructure; the backend just has to
say what it can and can't support. See
lib/Target/X86/X86ISelLowering.cpp for an example. I don't know the
details of PHI nodes, but that's also taken care of by instruction
selection.

-Eli
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Wesley J. Landaker

unread,

Feb 22, 2009, 8:43:18 PM2/22/09

to llv...@cs.uiuc.edu

On Sunday 22 February 2009 17:06:06 Eli Friedman wrote:
> On Sun, Feb 22, 2009 at 3:25 PM, Wesley J. Landaker <w...@icecavern.net>
wrote:
> > * Has anyone else out there targeted (or tried to target) a stack
> > machine before? Was it successfull? What problems did you have?
>
> Haven't done that, and I don't think there are any existing backends
> like this. It should be feasible, though; the backend code is pretty
> flexible.

At the very least, there isn't anything in the LLVM instruction set that I
think I would have any trouble lowering to the architecture ... but so far
that's just on paper. ;)

I would love to see a Kalescope-like tutorial that goes step-by-step through
making a backend. At the very least, I'll be documenting my adventure, so
maybe once I know what I'm doing I can turn it into a tutorial.

> You should be able to use existing LLVM backend code and TableGen at
> least through instruction selection; I'm not sure whether you'll want
> register allocation or not, but it should be easy to choose either
> way. The whole thing is quite flexible; see
> LLVMTargetMachine::addCommonCodeGenPasses in
> lib/CodeGen/LLVMTargetMachine.cpp for a high-level overview of how
> CodeGen works. It might also be useful to look at LLVM handles x87
> floating-point; the relevant code is in
> lib/Target/X86/X86FloatingPoint.cpp.

Thank you for the references, I'll have a look at those.

I've read quite a few papers on stack-machine-specific "register allocation"
algorithms, but at least for the first pass, I want to make this as
straightforward as possible.

One thing I was considering was pretending I had multiple registers, and
then when they actually get emitted I would thunk in code to do stack
manipulations, hoping that my architecture specific peephole optimizer
would be able to clean it up (or not, for the first cut).

> Aribitrary-width integers, vectors, llvm.*, etc. are lowered
> generically by the Legalize infrastructure; the backend just has to
> say what it can and can't support. See
> lib/Target/X86/X86ISelLowering.cpp for an example. I don't know the
> details of PHI nodes, but that's also taken care of by instruction
> selection.

Okay, I obviously need to learn more about the infrastructure here, but this
at least sounds promising. I was worried that if I didn't have a register
architecture that I'd have to reinvent the wheel in more places than it
soudns like I will have to.

I'm sure I will be back with more questions once I seriously try starting a
target.

signature.asc

Chris Lattner

unread,

Feb 23, 2009, 1:18:25 AM2/23/09

to LLVM Developers Mailing List

On Feb 22, 2009, at 5:43 PM, Wesley J. Landaker wrote:
>
> I would love to see a Kalescope-like tutorial that goes step-by-step
> through
> making a backend. At the very least, I'll be documenting my
> adventure, so
> maybe once I know what I'm doing I can turn it into a tutorial.

Have you seen:
http://llvm.org/docs/WritingAnLLVMBackend.html

If you're targeting a stack machine, I'd strongly recommend not using
the llvm register allocators and just run you own custom stackifier
pass instead.

-Chris

Mark Shannon

unread,

Feb 23, 2009, 5:23:59 AM2/23/09

to LLVM Developers Mailing List

Hi Wesley,

I've done quite a lot of work on register allocation for stack machines.
You might want to look at my papers:
http://www.dcs.gla.ac.uk/~marks/euroforth.pdf
http://www.dcs.gla.ac.uk/~marks/thesis.pdf

Good luck,
Mark.

winmail.dat

Wesley J. Landaker

unread,

Feb 23, 2009, 10:38:35 AM2/23/09

to llv...@cs.uiuc.edu

On Sunday 22 February 2009 23:18:25 Chris Lattner wrote:
> Have you seen:
> http://llvm.org/docs/WritingAnLLVMBackend.html

I have, and it's certainly helpful. Since the Kalescope tutorial is so
amazingly easy to work through that it makes me jealous for a similar
tutorial on the backend. But I'm definitely not complaining. =)

> If you're targeting a stack machine, I'd strongly recommend not using
> the llvm register allocators and just run you own custom stackifier
> pass instead.

After reading
<http://www.llvm.org/docs/CodeGenerator.html#regAlloc_ssaDecon> is correct
to say that if I don't use an existing LLVM register allocation pass, that
I would need to do my stackification directly on the SSA form?

Could/should I still reuse parts like the PHIElimination pass? It sounded
sort of like this was very coupled with the register allocators.

I'm was initial sort of thinking that for a first (unoptimized) cut I might
be able to just use the "simple" built-in allocator that spills every
value, then replace that afterwards with a stack-based allocator. Does that
sound practical, or would I be painting myself into a corner in the first
step?

Anyway, I'm just trying to make sure I'm taking the right approach before I
run head first into any dead ends. Ultimately, I'd rather do things "right"
than "fast", since in my experience, "right" takes less time in the long
run. ;)

signature.asc

Wesley J. Landaker

unread,

Feb 23, 2009, 11:00:28 AM2/23/09

to llv...@cs.uiuc.edu

On Monday 23 February 2009 03:23:59 Mark Shannon wrote:
> I've done quite a lot of work on register allocation for stack machines.
> You might want to look at my papers:
> http://www.dcs.gla.ac.uk/~marks/euroforth.pdf
> http://www.dcs.gla.ac.uk/~marks/thesis.pdf

Hi Mark,

I've read your papers, and in fact they were part of the data that convinced
me that I really could go with a stack machine for the work I'm doing. I'm
planning on implementing register allocation based partially on your paper.

Actually, I was wondering if any of your lcc implementation work was
publicly available.

signature.asc

Chris Lattner

unread,

Feb 23, 2009, 1:13:21 PM2/23/09

to LLVM Developers Mailing List

On Feb 23, 2009, at 7:38 AM, Wesley J. Landaker wrote:

> On Sunday 22 February 2009 23:18:25 Chris Lattner wrote:
>> Have you seen:
>> http://llvm.org/docs/WritingAnLLVMBackend.html
>
> I have, and it's certainly helpful. Since the Kalescope tutorial is so
> amazingly easy to work through that it makes me jealous for a similar
> tutorial on the backend. But I'm definitely not complaining. =)
>
>> If you're targeting a stack machine, I'd strongly recommend not using
>> the llvm register allocators and just run you own custom stackifier
>> pass instead.
>
> After reading
> <http://www.llvm.org/docs/CodeGenerator.html#regAlloc_ssaDecon> is
> correct
> to say that if I don't use an existing LLVM register allocation
> pass, that
> I would need to do my stackification directly on the SSA form?
>
> Could/should I still reuse parts like the PHIElimination pass? It
> sounded
> sort of like this was very coupled with the register allocators.

Sure, reusing just phi elim and 2-addr elim should be possible.

> I'm was initial sort of thinking that for a first (unoptimized) cut
> I might
> be able to just use the "simple" built-in allocator that spills every
> value, then replace that afterwards with a stack-based allocator.
> Does that
> sound practical, or would I be painting myself into a corner in the
> first
> step?

Sure, that could work. The x86 floating point stack is a stack with
maximum depth of 8 entries. The x86 backend implements it by register
allocating for a 7 entry "register file" and then doing a custom
transformation to change the code to use pushes/pops exchanges etc.

Mark Shannon

unread,

Feb 24, 2009, 9:49:17 AM2/24/09

to LLVM Developers Mailing List

Wesley,

Regarding access to the source code;
I would send you the code, but I might be stepping on a few toes.
The person to speak to is Chris Bailey at the University of York (in the
UK).

However it is written for the lcc tree-based IR, rather than the
SSA-based IR of LLVM, so I don't think it will be that much use.

A lot of the analysis it does is to find information that is explicit in
the SSA form anyway, such as generating a flow graph.

The SSA form already provides a head start for chosing stack allocation
candidates. For example, PHI-nodes are obvious candidates for edges in
the flow-graph.

I would be interested to see how good SSA-form is for stack allocation,
as that was my intended direction if I had stayed longer at York.

Cheers,
Mark.

Wesley J. Landaker wrote:
> On Monday 23 February 2009 03:23:59 Mark Shannon wrote:
>> I've done quite a lot of work on register allocation for stack machines.
>> You might want to look at my papers:
>> http://www.dcs.gla.ac.uk/~marks/euroforth.pdf
>> http://www.dcs.gla.ac.uk/~marks/thesis.pdf
>
> Hi Mark,
>
> I've read your papers, and in fact they were part of the data that convinced
> me that I really could go with a stack machine for the work I'm doing. I'm
> planning on implementing register allocation based partially on your paper.
>
> Actually, I was wondering if any of your lcc implementation work was
> publicly available.
>
>
>

> ------------------------------------------------------------------------

Wesley J. Landaker

unread,

Feb 24, 2009, 6:36:18 PM2/24/09

to LLVM Developers Mailing List

On Tuesday 24 February 2009 07:49:17 Mark Shannon wrote:
> Regarding access to the source code;
> I would send you the code, but I might be stepping on a few toes.
> The person to speak to is Chris Bailey at the University of York (in the
> UK).
>
> However it is written for the lcc tree-based IR, rather than the
> SSA-based IR of LLVM, so I don't think it will be that much use.
>
> A lot of the analysis it does is to find information that is explicit in
> the SSA form anyway, such as generating a flow graph.

That's okay, I wasn't sure it would be much use either; mostly I was just
curious and thought maybe I could learn something from it if it was readily
available.

> The SSA form already provides a head start for chosing stack allocation
> candidates. For example, PHI-nodes are obvious candidates for edges in
> the flow-graph.

I'm a fan of SSA already (used it a bit in a hardware synthesis context),
but this is my first attempt to use it an LLVM context, or to target a
stack machine. I guess I'll see how it goes.

> I would be interested to see how good SSA-form is for stack allocation,
> as that was my intended direction if I had stayed longer at York.

I hope to eventually get to this on my project. For my first cut I'm going
to do a rather dumb allocation just to get things working. In my niche,
most programs for this machine will be under 256 (!) instructions, and
speed doesn't matter much. So on one hand, if it fits, optimizations don't
matter much. But on the other hand, it has to fit! =)

signature.asc

Reply all

Reply to author

Forward