[LLVMdev] MOS6502 target

Edwin Amsler

Jul 2, 2014, 9:26:21 PM
to llv...@cs.uiuc.edu
Hey there!

I've started to embark on a path to try and create a backend for a 39-year-old CPU with only an accumulator, two index registers, and a 256-byte stack. It does have a bank of 256 bytes before the stack that is pretty quick, though.

Really, if I can get an assembler out of `llc`, that'll be success enough for me. Clang would be better, but I think that might be crazy talk.

I've been doing lots of research so far, but from the experts, how feasible does this sound?

I've also been banging my head against the wall trying to figure out what all the classes for different instruction types do. Is there a nicely documented index? Is it in source somewhere, or should I start one?

Thanks,

Edwin
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Bruce Hoult

Jul 2, 2014, 10:50:34 PM
to Edwin Amsler, llv...@cs.uiuc.edu
I've considered doing this as well :-) As an exercise to learn the LLVM back end as much as anything.

It probably makes sense to allocate 8 or 16 pairs of zero page locations as virtual 16 bit registers and make 32 bit operations available only as library routines/intrinsics.
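Bruce's scheme can be sketched in Python to make it concrete; the reserved addresses below are made up, since the real set depends on the target machine's ROM:

```python
# Sketch of the idea above: carve out pairs of consecutive free
# zero-page bytes to act as virtual 16-bit registers. The reserved
# set is hypothetical; the real one depends on the target machine.
RESERVED_ZP = {0x00, 0x01, 0x24, 0x25}  # made-up ROM/monitor locations

def allocate_zp_registers(count, reserved=RESERVED_ZP):
    """Pick `count` pairs of consecutive free zero-page bytes."""
    pairs = []
    addr = 0x00
    while len(pairs) < count and addr < 0xFF:
        if addr not in reserved and (addr + 1) not in reserved:
            pairs.append((addr, addr + 1))  # (low byte, high byte)
            addr += 2
        else:
            addr += 1
    return pairs

regs = allocate_zp_registers(16)  # sixteen virtual 16-bit registers
```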

What would be *really* helpful would be if LLVM had a way to detect that certain functions and sets of functions (probably >90% of the program) are NOT recursive and statically allocate fixed zero page and/or high memory locations for their local variables.

If you have the call DAG, you can turn that into a total ordering such that if A transitively calls B then the locations of A's locals will always be at higher addresses than B's locals. (or vice versa).

This will probably be a bit bigger than the maximum dynamic depth of a stack implementation, but I think usually not a lot. And it lets you use absolute addressing instead of slow (zp),y, and also let you avoid saving and restoring simulated callee-save registers.
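The ordering idea above can be sketched in Python (function names and frame sizes invented for illustration): each function's locals are placed just above everything its callees use, so siblings, which are never live at the same time in a non-recursive program, share the same bytes.

```python
# Hypothetical call graph and frame sizes, purely for illustration.
calls = {"main": ["draw", "update"], "draw": ["plot"], "update": [], "plot": []}
frame_size = {"main": 6, "draw": 4, "update": 2, "plot": 3}

BASE = 0x80   # made-up start of the statically allocated locals area
addr = {}

def alloc(fn):
    """Base address of fn's locals: just above everything its callees use."""
    if fn not in addr:
        addr[fn] = max((alloc(c) + frame_size[c] for c in calls[fn]),
                       default=BASE)
    return addr[fn]

alloc("main")
# "draw" and "update" are never live at the same time, so "update"
# reuses the bytes underneath "draw"'s subtree.
```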


Roel Jordans

Jul 3, 2014, 5:00:36 AM
to llv...@cs.uiuc.edu
Hi,

I found the tutorial presented at the last LLVM conference very helpful when
starting on my own backend, especially the GitHub repository containing a
template backend that was quite easy to rename and extend.

Here are the slides, with a pointer to the repository:

http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf

Cheers,
Roel

David Chisnall

Jul 3, 2014, 5:28:15 AM
to Roel Jordans, llv...@cs.uiuc.edu
I'd not seen that tutorial, but it looks very nice. One other piece of advice I'd give:

Get the assembler working first. Go through your instruction reference, define each instruction's encoding, add the relevant bits to the AsmParser (typically not much, unless you have some weird encodings), and test the assembler with llvm-mc and its --show-encoding feature before you try to connect up code generation. And write lots of tests while you're doing this!

It makes it easy to see whether you've got sensible patterns when every instruction is in the .td files and you can easily find the ones that aren't used (is it because there's a more sensible way of expressing what they can do, because they do something that isn't easily expressed in LLVM IR, or because you made a mistake?). It's a lot easier to track down bugs in instruction encodings when you've got a decent set of assembly tests for them than when you're staring at compiler output. Common errors include not realising that immediates are sign- or zero-extended, or shifted.
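That last class of bug is easy to demonstrate: the same 8-bit immediate field decodes to two different values depending on which extension the instruction uses. A small Python sketch:

```python
# The two decodings of an 8-bit immediate field. Mixing these up is
# exactly the kind of bug that assembler encoding tests catch early.
def zero_extend(imm8):
    return imm8 & 0xFF

def sign_extend(imm8):
    v = imm8 & 0xFF
    return v - 0x100 if v & 0x80 else v

# 0xF0 as a sign-extended branch displacement is -16;
# zero-extended as an unsigned operand it is 240.
```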

David

Edwin Amsler

Jul 3, 2014, 9:23:35 AM
to David Chisnall, llv...@cs.uiuc.edu
That's the plan! It'll be something useful for all my effort.

It's also where the bulk of the coding effort seems to be (instruction and register definitions). I'm sure there's more brain effort in making up for the lack of registers and the design differences from a modern CPU, but that can come later.

Edwin Amsler

Jul 3, 2014, 9:26:21 AM
to Roel Jordans, <llvmdev@cs.uiuc.edu>
Thanks! I feel the offline documentation could be improved or re-done with some of these. The current backend doc has lots of definitions without explanation. It makes it fairly frustrating when it says "look at the SPARC target" but doesn't define any of the classes, what the options are for, or the motivations for their use.

Once I've figured this stuff out, I'll likely be contributing to that.

Edwin Amsler

Jul 3, 2014, 9:53:46 AM
to Roel Jordans, <llvmdev@cs.uiuc.edu>
I've read about half-way. Holy crap is this helpful.

This needs to be referenced in the backend docs or the talk needs to be transcribed to become the new backend docs.


John Criswell

Jul 3, 2014, 2:28:06 PM
to Bruce Hoult, Edwin Amsler, llv...@cs.uiuc.edu
On 7/2/14, 9:44 PM, Bruce Hoult wrote:
> I've considered doing this as well :-) As an exercise to learn the
> LLVM back end as much as anything.
>
> It probably makes sense to allocate 8 or 16 pairs of zero page
> locations as virtual 16 bit registers and make 32 bit operations
> available only as library routines/intrinsics.

One interesting problem with doing that is determining which zero page
locations are free for use on the target machine. The Apple II ROM, for
example, reserves many of these locations for its own use. Other
machines may reserve different locations within the zero page.

>
> What would be *really* helpful would be if LLVM had a way to detect
> that certain functions and sets of functions (probably >90% of the
> program) are NOT recursive and statically allocate fixed zero page
> and/or high memory locations for their local variables.

A typical callgraph analysis run within libLTO can be used (along with
Tarjan's algorithm) to find SCCs in the callgraph. The default
callgraph implementation within LLVM is extremely conservative with
function pointers and external code, so for some programs, the results
may not be good.
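The SCC step John mentions can be sketched compactly in Python (the call graph below is hypothetical): any function in a nontrivial SCC, or with a self-edge, may recurse, so only the remaining functions are safe to allocate statically.

```python
# Compact recursive Tarjan's algorithm over a call graph.
def tarjan_sccs(graph):
    index, low, on_stack, stack = {}, {}, set(), []
    sccs, counter = [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# Hypothetical call graph: f and g are mutually recursive.
cg = {"main": ["f", "leaf"], "f": ["g"], "g": ["f"], "leaf": []}
recursive = {v for scc in tarjan_sccs(cg) if len(scc) > 1 for v in scc}
```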

Alternatively, one could use the DSA analysis within the poolalloc
project to get a better callgraph. There's even code within SAFECode to
find SCCs, although I don't recall off-hand which pass it is and whether
it's active in the default compilation.

Regards,

John Criswell (who fondly remembers assembly programming on his Apple //c)

Bruce Hoult

Jul 3, 2014, 5:29:11 PM
to John Criswell, llv...@cs.uiuc.edu
On Fri, Jul 4, 2014 at 6:21 AM, John Criswell <jtcr...@gmail.com> wrote:
On 7/2/14, 9:44 PM, Bruce Hoult wrote:
I've considered doing this as well :-) As an exercise to learn the LLVM back end as much as anything.

It probably makes sense to allocate 8 or 16 pairs of zero page locations as virtual 16 bit registers and make 32 bit operations available only as library routines/intrinsics.

One interesting problem with doing that is determining which zero page locations are free for use on the target machine.  The Apple II ROM, for example, reserves many of these locations for its own use. Other machines may reserve different locations within the zero page.

The monitor reserves very few. Mostly just screen cursor position and the hooks for character in and out, if you want to use those. Or you can run on bare metal and do your own screen drawing and exit by calling the monitor init vector (e.g. the CPU reset vector).

DOS uses a few more, but not a lot.

The vast majority of locations marked as "reserved" are used by AppleSoft, which you can totally ignore if you're not a subroutine called by & from BASIC. If you preserve monitor and DOS locations, then you can exit to the AppleSoft init vector. You'll recall "3D0G" :)

Edwin Amsler

Jul 3, 2014, 7:51:07 PM
to John Criswell, Bruce Hoult, llv...@cs.uiuc.edu
I was doing some thinking and discussing, and another interesting problem is that the MOS6502 really doesn't have much in the way of general purpose registers.

Some add/move/etc can use reg X or Y in immediate mode, but I think the expectation is that it operates on the zero page directly.

Has anyone managed to do that effectively? Instructions that take as input a memory location to read values from?

Ex: "ADC $FF", where "$FF" addresses the final byte in the zero page.

Edwin Amsler

Jul 3, 2014, 8:06:30 PM
to Bruce Hoult, llv...@cs.uiuc.edu
Well, the stack pointer is a single byte, so pushing things on there doesn't work terribly well.

Assuming I pass by reference, that's 128 values in total before it wraps around and silently clobbers itself. It means single-byte values will be incredibly inefficient... Tricky stuff.

I'm lucky on the C64 since it's rare to exit back to the KERNAL with machine language apps (never did it when I was a kid, at least), so if I destroy the KERNAL's stack, no one will ever know! Mwahaha!

With regard to code layout, ideally everything would get inlined since I have gobs of memory compared to everything else. I wouldn't need to worry as much about the stack as long as real values don't get stored there.
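The wraparound hazard is easy to model; here's a toy Python sketch of the one-byte stack pointer (values kept as plain Python ints so the clobber is visible):

```python
# Toy model of the 6502 hardware stack: page one fills downward
# from $01FF, and the 8-bit pointer silently wraps, overwriting
# the oldest entries with no fault of any kind.
class StackPage:
    def __init__(self):
        self.mem = [0] * 256   # page $0100-$01FF
        self.s = 0xFF          # 8-bit stack pointer

    def push(self, value):
        self.mem[self.s] = value
        self.s = (self.s - 1) & 0xFF   # wraps around silently

sp = StackPage()
for i in range(300):   # push more than the page can hold
    sp.push(i)
# Push number 257 (i == 256) landed back at $01FF, overwriting i == 0.
```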

Bruce Hoult

Jul 3, 2014, 9:11:37 PM
to Edwin Amsler, llv...@cs.uiuc.edu
On Fri, Jul 4, 2014 at 12:02 PM, Edwin Amsler <edwi...@gmail.com> wrote:
Well, the stack pointer is a single byte, so pushing things on there doesn't work terribly well.

Assuming I pass by reference, that's 128 values in total before it wraps around and silently clobbers itself. It means single-byte values will be incredibly inefficient... Tricky stuff.

You absolutely don't want anything on the hardware stack except function return addresses and possibly very temporary storage, e.g. PHA (push A); do something that destroys A; PLA (pull A). Or you could use a ZP temp for that. STA ZP; LDA ZP is, I think, a cycle or two faster; PHA/PLA is two bytes smaller ... size usually wins.

The "C" local variables stack absolutely needs to be somewhere else, bigger, and using a pair of ZP locations as the stack pointer (SP). You can't index off the hardware stack pointer, for a start.

As mentioned before, if possible you'd want to statically allocate as many local vars as possible, as LDA $nnnn is a byte smaller and twice as fast (4 vs 8) as LDY #nn; LDA (SP),Y. (You'll sometimes be able to use INY or DEY instead of the load, or just reuse the last value. But still...)


With regard to code layout, ideally everything would get inlined since I have gobs of memory compared to everything else. I wouldn't need to worry as much about the stack as long as real values don't get stored there.

I actually think that the ideal 6502 compiler would output actual 6502 code (mostly) only for leaf functions, and everything else should be compiled to some kind of bytecode. The 6502 was a very, very cheap chip to build, hardware-wise, but the code is BULKY. Even when operating on 8-bit values it's worse than, say, Thumb2, due to the lack of registers. On 16- or 32-bit values it's diabolical if everything is done inline.

Wozniak's "Sweet 16" is still not a terrible design for this, but I think a bit more thought can come up with something better. The Sweet16 interpreter is pretty small though (under 512 bytes I think?), which is pretty important.
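In that spirit, here's a toy Sweet 16-flavoured interpreter sketched in Python; the opcodes and encoding are invented for illustration, not Wozniak's actual instruction set:

```python
# Sixteen 16-bit virtual registers driven by a tiny bytecode loop.
SET, ADD = 0x10, 0x20   # invented opcode bases: low nibble = register

def run(program):
    r = [0] * 16
    pc = 0
    while pc < len(program):
        op = program[pc]
        base, reg = op & 0xF0, op & 0x0F
        if base == SET:        # SET Rn, imm16 (little-endian)
            r[reg] = program[pc + 1] | (program[pc + 2] << 8)
            pc += 3
        elif base == ADD:      # R0 += Rn, with 16-bit wraparound
            r[0] = (r[0] + r[reg]) & 0xFFFF
            pc += 1
        else:
            break              # treat anything else as halt
    return r

regs = run([SET | 0, 0x34, 0x12,   # R0 = 0x1234
            SET | 1, 0x01, 0x00,   # R1 = 0x0001
            ADD | 1])              # R0 += R1
```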


The criterion for whether to use native code or bytecode for a given function is pretty similar to the inlining decision. And a decent, compact bytecode solution with a small interpreter could be reused on other 8-bit CPUs.

Some of which are still in active use today, and so even commercially important, e.g. the 8051, AVR, and PIC.

Erm .. are we boring the rest of llvmdev yet?

Jeremy Lakeman

Jul 3, 2014, 11:04:50 PM
to Bruce Hoult, llv...@cs.uiuc.edu
I suppose that once you've got a 6502 working, adding support for a 4510 shouldn't be too difficult....

(http://c65gs.blogspot.com.au/)


Bruce Hoult

Jul 3, 2014, 11:32:39 PM
to Jeremy Lakeman, llv...@cs.uiuc.edu
What's the one-paragraph current state of FPGA programming? What does one cost? What support gear do you need (and how much is it)? Is programming one-shot or can you reuse it?

Paul Gardner-Stephen

Jul 4, 2014, 12:02:21 AM
to llv...@cs.uiuc.edu, br...@hoult.org
> What's the one-paragraph current state of FPGA programming? What does one
> cost? What support gear do you need (and how much is it)? Is programming
> one-shot or can you reuse it?
FPGA boards vary from about $39 up to 1/4 zillion$.  
The FPGA manufacturers provide free proprietary tools that work with most of their cheaper parts.  For some of their high-end FPGAs they may charge you a body part or two to be able to programme them.  The tools will all run on Linux or Windows, or in a VM on a Mac (although synthesis, the equivalent of "compiling", can take 2-3x longer depending on how good your hypervisor is at handling processes with lots of memory).
Companies like Digilent make lots of nice FPGA development boards.  The ones designed for the academic/student market are good choices as they are often good value for money, have lots of interesting ports already included, use FPGAs that are definitely supported by the free versions of the tools and generally have better on-line support.
For the C65GS (the FPGA re-implementation of the 4510 based Commodore 65 that Jeremy mentioned) I am using a Nexys4 board that is US$160 - US$320 depending on whether you have a .edu.* email address or not.  This has a shiny new Artix Xilinx FPGA that is very fast, is powered by a USB port, and can fit quite large designs.  I work at a University, so mine cost me US$160.  The previous generation based on the slower and less capable (but still very capable) Spartan 6 FPGAs are about 1/2 to 1/3 that price.  But for me the price is worth it, because I want my 8-bit machine to be fast enough for basic productivity, and low enough power to be turned into a laptop, albeit, one with joystick ports and a 1541 disk drive port.
FPGAs can be infinitely reprogrammed. In fact, they need a flash rom or something to feed them their "program" each time they turn on.
Paul.

Paul Gardner-Stephen

Jul 4, 2014, 12:10:11 AM
to llv...@cs.uiuc.edu
Regarding strategies for the back end, it might be worth taking a look at the following existing C compilers for the 6502 and related bits:

http://www.cc65.org/ (recent activity moved to http://cc65.github.io/cc65/)

The topic of targeting the 4502 / 65C02 is an interesting one because it contains features that seem to have been specifically included to support compilation and execution of compiled software:

1. There are instructions that dereference pointers on the stack: LDA (SP+#$nn),y / STA (SP+#$nn),y (but usually misleadingly written as ($nn,SP),y)
2. The stack can be switched to 16 bit mode with CLE (although it still silently over/under flows).
3. Zero-page can be moved anywhere you like with TBA.
4. There is a third index, Z, that provides slight register pressure relief.
5. X, Y and Z can be pushed and pulled from the stack directly (PHX etc)

Ophis is an assembler with full support for the 4502.

I don't have the time to actively work on an llvm backend for 6502 or 4502, but I would certainly welcome one and am happy to contribute my knowledge of those CPUs and to run code on them etc.  For the C65GS it would be nice to be able to create a new optional operating system written in C, rather than assembler.  

Paul.

Edwin Amsler

Jul 4, 2014, 9:22:45 AM
to Bruce Hoult, llv...@cs.uiuc.edu
You can buy one for under $50, reprogram it thousands of times, and the dev tools, last I checked, confused me.

That's the last I'll say about it until you either start a new thread or justify how it fits in this one.

Not a fan of hijacking. 

Edwin Amsler

Jul 4, 2014, 9:28:25 AM
to Paul Gardner-Stephen, llv...@cs.uiuc.edu
Yeah, it would be smart to look there. When I start something, I tend to dive in and ignore other projects until I'm either stuck or done.

Sort of like how Woz and that other guy built their floppy disk. Re-solving the tough problems for me is the fun part :D

It'll definitely be worth it, but someone else who is helping can teach me or implement the virtues of another system.

Bruce Hoult

Jul 4, 2014, 9:30:34 AM
to Edwin Amsler, llv...@cs.uiuc.edu
How does it fit the thread? Previous poster Jeremy Lakeman's blog says he has implemented a 6502 in an FPGA, running at about 30x original speed.

Seems relevant to me, especially if anyone is interested in running code on actual hardware rather than a software simulator.

Samuel Crow

Jul 7, 2014, 1:11:01 PM
to LLVM Developers Mailing List
Hello!

As an old Commodore 64 bedroom-coder, I've got a few good ideas for optimization in your 6502 compiler.

The first rule is like this: an array of any multibyte structures is slower than a structure of multiple arrays of byte-long elements. In other words, use byte-planes instead of actual contiguous structures if at all possible. This will practically eliminate the need for most usage of the indexed-indirect addressing mode (y-indexing off a zero-page address-pair pointer) because each array will start at a non-relocatable absolute address and use either x or y as the index. The limitation of this optimization is that it requires that there be an array dimension between 1 and 256 elements. An example of this is that a 16-bit integer is a multibyte structure of a high byte and a low byte. To make a 128-element array of 16-bit integers you convert it to two 128-element arrays of bytes: an array of high bytes and an array of low bytes. This also eliminates the need for multiplying indexes by a scale factor based on the size of the element, because in a byte-plane each byte is 1 unit in length, essentially allowing you to multiply only by 1.

The second rule is a suggestion: address the zero page as 128 general-purpose registers of 16 bits each that can be subdivided into 2 subregisters of one byte each. The zero page is used mostly in 16-bit-aligned pairs for pointer contents when indexed-indirect addressing cannot be avoided. In the same way, on the early Intel 8086 machines (whose legacy is still recognized in modern x86 chips), the AH and AL subregisters constituted the high-byte and low-byte contents of the AX register.
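Sam's first rule can be sketched in Python (array names and sizes are illustrative): a 128-element array of 16-bit values stored as two parallel byte-planes, so an index register addresses either plane directly with no scaling by the element size.

```python
# Byte-planes sketch: each plane would live at its own absolute
# base address on the 6502, indexed directly by X or Y.
N = 128
lo = [0] * N   # plane of low bytes
hi = [0] * N   # plane of high bytes

def store16(i, value):
    lo[i] = value & 0xFF
    hi[i] = (value >> 8) & 0xFF

def load16(i):
    # On the 6502 this is two absolute,X loads: LDA lo,X / LDA hi,X
    return lo[i] | (hi[i] << 8)

store16(5, 0xBEEF)
```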

Blessings on you for making an 8-bit compiler!

Cheers,

Sam

Krzysztof Parzyszek

Jul 9, 2014, 5:49:00 PM
to llv...@cs.uiuc.edu
On 7/2/2014 8:23 PM, Edwin Amsler wrote:
>
> Really, if I can get an assembler out of `llc`, that'll be success enough for me. Clang would be better, but I think that might be crazy talk.
>
> I've been doing lots of research so far, but from the experts, how feasible does this sound?

The set of programs that you will be able to compile will be severely
limited. There is no OS to speak of, so the only mode you will be able
to use is "bare metal". This means no file I/O.

If you ever plan to run the output on real hardware, add an option to
reserve memory addresses for memory-mapped registers.

There is a 16-bit derivative of the 6502; maybe you could target that instead?

In any case, good luck. Sounds really cool!

-Krzysztof


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Edwin Amsler

Jul 11, 2014, 9:25:44 AM
to Krzysztof Parzyszek, llv...@cs.uiuc.edu
Oh yeah, I'm not expecting to port some major app. I have pretty intimate knowledge of what my c64 can and can't do. It barely has a kernel (pretty much just in-memory routines for disk, screen, and tape access).

This isn't a practical project (good C compilers exist already), this is an excuse to play with LLVM.