ucow - unix cowgol->z80 compiler

53 views
Skip to first unread message

aaw...@gmail.com

unread,
Nov 30, 2025, 10:58:20 PMNov 30
to RC2014-Z80
https://github.com/avwohl/ucow a unix compiler for the full cowgol language.

Emhasis on optimizaiton.  Needs testers.

Phase: Foundation (No Optimization)
Lexer & Parser
  • Full AST construction (not single-pass)
  • Type checking and validation
  • Symbol table with scope tracking
Basic Code Generation
  • Naive stack-based code gen
  • All variables in memory
  • All operations through A register
  • Function calls via CALL/RET
Testing Infrastructure# Compile python ucow.py test.cow -o test.mac # Assemble um80 test.mac -o test.rel # Link ul80 test.rel -o test.com # Run cpmemu test.com
Phase: Essential OptimizationsConstant Folding & Propagation

Evaluate constant expressions at compile time.

# Before var x := 10 + 20; var y := x * 2; # After var x := 30; var y := 60;

Multi-pass propagation through the AST.

Dead Code Elimination

Build CFG, compute reachability, remove unreachable code.

# Before sub Foo() is return; print("never"); # dead end sub; # After sub Foo() is return; end sub;Common Subexpression Elimination (CSE)

Track available expressions, reuse computed values.

# Before var a := x + y; var b := x + y; # redundant # After var a := x + y; var b := a;

For 8080, this is critical - arithmetic is expensive.

Strength Reduction

Replace expensive operations with cheaper ones.


Original

Replacement

Savings

x * 2

x + x

MUL→ADD

x * 4

x << 2

MUL→shifts

x * 2^n

x << n

MUL→shifts

x / 2

x >> 1

DIV→shift

x % 2

x & 1

MOD→AND

On 8080 without MUL/DIV instructions, this is huge.


Phase: Control Flow OptimizationsBranch Optimization; Before JZ SKIP JMP NEXT SKIP: ... NEXT: ; After JNZ NEXT ... NEXT:
  • Eliminate jumps to jumps
  • Invert conditions to remove unconditional jumps
  • Thread jumps through empty blocks
Loop-Invariant Code Motion

Move computations that don't change inside a loop to outside.

# Before while i < n loop var offset := base + stride; # invariant arr[i] := arr[i] + offset; i := i + 1; end loop; # After var offset := base + stride; while i < n loop arr[i] := arr[i] + offset; i := i + 1; end loop;Loop Unrolling (Small Loops)

For loops with known small iteration counts (2-8), unroll completely.

# Before var i: uint8 := 0; while i < 4 loop arr[i] := 0; i := i + 1; end loop; # After arr[0] := 0; arr[1] := 0; arr[2] := 0; arr[3] := 0;

Eliminates loop overhead (compare, branch, increment).


Phase: Register AllocationLive Variable Analysis

Backward dataflow to determine which variables are live at each point.

Register Allocation

The biggest single optimization for 8080.

Strategy: Linear Scan (simpler than graph coloring)

  1. Compute live ranges for all variables
  2. Sort by start position
  3. Allocate registers greedily, spill to memory when needed

8080 Register Priorities:

HL - pointer operations, array access DE - secondary pointer, 16-bit values BC - loop counters, 16-bit values A - arithmetic (implicit, always available)

Allocation Rules:

  • Pointers → HL preferred
  • Loop counters → BC preferred
  • 16-bit arithmetic → DE or BC
  • 8-bit temps → B, C, D, E
Register Coalescing

Eliminate unnecessary MOVs by giving source and dest the same register.

; Before MOV A, B MOV C, A ; A is now dead ; After (allocate B=C) ; eliminated entirely
Phase: Memory & Data OptimizationsStatic Variable Overlay

Cowgol already does this conservatively. With full call-graph analysis, we can do it optimally.

sub A() calls B(), C() sub B() uses x, y sub C() uses z, w # x,y and z,w can share memory (B and C never concurrent)Workspace Minimization

Nested subroutines share workspace. Minimize total workspace by optimal variable packing.

Constant Pooling

Share identical constants in memory.

print("Error"); print("Error"); # share the string
Phase: InliningSubroutine Inlining

Since Cowgol forbids recursion, we can inline aggressively.

Inline Criteria:

  • Small subroutines (< 20 instructions)
  • Called once
  • Called in hot loops
  • Leaf functions (no calls)

Benefits:

  • Eliminates CALL/RET overhead (27 cycles)
  • Exposes more optimization opportunities (CSE, constant prop)
  • Enables register allocation across call sites
Interface Devirtualization

When an interface variable has only one possible implementation, inline it.

interface Cmp(a: uint8, b: uint8): (r: int8); sub NumCmp implements Cmp is ... end sub; var cmp: Cmp := NumCmp; # If NumCmp is the only implementation, inline calls to cmp()
Phase: Peephole OptimizationPattern Matching on Assembly

After code generation, pattern-match and replace.


Pattern

Replacement

PUSH r; POP r

(delete)

MOV A,r; MOV r,A

(delete second)

LDA addr; STA addr

(delete STA)

JMP L; L:

(delete JMP)

XRA A

(better than MVI A,0)

INR A; DCR A

(delete both)

ADD A

(same as RLC for *2)
Instruction Selection

Choose optimal instructions during code gen.

; Load 0 into A MVI A, 0 ; 7 cycles, 2 bytes XRA A ; 4 cycles, 1 byte ← better ; Compare A to 0 CPI 0 ; 7 cycles, 2 bytes ORA A ; 4 cycles, 1 byte ← better (sets Z flag) ; 16-bit increment INR L ; need to handle carry ; vs INX H ; 5 cycles, 1 byte ← better
Phase: Advanced Optimizations (Future)SSA Form

Convert to Static Single Assignment for:

  • Better constant propagation
  • Better dead code elimination
  • Cleaner dataflow analysis
Instruction Scheduling

Reorder instructions to minimize stalls (less critical on 8080 than modern CPUs).

Profile-Guided Optimization

Run program, collect execution counts, optimize hot paths.

ladislau szilagyi

unread,
Dec 1, 2025, 4:48:24 AMDec 1
to RC2014-Z80
Hi,
from your text, it seems that the target is the 8080, not Z80. Is this correct?
Ladislau
Reply all
Reply to author
Forward
0 new messages