ucow - unix cowgol->z80 compiler

53 views

Skip to first unread message

aaw...@gmail.com

unread,

Nov 30, 2025, 10:58:20 PMNov 30

to RC2014-Z80

https://github.com/avwohl/ucow a unix compiler for the full cowgol language.

Emhasis on optimizaiton. Needs testers.

Phase: Foundation (No Optimization)Lexer & Parser

Full AST construction (not single-pass)
Type checking and validation
Symbol table with scope tracking

Basic Code Generation

Naive stack-based code gen
All variables in memory
All operations through A register
Function calls via CALL/RET

Testing Infrastructure# Compile python ucow.py test.cow -o test.mac # Assemble um80 test.mac -o test.rel # Link ul80 test.rel -o test.com # Run cpmemu test.com

Phase: Essential OptimizationsConstant Folding & Propagation

Evaluate constant expressions at compile time.

# Before var x := 10 + 20; var y := x * 2; # After var x := 30; var y := 60;

Multi-pass propagation through the AST.

Dead Code Elimination

Build CFG, compute reachability, remove unreachable code.

# Before sub Foo() is return; print("never"); # dead end sub; # After sub Foo() is return; end sub;Common Subexpression Elimination (CSE)

Track available expressions, reuse computed values.

# Before var a := x + y; var b := x + y; # redundant # After var a := x + y; var b := a;

For 8080, this is critical - arithmetic is expensive.

Strength Reduction

Replace expensive operations with cheaper ones.

Original
Replacement
Savings
x * 2
x + x
MUL→ADD
x * 4
x << 2
MUL→shifts
x * 2^n
x << n
MUL→shifts
x / 2
x >> 1
DIV→shift
x % 2
x & 1
MOD→AND

On 8080 without MUL/DIV instructions, this is huge.

Phase: Control Flow OptimizationsBranch Optimization; Before JZ SKIP JMP NEXT SKIP: ... NEXT: ; After JNZ NEXT ... NEXT:

Eliminate jumps to jumps
Invert conditions to remove unconditional jumps
Thread jumps through empty blocks

Loop-Invariant Code Motion

Move computations that don't change inside a loop to outside.

# Before while i < n loop var offset := base + stride; # invariant arr[i] := arr[i] + offset; i := i + 1; end loop; # After var offset := base + stride; while i < n loop arr[i] := arr[i] + offset; i := i + 1; end loop;Loop Unrolling (Small Loops)

For loops with known small iteration counts (2-8), unroll completely.

# Before var i: uint8 := 0; while i < 4 loop arr[i] := 0; i := i + 1; end loop; # After arr[0] := 0; arr[1] := 0; arr[2] := 0; arr[3] := 0;

Eliminates loop overhead (compare, branch, increment).

Phase: Register AllocationLive Variable Analysis

Backward dataflow to determine which variables are live at each point.

The biggest single optimization for 8080.

Strategy: Linear Scan (simpler than graph coloring)

Compute live ranges for all variables
Sort by start position
Allocate registers greedily, spill to memory when needed

8080 Register Priorities:

HL - pointer operations, array access DE - secondary pointer, 16-bit values BC - loop counters, 16-bit values A - arithmetic (implicit, always available)

Allocation Rules:

Pointers → HL preferred
Loop counters → BC preferred
16-bit arithmetic → DE or BC
8-bit temps → B, C, D, E

Eliminate unnecessary MOVs by giving source and dest the same register.

; Before MOV A, B MOV C, A ; A is now dead ; After (allocate B=C) ; eliminated entirely

Phase: Memory & Data OptimizationsStatic Variable Overlay

Cowgol already does this conservatively. With full call-graph analysis, we can do it optimally.

sub A() calls B(), C() sub B() uses x, y sub C() uses z, w # x,y and z,w can share memory (B and C never concurrent)Workspace Minimization

Nested subroutines share workspace. Minimize total workspace by optimal variable packing.

Constant Pooling

Share identical constants in memory.

print("Error"); print("Error"); # share the string

Phase: InliningSubroutine Inlining

Since Cowgol forbids recursion, we can inline aggressively.

Inline Criteria:

Small subroutines (< 20 instructions)
Called once
Called in hot loops
Leaf functions (no calls)

Benefits:

Eliminates CALL/RET overhead (27 cycles)
Exposes more optimization opportunities (CSE, constant prop)
Enables register allocation across call sites

Interface Devirtualization

When an interface variable has only one possible implementation, inline it.

interface Cmp(a: uint8, b: uint8): (r: int8); sub NumCmp implements Cmp is ... end sub; var cmp: Cmp := NumCmp; # If NumCmp is the only implementation, inline calls to cmp()

Phase: Peephole OptimizationPattern Matching on Assembly

After code generation, pattern-match and replace.

Pattern
Replacement
PUSH r; POP r
(delete)
MOV A,r; MOV r,A
(delete second)
LDA addr; STA addr
(delete STA)
JMP L; L:
(delete JMP)
XRA A
(better than MVI A,0)
INR A; DCR A
(delete both)
ADD A
(same as RLC for *2)Instruction Selection

Choose optimal instructions during code gen.

; Load 0 into A MVI A, 0 ; 7 cycles, 2 bytes XRA A ; 4 cycles, 1 byte ← better ; Compare A to 0 CPI 0 ; 7 cycles, 2 bytes ORA A ; 4 cycles, 1 byte ← better (sets Z flag) ; 16-bit increment INR L ; need to handle carry ; vs INX H ; 5 cycles, 1 byte ← better

Phase: Advanced Optimizations (Future)SSA Form

Convert to Static Single Assignment for:

Better constant propagation
Better dead code elimination
Cleaner dataflow analysis

Instruction Scheduling

Reorder instructions to minimize stalls (less critical on 8080 than modern CPUs).

Profile-Guided Optimization

Run program, collect execution counts, optimize hot paths.

ladislau szilagyi

unread,

Dec 1, 2025, 4:48:24 AMDec 1

to RC2014-Z80

Hi,

from your text, it seems that the target is the 8080, not Z80. Is this correct?

Ladislau

Reply all

Reply to author

Forward

0 new messages