GLYPH ISA update and a reflection on "metacognition"

Michael Clark

Dec 23, 2025, 6:35:18 PM

to RISC-V ISA Dev, mic...@cogmeta.com

Hi Folks,

a year 2025 closing update on the GLYPH ISA, plus some reflection:

latest: https://metaparadigm.com/~mclark/glyph-20251105.pdf
current: https://metaparadigm.com/~mclark/glyph.pdf
source: https://github.com/michaeljclark/glyph

I believe this work is complementary to the work of the RISC-V community, which is why I am writing to you all here. it was deep reflection while working on the RISC-V QEMU "virt" machine that led to the inception of the GLYPH ISA, and in my opinion, as an emulator developer, it is an emulator writer's instruction set.

the goals were stated in 2017 here, although it was while working on the RISC-V QEMU "virt" machine that I came to believe that RISC-V in its current form is not ideal for the purposes that initially led me here. the main reasons were the cost of constant synthesis and the desire for closer alignment with the X86 instruction set:

GLYPH design

GLYPH was designed after reflection on the combinatoric complexity of X86 instruction decode and the complexity of constant synthesis in RISC-V. the GLYPH ISA adopts an even simpler length decoding scheme than present-day RISC-V, at the expense of losing a bit of coding space in the 16-bit packet. the goal is that after quantitative evaluation, we can win back some density with more compact constant synthesis due to the "constant stream," which is analogous to push constants in Vulkan.

GLYPH has 16/32/64-bit instruction packets. there is a 128-bit instruction packet, but there is no immediate intention to use it. at present, only the 16-bit packet has been placed, and it has been placed based on analysis of common X86 prolog and epilog codegen sequences seen in QEMU for X86-64, with special PC-relative loads and stores designed to fish pointers from .data sections and operations for register save/restore, pointer arithmetic, and loops. 16-bit instructions are presently limited exclusively to 64-bit arithmetic except for loading and adding sign-extended 32-bit constants. the instruction naming harks back to Motorola and Intel, i.e. we retain mov.

GLYPH intends to be closely aligned with the Intel/AMD X86 instruction set to build a C-HVM (a hardware virtual machine supporting the C language) that is hardware feasible:

source: https://github.com/michaeljclark/x86
source: https://github.com/michaeljclark/cinf

GLYPH evolution

it is intended that the 32-bit packet will contain scalar and SIMD128 instructions for a 128-bit unified integer and floating-point register file. the prototype is currently 64-bit. scalar port SIMD arithmetic will be limited to 64-bit, but it will feature 128-bit logical ops and shifts. I believe we can meet timing constraints with that because 128-bit shifts should have lower fan-out and latency than 64-bit carry-add chains.

the 64-bit instruction packet will contain Intel/AMD EVEX-like instructions for a 128/256/512-bit vector register file, plus the addition of an extended 4096-bit vector length (i32x128 and f32x128). there are no special mask registers because the scalar port is 128-bit. the 64-bit instruction packet appears to be a perfect fit for EVEX, and we can do this legally as a virtual target, like a compiler or VM. but it means we would need a license from Intel or AMD to make hardware. I am not trying to hide the fact that we will adopt and extend EVEX. and I intend to add 4-lane swizzle to common integer and floating point vector operations: VFADD, VFSUB, VFMUL, VFDIV, VFMIN, VFMAX:

float swizzle bits = { x, y, z, w, 0, 1, -1, NaN }
int swizzle bits = { x, y, z, w, 0, 1, -1, 2 }

this is so that we can reduce lane shuffle ops for common arithmetic. here is a 3D cross product, and it maps to current SPIR-V shader capabilities:

VFMUL r4 r1.yzx_ r2.zxy_ ; (ay*bz, az*bx, ax*by)
VFMUL r5 r1.zxy_ r2.yzx_ ; (az*by, ax*bz, ay*bx)
VFSUB r3 r4 r5 ; cross product in r3
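
to make the lane selection concrete, here is a minimal Python sketch of the three-instruction cross product above. it is a model for illustration only, not the GLYPH encoding: the helper names (swizzle, vfmul, vfsub, cross) are hypothetical, the "_" selector is assumed to mean an unused lane and is modelled as 0.0, and the -1 and NaN selectors are left out because the post gives no assembly syntax for them.

```python
# Hypothetical model of the 4-lane float swizzle described in the post.
# Assumption: "_" marks an unused lane and is modelled here as 0.0.
# The -1 and NaN selectors are omitted (no assembly syntax given for them).

def swizzle(v, sel):
    """Build a 4-lane tuple from v using a 4-character selector string."""
    table = {"x": v[0], "y": v[1], "z": v[2], "w": v[3],
             "0": 0.0, "1": 1.0, "_": 0.0}
    return tuple(table[c] for c in sel)

def vfmul(a, b):
    """Lane-wise multiply."""
    return tuple(p * q for p, q in zip(a, b))

def vfsub(a, b):
    """Lane-wise subtract."""
    return tuple(p - q for p, q in zip(a, b))

def cross(r1, r2):
    """3D cross product using two swizzled multiplies and one subtract."""
    r4 = vfmul(swizzle(r1, "yzx_"), swizzle(r2, "zxy_"))  # (ay*bz, az*bx, ax*by)
    r5 = vfmul(swizzle(r1, "zxy_"), swizzle(r2, "yzx_"))  # (az*by, ax*bz, ay*bx)
    return vfsub(r4, r5)

if __name__ == "__main__":
    print(cross((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)))
```

the point of the sketch is that the operand swizzles absorb what would otherwise be four separate lane shuffle ops.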

GLYPH status

what is happening with GLYPH right now?

nothing. I have stopped working on it until there is some community support. there have been small changes that I will outline here to update the community, because folks have kindly been providing anonymous feedback on the spec:

added carry flag arithmetic to the ADD and SUB instructions.
- there are no ADC or SBC instructions in the compressed page, so this is architectural bones for the carry-add predicate flag
adopted "true carry" similar to ARM and the IBM Power ISA.
- SUB carry was flipped. this saves a NOT gate and a CMC (complement carry) instruction in big number arithmetic with mixed ADD and SUB. here are SUB and SBC under this model:
  - sub(x,y) = x + ~y + 1
  - sbc(x,y) = x + ~y + carry
added capabilities and domains extensions for virtualization.
- privilege levels have been factored into a capability matrix. the design uses "trap routing" registers to control switches of capabilities for instructions in different capability sets. this de-duplicates mode or privilege level state and provides for "self-mapped" trap routing to emulate bare-metal environments. i.e. aliased capabilities.
added machine extension for software-defined memory management.
- the design adopts a TLB fill handler reminiscent of MIPS TLB fill. this makes it easier to support foreign MMU and IOMMU page table formats. I would call this an emulator writer's dream because it allows us to accelerate QEMU "soft-mmu". we also add "translation addresses" and formally define AS prefixes (address space prefixes) to solve the "address-space narrowing problem" in virtualization.
a physical memory self-mapping table and PTE T-bit were added.
- this allows the page walker to reuse the page tables for physical memory permissions, and it is similar to the physical memory self-maps set up by UEFI BIOS on X86 machines. it is designed for shadow page-table page monitoring to support a virtualization model closer to Xen, although it supports both type-1 and type-2 hypervisors.
added an environment extension for power, reset, and diagnostics.
- this is a very simple extension. it also supports POST codes.
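
as an illustration of the "true carry" SUB and SBC definitions in the list above, here is a small Python sketch of 64-bit arithmetic chained into a 128-bit subtract. the function names (add_with_carry, sub, sbc, sub128) are hypothetical and this models only the arithmetic identities given above, not the GLYPH flag register or encoding. under this convention a carry-out of 1 after SUB means "no borrow", so the flag feeds straight into SBC without being complemented.

```python
# Sketch of "true carry" subtraction on 64-bit words (names are hypothetical).
# sub(x, y)        = x + ~y + 1
# sbc(x, y, carry) = x + ~y + carry
MASK = (1 << 64) - 1

def add_with_carry(x, y, cin):
    """Return (64-bit result, carry-out) of x + y + cin."""
    total = (x & MASK) + (y & MASK) + cin
    return total & MASK, total >> 64

def sub(x, y):
    """True-carry subtract: carry-out 1 means no borrow occurred."""
    return add_with_carry(x, ~y & MASK, 1)

def sbc(x, y, carry):
    """Subtract with carry: consumes the carry from a previous sub/sbc as-is."""
    return add_with_carry(x, ~y & MASK, carry)

def sub128(x, y):
    """128-bit subtract built from a low-word sub and a high-word sbc."""
    lo, c = sub(x & MASK, y & MASK)
    hi, c = sbc((x >> 64) & MASK, (y >> 64) & MASK, c)
    return (hi << 64) | lo, c

if __name__ == "__main__":
    print(sub(5, 3))            # low word with no borrow
    print(sub128(1 << 64, 1))   # borrow propagates from low word to high word
```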

there are a few pending changes based on recent feedback:

add 'smask' beside 'starget' to multicast interrupts to thread groups
- map group/thread so that you can unicast to threads or multicast to groups of threads. e.g. 0x0000123400000001/0xffffffff00000000
'dommac' TLB handler needs signing.
- we could close a caps hole in the dommac capability at the expense of extra state (color * color); however, it is used by HV ('domdom') running the translator, not system ('domsys'), so it has a higher assumed level of trust. this is an advisory. machine-mode has an escape and must be trusted.
add an SMT extension
- unlike present-day SMT, we intend to pause SMT threads on capability set and domain transitions, to make sure that only one thread per core will transition to system, capability, domain, or machine capabilities at a time. there will only be one copy of the system vectors. so for SMT we only need (PC, IB) and the scalar and vector register files for each thread. this is because we believe SMT is better suited to multi-threading with coherent system vectors as opposed to incoherent threads with different page table vectors and capabilities, which seems super dangerous.
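
for the pending 'smask' change in the list above, one plausible reading of the target/mask pair is a masked compare on the thread address. the Python sketch below assumes that reading: the function name ipi_matches and the exact match rule are assumptions, not taken from the spec, and the split into a group in the upper 32 bits and a thread in the lower 32 bits is inferred only from the example value 0x0000123400000001/0xffffffff00000000.

```python
# Hypothetical match rule for 'starget'/'smask' (an assumption, not the spec):
# a thread receives the interrupt when its address agrees with the target
# on every bit that is set in the mask.

def ipi_matches(thread_addr, starget, smask):
    """Return True if thread_addr is selected by the target/mask pair."""
    return (thread_addr & smask) == (starget & smask)

GROUP_MASK = 0xffffffff00000000    # compare only the assumed group field
UNICAST_MASK = 0xffffffffffffffff  # compare every bit: one thread only

if __name__ == "__main__":
    target = 0x0000123400000001
    # multicast: any thread in group 0x00001234 matches
    print(ipi_matches(0x0000123400000007, target, GROUP_MASK))
    # unicast: only the exact thread address matches
    print(ipi_matches(0x0000123400000007, target, UNICAST_MASK))
```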

one of the defining characteristics of the work is that we have tried, where possible, to use ARPA naming, and that distinguishes this work from many other instruction set architectures, not in any essential functional way, but primarily in understandability. for example, instead of using topological and logical APIC IDs for IPI targets, we simply use "thread address" and "thread domain".

GLYPH future

btw can someone please forward this email update to Mark? why? maybe Mark wants to implement standardized "meta containers" for CPU/GPU/AI tasks as part of the Open Compute Project (OCP). and, we also have a tangential idea to apply AI to teaching as opposed to learning:

https://cogmeta.com/~mclark/epistemic-toolkit.pdf

(succinctly, but this can be skimmed over in this context):

Common Core curriculum emphasizes skills and outcomes (argumentation, evidence, modeling) but does not explicitly train epistemic habits as repeatable daily practices. our epistemic toolkit foregrounds contradiction, falsification, provenance, and uncertainty as everyday habits, rather than material reserved for advanced courses (debate, science, philosophy). it extends ideas from the mandatory IB Theory of Knowledge (TOK) course, with an emphasis on practice over theory.

it arose from the guiding question, “what did I miss at school?”

our metacognitive stance (morning reminder, afternoon reflection, evening practice) goes beyond typical lesson plans, which prioritize content coverage, assessment, and “homework”. the insistence on enumeration, edge-case testing, and model divergence aligns with the scientific method and the philosophy of science, which distinguishes it from standard K-12 pedagogy. it aligns with Common Core goals, but it is more rigorous, more reflective, and more self-aware.

finally, happy holidays to everyone taking a break right now.

Regards,
Michael.
