Most of the simpler 16bit processors I've fiddled with didn't have
that many registers, but some were certainly RISC-like (by necessity of
the times).
The DG Nova basically has 4 registers, of which one you use as the
stack pointer and two cannot be used for indexing, because one index
combination is "no index" and another is "PC relative" (PC is not a
generic register on the Nova). The encoding was pretty tight, although
byte pointers and a hardware stack (as opposed to software conventions)
didn't arrive until the Nova 4, and load immediate not until the
Eclipse. On the Nova you load constants PC relative, which means the
assembler has to know how to spill constants and jump around them to
keep them in range, or hide them in skip slots. ARM draws a bunch of
stuff from the Nova, including some of the conditional execution and
"operation with extra effect bit" mindset.
The indexing form is nice because it's basically two bits into a four
way mux of 0, PC and register 2 or 3.
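
To make the four way mux concrete, here is a rough sketch in Python. The
bit positions are my assumption of the usual Nova memory reference layout
(index mode in bits 9-8 and displacement in bits 7-0, counting from the
least significant bit), and the function name is made up for illustration.
It ignores the indirect bit and the auto-increment locations entirely.

```python
def nova_effective_address(instr, pc, ac2, ac3):
    """Pre-indirection effective address of a Nova-style memory reference.

    Assumed layout (LSB = bit 0): bits 9-8 index mode, bits 7-0 displacement.
    """
    mode = (instr >> 8) & 0x3
    disp = instr & 0xFF
    if mode == 0:
        # "No index": the mux selects 0 and the displacement is unsigned,
        # so this reaches the first 256 words of memory.
        base = 0
    else:
        # Modes 1..3: the mux selects PC, AC2 or AC3 and the displacement
        # is treated as a signed 8-bit value.
        if disp & 0x80:
            disp -= 0x100
        base = (pc, ac2, ac3)[mode - 1]
    # Addresses are 15 bits wide.
    return (base + disp) & 0x7FFF
```

The point is that the two mode bits do nothing cleverer than pick the
base value, which is why the hardware for it is so small.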
RISC16 is another interesting one - it's lacking a few essentials
(shifts) you'd really need to add for any actual use, but it's
basically an ultra simple 16bit RISC CPU intended for teaching
simple, pipelined and out of order CPU design in US university
courses:
https://user.eng.umd.edu/~blj/risc/
Code density is quite poor though, which is a big deal for 16bit IMHO.
The most beautiful 16bit simple instruction set I know is probably the
SEL System72 (apart from the fact it has special I/O instructions).
The encoding really was a thing of elegance - a single instruction
format and a totally consistent set of flag bits on the instructions. It
does however require that the registers also map as memory words 0-7. Like
many period systems of this form it had options for fast SRAM (very
pricey at the time) in the lowest memory locations, and ran much faster
with that fitted than with just core. The SEL encoding is
[R][I][X][op][S][displacement]
with 5 operation bits. R, I, X and S are relative, indirect, xreg and
signed, and together they generate an address.
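
As a rough illustration of how little decoding a single format needs,
here is a Python sketch that splits a word into those fields. Two things
here are my assumptions rather than anything from the SEL documentation:
that the fields are packed most significant bit first in the order shown,
and that the displacement is the 7 bits left over in a 16 bit word after
the four flags and 5 operation bits. The function name is made up.

```python
def split_sel_fields(word):
    """Split a 16-bit word into [R][I][X][op][S][displacement].

    Assumes MSB-first packing and a 7-bit displacement (16 - 4 flags - 5 op).
    """
    return {
        "R": (word >> 15) & 1,      # relative
        "I": (word >> 14) & 1,      # indirect
        "X": (word >> 13) & 1,      # use the index register
        "op": (word >> 8) & 0x1F,   # 5 operation bits
        "S": (word >> 7) & 1,       # signed displacement
        "disp": word & 0x7F,        # assumed 7-bit displacement
    }
```

Every instruction goes through the same split, so there is no format
decode step before you can start on the address.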
You could also just use a 12ns RAM and not bother with separate
registers. It's how core based machines often worked in the days of
slow cycle times, and it's also how some more modern minicomputers
like the Gould systems did it, but with cache RAMs. They avoided all the
cache complexity by just using fast SRAM, back when fast SRAM, although
pricey and power hungry, basically ran at CPU clock speed.
Anyway that was more of a diversion on "you probably only need four or
eight registers" 8)
Alan