Julian:  I'm still very much a non-contributor here, but I want to present a (possibly) opposing point of view to your excellent definition of "8-bit processor".  I think that it's important to consider two perspectives.
The first is the hardware perspective, and there I agree that ALU width should be the defining characteristic.  You don't use the term "ALU" specifically, but I think that's (maybe) what you mean by the expression "performs arithmetic on, at most, 8-bit quantities in a single instruction".
The second is from the software (by which I mean "programming model") perspective.  Register width comes, I think, closest to a clear basis for "programming model", but as you point out register-width can be a misnomer here.
The conundrum, at least for me, is the gap between the two when there is a multi-cycle instruction architecture where a narrower ALU is used to implement a wider data path and thus register width.  That gap is often defined in terms of a microcoded design for an algorithmic state machine.  The instruction set that is presented to the programmer is layered over an internal "lower level" set of instructions that operate at the register-transfer level (RTL) to implement the overall instruction set architecture (ISA).
An example is the initial implementation of the PDP-11 ISA by DEC using an LSI process.  They chose to use only an internal 8-bit data path and an 8-bit-wide ALU, with a multi-cycle implementation of the 16-bit ISA on top.
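To make the gap concrete, here is a minimal sketch of the general idea: a 16-bit add, as the programmer sees it, carried out as two passes through an 8-bit ALU with the carry handed from the low byte to the high byte.  The names alu8_add and add16 are hypothetical and this is not DEC's actual microcode, just an illustration of the technique.

```python
def alu8_add(a, b, carry_in):
    """Hypothetical 8-bit ALU: add two bytes plus a carry bit.

    Returns (sum_byte, carry_out).
    """
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add16(x, y):
    """16-bit add as the ISA presents it, done in two 8-bit ALU passes."""
    # Pass 1: low bytes, no carry in.
    lo, carry = alu8_add(x & 0xFF, y & 0xFF, 0)
    # Pass 2: high bytes, carry in from the low-byte pass.
    hi, carry = alu8_add((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)
    return (hi << 8) | lo, carry


if __name__ == "__main__":
    result, carry = add16(0x12FF, 0x0001)
    print(hex(result), carry)
```

From the outside, add16 is a single 16-bit operation; only the two calls to alu8_add reveal how narrow the arithmetic hardware is assumed to be.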
Is that an 8-bit processor by your definition?  Possibly yes, but the consequence is inconsistent with the ISA and thus the programming model that the realized architecture implements.  I don't think that anyone would call the DEC LSI-based 11/03 an 8-bit processor, despite what's happening inside.  I don't think that many folks in the community even know how the PDP-11 ISA is implemented there -- they just see a low-power, slow(er), cheap implementation of the "standard" 16-bit-wide instruction set and its 16-bit-wide arithmetic data path.
On the other hand, I could interpret your claim as "no" if I decide that "performs arithmetic on" is intended to _not_ constrain the actual ALU.  (Another example of the mismatch would be bit-serial implementations like the PDP-8/S -- the ALU is only one bit wide, plus carry, and 12 bits are processed serially in the course of executing one instruction.)
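The bit-serial case can be sketched the same way.  This is only an illustration under my own assumptions (the names full_adder and serial_add12 are hypothetical, and it does not model the PDP-8/S's actual logic or timing): a one-bit full adder plus a carry flip-flop, stepped twelve times to produce one 12-bit sum.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def serial_add12(x, y):
    """Add two 12-bit words one bit at a time, least significant bit first."""
    result = 0
    carry = 0
    for i in range(12):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


if __name__ == "__main__":
    total, carry = serial_add12(0o1234, 0o0123)
    print(oct(total), carry)
```

The "ALU" here is one bit wide, yet the instruction it implements operates on a 12-bit quantity.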
In the end, we probably don't disagree, but I think that your definition would be stronger if you included a clarification that "performs arithmetic on" is not necessarily equivalent to "ALU".  Just as you're trying to tighten the definition on the "up side", I'd like to tighten the definition on the "down side" :->.
Thanks for your excellent proposal!
paul