I have some notes on the topic based on Dyalog and BQN experience here:
https://mlochbaum.github.io/BQN/implementation/primitive/types.html
I think you're underestimating how much work bit booleans are. With 2D
or higher-rank arrays, even simple operations can be very tricky to
implement well. On the other hand much of J's code for 1-byte characters
could be directly reused for integers, as structural manipulations and
searching don't depend on arithmetic or ordering.
Promotion to a higher type can be a practical problem which is one
reason I emphasize 2-byte integers on that page. There's also the issue
of getting primitives to output a small type when applicable, which is
not that hard in a new interpreter but may be more of an annoyance when
changing a mature one.
Decreasing element type by a factor of 2 halves the per-element cost of
arithmetic, so it will be significant as long as the arrays aren't so
small that the program mostly spends on constant per-array costs. This
holds at about 1000 elements in CBQN, which I wouldn't say is too large.
The following benchmark for <./\ seems reasonably representative:
https://mlochbaum.github.io/bencharray/output/plot/scan-min.svg
(main page is
https://mlochbaum.github.io/bencharray/pages/summary.html)
Marshall