My primary concern is that the basic charsets (alphabetic, numeric, whitespace, etc.) have no overlap, as is the case with ASCII.
This defines the Numeric_Types of Decimal, Digit, or Numeric (or None).
Decimal has a one-to-one correspondence with category Nd; the other cases are mixed.
Allowing Numeric would include many characters whose primary category is Letter, breaking the no-overlap rule.
For example, Sichuan (famous for its spicy cuisine) is written 四川, literally "Four Rivers."
The character for four can actually be used in a positional decimal system, but in processing modern text it is more likely to indicate a word.
The same holds for the names of many well-known neighborhoods of Tokyo (Mitaka, Yotsuya, Roppongi), which are also written with numeral characters.
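To make the overlap concrete, here is a rough sketch in Python, assuming its standard unicodedata module tracks the Unicode Character Database closely enough for this purpose. The helper name numeric_type is made up for illustration and is not part of any API.

```python
import unicodedata

def numeric_type(ch):
    """Approximate the Numeric_Type of one character (hypothetical helper)."""
    # A Decimal character carries decimal, digit, and numeric values;
    # a Digit carries digit and numeric; a Numeric carries only numeric.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

# ASCII four, then the two characters of Sichuan.
for ch in "4\u56db\u5ddd":
    print(ch, unicodedata.category(ch), numeric_type(ch))
```

The ASCII 4 should come out as Nd and Decimal, while the character for four should come out as a Letter (Lo) that is nonetheless Numeric, which is exactly the overlap I would like to avoid.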
The other problem, as you note, is that Numeric includes fractions; since digit-value is defined in terms of char-numeric? and must return 0..9, these are clearly out.
The question, then, is whether we include Digits. They are the following short list:
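A quick way to see the fraction problem, again sketched with Python's unicodedata module as a stand-in for the Unicode data (an assumption on my part, not anything from the draft):

```python
import unicodedata

half = "\u00bd"  # VULGAR FRACTION ONE HALF

category = unicodedata.category(half)       # general category of the fraction
value = unicodedata.numeric(half)           # its numeric value as a float
digit = unicodedata.digit(half, None)       # None when there is no digit value

print(category, value, digit)
```

The character has a numeric value but no digit value, so nothing in 0..9 could sensibly be returned for it.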
00B2..00B3 ; Digit # No [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE
00B9 ; Digit # No SUPERSCRIPT ONE
1369..1371 ; Digit # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA ; Digit # No NEW TAI LUE THAM DIGIT ONE
2070 ; Digit # No SUPERSCRIPT ZERO
2074..2079 ; Digit # No [6] SUPERSCRIPT FOUR..SUPERSCRIPT NINE
2080..2089 ; Digit # No [10] SUBSCRIPT ZERO..SUBSCRIPT NINE
2460..2468 ; Digit # No [9] CIRCLED DIGIT ONE..CIRCLED DIGIT NINE
2474..247C ; Digit # No [9] PARENTHESIZED DIGIT ONE..PARENTHESIZED DIGIT NINE
2488..2490 ; Digit # No [9] DIGIT ONE FULL STOP..DIGIT NINE FULL STOP
24EA ; Digit # No CIRCLED DIGIT ZERO
24F5..24FD ; Digit # No [9] DOUBLE CIRCLED DIGIT ONE..DOUBLE CIRCLED DIGIT NINE
24FF ; Digit # No NEGATIVE CIRCLED DIGIT ZERO
2776..277E ; Digit # No [9] DINGBAT NEGATIVE CIRCLED DIGIT ONE..DINGBAT NEGATIVE CIRCLED DIGIT NINE
2780..2788 ; Digit # No [9] DINGBAT CIRCLED SANS-SERIF DIGIT ONE..DINGBAT CIRCLED SANS-SERIF DIGIT NINE
278A..2792 ; Digit # No [9] DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE
10A40..10A43 ; Digit # No [4] KHAROSHTHI DIGIT ONE..KHAROSHTHI DIGIT FOUR
10E60..10E68 ; Digit # No [9] RUMI DIGIT ONE..RUMI DIGIT NINE
11052..1105A ; Digit # No [9] BRAHMI NUMBER ONE..BRAHMI NUMBER NINE
1F100..1F10A ; Digit # No [11] DIGIT ZERO FULL STOP..DIGIT NINE COMMA
These seem more like symbols than numbers, and in only one case is there a consecutive range 0..9 (the subscripts).
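The point about consecutive ranges can be checked with a small sketch, assuming Python's unicodedata module as the data source. The helper digit_run is hypothetical.

```python
import unicodedata

def digit_run(first, last):
    """Digit values for code points first..last inclusive (hypothetical helper).

    Code points without a digit value show up as None.
    """
    return [unicodedata.digit(chr(cp), None) for cp in range(first, last + 1)]

subscripts = digit_run(0x2080, 0x2089)    # SUBSCRIPT ZERO..SUBSCRIPT NINE
circled = digit_run(0x2460, 0x2468)       # CIRCLED DIGIT ONE..CIRCLED DIGIT NINE
superscripts = digit_run(0x2070, 0x2079)  # SUPERSCRIPT ZERO, a gap, then FOUR..NINE

print(subscripts)
print(circled)
print(superscripts)
```

The subscripts cover 0..9 in code point order, the circled digits start at one, and the superscript block has holes where ONE, TWO, and THREE live elsewhere (00B9, 00B2..00B3).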
Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits.
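As a sketch of what that property buys, here is a Decimal-only version in Python. The functions char_numeric, digit_value, and parse_decimal are hypothetical stand-ins for illustration, not proposed definitions, and None stands in for #f.

```python
import unicodedata

def char_numeric(ch):
    """True only for characters with a decimal value (Numeric_Type Decimal)."""
    return unicodedata.decimal(ch, None) is not None

def digit_value(ch):
    """Value 0..9 for a Decimal character, else None."""
    return unicodedata.decimal(ch, None)

def parse_decimal(s):
    """Read a non-empty string of Decimal characters positionally.

    Returns None if the string is empty or any character is not Decimal.
    """
    if not s:
        return None
    total = 0
    for ch in s:
        if not char_numeric(ch):
            return None
        total = total * 10 + digit_value(ch)
    return total

print(parse_decimal("\u0664\u0662"))  # ARABIC-INDIC DIGIT FOUR, then TWO
print(parse_decimal("4\u2082"))       # ASCII 4 followed by SUBSCRIPT TWO
print(parse_decimal("\u56db"))        # the character for four
```

With this definition every character accepted by char_numeric can take part in positional notation, and digit_value never has to answer for a fraction, a subscript, or a letter.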
Thoughts?
--
Alex