Erratum #28 to R7RS-small: char-numeric? is technically not well defined.

John Cowan

Feb 9, 2019, 1:29:46 PM
to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com
The definition of char-numeric? refers to a nonexistent Unicode property Numeric_Digit.  The intention was to refer to characters whose Numeric_Type property is either Digit or Decimal, which is defined by Unicode as those characters with a non-empty value in field 7 of the UnicodeData.txt file. The simplest fix is to change the relevant paragraph of Section 6.6 from "Numeric_Digit" to "Numeric_Type=Digit or Numeric_Type=Decimal". 

(Similarly, R6RS refers to the nonexistent property "Numeric", which could be interpreted either as above, or as also including characters with Numeric_Type=Numeric, which are non-digits such as fractions and powers of ten.  These characters have non-empty values in field 8 of UnicodeData.txt.)
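
As a rough illustration of the field 6/7/8 distinction above, here is a minimal sketch in Python. It assumes that the standard `unicodedata` functions `decimal`, `digit`, and `numeric` mirror the decimal, digit, and numeric fields of UnicodeData.txt for whatever Unicode version the running Python ships with. The helper name `numeric_type` is made up for this example and is not part of any Scheme report.

```python
import unicodedata

def numeric_type(ch: str) -> str:
    """Approximate the Unicode Numeric_Type of a single character.

    Hypothetical helper: infers the type from which numeric fields are
    populated, checking the most restrictive field first.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"   # decimal, digit, and numeric fields all set
    if unicodedata.digit(ch, None) is not None:
        return "Digit"     # digit and numeric fields set, decimal empty
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"   # only the numeric field set
    return "None"

# ASCII 7, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF, a
for ch in ["7", "\u0663", "\u00b2", "\u00bd", "a"]:
    print(f"U+{ord(ch):04X}", numeric_type(ch))
```

Under this reading, "Numeric_Type=Digit or Numeric_Type=Decimal" corresponds to the first two branches, and the broader R6RS interpretation would also admit the third.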

-- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
"But I am the real Strider, fortunately," he said, looking down at them
with his face softened by a sudden smile.  "I am Aragorn son of Arathorn,
and if by life or death I can save you, I will."

Alex Shinn

Feb 10, 2019, 9:48:34 AM
to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com
What about using the category Nd (Number, Decimal Digit)?

--
Alex

John Cowan

Feb 10, 2019, 9:09:19 PM
to scheme-re...@googlegroups.com
That's a reasonable alternative, yes.  However, the other digits are in fact digits, they just don't participate in a positional decimal number scheme.  Let's discuss further.

Alex Shinn

Feb 12, 2019, 3:16:48 PM
to scheme-re...@googlegroups.com
My primary concern is that the basic charsets (alphabetic, numeric, whitespace, etc.) have no overlap, as is the case with ASCII.

Unicode defines the Numeric_Type values Decimal, Digit, and Numeric (or None).
Decimal has a one-to-one correspondence with category Nd; the other cases are mixed.

Allowing Numeric includes many characters with a primary category of Letter, breaking the no-overlap rule.
For example, Sichuan (famous for its spicy cuisine) is written 四川, literally "Four Rivers."
The character for four can actually be used in a positional decimal system, but in processing modern text it is more likely to indicate a word.
Similarly with many well-known neighborhoods of Tokyo (Mitaka, Yotsuya, Roppongi).
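
The overlap can be checked with a small sketch, assuming Python's `unicodedata` module, whose `numeric` function also reports values for CJK ideographs such as 四. The helper name `letter_with_numeric_value` is invented for this illustration.

```python
import unicodedata

def letter_with_numeric_value(ch: str) -> bool:
    """True if ch has a Letter general category and also a numeric value.

    Hypothetical helper showing where a Numeric-based char-numeric?
    would overlap with the alphabetic class.
    """
    is_letter = unicodedata.category(ch).startswith("L")
    has_value = unicodedata.numeric(ch, None) is not None
    return is_letter and has_value

# The two characters of "Sichuan": only the first carries a numeric value.
for ch in "四川":
    print(ch, unicodedata.category(ch), unicodedata.numeric(ch, None),
          letter_with_numeric_value(ch))
```

A predicate built on Numeric_Type=Numeric would therefore answer true for a character that is also a letter, which is the overlap described above.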

The other problem, as you note, is that Numeric includes fractions, and since digit-value is defined in terms of char-numeric? and must return 0..9, these are clearly out.

The question is then do we include Digits?  They are the following short list:

00B2..00B3    ; Digit # No   [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE
00B9          ; Digit # No       SUPERSCRIPT ONE
1369..1371    ; Digit # No   [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA          ; Digit # No       NEW TAI LUE THAM DIGIT ONE
2070          ; Digit # No       SUPERSCRIPT ZERO
2074..2079    ; Digit # No   [6] SUPERSCRIPT FOUR..SUPERSCRIPT NINE
2080..2089    ; Digit # No  [10] SUBSCRIPT ZERO..SUBSCRIPT NINE
2460..2468    ; Digit # No   [9] CIRCLED DIGIT ONE..CIRCLED DIGIT NINE
2474..247C    ; Digit # No   [9] PARENTHESIZED DIGIT ONE..PARENTHESIZED DIGIT NINE
2488..2490    ; Digit # No   [9] DIGIT ONE FULL STOP..DIGIT NINE FULL STOP
24EA          ; Digit # No       CIRCLED DIGIT ZERO
24F5..24FD    ; Digit # No   [9] DOUBLE CIRCLED DIGIT ONE..DOUBLE CIRCLED DIGIT NINE
24FF          ; Digit # No       NEGATIVE CIRCLED DIGIT ZERO
2776..277E    ; Digit # No   [9] DINGBAT NEGATIVE CIRCLED DIGIT ONE..DINGBAT NEGATIVE CIRCLED DIGIT NINE
2780..2788    ; Digit # No   [9] DINGBAT CIRCLED SANS-SERIF DIGIT ONE..DINGBAT CIRCLED SANS-SERIF DIGIT NINE
278A..2792    ; Digit # No   [9] DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE
10A40..10A43  ; Digit # No   [4] KHAROSHTHI DIGIT ONE..KHAROSHTHI DIGIT FOUR
10E60..10E68  ; Digit # No   [9] RUMI DIGIT ONE..RUMI DIGIT NINE
11052..1105A  ; Digit # No   [9] BRAHMI NUMBER ONE..BRAHMI NUMBER NINE
1F100..1F10A  ; Digit # No  [11] DIGIT ZERO FULL STOP..DIGIT NINE COMMA
These seem more like symbols than numbers, and in only one case is there a consecutive range 0..9 (the subscripts).
Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits.
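
To make the two candidate definitions concrete, here is a hedged sketch in Python rather than Scheme. The function names are invented for the comparison, and the `digit_value` shown is only one possible reading of digit-value under the Nd-only definition, not the text of any report.

```python
import unicodedata

def numeric_by_nd(ch: str) -> bool:
    """Candidate A: general category Nd (Number, Decimal Digit)."""
    return unicodedata.category(ch) == "Nd"

def numeric_by_decimal_or_digit(ch: str) -> bool:
    """Candidate B: Numeric_Type is Decimal or Digit (digit field set)."""
    return unicodedata.digit(ch, None) is not None

def digit_value(ch: str):
    """Value 0..9 under candidate A, or None (standing in for #f)."""
    return unicodedata.decimal(ch, None) if numeric_by_nd(ch) else None

# ASCII 5, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, CIRCLED DIGIT ONE
for ch in ["5", "\u0663", "\u00b2", "\u2460"]:
    print(f"U+{ord(ch):04X}", numeric_by_nd(ch),
          numeric_by_decimal_or_digit(ch), digit_value(ch))
```

The two predicates differ only on characters like those in the list above, which have a digit value but general category No.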

Thoughts?

--
Alex

Arthur A. Gleckler

Feb 12, 2019, 3:29:34 PM
to scheme-reports-wg1
On Tue, Feb 12, 2019 at 12:16 PM Alex Shinn <alex...@gmail.com> wrote:
 
Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits.
Thoughts?

I admit that I am a Unicode novice, but I find your argument convincing.  It's always possible for someone who wants to take advantage of other numeric representations to do so, and it's likely that bugs will result from making char-numeric? return true for these.

John Cowan

Feb 12, 2019, 5:59:30 PM
to scheme-re...@googlegroups.com
On Tue, Feb 12, 2019 at 3:16 PM Alex Shinn <alex...@gmail.com> wrote:

Allowing Numeric includes many characters with a primary category of Letter, breaking the overlap rule.

Yes, I agree that Numeric makes no sense, even though that's what R6RS says.
 

These seem more like symbols than numbers, and in only one case is there a consecutive range 0..9 (the subscripts).

I don't think the consecutive range matters much.
 
Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits.

I think the subscript and superscript digits actually are used positionally (x^10, a_22), but that's cherry-picking.  Digit it is.  I'll fix the errata list and draft.

-- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
The peculiar excellence of comedy is its excellent fooling, and Aristophanes's
claim to immortality is based upon one title only: he was a master maker
of comedy, he could fool excellently.  Here Gilbert stands side by side
with him.  He, too, could write the most admirable nonsense.  There has
never been better fooling than his, and a comparison with him carries
nothing derogatory to the great Athenian. --Edith Hamilton, The Greek Way
