Feb 9, 2019, 1:29:46 PM

to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com

The definition of char-numeric? refers to a nonexistent Unicode property, Numeric_Digit. The intention was to refer to characters whose Numeric_Type property is either Digit or Decimal, that is, the characters with a non-empty value in field 7 (counting from 0) of the UnicodeData.txt file. The simplest fix is to replace "Numeric_Digit" in the relevant paragraph of Section 6.6 with "Numeric_Type=Digit or Numeric_Type=Decimal".

(Similarly, R6RS refers to the nonexistent property "Numeric", which could be interpreted either as above, or as also including characters with Numeric_Type=Numeric, which are non-digits such as fractions and powers of ten. Those characters have a non-empty value in field 8 of UnicodeData.txt but empty fields 6 and 7.)
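To make the three Numeric_Type values concrete, here is a minimal sketch in Python rather than Scheme. It assumes that the standard `unicodedata` functions `decimal`, `digit`, and `numeric` correspond to fields 6, 7, and 8 of UnicodeData.txt (Python additionally draws on Unihan data for ideographs). The helper name `numeric_type` is hypothetical, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def numeric_type(ch: str) -> str:
    """Approximate the Unicode Numeric_Type of a single character."""
    # Each lookup returns the supplied default (None) when the
    # corresponding value is not defined for the character.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"   # decimal, digit and numeric values all defined
    if unicodedata.digit(ch, None) is not None:
        return "Digit"     # digit and numeric values defined, no decimal value
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"   # only a numeric value defined
    return "None"


# ASCII digit, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO,
# VULGAR FRACTION ONE HALF, and a plain letter.
for ch in "7\u0663\u00b2\u00bdx":
    print(f"U+{ord(ch):04X} {numeric_type(ch)}")
```

Under this sketch, the proposed wording "Numeric_Type=Digit or Numeric_Type=Decimal" corresponds to `numeric_type(ch) in ("Decimal", "Digit")`.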

--

John Cowan http://vrici.lojban.org/~cowan co...@ccil.org

"But I am the real Strider, fortunately," he said, looking down at them

with his face softened by a sudden smile. "I am Aragorn son of Arathorn,

and if by life or death I can save you, I will."

Feb 10, 2019, 9:48:34 AM

to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com

What about using the category Nd (Number, Decimal Digit)?

--

Alex

--

You received this message because you are subscribed to the Google Groups "scheme-reports-wg2" group.

Feb 10, 2019, 9:09:19 PM

to scheme-re...@googlegroups.com

That's a reasonable alternative, yes. However, the other digits are in fact digits; they just don't participate in a positional decimal number scheme. Let's discuss further.

You received this message because you are subscribed to the Google Groups "scheme-reports-wg1" group.

Feb 12, 2019, 3:16:48 PM

to scheme-re...@googlegroups.com

My primary concern is that the basic charsets (alphabetic, numeric, whitespace, etc.) have no overlap, as is the case with ASCII.

The summary of these derived types is in: https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedNumericType.txt

This defines the Numeric_Types of Decimal, Digit, or Numeric (or None).

Decimal has a one-to-one correspondence with category Nd; the other cases are mixed.

Allowing Numeric includes many characters with a primary category of Letter, breaking the no-overlap rule.

For example, Sichuan (famous for its spicy cuisine) is written 四川, literally "Four Rivers."

The character for four can actually be used in a positional decimal system, but in processing modern text it is more likely to indicate a word.

Similarly with many well-known neighborhoods of Tokyo (Mitaka, Yotsuya, Roppongi).
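The overlap described above can be checked with a short Python sketch. It assumes that `unicodedata.category` and `unicodedata.numeric` reflect the Unicode data in question (Python takes ideograph values from Unihan); the function name `numeric_letters` is hypothetical.

```python
import unicodedata


def numeric_letters(text: str) -> list:
    """Return the characters of text that are Letters yet carry a numeric value."""
    return [
        ch for ch in text
        if unicodedata.category(ch).startswith("L")      # Lu, Ll, Lt, Lm, Lo
        and unicodedata.numeric(ch, None) is not None    # has some numeric value
    ]


# The two characters of "Sichuan": U+56DB (four) and U+5DDD (river).
print(numeric_letters("\u56db\u5ddd"))
```

A char-numeric? that accepted Numeric_Type=Numeric would return true for the first character of that place name even though its general category is Lo.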

The other problem, as you note, is that Numeric includes fractions. Since digit-value is defined in terms of char-numeric? and must return 0..9, these are clearly out.
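The dependency between the two procedures can be sketched in Python. This is an illustration under the Nd-only reading raised earlier in the thread, not the wording of the report; `char_numeric` and `digit_value` are hypothetical stand-ins for the Scheme procedures, and `None` stands in for `#f`.

```python
import unicodedata
from typing import Optional


def char_numeric(ch: str) -> bool:
    """Hypothetical char-numeric?, restricted to general category Nd."""
    return unicodedata.category(ch) == "Nd"


def digit_value(ch: str) -> Optional[int]:
    """Hypothetical digit-value: 0..9 for numeric characters, else None."""
    if not char_numeric(ch):
        return None  # stands in for #f on any non-numeric character
    # Assumption: every Nd character has a decimal value in Python's data.
    return unicodedata.decimal(ch)


print(digit_value("3"), digit_value("\u0664"), digit_value("\u00bd"))
```

Because a fraction such as U+00BD has no value in 0..9, any definition of char-numeric? that accepted it would leave digit-value with nothing sensible to return.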

The question is then do we include Digits? They are the following short list:

00B2..00B3   ; Digit # No   [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE
00B9         ; Digit # No       SUPERSCRIPT ONE
1369..1371   ; Digit # No   [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA         ; Digit # No       NEW TAI LUE THAM DIGIT ONE
2070         ; Digit # No       SUPERSCRIPT ZERO
2074..2079   ; Digit # No   [6] SUPERSCRIPT FOUR..SUPERSCRIPT NINE
2080..2089   ; Digit # No  [10] SUBSCRIPT ZERO..SUBSCRIPT NINE
2460..2468   ; Digit # No   [9] CIRCLED DIGIT ONE..CIRCLED DIGIT NINE
2474..247C   ; Digit # No   [9] PARENTHESIZED DIGIT ONE..PARENTHESIZED DIGIT NINE
2488..2490   ; Digit # No   [9] DIGIT ONE FULL STOP..DIGIT NINE FULL STOP
24EA         ; Digit # No       CIRCLED DIGIT ZERO
24F5..24FD   ; Digit # No   [9] DOUBLE CIRCLED DIGIT ONE..DOUBLE CIRCLED DIGIT NINE
24FF         ; Digit # No       NEGATIVE CIRCLED DIGIT ZERO
2776..277E   ; Digit # No   [9] DINGBAT NEGATIVE CIRCLED DIGIT ONE..DINGBAT NEGATIVE CIRCLED DIGIT NINE
2780..2788   ; Digit # No   [9] DINGBAT CIRCLED SANS-SERIF DIGIT ONE..DINGBAT CIRCLED SANS-SERIF DIGIT NINE
278A..2792   ; Digit # No   [9] DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE
10A40..10A43 ; Digit # No   [4] KHAROSHTHI DIGIT ONE..KHAROSHTHI DIGIT FOUR
10E60..10E68 ; Digit # No   [9] RUMI DIGIT ONE..RUMI DIGIT NINE
11052..1105A ; Digit # No   [9] BRAHMI NUMBER ONE..BRAHMI NUMBER NINE
1F100..1F10A ; Digit # No  [11] DIGIT ZERO FULL STOP..DIGIT NINE COMMA
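A list like this can be approximated from Python's bundled Unicode data. The sketch below assumes that a character with a digit value but no decimal value has Numeric_Type=Digit; the function name `digit_only_code_points` is hypothetical, and the exact result depends on the Unicode version of the interpreter, so it may not match the list above entry for entry.

```python
import sys
import unicodedata


def digit_only_code_points() -> list:
    """Code points that have a digit value but no decimal value."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if (unicodedata.decimal(ch, None) is None
                and unicodedata.digit(ch, None) is not None):
            result.append(cp)
    return result


digits = digit_only_code_points()
print(len(digits), "code points, starting with",
      [f"U+{cp:04X}" for cp in digits[:3]])
```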

These seem more like symbols than numbers, and in only one case is there a consecutive range 0..9 (the subscripts).

Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits.

Thoughts?

--

Alex

Feb 12, 2019, 3:29:34 PM

to scheme-reports-wg1

On Tue, Feb 12, 2019 at 12:16 PM Alex Shinn <alex...@gmail.com> wrote:

Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits. Thoughts?

I admit that I am a Unicode novice, but I find your argument convincing. It's always possible for someone who wants to take advantage of other numeric representations to do so, and it's likely that bugs will result from making char-numeric? return true for these.

Feb 12, 2019, 5:59:30 PM

to scheme-re...@googlegroups.com

On Tue, Feb 12, 2019 at 3:16 PM Alex Shinn <alex...@gmail.com> wrote:

Allowing Numeric includes many characters with a primary category of Letter, breaking the no-overlap rule.

Yes, I agree that Numeric makes no sense, even though that's what R6RS says.

These seem more like symbols than numbers, and in only one case is there a consecutive range 0..9 (the subscripts).

I don't think the consecutive range matters much.

Admittedly, this is subjective, but I think it is a nice property if char-numeric? indicates the char can be used in a positional decimal system, which rules out Digits.

I think the subscript and superscript digits actually are used positionally (x^10, a_22), but that's cherry-picking. Digit it is. I'll fix the errata list and draft.

--

John Cowan http://vrici.lojban.org/~cowan co...@ccil.org

The peculiar excellence of comedy is its excellent fooling, and Aristophanes's

claim to immortality is based upon one title only: he was a master maker

of comedy, he could fool excellently. Here Gilbert stands side by side

with him. He, too, could write the most admirable nonsense. There has

never been better fooling than his, and a comparison with him carries

nothing derogatory to the great Athenian. --Edith Hamilton, The Greek Way
