Erratum needed for char-foldcase description


co...@ccil.org

Jun 12, 2015, 2:51:41 PM
to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com
Currently, the description of the `char-foldcase` procedure in R7RS-small
says:

The char-foldcase procedure applies the Unicode simple
case-folding algorithm to its argument and returns the result.
Note that language-sensitive folding is not used.
If the argument is an uppercase letter, the result will be
either a lowercase letter or the same as the argument if the
lowercase letter does not exist or is not supported by the
implementation.

Alas, this will no longer be true in Unicode 8.0, to be released in a
few weeks. The Cherokee script is transitioning from having a single
case to having both upper and lower case, and the existing letters are
being repurposed as upper-case letters, just as SCRIPTUM LATINUM grew a
lower case during the Middle Ages. For backward compatibility, case
folding (which does not affect Cherokee letters in Unicode 7.0 and
earlier) will therefore map the new lower-case letters to upper case,
leaving the upper-case letters alone. All other scripts with case (Latin,
Greek, Cyrillic, Armenian, and a few others) will continue to map
upper-case letters to lower case.
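The asymmetry can be observed from Python, whose `str.casefold()` applies Unicode full case folding; for the single letters used here, full and simple folding give the same result. This is only a sketch, assuming Python 3.6 or later (whose Unicode tables are version 8.0 or newer); it says nothing about any particular Scheme implementation.

```python
import unicodedata

# Latin capital/small A, then CHEROKEE LETTER A (U+13A0, now the capital)
# and CHEROKEE SMALL LETTER A (U+AB70, new in Unicode 8.0).
samples = ["A", "a", "\u13A0", "\uAB70"]

# Fold each character; the Latin capital folds to lower case,
# while the Cherokee small letter folds to the Cherokee capital.
folded = {ch: ch.casefold() for ch in samples}

for ch, f in folded.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(f):04X} {unicodedata.name(f)}")

# Lower-casing still goes the "usual" direction for Cherokee:
# the capital U+13A0 lower-cases to the small letter U+AB70.
print(f"lower(U+13A0) -> U+{ord(chr(0x13A0).lower()):04X}")
```

So a description of folding in terms of "uppercase in, lowercase out" is true of `A` but false of U+AB70.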

I therefore propose stripping out the references to upper and lower
case, with the following result:

The char-foldcase procedure applies the Unicode simple
case-folding algorithm to its argument and returns the result.
Note that language-sensitive folding is not used. If the result
of folding is not supported by the implementation, the argument
is returned.
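A minimal Python model of the proposed wording, for illustration only. It assumes a hypothetical excerpt table `SIMPLE_FOLDING_EXCERPT` holding a few simple (status C and S) mappings from the Unicode CaseFolding.txt file, and a hypothetical predicate `latin1_only` standing in for an implementation whose characters are limited to Latin-1. Neither name comes from R7RS or any Scheme system.

```python
# Hypothetical excerpt of Unicode simple case folding (statuses C and S),
# taken from CaseFolding.txt; a real table would cover every mapped character.
SIMPLE_FOLDING_EXCERPT = {
    "\u0041": "\u0061",  # LATIN CAPITAL LETTER A -> LATIN SMALL LETTER A
    "\u00B5": "\u03BC",  # MICRO SIGN -> GREEK SMALL LETTER MU
    "\u1E9E": "\u00DF",  # LATIN CAPITAL LETTER SHARP S -> LATIN SMALL LETTER SHARP S (status S)
    "\uAB70": "\u13A0",  # CHEROKEE SMALL LETTER A -> CHEROKEE LETTER A (Unicode 8.0+)
}


def latin1_only(ch: str) -> bool:
    """Hypothetical implementation that supports only code points up to U+00FF."""
    return ord(ch) <= 0xFF


def char_foldcase(ch: str, supported=lambda c: True) -> str:
    """Model of the proposed wording: fold, but return the argument if the
    folded character is not supported. No reference to upper or lower case."""
    folded = SIMPLE_FOLDING_EXCERPT.get(ch, ch)  # unmapped characters fold to themselves
    return folded if supported(folded) else ch
```

With `latin1_only`, MICRO SIGN comes back unchanged, because its folded form (GREEK SMALL LETTER MU) lies outside Latin-1. Since the rule never asks whether the argument is upper or lower case, CHEROKEE SMALL LETTER A folding to a capital needs no special wording.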

Does anyone find this objectionable? I have added it to the unofficial
errata list.

--
weirdo: When is R7RS coming out?
Riastradh: As soon as the top is a beautiful golden brown and if you
stick a toothpick in it, the toothpick comes out dry.


Alex Shinn

Jun 12, 2015, 6:49:18 PM
to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com
On Sat, Jun 13, 2015 at 3:51 AM, <co...@ccil.org> wrote:
> [...] The Cherokee script is transitioning from having a single
> case to having both upper and lower case, and the existing letters are
> being repurposed as upper case letters, just as SCRIPTUM LATINUM grew a
> lower case during the Middle Ages. For backward compatibility, this
> means that case folding (which does not affect Cherokee letters in
> Unicode 7.0 and earlier) will now map the lower-case letters to upper case,
> leaving the upper-case letters alone.

Was there any reason they couldn't repurpose the existing characters as lower-case?

-- 
Alex

John Cowan

Jun 12, 2015, 9:38:30 PM
to scheme-re...@googlegroups.com, scheme-re...@googlegroups.com
Alex Shinn scripsit:

> Was there any reason they couldn't repurpose the existing characters as
> lower-case?

You mean the Cherokees or the Unicadets? Because the existing glyphs
are typical capital letters in size and appearance, they are going to
be used as the upper-case glyphs. Lower-case letters will be smaller
and only go up to the x-height. Changing the existing codepoints to
represent the lowercase letters would cause existing single-case texts
to come out in all lowercase, which is not how they should appear.

--
John Cowan http://www.ccil.org/~cowan co...@ccil.org
Some people open all the Windows; wise wives welcome the spring by moving
the Unix. --Advertisement for Unix Book Units (U.K.)
(see https://9p.io/who/dmr/unix3image.gif)