
The extension of Is_Basic to Unicode (about AI12-0260-1)


ytomino

Apr 10, 2018, 8:52:34 PM
AI12-0260-1/04 Functions Is_Basic and To_Basic in Wide_Characters.Handling
http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai12s/ai12-0260-1.txt?rev=1.5&raw=N

Has this already been formally adopted into the RM? (Its status is "Amendment".)

I found an inconsistency between the existing Characters.Handling.Is_Basic and the new Wide_Characters.Handling.Is_Basic.

Characters.Handling.Is_Basic in RM:

True if Item is a basic letter. A basic letter is a character that is in one of the ranges 'A'..'Z' and 'a'..'z', or that is one of the following: 'Æ', 'æ', 'Ð', 'ð', 'Þ', 'þ', or 'ß'.

Characters.Handling.Is_Basic covers only letters; it does not include other symbols.
Is_Basic ('+') = False.
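
The Latin-1 behaviour described above can be checked with a small program. This is only a sketch: the procedure name, the Show helper, and the package renamings are illustrative, and the Latin-1 letters are taken from Ada.Characters.Latin_1 so that the source file stays pure ASCII.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Characters.Latin_1;

procedure Latin_1_Basic_Demo is
   package ACH renames Ada.Characters.Handling;
   package L1  renames Ada.Characters.Latin_1;

   --  Illustrative helper: print a label followed by a Boolean result.
   procedure Show (Label : String; Value : Boolean) is
   begin
      Ada.Text_IO.Put_Line (Label & " => " & Boolean'Image (Value));
   end Show;
begin
   --  Basic letters according to RM A.3.2: 'A'..'Z', 'a'..'z' and the
   --  seven special letters, one of which is the German sharp s.
   Show ("Is_Basic ('A')", ACH.Is_Basic ('A'));                       --  TRUE
   Show ("Is_Basic (sharp s)", ACH.Is_Basic (L1.LC_German_Sharp_S));  --  TRUE

   --  A symbol is not a letter, so it is not a basic letter either.
   Show ("Is_Basic ('+')", ACH.Is_Basic ('+'));                       --  FALSE

   --  A letter with a diacritical mark is not basic; To_Basic strips the mark.
   Show ("Is_Basic (E acute)", ACH.Is_Basic (L1.UC_E_Acute));         --  FALSE
   Show ("To_Basic (E acute) = 'E'",
         ACH.To_Basic (L1.UC_E_Acute) = 'E');                         --  TRUE
end Latin_1_Basic_Demo;
```

The '+' line is the case that differs under the AI's wording for Wide_Character, where having no decomposition mapping is the only criterion.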

Wide_Characters.Handling.Is_Basic in AI:

Returns True if the Wide_Character designated by Item has no Decomposition Mapping in the code charts of ISO/IEC 10646:2017; otherwise returns False.

Wide_Characters.Handling.Is_Basic includes every character that cannot be decomposed, called a "base character" in the Unicode world. That includes symbols.
Is_Basic ('+') = True.

Perhaps Is_Basic should be defined as the intersection of the set of base characters *and the set of letters* (those categorized as 'Ll', 'Lu', 'Lt', 'Lm', 'Lo'... in the Unicode Character Database).

Thanks.

J-P. Rosen

Apr 10, 2018, 11:38:07 PM
On 11/04/2018 at 02:52, ytomino wrote:
> AI12-0260-1/04 Functions Is_Basic and To_Basic in Wide_Characters.Handling
> I found inconsistency between existing Characters.Handling.Is_Basic and new Wide_Characters.Handling.Is_Basic.
>
> Characters.Handling.Is_Basic in RM:
>
> True if Item is a basic letter. A basic letter is a character that is in one of the ranges 'A'..'Z' and 'a'..'z', or that is one of the following: 'Æ', 'æ', 'Ð', 'ð', 'Þ', 'þ', or 'ß'.
>
> Characters.H.Is_Basic includes only alphabet, not include other symbols.
> Is_Basic ('+') = False.
>
> Wide_Characters.Handling.Is_Basic in AI:
>
> Returns True if the Wide_Character designated by Item has no Decomposition Mapping in the code charts of ISO/IEC 10646:2017; otherwise returns False.
>
> Wide_Characters.H.Is_Basic includes all un-decomposable characters, called as "base character" in Unicode world. It include the symbols.
> Is_Basic ('+') = True.
>
> Perhaps, Is_Basic must be defined as the intersection of the set of base characters *and the set of letters* (categorized as 'Ll', 'Lu', 'Lt', 'Lm', 'Lo'... in Unicode Character Database).


Right, but the old definition was wrong and the new one is right. In
general, Ada prefers to use existing standards rather than inventing its
own special definitions. If you need to make sure that something is a
letter, there is the Is_Letter function.

--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr

ytomino

Apr 10, 2018, 11:52:52 PM
I agree with you that the old definition is wrong.
However, shouldn't a new function name be used for the new definition?

J-P. Rosen

Apr 11, 2018, 4:54:58 PM
On 11/04/2018 at 16:32, Dan'l Miller wrote:
>> True if Item is a basic letter. A basic letter is a character that
>> is in one of the ranges 'A'..'Z' and 'a'..'z', or that is one of
>> the following: 'Æ', 'æ', 'Ð', 'ð', 'Þ', 'þ', or 'ß'.
> If this Ada-specific definition of this is-basic/base-Latin-letter
> property is the official normative list, then it seems rather
> arbitrary and capricious, not conforming to Unicode or to linguistic
> reality.
>
> In Unicode-speak's terminology/jargon, the definition of base
> character at https://definedterm.com/a/definition/160575 would admit
> quite a few more, [...]
The above Is_Basic is about Character, and is defined only when using
Latin-1. Unicode is a different standard.

Randy Brukardt

Apr 11, 2018, 6:20:28 PM
"J-P. Rosen" <ro...@adalog.fr> wrote in message
news:palsmv$g18$1...@gioia.aioe.org...
> On 11/04/2018 at 16:32, Dan'l Miller wrote:
>>> True if Item is a basic letter. A basic letter is a character that
>>> is in one of the ranges 'A'..'Z' and 'a'..'z', or that is one of
>>> the following: 'Æ', 'æ', 'Ð', 'ð', 'Þ', 'þ', or 'ß'.
>> If this Ada-specific definition of this is-basic/base-Latin-letter
>> property is the official normative list, then it seems rather
>> arbitrary and capricious, not conforming to Unicode or to linguistic
>> reality.
>>
>> In Unicode-speak's terminology/jargon, the definition of base
>> character at https://definedterm.com/a/definition/160575 would admit
>> quite a few more, [...]
> The above Is_Basic is about Character, and is defined only when using
> Latin-1. Unicode is a different standard.

Moreover, its definition is historical -- it was defined this way for Ada
95, and whether or not that would be the correct definition had it been
defined in 2018 is irrelevant. Changing the definition would potentially
silently break programs that use it. A number of things in
Ada.Characters.Handling aren't correct for Unicode purposes; one of
them is even called out by the third note in A.3.2.

Randy.



ytomino

Apr 11, 2018, 7:57:12 PM
Thanks for your detailed description.

If Characters.Handling.Is_Basic cannot be changed for compatibility reasons, then this *overloading* will create a new problem in the future.

For example, when rewriting an application from Character to Wide_Character, the two meanings of Is_Basic are likely to cause confusion.
They also make it harder to use a use clause, or to pass Is_Basic as a generic formal subprogram.
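
The use-clause concern can be sketched with Is_Letter, which both packages have declared since Ada 2012; Is_Basic would presumably resolve the same way once both overloads exist. The procedure name is illustrative. With both packages use-visible, a call such as Is_Letter ('a') is ambiguous because the literal could be a Character or a Wide_Character, so each call needs a qualified expression:

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;      use Ada.Characters.Handling;
with Ada.Wide_Characters.Handling; use Ada.Wide_Characters.Handling;

procedure Overload_Demo is
begin
   --  Writing Is_Letter ('a') here would be rejected as ambiguous:
   --  both overloads are visible and the character literal fits either
   --  parameter type. Qualifying the literal selects one of them.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_Letter (Character'('a'))));       --  TRUE
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_Letter (Wide_Character'('a'))));  --  TRUE
end Overload_Demo;
```

For Is_Letter the two overloads agree on Latin-1 input, so picking the wrong one is harmless. For Is_Basic as specified in the AI, the two overloads would give different answers for '+', which is the confusion raised in this post.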

Sorry to repeat myself, but shouldn't a new function name be used for the new definition?

function Is_Base (Item : Wide_Character) return Boolean; -- per Unicode
function Is_Basic (Item : Wide_Character) return Boolean is (Is_Base (Item) and Is_Letter (Item)); -- for compatibility
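
The proposed pair can be tried out in user code without touching the standard package. This is a sketch under two assumptions: the runtime provides Ada.Wide_Characters.Handling.Is_Basic as specified by AI12-0260-1 (an Ada 2022 library), and the names Is_Base, Is_Basic_Letter, and Basic_Letter_Demo are hypothetical, chosen here to avoid yet another Is_Basic overload.

```ada
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Basic_Letter_Demo is
   package WCH renames Ada.Wide_Characters.Handling;

   --  Hypothetical name for the AI's definition: True when Item has no
   --  decomposition mapping, whether or not it is a letter.
   function Is_Base (Item : Wide_Character) return Boolean
     renames WCH.Is_Basic;

   --  Hypothetical letters-only variant, mirroring the Latin-1 meaning
   --  of Ada.Characters.Handling.Is_Basic.
   function Is_Basic_Letter (Item : Wide_Character) return Boolean is
     (Is_Base (Item) and then WCH.Is_Letter (Item));
begin
   --  '+' is not a letter, so this prints FALSE whatever Is_Base returns.
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Basic_Letter ('+')));

   --  'A' is a letter; the result depends on Is_Base, which should be
   --  True under the AI's wording since 'A' has no decomposition mapping.
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Basic_Letter ('A')));
end Basic_Letter_Demo;
```

A wrapper like this gives the letters-only behaviour to code that needs it, which is close to J-P. Rosen's earlier suggestion of combining the check with Is_Letter.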

J-P. Rosen

Apr 12, 2018, 1:14:13 AM
On 12/04/2018 at 01:57, ytomino wrote:
> If Character.Handling.Is_Basic can not be changed because
> compatibility, still more, this *overloading* will create new problem
> for the future.
>
> For example, on rewriting some applications from Character to
> Wide_Character, it may be imagined that two meanings of Is_Basic will
> confuse.
> Or, they makes hard to use "use clause", or use as a generic formal
> subprogram.
If you are adapting a program to use the full BMP instead of Latin1,
expect many more difficult issues and/or incompatibilities than this one...

> Excuse me for repeating, should new function name be used for new
> definition?
>
> function Is_Base (Item : Wide_Character) return Boolean; -- according with Unicode
> function Is_Basic (Item : Wide_Character) return Boolean is (Is_Base (Item) and Is_Letter (Item)); -- for compatibility

This is technically doable, but not obviously desirable. This
incompatibility seems to me to be a candidate for the following comment,
already used for some other incompatibilities:

"This incompatibility is likely to fix more bugs than it will create"