To unsubscribe from this group, send email to golang-nuts+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.
> The Unicode tables do not respect locale.
UnicodeData.txt is not the whole case mapping specified by Unicode.
It only covers context-independent 1-1 mappings. The remaining rules are
described at http://unicode.org/Public/UNIDATA/SpecialCasing.txt and
they include Turkish i.
--
Marcin Kowalczyk
Here's a more precise formulation of the problem, in a form suitable
for programming, which uses authoritative sources to be more
persuasive.
The Unicode Standard includes case mapping rules.
Unicode Technical Report #21 - Case Mappings
http://unicode.org/reports/tr21/tr21-3.html
The Unicode Standard 5.2 - Chapter 5 - 5.18 Case Mappings
http://unicode.org/versions/Unicode5.0.0/ch05.pdf
"The case mappings specified by the Unicode Character Database are in
the union of the UnicodeData.txt and SpecialCasing.txt files." TR21-3
The Go unicode package only refers to the rules in the UnicodeData.txt
file; it omits the rules in the SpecialCasing.txt file. Therefore,
the case mappings for Turkish and some other languages are incorrect.
Peter
On Mar 29, 10:44 am, Mehmet D. Akın <mda...@gmail.com> wrote:
> Hi,
>
> It seems ToLower and ToUpper in the strings package do not work correctly
> for Turkish "ı,i" letters. The correct case conversion for Turkish locale
> would be i <-> İ and ı <-> I. Is
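To make the report concrete, here is a minimal sketch of what the locale-independent functions in the strings package do with the four letters in question. The helper name defaultCase is made up for this illustration; only strings.ToUpper and strings.ToLower come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultCase (hypothetical helper) returns the upper- and lowercase
// forms of s using the locale-independent strings functions.
func defaultCase(s string) (upper, lower string) {
	return strings.ToUpper(s), strings.ToLower(s)
}

func main() {
	// Dotted and dotless i, lower and upper case.
	for _, s := range []string{"i", "I", "ı", "İ"} {
		upper, lower := defaultCase(s)
		fmt.Printf("%q: ToUpper=%q ToLower=%q\n", s, upper, lower)
	}
	// For Turkish text one would want "i" -> "İ" and "I" -> "ı",
	// but the default functions pair "i" with "I".
}
```

The ASCII pair i/I round-trips as it would in English, which is exactly the behaviour the original poster objects to for Turkish text.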
Here's an example of the use of a culture qualifier to apply the
Unicode case mapping and sorting rules to Turkish and several other
alphabets.
Custom Case Mappings and Sorting Rules
http://msdn.microsoft.com/en-us/library/xk2wykcz%28VS.100%29.aspx
Peter
On Mar 29, 2:50 pm, peterGo <go.peter...@gmail.com> wrote:
> Mehmet,
>
> Here's a more precise formulation, in programming terms, of the
> problem, using authoritative sources to be more persuasive.
>
> The Unicode Standard includes case mapping rules.
>
> Unicode Technical Report #21 - Case Mappings
> http://unicode.org/reports/tr21/tr21-3.html
>
> The Unicode Standard - Chapter 5 - 5.18 Case Mappings
> http://unicode.org/versions/Unicode5.0.0/ch05.pdf
>
> "The case mappings specified by the Unicode Character Database are in
> the union of the UnicodeData.txt and SpecialCasing.txt files." TR21-3
>
> The Go unicode package only refers to the rules in UnicodeData.txt; it
> omits the rules in SpecialCasing.txt. Therefore, the case mappings for
> Turkish and some other languages are incorrect.
>
> Peter
>
> On Mar 29, 10:44 am, Mehmet D. Akın <mda...@gmail.com> wrote:
>
> > Hi,
>
> > It seems ToLower and ToUpper in the strings package do not work correctly
> > for Turkish "ı,i" letters. The correct case conversion for Turkish locale
> > would be i <-> İ and ı <-> I. Is
We've come up with a reasonable design to add variant case mapping tables to the interface. It will take a few days to make the changes.
> Thanks Rob, So it will be fixed eventually. Good news. Should I open
> a bug to track this?
Yes please.
-rob
On Mar 29, 8:33 pm, "Rob 'Commander' Pike" <r...@google.com> wrote:
> I am not a Unicode expert, so I may be missing some nuance, but here
> is my understanding.
>
The "Turkish test" gained some popularity. To see what problems people
from Turkey face, take a look at this post:
http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
A character is considered to have a lowercase, uppercase, or title case equivalent if there is a respective "simple" case mapping specified for the character in the Unicode Character Database (UnicodeData.txt). If a character has no mapping equivalent, the result is the character itself.
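Go's unicode.ToUpper behaves in a similar "simple" fashion: it maps one rune to one rune and returns the input unchanged when there is no mapping. The sketch below illustrates this; simpleUpper is a hypothetical wrapper added only for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

// simpleUpper (hypothetical wrapper) applies Go's rune-to-rune
// uppercase mapping. A rune with no simple mapping comes back as is.
func simpleUpper(r rune) rune {
	return unicode.ToUpper(r)
}

func main() {
	// 'a' has a simple mapping; '7' and 'ß' do not.
	for _, r := range []rune{'a', '7', 'ß'} {
		fmt.Printf("%q -> %q\n", r, simpleUpper(r))
	}
}
```

Because the function returns a single rune, it cannot express a mapping such as ß to "SS", which is one of the rules listed in SpecialCasing.txt rather than UnicodeData.txt.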
ICU implements full Unicode string case mappings. In general,
- case mapping can change the number of code points and/or code units of a string,
- is language-sensitive (results may differ depending on language), and
- is context-sensitive (a character in the input string may map differently depending on surrounding characters).
So, this thread prompted Go to gain a SpecialCase type that lets users select a mapping for their own language. But what about the general case?
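As a rough sketch of how the SpecialCase type mentioned in this thread can be used for Turkish, the example below relies on unicode.TurkishCase together with strings.ToUpperSpecial and strings.ToLowerSpecial from the standard library. The helper names turkishUpper and turkishLower are made up for the illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// turkishUpper (hypothetical helper) uppercases s with the Turkish
// special-case table, falling back to the default mapping elsewhere.
func turkishUpper(s string) string {
	return strings.ToUpperSpecial(unicode.TurkishCase, s)
}

// turkishLower (hypothetical helper) is the lowercase counterpart.
func turkishLower(s string) string {
	return strings.ToLowerSpecial(unicode.TurkishCase, s)
}

func main() {
	fmt.Println(strings.ToUpper("istanbul")) // default mapping, plain I
	fmt.Println(turkishUpper("istanbul"))    // Turkish mapping, dotted İ
	fmt.Println(turkishLower("ISPARTA"))     // Turkish mapping, dotless ı
}
```

Note that a SpecialCase table is still a rune-to-rune mapping that the caller selects explicitly. It covers the language-sensitive part for Turkish, but not mappings that change the length of the string or depend on neighbouring characters, so the question about the general case remains open.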