Add Turkish language support to functions like `String.downcase/2` and `String.upcase/2`.

Cemre Aşar

Mar 28, 2020, 1:02:04 PM
to elixir-lang-core
I tried to do that, but my knowledge of Elixir couldn't help me. :) The only changes needed are: in `String.upcase/2`, "i" becomes "İ" and "ı" becomes "I"; in `String.downcase/2`, "İ" becomes "i" and "I" becomes "ı". I thought it would be something like `String.upcase "i", :turkish`, but if you folks have a better way, I'm totally open to it. :) Thanks for your interest and have a nice day. :)
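A minimal sketch of the requested behavior, assuming the four Turkish mappings listed above are the only change needed. `TurkishCase` is a hypothetical module, not part of Elixir, and it deliberately ignores the context-sensitive rules (for example `I` followed by a combining dot above) that the Unicode data also defines; it rewrites the affected letters and then delegates to the default functions.

```elixir
defmodule TurkishCase do
  # Hypothetical helper for illustration only; not part of Elixir.
  # Turkish pairs dotted i with İ (U+0130) and dotless ı (U+0131) with I.

  def upcase(string) do
    string
    # Map "i" to dotted "İ" before the default mapping turns it into "I".
    |> String.replace("i", "İ")
    # "ı" already upcases to "I", so the default mapping handles the rest.
    |> String.upcase()
  end

  def downcase(string) do
    string
    # Map the capitals first; both results are already lowercase,
    # so String.downcase/1 leaves them unchanged.
    |> String.replace("I", "ı")
    |> String.replace("İ", "i")
    |> String.downcase()
  end
end

IO.puts(TurkishCase.upcase("istanbul"))  # İSTANBUL
IO.puts(TurkishCase.downcase("IĞDIR"))   # ığdır
```

Rewriting first and delegating afterwards keeps every other character on Elixir's default Unicode mappings.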

Andrea Leopardi

Mar 28, 2020, 2:18:49 PM
to elixir-lang-core
Hey Cemre,

> "ı" will be "I"

This is already the case. You can try `String.upcase("ı") == "I"`.

In any case, Elixir uses the Unicode standard to do upcasing and downcasing. Do other languages that support Unicode behave differently for these letters?
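The default, locale-independent behavior can be checked directly. Only the first comparison comes from the message above; the other two lines show that, without a Turkish mode, `i` and `I` simply map to each other:

```elixir
# Default Unicode case mappings; no locale is involved.
IO.inspect(String.upcase("ı") == "I")  # true: dotless ı (U+0131) upcases to I
IO.inspect(String.upcase("i"))         # "I", not the Turkish "İ"
IO.inspect(String.downcase("I"))       # "i", not the Turkish "ı"
```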

Andrea

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/35528a1c-bcd7-4871-a3f8-161525cf6c62%40googlegroups.com.

José Valim

Mar 28, 2020, 3:28:12 PM
to elixir-l...@googlegroups.com
So we don't support the locale-specific sections of Unicode, and we leave those to third-party libraries. That's because proper support is actually quite complex, as some languages may take grammar rules into account when downcasing/upcasing. So those scenarios are best handled by special-purpose libraries.

Cemre Aşar

Mar 29, 2020, 12:07:58 AM
to elixir-lang-core
It really differs from language to language.



[Attachments: five screenshots]

Cemre Aşar

Mar 29, 2020, 12:11:24 AM
to elixir-lang-core
Forgot to add a screenshot of my shell.

[Attachment: one screenshot]

Cemre Aşar

Mar 29, 2020, 12:14:10 AM
to elixir-lang-core
From this issue https://github.com/elixir-lang/elixir/issues/7105 and how `String.downcase/2` works, I thought it was like: "these languages work the same way, just with very tiny differences." So I thought it would be proper to add this. Sorry for the unnecessary question. :) Have a nice day.

Kip

May 16, 2020, 8:30:32 AM
to elixir-lang-core
Elixir supports the `:greek` mode for `String.upcase/2` and `String.downcase/2`, backed by the data in `SpecialCasing.txt`. It so happens that Unicode 11 (the version upon which Elixir is based) only has the data for the Greek modality.
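For reference, a minimal illustration of the `:greek` mode, assuming an Elixir version that ships it (as described above). The Greek rule is the final-sigma rule: roughly, a capital Σ lowercases to ς at the end of a word and to σ elsewhere.

```elixir
# Default mode: every capital sigma becomes σ.
IO.inspect(String.downcase("ΣΣ"))          # "σσ"
# :greek mode: the word-final sigma becomes ς.
IO.inspect(String.downcase("ΣΣ", :greek))  # "σς"
```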

As of Unicode 13 (released in March) and perhaps Unicode 12 (I didn't check), `SpecialCasing.txt` also has the data to support an additional two modalities:

1. Turkic languages (Turkish and Azeri)
2. Lithuanian

Totally understand that full locale-customised transforms are outside the scope of Elixir (and are totally in scope for the work I'm doing with Unicode Transform algorithms). I do think a proposal to add these two variants has some merit, since they are included in `SpecialCasing.txt`, which is already part of Elixir's Unicode repertoire.
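To make the data side concrete: entries in `SpecialCasing.txt` have the form `code; lower; title; upper; (condition_list;)? # comment`, and the Turkish ones carry a language tag (`tr`) plus, for some, a context condition. Below is a hypothetical parser sketch (`SpecialCasingSketch` is an illustrative name, not an Elixir module) applied to lines in that format; check the sample lines against the file for the Unicode version you actually target.

```elixir
defmodule SpecialCasingSketch do
  # Hypothetical parser for one line in SpecialCasing.txt format:
  #   <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
  def parse_line(line) do
    # Drop the trailing comment, then split the data on ";".
    [data | _comment] = String.split(line, "#", parts: 2)

    [code, lower, title, upper | rest] =
      data |> String.split(";") |> Enum.map(&String.trim/1)

    # The condition list is optional: a language tag and/or context names.
    conditions =
      case rest do
        [conds | _] when conds != "" -> String.split(conds)
        _ -> []
      end

    %{
      code: hex_to_string(code),
      lower: hex_to_string(lower),
      title: hex_to_string(title),
      upper: hex_to_string(upper),
      conditions: conditions
    }
  end

  # "0069 0307" -> "i\u0307"; "" -> "" (an empty mapping removes the character)
  defp hex_to_string(hex) do
    hex
    |> String.split(" ", trim: true)
    |> Enum.map(&String.to_integer(&1, 16))
    |> List.to_string()
  end
end

line = "0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I"
IO.inspect(SpecialCasingSketch.parse_line(line))
```

The parsed `conditions` field is what separates these entries from the unconditional ones: `"tr"` restricts the mapping to Turkish, and `"Not_Before_Dot"` makes it depend on the following character.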

José, I know you have always preferred to update the Unicode modules yourself, but I'm still happy to give it a go. Unicode 14 is going to be released in September 2020, which aligns nicely with Elixir 1.11.

Perhaps something to consider? Add Unicode 14 to Elixir 1.11, including support for these two variants from `SpecialCasing.txt` as modes of `String.upcase/2` and `String.downcase/2`?

José Valim

May 16, 2020, 10:00:28 AM
to elixir-l...@googlegroups.com
According to Unicode, both of those are context-sensitive, which means we can't apply them like we apply the Greek rules.
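To illustrate what the context sensitivity means for the Turkish rules, here is a hypothetical refinement of the simple letter-for-letter mapping proposed at the start of the thread (`TurkishDowncase` is an illustrative name, not part of Elixir). It follows the Turkish lowercase entries in `SpecialCasing.txt` in simplified form: `I` followed by U+0307 COMBINING DOT ABOVE lowercases to plain `i` with the dot dropped (`After_I`), any other `I` becomes dotless `ı` (`Not_Before_Dot`), and `İ` becomes `i`. Unlike the real conditions, the sketch assumes the dot comes immediately after `I`, with no other combining marks in between.

```elixir
defmodule TurkishDowncase do
  # Hypothetical sketch, not part of Elixir. Simplification: "I" and the
  # combining dot must be adjacent; the real Unicode conditions also allow
  # certain combining marks in between.
  @dot_above "\u0307"

  def downcase(string) do
    string
    # After_I: "I" + U+0307 lowercases to "i" and the dot is dropped.
    |> String.replace("I" <> @dot_above, "i")
    # Not_Before_Dot: every remaining "I" becomes dotless "ı".
    |> String.replace("I", "ı")
    # "İ" (U+0130) lowercases to plain "i" in Turkish.
    |> String.replace("İ", "i")
    # All other characters use the default Unicode mappings.
    |> String.downcase()
  end
end

IO.puts(TurkishDowncase.downcase("IŞIK"))  # ışık
```

With a plain letter swap, `"I\u0307"` would come out as `"ı\u0307"`; looking at the following character is what makes it come out as `"i"`, and that lookahead is exactly what a context-free mode cannot express.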

My understanding is that Unicode 13 is out and we need to update to it. Unicode 14 will be out in September 2021.

Kip

May 16, 2020, 10:25:52 AM
to elixir-lang-core
Correct on both fronts, José: I missed the language context. And yes, September *2021* for Unicode 14. I need to take a break, my accuracy is tracking too well today :-)

I just need to finish the code for Unicode regular expressions and the Unicode transform rules engine so I can implement Unicode transforms. Then I'll feel like I can move on to something other than I18n/L10n...