Jira (PUP-1800) Ruby downcase does not handle all unicode letters


Henrik Lindberg (JIRA)

Jan 15, 2015, 4:05:03 PM
to puppe...@googlegroups.com
Henrik Lindberg updated an issue
 
Puppet / Bug PUP-1800
Ruby downcase does not handle all unicode letters
Change By: Henrik Lindberg
Scrum Team: Language
 
This message was sent by Atlassian JIRA (v6.3.10#6340-sha1:7ea293a)

Ethan Brown (JIRA)

Jul 29, 2016, 4:16:13 PM
to puppe...@googlegroups.com
Ethan Brown updated an issue
Change By: Ethan Brown
Labels: i18n utf-8

Henrik Lindberg (JIRA)

Jul 30, 2016, 8:20:03 AM
to puppe...@googlegroups.com
Henrik Lindberg commented on Bug PUP-1800
 
Re: Ruby downcase does not handle all unicode letters

Note that:

  • normalization is required, which is expensive in Ruby
  • the user's locale is needed, since the rules for upper/lower case depend on the locale
  • the locale is also needed for correct comparison and ordering of strings

Together, these make this a much thornier problem than just adding a library and comparing strings a different way.
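
To make the normalization point concrete, here is a minimal sketch, assuming Ruby 2.2 or later (where `String#unicode_normalize` is available). It only illustrates the first bullet; it does not address locale-dependent casing or ordering.

```ruby
# The same visible letter (capital A with diaeresis) in two Unicode forms.
composed   = "\u00C4"   # one code point: LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308"  # "A" followed by COMBINING DIAERESIS

# Without normalization the two strings are different code point sequences.
raw_equal = (composed == decomposed)

# After normalizing both sides to NFC they compare equal.
normalized_equal =
  composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc)

puts "raw: #{raw_equal}, normalized: #{normalized_equal}"
```

Every comparison that should treat these two spellings as equal needs the extra normalization pass on both operands, which is where the cost comes from.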

Henrik Lindberg (JIRA)

Sep 7, 2016, 6:12:23 PM
to puppe...@googlegroups.com
Henrik Lindberg updated an issue
 
Change By: Henrik Lindberg
Team: Puppet Developer Support

Ethan Brown (JIRA)

Sep 30, 2016, 5:59:02 PM
to puppe...@googlegroups.com
Ethan Brown commented on Bug PUP-1800
 
Re: Ruby downcase does not handle all unicode letters

Henrik Lindberg I'm removing the PUP-1031 epic in an effort to close that off. This seems like a ticket that belongs in a separate language (or similar) related epic like the ones that we've created at PUP-6718 / PUP-6719 / PUP-6720. I'd suggest creating that new epic and having it block the LOC-11 epic.

John Duarte (JIRA)

May 15, 2017, 6:50:04 PM
to puppe...@googlegroups.com
John Duarte updated an issue
 
Change By: John Duarte
Labels: i18n triaged utf-8

Moses Mendoza (JIRA)

May 18, 2017, 1:49:12 PM
to puppe...@googlegroups.com
Moses Mendoza updated an issue
Change By: Moses Mendoza
Labels: i18n triaged utf-8

Ethan Brown (JIRA)

May 18, 2017, 5:05:02 PM
to puppe...@googlegroups.com
Ethan Brown commented on Bug PUP-1800
 
Re: Ruby downcase does not handle all unicode letters

FYI - Ruby 2.4 (which will ship with Puppet 5) is supposed to have fixed this.

Henrik Lindberg (JIRA)

May 19, 2017, 3:52:02 AM
to puppe...@googlegroups.com

Here is an article about the Unicode support in Ruby 2.4 http://www.sw.it.aoyama.ac.jp/2016/pub/IUC40-Ruby2.4/

I found that it follows the Unicode 9.0.0 recommendations for handling special characters in German, Turkish, etc. by making compromises. That is better than relying on the locale and getting different behavior depending on it. It means that up/down-case operations work, but a round trip (upcase.downcase or downcase.upcase) is not guaranteed to produce the original string. This is good enough, since Puppet is not a typesetting system.
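
A small sketch of what this looks like in practice, assuming Ruby 2.4 or later. The variable names are illustrative only, and characters are written as `\u` escapes to keep the snippet ASCII-safe.

```ruby
# Requires Ruby 2.4+, where String#upcase/#downcase use Unicode case mapping.
a_umlaut_down = "\u00C4".downcase         # capital A with diaeresis -> small
sharp_s_up    = "\u00DF".upcase           # German sharp s expands to two letters
round_trip    = "\u00DF".upcase.downcase  # does not return to the sharp s

# Language-specific rules are opt-in through an explicit option instead of
# the process locale; :turkic selects the dotted/dotless i rules.
default_i = "I".downcase
turkic_i  = "I".downcase(:turkic)

puts [a_umlaut_down, sharp_s_up, round_trip, default_i, turkic_i].inspect
```

Here the sharp s upcases to "SS" and then downcases to "ss", which is the kind of non-reversible round trip described above.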

I have yet to try out sorting and comparisons (<=>, <, >, etc.) and operations like casecmp. (The article did not say much about those.)
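
As a starting point for such an experiment, a minimal sketch assuming Ruby 2.4 or later; it only exercises core `String` methods and is not a statement about what Puppet does internally.

```ruby
upper = "\u00C4" # LATIN CAPITAL LETTER A WITH DIAERESIS
lower = "\u00E4" # LATIN SMALL LETTER A WITH DIAERESIS

# String#casecmp folds only ASCII letters, so it reports a difference here.
ascii_only_result = upper.casecmp(lower)

# String#casecmp? (added in Ruby 2.4) compares after Unicode case folding.
unicode_fold_result = upper.casecmp?(lower)

# <=> and sort compare bytes, which for UTF-8 matches code point order,
# so the small a-with-diaeresis sorts after "z" rather than next to "a".
sorted = ["z", lower, "a"].sort

puts "casecmp: #{ascii_only_result}, casecmp?: #{unicode_fold_result}"
puts "sorted: #{sorted.inspect}"
```

Neither method normalizes its operands, so composed and decomposed spellings of the same letter would still compare as different.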

Henrik Lindberg (JIRA)

May 19, 2017, 4:05:02 AM
to puppe...@googlegroups.com

Oh, the implementation is naturally filled with gotchas - see this article: http://blog.honeybadger.io/ruby-s-unicode-support/
Read through the list of tests in that post and cry...

We may have to normalize all strings everywhere, and that will have a huge performance impact (on every JSON and YAML read, for example).
If we do nothing, we get the new behavior and will probably see bug after bug related to the use of Unicode characters and non-normalized strings.
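
To illustrate what "normalize on every read" could mean, here is a rough sketch. `deep_normalize` is a hypothetical helper written for this example, not an existing Puppet or Ruby API, and the choice of NFC is an assumption.

```ruby
require "json"

# Hypothetical helper (not part of Puppet): walk a parsed structure and
# return a copy in which every string, including hash keys, is in NFC form.
def deep_normalize(value)
  case value
  when String
    value.unicode_normalize(:nfc)
  when Array
    value.map { |item| deep_normalize(item) }
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[deep_normalize(key)] = deep_normalize(val)
    end
  else
    value
  end
end

# The JSON text carries "A" + COMBINING DIAERESIS in decomposed form.
raw = JSON.parse('{"name": "A\u0308", "tags": ["A\u0308"]}')
normalized = deep_normalize(raw)

puts raw["name"].length        # decomposed: two code points
puts normalized["name"].length # composed: one code point
```

The full traversal and the per-string normalization run for every document read, which is the performance concern raised above.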

Josh Cooper (Jira)

Jun 6, 2020, 7:59:03 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
 
Change By: Josh Cooper
Team: Puppet Developer Experience Night's Watch

Josh Cooper (Jira)

Jan 27, 2021, 12:11:05 AM
to puppe...@googlegroups.com
Josh Cooper updated an issue
In the Puppet language, {{"A" == "a"}} is true and {{"Ä" == "ä"}} should be true, but it is false:

{noformat}
bx puppet apply -e 'notice("A" == "a") notice("Ä" == "ä")'
Notice: Scope(Class[main]): true
Notice: Scope(Class[main]): false
{noformat}

Ruby doesn't properly convert between upper and lower case for all Unicode characters in the letter categories.

Other interesting cases to look at are things like "ß", which has no single-character uppercase form in the default Unicode case mapping (it upcases to "SS").

Josh Cooper (Jira)

Jun 9, 2021, 6:22:03 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-1800
 
Re: Ruby downcase does not handle all unicode letters

This would be good to resolve, but we don't have any plans to fix it anytime soon. Please reopen if this is needed.
