Downcase Accented characters
 
From: Roberto Ostinelli <robe...@widetag.com>
Date: Sun, 21 Oct 2012 12:14:37 -0700
Local: Sun, Oct 21 2012 3:14 pm
Subject: [erlang-questions] Downcase Accented characters

Dear list,

I have a binary string that includes accented characters and other Unicode,
which I need to downcase.

Is my best option really to convert everything to a list and downcase that?

_______________________________________________
erlang-questions mailing list
erlang-questi...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions


 
From: Loïc Hoguin <es...@ninenines.eu>
Date: Sun, 21 Oct 2012 21:18:54 +0200
Local: Sun, Oct 21 2012 3:18 pm
Subject: Re: [erlang-questions] Downcase Accented characters
On 10/21/2012 09:14 PM, Roberto Ostinelli wrote:

> Dear list,

> I've a binary string which includes accented characters and unicode,
> that i need to downcase.

> Is my real best option here to convert everything to list and downcase that?

Your current best option is ux_string:to_lower/1 from the ux library,
which properly lowercases all characters, not just A-Z.

It should be at https://github.com/erlang-unicode/ux

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu


 
From: Roberto Ostinelli <robe...@widetag.com>
Date: Sun, 21 Oct 2012 12:25:11 -0700
Local: Sun, Oct 21 2012 3:25 pm
Subject: Re: [erlang-questions] Downcase Accented characters

Thank you Loïc,

Did you happen to benchmark it? Would it be better/faster than a simple
list_to_binary(string:to_lower(binary_to_list(Bin)))?

From: Loïc Hoguin <es...@ninenines.eu>
Date: Sun, 21 Oct 2012 21:33:25 +0200
Local: Sun, Oct 21 2012 3:33 pm
Subject: Re: [erlang-questions] Downcase Accented characters
For this comparison, ux would be slow and accurate, while your solution
would be fast and inaccurate. :)

From: Roberto Ostinelli <robe...@widetag.com>
Date: Sun, 21 Oct 2012 12:39:22 -0700
Local: Sun, Oct 21 2012 3:39 pm
Subject: Re: [erlang-questions] Downcase Accented characters

For the record, this just works:

start() ->
    Unicode = list_to_binary("∞-HOpe@☺.EXAMple.com/My❤"),
    Result = list_to_binary(string:to_lower(binary_to_list(Unicode))),
    "∞-hope@☺.example.com/my❤" = binary_to_list(Result).

Any downsides I'm not seeing?

From: Roberto Ostinelli <robe...@widetag.com>
Date: Sun, 21 Oct 2012 12:45:21 -0700
Local: Sun, Oct 21 2012 3:45 pm
Subject: Re: [erlang-questions] Downcase Accented characters

BTW,

ux dependencies are unsatisfied:

==> ux (get-deps)
Pulling abnfc from {git,"git://github.com/nygge/abnfc.git","master"}
Cloning into 'abnfc'...
Pulling metamodule from {git,"git://github.com/freeakk/metamodule.git",
                             "master"}
fatal: remote error:
  Repository not found.
Cloning into 'metamodule'...
ERROR: git clone -n git://github.com/freeakk/metamodule.git metamodule
failed with error: 128 and output:
fatal: remote error:
  Repository not found.
Cloning into 'metamodule'...

ERROR: 'get-deps' failed while processing

From: Loïc Hoguin <es...@ninenines.eu>
Date: Sun, 21 Oct 2012 21:46:17 +0200
Local: Sun, Oct 21 2012 3:46 pm
Subject: Re: [erlang-questions] Downcase Accented characters
This only works for letters found in Latin-1, not for all the uppercase
letters found in Unicode. If that's good enough for you, then you don't need ux. :)
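
To illustrate the Latin-1 limit, here is a minimal sketch that applies string:to_lower/1 to lists of code points. The module name latin1_lower_demo is made up for the example, and the sample characters outside Latin-1 are chosen for illustration rather than taken from the thread.

```erlang
-module(latin1_lower_demo).
-export([demo/0]).

%% string:to_lower/1 maps A-Z and the Latin-1 capitals
%% (16#C0..16#D6 and 16#D8..16#DE); every other integer passes through.
demo() ->
    %% ASCII: "HOpe" -> "hope"
    "hope" = string:to_lower("HOpe"),
    %% Latin-1: U+00C9 (É) -> U+00E9 (é)
    [16#E9] = string:to_lower([16#C9]),
    %% Outside Latin-1: U+0416 (Ж) has the lowercase U+0436 (ж),
    %% but string:to_lower/1 returns it unchanged.
    [16#0416] = string:to_lower([16#0416]),
    %% Likewise U+0100 (Ā), whose lowercase is U+0101 (ā).
    [16#0100] = string:to_lower([16#0100]),
    ok.
```

Calling latin1_lower_demo:demo() returns ok when every match above holds.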

From: Roberto Ostinelli <robe...@widetag.com>
Date: Sun, 21 Oct 2012 12:51:46 -0700
Local: Sun, Oct 21 2012 3:51 pm
Subject: Re: [erlang-questions] Downcase Accented characters

Oh I see.

So if I want to downcase this string: "∞-HOpe@☺.ÉXAMple.com/My❤" I will
need ux?

r.

From: Loïc Hoguin <es...@ninenines.eu>
Date: Sun, 21 Oct 2012 22:00:15 +0200
Local: Sun, Oct 21 2012 4:00 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Yes and no, this example would still work I think? I'm no expert on how
Erlang deals with unicode, I just know what string:to_lower/1 does. :)
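
Whether the É example survives depends on how the binary is encoded. The following is a minimal sketch, assuming the binary holds UTF-8 (the thread does not say so); the module name bytewise_lower_demo and the function naive/1 are made up for the example. Bytes are written in hex so the result does not depend on the source file encoding.

```erlang
-module(bytewise_lower_demo).
-export([naive/1, demo/0]).

%% The byte-wise round trip proposed earlier in the thread.
naive(Bin) when is_binary(Bin) ->
    list_to_binary(string:to_lower(binary_to_list(Bin))).

demo() ->
    %% Pure ASCII is fine.
    <<"example.com">> = naive(<<"EXAMple.com">>),
    %% U+221E (∞) is E2 88 9E in UTF-8. None of these bytes fall in
    %% the ranges string:to_lower/1 changes, so they pass through.
    <<16#E2, 16#88, 16#9E>> = naive(<<16#E2, 16#88, 16#9E>>),
    %% U+00C9 (É) is C3 89 in UTF-8. The lead byte 16#C3 is treated as
    %% Latin-1 "Ã" and shifted to 16#E3. The result is not the encoding
    %% of U+00E9 (é, C3 A9) and is no longer well-formed UTF-8.
    <<16#E3, 16#89>> = naive(<<16#C3, 16#89>>),
    ok.
```

Under the same assumption, ∞, ☺ and ❤ all encode with the lead byte E2 and continuation bytes in the range 80..BF, which would explain why the earlier example appeared to work.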

From: Roberto Ostinelli <robe...@widetag.com>
Date: Sun, 21 Oct 2012 13:10:27 -0700
Local: Sun, Oct 21 2012 4:10 pm
Subject: Re: [erlang-questions] Downcase Accented characters

ok, thank you :)

r.

From: "Thomas Allen" <tho...@oinksoft.com>
Date: Sun, 21 Oct 2012 16:12:09 -0400
Local: Sun, Oct 21 2012 4:12 pm
Subject: Re: [erlang-questions] Downcase Accented characters

On Sun, October 21, 2012 3:39 pm, Roberto Ostinelli wrote:
> For the records, this just works..

> start() ->
> Unicode = list_to_binary("∞-HOpe@☺.EXAMple.com/My❤"),
> Result = list_to_binary(string:to_lower(binary_to_list(Unicode))),
> "∞-hope@☺.example.com/my❤" = binary_to_list(Result).

> any downsides I'm not seeing?

For what it's worth,

1> list_to_binary("∞-HOpe@☺.EXAMple.com/My❤").
** exception error: bad argument
     in function  list_to_binary/1
        called as list_to_binary([8734,45,72,79,112,101,64,9786,46,69,88,65,77,112,108,101,
                                  46,99,111,109,47,77,121,10084])

I get that on my system if any of the special characters (∞, ☺, ❤) are
present (R15B02 on Debian 6.0.6 and OS X 10.7.2, both built from source).
So you might need to be careful with that technique.

Thomas Allen

From: Michael Uvarov <free...@gmail.com>
Date: Mon, 22 Oct 2012 00:12:52 +0400
Local: Sun, Oct 21 2012 4:12 pm
Subject: Re: [erlang-questions] Downcase Accented characters
> ux dependencies are unsatisfied.

Just update it.

 
From: Michael Uvarov <free...@gmail.com>
Date: Mon, 22 Oct 2012 00:16:37 +0400
Local: Sun, Oct 21 2012 4:16 pm
Subject: Re: [erlang-questions] Downcase Accented characters
> list_to_binary([8734,45,72,79,112,101,64,9786,46,69,88,65,77,112,108,101,
>                                   46,99,111,109,47,77,121,10084])

list_to_binary/1 works only for elements from 0 to 255.
Use unicode:characters_to_binary/1 instead.

If ux is too slow, then try i18n (it provides NIF bindings to ICU).
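
A minimal sketch of the suggested conversion, using the code point list from the error report. The module name utf8_convert_demo is made up for the example. The last step uses string:lowercase/1, which is an assumption about the runtime: it exists in OTP 20 and later and is not available on the R15B02 release mentioned in this thread.

```erlang
-module(utf8_convert_demo).
-export([demo/0]).

demo() ->
    %% Code points of "∞-HOpe@☺.EXAMple.com/My❤" from the error report.
    Chars = [8734,45,72,79,112,101,64,9786,46,69,88,65,77,112,108,101,
             46,99,111,109,47,77,121,10084],
    %% list_to_binary/1 only accepts bytes, so 8734 raises badarg.
    {'EXIT', {badarg, _}} = (catch list_to_binary(Chars)),
    %% unicode:characters_to_binary/1 encodes the code points as UTF-8,
    %% and unicode:characters_to_list/1 decodes them again.
    Bin = unicode:characters_to_binary(Chars),
    Chars = unicode:characters_to_list(Bin),
    %% Assumption: OTP 20 or later. string:lowercase/1 accepts a UTF-8
    %% binary and returns a UTF-8 binary.
    Lower = string:lowercase(Bin),
    %% Code points of "∞-hope@☺.example.com/my❤".
    Expected = unicode:characters_to_binary(
                 [8734,45,104,111,112,101,64,9786,46,101,120,97,109,112,
                  108,101,46,99,111,109,47,109,121,10084]),
    Expected = Lower,
    ok.
```

Calling utf8_convert_demo:demo() returns ok when every match above holds.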


 
From: Yurii Rashkovskii <yra...@gmail.com>
Date: Sun, 21 Oct 2012 15:02:36 -0700 (PDT)
Local: Sun, Oct 21 2012 6:02 pm
Subject: Re: [erlang-questions] Downcase Accented characters

Roberto,

You might be able to achieve what you need by using one isolated bit of
Elixir's distribution: the String.Unicode module.

It is compiled directly from UnicodeData.txt, so it has all the necessary data
embedded, which saves you from talking to gen_servers or ETS tables.
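
The general idea of embedding the case mappings in compiled function clauses can be sketched in plain Erlang. This is only an illustration under stated assumptions, not Elixir's actual code: the module name tiny_unicode_lower is made up, and the lower/1 clauses are a tiny hand-written stand-in for what a generator would emit from UnicodeData.txt.

```erlang
-module(tiny_unicode_lower).
-export([downcase/1]).

%% Walk a UTF-8 binary one code point at a time.
%% Crashes with function_clause on input that is not valid UTF-8.
downcase(Bin) when is_binary(Bin) ->
    downcase(Bin, <<>>).

downcase(<<C/utf8, Rest/binary>>, Acc) ->
    downcase(Rest, <<Acc/binary, (lower(C))/utf8>>);
downcase(<<>>, Acc) ->
    Acc.

%% Hypothetical stand-in for generated clauses, one per cased code
%% point. Only a handful are listed here.
lower(C) when C >= $A, C =< $Z -> C + 32;
lower(16#00C9) -> 16#00E9;   % É -> é
lower(16#0100) -> 16#0101;   % Ā -> ā
lower(16#0416) -> 16#0436;   % Ж -> ж
lower(C) -> C.
```

Because the mapping lives in pattern-matched clauses, a lookup is an ordinary function call with no process or table involved.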

And the best part is that you don't really need Elixir itself to be able to
use it:

You can edit
https://github.com/elixir-lang/elixir/blob/master/lib/elixir/priv/uni...
and rename it from String.Unicode to, say, :string_unicode (so that it looks
native in Erlang code). After compilation you'll get a beam file that
you can use independently of Elixir, because it doesn't use anything from
the elixir application.

Hope you'll find this helpful.

From: Marc Worrell <m...@worrell.nl>
Date: Mon, 22 Oct 2012 11:44:59 +0200
Local: Mon, Oct 22 2012 5:44 am
Subject: Re: [erlang-questions] Downcase Accented characters
If you only need to downcase a subset (most European languages), you can also check the z_string.erl module in z_stdlib.

https://github.com/zotonic/z_stdlib/blob/master/src/z_string.erl

We are in the process of splitting useful libraries from Zotonic, and z_stdlib is one of them.
Any additions/fixes are welcome.

- Marc

On 21 okt. 2012, at 21:14, Roberto Ostinelli wrote:

> Dear list,

> I've a binary string which includes accented characters and unicode, that i need to downcase.

> Is my real best option here to convert everything to list and downcase that?

 
From: "Richard O'Keefe" <o...@cs.otago.ac.nz>
Date: Fri, 26 Oct 2012 16:05:52 +1300
Local: Thurs, Oct 25 2012 11:05 pm
Subject: Re: [erlang-questions] Downcase Accented characters

On 22/10/2012, at 10:44 PM, Marc Worrell wrote:

> When you need to do downcast a subset (most european languages) then you can also check the z_string.erl module in z_stdlib.

> https://github.com/zotonic/z_stdlib/blob/master/src/z_string.erl

> We are in the process of splitting useful libraries from Zotonic, and z_stdlib is one of them.
> Any additions/fixes are welcome.

Is there any reason why trim{,_left,_right}/1 don't strip
leading/trailing NBSP characters?  (Or other Unicode white
space characters above U+0020.)
