Downcase Accented characters
From:
Roberto Ostinelli <robe... @widetag.com>
Date: Sun, 21 Oct 2012 12:14:37 -0700
Local: Sun, Oct 21 2012 3:14 pm
Subject: [erlang-questions] Downcase Accented characters
Dear list,
I have a binary string which includes accented characters and other Unicode characters, and I
need to downcase it.
Is my best option really to convert everything to a list and downcase that?
_______________________________________________
erlang-questions mailing list
erlang-questi... @erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
From:
Loïc Hoguin <es... @ninenines.eu>
Date: Sun, 21 Oct 2012 21:18:54 +0200
Local: Sun, Oct 21 2012 3:18 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Your current best option is ux_string:to_lower/1 from the ux library, which will properly lowercase all characters, not just A-Z.
Should be at https://github.com/erlang-unicode/ux
-- Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu
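For readers on a current OTP release (this did not exist in 2012): since OTP 20 the standard `string` module provides `string:lowercase/1`, which applies Unicode case mapping to lists and binaries alike, so it covers much of what `ux_string:to_lower/1` was recommended for here. A minimal sketch, assuming OTP 20 or later; the module name `uni_lower` is hypothetical:

```erlang
-module(uni_lower).
-export([lower/1]).

%% Lowercase a UTF-8 encoded binary using the Unicode-aware
%% string:lowercase/1 (OTP 20+). characters_to_binary/1 makes sure the
%% result is a flat UTF-8 binary even if lowercase/1 returns chardata.
lower(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary(string:lowercase(Bin)).
```

If the goal is case-insensitive comparison rather than display, `string:casefold/1` (also OTP 20+) is usually the better fit.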
From:
Roberto Ostinelli <robe... @widetag.com>
Date: Sun, 21 Oct 2012 12:25:11 -0700
Local: Sun, Oct 21 2012 3:25 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Thank you Loïc,
did you happen to benchmark it? Would that be better/faster than a simple
list_to_binary(string:to_lower(binary_to_list(Bin)))?
From:
Loïc Hoguin <es... @ninenines.eu>
Date: Sun, 21 Oct 2012 21:33:25 +0200
Local: Sun, Oct 21 2012 3:33 pm
Subject: Re: [erlang-questions] Downcase Accented characters
For this comparison, ux would be slow and accurate, while your solution would be fast and inaccurate. :)
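Roberto's benchmark question can be answered empirically on a given machine. A rough sketch of a timing harness, assuming only `timer:tc/1` from the standard library; `lower_bench`, `old_way/1` and `time_it/3` are hypothetical names, and a single wall-clock run like this is noisy, so treat the numbers as indicative only:

```erlang
-module(lower_bench).
-export([old_way/1, time_it/3]).

%% The pipeline discussed in this thread: bytes -> string:to_lower/1 -> binary.
old_way(Bin) ->
    list_to_binary(string:to_lower(binary_to_list(Bin))).

%% Call Fun(Input) N times and return the average wall-clock time per
%% call in microseconds, as measured by timer:tc/1.
time_it(Fun, Input, N) when is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, Input, N) end),
    Micros / N.

repeat(_Fun, _Input, 0) ->
    ok;
repeat(Fun, Input, N) ->
    _ = Fun(Input),
    repeat(Fun, Input, N - 1).
```

Any other arity-1 lowercasing fun, for example a wrapper around whichever library call you are comparing, can be passed as the first argument of `time_it/3` on the same input.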
From:
Roberto Ostinelli <robe... @widetag.com>
Date: Sun, 21 Oct 2012 12:39:22 -0700
Local: Sun, Oct 21 2012 3:39 pm
Subject: Re: [erlang-questions] Downcase Accented characters
For the record, this just works:
start() ->
    Unicode = list_to_binary("∞-HOpe@☺.EXAMple.com/My❤"),
    Result = list_to_binary(string:to_lower(binary_to_list(Unicode))),
    "∞-hope@☺.example.com/my❤" = binary_to_list(Result).
Any downsides I'm not seeing?
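One downside shows up when the binary holds UTF-8 encoded text: `binary_to_list/1` yields bytes, not characters, and `string:to_lower/1` treats each byte as a Latin-1 character. ASCII and the bytes of ∞, ☺ and ❤ (lead byte 0xE2 plus continuation bytes 0x80 to 0xBF) fall outside its uppercase ranges, which is why the example above survives. But the UTF-8 lead byte of É is 0xC3, which is Latin-1 Ã, so it is shifted to 0xE3 and the output is no longer valid UTF-8. The same applies to most two-byte sequences. A small sketch; the module name `byte_lower` is hypothetical:

```erlang
-module(byte_lower).
-export([roberto/1, utf8/1]).

%% The pipeline from the message above, applied to raw bytes.
roberto(Bin) ->
    list_to_binary(string:to_lower(binary_to_list(Bin))).

%% Build a UTF-8 binary from a list of code points, independently of
%% how this source file itself is encoded.
utf8(CodePoints) ->
    unicode:characters_to_binary(CodePoints).
```

With `In = byte_lower:utf8([16#C9, $X])` (that is, "ÉX"), `In` is `<<16#C3, 16#89, $X>>` and `byte_lower:roberto(In)` returns `<<16#E3, 16#89, $x>>`: the ASCII letter is lowercased, but É is corrupted.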
From:
Roberto Ostinelli <robe... @widetag.com>
Date: Sun, 21 Oct 2012 12:45:21 -0700
Local: Sun, Oct 21 2012 3:45 pm
Subject: Re: [erlang-questions] Downcase Accented characters
BTW, ux dependencies are unsatisfied:
    ==> ux (get-deps)
    Pulling abnfc from {git,"git://github.com/nygge/abnfc.git","master"}
    Cloning into 'abnfc'...
    Pulling metamodule from {git,"git://github.com/freeakk/metamodule.git","master"}
    fatal: remote error:
    Repository not found.
    Cloning into 'metamodule'...
    ERROR: git clone -n git://github.com/freeakk/metamodule.git metamodule failed with error: 128 and output:
    fatal: remote error:
    Repository not found.
    Cloning into 'metamodule'...
    ERROR: 'get-deps' failed while processing
From:
Loïc Hoguin <es... @ninenines.eu>
Date: Sun, 21 Oct 2012 21:46:17 +0200
Local: Sun, Oct 21 2012 3:46 pm
Subject: Re: [erlang-questions] Downcase Accented characters
This only works for letters found in Latin-1, not for all the uppercase letters found in Unicode. If that's good enough for you, then you don't need ux. :)
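The difference is easy to see on lists of code points. A sketch, assuming a current OTP release where `string:to_lower/1` keeps its documented Latin-1-only behavior and `string:lowercase/1` (OTP 20+) is available; `latin1_only` is a hypothetical module name. É (U+00C9) is in Latin-1 and gets mapped, while Cyrillic Д (U+0414) does not:

```erlang
-module(latin1_only).
-export([old_lower/1, new_lower/1]).

%% string:to_lower/1 on code points: only Latin-1 uppercase letters
%% (A-Z and the accented letters in U+00C0..U+00DE) are mapped.
old_lower(CodePoints) ->
    string:to_lower(CodePoints).

%% string:lowercase/1 (OTP 20+) applies Unicode case mapping;
%% characters_to_list/1 flattens the chardata result into a plain list.
new_lower(CodePoints) ->
    unicode:characters_to_list(string:lowercase(CodePoints)).
```

So `latin1_only:old_lower([16#C9])` gives `[16#E9]`, but `latin1_only:old_lower([16#414])` leaves `[16#414]` untouched, while `latin1_only:new_lower([16#414])` gives `[16#434]`.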
From:
Roberto Ostinelli <robe... @widetag.com>
Date: Sun, 21 Oct 2012 12:51:46 -0700
Local: Sun, Oct 21 2012 3:51 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Oh I see.
So if I want to downcase this string: "∞-HOpe@☺.ÉXAMple.com/My❤", will I
need ux?
r.
From:
Loïc Hoguin <es... @ninenines.eu>
Date: Sun, 21 Oct 2012 22:00:15 +0200
Local: Sun, Oct 21 2012 4:00 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Yes and no: this example would still work, I think? I'm no expert on how Erlang deals with Unicode; I just know what string:to_lower/1 does. :)
From:
Roberto Ostinelli <robe... @widetag.com>
Date: Sun, 21 Oct 2012 13:10:27 -0700
Local: Sun, Oct 21 2012 4:10 pm
Subject: Re: [erlang-questions] Downcase Accented characters
ok, thank you :)
r.
From:
"Thomas Allen" <tho... @oinksoft.com>
Date: Sun, 21 Oct 2012 16:12:09 -0400
Local: Sun, Oct 21 2012 4:12 pm
Subject: Re: [erlang-questions] Downcase Accented characters
For what it's worth,
1> list_to_binary("∞-HOpe@☺.EXAMple.com/My❤").
** exception error: bad argument
     in function  list_to_binary/1
        called as list_to_binary([8734,45,72,79,112,101,64,9786,46,69,88,65,77,112,108,101,
                                  46,99,111,109,47,77,121,10084])
I get that on my system if any of the special characters (∞,
☺, ❤) are present (R15B02 on Debian 6.0.6 and OSX 10.7.2,
both built from source). So you might need to be careful with that
technique.
Thomas Allen
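One plausible explanation for the difference between Roberto's result and this one, assuming Roberto's module was compiled by R15, which read source files as Latin-1: in the module, the string literal becomes the list of its UTF-8 bytes, all of which are in 0..255, whereas the shell here produced real code points such as 8734 (U+221E), which `list_to_binary/1` rejects. The sketch below shows both cases without relying on source encoding; `cp_demo` is a hypothetical module name:

```erlang
-module(cp_demo).
-export([utf8_bytes/1, safe_list_to_binary/1]).

%% Encode a list of Unicode code points as UTF-8 and return the raw
%% bytes as a list, i.e. what a Latin-1-read literal would contain.
utf8_bytes(CodePoints) ->
    binary_to_list(unicode:characters_to_binary(CodePoints)).

%% list_to_binary/1 only accepts integers 0..255 (and nested iolists);
%% turn the badarg exception into a value so it is easy to inspect.
safe_list_to_binary(List) ->
    try list_to_binary(List) of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.
```

Here `cp_demo:safe_list_to_binary([16#221E])` returns `{error, badarg}`, while the UTF-8 bytes of the same character, `[16#E2, 16#88, 16#9E]`, convert without complaint.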
From:
Michael Uvarov <free... @gmail.com>
Date: Mon, 22 Oct 2012 00:12:52 +0400
Local: Sun, Oct 21 2012 4:12 pm
Subject: Re: [erlang-questions] Downcase Accented characters
From:
Michael Uvarov <free... @gmail.com>
Date: Mon, 22 Oct 2012 00:16:37 +0400
Local: Sun, Oct 21 2012 4:16 pm
Subject: Re: [erlang-questions] Downcase Accented characters
list_to_binary([8734,45,72,79,112,101,64,9786,46,69,88,65,77,112,108,101,
46,99,111,109,47,77,121,10084])
It works only for elements from 0 to 255.
Use unicode:characters_to_binary/1 instead.
If ux is too slow, then try i18n (NIF bindings for ICU).
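Following that advice, a version of the round trip that decodes and re-encodes UTF-8 explicitly might look like this. It is only a sketch: it still relies on `string:to_lower/1`, so only Latin-1 letters are lowercased, but because that function now sees code points instead of bytes, multi-byte characters are no longer mangled. It assumes the input is valid UTF-8 (`unicode:characters_to_list/1` returns an error tuple otherwise, and the call will crash); `cp_lower` is a hypothetical module name:

```erlang
-module(cp_lower).
-export([lower/1]).

%% Decode UTF-8 into code points, lowercase them (Latin-1 letters only),
%% and encode the result back into a UTF-8 binary.
lower(Bin) when is_binary(Bin) ->
    CodePoints = unicode:characters_to_list(Bin),
    unicode:characters_to_binary(string:to_lower(CodePoints)).
```

With this version, Roberto's second string "∞-HOpe@☺.ÉXAMple.com/My❤" comes out as "∞-hope@☺.éxample.com/my❤", with É correctly mapped to é.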
From:
Yurii Rashkovskii <yra... @gmail.com>
Date: Sun, 21 Oct 2012 15:02:36 -0700 (PDT)
Local: Sun, Oct 21 2012 6:02 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Roberto,
You might be able to achieve what you need by using one isolated piece of Elixir's distribution: the String.Unicode module.
It is compiled directly from UnicodeData.txt, so it has all the necessary data embedded, which saves you from talking to gen_servers or ETS tables.
And the best part is that you don't need Elixir itself to use it:
You can edit https://github.com/elixir-lang/elixir/blob/master/lib/elixir/priv/uni... and rename the module from String.Unicode to, say, :string_unicode (so it looks native in Erlang code); after compilation you'll get a beam file you can use independently of Elixir, because it doesn't use anything from the elixir application.
Hope you'll find this helpful.
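To illustrate the idea of deriving case data straight from UnicodeData.txt, here is a toy sketch; it is not Elixir's actual implementation, and it builds a map at run time rather than compiling the data in. It relies on the file's documented layout: semicolon-separated fields, with field 0 holding the code point and field 13 the simple lowercase mapping, both in hex. The module `ucd_lower` and its functions are hypothetical, and a real generator would read the whole file and strip line endings first:

```erlang
-module(ucd_lower).
-export([parse_line/1, build_map/1, lower/2]).

%% Parse one UnicodeData.txt line (as a binary, without newline).
%% Returns {CodePoint, LowercaseCodePoint}, or skip when field 13 is empty.
parse_line(Line) when is_binary(Line) ->
    Fields = binary:split(Line, <<";">>, [global]),  % keeps empty fields
    case lists:nth(14, Fields) of                      % field 13, 1-based
        <<>> -> skip;
        LowerHex ->
            {binary_to_integer(hd(Fields), 16), binary_to_integer(LowerHex, 16)}
    end.

%% Build a map from uppercase code point to lowercase code point.
build_map(Lines) ->
    maps:from_list([Pair || Line <- Lines,
                            Pair <- [parse_line(Line)],
                            Pair =/= skip]).

%% Lowercase a list of code points; unmapped characters pass through.
lower(Map, CodePoints) ->
    [maps:get(C, Map, C) || C <- CodePoints].
```

For example, the line `0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;` parses to `{16#41, 16#61}`, while `0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041` has an empty field 13 and is skipped.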
From:
Marc Worrell <m... @worrell.nl>
Date: Mon, 22 Oct 2012 11:44:59 +0200
Local: Mon, Oct 22 2012 5:44 am
Subject: Re: [erlang-questions] Downcase Accented characters
If you only need to downcase a subset (most European languages), you can also check the z_string.erl module in z_stdlib.
https://github.com/zotonic/z_stdlib/blob/master/src/z_string.erl
We are in the process of splitting useful libraries out of Zotonic, and z_stdlib is one of them.
Any additions/fixes are welcome.
- Marc
From:
"Richard O'Keefe" <o... @cs.otago.ac.nz>
Date: Fri, 26 Oct 2012 16:05:52 +1300
Local: Thurs, Oct 25 2012 11:05 pm
Subject: Re: [erlang-questions] Downcase Accented characters
Is there any reason why z_string's trim/1, trim_left/1 and trim_right/1 don't strip
leading/trailing NBSP characters (or other Unicode whitespace
characters above U+0020)?
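For reference, the characters with the Unicode White_Space property are U+0009 to U+000D, U+0020, U+0085, U+00A0, U+1680, U+2000 to U+200A, U+2028, U+2029, U+202F, U+205F and U+3000. A sketch of trim functions over code point lists that strip all of them, NBSP included; `ws_trim` is a hypothetical module and not z_string's code:

```erlang
-module(ws_trim).
-export([trim/1, trim_left/1, trim_right/1]).

%% True for code points with the Unicode White_Space property.
is_ws(C) when C >= 16#09, C =< 16#0D -> true;
is_ws(C) when C >= 16#2000, C =< 16#200A -> true;
is_ws(C) ->
    lists:member(C, [16#20, 16#85, 16#A0, 16#1680, 16#2028,
                     16#2029, 16#202F, 16#205F, 16#3000]).

%% Drop leading whitespace.
trim_left(CodePoints) ->
    lists:dropwhile(fun is_ws/1, CodePoints).

%% Drop trailing whitespace by trimming the reversed list.
trim_right(CodePoints) ->
    lists:reverse(trim_left(lists:reverse(CodePoints))).

%% Drop whitespace on both ends.
trim(CodePoints) ->
    trim_right(trim_left(CodePoints)).
```

For binaries, the same functions can be applied after `unicode:characters_to_list/1` and the result re-encoded with `unicode:characters_to_binary/1`.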