From: Roberto Ostinelli
To: Loïc Hoguin
Cc: Erlang
Date: Sun, 21 Oct 2012 12:51:46 -0700
Subject: Re: [erlang-questions] Downcase Accented characters

Oh I see.

So if I want to downcase this string: "∞-HOpe@☺.ÉXAMple.com/My❤", I will need ux?

r.

On Sun, Oct 21, 2012 at 12:46 PM, Loïc Hoguin <es...@ninenines.eu> wrote:
This only works for letters found in latin1, not for all the uppercase letters found in Unicode. If that's good enough for you, then you don't need ux. :)


On 10/21/2012 09:39 PM, Roberto Ostinelli wrote:
For the record, this just works:

start() ->
    Unicode = list_to_binary("∞-HOpe@☺.EXAMple.com/My❤"),
    Result = list_to_binary(string:to_lower(binary_to_list(Unicode))),
    "∞-hope@☺.example.com/my❤" = binary_to_list(Result).


any downsides I'm not seeing?
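
A minimal sketch of one downside of the byte-wise approach, assuming the input is a UTF-8 encoded binary and OTP 20 or later (needed for string:lowercase/1, which did not exist when this thread was written). The module and function names are hypothetical. string:to_lower/1 only knows ISO 8859-1, so it treats any byte in 16#C0..16#DE (except 16#D7) as an uppercase Latin-1 letter and adds 32 to it. Most lead bytes of two-byte UTF-8 sequences fall in that range. The string in start() happens to contain no such byte, but "É" (bytes C3 89) does:

```erlang
-module(downcase_demo).
-export([bytewise/1, unicode_aware/1]).

%% The approach from the thread: treat the UTF-8 binary as a list of bytes
%% and lowercase each byte as if it were an ISO 8859-1 character.
bytewise(Bin) when is_binary(Bin) ->
    list_to_binary(string:to_lower(binary_to_list(Bin))).

%% Unicode-aware alternative (assumption: OTP 20 or later is available).
unicode_aware(Bin) when is_binary(Bin) ->
    string:lowercase(Bin).

%% In the shell, with "É" written as its UTF-8 bytes C3 89:
%%
%%   downcase_demo:bytewise(<<16#C3,16#89,"XAMple">>).
%%   %% the lead byte C3 becomes E3, so the result is no longer valid UTF-8
%%
%%   downcase_demo:unicode_aware(<<16#C3,16#89,"XAMple">>).
%%   %% the result is the UTF-8 encoding of "éxample"
```

Note that even an already lowercase "é" (bytes C3 A9) is damaged by the byte-wise version, because its lead byte is also C3.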

On Sun, Oct 21, 2012 at 12:25 PM, Roberto Ostinelli <robe...@widetag.com> wrote:

    Thank you Loïc,

    did you happen to benchmark it? Would that be better/faster than a
    simple list_to_binary(string:to_lower(binary_to_list(Bin)))?
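
For the benchmark question, a rough timing sketch using timer:tc/1 from the standard library. The module name, run/3, and the iteration count are hypothetical, and a loop like this gives only a coarse per-call average, not a rigorous benchmark.

```erlang
-module(downcase_bench).
-export([run/3, bytewise/1]).

%% Average time per call, in microseconds, of Fun(Bin) over N calls.
run(Fun, Bin, N) when is_function(Fun, 1), is_binary(Bin), is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> loop(Fun, Bin, N) end),
    Micros / N.

%% Call Fun(Bin) N times and discard the results.
loop(_Fun, _Bin, 0) ->
    ok;
loop(Fun, Bin, N) ->
    _ = Fun(Bin),
    loop(Fun, Bin, N - 1).

%% The list round trip from the question.
bytewise(Bin) ->
    list_to_binary(string:to_lower(binary_to_list(Bin))).
```

Something like downcase_bench:run(fun downcase_bench:bytewise/1, Bin, 100000) would time the list round trip. The same run/3 could wrap ux_string:to_lower/1 for comparison, after converting Bin to whatever input type ux expects, which this thread does not specify.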


    On Sun, Oct 21, 2012 at 12:18 PM, Loïc Hoguin <es...@ninenines.eu> wrote:

        On 10/21/2012 09:14 PM, Roberto Ostinelli wrote:

            Dear list,

            I have a binary string, which includes accented characters and
            Unicode, that I need to downcase.

            Is my real best option here to convert everything to a list
            and downcase that?


        Your current best option is ux_string:to_lower/1 from the ux
        library, which will properly lowercase all characters, not just A-Z.

        Should be at https://github.com/erlang-unicode/ux

        --
        Loïc Hoguin
        Erlang Cowboy
        Nine Nines
        http://ninenines.eu





--
Loïc Hoguin

Erlang Cowboy
Nine Nines
http://ninenines.eu
