unicode in string literals
Date: Mon, 30 Jul 2012 15:06:38 +0200
From: CGS <cgsmcml...@gmail.com>
To: Joe Armstrong <erl...@gmail.com>
Cc: Erlang <erlang-questi...@erlang.org>
Subject: Re: [erlang-questions] unicode in string literals
Content-Type: multipart/mixed; boundary="===============6893848118929145252=="
--===============6893848118929145252==
Content-Type: multipart/alternative; boundary=e89a8ff25652710e4e04c60bbbb2
--e89a8ff25652710e4e04c60bbbb2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Hi Joe,
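You may try the unicode module, passing the literal as a binary:
test() -> unicode:characters_to_list(<<"a∞b">>, utf8).
which will return the desired list [97,8734,98] when the source file is
read as Latin-1. The utf8 argument only applies to binaries; a plain list
is taken to be code points already and comes back unchanged. As Richard
said, the default is Latin-1 (integers 0-255).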
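A minimal sketch of how the encoding argument is applied, with the bytes written as explicit integers so that nothing depends on how the compiler reads the source file. The module and function names are made up for illustration.

```erlang
-module(utf8_demo).
-export([decode/0, passthrough/0]).

%% The UTF-8 bytes of "a", U+221E (infinity) and "b", given as a binary.
%% The utf8 argument tells the unicode module how to interpret binaries.
decode() ->
    Bytes = <<97, 226, 136, 158, 98>>,
    unicode:characters_to_list(Bytes, utf8).
    %% returns [97,8734,98]

%% Integers in a plain list are already treated as code points, so the
%% same five values given as a list are returned unchanged.
passthrough() ->
    unicode:characters_to_list([97, 226, 136, 158, 98], utf8).
    %% returns [97,226,136,158,98]
```

In other words, the conversion only decodes UTF-8 when the input reaches it as a binary.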
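As for binaries, it is the same problem: the literal is read as Latin-1,
so each of the three UTF-8 bytes of the infinity sign is encoded again as
a character of its own.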
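The eight-byte binary quoted below can be reproduced by hand. This is only a sketch with hypothetical module and function names, again using explicit integers so the source encoding plays no part.

```erlang
-module(utf8_binary_demo).
-export([encoded_twice/0, encoded_once/0]).

%% The five UTF-8 bytes are taken as five separate code points and each
%% one is encoded as UTF-8 again: 226, 136 and 158 become two bytes each.
encoded_twice() ->
    unicode:characters_to_binary([97, 226, 136, 158, 98]).
    %% returns <<97,195,162,194,136,194,158,98>>

%% Starting from the real code point U+221E gives the expected five bytes.
encoded_once() ->
    <<97, 16#221E/utf8, 98>>.
    %% returns <<97,226,136,158,98>>
```

The first result matches the "very strange" binary in the original question, which suggests the compiler saw five Latin-1 characters rather than three Unicode ones.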
CGS
On Mon, Jul 30, 2012 at 2:35 PM, Joe Armstrong <erl...@gmail.com> wrote:
> What is a literal string in Erlang? Originally it was a list of
> integers, each integer being a single character code - this made
> strings very easy to work with.
>
> The code
>
> test() -> "a∞b".
>
> Compiles to code which returns the list
> of integers [97,226,136,158,98].
>
> This is very inconvenient. I had expected it to return
> [97, 8734, 98]. The length of the list should be 3 not 5
> since it contains three unicode characters not five.
>
> Is this a bug or a horrible misfeature?
>
> So how can I make a string with the three characters 'a' 'infinity' 'b'?
>
> test() -> "a\x{221e}b" is ugly
>
> test() -> <<"a∞b"/utf8>>   seems to be a bug: it gives an error in the
> shell but is ok in compiled code, where it returns
> <<97,195,162,194,136,194,158,98>>, which is very strange
>
> test() -> [$a,8734,$b] is ugly
>
> /Joe
> _______________________________________________
> erlang-questions mailing list
> erlang-questi...@erlang.org
> http://erlang.org/mailman/listinfo/erlang-questions
>
--e89a8ff25652710e4e04c60bbbb2--
--===============6893848118929145252==--