\xA0 vs \x{A0} in strings and Unicode

Vadim Konovalov

Aug 10, 2005, 2:20:20 AM

to perl5-...@perl.org

Recently I thought that the strings "\xA0" and "\x{A0}" would be interpreted
differently, namely that the latter would be interpreted as Unicode (U+00A0).

Quick check with
perl -MDevel::Peek -we "print Dump qq/\xa0/,Dump qq/\x{a0}/"
shows that "\xA0" and "\x{A0}" are both interpreted as non-Unicode.

However, this is not true for perl 5.6.x, where the behaviour is exactly
as I described: \x{A0} gives Unicode and \xA0 gives ordinary chars within
strings, which is what led to my confusion.
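
The same check can be sketched without Devel::Peek by asking perl whether each string carries the internal UTF8 flag. This is only an illustration, assuming perl 5.8.1 or later (where utf8::is_utf8 is available without loading anything) and a script that does not "use utf8"; the helper name flag() and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether perl stores the string as UTF-8 internally.
sub flag { utf8::is_utf8($_[0]) ? "UTF8 flag on" : "UTF8 flag off" }

my $plain  = "\xA0";      # two-digit hex escape
my $braced = "\x{A0}";    # same code point, braced form
my $wide   = "\x{100}";   # code point above 0xFF

print '"\xA0":    ', flag($plain),  "\n";
print '"\x{A0}":  ', flag($braced), "\n";
print '"\x{100}": ', flag($wide),   "\n";

# Whatever the internal storage, the first two strings compare equal.
print "plain eq braced: ", ($plain eq $braced ? "yes" : "no"), "\n";
```

On such a perl only the third string should have the flag set, which matches the Dump output described above. The 5.6.x behaviour cannot be reproduced this way on a newer perl.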

I know that 5.6.x has weak Unicode support, but this kind of parsing seems
quite reasonable to me.

Does anyone know why this changed, and why it is a bad idea to treat a string
as Unicode once it has *any* \x{...} chars inside, and not only those at 0x100
and above, as the documentation currently says?

Honestly, I thought of this \x{...} parsing as quite stable behaviour and
could even have relied on it in my programs.

Thanks in advance,
Vadim.

h...@crypt.org

Aug 10, 2005, 8:28:41 AM

to Konovalov, Vadim, perl5-...@perl.org

"Konovalov, Vadim" <vkono...@spb.lucent.com> wrote:
:Recently I thought that strings "\xA0" and "\x{A0}" will be interpreted
:differently, namely last one will be interpreted as Unicode (U+00A0)
:
:Quick check with
: perl -MDevel::Peek -we "print Dump qq/\xa0/,Dump qq/\x{a0}/"
:shows that "\xA0" and "\x{A0}" both interpreted as not-Unicode.

To me, when not using it to incorporate otherwise inaccessible characters,
the "\x{A0}" mechanism is useful primarily for disambiguation.

I use it regularly when I think it can contribute to clearer code, and in
such use it is directly analogous to the use of braces for disambiguation
of variables, as in "the ${word}s", or "${scalar}[0]".
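
A small sketch of both kinds of disambiguation follows. The variable names and values are invented for illustration; the point is only where the braces end the escape or the variable name.

```perl
use strict;
use warnings;

# Without braces, \x consumes up to two hex digits.
my $nbsp     = "\xA0";     # one character: U+00A0
my $newline0 = "\x{A}0";   # two characters: U+000A followed by "0"

printf "%s has length %d\n", '"\xA0"',   length $nbsp;
printf "%s has length %d\n", '"\x{A}0"', length $newline0;

# The same idea for variable names in interpolated strings.
my $word   = "cat";
my $scalar = "plain scalar";
my @scalar = ("array element");

my $plural         = "the ${word}s";    # $word, then a literal "s"
my $with_braces    = "${scalar}[0]";    # $scalar, then a literal "[0]"
my $without_braces = "$scalar[0]";      # element 0 of @scalar

print "$plural\n$with_braces\n$without_braces\n";
```

In each pair the braces mark exactly where the escape or the name stops, so the reader does not have to remember how many characters the parser would otherwise consume.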

Hugo
