Bytes make no sense on text strings

Juerd

Oct 9, 2006, 5:40:09 PM
to perl6-l...@perl.org
I don't understand why having :bytes for things like s/// would be a
good thing.

A Str doesn't have bytes, just like how a Buf doesn't have characters.

To get bytes out of a Str, you need an encoding. There will be an
internal encoding, but exposing it in this way is probably just asking
for a lot of trouble: inconsistent (invalid) data that internals rely
on, and the inability to switch the internal encoding later. Or, for
example, to keep things 8-bit encoded as an optimization until something
demands more than that.
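
To illustrate the point that bytes only exist once an encoding is chosen,
here is a minimal sketch. It is written in Python rather than Perl 6, on
the assumption that Python's str/bytes split is a fair stand-in for
Str/Buf; the sample text and variable names are made up for illustration.

```python
# Hypothetical sample text: "naïve €5" (8 code points, two of them non-ASCII).
text = "na\u00efve \u20ac5"

# A text string has a length in characters (code points, in Python's case).
char_count = len(text)

# It has a length in bytes only after an encoding has been picked,
# and the answer differs from one encoding to the next.
byte_counts = {
    enc: len(text.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}

# Some encodings cannot represent the text at all: Latin-1 has no euro sign.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(char_count, byte_counts, latin1_ok)
```

The same eight characters occupy 11, 16, or 32 bytes depending on the
encoding, so "the bytes of the string" has no single answer unless the
internal representation is exposed.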

As I understand it, a Str is a Unicode string, not a UTF-8 string.

I propose that using :bytes on a text string throws an exception.
--
Kind regards,

juerd waalboer: perl hacker <ju...@juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sa...@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.

Larry Wall

Oct 9, 2006, 6:02:06 PM
to perl6-l...@perl.org
On Mon, Oct 09, 2006 at 11:40:09PM +0200, Juerd wrote:
: I don't understand why having :bytes for things like s/// would be a
: good thing.
:
: A Str doesn't have bytes, just like how a Buf doesn't have characters.
:
: To get bytes out of a Str, you need an encoding. There will be an
: internal encoding, but exposing it in this way is probably just asking
: for a lot of trouble: inconsistent (invalid) data that internals rely
: on, and the inability to switch the internal encoding later. Or, for
: example, to keep things 8-bit encoded as an optimization until something
: demands more than that.
:
: As I understand it, a Str is a Unicode string, not a UTF-8 string.

A string object is allowed to present multiple interfaces at different
abstraction levels. If a string object allows multiple abstraction levels,
it is part of the object's job in life to keep those abstraction levels
in sync with each other. This is one of the reasons it's Very Important
that string positions be considered opaque abstractions and not numbers.
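
One way to picture an opaque position is sketched below. This is a
hypothetical Python illustration, not Perl 6 and not a proposed design:
the StrPos class, its methods, and the find helper are all invented here.
The position remembers which string it belongs to and only turns into a
number when the caller names an abstraction level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StrPos:
    """Hypothetical opaque position inside one particular string."""
    _owner: str
    _index: int  # stored as a code point index; callers should not rely on it

    def as_chars(self) -> int:
        # Offset measured at the character (code point) level.
        return self._index

    def as_bytes(self, encoding: str) -> int:
        # Offset measured at the byte level, for an explicitly named encoding.
        return len(self._owner[:self._index].encode(encoding))


def find(haystack: str, needle: str) -> Optional[StrPos]:
    """Return an opaque position instead of a bare integer (illustrative)."""
    i = haystack.find(needle)
    return None if i < 0 else StrPos(haystack, i)


text = "na\u00efve \u20ac5"  # "naïve €5", made-up sample text
pos = find(text, "\u20ac")

chars_offset = pos.as_chars()
utf8_offset = pos.as_bytes("utf-8")
utf16_offset = pos.as_bytes("utf-16-le")
print(chars_offset, utf8_offset, utf16_offset)
```

The one position comes out as 6, 7, or 12 depending on the level asked
for, which is why a bare number is ambiguous unless it says what it counts.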

: I propose that using :bytes on a text string throws an exception.

It will if the string in question doesn't support the bytes abstraction
level, which might well be most of them.

Larry
