Raw bytes in perl6


David Formosa

Jul 12, 2005, 12:53:49 AM
to perl6-l...@perl.org
How do we intend to manipulate raw binary in Perl6? Perl5's use
bytes; pragma is rather poor (forcing all strings to be raw in its
scope or requiring do {use bytes; ...} type tricks to deal with them)
and now that Perl6 has real typing perhaps it would be more useful to
have a bytestring type (or an octet string type) that doesn't get
utf8ed.

--
Please excuse my spelling as I suffer from agraphia. See
http://dformosa.zeta.org.au/~dformosa/Spelling.html to find out more.
Free the Memes.

Larry Wall

Jul 12, 2005, 4:55:56 PM
to perl6-l...@perl.org
On Tue, Jul 12, 2005 at 04:53:49AM -0000, David Formosa (aka ? the Platypus) wrote:
: How do we intend to manipulate raw binary in Perl6? Perl5's use
: bytes; pragma is rather poor (forcing all strings to be raw in its
: scope or requiring do {use bytes; ...} type tricks to deal with them)
: and now that Perl6 has real typing perhaps it would be more useful to
: have a bytestring type (or an octet string type) that doesn't get
: utf8ed.

I've said that string types will be allowed to specify a minimum and
maximum abstraction level. Byte strings merely specify a maximum
abstraction level of bytes, and then any code that looks at it as
codepoints, graphemes, or characters will only see values in the
range of 0..255, and any attempt to store a character larger than
255 into such a string will fail.
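
As a rough illustration of that rule (a Python sketch, not Perl 6 and not anything from the design documents), a string capped at the byte level could behave as below. The class name ByteStr and its methods are hypothetical.

```python
class ByteStr:
    """Hypothetical string type whose maximum abstraction level is bytes."""

    def __init__(self, text=""):
        self._data = bytearray()
        for ch in text:
            self.append(ch)

    def append(self, ch):
        # Storing a character above 255 into a byte string must fail.
        code = ord(ch)
        if code > 255:
            raise ValueError(f"U+{code:04X} does not fit in a byte string")
        self._data.append(code)

    def codepoints(self):
        # Viewed as codepoints, every element is still in the range 0..255.
        return list(self._data)

    def __str__(self):
        # Latin-1 maps the values 0..255 one-to-one onto U+0000..U+00FF.
        return self._data.decode("latin-1")


s = ByteStr("caf\u00e9")   # U+00E9 is 233, so it fits
```

Here s.codepoints() gives [99, 97, 102, 233], while s.append("\u0100") raises ValueError because U+0100 is larger than 255.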

On the other extreme we can have abstract string types that encapsulate
their representation, so that you're allowed to deal with them as
graphemes or characters, but not get at their "true" representation,
so you can't tell whether they're stored in UTF-8, UTF-32, UTF-EBCDIC,
ASN.1, ISO-2022-jp, or Morse Code.
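
The opposite extreme might be sketched like this in Python. AbstractStr is a hypothetical name, and the sketch works at the codepoint level only (Python has no built-in grapheme view); the point is just that callers cannot tell which encoding is used underneath.

```python
class AbstractStr:
    """Hypothetical string that hides its storage encoding from callers."""

    def __init__(self, text, _encoding="utf-8"):
        # Double-underscore names are mangled, so the raw bytes are not
        # part of the public interface.
        self.__encoding = _encoding
        self.__raw = text.encode(_encoding)

    def chars(self):
        # The only view offered: a list of characters (codepoints here).
        return list(self.__raw.decode(self.__encoding))

    def __len__(self):
        return len(self.chars())

    def __eq__(self, other):
        # Equality is defined on characters, never on the stored bytes.
        return isinstance(other, AbstractStr) and self.chars() == other.chars()


a = AbstractStr("na\u00efve", "utf-8")
b = AbstractStr("na\u00efve", "utf-32")
```

The two objects store different byte sequences, yet a == b holds and both report a length of 5.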

As for naming string types, perhaps the main Str type can be parameterized.

my ::bitstr ::= Str of bit;
my ::bytestr ::= Str of uint8;
my ::codestr ::= Str of Code;

On the other hand, if the basic Str type is unwilling to take on the
burden of being parameterized, we could generate all our funny string
types by mapping a string name to an array declaration.

my Str $foo is Array of byte;

or some such. But maybe we can make Str of byte mean that by way
of shorthand.

Larry

Yuval Kogman

Jul 12, 2005, 5:46:49 PM
to perl6-l...@perl.org
On Tue, Jul 12, 2005 at 13:55:56 -0700, Larry Wall wrote:

> On the other hand, if the basic Str type is unwilling to take on the
> burden of being parameterized, we could generate all our funny string
> types by mapping a string name to an array declaration.
>
> my Str $foo is Array of byte;
>
> or some such. But maybe we can make Str of byte mean that by way
> of shorthand

If this means that the string role, composed with the array role is
just a way to apply a bunch of really cool operations (rules,
substringing, composition, conversion) onto a stream of things that
know to do the Char role, can we have monads too? ;-)

Seriously though, Haskell's way of treating strings as lists makes
strings useful in a totally different way than Perl5 makes them
useful, and I'd like to have both.

Perhaps the most interesting aspect of the string-is-a-list mindset
is that Parsec can parse any list of crap into any structured crap.
Its only affinity towards real strings and characters is the
builtin library of useful rules.
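
To make the string-is-a-list point concrete, here is a minimal combinator sketch in Python. It is not Parsec and not Perl 6 rules; the helper names item, seq and many are made up for this example. Nothing in it assumes the input elements are characters.

```python
def item(pred):
    """Parser that consumes one element for which pred(element) is true."""
    def parse(tokens, pos):
        if pos < len(tokens) and pred(tokens[pos]):
            return tokens[pos], pos + 1
        return None          # failure
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(tokens, pos):
        results = []
        for p in parsers:
            r = p(tokens, pos)
            if r is None:
                return None
            value, pos = r
            results.append(value)
        return results, pos
    return parse


def many(parser):
    """Apply parser zero or more times (parser must consume input)."""
    def parse(tokens, pos):
        results = []
        while True:
            r = parser(tokens, pos)
            if r is None:
                return results, pos
            value, pos = r
            results.append(value)
    return parse


# The "string" being parsed is a list of arbitrary Python objects.
number = item(lambda t: isinstance(t, int))
end = item(lambda t: t == "end")
grammar = seq(many(number), end)
tokens = [1, 2, 3, "end"]
```

Calling grammar(tokens, 0) returns ([[1, 2, 3], "end"], 4): the structured result plus the position reached in the list.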

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me has realultimatepower.net: neeyah!!!!!!!!!!!!

Yuval Kogman

Jul 12, 2005, 8:18:54 PM
to perl6-l...@perl.org
On Wed, Jul 13, 2005 at 00:46:49 +0300, Yuval Kogman wrote:

> Perhaps the most interesting aspect of the string-is-a-list mindset
> is that Parsec can parse any list of crap into any structured crap.
> Its only affinity towards real strings and characters is the
> builtin library of useful rules.

By the way, a nice use case for using the rules engine could be
"parsing" a stream of SAX events into a structure... XML::Simple in
perl6 could be really as simple as it sounds =)
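
A sketch of what that might look like, in Python rather than Perl 6: the standard xml.sax module produces the events, and a hand-written recursive function stands in for a rule that matches "start, then any mix of text and child elements, then end". EventRecorder and parse_element are hypothetical names, and this says nothing about how PGE would do it.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Flatten SAX callbacks into a plain list of (kind, value) events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():                  # ignore whitespace-only text
            self.events.append(("text", content.strip()))


def parse_element(events, pos):
    """Match one element in the event list; return (structure, new_pos)."""
    kind, name = events[pos]
    if kind != "start":
        raise ValueError(f"expected a start event at position {pos}")
    pos += 1
    children = []
    while events[pos][0] != "end":
        if events[pos][0] == "text":
            children.append(events[pos][1])
            pos += 1
        else:
            child, pos = parse_element(events, pos)   # nested element
            children.append(child)
    return {name: children}, pos + 1         # skip the matching end event


recorder = EventRecorder()
xml.sax.parseString(b"<a><b>hi</b><c/></a>", recorder)
tree, _ = parse_element(recorder.events, 0)
```

The input to parse_element is a list of event tuples, not characters, which is the kind of stream the rules engine would have to accept.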

Can anyone see this being retrofitted on top of current rules
semantics? How does PGE relate to this?

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /methinks long and hard, and runs away: neeyah!!!

Sam Vilain

Jul 12, 2005, 9:22:26 PM
to Yuval Kogman, perl6-l...@perl.org
Yuval Kogman wrote:
> By the way, a nice use case for using the rules engine could be
> "parsing" a stream of SAX events into a structure... XML::Simple in
> perl6 could be really as simple as it sounds =)
> Can anyone see this being retrofitted on top of current rules
> semantics? How does PGE relate to this?

Yes, in fact SGML DTDs, once reduced to compact forms such as in:

http://search.cpan.org/src/SAMV/Perldoc-0.13/t/09-scottish.t

end up looking surprisingly similar to rules.

Sam.

Larry Wall

Jul 13, 2005, 2:14:10 PM
to perl6-l...@perl.org
You guys are beating a live horse. Apocalypse 5 already
discusses arrays pretending to be strings for the sake of parsing.
The capability has to be there, and in fact Patrick has been bearing
that in mind in the design of PGE. The only question for p6l is how
much syntactic sugar you want.

I've always been a bit partial to explicit polymorphic declarations:

my byte $@foo;

to mean that $foo and @foo are two views of the same object. In that
sense, the implicit declaration of $/ is really $@%/ or some such.
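
For readers who want the "two views of the same object" idea spelled out, here is a loose Python analogue. It is only an analogy: ByteBuffer is a hypothetical class, and Python has no sigils, so the scalar view is a method and the array view is ordinary indexing.

```python
class ByteBuffer:
    """Hypothetical object offering a scalar view and an array view."""

    def __init__(self, data):
        self._buf = bytearray(data)      # the single underlying object

    def scalar(self):
        # The "$foo" view: the whole thing as one byte string.
        return bytes(self._buf)

    # The "@foo" view: element-wise access to the same storage.
    def __len__(self):
        return len(self._buf)

    def __getitem__(self, index):
        return self._buf[index]

    def __setitem__(self, index, value):
        self._buf[index] = value         # bytearray rejects values outside 0..255


foo = ByteBuffer(b"cat")
foo[0] = ord("b")                        # write through the array view
```

After the assignment, foo.scalar() is b"bat": a change made through one view is visible through the other because both share the same storage.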

And by happy chance, $@ and $% have both come available in Perl 6. :-)

Or we can just use traits like the Apocalypse suggests. But I like
the idea of highlanderish variables as long as they're explicitly
declared that way.

On the other hand, it does admit the possibility of people mixing up
$@foo with @$foo. So perhaps the polymorphic forms are allowed only
in the declaration, and in normal code you have to pick one or the other.

Though I suppose it then becomes an interesting question whether

@foo ~~ $foo

returns true or not.

Larry
