I have some code running on big- and little-endian machines that
uses unpack to convert big-endian signed integers. Currently, this
has to be rather ugly (and ssllooww) to be portable:
@val = unpack 's*', pack 'S*', unpack 'n*', $data;
Other, even uglier and slower, solutions would be:
@val = map { $_>32767 ? $_ - 65536 : $_ } unpack 'n*', $data;
or:
$data =~ s/(.)(.)/$2$1/gs if $Config{byteorder} eq '1234' ||
$Config{byteorder} eq '12345678';
@val = unpack 's*', $data;
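The workarounds above all rely on the same two's-complement arithmetic: a 16-bit value read as unsigned maps 32768..65535 onto -32768..-1 when you subtract 65536. As a sketch of that arithmetic (in Python rather than Perl, purely as an independent cross-check, and assuming two's-complement storage), the hypothetical helpers below mirror the map-based approach and generalize it to any width:

```python
import struct

def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned two's-complement value of the given width as signed."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def signed_be16(data: bytes) -> list[int]:
    """The 'map' workaround: decode big-endian shorts as unsigned, then fold."""
    unsigned = struct.unpack(f">{len(data) // 2}H", data)
    return [to_signed(u, 16) for u in unsigned]

print(signed_be16(struct.pack(">3h", -32768, -1, 32767)))  # [-32768, -1, 32767]
```

For bits=16 the condition `value & 0x8000` is exactly the `$_ > 32767` test in the Perl one-liner.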
I thought it would be nice to have this built into perl.
Of course, there are problems with different representations
of signed integers, but I'd assume that two's complement is
widely enough used to leave these special cases to the user.
My "wish" would be to have 4 new template characters for pack
and unpack:
m A signed short in "network" (big-endian) order.
M A signed long in "network" (big-endian) order.
y A signed short in "VAX" (little-endian) order.
Y A signed long in "VAX" (little-endian) order.
M/m have been chosen because they sit right next to N/n (as
we already have with i/I and j/J). Unfortunately, w is already
taken, so VAX byte order needs a different pair. Y/y have been
chosen because of their "visual" relationship to V/v.
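To pin down what the four proposed letters would mean in terms of size, sign and byte order, here is a hypothetical mapping onto Python's struct codes (not Perl; used only as a testable reference). With an explicit "<" or ">" prefix, struct uses standard sizes, so "h"/"H" are always 2 bytes and "l"/"L" always 4 bytes. The table and helper names are illustrative only:

```python
import struct

# Hypothetical mapping of the proposed letters onto Python struct codes.
PROPOSED = {"m": ">h", "M": ">l", "y": "<h", "Y": "<l"}  # signed
EXISTING = {"n": ">H", "N": ">L", "v": "<H", "V": "<L"}  # unsigned, for comparison

def unpack_one(letter: str, data: bytes) -> int:
    """Decode one value from the start of data using the given template letter."""
    fmt = PROPOSED.get(letter) or EXISTING[letter]
    return struct.unpack_from(fmt, data)[0]

print(unpack_one("N", b"\xff\xff\xff\xfe"), unpack_one("M", b"\xff\xff\xff\xfe"))
# 4294967294 -2
```

Each signed letter has the same size and byte order as its unsigned neighbour; only the interpretation of the top bit changes.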
I must confess that I actually have a patch already ;-)
Here's some benchmarking:
mhx@r2d2 $ cat /tmp/bench.pl
use Benchmark;
use Config;
my $data = pack 'm*', -32768 .. 32767;
timethese(-2, {
unpack_pack_unpack => sub {
my @val = unpack 's*', pack 'S*', unpack 'n*', $data;
},
regex_unpack => sub {
my $swapped = $data;
$swapped =~ s/(.)(.)/$2$1/gs if $Config{byteorder} eq '1234' ||
$Config{byteorder} eq '12345678';
my @val = unpack 's*', $swapped;
},
map_unpack => sub {
my @val = map { $_>32767 ? $_ - 65536 : $_ } unpack 'n*', $data;
},
new_unpack => sub {
my @val = unpack 'm*', $data;
},
});
mhx@r2d2 $ ./perl -Ilib /tmp/bench.pl
Benchmark: running map_unpack, new_unpack, regex_unpack, unpack_pack_unpack for at least 2 CPU seconds...
map_unpack: 2 wallclock secs ( 2.05 usr + 0.00 sys = 2.05 CPU) @ 3.90/s (n=8)
new_unpack: 3 wallclock secs ( 1.99 usr + 0.03 sys = 2.02 CPU) @ 9.41/s (n=19)
regex_unpack: 2 wallclock secs ( 2.05 usr + 0.02 sys = 2.07 CPU) @ 1.93/s (n=4)
unpack_pack_unpack: 3 wallclock secs ( 2.08 usr + 0.00 sys = 2.08 CPU) @ 5.77/s (n=12)
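For comparison, the same three strategies can be written against Python's struct module. This is a rough analogue of the Perl benchmark, not a port of the patch: the timings it prints say nothing about perl's own speed, and the function names are hypothetical. What it does confirm is that the arithmetic, byte-swap and direct signed decodes agree on every 16-bit value:

```python
import struct
import sys
import timeit

# Same test data as the Perl benchmark: every 16-bit signed value, big-endian.
COUNT = 65536
DATA = struct.pack(f">{COUNT}h", *range(-32768, 32768))

def map_unpack():
    """Decode as unsigned, then fold large values into the negative range."""
    return [u - 65536 if u > 32767 else u for u in struct.unpack(f">{COUNT}H", DATA)]

def swap_unpack():
    """Swap byte pairs on little-endian hosts, then decode as native signed shorts."""
    if sys.byteorder == "little":
        buf = bytearray(DATA)
        buf[0::2], buf[1::2] = DATA[1::2], DATA[0::2]
        return list(struct.unpack(f"={COUNT}h", buf))
    return list(struct.unpack(f"={COUNT}h", DATA))

def direct_unpack():
    """Decode directly as big-endian signed shorts (what the new 'm*' would do)."""
    return list(struct.unpack(f">{COUNT}h", DATA))

if __name__ == "__main__":
    for fn in (map_unpack, swap_unpack, direct_unpack):
        print(f"{fn.__name__}: {timeit.timeit(fn, number=20):.3f}s for 20 runs")
```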
I'd appreciate some comments on whether this would make sense
or not. The patch itself is rather straightforward, and I'll
submit it for review if the majority likes the idea (and I've
finalized it).
Marcus
--
And on the seventh day, He exited from append mode.
How would these differ from n/N and v/V?
--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.8.3, & 5.9.x, and 806 on HP-UX 10.20 & 11.00, 11i,
AIX 4.3, SuSE 9.0, and Win2k. http://www.cmve.net/~merijn/
http://archives.develooper.com/daily...@perl.org/ per...@perl.org
send smoke reports to: smokers...@perl.org, QA: http://qa.perl.org
> On Mon 05 Apr 2004 18:26, Marcus Holland-Moritz <mhx-...@gmx.net> wrote:
> > I wonder if this ever came up as a feature request.
> >
> > I have some code running on big- and little-endian machines that
> > uses unpack to convert big-endian signed integers. Currently, this
> > has to be rather ugly (and ssllooww) to be portable:
> >
> > @val = unpack 's*', pack 'S*', unpack 'n*', $data;
> >
> > Other, even uglier and slower solutions would be
> >
> > @val = map { $_>32767 ? $_ - 65536 : $_ } unpack 'n*', $data;
> >
> > or:
> >
> > $data =~ s/(.)(.)/$2$1/gs if $Config{byteorder} eq '1234' ||
> > $Config{byteorder} eq '12345678';
> > @val = unpack 's*', $data;
> >
> > I thought it would be nice to have this built into perl.
> > Of course, there are problems with different representations
> > of signed integers, but I'd assume that two's complement is
> > widely enough used to leave these special cases to the user.
> >
> > My "wish" would be to have 4 new template characters for pack
> > and unpack:
> >
> > m An signed short in "network" (big-endian) order.
> > M An signed long in "network" (big-endian) order.
> > y An signed short in "VAX" (little-endian) order.
> > Y An signed long in "VAX" (little-endian) order.
>
> What would these differ from n/N and v/V ?
They would be equivalent for the pack case, but would yield
signed values instead of unsigned ones when used with unpack:
mhx@r2d2 $ ./perl -e'print join ",", unpack "vynm", "\xcc\xcc"x4'
52428,-13108,52428,-13108
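The same four decodes can be reproduced with Python's struct module as a cross-check of the numbers above (0xCCCC is 52428 unsigned, and 52428 - 65536 = -13108 signed). struct cannot switch byte order within one format string, so this sketch decodes the four 2-byte fields one at a time:

```python
import struct

data = b"\xcc\xcc" * 4

# v, y, n, m from the Perl template above, as struct codes.
formats = ["<H", "<h", ">H", ">h"]
values = [struct.unpack_from(fmt, data, 2 * i)[0] for i, fmt in enumerate(formats)]
print(",".join(map(str, values)))  # 52428,-13108,52428,-13108
```

Because both bytes are equal, byte order makes no difference here; only the signedness does.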
--
BOFH Excuse #153:
Big to little endian conversion error
Ahh, now it's clear. I re-read the original post with this in mind.
I don't think using up all these letters is a good idea.
<brainstorming mode>
Why not allow suffixes, the way ! already selects native short/long and A accepts a count:
unpack "vv-nn-", "\xcc\xcc" x 4
or
unpack "vv!nn!", "\xcc\xcc" x 4
following l, L, s, and S (and, with a different meaning, x and X), which all accept !.
I think this fits better into the current structure of the implementation and is
easier to explain.
</brainstorming>
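To make the suffix idea concrete, here is a minimal sketch of a template decoder that understands only n, N, v and V, with an optional "!" meaning "signed, same size and byte order". It is written in Python on top of struct purely for illustration; the table and function names are hypothetical, and this is not how perl's own pack parses templates:

```python
import re
import struct

# Hypothetical template decoder: only n/N/v/V are supported, and a
# trailing '!' selects the signed variant of the same size and byte order.
BASE = {"n": ">H", "N": ">L", "v": "<H", "V": "<L"}
TOKEN = re.compile(r"([nNvV])(!?)")

def unpack_template(template: str, data: bytes) -> list[int]:
    values, offset, pos = [], 0, 0
    while pos < len(template):
        match = TOKEN.match(template, pos)
        if match is None:
            raise ValueError(f"unsupported template at {template[pos:]!r}")
        letter, bang = match.groups()
        fmt = BASE[letter].lower() if bang else BASE[letter]  # '>H' -> '>h'
        values.append(struct.unpack_from(fmt, data, offset)[0])
        offset += struct.calcsize(fmt)
        pos = match.end()
    return values

print(unpack_template("vv!nn!", b"\xcc\xcc" * 4))  # [52428, -13108, 52428, -13108]
```

The suffix only flips signedness, which is why lowercasing the struct code is enough: the byte-order prefix and the size stay the same.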
> On Mon 05 Apr 2004 18:44, Marcus Holland-Moritz <mhx-...@gmx.net> wrote:
> > > > My "wish" would be to have 4 new template characters for pack
> > > > and unpack:
> > > >
> > > > m An signed short in "network" (big-endian) order.
> > > > M An signed long in "network" (big-endian) order.
> > > > y An signed short in "VAX" (little-endian) order.
> > > > Y An signed long in "VAX" (little-endian) order.
> > >
> > > What would these differ from n/N and v/V ?
> >
> > They would be equivalent for the pack case, but would yield
> > signed values instead of unsigned ones when used with unpack:
> >
> > mhx@r2d2 $ ./perl -e'print join ",", unpack "vynm", "\xcc\xcc"x4'
> > 52428,-13108,52428,-13108
>
> Ahh, now it's clear. I re-read the original post with this in mind
>
> I don't think eating up all these letters is a good case.
>
> <brainstorming mode>
> Why not allow suffixes, like ! for native short/long, like A allows a count:
>
> unpack "vv-nn-", "\xcc\xcc" x 4
>
> or
>
> unpack "vv!nn!", "\xcc\xcc" x 4
>
> to follow l, L, s, S, and - differently - x and X which all accept !
>
> I think this fit's better in the current structure of implementation and is
> easier to explain
Very good point!
I'll see if I can get that to work.
> </brainstorming>
>
--
Harriet's Dining Observation:
In every restaurant, the hardness of the butter pats
increases in direct proportion to the softness of the bread.
> On Mon, Apr 05, 2004 at 06:44:11PM +0200, Marcus Holland-Moritz wrote:
> > > On Mon 05 Apr 2004 18:26, Marcus Holland-Moritz <mhx-...@gmx.net> wrote:
> > > > I have some code running on big- and little-endian machines that
> > > > uses unpack to convert big-endian signed integers. Currently, this
> > > > has to be rather ugly (and ssllooww) to be portable:
> > --
> > BOFH Excuse #153:
> >
> > Big to little endian conversion error
>
> Now was that deliberate or just coincidence *g*
Believe it or not, it was just coincidence.
Or maybe some sort of AI built into my mail client... ;-)
> --
> Jody
> knew (at) pimb (dot) org
>
--
On a clear disk you can seek forever.
-- P. Denning
> I wonder if this ever came up as a feature request.
>
> I have some code running on big- and little-endian machines that
> uses unpack to convert big-endian signed integers. Currently, this
> has to be rather ugly (and ssllooww) to be portable:
>
> @val = unpack 's*', pack 'S*', unpack 'n*', $data;
>
> Other, even uglier and slower solutions would be
>
> @val = map { $_>32767 ? $_ - 65536 : $_ } unpack 'n*', $data;
>
> or:
>
> $data =~ s/(.)(.)/$2$1/gs if $Config{byteorder} eq '1234' ||
> $Config{byteorder} eq '12345678';
> @val = unpack 's*', $data;
>
> I thought it would be nice to have this built into perl.
> Of course, there are problems with different representations
> of signed integers, but I'd assume that two's complement is
> widely enough used to leave these special cases to the user.
I vote FOR. I just coded a workaround to this missing feature in an
application last week.
--
Glenn -- http://nevcal.com/
===========================
The best part about procrastination is that you are never bored,
because you have all kinds of things that you should be doing.
> On Mon 05 Apr 2004 18:44, Marcus Holland-Moritz <mhx-...@gmx.net> wrote:
> > > > My "wish" would be to have 4 new template characters for pack
> > > > and unpack:
> > > >
> > > > m An signed short in "network" (big-endian) order.
> > > > M An signed long in "network" (big-endian) order.
> > > > y An signed short in "VAX" (little-endian) order.
> > > > Y An signed long in "VAX" (little-endian) order.
> > >
> > > What would these differ from n/N and v/V ?
> >
> > They would be equivalent for the pack case, but would yield
> > signed values instead of unsigned ones when used with unpack:
> >
> > mhx@r2d2 $ ./perl -e'print join ",", unpack "vynm", "\xcc\xcc"x4'
> > 52428,-13108,52428,-13108
>
> Ahh, now it's clear. I re-read the original post with this in mind
>
> I don't think eating up all these letters is a good case.
>
> <brainstorming mode>
> Why not allow suffixes, like ! for native short/long, like A allows a count:
>
> unpack "vv-nn-", "\xcc\xcc" x 4
>
> or
>
> unpack "vv!nn!", "\xcc\xcc" x 4
>
> to follow l, L, s, S, and - differently - x and X which all accept !
>
> I think this fit's better in the current structure of implementation and is
> easier to explain
> </brainstorming>
OK, attached is a patch that makes the '!' suffix turn
n/N/v/V into signed integers.
Comments?
If it's not considered a Bad Idea, I'll apply it.
--
"Take it off or else I break it off." -Leela, with Fry's arm around her
If you want my opinion: go for it. Looks good and useful.
I think we're just a tad late for 5.8.4 ...
FWIW I consider it to be a Good Idea.
Now was that deliberate or just coincidence *g*
--
It all looks good, but I'm worried about the further overloading of '!'.
I think we now have:
[sS] - replace 'standard short (16 bits)' with 'native short'
[lL] - replace 'standard long (32 bits)' with 'native long'
[xX] - replace 'skip a byte' with 'skip to type [something] alignment'
[vV] - replace 'unsigned VAX short/long' with 'signed'
[nN] - replace 'unsigned network short/long' with 'signed'
It's not entirely unreasonable to let '!' mean "something special", but
I think there'd be value in giving the perlfunc entry a bit of an
overhaul, or possibly pulling it out entirely into a separate document.
You may also need to give the new combinations a mention in perlpacktut.
Hugo
> :> easier to explain
> :> </brainstorming>
> :
> :Ok, attached is a patch that implements makes the '!' suffix turn
> :n/N/v/V into signed integers.
> :
> :Comments?
> :
> :If it's not considered a Bad Idea, I'll apply it.
Having the functionality is a good idea.
> It all looks good, but I'm worried about the further overloading of '!'.
> I think we now have:
> [sS] - replace 'standard short (16 bits)' with 'native short'
> [lL] - replace 'standard long (32 bits)' with 'native long'
> [xX] - replace 'skip a byte' with 'skip to type [something] alignment'
> [vV] - replace 'unsigned VAX short/long' with 'signed'
> [nN] - replace 'unsigned network short/long' with 'signed'
The specific use of ! was also a concern I had. For the other number types
it means "native size", so I was wondering if there were another symbol
which could be used to indicate differing sign treatment.
It's frustrating that for the other number types case indicates signedness,
whereas for the network types case indicates size, but this is how things are.
I guess this won't change in perl6.
Nicholas Clark
Yes. The use of '!' is already complicated, even without my patch. :)
Collecting some of the feedback on my original post, and having
a 3-hour train ride today, I had an idea how most requested
features could be added without introducing further overloading
of '!'.
Here's a proposal to extend the pack template syntax to allow
arbitrary signedness and arbitrary size of integer types as
well as both big- and little-endianness for integers, floats
and pointers.
Yes, I know that the floating point stuff is dangerous. But it
should be up to the user to decide whether it's safe to do it.
In the few cases where I (and people I know) needed to unpack
IEEE floats from a foreign byteorder system, it worked fine by
just reversing the bytes. It should be clearly pointed out that
it's dangerous, non-portable and all that, but IMHO Perl should
not be the limiting instance here.
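The byte-reversal trick described above can be checked in a few lines. This Python sketch assumes, as the paragraph does, that both systems use IEEE 754 binary64 doubles and differ only in byte order (Python's struct uses that format whenever an explicit "<" or ">" prefix is given):

```python
import struct

# Assumption: the foreign system is big-endian and uses IEEE 754 binary64.
raw = struct.pack(">d", -1.5)  # b'\xbf\xf8\x00\x00\x00\x00\x00\x00'

# Byte reversal: flip the 8 bytes, then decode them in little-endian order.
by_reversal = struct.unpack("<d", raw[::-1])[0]

# The same value, decoded with an explicit byte-order prefix instead.
explicit = struct.unpack(">d", raw)[0]

print(by_reversal, explicit)  # -1.5 -1.5
```

If the two sides disagree on the floating-point format itself (for example VAX D_floating vs IEEE), no amount of byte shuffling helps, which is the danger the paragraph warns about.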
These are the new syntax elements (in the hope that unpack
strings won't look too much like regexes in the near future):
< New suffix to force little-endian byte order on
a value. [sSiIlLqQjJfdFpP]
> New suffix to force big-endian byte order on a
value. [sSiIlLqQjJfdFpP]
=n= An n-byte unsigned integer value
=-n= An n-byte signed integer value
Examples:
n! "a two-byte big-endian signed integer"
i!<20 "20 native-sized little-endian signed integers"
=-8=> "a big-endian signed quad (8-byte) integer"
q> exactly the same as =-8=>
d< "a "little-endian" double"
We may or may not keep the functionality of my last patch.
If we keep it, we just have some additional redundancy:
n! s> =-2=>
N! l> =-4=>
v! s< =-2=<
V! l< =-4=<
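The =n=/=-n= idea maps naturally onto an arbitrary-width integer decode. The sketch below is a hypothetical Python decoder for that token syntax (the regex and function name are illustrative, and it requires an explicit "<" or ">"); it leans on int.from_bytes, which handles any byte count, so 12-, 16- or 32-byte integers need no new letters. Note that since =n= is defined as unsigned, the signed forms n! and s> correspond to =-2=>:

```python
import re

# Hypothetical decoder for the proposed width syntax: "=n=" is an n-byte
# unsigned integer, "=-n=" an n-byte signed one, and a trailing '<' or '>'
# picks little- or big-endian order (this sketch requires one of the two).
WIDTH = re.compile(r"=(-?)(\d+)=([<>])")

def unpack_width(token: str, data: bytes) -> int:
    match = WIDTH.fullmatch(token)
    if match is None:
        raise ValueError(f"not a width token: {token!r}")
    sign, size, order = match.groups()
    nbytes = int(size)
    return int.from_bytes(
        data[:nbytes],
        byteorder="little" if order == "<" else "big",
        signed=(sign == "-"),
    )

print(unpack_width("=-2=>", b"\xcc\xcc"), unpack_width("=2=>", b"\xcc\xcc"))
# -13108 52428
```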
Do you think it makes sense to "enhance" pack/unpack in this
way? Especially the =n= syntax would be future-proof when it
comes to 128-bit (256-bit?) types.
I'm not very focussed on (nor am I really happy with) =n=/=-n=
yet, but it's the best I can think of right now. Suggestions on
a better, backwards-compatible syntax for an n-byte wide integer
are welcome.
Also, if there are better ideas for big-/little-endian suffixes,
I'd appreciate that. I kinda like < and >, because they're easy
(at least for me) to remember:
< -> less -> little-endian
> -> greater -> big-endian
If you treat them as arrows, they indicate the direction from
the most significant to the least significant byte.
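A quick way to see the arrow reading: encode 0x01020304 both ways and follow the bytes from the most significant (01) to the least significant (04). In the little-endian encoding that path runs right to left, like "<"; in the big-endian one it runs left to right, like ">". Python's struct is used here only because its "<"/">" prefixes carry the same meaning:

```python
import struct

value = 0x01020304
little = struct.pack("<l", value)
big = struct.pack(">l", value)

print(little.hex(), big.hex())  # 04030201 01020304
```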
(No, I haven't started patching yet... ;-)
Marcus
--
Calculon: I'm programmed to be very busy.
Somehow I see the =..= syntax as clutter, and not really needed for 128 bit.
If we go for the <, > scheme, something I _do_ like, we can make nNvV obsolete
by the time we have to support 128 bit stuffness, and reassign nN to 128 bit
and vV to 256 bit.
> I'm not very focussed on (nor am I really happy with) =n=/=-n=
> yet, but it's the best I can think of right now. Suggestions on
> a better, backwards-compatible syntax for a n-byte wide integer
> are welcome.
>
> Also, if there are better ideas for big-/little-endian suffixes,
> I'd appreciate that. I kinda like < and >, because they're easy
> (at least for me) to remember:
>
> < -> less -> little-endian
> > -> greater -> big-endian
>
> If you treat them as arrows, they indicate the direction from
> the most significant to the least significant byte.
All the reasons why I liked it.
> (No, I haven't started patching yet... ;-)
--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.8.3, & 5.9.x, and 809 on HP-UX 10.20 & 11.00, 11i,
> I'm obviously in favor of the approach, and can't offhand suggest
> anything better for the syntax. My mnemonic for the endian-ness:
> q< (the "little end of the arrowhead" touches the construct)
> q> (the "big end" touches)
Nice!
> Go for it, I'd say! -- jpl
I think there's (up to now) a consensus that the >/< notation
is a Good Thing. So I think I'll start looking into that.
The =..= syntax would be just an additional feature to save
characters and be both backwards- and future-compatible.
Marcus
--
Cole's Law:
Thinly sliced cabbage.
I'm not sure whether it would be a good idea to obsolete something as
often used as n or N. Or is it just me who's using it all the time?
> > I'm not very focussed on (nor am I really happy with) =n=/=-n=
> > yet, but it's the best I can think of right now. Suggestions on
> > a better, backwards-compatible syntax for a n-byte wide integer
> > are welcome.
> >
> > Also, if there are better ideas for big-/little-endian suffixes,
> > I'd appreciate that. I kinda like < and >, because they're easy
> > (at least for me) to remember:
> >
> > < -> less -> little-endian
> > > -> greater -> big-endian
> >
> > If you treat them as arrows, they indicate the direction from
> > the most significant to the least significant byte.
>
> All the reasons why I liked it
>
> > (No, I haven't started patching yet... ;-)
>
--
May all your PUSHes be POPped.
> If we go for the <, > scheme, something I _do_ like, we can make nNvV
> obsolete by the time we have to support 128 bit stuffness, and reassign
> nN to 128 bit and vV to 256 bit.
Break existing code by changing the meaning of [nNvV]? Yeah,
that's gonna happen :-).
The =..= construct may not be pretty, but it's rational.
Vv is cryptic and arcane (even for those of us that used VAXen).
And suppose some upstart decides to support a 12-byte integer format?
Rather than dream up new (or, *shudder*, recycle old) keyletters,
I'd prefer to make the length explicit. Maybe even embrace both
integers and floats? -- jpl
No, I use it quite often, but as 64-bit and even 128-bit are lurking around the
corner, I see no problem here. I didn't say to drop them immediately;
we're probably talking about 5.14.x before it is invalidated. By then we'll all
have switched to perl-6.
The rest looks nice.
Since the '!' suffix is already used for native sizes, how about using
'-' or '+' as a suffix for signed/unsigned? Those would indicate that
the sign is different. I prefer '+' since it wouldn't be confused
with a range.
- Ian