Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

2's complement conversion. Is this right?

3 views
Skip to first unread message

Bob Greschke

unread,
Apr 18, 2008, 3:57:56 PM4/18/08
to
I'm reading 3-byte numbers from a file and they are signed (+8 to
-8million). This seems to work, but I'm not sure it's right.

# Convert the 3-characters into a number.
Value1, Value2, Value3 = unpack(">BBB", Buffer[s:s+3])
Value = (Value1*65536)+(Value2*256)+Value3
if Value >= 0x800000:
Value -= 0x1000000
print Value

For example:
16682720 = -94496

Should it be Value -= 0x1000001 so that I get -94497, instead?

Thanks!

Bob

Grant Edwards

unread,
Apr 18, 2008, 4:17:35 PM4/18/08
to

Nope. -94496 is the right answer:

>>> -94496 & 0xffffff
16682720

Here's another way to do it:

------------------------------------------------------------------
import struct

def tohex(s):
return '0x' + ''.join(['%0.2x' % ord(b) for b in s])

def from3Bytes(s):
# sign extend the three byte number in s to make it 4 bytes long
if ord(s[0]) & 0x80:
s = '\xff'+s
else:
s = '\x00'+s
return struct.unpack('>i',s)[0]

for v in 0,1,-2,500,-500,7777,-7777,-94496,98765,-98765,8388607,-8388607,-8388608:
s = struct.pack('>i',v)[1:]
print "%8d %s %8d" % (v, tohex(s), from3Bytes(s))
------------------------------------------------------------------


--
Grant Edwards grante Yow! I'll eat ANYTHING
at that's BRIGHT BLUE!!
visi.com

Ross Ridge

unread,
Apr 18, 2008, 4:37:21 PM4/18/08
to

Your first case is correct, "Value -= 0x1000000". The value 0xFFFFFFF
should be -1 and 0xFFFFFFF - 0x1000000 == -1.

An alternative way of doing this:

Value = unpack(">l", Buffer[s:s+3] + "\0")[0] >> 8

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rri...@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //

Bob Greschke

unread,
Apr 18, 2008, 5:26:58 PM4/18/08
to
On 2008-04-18 14:37:21 -0600, Ross Ridge
<rri...@caffeine.csclub.uwaterloo.ca> said:

> Bob Greschke <b...@passcal.nmt.edu> wrote:
>> I'm reading 3-byte numbers from a file and they are signed (+8 to
>> -8million). This seems to work, but I'm not sure it's right.
>>
>> # Convert the 3-characters into a number.
>> Value1, Value2, Value3 = unpack(">BBB", Buffer[s:s+3])
>> Value = (Value1*65536)+(Value2*256)+Value3
>> if Value >= 0x800000:
>> Value -= 0x1000000
>> print Value
>>
>> For example:
>> 16682720 = -94496
>>
>> Should it be Value -= 0x1000001 so that I get -94497, instead?
>
> Your first case is correct, "Value -= 0x1000000". The value 0xFFFFFFF
> should be -1 and 0xFFFFFFF - 0x1000000 == -1.
>
> An alternative way of doing this:
>
> Value = unpack(">l", Buffer[s:s+3] + "\0")[0] >> 8
>
> Ross Ridge

Good to know (never was good on the math front).

However, in playing around with your suggestion and Grant's code I've
found that the struct stuff is WAY slower than doing something like this

Value = (ord(Buf[s])*65536)+(ord(Buf[s+1])*256)+ord(Buf[s+2])


if Value >= 0x800000:
Value -= 0x1000000

This is almost twice as fast just sitting here grinding through a few
hundred thousand conversions (like 3sec vs. ~5secs just counting on my
fingers - on an old Sun...it's a bit slow). Replacing *65536 with <<16
and *256 with <<8 might even be a little faster, but it's too close to
call without really profiling it.

I wasn't planning on making this discovery today! :)

Bob

George Sakkis

unread,
Apr 18, 2008, 6:04:37 PM4/18/08
to

You'd better use a more precise timing method than finger counting,
such as timeit. Twice as fast is probably a gross overestimation; on
my box (Python 2.5, WinXP) avoiding unpack is around 10% and 40%
faster from Ross's and Grant's method, respectively:

python -m timeit "for i in xrange(1000):from3Bytes_bob(s)" \
-s "from bin import *; s=pack('>i',1234567)[1:]"
1000 loops, best of 3: 1.02 msec per loop

python -m timeit "for i in xrange(1000):from3Bytes_grant(s)" \
-s "from bin import *; s=pack('>i',1234567)[1:]"
1000 loops, best of 3: 1.43 msec per loop

python -m timeit "for i in xrange(1000):from3Bytes_ross(s)" \
-s "from bin import *; s=pack('>i',1234567)[1:]"
1000 loops, best of 3: 1.12 msec per loop

### bin.py ##########################################

from struct import unpack

def from3Bytes_bob(s):
Value = (ord(s[0])<<16) + (ord(s[1])<<8) + ord(s[2])


if Value >= 0x800000:
Value -= 0x1000000

return Value

def from3Bytes_grant(s):


if ord(s[0]) & 0x80:
s = '\xff'+s
else:
s = '\x00'+s

return unpack('>i',s)[0]

def from3Bytes_ross(s):
return unpack(">l", s + "\0")[0] >> 8


> Replacing *65536 with <<16
> and *256 with <<8 might even be a little faster, but it's too close to
> call without really profiling it.

> I wasn't planning on making this discovery today! :)
>
> Bob

If you are running this on a 32-bit architecture, get Psyco [1] and
add at the top of your module:
import psyco; psyco.full()

Using Psyco in this scenatio is up to 70% faster:

python -m timeit "for i in xrange(1000):from3Bytes_bob(s)" \
-s "from bin import *; s=pack('>i',1234567)[1:]"
1000 loops, best of 3: 624 usec per loop

python -m timeit "for i in xrange(1000):from3Bytes_grant(s)" \
-s "from bin import *; s=pack('>i',1234567)[1:]"
1000 loops, best of 3: 838 usec per loop

python -m timeit "for i in xrange(1000):from3Bytes_ross(s)" \
-s "from bin import *; s=pack('>i',1234567)[1:]"
1000 loops, best of 3: 834 usec per loop


George

[1] http://psyco.sourceforge.net/

Bob Greschke

unread,
Apr 18, 2008, 6:41:34 PM4/18/08
to

I'm on a Solaris 8 with Python 2.3.4 and when crunching through,
literally, millions and millions of samples of seismic data fingers
point out the difference nicely. :) I'll look into this more on some
of our bigger better faster machines (there is no -m option for timeit
on the Sun :). The Sun is just what I develop on. If stuff runs fast
enough to keep me awake on there over an ssh'ed X11 connection it
should run even better on the real field equipment (Macs, Linuxes,
WinXPs).

Never heard of Psyco. That's nice!

Thanks!

Bob

Ross Ridge

unread,
Apr 18, 2008, 7:55:32 PM4/18/08
to
George Sakkis <george...@gmail.com> wrote:
>You'd better use a more precise timing method than finger counting,
>such as timeit. Twice as fast is probably a gross overestimation; on
>my box (Python 2.5, WinXP) avoiding unpack is around 10% and 40%
>faster from Ross's and Grant's method, respectively:
...

>def from3Bytes_ross(s):
> return unpack(">l", s + "\0")[0] >> 8

If you have Python 2.5, here's a faster version:

from struct import *
unpack_i32be = Struct(">l").unpack

def from3Bytes_ross2(s):
return unpack_i32be(s + "\0")[0] >> 8

Ross Ridge

--

Bob Greschke

unread,
Apr 18, 2008, 8:23:25 PM4/18/08
to
On 2008-04-18 17:55:32 -0600, Ross Ridge
<rri...@caffeine.csclub.uwaterloo.ca> said:

> George Sakkis <george...@gmail.com> wrote:
>> You'd better use a more precise timing method than finger counting,
>> such as timeit. Twice as fast is probably a gross overestimation; on
>> my box (Python 2.5, WinXP) avoiding unpack is around 10% and 40%
>> faster from Ross's and Grant's method, respectively:
> ...
>> def from3Bytes_ross(s):
>> return unpack(">l", s + "\0")[0] >> 8
>
> If you have Python 2.5, here's a faster version:
>
> from struct import *
> unpack_i32be = Struct(">l").unpack
>
> def from3Bytes_ross2(s):
> return unpack_i32be(s + "\0")[0] >> 8
>
> Ross Ridge

That's not even intelligible. I wanna go back to COBOL. :)

Ross Ridge

unread,
Apr 18, 2008, 9:36:05 PM4/18/08
to
Ross Ridge <rri...@caffeine.csclub.uwaterloo.ca> said:
> If you have Python 2.5, here's a faster version:
>
> from struct import *
> unpack_i32be = Struct(">l").unpack
>
> def from3Bytes_ross2(s):
> return unpack_i32be(s + "\0")[0] >> 8

Bob Greschke <b...@passcal.nmt.edu> wrote:
> That's not even intelligible. I wanna go back to COBOL. :)

It's the same as the previous version except that it "precompiles"
the struct.unpack() format string. It works similar to the way Python
handles regular expressions.

Grant Edwards

unread,
Apr 18, 2008, 11:30:45 PM4/18/08
to
On 2008-04-18, Bob Greschke <b...@passcal.nmt.edu> wrote:

> However, in playing around with your suggestion and Grant's code I've
> found that the struct stuff is WAY slower than doing something like this
>
> Value = (ord(Buf[s])*65536)+(ord(Buf[s+1])*256)+ord(Buf[s+2])
> if Value >= 0x800000:
> Value -= 0x1000000
>
> This is almost twice as fast just sitting here grinding through a few
> hundred thousand conversions (like 3sec vs. ~5secs just counting on my
> fingers - on an old Sun...it's a bit slow). Replacing *65536 with <<16
> and *256 with <<8 might even be a little faster, but it's too close to
> call without really profiling it.

I didn't know speed was important. This might be a little
faster (depending on hardware):

Value = (ord(Buf[s])<<16) | (ord(Buf[s+1])<<8) | ord(Buf[s+2])

It also makes the intention a bit more obvious (at least to me).

A decent C compiler will recognize that <<16 and <<8 are
special and just move bytes around rather than actually doing
shifts. I doubt the Python compiler does optimizations like
that, but shifts are still usually faster than multiplies
(though, again, a good compiler will recognize that multiplying
by 65536 is the same as shifting by 16 and just move bytes
around).

--
Grant Edwards grante Yow! If elected, Zippy
at pledges to each and every
visi.com American a 55-year-old
houseboy...

Grant Edwards

unread,
Apr 18, 2008, 11:33:34 PM4/18/08
to
On 2008-04-18, Bob Greschke <b...@passcal.nmt.edu> wrote:

> I'm on a Solaris 8 with Python 2.3.4 and when crunching
> through, literally, millions and millions of samples of
> seismic data fingers point out the difference nicely. :) I'll
> look into this more on some of our bigger better faster
> machines (there is no -m option for timeit on the Sun :). The
> Sun is just what I develop on. If stuff runs fast enough to
> keep me awake on there over an ssh'ed X11 connection it
> should run even better on the real field equipment (Macs,
> Linuxes, WinXPs).

If time is an issue, I might write a C program to convert the
files from 24-bit numbers to 32-bit numbers. Then you can use
numpy to load huge arrays of them in a single whack.

--
Grant Edwards grante Yow! .. he dominates the
at DECADENT SUBWAY SCENE.
visi.com

Ivan Illarionov

unread,
Apr 19, 2008, 12:45:54 AM4/19/08
to

So why not put it in C extension?

It's easier than most people think:

<code>
from3bytes.c
============
#include <Python.h>

PyObject*
from3bytes(PyObject* self, PyObject* args)
{
const char * s;
int len;
if (!PyArg_ParseTuple(args, "s#", &s, &len))
return NULL;
long n = (s[0]<<16) | (s[1]<<8) | s[2];
if (n >= 0x800000)
n -= 0x1000000;
return PyInt_FromLong(n);
}

static PyMethodDef functions[] = {
{"from3bytes", (PyCFunction)from3bytes, METH_VARARGS},
{NULL, NULL, 0, NULL},
};


DL_EXPORT(void)
init_from3bytes(void)
{
Py_InitModule("_from3bytes", functions);
}

buildme.py
==========
import os
import sys
from distutils.core import Extension, setup

os.chdir(os.path.dirname(os.path.abspath(__file__)))
sys.argv = [sys.argv[0], 'build_ext', '-i']
setup(ext_modules = [Extension('_from3bytes', ['from3bytes.c'])])
</code>

'python buildme.py' will create '_from3bytes.so' file
'from _from3bytes import from3bytes' will import C-optimized function

Hope this helps.

--
Ivan Illarionov

Ivan Illarionov

unread,
Apr 19, 2008, 1:35:12 AM4/19/08
to

Sorry,
the right code should be:

PyObject*
from3bytes(PyObject* self, PyObject* args)
{
const char * s;
int len;
if (!PyArg_ParseTuple(args, "s#", &s, &len))
return NULL;

long n = ((((unsigned char)s[0])<<16) | (((unsigned char)s[1])<<8) |
((unsigned char)s[2]));

Bob Greschke

unread,
Apr 19, 2008, 5:24:32 PM4/19/08
to

No thanks. Being able to alter and install these programs on whatever
computer they are installed on is more important than speed. I went
down the C-extension path years ago and it turned into a big mess.
Everything has to run on Sun's, Mac's, Linux and Windows without any
major hassels if they have to be changed. I can't reley on everything
the program needs to be rebuilt being installed beyond Python and
Tkinter (and pySerial and PIL for a couple of programs), which they
need to run. So no compiling and everything is in one file, in case a
new version has to be upgraded by a user by emailing it to them while
they're sitting on some mountain in Tibet. Just unzip, stick the .py
somewhere logical, (usually) double-click, and they are off and running.

Bob


Bob Greschke

unread,
Apr 19, 2008, 5:50:05 PM4/19/08
to
On 2008-04-18 21:33:34 -0600, Grant Edwards <gra...@visi.com> said:

> On 2008-04-18, Bob Greschke <b...@passcal.nmt.edu> wrote:
>
>> I'm on a Solaris 8 with Python 2.3.4 and when crunching
>> through, literally, millions and millions of samples of
>> seismic data fingers point out the difference nicely. :) I'll
>> look into this more on some of our bigger better faster
>> machines (there is no -m option for timeit on the Sun :). The
>> Sun is just what I develop on. If stuff runs fast enough to
>> keep me awake on there over an ssh'ed X11 connection it should run even
>> better on the real field equipment (Macs,
>> Linuxes, WinXPs).
>
> If time is an issue, I might write a C program to convert the
> files from 24-bit numbers to 32-bit numbers. Then you can use
> numpy to load huge arrays of them in a single whack.

Yes you could. :) But this is all in real time (as in 'gotta have it
right now' real time). Plus we're dealing with 100's of files
(instruments) at a time. It'll be 1000's of instruments this summer up
in Canada.

Here's the program having happily crunched away at nearly twice the
speed it was the day before yesterday.

www.greschke.com/unlinked/files/pocus.png

The white dots come from the 3-byte integers and make up the green line
which makes up one 'logical' chunk (in this case they recorded for
60mins at a time) of a whole data file with one file per instrument.
Generally we just take a quick look at the green lines to make sure the
instruments were working, and pull out the bad ones so they don't get
used again until they've been looked at. Zooming in to the level where
you can see the individual samples is used to (more) accurately
determine the time when an [intentional] explosion was set off. You
use instruments placed near the shot holes for that. Simple. :)

Bob

Bob Greschke

unread,
Apr 19, 2008, 5:53:01 PM4/19/08
to

George Sakkis

unread,
Apr 19, 2008, 8:10:30 PM4/19/08
to
On Apr 18, 9:36 pm, Ross Ridge <rri...@caffeine.csclub.uwaterloo.ca>
wrote:

> Ross Ridge <rri...@caffeine.csclub.uwaterloo.ca> said:
>
> > If you have Python 2.5, here's a faster version:
>
> >    from struct import *
> >    unpack_i32be = Struct(">l").unpack
>
> >    def from3Bytes_ross2(s):
> >        return unpack_i32be(s + "\0")[0] >> 8
>
> Bob Greschke  <b...@passcal.nmt.edu> wrote:
>
> > That's not even intelligible.  I wanna go back to COBOL. :)
>
> It's the same as the previous version except that it "precompiles"
> the struct.unpack() format string.  It works similar to the way Python
> handles regular expressions.

I didn't know about the Struct class; pretty neat. It's amazing that
this version without Psyco is as fast Bob's version with Psyco! Adding
Psyco to it though makes it *slower*, not faster. So here's how I'd
write it (if I wanted or had to stay in pure Python):

try: import psyco
except ImportError:
from struct import Struct


unpack_i32be = Struct(">l").unpack

def from3Bytes(s):


return unpack_i32be(s + "\0")[0] >> 8

else:
def from3Bytes(s):


Value = (ord(s[0])<<16) + (ord(s[1])<<8) + ord(s[2])
if Value >= 0x800000:
Value -= 0x1000000
return Value

psyco.bind(from3Bytes)


HTH,
George

Ross Ridge

unread,
Apr 20, 2008, 1:00:17 AM4/20/08
to
Ross Ridge <rri...@caffeine.csclub.uwaterloo.ca> wrote:
> It's the same as the previous version except that it "precompiles"
> the struct.unpack() format string. =A0It works similar to the way Python
> handles regular expressions.

George Sakkis <george...@gmail.com> wrote:
>I didn't know about the Struct class; pretty neat. It's amazing that
>this version without Psyco is as fast Bob's version with Psyco!

Unfortunately, it doesn't seem to documented in the Python 2.5 manual.
The speed improvement mainly comes from avoiding the Python code in the
struct module that wraps the Struct class. The struct module caches
already compiled format strings, but looking format strings in the cache
ends taking a fair chunk of time in my original example.

John Machin

unread,
Apr 20, 2008, 2:01:59 AM4/20/08
to
Ross Ridge wrote:
> Ross Ridge <rri...@caffeine.csclub.uwaterloo.ca> wrote:
>> It's the same as the previous version except that it "precompiles"
>> the struct.unpack() format string. =A0It works similar to the way Python
>> handles regular expressions.
>
> George Sakkis <george...@gmail.com> wrote:
>> I didn't know about the Struct class; pretty neat. It's amazing that
>> this version without Psyco is as fast Bob's version with Psyco!
>
> Unfortunately, it doesn't seem to documented in the Python 2.5 manual.

It seems to me that it's documented:

http://docs.python.org/lib/struct-objects.html

The struct module doc page
(http://docs.python.org/lib/module-struct.html) provides links but no
explicit mention of the Struct class in its text.

Peter Otten

unread,
Apr 20, 2008, 6:21:24 AM4/20/08
to
Bob Greschke wrote:

Just for fun, here's a way to convert lots of 3-byte integers using PIL:

from PIL import Image
import array

def int3_pil(s, trafo="\x00"*128+"\xff"*128):
n = len(s)//3
im = Image.new("RGB", (n, 1))
im.fromstring(s)
hi, mid, lo = im.split()
sign = Image.new("L", (n, 1))
sign.fromstring(hi.tostring().translate(trafo))
im = Image.merge("RGBA", (lo, mid, hi, sign))
a = array.array("i")
a.fromstring(im.tostring())
return a.tolist()

Not as fast as I had hoped, though. Also, it needs some work to make it
independent of the processor architecture.

Peter

Ivan Illarionov

unread,
Apr 21, 2008, 4:10:26 PM4/21/08
to

I was able to get even faster pure-python version using array module:

a = array.array('l', ''.join(('\0' + s[i:i+3] for i in xrange(0,
len(s), 3))))
if sys.byteorder == 'little':
a.byteswap()

It actually moves bytes around on C level.


test code:
import struct
import array
import sys

unpack_i32be = struct.Struct(">l").unpack
s = ''.join(struct.pack('>i', 1234567)[1:]*1000)

def from3bytes_ord(s):
values = []
for i in xrange(0, len(s), 3):
Value = (ord(s[i])<<16) | (ord(s[i+1])<<8) | ord(s[i+2])


if Value >= 0x800000:
Value -= 0x1000000

values.append(Value)
return values

def from3bytes_struct(s):
return [unpack_i32be(s[i:i+3] + "\0")[0] >> 8 for i in xrange(0,
len(s), 3)]

def from3bytes_array(s):
a = array.array('l', ''.join(('\0' + s[i:i+3] for i in xrange(0,
len(s), 3))))
if sys.byteorder == 'little':
a.byteswap()
return a.tolist()

from timeit import Timer

t1 = Timer("from3bytes_ord(s)", "from __main__ import s,
from3bytes_ord")
t2 = Timer("from3bytes_struct(s)", "from __main__ import s,
from3bytes_struct")
t3 = Timer("from3bytes_array(s)", "from __main__ import s,
from3bytes_array")

print 'ord:\t', t1.timeit(1000)
print 'struct:\t', t2.timeit(1000)
print 'array:\t', t3.timeit(1000)

Output:
ord: 7.08213110884
struct: 3.7689164405
array: 2.62995268952

Inspired by Guido's essay http://www.python.org/doc/essays/list2str/

Ivan Illarionov

unread,
Apr 21, 2008, 4:50:13 PM4/21/08
to
On 22 апр, 00:10, Ivan Illarionov <ivan.illario...@gmail.com> wrote:

And even faster:
a = array.array('i', '\0' + '\0'.join((s[i:i+3] for i in xrange(0,


len(s), 3))))
if sys.byteorder == 'little':
a.byteswap()

I think it's a fastest possible implementation in pure python

Peter Otten

unread,
Apr 21, 2008, 5:01:14 PM4/21/08
to
Ivan Illarionov wrote:

> And even faster:
> a = array.array('i', '\0' + '\0'.join((s[i:i+3] for i in xrange(0,
> len(s), 3))))
> if sys.byteorder == 'little':
> a.byteswap()
>
> I think it's a fastest possible implementation in pure python

Clever, but note that it doesn't work correctly for negative numbers. For
those you'd have to prepend "\xff" instead of "\0".

Peter

Bob Greschke

unread,
Apr 21, 2008, 5:06:08 PM4/21/08
to

Geeze! :) How did struct get so fast? I'm guessing there have been
improvements since 2.3 (which is what I've been working on). I'm going
to be able to go back to my IBM PC XT pretty soon. :)

Thanks!

Steve Holden

unread,
Apr 21, 2008, 5:20:15 PM4/21/08
to pytho...@python.org
Yes, Bob Ippolito spent quite a bit of time improving it at the Need for
Speed sprint shortly before the 2.4 (?) release. It shows.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Ivan Illarionov

unread,
Apr 21, 2008, 5:30:04 PM4/21/08
to

Thanks for correction.

Another step is needed:

a = array.array('i', '\0' + '\0'.join((s[i:i+3] for i in xrange(0,
len(s), 3))))
if sys.byteorder == 'little':
a.byteswap()

result = [n if n < 0x800000 else n - 0x1000000 for n in a]

And it's still pretty fast :)

Bob Greschke

unread,
Apr 21, 2008, 6:51:13 PM4/21/08
to
Something is fishy. I just ran this simple-minded thing and I'm,
again, getting better times for ord() than I am for unpack() on a
2.8GHz OSX iMac with 2.5.1. This is the iterate so many times you can
use your wristwatch method:

----

#! /usr/bin/env python

from os import system
from struct import unpack

print "unpack 1"
system("date")
for x in xrange(0, 100000000):
Value = unpack(">B", "a")
system("date")
print

print "ord 1"
system("date")
for x in xrange(0, 100000000):
Value = ord("a")
system("date")
print

print "unpack 3"
system("date")
for x in xrange(0, 100000000):
Value = unpack(">BBB", "abc")
system("date")
print

print "ord 3"
system("date")
for x in xrange(0, 100000000):
Value = (ord("a") << 16)+(ord("b") << 8)+ord("c")
system("date")


----

Unpack 1 is about 1m20s
Ord 1 is about 20s
Unpack 3 is about 1m20s
Ord 3 is about 1m03s
after averaging a few runs.

Bob Greschke

unread,
Apr 21, 2008, 7:06:39 PM4/21/08
to
On 2008-04-21 16:51:13 -0600, Bob Greschke <b...@passcal.nmt.edu> said:
JUST COMPLETELY IGNORE THAT LAST ONE. What a dope. Here:

#! /usr/bin/env python

from os import system
from struct import unpack

print "unpack 1"
system("date")
for x in xrange(0, 100000000):

Value = unpack(">B", "a")[0]
if Value > 0x800000:
Value -= 0x1000000
system("date")
print

print "ord 1"
system("date")
for x in xrange(0, 100000000):
Value = ord("a")

if Value > 0x800000:
Value -= 0x1000000
system("date")
print

print "unpack 3"
system("date")
for x in xrange(0, 100000000):

Value1, Value2, Value3 = unpack(">BBB", "abc")
Value = (Value1 << 16)+(Value2 << 8)+Value3
if Value > 0x800000:
Value -= 0x1000000
system("date")
print

print "ord 3"
system("date")
for x in xrange(0, 100000000):
Value = (ord("a") << 16)+(ord("b") << 8)+ord("c")

if Value > 0x800000:
Value -= 0x1000000
system("date")


Still, the differences between unpack and ord are significant (I just
threw in the if's for fun).


George Sakkis

unread,
Apr 21, 2008, 7:10:05 PM4/21/08
to
On Apr 21, 5:30 pm, Ivan Illarionov <ivan.illario...@gmail.com> wrote:

Indeed, the array idea is paying off for largeish inputs. On my box
(Python 2.5, WinXP, 2GHz Intel Core Duo), the cutoff point where
from3Bytes_array becomes faster than from3Bytes_struct is close to 150
numbers (=450 bytes).

The struct solution though is now almost twice as fast with Psyco
enabled, while the array doesn't benefit from it. Here are some
numbers from a sample run:

*** Without Psyco ***
size=1
from3Bytes_ord: 0.033493
from3Bytes_struct: 0.018420
from3Bytes_array: 0.089735
size=10
from3Bytes_ord: 0.140470
from3Bytes_struct: 0.082326
from3Bytes_array: 0.142459
size=100
from3Bytes_ord: 1.180831
from3Bytes_struct: 0.664799
from3Bytes_array: 0.690315
size=1000
from3Bytes_ord: 11.551990
from3Bytes_struct: 6.390999
from3Bytes_array: 5.781636
*** With Psyco ***
size=1
from3Bytes_ord: 0.039287
from3Bytes_struct: 0.009453
from3Bytes_array: 0.098512
size=10
from3Bytes_ord: 0.174362
from3Bytes_struct: 0.045785
from3Bytes_array: 0.162171
size=100
from3Bytes_ord: 1.437203
from3Bytes_struct: 0.355930
from3Bytes_array: 0.800527
size=1000
from3Bytes_ord: 14.248668
from3Bytes_struct: 3.331309
from3Bytes_array: 6.946709


And here's the benchmark script:

import struct
from array import array

def from3Bytes_ord(s):
return [n if n<0x800000 else n-0x1000000 for n in
((ord(s[i])<<16) | (ord(s[i+1])<<8) | ord(s[i+2])
for i in xrange(0, len(s), 3))]

unpack_i32be = struct.Struct('>l').unpack

def from3Bytes_struct(s):


return [unpack_i32be(s[i:i+3] + '\0')[0]>>8
for i in xrange(0,len(s),3)]

def from3Bytes_array(s):
a = array('l', ''.join('\0' + s[i:i+3]


for i in xrange(0,len(s), 3)))

a.byteswap()
return [n if n<0x800000 else n-0x1000000 for n in a]


def benchmark():
from timeit import Timer
for n in 1,10,100,1000:
print ' size=%d' % n
# cycle between positive and negative
buf = ''.join(struct.pack('>i', 1234567*(-1)**(i%2))[1:]
for i in xrange(n))
for func in 'from3Bytes_ord', 'from3Bytes_struct',
'from3Bytes_array':
print ' %s: %f' % (func,
Timer('%s(buf)' % func ,
'from __main__ import %s; buf=%r' % (func,buf)
).timeit(10000))


if __name__ == '__main__':
s = ''.join(struct.pack('>i',v)[1:] for v in
[0,1,-2,500,-500,7777,-7777,-94496,98765,
-98765,8388607,-8388607,-8388608,1234567])
assert from3Bytes_ord(s) == from3Bytes_struct(s) ==
from3Bytes_array(s)

print '*** Without Psyco ***'
benchmark()

import psyco; psyco.full()
print '*** With Psyco ***'
benchmark()


George

Bob Greschke

unread,
Apr 21, 2008, 7:59:51 PM4/21/08
to

I just ran this on my SunBlade 1000 with 2.3.4 and

unpack 1: 7m53s
ord 1: 2m00s
unpack 3: 14m20s
ord 3: 5m30s

timeit looks broken somehow to me.

Ivan Illarionov

unread,
Apr 22, 2008, 12:04:16 AM4/22/08
to

Comments:
You didn't use the faster version of array approach:


''.join('\0' + s[i:i+3] for i in xrange(0,len(s), 3))

is slower than
'\0' + '\0'.join(s[i:i+3] for i in xrange(0,len(s), 3))


To Bob Greschke:

Struct is fast in Python 2.5 with struct.Struct class.

Array approach should work with Python 2.3 and it's probably the fastest
one (without psyco) with large inputs:

def from3bytes_array(s):
a = array.array('i', '\0' + '\0'.join([s[i:i+3]
for i in xrange(0, len(s), 3)]))
a.byteswap() # if your system is little-endian
return [n >= 0x800000 and n - 0x1000000 or n for n in a]

--
Ivan

Ivan Illarionov

unread,
Apr 22, 2008, 12:04:46 AM4/22/08
to
On Mon, 21 Apr 2008 16:10:05 -0700, George Sakkis wrote:

Comments:

George Sakkis

unread,
Apr 22, 2008, 9:03:27 AM4/22/08
to
On Apr 22, 12:04 am, Ivan Illarionov <ivan.illario...@gmail.com>
wrote:

Good catch; the faster version reduces the cutoff point between
from3Bytes_array and from3Bytes_struct to ~50 numbers (=150 bytes)
only (without Psyco).

George

0 new messages