
Byte oriented data types in python


Ravi

Jan 24, 2009, 1:55:30 PM
I have the following packet format, which I have to send over Bluetooth.

packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
packet_data(variable)

How do I construct these using Python data types, as int and float have
no limits and their sizes are not well defined?

"Martin v. Löwis"

Jan 24, 2009, 2:52:15 PM
> packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
> packet_data(variable)
>
> How do I construct these using Python data types, as int and float have
> no limits and their sizes are not well defined?

In Python 2.x, use the regular string type: chr(n) will create a single
byte, and the + operator will do the concatenation.

In Python 3.x, use the bytes type (bytes() instead of chr()).
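
For illustration, a minimal Python 3 sketch of the packet from the question, assuming the payload is already a bytes object. The variable names and values are made up for the example. Note that a one-byte object with value n is spelled bytes([n]); bytes(n) with a bare integer gives n zero bytes instead.

```python
packet_type = 3            # illustrative value, must be in range(256)
packet_data = b"abcd"      # payload, assumed to be bytes already
packet_length = len(packet_data)

# bytes([a, b]) yields two bytes holding the values a and b.
packet = bytes([packet_type, packet_length]) + packet_data
print(packet)  # b'\x03\x04abcd'
```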

Regards,
Martin

sk...@pobox.com

Jan 24, 2009, 2:55:13 PM
to Ravi, pytho...@python.org

Ravi> packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
Ravi> packet_data(variable)

Ravi> How do I construct these using Python data types, as int and float have
Ravi> no limits and their sizes are not well defined?

Take a look at the struct and ctypes modules.

--
Skip Montanaro - sk...@pobox.com - http://smontanaro.dyndns.org/

Ravi

Jan 25, 2009, 10:27:33 AM

> Take a look at the struct and ctypes modules.

struct is really not the choice. It returns an expanded string of the
data, and this means larger latency over Bluetooth.

ctypes is basically for interfacing with libraries written in C
(this I read from the Python docs).

Ravi

Jan 25, 2009, 10:28:05 AM

This looks really helpful, thanks!

Steve Holden

Jan 25, 2009, 12:12:58 PM
to pytho...@python.org
Ravi wrote:
>> Take a look at the struct and ctypes modules.
>
> struct is really not the choice. it returns an expanded string of the
> data and this means larger latency over bluetooth.
>
If you read the module documentation more carefully you will see that it
"converts" between the various native data types and character strings.
Thus each native data type occupies only as many bytes as are required
to store it in its native form (modulo any alignments needed).
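
A quick way to check the sizes, sketched in Python 3 (the values 3 and 4 are arbitrary). The native-mode result on the last line is platform dependent, so no output is claimed for it.

```python
import struct

# "B" is an unsigned char: exactly one byte per value, no expansion.
header = struct.pack("BB", 3, 4)
print(header)                   # b'\x03\x04'
print(struct.calcsize("BB"))    # 2

# With "!" (network order) standard sizes apply and no padding is inserted.
print(struct.calcsize("!BH"))   # 3

# In native mode ("@", the default) alignment padding may be added,
# so the result here depends on the platform.
print(struct.calcsize("@BH"))
```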

> ctypes is basically for the interface with libraries written in C
> (this I read from the python docs)
>

I believe it *is* the struct module you need.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Grant Edwards

Jan 25, 2009, 12:46:10 PM
On 2009-01-25, Ravi <ra.ra...@gmail.com> wrote:
>
>> Take a look at the struct and ctypes modules.
>
> struct is really not the choice. it returns an expanded string of the
> data and this means larger latency over bluetooth.

I don't know what you mean by "returns an expanded string of
the data".

I do know that struct does exactly what you requested.

It converts between Python objects and what is basically a C
"struct" where you specify the endianness of each field and
what sort of packing/padding you want.
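
As a small illustration of the endianness prefixes (a Python 3 sketch; the value 0x1234 is arbitrary):

```python
import struct

value = 0x1234
big = struct.pack(">H", value)       # big-endian
little = struct.pack("<H", value)    # little-endian
network = struct.pack("!H", value)   # network order, same as big-endian

print(big.hex(), little.hex())       # 1234 3412
```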

I use the struct module frequently to implement binary
communications protocols in Python. I've used Python/struct
with transport layers ranging from Ethernet (raw, TCP, and UDP)
to async serial, to CAN.

--

"Martin v. Löwis"

Jan 25, 2009, 2:54:45 PM

>>> Take a look at the struct and ctypes modules.
>> struct is really not the choice. it returns an expanded string of the
>> data and this means larger latency over bluetooth.
>
> I don't know what you mean by "returns an expanded string of
> the data".
>
> I do know that struct does exactly what you requested.

I disagree. He has a format (type, length, value), with the
value being variable-sized. How do you do that in the struct
module?

> It converts between Python objects and what is basically a C
> "struct" where you specify the endianness of each field and
> what sort of packing/padding you want.

Sure. However, in the specific case, there is really no C
struct that can reasonably represent the data. Hence you
cannot really use the struct module.

> I use the struct module frequently to implement binary
> communications protocols in Python. I've used Python/struct
> with transport layers ranging from Ethernet (raw, TCP, and UDP)
> to async serial, to CAN.

Do you use it for the fixed-size parts, or also for the variable-sized
data?

Regards,
Martin

Grant Edwards

Jan 25, 2009, 3:13:24 PM
On 2009-01-25, Martin v. Löwis <mar...@v.loewis.de> wrote:
>
>>>> Take a look at the struct and ctypes modules.
>>> struct is really not the choice. it returns an expanded string of the
>>> data and this means larger latency over bluetooth.
>>
>> I don't know what you mean by "returns an expanded string of
>> the data".
>>
>> I do know that struct does exactly what you requested.
>
> I disagree. He has a format (type, length, value), with the
> value being variable-sized. How do you do that in the struct
> module?

You construct a format string for the "value" portion based on
the type/length header.

>> It converts between Python objects and what is basically a C
>> "struct" where you specify the endianness of each field and
>> what sort of packing/padding you want.
>
> Sure. However, in the specific case, there is really no C
> struct that can reasonably represent the data.

I don't see how that can be the case. There may not be a
single C struct that can represent all frames, but for every
frame you should be able to come up with a C struct that can
represent that frame.

> Hence you cannot really use the struct module.

Perhaps I don't understand his requirements, but I use the
struct module for protocols with type/len/value sorts of
packets.

>> I use the struct module frequently to implement binary
>> communications protocols in Python. I've used Python/struct
>> with transport layers ranging from Ethernet (raw, TCP, and
>> UDP) to async serial, to CAN.
>
> Do you use it for the fixed-size parts, or also for the
> variable-sized data?

Both. For variable size/format stuff you decode the first few
bytes and use them to figure out what format/layout to use for
the next chunk of data. It's pretty much the same thing you do
in other languages.
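
A rough sketch of that decode-the-header-first pattern, applied to the one-byte type and one-byte length layout from the original question. It is written for Python 3, and the function name iter_packets and the sample data are made up for the example.

```python
import struct

def iter_packets(buf):
    """Yield (packet_type, packet_data) for each complete packet in buf."""
    pos = 0
    while pos + 2 <= len(buf):
        # Fixed-size header: two unsigned bytes.
        ptype, plen = struct.unpack_from("!BB", buf, pos)
        end = pos + 2 + plen
        if end > len(buf):
            break              # incomplete packet, more data is needed
        # The header tells us how much payload follows.
        yield ptype, buf[pos + 2:end]
        pos = end

stream = b"\x03\x04abcd" + b"\x11\x07call me"
packets = list(iter_packets(stream))
print(packets)  # [(3, b'abcd'), (17, b'call me')]
```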

--
Grant

"Martin v. Löwis"

Jan 25, 2009, 3:25:05 PM
>> I disagree. He has a format (type, length, value), with the
>> value being variable-sized. How do you do that in the struct
>> module?
>
> You construct a format string for the "value" portion based on
> the type/length header.

Can you kindly provide example code on how to do this?

> I don't see how that can be the case. There may not be a
> single C struct that can represent all frames, but for every
> frame you should be able to come up with a C struct that can
> represent that frame.

Sure. You would normally have a struct such as

struct TLV {
    char type;
    char length;
    char *data;
};

However, the in-memory representation of that struct is *not*
meant to be sent over the wire. In particular, the character
pointer has no meaning outside the address space, and is thus
not to be sent.
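
The difference can be seen from Python with ctypes. This is only a sketch: the TLV class below is a hypothetical mirror of the C struct above (using c_ubyte for the two one-byte fields), and the in-memory size depends on the platform.

```python
import ctypes

class TLV(ctypes.Structure):
    _fields_ = [
        ("type", ctypes.c_ubyte),
        ("length", ctypes.c_ubyte),
        ("data", ctypes.c_char_p),   # a pointer, like char *data
    ]

payload = b"call me"
tlv = TLV(17, len(payload), payload)

# Raw struct contents: two bytes, padding, and a pointer value.
in_memory = bytes(tlv)

# What should actually go over the wire: header bytes followed by the data itself.
on_wire = bytes([tlv.type, tlv.length]) + tlv.data

print(ctypes.sizeof(TLV))   # platform dependent
print(on_wire)              # b'\x11\x07call me'
```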

> Both. For varible size/format stuff you decode the first few
> bytes and use them to figure out what format/layout to use for
> the next chunk of data. It's pretty much the same thing you do
> in other languages.

In the example he gave, I would just avoid using the struct module
entirely, as it does not provide any additional value:

def encode(type, length, value):
    return chr(type)+chr(length)+value

Regards,
Martin

John Machin

Jan 25, 2009, 4:16:40 PM

Provided that you don't take Martin's last sentence too literally :-)


| Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
| >>> p_data = b"abcd" # Omit the b prefix if using 2.5 or earlier
| >>> p_len = len(p_data)
| >>> p_type = 3
| >>> chr(p_type) + chr(p_len) + p_data
| '\x03\x04abcd'

| Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
| >>> p_data = b"abcd"
| >>> p_len = len(p_data)
| >>> p_type = 3
| >>> bytes(p_type) + bytes(p_len) + p_data # literal translation
| b'\x00\x00\x00\x00\x00\x00\x00abcd'
| >>> bytes(3)
| b'\x00\x00\x00'
| >>> bytes(10)
| b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
| >>> bytes([p_type]) + bytes([p_len]) + p_data
| b'\x03\x04abcd'
| >>> bytes([p_type, p_len]) + p_data
| b'\x03\x04abcd'

Am I missing a better way to translate chr(n) from 2.x to 3.x? The
meaning assigned to bytes(n) in 3.X is "interesting":

2.X:
    nuls = '\0' * n
    out_byte = chr(n)

3.X:
    nuls = b'\0' * n
  or
    nuls = bytes(n)
    out_byte = bytes([n])

Looks to me like there was already a reasonable way of getting a bytes
object containing a variable number of zero bytes. Any particular
reason why bytes(n) was given this specialised meaning? Can't be the
speed, because the speed of bytes(n) on my box is about 50% of the
speed of the * expression for n = 16 and about 65% for n = 1024.

Cheers,
John

"Martin v. Löwis"

Jan 25, 2009, 4:21:08 PM
to John Machin
> Looks to me like there was already a reasonable way of getting a bytes
> object containing a variable number of zero bytes. Any particular
> reason why bytes(n) was given this specialised meaning?

I think it was because bytes() was originally mutable, and you need a
way to create a buffer of n bytes. Now that bytes() ended up immutable
(and bytearray was added), it's perhaps not so useful anymore. Of
course, it would be confusing if bytes(4) created a sequence of one
byte, yet bytearray(4) created four bytes.
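
To make the distinction concrete, a short Python 3 sketch (n = 4 is arbitrary; int.to_bytes needs Python 3.2 or later):

```python
n = 4

zeros = bytes(n)                   # n zero bytes
one_byte = bytes([n])              # a single byte with value n
also_one = n.to_bytes(1, "big")    # another spelling of a single byte

buf = bytearray(n)                 # mutable buffer of n zero bytes
buf[0] = 0xFF                      # in-place assignment works on bytearray only

print(zeros, one_byte, buf)
```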

Regards,
Martin

Grant Edwards

Jan 25, 2009, 5:12:08 PM
On 2009-01-25, Martin v. Löwis <mar...@v.loewis.de> wrote:

>> You construct a format string for the "value" portion based on
>> the type/length header.
>
> Can you kindly provide example code on how to do this?

OK, something like this to handle received data where there is
an initial 8-bit type field that is 1 for 16-bit unsigned
integers in network byte-order, 2 for 32-bit IEEE floats in
network byte-order. We'll further assume that the 'length'
field comes next as a 16 bit unsigned value in network order
and represents "how many" objects of the specified type follow:

import struct

dtype = ord(rawdata[0])
dcount, = struct.unpack("!H",rawdata[1:3])   # unpack returns a tuple
if dtype == 1:
    fmtstr = "!" + "H"*dcount
elif dtype == 2:
    fmtstr = "!" + "f"*dcount
rlen = struct.calcsize(fmtstr)

data = struct.unpack(fmtstr,rawdata[3:3+rlen])

leftover = rawdata[3+rlen:]
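
For reference, a self-contained Python 3 sketch of the same idea with a matching encoder, so the round trip can be tried directly. The FORMATS table and the function names are invented for the example; the layout (8-bit type, 16-bit count, then the items, all in network order) is the one described above.

```python
import struct

FORMATS = {1: "H", 2: "f"}   # type code -> struct format character

def encode(dtype, values):
    header = struct.pack("!BH", dtype, len(values))
    body = struct.pack("!%d%s" % (len(values), FORMATS[dtype]), *values)
    return header + body

def decode(rawdata):
    # Read the fixed header, then build the format for the variable part.
    dtype, dcount = struct.unpack("!BH", rawdata[:3])
    fmtstr = "!%d%s" % (dcount, FORMATS[dtype])
    rlen = struct.calcsize(fmtstr)
    data = struct.unpack(fmtstr, rawdata[3:3 + rlen])
    return dtype, data, rawdata[3 + rlen:]

raw = encode(1, [1, 2, 65535]) + b"extra"
print(decode(raw))   # (1, (1, 2, 65535), b'extra')
```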

>> I don't see how that can be the case. There may not be a
>> single C struct that can represent all frames, but for every
>> frame you should be able to come up with a C struct that can
>> represent that frame.
>
> Sure. You would normally have a struct such as
>
> struct TLV{
> char type;
> char length;
> char *data;
> };
>
> However, the in-memory representation of that struct is *not*
> meant to be sent over the wire. In particular, the character
> pointer has no meaning outside the address space, and is thus
> not to be sent.

Well if it's not representing the layout of the data we're
trying to deal with, then it's irrelevant. We are talking
about how to convert Python objects to/from data in the
'on-the-wire' format, right?

Or isn't that what the OP is asking about?

>> Both. For variable size/format stuff you decode the first few
>> bytes and use them to figure out what format/layout to use for
>> the next chunk of data. It's pretty much the same thing you do
>> in other languages.
>
> In the example he gave, I would just avoid using the struct module
> entirely, as it does not provide any additional value:
>
> def encode(type, length, value):
>     return chr(type)+chr(length)+value

Like this?

>>> def encode(type,length,value):
...     return chr(type)+chr(length)+value
...
>>> print encode('float', 1, 3.14159)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in encode
TypeError: an integer is required
>>>

--
Grant

"Martin v. Löwis"

Jan 25, 2009, 5:28:55 PM
to Grant Edwards
> dtype = ord(rawdata[0])
> dcount, = struct.unpack("!H",rawdata[1:3])
> if dtype == 1:
>     fmtstr = "!" + "H"*dcount
> elif dtype == 2:
>     fmtstr = "!" + "f"*dcount
> rlen = struct.calcsize(fmtstr)
>
> data = struct.unpack(fmtstr,rawdata[3:3+rlen])
>
> leftover = rawdata[3+rlen:]

Unfortunately, that does not work in the example. We have
a message type (an integer), and a variable-length string.
So how do you compute the struct format for that?

>> Sure. You would normally have a struct such as
>>
>> struct TLV{
>> char type;
>> char length;
>> char *data;
>> };
>>
>> However, the in-memory representation of that struct is *not*
>> meant to be sent over the wire. In particular, the character
>> pointer has no meaning outside the address space, and is thus
>> not to be sent.
>
> Well if it's not representing the layout of the data we're
> trying to deal with, then it's irrelevant. We are talking
> about how to convert Python objects to/from data in the
> 'on-the-wire' format, right?

Right: ON-THE-WIRE, not IN MEMORY. In memory, there is a
pointer. On the wire, there are no pointers.

> Like this?
>
> >>> def encode(type,length,value):
> ...     return chr(type)+chr(length)+value
> ...
> >>> print encode('float', 1, 3.14159)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 2, in encode
> TypeError: an integer is required

No:

py> CONNECT_REQUEST=17
py> payload="call me"
py> encode(CONNECT_REQUEST, len(payload), payload)
'\x11\x07call me'
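
A possible Python 3 counterpart, sketched under the assumption that the length field should always be derived from the payload, which is why this variant takes two arguments instead of three. The range checks are an addition for the example, since both header fields must fit in one unsigned byte.

```python
def encode(packet_type, payload):
    """Build type || length || data with one unsigned byte each for type and length."""
    if not 0 <= packet_type <= 255:
        raise ValueError("packet_type must fit in one unsigned byte")
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([packet_type, len(payload)]) + payload

CONNECT_REQUEST = 17
print(encode(CONNECT_REQUEST, b"call me"))   # b'\x11\x07call me'
```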

Regards,
Martin

Grant Edwards

Jan 25, 2009, 6:05:31 PM
On 2009-01-25, Martin v. Löwis <mar...@v.loewis.de> wrote:
>> dtype = ord(rawdata[0])
>> dcount, = struct.unpack("!H",rawdata[1:3])
>> if dtype == 1:
>>     fmtstr = "!" + "H"*dcount
>> elif dtype == 2:
>>     fmtstr = "!" + "f"*dcount
>> rlen = struct.calcsize(fmtstr)
>>
>> data = struct.unpack(fmtstr,rawdata[3:3+rlen])
>>
>> leftover = rawdata[3+rlen:]
>
> Unfortunately, that does not work in the example. We have
> a message type (an integer), and a variable-length string.
> So how do you compute the struct format for that?

I'm confused. Are you asking for an introductory tutorial on
programming in Python?

> Right: ON-THE-WIRE, not IN MEMORY. In memory, there is a
> pointer. On the wire, there are no pointers.

I don't understand your point.

> py> CONNECT_REQUEST=17
> py> payload="call me"
> py> encode(CONNECT_REQUEST, len(payload), payload)
> '\x11\x07call me'

If all your data is comprised of 8-bit bytes, then you don't
need the struct module.

--
Grant

"Martin v. Löwis"

Jan 25, 2009, 6:36:51 PM
to Grant Edwards
>> Unfortunately, that does not work in the example. We have
>> a message type (an integer), and a variable-length string.
>> So how do you compute the struct format for that?
>
> I'm confused. Are you asking for an introductory tutorial on
> programming in Python?

Perhaps. I honestly do not know how to deal with variable-sized
strings in the struct module in a reasonable way, and thus believe
that this module is incapable of actually supporting them
(unless you use inappropriate trickery).

However, as you keep claiming that the struct module is what
should be used, I must be missing something about the struct
module.

> I don't understand your point.
>
>> py> CONNECT_REQUEST=17
>> py> payload="call me"
>> py> encode(CONNECT_REQUEST, len(payload), payload)
>> '\x11\x07call me'
>
> If all your data is comprised of 8-bit bytes, then you don't
> need the struct module.

Go back to the original message of the OP. It says

# I have the following packet format, which I have to send over Bluetooth.
# packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
# packet_data(variable)

So yes, all his data is comprised of 8-bit bytes, and yes, he doesn't
need the struct module. Hence I'm puzzled why people suggest that
he uses the struct module.

I think the key answer is "use the string type, it is appropriate
to represent byte oriented data in python" (also see the subject
of this thread)

Regards,
Martin

Grant Edwards

Jan 25, 2009, 6:48:59 PM
On 2009-01-25, Martin v. Löwis <mar...@v.loewis.de> wrote:
>>> Unfortunately, that does not work in the example. We have
>>> a message type (an integer), and a variable-length string.
>>> So how do you compute the struct format for that?
>>
>> I'm confused. Are you asking for an introductory tutorial on
>> programming in Python?
>
> Perhaps. I honestly do not know how to deal with variable-sized
> strings in the struct module in a reasonable way, and thus believe
> that this module is incapable of actually supporting them
> (unless you use inappropriate trickery).

It deals with variable sized fields just fine:

import struct

dtype = 18
dlength = 32
format = "!BB%ds" % dlength

# data is a byte string of dlength bytes; pack takes separate arguments, not a tuple
rawdata = struct.pack(format, dtype, dlength, data)
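
Spelled out as a self-contained Python 3 sketch with the decoding side included. The payload is an arbitrary example, and here the length is taken from the payload rather than fixed at 32.

```python
import struct

dtype = 18
data = b"hello, bluetooth"          # illustrative payload
dlength = len(data)

fmt = "!BB%ds" % dlength            # here "!BB16s"
rawdata = struct.pack(fmt, dtype, dlength, data)

# Decoding: read the fixed header first, then build the format for the rest.
rtype, rlength = struct.unpack("!BB", rawdata[:2])
(rdata,) = struct.unpack("!%ds" % rlength, rawdata[2:2 + rlength])

print(rtype, rlength, rdata)        # 18 16 b'hello, bluetooth'
```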

> However, as you keep claiming that the struct module is what
> should be used, I must be missing something about the struct
> module.

http://docs.python.org/library/struct.html

>> I don't understand your point.
>>
>>> py> CONNECT_REQUEST=17
>>> py> payload="call me"
>>> py> encode(CONNECT_REQUEST, len(payload), payload)
>>> '\x11\x07call me'
>>
>> If all your data is comprised of 8-bit bytes, then you don't
>> need the struct module.
>
> Go back to the original message of the OP. It says
>
> # I have the following packet format, which I have to send over Bluetooth.
> # packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
> # packet_data(variable)
>
> So yes, all his data is comprised of 8-bit bytes,

He doesn't specify what format the packet_data is, and we all
assumed he needed to handle conversion of various data types
to/from raw byte-strings.

> and yes, he doesn't need the struct module. Hence I'm puzzled
> why people suggest that he uses the struct module.

We all assumed that "packet_data" might contain values of
various types such as 16 or 32 bit integers, floating point
values -- that packet_data was not solely arbitrary-length
strings of 8-bit bytes.

> I think the key answer is "use the string type, it is
> appropriate to represent byte oriented data in python" (also
> see the subject of this thread)

I, for one, interpreted "byte-oriented" to mean that the data
was received/sent as blocks of bytes but needed to be converted
into other data types. If the data really is just strings of
bytes, and it's sent as strings of bytes, then I have no idea
what the OP was asking, since there's nothing that needs to be
done with the data.

--
Grant

"Martin v. Löwis"

Jan 25, 2009, 6:53:30 PM
to Grant Edwards
> It deals with variable sized fields just fine:
>
> dtype = 18
> dlength = 32
> format = "!BB%ds" % dlength
>
> rawdata = struct.pack(format, dtype, dlength, data)

I wouldn't call this "just fine", though - it involves
a % operator to even compute the format string. IMO,
it is *much* better not to use the struct module for this
kind of problem, and instead rely on regular string
concatenation.

Regards,
Martin

Grant Edwards

Jan 25, 2009, 7:04:59 PM
On 2009-01-25, Martin v. Löwis <mar...@v.loewis.de> wrote:

If all you need to do is concatenate strings, then you're
correct, there's no advantage to using struct or ctypes.

If you need a generic way to deal with arbitrary data types,
then that's what the struct and ctypes modules are designed to
do. The protocols I've implemented always required the ability
to deal with integers greater than 8 bits wide as well as
various other data types.
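
For example, a payload mixing wider fields might look like the following Python 3 sketch. The layout (a 16-bit sequence number, a 32-bit signed reading and a 32-bit float scale) is purely hypothetical and only serves to show where struct earns its keep.

```python
import struct

# Hypothetical payload layout: uint16 sequence, int32 reading, float32 scale.
PAYLOAD = struct.Struct("!Hif")

payload = PAYLOAD.pack(513, -2, 0.5)
packet = struct.pack("!BB", 7, len(payload)) + payload   # type 7 is arbitrary

seq, reading, scale = PAYLOAD.unpack(packet[2:])
print(PAYLOAD.size)          # 10
print(seq, reading, scale)   # 513 -2 0.5
```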

--
Grant

John Machin

Jan 25, 2009, 10:54:44 PM

IMO, it would be a good idea if struct.[un]pack supported a variable *
length operator that could appear anywhere that an integer constant
could appear, as in C's printf etc and Python's % formatting:

dlen = len(data)
rawdata = struct.pack("!BB*s", dtype, dlen, dlen, data)
# and on the other end of the wire:
dtype, dlen = struct.unpack("!BB", rawdata[:2])
data = struct.unpack("!*s", rawdata[2:], dlen)
# more than 1 count arg could be used if necessary
# *s would return a string
# *B, *H, *I, etc would return a tuple of ints in (3.X-speak)

I've worked with variable-length data that looked like
len1, len2, len3, data1, data2, data3
and the * gadget would have been very handy:
len1, len2, len3 = unpack('!BBB', raw[:3])
data1, data2, data3 = unpack('!*H*i*d', raw[3:], len1, len2, len3)

Note the semantics of '!*H*i*d' would be different from '!8H2i7d'
because otherwise you'd need to do:
bundle = unpack('!*H*i*d', raw[3:], len1, len2, len3)
data1 = bundle[:len1]
data2 = bundle[len1:len1+len2]
data3 = bundle[len1+len2:]
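
Until something like that exists, the grouping can be approximated with the current struct module. The helper below is only a sketch (unpack_groups is a made-up name, written for Python 3), and it handles numeric codes only: with "s" the count is a byte length rather than a repeat count, so it would need separate treatment.

```python
import struct

def unpack_groups(codes, counts, raw, byte_order="!"):
    """Unpack counts[i] items of struct code codes[i]; return one tuple per group."""
    fmt = byte_order + "".join("%d%s" % (n, c) for n, c in zip(counts, codes))
    flat = struct.unpack(fmt, raw[:struct.calcsize(fmt)])
    groups, pos = [], 0
    for n in counts:
        groups.append(flat[pos:pos + n])   # slice the flat tuple back into groups
        pos += n
    return groups

raw = struct.pack("!BBB", 2, 1, 1) + struct.pack("!2H1i1d", 10, 20, -3, 1.5)
len1, len2, len3 = struct.unpack("!BBB", raw[:3])
data1, data2, data3 = unpack_groups("Hid", (len1, len2, len3), raw[3:])
print(data1, data2, data3)   # (10, 20) (-3,) (1.5,)
```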
