WriteDelimited/parseDelimited in python


Graham

Dec 31, 2009, 9:28:41 PM
to Protocol Buffers
I'm experimenting with Protocol Buffers as the basis of a network
protocol that I'm putting together, with the server written in Java.
Rather than muck around with having to send the size of the message
independently of the message itself, in Java I'm using WriteDelimited
and ParseDelimited to do the encoding/decoding of messages. This works
wonderfully.

I've got the server written just fine, and can run it up and send hand-
encoded messages to it and it does the right thing. I've just started
writing some simple functional tests as python scripts, generated
using the same .proto file as the java code, but I can't see how to do
WriteDelimited/ParseDelimited (or the equivalent) in the Python
API. Is there a way of doing this? And if so, what am I missing? :)

Cheers
--
Graham Cox

Kenton Varda

Jan 1, 2010, 2:32:27 AM
to Graham, Protocol Buffers
I don't think an equivalent has been added to the Python API.  Want to write up a patch?
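For reference, the framing the Java delimited methods use is just the serialized message preceded by its length as a base-128 varint, so the writing side can be sketched by hand in the meantime. This is only a rough sketch, not part of the Python API: the names encode_varint and frame_delimited are made up for illustration, and the payload is assumed to be bytes you already got from SerializeToString().

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("a length must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte of the varint
            return bytes(out)


def frame_delimited(payload: bytes) -> bytes:
    """Prefix an already-serialized message with its varint-encoded length."""
    return encode_varint(len(payload)) + payload
```

With something like that in place, a test script could send frame_delimited(msg.SerializeToString()) over its socket, where msg is whatever generated message object the test has built.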



Graham

Jan 1, 2010, 3:53:39 PM
to Protocol Buffers
On Jan 1, 7:32 am, Kenton Varda <ken...@google.com> wrote:
> I don't think an equivalent has been added to the Python API.  Want to write
> up a patch?

Well - if you insist... Here's a first run, which seems to work but
I'm a very long way from being a competent Python programmer, so feel free
to fix it up some :)

I can't see how to attach files using the google groups interface, so
I've stuck them on my webspace for now: http://grahamcox.co.uk/patches/protobuf/
There's two patches - one for serializing in a delimited form, and one
for deserializing from a delimited form.
--
Graham Cox

Kenton Varda

Jan 4, 2010, 3:11:10 PM
to Graham, Protocol Buffers
Mostly looks good.  There are some style issues (e.g. lines over 80 chars) but I can clean those up myself.

You'll need to sign the contributor license agreement:

http://code.google.com/legal/individual-cla-v1.0.html -- If you own copyright on this change.
http://code.google.com/legal/corporate-cla-v1.0.html -- If your employer does.

Please let me know after you've done this and then I can submit these.


Kenton Varda

Jan 4, 2010, 4:57:55 PM
to Graham, Protocol Buffers
Hmm, it occurs to me that this currently is not useful for reading from a socket or similar stream since the caller has to make sure to read an entire message before trying to parse it, but the caller doesn't actually know how long the message is (because the code that determines this is encapsulated).  Any thoughts on this?

Kenton Varda

Jan 4, 2010, 8:32:45 PM
to Graham Cox, Protocol Buffers
Make sure to "reply all" so that the group is CC'd.

So you are saying that the user should read whatever data is on the socket, then attempt to parse it, and if it fails, assume that it's because there is more data to read?  Seems rather wasteful.  I think what we ideally want is either:
(a) Provide a way for the caller to read the size independently, so that they can then make sure to read that many bytes from the input before parsing.
(b) Provide a method that reads from a stream, so that the protobuf library can automatically take care of reading all necessary bytes.

Option (b) is obviously cleaner but has a few problems:
- We have to choose a particular stream interface to support.  While the Python "file-like" interface is pretty common I'm not sure if it's universal for this kind of task.
- If not all bytes of the message are available yet, we'd have to block.  This might be fine most of the time, but would be unacceptable for some uses.
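To make option (b) concrete, here is a rough sketch of what a stream-reading helper could look like if we settled on the blocking file-like interface. The name read_delimited is hypothetical and nothing like it exists in the library today; the sketch also assumes read() blocks until data or end of stream, which is exactly the limitation noted above.

```python
from typing import BinaryIO, Optional


def read_delimited(stream: BinaryIO) -> Optional[bytes]:
    """Read one length-prefixed message body from a blocking file-like object.

    Returns None on a clean end of stream (no bytes before the next prefix).
    """
    size = 0
    shift = 0
    first = True
    while True:
        b = stream.read(1)
        if not b:
            if first:
                return None
            raise EOFError("stream ended inside the length prefix")
        first = False
        size |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            break                      # last byte of the varint prefix
        shift += 7
        if shift >= 35:                # a 32-bit varint is at most 5 bytes
            raise ValueError("length prefix is too long")
    body = bytearray()
    while len(body) < size:
        chunk = stream.read(size - len(body))
        if not chunk:
            raise EOFError("stream ended inside the message body")
        body += chunk
    return bytes(body)
```

The returned bytes would then go to ParseFromString() on the caller's message object. For a socket, the file-like object could come from sock.makefile("rb").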

Thoughts?

On Mon, Jan 4, 2010 at 3:09 PM, Graham Cox <cox.g...@gmail.com> wrote:
> I'm using it for reading/writing to sockets in my functional tests - works well enough there...
> In my Java-side server code, I read from the socket into a byte buffer, then deserialize the byte buffer into Protobuf objects, throwing away the data that has been deserialized. The python "MergeDelimitedFromString" function also returns the number of bytes that were processed to build up the Protobuf object, so the user could easily do the same - read the socket onto the end of a buffer, and then while the buffer is successfully deserializing into objects throw away the first x bytes as appropriate...
>
> Just a thought :)

Graham Cox

Jan 5, 2010, 6:57:47 AM
to Kenton Varda, Protocol Buffers
I was saying the user *could* do that, and that it's currently what I'm doing in my server-side code. The reason being, as you said, if you naively read from a stream and the message isn't all present then you need to block until it is with the way that the Java code works at present. If you are using it for client-side code then likely this is not an issue in the slightest, but a server that needs to be able to handle many clients at once just can not block on one of them...

As to your other alternative, (a), I would suggest that this leaves too much of the underlying network protocol bare to the caller. This will make it very difficult to change the way that delimiting messages happens in the future should such a thing be required. If - for example - it is decided to go from having the length prefixed to having a special delimiting sequence after the message then it will cause all current calling code to need to be changed. It might be that this is considered a low enough level library that this is acceptable, but that would be a Google decision...

One more alternative would be how the asn1c library works for parsing ASN.1 streams into objects, which is to be resumable. The decoder reads all the data it is given, and tries to build the object from this. If it doesn't have enough data yet then it does what it can, remembers where it got to and returns back to the user who can then supply more data when it becomes available. If the entire message does parse from the data provided then return back to the user the amount of data consumed so that they can discard this (reading from the stream directly makes this slightly cleaner still). At present, the Protobuf libraries (any of them) can not support this method of decoding an object, and it is not a trivial change to make it possible to do, but it does - IMO - give a much cleaner and easier to use method of use.
-- 
Graham Cox

Kenton Varda

Jan 13, 2010, 6:28:01 PM
to Graham Cox, Protocol Buffers
(I have this on the back burner as I'm kind of swamped, but I do want to get this submitted at some point, hopefully within a week.)

Kenton Varda

Jan 19, 2010, 10:43:23 PM
to Graham Cox, Protocol Buffers
On Tue, Jan 5, 2010 at 3:57 AM, Graham Cox <cox.g...@gmail.com> wrote:
> I was saying the user *could* do that, and that it's currently what I'm doing in my server-side code. The reason being, as you said, if you naively read from a stream and the message isn't all present then you need to block until it is with the way that the Java code works at present. If you are using it for client-side code then likely this is not an issue in the slightest, but a server that needs to be able to handle many clients at once just can not block on one of them...

Well, you should be able to read just the size prefix, then wait until all the bytes have arrived before attempting to parse.  If the size prefix itself has not completely arrived, then you would have to retry reading it.
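Roughly, that check could look like the sketch below. The helper names peek_size_prefix and message_ready are invented here for illustration and are not existing library functions; the only assumption about the format is that the size prefix is a varint of at most 5 bytes.

```python
from typing import Optional, Tuple


def peek_size_prefix(buf: bytes) -> Optional[Tuple[int, int]]:
    """Try to decode the varint size prefix at the start of buf.

    Returns (message_size, prefix_length), or None if the prefix itself has
    not completely arrived yet and the caller should retry with more data.
    """
    size = 0
    for i, byte in enumerate(buf[:5]):  # a 32-bit varint uses at most 5 bytes
        size |= (byte & 0x7F) << (7 * i)
        if not (byte & 0x80):
            return size, i + 1
    if len(buf) >= 5:
        raise ValueError("length prefix is too long")
    return None


def message_ready(buf: bytes) -> bool:
    """True once buf holds the whole prefix plus the whole message body."""
    header = peek_size_prefix(buf)
    if header is None:
        return False
    size, prefix_len = header
    return len(buf) >= prefix_len + size
```

The caller keeps appending received bytes to its buffer and only slices out and parses the message once message_ready() says the whole thing is there, so nothing ever blocks on a partial message.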
 
> As to your other alternative, (a), I would suggest that this leaves too much of the underlying network protocol bare to the caller. This will make it very difficult to change the way that delimiting messages happens in the future should such a thing be required. If - for example - it is decided to go from having the length prefixed to having a special delimiting sequence after the message then it will cause all current calling code to need to be changed. It might be that this is considered a low enough level library that this is acceptable, but that would be a Google decision...

Realistically, we would never be able to change the encoding like this without requiring some sort of update to the callers, because otherwise we'd be breaking all the callers' abilities to talk to code compiled before the change, which is unacceptable.
 
> One more alternative would be how the asn1c library works for parsing ASN.1 streams into objects, which is to be resumable. The decoder reads all the data it is given, and tries to build the object from this. If it doesn't have enough data yet then it does what it can, remembers where it got to and returns back to the user who can then supply more data when it becomes available. If the entire message does parse from the data provided then return back to the user the amount of data consumed so that they can discard this (reading from the stream directly makes this slightly cleaner still). At present, the Protobuf libraries (any of them) can not support this method of decoding an object, and it is not a trivial change to make it possible to do, but it does - IMO - give a much cleaner and easier to use method of use.

I think this would add far too much complication to the system, probably harming performance and increasing code size and memory usage.

Maybe what we want is a class called DelimitedMessageReader which has a method Add(bytes).  You read some bytes off the wire, then pass them to Add().  Add() returns a list of byte strings representing messages that are now complete.  It may return an empty list if no new messages were completed.  So you have a loop like:

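  while True:
    data = ReadSomeBytes()          # placeholder: read whatever is available
    messages = reader.Add(data)     # zero or more complete serialized messages
    for message in messages:
      parsed = MyProtobuf()         # placeholder: your generated message class
      parsed.ParseFromString(message)
      HandleMessage(parsed)         # placeholder: application callback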

If you want to do event-driven I/O, you simply remove the "while True:" and instead execute this code each time you detect that bytes are available on the input.
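A rough sketch of what such a reader class might look like follows, spelled DelimitedMessageReader here. None of this exists in the library; it assumes the same varint length prefix that the Java delimited methods write, and it only splits the byte stream into message bodies, leaving the parsing to the caller as in the loop above.

```python
from typing import List


class DelimitedMessageReader:
    """Accumulates raw bytes and hands back complete message bodies."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def Add(self, data: bytes) -> List[bytes]:
        """Append newly read bytes; return every message that is now complete."""
        self._buf += data
        messages: List[bytes] = []
        while True:
            # Decode the varint length prefix, if all of it has arrived.
            size = 0
            pos = 0
            while True:
                if pos == 5:
                    raise ValueError("length prefix is too long")
                if pos == len(self._buf):
                    return messages      # prefix incomplete; wait for more
                byte = self._buf[pos]
                size |= (byte & 0x7F) << (7 * pos)
                pos += 1
                if not (byte & 0x80):
                    break
            if len(self._buf) < pos + size:
                return messages          # body incomplete; wait for more
            messages.append(bytes(self._buf[pos:pos + size]))
            del self._buf[:pos + size]   # discard the consumed prefix and body
```

Partial prefixes and partial bodies simply stay in the internal buffer until a later Add() call completes them, so the same object works for both the blocking loop and the event-driven variant.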