Quick Hacky text_mode Parse in Python

Nicholas Reid

Dec 8, 2008, 8:03:28 AM

to Protocol Buffers

Hi All,

Firstly, just wanted to thank Kenton and the Google team; PB2 is a
beautiful piece of work! Thanks heaps.

I will almost certainly go to some deep circle of Programmer's Hell
for this, but it might be useful for someone until the guys get a
chance to add text_format message parsing functionality to the Python
API. There are almost certainly more elegant ways of doing this.

Code:

import subprocess

# Defined globally here; point it at your own .proto file
PROTO_FILENAME = "person.proto"

def parse_text_format(message_string, generated_message_type):
    """Parses the given Protobuf text format into a new instance of
    the given type."""

    # Instance new message
    obj = generated_message_type()

    # Wrap the protoc command-line utility; expects that 'protoc'
    # is on your PATH somewhere. --encode reads text format on stdin
    # and writes the binary encoding to stdout. It needs the fully
    # qualified type name, hence DESCRIPTOR.full_name.
    result = subprocess.run(
        ["protoc", "--encode=%s" % generated_message_type.DESCRIPTOR.full_name,
         PROTO_FILENAME],
        input=message_string.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True)

    # Parse the resulting binary representation.
    obj.ParseFromString(result.stdout)
    return obj

Example:

Assuming person.proto contains:

message Person {
  required string name = 1;
}

Code:

from person_pb2 import Person
guido = parse_text_format('name: "Guido"', Person)

Should give you a Person object which you can use for nefarious
purposes.

Kind regards,

Nicholas Reid

Kenton Varda

Dec 8, 2008, 1:21:39 PM

to Nicholas Reid, Petar Petrov, Protocol Buffers

Hey Petar, isn't there a patch someone was trying to submit that implements text format parsing? (For real, not by wrapping protoc.) What's the status of that?

Petar Petrov

Dec 8, 2008, 1:27:57 PM

to Kenton Varda, Nicholas Reid, Protocol Buffers

On Mon, Dec 8, 2008 at 10:21 AM, Kenton Varda <ken...@google.com> wrote:

> Hey Petar, isn't there a patch someone was trying to submit that implements text format parsing? (For real, not by wrapping protoc.) What's the status of that?

I'll review it today.
Hopefully the author hasn't forgotten about it.

Piotr Findeisen

Dec 30, 2008, 3:19:37 AM

to Protocol Buffers

On Dec 8, 7:27 pm, Petar Petrov <pe...@google.com> wrote:
> On Mon, Dec 8, 2008 at 10:21 AM, Kenton Varda <ken...@google.com> wrote:
> > Hey Petar, isn't there a patch someone was trying to submit that implements
> > text format parsing? (For real, not by wrapping protoc.) What's the status
> > of that?
>
> I'll review it today.
> Hopefully the author hasn't forgotten about it.

Hey!
This is going to be a feature I'd really miss!
Is anything happening on this?

regards!
Piotr

Helder Suzuki

Jan 20, 2009, 6:58:45 PM

to Piotr Findeisen, pe...@google.com, Protocol Buffers

I sent a partial patch a while ago and then disappeared without finishing text_format for Python, sorry about that.
So far I've only implemented the tokenizer part (with test cases), but anyone is free to use it to implement the parser part (I'd be really glad). For some reason I couldn't set up the code review (I didn't try hard, though). For various reasons I won't be able to touch it in the next few weeks.
This is such an important feature, especially if you use protobufs for configuration files; it'd be so handy!
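Helder's split into a tokenizer and a parser can be illustrated with a small, stdlib-only sketch. This is not his patch or the protobuf library's implementation: `tokenize` and `parse_flat` are hypothetical names, and the sketch only covers flat scalar fields (identifiers, `:`, double-quoted strings, decimal integers), with no nested messages and only simplified escape handling.

```python
import re

# Hypothetical token grammar for a tiny subset of the text format.
# Each alternative is a named group; leading whitespace is skipped.
_TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<string>"(?:[^"\\]|\\.)*")
      | (?P<number>-?[0-9]+)
      | (?P<punct>:)
    )''', re.VERBOSE)


def tokenize(text):
    """Splits text into a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    text = text.rstrip()  # trailing whitespace would otherwise fail to match
    while pos < len(text):
        match = _TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at offset %d" % pos)
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def parse_flat(text):
    """Parses flat 'field: value' pairs into {field: [values...]}."""
    tokens = tokenize(text)
    fields = {}
    for i in range(0, len(tokens), 3):
        triple = tokens[i:i + 3]
        kinds = [kind for kind, _ in triple]
        if (len(triple) < 3 or kinds[:2] != ["ident", "punct"]
                or kinds[2] not in ("string", "number")):
            raise ValueError("expected 'field: value' at token %d" % i)
        name = triple[0][1]
        kind, lexeme = triple[2]
        if kind == "string":
            # Simplified unescaping: drop each backslash and keep the next
            # character, which is only right for \" and \\.
            value = re.sub(r'\\(.)', r'\1', lexeme[1:-1])
        else:
            value = int(lexeme)
        fields.setdefault(name, []).append(value)
    return fields
```

With this, `parse_flat('name: "Guido" id: 42')` gives `{'name': ['Guido'], 'id': [42]}`. Values are collected into lists because the same field name may appear more than once (repeated fields); a real parser would instead consult the message descriptor to decide how each field is stored.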

Thanks,
Helder

Petar Petrov

Jan 21, 2009, 12:34:34 PM

to Helder Suzuki, Piotr Findeisen, Protocol Buffers

On Tue, Jan 20, 2009 at 3:58 PM, Helder Suzuki <helder...@gmail.com> wrote:
>
> I sent a partial patch a while ago and I disappeared without completing the text_format in python, sorry about that.
> So far I've only implemented the tokenizer part (w/ test cases), but anyone is free to use it to implement the parser part (I'd be really glad), and for some reason I couldn't set up the codereview (I didn't try hard though). For various reasons I won't be able to touch it in the next few weeks.
> This is such an important feature, specially if you use protobufs for configuration files, it'd be so handy!

Can you sign the contributor agreement:
http://code.google.com/legal/individual-cla-v1.0.html

It's required for us to accept your patch (each contributor has to sign it).

I have reviewed your patch, but for some reason I'm unable to send you
the comments.

We have two choices:
1. I can apply my own code review comments, submit the patch to the
internal repository, and then write the rest of the ParseASCII code.
2. You can apply the code review comments below and I'll submit the
patch to the external SVN.

In any case you have to sign the agreement above.

Here are my comments on the patch:

google/protobuf/text_format.py:
-__all__ = [ 'MessageToString', 'PrintMessage', 'PrintField', 'PrintFieldValue' ]
+#__all__ = ['MessageToString', 'PrintMessage', 'PrintField', 'PrintFieldValue']
(Draft) 2008/12/08 18:37:10 Why is this line commented out?

+class Token(object):
+  """TODO(helder): Document me."""
(Draft) 2008/12/08 18:37:50 Please add the docstrings.

+class Tokenizer(object):
+  """TODO(helder): Document me."""
(Draft) 2008/12/08 18:39:09 1-2 sentences will be enough.

+  def ConsumeNumber(self, started_with_zero, started_with_dot):
+    is_float = False
+    if started_with_zero and self.TryConsumeOneOf(('x', 'X')):
+      # A hex number (started with "0x").
+      self.ConsumeOneOrMore(HexDigit, '"0x" must be followed by hex digits.')
+    elif started_with_zero and self.LookingAt(OctaDigit):
(Draft) 2008/12/08 19:05:42 LookingAt(OctaDigit) -> LookingAt(Digit).
It isn't a bug but a hidden feature. This will accept 09 as a decimal
9 (will proceed with the next case - "decimal number"). I think in
this case we want to error out with a message like the one below -
"Numbers starting with leading zero must be in octal.". That's what
Kenton's implementation does.
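The rule described here (hex after "0x"; otherwise a leading zero followed by any digit means octal, and a non-octal digit is an error rather than a fallback to decimal) can be sketched as a standalone scanner. `consume_number` is a hypothetical helper that mirrors the branch structure of `ConsumeNumber`; it is not the patch itself, and it ignores signs and exponents.

```python
import string

DIGITS = set(string.digits)
OCTAL_DIGITS = set(string.octdigits)
HEX_DIGITS = set(string.hexdigits)


def consume_number(text, pos=0):
    """Scans the number literal starting at text[pos].

    Returns (end, is_float), where text[pos:end] is the literal.
    Signs and exponents are not handled in this sketch.
    """
    def take(charset, start):
        # Advance past every character in charset, starting at start.
        end = start
        while end < len(text) and text[end] in charset:
            end += 1
        return end

    started_with_zero = text[pos:pos + 1] == "0"
    next_char = text[pos + 1:pos + 2]

    if started_with_zero and next_char in ("x", "X"):
        # A hex number (started with "0x").
        end = take(HEX_DIGITS, pos + 2)
        if end == pos + 2:
            raise ValueError('"0x" must be followed by hex digits.')
        return end, False

    if started_with_zero and next_char in DIGITS:
        # Look at *any* digit here (LookingAt(Digit), not OctaDigit), so
        # "09" is rejected instead of being read as decimal 9.
        end = take(DIGITS, pos + 1)
        if not set(text[pos + 1:end]) <= OCTAL_DIGITS:
            raise ValueError(
                "Numbers starting with leading zero must be in octal.")
        return end, False

    # Decimal number, optionally with a fractional part.
    end = take(DIGITS, pos)
    is_float = text[end:end + 1] == "."
    if is_float:
        end = take(DIGITS, end + 1)
    return end, is_float
```

Checking against `DIGITS` first and only then insisting on octal is what turns the "hidden feature" into an explicit error: `consume_number('017')` returns `(3, False)`, while `consume_number('09')` raises `ValueError`.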

google/protobuf/internal/text_format_test.py:
+  def testParseInteger(self):
+    self.assertEqual(text_format.ParseInteger('0'), 0)
+    self.assertEqual(text_format.ParseInteger('1'), 1)
+    self.assertEqual(text_format.ParseInteger('012345'), 5349) # base 8
(Draft) 2008/12/08 19:41:49 Can you also add a test which tries to
parse 09. It should result in an error.
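A sketch of the requested test, paired with a minimal integer parser so it can run on its own. `parse_integer` is a hypothetical stand-in for `text_format.ParseInteger`, assuming it accepts an optional `-` followed by decimal, `0x` hex, or leading-zero octal digits. It rejects `09` because an octal literal may only use the digits 0-7.

```python
import re
import unittest

# Optional sign, then hex ("0x..."), octal (leading "0", digits 0-7 only),
# or decimal (no leading zero). "09" matches none of these.
_INTEGER_RE = re.compile(r"(-?)(0[xX][0-9A-Fa-f]+|0[0-7]*|[1-9][0-9]*)")


def parse_integer(text):
    """Hypothetical stand-in for text_format.ParseInteger."""
    match = _INTEGER_RE.fullmatch(text)
    if match is None:
        raise ValueError("Couldn't parse integer: %r" % text)
    sign, body = match.groups()
    if body[:2] in ("0x", "0X"):
        value = int(body[2:], 16)
    elif body.startswith("0"):
        value = int(body, 8)  # also covers a bare "0"
    else:
        value = int(body, 10)
    return -value if sign else value


class ParseIntegerTest(unittest.TestCase):

    def testParseInteger(self):
        self.assertEqual(parse_integer('0'), 0)
        self.assertEqual(parse_integer('1'), 1)
        self.assertEqual(parse_integer('012345'), 5349)  # base 8
        self.assertEqual(parse_integer('0x1F'), 31)
        self.assertEqual(parse_integer('-10'), -10)

    def testLeadingZeroMustBeOctal(self):
        # The extra case requested in the review: 09 is not valid octal.
        self.assertRaises(ValueError, parse_integer, '09')
```

Run it with `python -m unittest` on the file. Validating the whole literal up front keeps the base selection trivial, and it ensures a stray `8` or `9` after a leading zero is reported as an error rather than silently reinterpreted.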

Helder Suzuki

Jan 21, 2009, 2:51:51 PM

to Petar Petrov, Piotr Findeisen, Protocol Buffers

Hi Petar,

Thanks for the code review! Unfortunately I won't be able to touch the code for the next few weeks, so feel free to go ahead, apply the review comments, and start using it right away; otherwise just let me know and I'll apply the comments asap.
I've just signed the electronic version of the agreement.

Thanks,
Helder
