Protobuf in python during message deserialization complains about 'Unexpected end-group tag.'


maqi...@gmail.com

Jul 13, 2015, 7:11:55 PM
to prot...@googlegroups.com
Hi guys,

I need help with the protobuf library. I may be running into problems when using it in combination with the ZMQ library.

I described all the details in this topic on Stack Overflow:

Could you help me with my problem?

Best Regards,
- Krystian

maqi...@gmail.com

Jul 13, 2015, 7:12:00 PM
to prot...@googlegroups.com

I'm developing a ZMQ/protobuf application and I have a problem deserializing messages sent from C++ to Python. Messages from Python to C++ work fine, but the other direction fails.

The protobuf library in the Python client application complains that it detected 'Unexpected end-group tag.'

I presume the problem lies between C++ serialization and Python deserialization. I'm wondering whether it has something to do with the null terminator in C/C++ :(.

This is my C++ serialization code:

// Test Code.
// Try to send some 'demo' response back
RPiProtocol::Message response;
std::string response_string;
response.set_type(RPiProtocol::Message::RESPONSE);
response.set_command(RPiProtocol::Message::GET_SYS_INFO);
response.set_version(0);

// Serialize ZMQ message to string.
if (response.SerializeToString(&response_string))
{
    // Debug prints.
    printf("%#010x\n", response_string.c_str());
    cout << "Response string length= " << response_string.length() << endl;

    //  Send response message back to the client.
    zmq::message_t reply(response_string.length());
    memcpy((void *)reply.data(), &response_string, response_string.length());
    socket.send(reply);
}

This is my Python deserialization code:

#  Get the reply.
message = socket.recv()
print len(message)
print ':'.join(x.encode('hex') for x in str(message))
response = rpi_protocol_pb2.Message()

# This line fails
response.ParseFromString(message)

While debugging, I found that deserialization fails in this function in \google\protobuf\internal\python_message.py:

  def InternalParse(self, buffer, pos, end):
    self._Modified()
    field_dict = self._fields
    unknown_field_list = self._unknown_fields
    while pos != end:
      (tag_bytes, new_pos) = local_ReadTag(buffer, pos)
      field_decoder, field_desc = decoders_by_tag.get(tag_bytes, (None, None))
      if field_decoder is None:
        value_start_pos = new_pos
        new_pos = local_SkipField(buffer, new_pos, end, tag_bytes)
        if new_pos == -1: # HERE I HAVE -1 !!!
          return pos
        if not unknown_field_list:
          unknown_field_list = self._unknown_fields = []
        unknown_field_list.append((tag_bytes, buffer[value_start_pos:new_pos]))
        pos = new_pos
      else:
        pos = field_decoder(buffer, new_pos, end, self, field_dict)
        if field_desc:
          self._UpdateOneofState(field_desc)
    return pos
  cls._InternalParse = InternalParse

C++ (ZMQ SERVER - REP): http://pastebin.com/ACaXk8Vz

PYTHON (ZMQ CLIENT - REQ): http://pastebin.com/X9DR8ue9

Could you help me get my application working?

Ilia Mirkin

Jul 13, 2015, 7:21:19 PM
to maqi...@gmail.com, prot...@googlegroups.com
Is what you're sending the same thing as what you're receiving? Do the
lengths match up? Pretty easy to buggily truncate at the first null
byte...
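
A quick way to check both points is to dump the payload on each side and compare. The sketch below is only an illustration in Python 3 (the snippets posted in this thread are Python 2); `compare_payloads` and `truncate_at_nul` are hypothetical helper names, and the sample bytes are made up so that they contain a null byte.

```python
def compare_payloads(sent, received):
    """Return both lengths and the index of the first differing byte (or None)."""
    first_diff = None
    for index, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            first_diff = index
            break
    if first_diff is None and len(sent) != len(received):
        # One payload is a prefix of the other: they diverge where the shorter ends.
        first_diff = min(len(sent), len(received))
    return {
        "sent_len": len(sent),
        "received_len": len(received),
        "first_diff": first_diff,
    }


def truncate_at_nul(data):
    """Mimic C-string handling: keep only the bytes before the first 0x00."""
    return data.split(b"\x00", 1)[0]


# Hypothetical payload with an embedded null byte, which binary data may contain.
sent = bytes([0x08, 0x00, 0x10, 0x01])
received = truncate_at_nul(sent)
report = compare_payloads(sent, received)
print(report)
```

If the lengths already differ, truncation is a likely suspect; if they match but the bytes differ, the wrong memory is probably being copied somewhere.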
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to protobuf+u...@googlegroups.com.
> To post to this group, send email to prot...@googlegroups.com.
> Visit this group at http://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/d/optout.

maqi...@gmail.com

Jul 14, 2015, 1:46:00 PM
to prot...@googlegroups.com, imi...@alum.mit.edu
The length is the same on each side.
However, the binary content is different on each side.

First of all, I noticed in the Python client that the first byte of the received message changes sporadically.
Secondly, the bytes do not match:

This is the protobuf message serialized to a string in the C++ server application: 0x08 0x02 0x10 0x01 0x18 0x00
This is the packet received in the ZMQ client written in Python: 0xe4 0x1f 0x02 0x00 0x90 0xf6

So it is totally different.
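
Decoding the first tag of each byte sequence by hand suggests why the parser reports an end-group tag. This is a minimal sketch in Python 3, assuming the standard protobuf wire format (a tag is `field_number << 3 | wire_type`, encoded as a base-128 varint). `read_varint` and `first_tag` are hypothetical helpers written for this illustration, not part of the protobuf library.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        pos += 1
        if not byte & 0x80:               # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def first_tag(buf):
    """Return (field_number, wire_type) of the first tag in buf."""
    tag, _ = read_varint(buf, 0)
    return tag >> 3, tag & 0x07


sent = bytes([0x08, 0x02, 0x10, 0x01, 0x18, 0x00])
received = bytes([0xE4, 0x1F, 0x02, 0x00, 0x90, 0xF6])

print(first_tag(sent))      # (1, 0): field 1, wire type 0 (varint)
print(first_tag(received))  # (508, 4): wire type 4 is END_GROUP
```

The sent bytes start with field 1 encoded as a varint, as one would expect. The received bytes start with a tag whose wire type is 4 (end group), which is consistent with the 'Unexpected end-group tag.' error: what arrives does not look like the serialized message at all.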

maqi...@gmail.com

Jul 14, 2015, 5:32:12 PM
to prot...@googlegroups.com, imi...@alum.mit.edu

I have finally found the buggy code. I had an error in this line in the C++ server:

memcpy((void *)reply.data(), &response_string, response_string.length());

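The expression &response_string is the address of the std::string object itself, not of its character buffer, so memcpy copied the object's internal bytes instead of the serialized message. Instead of the buggy code above it should be: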

memcpy((void *)reply.data(), (void *)response_string.data(), response_string.length());

I worked out how to copy a C++ string into a ZMQ message thanks to this function I found on the web:

//  Convert string to 0MQ string and send to socket    
static bool s_send (zmq::socket_t & socket, const std::string & string) {

    zmq::message_t message(string.size());
    memcpy (message.data(), string.data(), string.size());

    bool rc = socket.send (message);
    return (rc);
}

Below is the link to the zhelpers.hpp header file, which contains the function pasted above and many other useful functions for C++ ZMQ-based applications: https://github.com/imatix/zguide/blob/master/examples/C%2B%2B/zhelpers.hpp

Tanmay Saha

Jun 30, 2017, 9:45:08 AM
to Protocol Buffers, imi...@alum.mit.edu
This is what I have done.

mymessageobj = mymessageproto.MyMessage()
myrdd = mysparkcontext.sequenceFile(filename1, 'org.apache.hadoop.io.Text', 'org.apache.hadoop.io.BytesWritable')
firstvaluebytearray = myrdd.first()[1]

myhexstring = ''.join(hex(eachvalue) for eachvalue in firstvaluebytearray)
print mymessageobj.ParseFromString(myhexstring)

But I get the error 'Unexpected end-group tag.'

When I try to send a byte string instead of a hexstring, it throws an error stating 'Invalid wire tag.'

Any help would be appreciated.
Thanks,
Tanmay.

Jisi Liu

Jun 30, 2017, 5:54:54 PM
to Tanmay Saha, Protocol Buffers, imi...@alum.mit.edu
ParseFromString only takes the binary string that SerializeToString() generates. I don't know how you get the input bytes, but you would probably have to decode them first.
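
To illustrate the difference, here is a small Python 3 sketch. The sample bytes are borrowed from earlier in this thread purely as an example, and nothing here calls the protobuf library. Joining `hex()` outputs produces ASCII text that describes the bytes, which is not the same thing as the bytes themselves:

```python
# Example serialized bytes (an assumption for illustration only).
raw = bytearray([0x08, 0x02, 0x10, 0x01, 0x18, 0x00])

# What the hex-join does: glues together text such as '0x8', '0x2', ...
hex_text = "".join(hex(value) for value in raw)
print(hex_text)

# Seen as bytes, that text is longer than the wire data and unrelated to it.
as_text_bytes = hex_text.encode("ascii")
print(len(raw), len(as_text_bytes))

# What a parser expects: the original bytes, unchanged.
binary = bytes(raw)
print(binary == b"\x08\x02\x10\x01\x18\x00")
```

A parser fed `hex_text` would interpret characters such as '0' and 'x' as wire-format tags, so errors about unexpected or invalid tags are not surprising.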

Tanmay Saha

Jul 3, 2017, 12:48:46 AM
to Jisi Liu, Protocol Buffers, imi...@alum.mit.edu
So my byte strings are basically Kafka-generated protobuf strings stored in sequence files as ".gz" files. I am now reading them and trying to create a Spark RDD to run analytics on them. When I print the strings I have, they are basically bytes and not the "usual" strings.

--
With Due Regards
Tanmay Saha,

Tanmay Saha

Jul 3, 2017, 12:50:12 AM
to Jisi Liu, Protocol Buffers, imi...@alum.mit.edu
I also converted them to hex strings, because I noticed a lot of hex symbols in the strings, like A0, F3, and so on.

What else should I be looking into?

Greg Field

Feb 7, 2018, 4:13:57 PM
to Protocol Buffers
Tanmay, I'm sure you found the solution to this long ago, but I think all you needed to do was convert the byte array into a bytes object like this:

mymessageobj = mymessageproto.MyMessage()
myrdd = mysparkcontext.sequenceFile(filename1, 'org.apache.hadoop.io.Text', 'org.apache.hadoop.io.BytesWritable')
firstvaluebytearray = myrdd.first()[1]

print mymessageobj.ParseFromString(bytes(firstvaluebytearray))


I have just started doing something similar, and I'm looking for an efficient way to deserialize the entire sequenceFile, the same way you did for just the first record (using Python, of course). My ultimate goal is to deserialize the entire sequenceFile and turn it into a DataFrame.