Encoding problem between a C# string on Windows and C++ on Unix


Jamie Sherman

Jan 4, 2017, 11:38:15 AM
to Protocol Buffers
I have defined a proto message that I produce on Windows using C# and consume on OSX using C++.
I'm using release 3.0.0 of Google.Protobuf (NuGet on Windows, compiled and built on OSX).

I have read that Protobuf stores strings as UTF-8, and I realize that native C# strings are UTF-16. I assumed the C# library would take care of the conversion from UTF-16 to UTF-8, but that doesn't seem to be the case. The online examples that I've found just assign a string to the field (wiffName), but that doesn't seem to work.

Can someone point out where I'm going wrong and how to get around this? If the library doesn't handle the conversion, how should I go about changing a UTF-16 string into a UTF-8 string in C#? Any help is really appreciated.


Proto File:

message XicSetHeader {
    int64 TotalXicSets = 1;
    string wiffName = 2;
}



C# Code:

    var xsetHeader = new XicSetHeader();
    xsetHeader.TotalXicSets = xsetVec.Count;
    xsetHeader.WiffName = "myWiffNameHolder";

    using (var stream = File.Create(FileOutName(oPath))) // alternatively: new MemoryStream()
    {
        xsetHeader.WriteTo(stream);
    }


C++ Code:

    // `in` points to an ifstream that is in a good state and positioned
    // at the start of the encoded proto message.
    XicContainer::XicContainer(std::istream *in): m_xheader(new XicHeader())
    {
        // m_xheader is a pointer to an XicHeader, initialized above.
        m_xheader->ParseFromIstream(in);
    }


C++ Error Message:

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:532] String field 'XicHeader.wiffName' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.

Feng Xiao

Jan 4, 2017, 2:36:04 PM
to Jamie Sherman, Protocol Buffers
Can you check if the data size is the same on both ends? If you are writing the data into a file, make sure you are writing/reading it in binary mode.
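
A minimal sketch of the binary-mode point, assuming the file is handled with standard C++ streams on at least one side. The helper names `ReadAllBytes`, `WriteAllBytes`, and `CheckRoundTrip` are made up for illustration and are not part of protobuf. On Windows, a text-mode stream expands 0x0A to 0x0D 0x0A on write and can treat 0x1A as end-of-file on read, and both byte values show up easily in a serialized message. On OSX and other Unix systems the flag has no effect, but it is harmless to pass.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file as raw bytes.
// std::ios::binary turns off any newline or end-of-file translation.
// Returns an empty string if the file cannot be opened.
std::string ReadAllBytes(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: writes raw bytes, again in binary mode.
bool WriteAllBytes(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    out.flush();
    return out.good();
}

// Round-trip check: bytes that text mode could disturb must come back
// unchanged, and the size on disk must match the size in memory.
void CheckRoundTrip(const std::string& path) {
    const std::string bytes("\x08\x0a\x1a\x0d\x00", 5);
    const bool written = WriteAllBytes(path, bytes);
    assert(written);
    (void)written;
    assert(ReadAllBytes(path) == bytes);
}
```

Comparing `ReadAllBytes(path).size()` on the reading side with the file size reported on the writing side is one quick way to answer the "same data size on both ends" question.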
 


--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+unsubscribe@googlegroups.com.
To post to this group, send email to prot...@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

Jamie Sherman

Jan 4, 2017, 2:57:29 PM
to Protocol Buffers, jamie....@gmail.com
So I forced the message to have fixed values:
     
C#

    xsetHeader.TotalXicSets = 10;
    xsetHeader.WiffName = "myWiffNameHolder";

hexdump of message: 

080a12106d79576966664e616d65486f6c646572

C++

    XicHeader test;
    test.set_totalxics(10);
    test.set_wiffname("myWiffNameHolder");
    test.SerializeToOstream(of);

 
hexdump of message:

080a1a106d79576966664e616d65486f6c646572


I recompiled my C++ protobuf to make sure its flags matched the flags I'm compiling with. The error message has gone away, but now I get an empty string out when I deserialize. After building I ran make check and it passed the 7 tests.
Any guesses as to what to kick next? Should the messages serialize to the same binary sequence?
I generated the hexdumps using xxd -p filename.



Feng Xiao

Jan 4, 2017, 3:05:18 PM
to Jamie Sherman, Protocol Buffers
The C++ hexdump is wrong. Decoding its first three bytes:
08 => field number 1, wire type varint => totalxics field.
0a => the value 10
1a => field number 3, wire type length-delimited => no such field in the proto file you posted

The C# version is correct at this position: 12 => field number 2, wire type length-delimited => wiffname field.
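
The byte-by-byte reading above follows from how protobuf packs a tag: the field number is shifted left by three bits and the wire type sits in the low three bits. Here is a minimal sketch in C++, assuming a single-byte tag (which covers field numbers 1 to 15). The names `Tag` and `DecodeTagByte` are made up for illustration and are not part of the protobuf API.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative type, not a protobuf class.
struct Tag {
    std::uint32_t field_number;
    std::uint32_t wire_type;  // 0 = varint, 2 = length-delimited
};

// Splits a single-byte tag into its two parts.
// A tag is encoded as (field_number << 3) | wire_type.
inline Tag DecodeTagByte(std::uint8_t b) {
    // A set high bit would mean the tag continues in the next byte,
    // which this sketch does not handle.
    assert((b & 0x80) == 0);
    Tag t;
    t.field_number = static_cast<std::uint32_t>(b >> 3);
    t.wire_type = static_cast<std::uint32_t>(b & 0x07);
    return t;
}
```

With this, 0x08 decodes to field 1 with wire type 0, 0x12 to field 2 with wire type 2, and 0x1a to field 3 with wire type 2.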


To your other question: yes, the two messages should serialize to the same binary sequence.

Tim Kientzle

Jan 4, 2017, 3:05:51 PM
to Jamie Sherman, Protocol Buffers


It looks like you have two incompatible versions of your proto file
floating around.  You should look through your source code very
carefully to see how that happened.

Your C# code thinks the `wiffName` field is field #2 (the third byte there
is 0x12 = 2 * 8 + 2).

Your C++ code thinks the `wiffName` field is field #3 (the third byte is
0x1a = 3 * 8 + 2).

Also, both are correctly writing UTF-8 strings into the output.
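
To check the UTF-8 point against the two hexdumps in this thread, here is a rough sketch that pulls out the first length-delimited payload and tests whether every byte is ASCII (ASCII is always valid UTF-8). It assumes the simple layout seen here: single-byte tags, single-byte varint values, and a single-byte length. The helpers `FromHex`, `FirstStringPayload`, and `IsAscii` are hypothetical names for illustration; this is a hand-rolled walk over the bytes, not how the protobuf library parses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Converts a hex string such as "080a12" into raw bytes.
std::vector<std::uint8_t> FromHex(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        out.push_back(static_cast<std::uint8_t>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    }
    return out;
}

// Returns the payload of the first length-delimited field, or an empty
// string if none is found under the simplifying assumptions above.
std::string FirstStringPayload(const std::vector<std::uint8_t>& msg) {
    std::size_t i = 0;
    while (i + 1 < msg.size()) {
        const std::uint8_t wire_type = msg[i] & 0x07;
        if (wire_type == 0) {
            i += 2;  // varint field: skip the tag and one value byte
        } else if (wire_type == 2) {
            const std::size_t len = msg[i + 1];
            if (i + 2 + len > msg.size()) return std::string();
            return std::string(msg.begin() + i + 2, msg.begin() + i + 2 + len);
        } else {
            break;   // other wire types are out of scope for this sketch
        }
    }
    return std::string();
}

// Bytes below 0x80 are ASCII, and an all-ASCII string is valid UTF-8.
bool IsAscii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}
```

Run on the C# dump (080a1210...) and on the C++ dump (080a1a10...), `FirstStringPayload` returns the same 16 bytes, myWiffNameHolder, and `IsAscii` is true for both. Only the tag byte in front of the string differs.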

Tim

Jamie Sherman

Jan 4, 2017, 3:20:19 PM
to Tim Kientzle, Protocol Buffers
Thanks, you nailed it. I thought the proto file was the same in both places, but it differed slightly between the C++ and C# repositories I was working with. This will get me to use git submodule. Thanks again.
