Difference between Java and C++ generated byte buffers


Ranger

Dec 1, 2015, 4:45:32 AM
to FlatBuffers
Dear all,
I've been studying FlatBuffers recently and tried to create a byte buffer for the same schema with the same data in both C++ and Java, but the final results were different between the two platforms.

The schema definition:

table Record4 {
byte1:byte;
ubyte2:byte;
bool3:bool;
short4:short;
ushort5:ushort;
int6:int;
uint7:int;
float8:float;
long9:long;
ulong10:ulong;
double11:double;
string12:string;
}

The data I inserted:
table Record4 {
byte1:byte = 123;
ubyte2:byte = 22;
bool3:bool = false;
short4:short = 25;
ushort5:ushort = 44;
int6:int = 6534;
uint7:int = 934;
float8:float = 25.34f;
long9:long = 239042223L;
ulong10:ulong = 34902013L;
double11:double = 232455323.323;
string12:string = "AAABBBB";
}


And the final byte arrays I got:

In JAVA:
28 0 52 0 51 0 50 0 0 0 48 0 46 0 40 0 36 0 32 0 24 0 16 0 8 0 4 0 28 0 0 48 0 0 0 66 96 -91 54 -7 -75 -85 65  -3 -113 20 2 0 0 0 0 -81 126 63 14 0 0 0 0 -20 81 -68 65 -90 3 0 0 -122 25 0 0 0 0 0 44 0 25 0 22 123 7 0 0 0 65 65 65 66 66 66

In C:

36 0 0 0 0 0 0 0 28 0 56 0 6 0 7 0 0 0 8 0 10 0 12 0 16 0 20 0 28 0 36 0 44 0 24 0 28 0 0 0 0 0 123 22 25 0 44 0 134 25 0 0 166 3 0 0 236 81 188 65 32 0 0 0 175 126 63 14 0 0 0 0 253 143 20 2 0 0 0 0 66 96 165 54 249 181 171 65 0 0 0 0 7 0 0 0 65 65 65 66 66 66 66 0


Could anyone explain in detail the HEADER part of the C binary buffer, which contains the offsets (I've already understood the header that Java generated), and why the two buffers differ?
Thanks!

mikkelfj

Dec 1, 2015, 6:17:00 AM
to FlatBuffers
You should provide a hexdump if there is to be any chance of deciphering the input.

The flatbuffer header has a 4-byte offset to the first (root) table and an optional 4-byte ASCII identifier like "MONS" for monster data.
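Here is a minimal Python sketch of reading those two header fields with the standard struct module. Python is only an inspection tool here and is not part of either the C++ or the Java FlatBuffers API. The helper names root_offset and has_identifier are hypothetical. The sample bytes are the first eight bytes of the C++ User buffer posted later in this thread, whose schema declares file_identifier "TEST". A reader can only compare bytes 4 to 7 against the identifier it expects, because nothing in the bytes says whether an identifier was written.

```python
import struct

def root_offset(buf: bytes) -> int:
    # Bytes 0..3: little-endian uint32 offset from the buffer start to the root table.
    return struct.unpack_from("<I", buf, 0)[0]

def has_identifier(buf: bytes, ident: bytes) -> bool:
    # Bytes 4..7: optional file identifier, only checkable against an expected value.
    return len(ident) == 4 and len(buf) >= 8 and buf[4:8] == ident

header = bytes([20, 0, 0, 0, 84, 69, 83, 84])  # start of the C++ User buffer
print(root_offset(header))              # 20
print(has_identifier(header, b"TEST"))  # True
print(header[4:8].decode("ascii"))      # TEST
```

For the C++ Record4 buffer (no file_identifier in that schema), bytes 4 to 7 are 0 0 0 0, so has_identifier would return False for any real identifier.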

But the Java buffer is wrong in several ways:
The string bytes 65 65 65 66 66 66 (= "AAABBB") are not zero terminated in the Java version, and the last B of "AAABBBB" is missing, but the length prefix is 7 as it should be.

The first 4 bytes in the buffer are the offset to the table start. In C++ this is byte offset 36, which is reasonable. In Java the first four bytes are 28 0 52 0, which as a little-endian 32-bit value is 52 * 65536 + 28 = 3407900. That is way outside the buffer and cannot be right. The buffer is probably missing the initial 8-byte header (4 bytes offset and 4 bytes identifier, although the identifier is optional).
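The bounds check can be done with a small Python sketch (root_in_bounds is a hypothetical helper name). It decodes the first four bytes as a little-endian uint32 and tests whether a table could start at that position. The lengths 104 and 90 are the byte counts of the C++ dump and of the first Java dump as posted above.

```python
import struct

def root_in_bounds(first4: bytes, buf_len: int):
    # Little-endian uint32: b0 + b1*256 + b2*65536 + b3*16777216.
    root = struct.unpack("<I", first4)[0]
    # A table begins with a 4-byte soffset, so 4 bytes must fit at `root`.
    return root, root + 4 <= buf_len

print(root_in_bounds(bytes([36, 0, 0, 0]), 104))  # C++ dump
print(root_in_bounds(bytes([28, 0, 52, 0]), 90))  # first Java dump
```

The C++ prefix gives (36, True). The Java prefix gives (3407900, False), which is consistent with the Java dump starting at the vtable rather than at a buffer header.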

The 28 at offset 8 in C++ and at offset 0 in Java is the vtable length (12 fields of 2 bytes each plus two 2-byte header fields gives 28).
The next number is 56 in C++ and 52 in Java. This is the table's data size. Depending on how the data is stored, the table may be smaller or larger due to alignment, so this is probably fine.

After 56 or 52 follow twelve 16-bit vtable entries holding offsets into the table. These are different in C++ because the values are stored in a different order, but this is perfectly fine as long as all values are aligned correctly.

In Java, the table follows the 28 vtable bytes. You can see the value 28 again, which indicates that the vtable starts 28 bytes earlier. However, this offset is 4 bytes long (the table header), so you have 28 0 0 48. But 48 is not correct; it should have been 0. 48 might be the offset to the AAABBBB string, but it is one byte too early.
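The walk described in the last few paragraphs can be followed in a short Python sketch: root offset, then the signed offset back to the vtable, then the vtable header and field slots, then the string offset. It uses the C++ Record4 dump because that dump is internally consistent. It assumes the vtable slots follow the field declaration order in the schema, which is how FlatBuffers assigns field ids when no explicit id attributes are given.

```python
import struct

# The C++ Record4 dump from this thread (104 bytes, 16 per row).
cpp = bytes([
    36, 0, 0, 0, 0, 0, 0, 0, 28, 0, 56, 0, 6, 0, 7, 0,
    0, 0, 8, 0, 10, 0, 12, 0, 16, 0, 20, 0, 28, 0, 36, 0,
    44, 0, 24, 0, 28, 0, 0, 0, 0, 0, 123, 22, 25, 0, 44, 0,
    134, 25, 0, 0, 166, 3, 0, 0, 236, 81, 188, 65, 32, 0, 0, 0,
    175, 126, 63, 14, 0, 0, 0, 0, 253, 143, 20, 2, 0, 0, 0, 0,
    66, 96, 165, 54, 249, 181, 171, 65, 0, 0, 0, 0, 7, 0, 0, 0,
    65, 65, 65, 66, 66, 66, 66, 0,
])

FIELDS = ["byte1", "ubyte2", "bool3", "short4", "ushort5", "int6",
          "uint7", "float8", "long9", "ulong10", "double11", "string12"]

table = struct.unpack_from("<I", cpp, 0)[0]                 # root offset
soffset = struct.unpack_from("<i", cpp, table)[0]           # signed: vtable = table - soffset
vtable = table - soffset
vt_size, tbl_size = struct.unpack_from("<HH", cpp, vtable)  # vtable header
count = (vt_size - 4) // 2                                  # number of 2-byte field slots
slots = dict(zip(FIELDS, struct.unpack_from(f"<{count}H", cpp, vtable + 4)))

# Each non-zero slot is an offset from the table start. The string slot points at
# a uoffset that is relative to its own position, not to the table or buffer.
string_field = table + slots["string12"]
string_pos = string_field + struct.unpack_from("<I", cpp, string_field)[0]
length = struct.unpack_from("<I", cpp, string_pos)[0]
text = cpp[string_pos + 4:string_pos + 4 + length].decode("ascii")

print(table, vtable, vt_size, tbl_size)
print(slots)
print(text)
```

The bool3 slot is 0, which means the field was not stored at all. FlatBuffers builders by default skip a scalar equal to its default value (false here), and readers then return the default. The Java vtable also shows 0 in that slot.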

If you had a hexdump with 16 bytes per line, it would be much easier to see.
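A dump with 16 bytes per line is easy to produce. Below is one possible Python sketch, with an offset column, hex bytes and an ASCII column similar to what common hexdump tools print. The function name hexdump is hypothetical. Masking each value with 0xFF lets it accept Java's signed byte values directly.

```python
def hexdump(values, width=16):
    # Accepts bytes or a list of ints; & 0xFF maps Java's signed -128..127 to 0..255.
    data = bytes(v & 0xFF for v in values)
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part:<{width * 3 - 1}}  |{text}|")
    return "\n".join(lines)

# Tail of the second Java dump: length prefix 7, but only "AAABBB" before the 0.
print(hexdump([7, 0, 0, 0, 65, 65, 65, 66, 66, 66, 0]))
```

The ASCII column makes the short string visible at a glance as |....AAABBB.|.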

The C++ buffer shows unsigned bytes and the Java buffer shows signed bytes, but the table values are probably the same once converted.
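That claim can be checked with a quick Python sketch. It converts a few of Java's signed field bytes to unsigned (b & 0xFF), compares them with the corresponding C++ bytes, and decodes them with struct. The byte groups are copied from the two Record4 dumps at the top of the thread. The format codes assume FlatBuffers' little-endian scalar encoding.

```python
import struct

def to_unsigned(signed_values):
    # Java's byte type is signed (-128..127); & 0xFF recovers the raw 0..255 value.
    return bytes(v & 0xFF for v in signed_values)

# Byte groups from the first Java dump (signed) and the C++ dump (unsigned).
java_signed = {
    "int6":     [-122, 25, 0, 0],
    "float8":   [-20, 81, -68, 65],
    "long9":    [-81, 126, 63, 14, 0, 0, 0, 0],
    "double11": [66, 96, -91, 54, -7, -75, -85, 65],
}
cpp_unsigned = {
    "int6":     [134, 25, 0, 0],
    "float8":   [236, 81, 188, 65],
    "long9":    [175, 126, 63, 14, 0, 0, 0, 0],
    "double11": [66, 96, 165, 54, 249, 181, 171, 65],
}
# Little-endian int32, float32, int64 and float64.
formats = {"int6": "<i", "float8": "<f", "long9": "<q", "double11": "<d"}

for name, fmt in formats.items():
    raw = to_unsigned(java_signed[name])
    print(name, raw == bytes(cpp_unsigned[name]), struct.unpack(fmt, raw)[0])
```

All four groups match byte for byte. Incidentally, the float8 bytes (identical in both dumps) decode to about 23.54, not the 25.34f in the posted list of inserted data. That list may therefore contain a typo; the other fields match it.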

BTW: Please refer to the C++ buffer as C++ and not C. There is a separate generator for C, and it uses a different layout, storing vtables at the end when possible.

Mozart

Dec 2, 2015, 1:37:53 AM
to FlatBuffers
Dear mikkelfj,
Thank you for your explanation. I am new to FlatBuffers. I have tried the above example again. The last byte array I got in Java:
40 0 0 0 84 69 84 51 28 0 52 0 51 0 50 0 0 0 48 0 46 0 40 0 36 0 32 0 24 0 16 0 8 0 4 0 28 0 0 48 0 0 0 66 96 -91 54 -7 -75 -85 65  -3 -113 20 2 0 0 0 0 -81 126 63 14 0 0 0 0 -20 81 -68 65 -90 3 0 0 -122 25 0 0 0 0 0 44 0 25 0 22 123 7 0 0 0 65 65 65 66 66 66 0

The output is probably fine now, but I have some questions about a new example:
namespace flatbuffers_test;

table User {
    name: string;  /// name  = "ABC"
    age: int;      /// age   = 22
    state: bool;   /// state = true
}

root_type User;
file_identifier "TEST";
file_extension "test";

The byte array I got in C++:
20 0 0 0 84 69 83 84 0 0 10 0 16 0 12 0 8 0 7 0 10 0 0 0 0 0 0 1 22 0 0 0 4 0 0 0 3 0 0 0 65 66 67 0

Following your explanation:
20 0 0 0: 4-byte offset
84 69 83 84: 4-byte identifier ("TEST")
0 0 10 0: according to your explanation, I think the 2 bytes (0 0) are the vtable length, is that right? (If so, why is it zero?) And are the next 2 bytes (10 0) the table data size?
I don't know what the 4 bytes (4 0 0 0) at offset 32 are for. Can you explain this in more detail? Thank you!

One more question: why is there a difference between Java and C++ when creating the buffer?

mikkelfj

Dec 2, 2015, 3:04:24 AM
to FlatBuffers
Hi again,

Answering the last part first: yes, vtables have two bytes per entry. There are two vtable header entries: the vtable size and the table size.
You cannot assume the vtable follows the buffer header. The vtable is pointed to by the table. (0 0) is definitely not a correct vtable size; it is padding.
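Applied to the User buffer above, a short Python walk shows where each piece lives. It is a minimal sketch, and the variable names are illustrative. The root offset 20 points at the table, and the table's first four bytes (10 0 0 0) point back to a vtable at offset 10, so bytes 8 and 9 are padding. The 4 0 0 0 at offset 32 is the name field: an unsigned offset relative to its own position, so the string's length prefix (3) starts at 32 + 4 = 36, followed by "ABC" and a 0.

```python
import struct

# C++ User buffer from this thread (44 bytes).
user = bytes([20, 0, 0, 0, 84, 69, 83, 84, 0, 0, 10, 0, 16, 0, 12, 0, 8, 0, 7, 0,
              10, 0, 0, 0, 0, 0, 0, 1, 22, 0, 0, 0, 4, 0, 0, 0, 3, 0, 0, 0,
              65, 66, 67, 0])

table = struct.unpack_from("<I", user, 0)[0]             # root offset
soffset = struct.unpack_from("<i", user, table)[0]       # signed offset to the vtable
vtable = table - soffset                                 # bytes before it are padding
vt_size, tbl_size = struct.unpack_from("<HH", user, vtable)
name_slot, age_slot, state_slot = struct.unpack_from("<3H", user, vtable + 4)

name_field = table + name_slot                           # where "4 0 0 0" sits
string_pos = name_field + struct.unpack_from("<I", user, name_field)[0]
length = struct.unpack_from("<I", user, string_pos)[0]

print("table", table, "vtable", vtable, "sizes", vt_size, tbl_size)
print("slots", name_slot, age_slot, state_slot)
print("name field at", name_field, "-> string at", string_pos,
      user[string_pos + 4:string_pos + 4 + length])
print("age", struct.unpack_from("<i", user, table + age_slot)[0],
      "state", user[table + state_slot] != 0)
```

The vtable size 10 is 4 header bytes plus three 2-byte slots. The table size 16 covers offsets 20 to 35: the 4-byte soffset, 3 padding bytes, the bool at 27, the int at 28 and the string offset at 32.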

Regarding Java (and the old input data):

I am not sure I trust your binary dumps. Either the Java flatbuffer generator is wrong or your dump tool is wrong, and I really hope it is the latter.
The last Java version ends with (7 0 0 0 65 65 65 66 66 66 0), which now has zero termination, but the length is 7 and there are only 6 letters, "AAABBB".
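A FlatBuffers string is a little-endian uint32 length, then that many bytes, then a terminating 0 that is not counted in the length. Below is a small Python check of that rule against the string tails posted in this thread; check_string is a hypothetical helper, not library API.

```python
import struct

def check_string(tail: bytes):
    # `tail` starts at the string's length prefix.
    length = struct.unpack_from("<I", tail, 0)[0]
    body = tail[4:]
    # Valid only if `length` bytes are present and followed by a 0 byte.
    ok = len(body) > length and body[length] == 0
    return length, body[:length], ok

cpp_tail = bytes([7, 0, 0, 0, 65, 65, 65, 66, 66, 66, 66, 0])
java_first = bytes([7, 0, 0, 0, 65, 65, 65, 66, 66, 66])
java_last = bytes([7, 0, 0, 0, 65, 65, 65, 66, 66, 66, 0])

for name, tail in [("C++", cpp_tail), ("Java 1", java_first), ("Java 2", java_last)]:
    print(name, check_string(tail))
```

Only the C++ tail passes. In the second Java tail, the 0 is consumed as the seventh character, so the terminator is still missing.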

So why are the buffers different?
Some possibilities are:

- generated with different input data by mistake.
- dump tool may be buggy.
- flatbuffers with same content can have different layout because the format is flexible.
- incorrect use of flatbuffers api in Java and/or in C++.

And again, without hexdump it is very difficult to compare. And with different input data it is impossible.

On Unix there is a tool called "hexdump", and in C/C++ you can use the tool I grabbed from somewhere on the internet and modified:


...

Wouter van Oortmerssen

Dec 4, 2015, 5:25:59 PM
to mikkelfj, FlatBuffers
First, the two implementations are not guaranteed to produce byte-for-byte identical encodings. There is a lot of variability in how things can be encoded, because everything uses offsets (see https://google.github.io/flatbuffers/md__internals.html ).
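To illustrate this point, here is a Python sketch with two different byte layouts of the same User table. layout_a is the C++ buffer posted earlier in the thread. layout_b is a hypothetical layout built by hand for this example, not produced by any FlatBuffers builder. It places the vtable after the table, so the table's signed vtable offset is negative. The reader is a simplified sketch rather than the FlatBuffers library: it hard-codes the User field order and treats a missing slot as the default value.

```python
import struct

def read_user(buf: bytes):
    # Simplified reader for User (field order: name, age, state).
    table = struct.unpack_from("<I", buf, 0)[0]
    # soffset is signed: the vtable may lie before or after the table.
    vtable = table - struct.unpack_from("<i", buf, table)[0]
    vt_size = struct.unpack_from("<H", buf, vtable)[0]

    def slot(i):
        # Offset of field i from the table start; 0 or past vt_size means absent.
        pos = 4 + 2 * i
        return struct.unpack_from("<H", buf, vtable + pos)[0] if pos < vt_size else 0

    name = None
    if slot(0):
        at = table + slot(0)
        at += struct.unpack_from("<I", buf, at)[0]   # uoffset relative to the field
        n = struct.unpack_from("<I", buf, at)[0]
        name = buf[at + 4:at + 4 + n].decode("utf-8")
    age = struct.unpack_from("<i", buf, table + slot(1))[0] if slot(1) else 0
    state = buf[table + slot(2)] != 0 if slot(2) else False
    return name, age, state

# The C++ User buffer from this thread.
layout_a = bytes([20, 0, 0, 0, 84, 69, 83, 84, 0, 0, 10, 0, 16, 0, 12, 0, 8, 0, 7, 0,
                  10, 0, 0, 0, 0, 0, 0, 1, 22, 0, 0, 0, 4, 0, 0, 0, 3, 0, 0, 0,
                  65, 66, 67, 0])
# Hypothetical hand-built layout: table at 8, vtable at 24 (soffset -16).
layout_b = bytes([8, 0, 0, 0, 84, 69, 83, 84,
                  240, 255, 255, 255, 24, 0, 0, 0, 22, 0, 0, 0, 1, 0, 0, 0,
                  10, 0, 16, 0, 4, 0, 8, 0, 12, 0, 0, 0,
                  3, 0, 0, 0, 65, 66, 67, 0])

print(layout_a == layout_b)                        # False
print(read_user(layout_a) == read_user(layout_b))  # True
print(read_user(layout_b))                         # ('ABC', 22, True)
```

Comparing decoded values rather than raw bytes is therefore the more reliable way to check that two implementations agree.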

Then there's the possibility that your code in either language does something different (or even wrong). Hard to tell without seeing your code.


Mozart

Dec 8, 2015, 3:47:34 AM
to FlatBuffers, mik...@dvide.com
Hi,
You are right. I was using the latest version, so the buffer I got was different from the Java one. When I use the release version 1.2.0, the final results are the same on both platforms.