trouble understanding schemaless serialization header for REQUEST_RECORD_LOAD

Michael Peterson

Feb 7, 2015, 11:21:31 AM

to orient-...@googlegroups.com

Hello,

I am continuing work on a Go (golang) driver for OrientDB and working on the binary network protocol. I am wondering if someone can help me interpret the output of a REQUEST_RECORD_LOAD request.

I am using orientdb-community-2.0-rc2 and orientdb-community-2.1 to test against.

For reference, when I run the query with the command-line client, here is the result:

orientdb {db=cars}> load record #11:0

+---------------------------------------------------------------------+
| Document - @class: Person             @rid: #11:0      @version: 1 |
+---------------------------------------------------------------------+
|                     Name | Value                                    |
+---------------------------------------------------------------------+
|                     name | Luke                                     |
+---------------------------------------------------------------------+

When I do it with my golang client specifying the SCHEMALESS binary serialization format, here's what the server sends back (my annotations included):

Reading byte (1 byte)... [OChannelBinaryServer]
Read byte: 30 [OChannelBinaryServer] => REQUEST_RECORD_LOAD
Reading int (4 bytes)... [OChannelBinaryServer]
Read int: 59 [OChannelBinaryServer]   => session-id
Reading short (2 bytes)... [OChannelBinaryServer]
Read short: 11 [OChannelBinaryServer] => cluster-id
Reading long (8 bytes)... [OChannelBinaryServer]
Read long: 0 [OChannelBinaryServer]   => cluster-position
Reading string (4+N bytes)... [OChannelBinaryServer]
Read string: [OChannelBinaryServer] => fetch plan (empty string)
Reading byte (1 byte)... [OChannelBinaryServer]
Read byte: 0 [OChannelBinaryServer]   => ignore-cache
Reading byte (1 byte)... [OChannelBinaryServer]
Read byte: 0 [OChannelBinaryServer]   => load-tombstones

Writing byte (1 byte): 0 [OChannelBinaryServer]   => status: SUCCESS
Writing int (4 bytes): 59 [OChannelBinaryServer] => session-id
Writing byte (1 byte): 1 [OChannelBinaryServer]   => payload-status: record=resultset
Writing byte (1 byte): 100 [OChannelBinaryServer] => record-type: 'd' (ascii 100) = document
Writing int (4 bytes): 1 [OChannelBinaryServer]   => record-version
Writing bytes (4+19=23 bytes): [0, 12, 80, 101, 114, 115, 111, 110, 1, 0, 0, 0, 14, 0, 8, 76, 117, 107, 101] [OChannelBinaryServer] => record-content (see below)
Writing byte (1 byte): 0 [OChannelBinaryServer] => payload-status: no more records
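For readers following along, the request half of the trace above (the "Read" lines) could be assembled roughly as below. This is only a sketch: `encodeRecordLoad` and `boolByte` are hypothetical helper names, the field order and widths are taken from the log, and big-endian byte order is an assumption (it is consistent with the 4-byte pointer `0, 0, 0, 14` in the record content).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// opRecordLoad is the operation byte shown in the server log (30).
const opRecordLoad byte = 30

// boolByte maps a flag to the single byte written on the wire.
func boolByte(b bool) byte {
	if b {
		return 1
	}
	return 0
}

// encodeRecordLoad (hypothetical helper) lays out the fields in the order the
// server log reads them: op byte, session-id (int), cluster-id (short),
// cluster-position (long), fetch plan (4-byte length + bytes), two flag bytes.
// Big-endian order is assumed. Writes to a bytes.Buffer do not fail, so the
// errors returned by binary.Write are ignored here.
func encodeRecordLoad(sessionID int32, clusterID int16, clusterPos int64,
	fetchPlan string, ignoreCache, loadTombstones bool) []byte {
	var buf bytes.Buffer
	buf.WriteByte(opRecordLoad)
	binary.Write(&buf, binary.BigEndian, sessionID)
	binary.Write(&buf, binary.BigEndian, clusterID)
	binary.Write(&buf, binary.BigEndian, clusterPos)
	binary.Write(&buf, binary.BigEndian, int32(len(fetchPlan)))
	buf.WriteString(fetchPlan)
	buf.WriteByte(boolByte(ignoreCache))
	buf.WriteByte(boolByte(loadTombstones))
	return buf.Bytes()
}

func main() {
	// Same values as in the trace: session 59, record #11:0, empty fetch plan.
	req := encodeRecordLoad(59, 11, 0, "", false, false)
	fmt.Println(len(req), req)
}
```

With an empty fetch plan the request is 1 + 4 + 2 + 8 + 4 + 1 + 1 = 21 bytes long.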

Everything looks good except for how to interpret the record-content bytes. They don't look like what I would expect from this spec: https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md

      Version
      |---|------------ Classname -----------|---------- Header -----------|-------- Data ---------|
            len   |-------- string --------|     ?    ?    ?    ?  ptr    ?  len   |--- string ---|
              6    P    e    r    s    o    n                                  4    L    u    k    e
bytes:  [0,  12,  80, 101, 114, 115, 111, 110,   1,   0,   0,   0,  14,   0,   8,  76, 117, 107, 101]
idx:     0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18

The version, classname and data sections look right. But I can't figure out the header piece. It is supposed to be

    +--------------------------+-------------------+-------------------------------+----------------+
    | field_name_length:varint | field_name:byte[] | pointer_to_data_structure:int | data_type:byte |
    +--------------------------+-------------------+-------------------------------+----------------+

But the field_name_length and field_name seem to be missing. The ptr-to-data looks right (idx 12 is "14", which points to the start of the data section).

The last byte of the header (idx 13) is 0 and that maps to "boolean" type according to this page: https://github.com/orientechnologies/orientdb/wiki/Types, but that is wrong, since the data type is of type string.

I also tried to compare it to what this proposal doc says: https://groups.google.com/forum/#!searchin/orient-database/varint$20variable$20length$20int/orient-database/8r1ES_LEDxE/rwdpxjMr-BQJ

but I am having trouble making that work and I'm not clear what parts of that proposal were actually accepted and implemented.

Please help.

Thanks,
-Michael

Michael Peterson

Feb 7, 2015, 8:50:16 PM

to orient-...@googlegroups.com

I have done some additional analysis by stepping through the Java client code to see what it writes out.

For this code:

    ODocument person = new ODocument("Person");
    person.field("name", "Han");
    person.field("surname", "Solo");
    person.save();

The serialized record {Person[name=Han, surname=Solo]} gets written out like this:

            |---------- className ----------|
         V    6    P    e    r    s    o    n
bytes:  [0,  12,  80, 101, 114, 115, 111, 110,
idx:     0    1    2    3    4    5    6    7

         |--------------------- Header ---------------------|------------------ Data -------------------|
         0    <---- ptr ----->   20    <---- ptr ----->  EOH    3    H    a    n    4    S    o    l    o
bytes:   1,   0,   0,   0,  19,  41,   0,   0,   0,  23,   0,   6,  72,  97, 110,   8,  83, 111, 108, 111]
idx:     8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27

Positions 9-12 and 14-17 are (non-varint) integers that are pointers to the values in the data section. And position 18 is an end-of-header marker. While this doesn't match the documentation, those make sense to me.
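As a quick cross-check of that reading, the sketch below treats positions 9-12 and 14-17 as 4-byte big-endian integers and follows each one into the data section. The helper names `readPointer` and `readString` are made up for illustration; the string length prefix is decoded with Go's `binary.Varint`, which applies zigzag decoding to the varint (so 6 becomes 3 and 8 becomes 4).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// record is the serialized Person[name=Han, surname=Solo] shown above.
var record = []byte{
	0, 12, 80, 101, 114, 115, 111, 110, // version + class name "Person"
	1, 0, 0, 0, 19, 41, 0, 0, 0, 23, 0, // header, ends with the 0 marker
	6, 72, 97, 110, 8, 83, 111, 108, 111, // data: "Han", "Solo"
}

// readPointer (hypothetical helper) reads the 4-byte big-endian integer
// that starts at idx.
func readPointer(rec []byte, idx int) int {
	return int(binary.BigEndian.Uint32(rec[idx : idx+4]))
}

// readString (hypothetical helper) reads a string whose length is stored as
// a zigzag-encoded varint immediately before the bytes.
func readString(rec []byte, off int) string {
	n, size := binary.Varint(rec[off:])
	return string(rec[off+size : off+size+int(n)])
}

func main() {
	for _, idx := range []int{9, 14} {
		ptr := readPointer(record, idx)
		fmt.Printf("pointer at idx %d -> offset %d -> %q\n",
			idx, ptr, readString(record, ptr))
	}
}
```

The two pointers resolve to offsets 19 and 23, which are the length bytes in front of "Han" and "Solo". The sketch does no bounds checking, so it is only suitable for well-formed input like this example.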

What I don't get is the byte before the pointers - positions 8 and 13. According to the Java client, these are (oddly) encoded values of the Property id. In my case the "name" property has id 0 and the "surname" property has id 20. The encoding is this:

    zigzagEncode( (propertyId+1) * -1 )

So, going in reverse: 1 (the value at idx 8) is -1 when zigzag decoded, and working backwards:

(p + 1) * -1 = -1
p + 1 = -1 / -1 = 1
p = 1 - 1 = 0, the ID of property 'name'

and 41 (the value at idx 13):

zigzagDecode(41) = -21
(p + 1) * -1 = -21
p + 1 = -21 / -1 = 21
p = 21 - 1 = 20, the ID of property 'surname'
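The same arithmetic as a small Go sketch. The function names are illustrative only and are not taken from the Java client; the functions operate on an already-decoded varint value, which for the single-byte cases here is just the byte itself.

```go
package main

import "fmt"

// zigzagDecode maps 0, 1, 2, 3, ... to 0, -1, 1, -2, ...
func zigzagDecode(n uint64) int64 {
	return int64(n>>1) ^ -int64(n&1)
}

// zigzagEncode is the inverse of zigzagDecode.
func zigzagEncode(v int64) uint64 {
	return uint64((v << 1) ^ (v >> 63))
}

// encodePropertyID applies the rule described above:
// zigzagEncode((propertyId + 1) * -1).
func encodePropertyID(id int64) uint64 {
	return zigzagEncode((id + 1) * -1)
}

// decodePropertyID reverses encodePropertyID.
func decodePropertyID(raw uint64) int64 {
	return -zigzagDecode(raw) - 1
}

func main() {
	for _, raw := range []uint64{1, 41} {
		fmt.Printf("raw %d -> zigzag %d -> property id %d\n",
			raw, zigzagDecode(raw), decodePropertyID(raw))
	}
}
```

For the two header bytes in the example, 1 decodes to property id 0 and 41 decodes to property id 20.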

So three things:

1. the schemaless serialization documentation needs to be corrected
2. what is the logic for the way the property id is encoded? Why add 1 and then make it negative, rather than just typical zigzag encoding of the ID?
3. how do I obtain the ID of a property? what call do I make in the binary protocol to get that?

Thank you
-Michael

Emanuel

Feb 7, 2015, 9:05:18 PM

to orient-...@googlegroups.com

First of all, we will probably remove the schema-full serialization over the network soon.

Anyway, I'll explain inline :D

1. the schemaless serialization documentation needs to be corrected

Yes, I agree this part is missing from the docs; they are going to be updated soon.

2. what is the logic for the way the property id is encoded? Why add 1 and then make it negative, rather than just typical zigzag encoding of the ID?

The id is negative because it takes the place of the length of the field-name string: if the value is negative, it is the id of a property; if it is positive, a string of that length is serialized right after it.
The +1 is there because we cannot store an id with the value 0, which is used as the end-of-header marker, so we add 1 to make it possible to store the property with id 0 as well.
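Put together, a header reader following these rules might look like the sketch below. It is not the official implementation: `headerEntry` and `parseHeader` are hypothetical names, and two layout details are assumptions based on the byte dumps earlier in the thread and on the documented header layout, namely that a property-id entry is followed only by a 4-byte big-endian pointer, and that an inline field name is followed by the pointer and then a data-type byte.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerEntry describes one field found in the record header.
type headerEntry struct {
	Name       string // set when the field name is serialized inline
	PropertyID int    // global property id, or -1 for an inline name
	Pointer    int    // offset of the value within the record
	Type       byte   // data-type byte (only read for inline names here)
}

var errTruncated = errors.New("truncated header")

// parseHeader reads header entries starting at off and returns them together
// with the offset just past the end-of-header marker.
func parseHeader(rec []byte, off int) ([]headerEntry, int, error) {
	var entries []headerEntry
	for {
		v, n := binary.Varint(rec[off:]) // zigzag-decoded varint
		if n <= 0 {
			return nil, off, errTruncated
		}
		off += n
		switch {
		case v == 0: // end-of-header marker
			return entries, off, nil
		case v < 0: // negative: encoded property id, then pointer
			if off+4 > len(rec) {
				return nil, off, errTruncated
			}
			ptr := binary.BigEndian.Uint32(rec[off:])
			entries = append(entries, headerEntry{
				PropertyID: int(-v - 1), Pointer: int(ptr)})
			off += 4
		default: // positive: name length, name, pointer, type byte
			end := off + int(v)
			if end+5 > len(rec) {
				return nil, off, errTruncated
			}
			ptr := binary.BigEndian.Uint32(rec[end:])
			entries = append(entries, headerEntry{
				Name: string(rec[off:end]), PropertyID: -1,
				Pointer: int(ptr), Type: rec[end+4]})
			off = end + 5
		}
	}
}

func main() {
	// Person[name=Han, surname=Solo]; the header starts at offset 8.
	rec := []byte{0, 12, 80, 101, 114, 115, 111, 110,
		1, 0, 0, 0, 19, 41, 0, 0, 0, 23, 0,
		6, 72, 97, 110, 8, 83, 111, 108, 111}
	entries, next, err := parseHeader(rec, 8)
	fmt.Println(entries, next, err)
}
```

Run against the Han Solo record, this yields two entries (property id 0 with pointer 19, property id 20 with pointer 23) and stops at offset 19, where the data section begins. Mapping a property id back to a field name and type still requires the "globalProperties" list from the schema.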

3. how do I obtain the ID of a property? what call do I make in the binary protocol to get that?

The easiest way is to fetch the schema with "select from metadata:schema"; in the result there is a field called "globalProperties" that lists all the ids.

Hope I helped.

Bye,
Emanuel
