REQUEST_RECORD_LOAD - sequential client implementation not possible?

mindplay.dk

Dec 8, 2014, 11:19:07 AM
to orient-...@googlegroups.com
I'm trying to tackle REQUEST_RECORD_LOAD as the first useful function in my PHP client. (I have the basics like connect and open, error handling, etc. working so far.)

Since this is a PHP client, one of my major concerns is avoiding parsing with a state machine (as the old format required), because that is extremely inefficient in PHP. That's one reason I'm targeting OrientDB 2.0 and the new binary format exclusively, as it appears to make this possible (?)

Unfortunately, the response format of REQUEST_RECORD_LOAD itself appears to make that impossible.

[(payload-status:byte)[(record-content:bytes)(record-version:int)(record-type:byte)]*]+

In order to read "record-content" sequentially, I need to know the "record-type" in advance, so the order of this data appears to be wrong? I believe the layout of each payload chunk would need to be reversed, basically:

[(payload-status:byte)[(record-type:byte)(record-version:int)(record-content:bytes)]*]+

Otherwise, I am forced to load the whole record-content into memory first, before I can know how to interpret the data.
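The ordering problem can be made concrete with a minimal Python sketch (Python only for illustration) that reads the payload in the documented order and in the proposed order. It assumes, without confirmation from this thread, that integers are big-endian, that a `bytes` field is a 4-byte length followed by the data, and that a payload-status of 0 ends the list; `handlers` is a hypothetical mapping from record type to a streaming deserializer.

```python
import io
import struct

# Assumptions (not confirmed in this thread): big-endian integers, and a
# "bytes" field encoded as a 4-byte signed length followed by the data.


def read_int(stream) -> int:
    return struct.unpack(">i", stream.read(4))[0]


def read_byte(stream) -> int:
    return stream.read(1)[0]


def read_bytes(stream) -> bytes:
    length = read_int(stream)
    return stream.read(length) if length > 0 else b""


def read_records_current_order(stream):
    """Documented order: status, content, version, type (repeated).

    The whole record-content has to be buffered, because the record-type
    that says how to interpret it only arrives afterwards.
    """
    records = []
    while read_byte(stream) != 0:  # assumed: status 0 means "no more records"
        content = read_bytes(stream)  # buffered before the type is known
        version = read_int(stream)
        record_type = read_byte(stream)
        records.append((record_type, version, content))
    return records


def read_records_proposed_order(stream, handlers):
    """Proposed order: status, type, version, content (repeated).

    The type arrives first, so a type-specific handler can consume the
    content directly from the stream instead of from a buffered copy.
    """
    records = []
    while read_byte(stream) != 0:
        record_type = read_byte(stream)
        version = read_int(stream)
        length = read_int(stream)
        records.append((record_type, version, handlers[record_type](stream, length)))
    return records


# Usage: one document record ('d'), version 7, content b"abc".
payload = (bytes([1]) + (3).to_bytes(4, "big") + b"abc"
           + (7).to_bytes(4, "big") + b"d" + bytes([0]))
print(read_records_current_order(io.BytesIO(payload)))  # [(100, 7, b'abc')]
```

With the documented order, `read_records_current_order` has no choice but to hold each `record-content` in memory; with the proposed order, the handler can consume the content straight from the socket.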

Or am I missing something here?

Also, it appears the "record-content" is in the old CSV format, regardless of my having selected the new binary serialization format? Does the REQUEST_RECORD_LOAD command not support the new binary serialization format? Is it not supported everywhere yet?
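One quick way to check which serializer the server actually used is to inspect the first byte of `record-content`. This Python sketch assumes, based on OrientDB's binary record format rather than anything stated in this thread, that binary-serialized records begin with a serialization version byte of 0, while CSV records are plain text such as `ClassName@field:value`. It is a debugging heuristic to verify against real server output, not a format negotiation.

```python
def guess_record_format(content: bytes) -> str:
    """Heuristically classify a record-content payload.

    Assumption: binary-serialized records start with a version byte of 0,
    while CSV-serialized records are printable UTF-8 text.
    """
    if not content:
        return "empty"
    if content[0] == 0:
        return "binary"
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "csv" if text.isprintable() else "unknown"


print(guess_record_format(b'OUser@name:"admin"'))  # csv
print(guess_record_format(b"\x00\x0aOUser"))       # binary
```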

I really do not want a client that has to load and then parse in two stages - this adds considerable complexity and run-time overhead, and it duplicates everything in memory while loading. Am I doing something wrong, or missing something obvious?

mindplay.dk

Dec 9, 2014, 5:38:07 AM
to orient-...@googlegroups.com
Is there a different group for developers with more technical questions?

I want to help bring OrientDB to PHP - is this the right place for that? Or is nobody interested?

Curtis Mosters

Dec 9, 2014, 6:36:40 AM
to orient-...@googlegroups.com
Well, there is no other Google Group. But why not use the existing PHP OrientDB projects on GitHub?

https://github.com/AntonTerekhov/OrientDB-PHP
https://github.com/doctrine/orientdb-odm
https://packagist.org/packages/orientdb-php/orientdb-php

I don't know, but it would be way better to do it there. WDYT?

Rasmus Schultz

Dec 9, 2014, 6:48:57 AM
to orient-...@googlegroups.com
Doctrine is the only one of those projects that still has any traction, and it's a full-scale data mapper; what we need is a simple driver/client.

We are of course referencing those projects for lots of implementation details, but we're shooting for something much simpler and more low-level, something people can use to build their own mappers/DAO/AR implementations on top of.

We're also designing the whole thing using very basic OOP patterns (no traits) in the hopes of porting this to a native extension (e.g. Zephir) eventually.

We're also designing the whole thing with zero dependencies on other libraries.

So we have somewhat different objectives from the other projects, and more of a minimalist mindset, I think.


--

---
You received this message because you are subscribed to a topic in the Google Groups "OrientDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/orient-database/9CKEun_WrrA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Emanuel

Dec 9, 2014, 11:02:35 AM
to orient-...@googlegroups.com
We also have other PHP drivers, such as https://github.com/orientechnologies/php-orientdb, which already has a few forks (for example, https://github.com/Ostico/PhpOrient).

I would say it's better to have fewer, more up-to-date drivers, and I warn you: writing a driver from scratch is not as easy as it seems :)

Anyway, you are free to do so ;)

One big step is actually implementing the serialization/deserialization of documents correctly for the binary serialization format. That is quite complex, and it may also be subject to evolution/optimization in the not-too-distant future.

Here at Orient we are evaluating an easier way to read/write documents over the binary protocol, but I will open another thread about this :)

Bye

Emanuel

Rasmus Schultz

Dec 9, 2014, 11:15:13 AM
to orient-...@googlegroups.com
We are well aware that this is no small undertaking, but we believe in OrientDB and we think it's worthwhile.

> one big step is actually implement the serialize/deserialize  of hte document correctly from the binary serialization

To my knowledge, that hasn't been done in PHP yet, by anyone? All existing implementations, including the fork by Ostico, support only the CSV-style serialization. The binary serialization format ought to be a lot easier to implement, as it doesn't require a state machine/parser like the CSV format does. It should also be more CPU-friendly and memory-efficient, with less bandwidth overhead, so we're targeting it exclusively.

We're also targeting the most recent protocol, which already differs substantially from what we were able to reference in existing implementations, since those are based on older versions of the protocol. We hope to support the final version of the protocol when OrientDB 2.0 is released - we don't want this client library to support only a legacy protocol from day one.

As I said, though, REQUEST_RECORD_LOAD doesn't appear to respect the serializer setting - it seems to always return records in the CSV format. If this is a missing feature or a server-side issue, we won't get very far with our client anytime soon... Either way, we need someone who can at least answer the question and help set us on the right path.

At this moment, we are stalled, since we don't even know if the server is behaving correctly, or whether we need to support the CSV format or not.

GoorMoon

Dec 10, 2014, 7:09:42 AM
to
Hey,
I don't know about the PHP driver, but I contribute to the .NET driver https://github.com/orientechnologies/OrientDB-NET.binary
I implemented a lot of the binary serializer's features, and may be able to help with your question.
With RECORD_LOAD I don't have any problem; I get documents in the binary format.

You're welcome to chat with me on Gitter: https://gitter.im/GoorMoon/OrientDB-NET.binary

Rasmus Schultz

Dec 11, 2014, 3:55:46 AM
to orient-...@googlegroups.com

This looks great, thanks! So much simpler than the CSV serializer.

I see that you still have to buffer the record in memory, as I suspected. I really wish they would make the change I suggested earlier, putting the record type before the actual record content; then I think you wouldn't need to buffer records in memory before deserializing them. Thoughts?

GoorMoon

Dec 11, 2014, 1:10:51 PM
to orient-...@googlegroups.com
I agree with you, but it's not our decision.
I suggest you open an issue at https://github.com/orientechnologies/orientdb describing your proposal.

Rasmus Schultz

Dec 11, 2014, 1:15:58 PM
to orient-...@googlegroups.com
Of course, I just wanted to see if someone with more experience with Orient agreed with me, before opening an issue.

Thanks :-)

Rasmus Schultz

Dec 11, 2014, 1:32:01 PM
to orient-...@googlegroups.com

Rasmus Schultz

Dec 11, 2014, 4:03:08 PM
to orient-...@googlegroups.com
That was a fast decision - very happy to see them reacting to this issue so quickly and slating it for the 2.0 release! :-)

So I'm referencing a bunch of your code for my client now, and I'm hung up on a small issue; maybe you can point me in the right direction...

The so-called "varint" type in the binary serialization - I understand it's a variable-size integer, encoded like UTF-8 character codes. Where or how do you handle this in your implementation?

Rasmus Schultz

Dec 11, 2014, 4:41:05 PM
to orient-...@googlegroups.com
Never mind, spotted it :-)

GoorMoon

Dec 11, 2014, 5:29:14 PM
to orient-...@googlegroups.com
Glad to hear !!!

Luca Garulli

Dec 12, 2014, 5:46:16 AM
to orient-database
Driver authors get that kind of high priority on their requests :-)

Lvc@

Rasmus Schultz

Dec 12, 2014, 6:44:01 AM
to orient-...@googlegroups.com
Glad to hear that, thanks :-)

So, on a related note: what specification, precisely, does the "varint" type used in the OrientDB binary protocol follow? Apparently there are lots of ways to encode a variable-size integer.

I sort of wish there was an option for the client to disable variable-length integers in the protocol, instead encoding them with a fixed size.

I can implement UTF-8 style reading/writing of variable-size integers in PHP, but it's going to add considerable CPU overhead. For a PHP client (and probably for other scripting languages too), a small amount of bandwidth overhead is likely preferable to CPU overhead. What we want is a fast client; using a little more bandwidth is probably secondary, as is, for most projects, the ability to support more than 2 billion records.

Just putting that out there :-)
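To put numbers on that trade-off, here is a small Python sketch comparing the encoded size of a value as a varint with a fixed 8-byte integer. It assumes the varint is the zigzag plus 7-bits-per-byte scheme (an assumption at this point in the discussion), and the fixed-size option is hypothetical, since no such protocol flag is described here.

```python
import struct

MASK64 = (1 << 64) - 1


def varint_size(n: int) -> int:
    """Bytes needed for n as a zigzag varint, 7 payload bits per byte (assumed scheme)."""
    u = ((n << 1) ^ (n >> 63)) & MASK64  # zigzag: small magnitudes -> small numbers
    size = 1
    while u >= 0x80:
        u >>= 7
        size += 1
    return size


def fixed_size(n: int) -> int:
    """Bytes needed for n as a fixed-width big-endian signed 64-bit integer."""
    return len(struct.pack(">q", n))


for n in (0, 63, 64, 2**31 - 1, -2**63):
    print(n, varint_size(n), fixed_size(n))
```

Under this assumption, values from -64 to 63 fit in a single byte and a full-range 64-bit value needs 10 bytes. The CPU cost the thread worries about comes from the per-byte loop, which a fixed-width encoding would replace with one unpack call.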

But for the time being, can you point me to a specification or (better) a reference implementation (in any language) of the VLI encoding used by OrientDB?

I can reference the one in GoorMoon's .NET driver, or the one on Wikipedia, but neither of them appears to have tests, and I'm unsure how to test them. Go has a nice implementation with tests, but since there are so many kinds of VLI encoding, which don't appear to have any official names or standardization, I can't be sure it's the same one...
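For reference, OrientDB's binary record format is documented (as far as can be told from outside this thread) as using the Protocol Buffers style of varint: the signed value is zigzag-encoded, then written 7 bits per byte, least significant group first, with the high bit of each byte set when more bytes follow. That differs from UTF-8, where the leading byte announces the length. The Python sketch below assumes that scheme; it is not taken from the OrientDB sources, so it is worth checking against bytes captured from a real server. The round-trip loop doubles as a simple test.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(n: int) -> int:
    """Map signed 64-bit ints to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(u: int) -> int:
    return (u >> 1) ^ -(u & 1)


def write_varint(n: int) -> bytes:
    value = zigzag_encode(n)
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)  # last group, continuation bit clear
    return bytes(out)


def read_varint(data: bytes, pos: int = 0):
    """Return (value, position just after the varint)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return zigzag_decode(result), pos
        shift += 7
        if shift > 63:
            raise ValueError("varint longer than 10 bytes")


# Round-trip checks, including the 64-bit extremes.
for n in (0, -1, 1, 63, -64, 64, 300, 2**31 - 1, -2**31, 2**63 - 1, -2**63):
    encoded = write_varint(n)
    assert read_varint(encoded) == (n, len(encoded)), n
```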

GoorMoon

Dec 12, 2014, 6:49:06 AM
to orient-...@googlegroups.com

Rasmus Schultz

Dec 12, 2014, 6:53:42 AM
to orient-...@googlegroups.com
I know, but that's hardly a specification - not enough to reference for an implementation.

For now, I will try to port your implementation...

Rasmus Schultz

Dec 12, 2014, 9:56:57 AM
to orient-...@googlegroups.com
Yikes, I blew my entire day on this.

I can't find a PHP implementation, and I can't get a port to work, because PHP has only one integer type, and it's signed.

What's worse, it's platform-dependent and could be either 32-bit or 64-bit.

I'm afraid we're at a dead end with this client, unless somebody else can figure out how to read/write variable-size integers, or unless OrientDB offers a protocol option for clients in languages that don't have proper support for native numerical types... man, PHP stinks :-(

GoorMoon

Dec 12, 2014, 11:51:35 AM
to orient-...@googlegroups.com
What if you use a byte array to represent and manipulate variable-size integers?
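A minimal Python sketch of the byte-array idea, assuming the same zigzag varint scheme discussed above: it decodes into eight big-endian two's-complement bytes, and every intermediate value stays below 2**16, so the same steps would not overflow a 32-bit signed integer. In PHP the eight bytes could be kept as a binary string. The function name is hypothetical, and Python is used only for illustration.

```python
def varint_to_int64_bytes(data: bytes, pos: int = 0):
    """Decode a zigzag varint into 8 big-endian two's-complement bytes.

    Returns (eight_bytes, position after the varint). Only small integers
    (< 2**16) are ever computed, mimicking byte-level arithmetic.
    """
    raw = [0] * 8  # unsigned value, least significant byte first
    offset = 0     # bit offset of the next 7-bit group
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        index, shift = divmod(offset, 8)
        if index >= 8:
            raise ValueError("varint longer than 10 bytes")
        byte = data[pos]
        pos += 1
        group = byte & 0x7F
        raw[index] |= (group << shift) & 0xFF      # bits that land in this byte
        if index + 1 < 8:
            raw[index + 1] |= group >> (8 - shift)  # bits that spill into the next
        offset += 7
        if not byte & 0x80:
            break

    # Zigzag decode, value = (u >> 1) XOR -(u & 1), done byte by byte.
    negative = raw[0] & 1
    result = []
    for i in range(8):
        carry = (raw[i + 1] & 1) << 7 if i + 1 < 8 else 0
        b = (raw[i] >> 1) | carry
        result.append(b ^ 0xFF if negative else b)
    return bytes(reversed(result)), pos


# Usage: b"\x80\x01" encodes 64; int.from_bytes only displays the result.
eight, end = varint_to_int64_bytes(b"\x80\x01")
print(eight.hex(), end, int.from_bytes(eight, "big", signed=True))  # 0000000000000040 2 64
```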

Rasmus Schultz

Dec 12, 2014, 1:12:49 PM
to orient-...@googlegroups.com
There is no byte type in PHP, hahaha... I know, right?! :-)

There is only one integer type in PHP, and it's always signed, and always 32-bit or 64-bit depending on your hardware and OS.

So of course you can juggle values within byte-range, but you can't do unsigned bit arithmetic because there is no unsigned integer type, so it's going to be clunky.

And it's going to be horrendously slow since every byte you're working with is actually 32 or 64 bits, and every arithmetic or bitwise operation is actually a full word or long-word operation.
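The usual workaround for the missing unsigned type is to emulate a logical (unsigned) right shift on a signed integer with an arithmetic shift plus a mask. The Python sketch below simulates a 64-bit signed integer to show the trick; the mask is built from the largest positive value (`PHP_INT_MAX` would play that role in PHP) so that no intermediate result leaves the signed range. It is an illustration under those assumptions, not code from any existing driver.

```python
BITS = 64
INT_MAX = (1 << (BITS - 1)) - 1  # plays the role of PHP_INT_MAX on a 64-bit build


def to_signed(value: int) -> int:
    """Wrap an arbitrary Python int into the signed 64-bit range."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value > INT_MAX else value


def logical_shift_right(value: int, n: int) -> int:
    """Emulate an unsigned right shift ('>>>') of a signed 64-bit value by n bits.

    An arithmetic shift copies the sign bit into the vacated high bits;
    masking with INT_MAX >> (n - 1) clears exactly those n bits.
    """
    if not 0 <= n < BITS:
        raise ValueError("shift out of range")
    if n == 0:
        return value
    return (value >> n) & (INT_MAX >> (n - 1))


print(logical_shift_right(-1, 60))  # 15: the four highest bits of an all-ones word
```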

All in all, writing it in PHP probably isn't a good choice to begin with, as PHP just isn't suitable for this kind of low-level stuff.

My colleague suggests writing a PHP module that wraps the C API, but that's not a great option either - proliferation is a really big concern for me; I'd like to have something you can deploy in most hosting environments without building a custom extension.

I really hope to see OrientDB catch on and become a real alternative to MySQL, but step one is a working client. I'm not even a fan of PHP particularly, but it is the leading web platform, and it's what I do for a living, so... :-)

Where in the OrientDB codebase are variable-size integers implemented?

Or is it a native Java feature? If so, it must be documented somewhere?

This is a huge roadblock over something that should be trivial, and it's keeping me from getting to the actual protocol and client. The documentation really needs to include a proper description and/or links and/or a reference implementation, preferably all of those... I am completely in the dark here.

GoorMoon

Dec 12, 2014, 2:26:17 PM
to orient-...@googlegroups.com

Rasmus Schultz

Dec 13, 2014, 7:10:46 AM
to orient-...@googlegroups.com
Looks like your implementation is a source to source port?

So I'm none the wiser.

This may be possible in PHP, but it's going to be based on horrible work-arounds, it will be slow, and it will have some ugly limitations.

Really sad to get this far and have to drop the whole thing because of such a small trivial technical thing, but this is obviously not a good fit for PHP, and with the limitations this will have, it doesn't seem like it's going to be worth the effort.

Now thinking about writing a binary driver in Zephir. But I really don't want a PHP module for something that is without a doubt going to need ongoing maintenance and continuous upgrades.

The only other option is the REST API, which might be a more realistic choice for PHP. It was actually my first choice, because PHP isn't really suitable for a binary driver; I guess I hadn't yet realized just how unsuitable it is...

However, I gave up on the REST API earlier, because it turns out it denormalizes links:


Should I file this as a bug report? I guess it might be "by design", but it seems like a strange choice - why wouldn't it work just like the binary API, just with data structures in JSON of course, but without substantial differences in the data/structures you get in a response? And certainly without duplicating data structures, which adds unnecessary encoding overhead on the server, network overhead, and decoding overhead on the client...

mindplay.dk

Dec 17, 2014, 4:04:55 AM
to orient-...@googlegroups.com
I finally have a working implementation, with some pretty ugly (platform-dependent) limitations. It's good enough for now, but I decided to open a feature request for an option to encode variable-size integers in a more CPU-friendly way.

