GenericRecord/Portable usage/availability


Pranas Baliuka

Aug 26, 2020, 7:23:54 PM
to Hazelcast
Hello,

When might it be released?

Do I understand correctly that the GenericRecord API allows accessing Portable data and performs lazy deserialization? Does it work on the client side?

Is a linear scan performed on each record in order to find a field? I.e., can Portable/GenericRecord work with 1K+ fields?

I'd like to be able to access some of the fields, but not necessarily all of them (as the client-side Portable implementation requires), while still benefiting from server-side lazy-parsing optimisations.

FlatBuffers is quite a nice compact format for strict-schema objects, but deserialization would require copying memory for every touched entity (e.g. ArraySerializable) and would create a vast amount of garbage objects on the server side.

Can someone share their experience in using wide-column data structures on Hazelcast/Jet?

Thanks for sharing your experience!

M. Sancar Koyunlu

Aug 27, 2020, 3:08:58 AM
to Hazelcast
Hi, 
The main objective of GenericRecord is to allow access to the data even if the related classes are not on the classpath. You can see the draft documentation here:
https://github.com/hazelcast/hazelcast-reference-manual/issues/859

As you expected, the GenericRecord interface was indeed planned to provide lazy deserialization, but we could not do it for Portable.
This is mostly because Portable's format does not allow copying the buffer into another buffer without altering its content (the offsets need to be rearranged because they are not relative). The problem mainly arises when someone wants to put a GenericRecord as a field into another GenericRecord. That is why we decided not to do it for Portable, and also because we plan a new format to replace the current one. It is a work in progress right now. The idea is to keep the GenericRecord API. The new format will be designed to give proper lazy deserialization along with other benefits. Stay tuned.
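
To illustrate the offset problem described above, here is a minimal sketch. It uses a hypothetical toy layout (`[int offsetOfValue][int value]`), not Portable's actual wire format; the class `OffsetDemo` and its methods are made up for this example. It only shows why a record whose offsets are absolute cannot be copied verbatim into an enclosing buffer, while one with record-relative offsets can.

```java
import java.nio.ByteBuffer;

/** Toy layout for illustration only (an assumption, not Portable's real format). */
final class OffsetDemo {

    /** Builds an 8-byte record: [int storedOffset][int value]. */
    static byte[] record(int value, int storedOffset) {
        return ByteBuffer.allocate(8).putInt(storedOffset).putInt(value).array();
    }

    /** Interprets the stored offset as a position within the whole buffer. */
    static int readAbsolute(ByteBuffer buf, int recordStart) {
        int offset = buf.getInt(recordStart);
        return buf.getInt(offset);
    }

    /** Interprets the stored offset as a position relative to the record start. */
    static int readRelative(ByteBuffer buf, int recordStart) {
        int offset = buf.getInt(recordStart);
        return buf.getInt(recordStart + offset);
    }

    /** Copies the record bytes unchanged into a larger buffer, after `prefix` bytes. */
    static ByteBuffer embed(byte[] rec, int prefix) {
        ByteBuffer outer = ByteBuffer.allocate(prefix + rec.length);
        outer.position(prefix);
        outer.put(rec);
        return outer;
    }
}
```

With a record built as `OffsetDemo.record(42, 4)`, both readers work while the record sits at position 0. After `embed(rec, 8)`, only `readRelative(outer, 8)` still finds the value; the absolute reader looks at position 4 of the outer buffer, so the stored offset would have to be rewritten on every copy.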




Pranas Baliuka

Aug 27, 2020, 3:31:05 AM
to haze...@googlegroups.com
Thanks for the heads up about the upcoming feature.
Are you still planning to keep meta-data of each field in UTF-16 form e.g. "longLabel*2forUTF-16":true (1 byte as in Externalizable) * 64 boolean fields? 

Or do you plan to keep the class definition with offsets and payload data only?
FlatBuffers can be a good inspiration for a compact and efficient representation, and Avro for schema migration. You already have the class definition (meta-data that allows good indexing) and versioning (Portable uses a different class definition for each version), i.e. you can provide very compact and efficient storage and retrieval.

Also, you may consider Erlang/ASN.1 BER style of packing booleans and/or enums (nice to have),
and writing not only UTF-16 but also Latin-1/ASCII text fields; see e.g. FlatBuffers.

Thanks again, and looking forward to getting rid of the current Portable at some point.

M. Sancar Koyunlu

Aug 27, 2020, 4:18:11 AM
to Hazelcast
> Are you still planning to keep meta-data of each field in UTF-16 form e.g. "longLabel*2forUTF-16":true (1 byte as in Externalizable) * 64 boolean fields? 
This is one of the other benefits. The metadata will be separated from the content. This will allow for better latency and more efficient memory usage.
We have researched FlatBuffers, Cap'n Proto, Protobuf, SBE, Avro and many others. Of course, it does not mean we will provide all the features. It will be a trade-off decision as usual. The design is still shaping up.
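
As a rough picture of what "metadata separated from the content" can mean, here is a minimal sketch. It is not the planned Hazelcast format (which was still being designed at the time of this thread); the class `SharedSchema` and its methods are hypothetical. Field names and offsets live once in a shared schema object, each record is payload bytes only, and a field is found by a hash lookup rather than a scan over per-record names.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical schema holding field name -> fixed offset, shared by all records. */
final class SharedSchema {
    private final Map<String, Integer> offsets = new LinkedHashMap<>();
    private int size;

    /** Registers a fixed-width long field at the next free offset. */
    SharedSchema addLong(String name) {
        offsets.put(name, size);
        size += Long.BYTES;
        return this;
    }

    /** Number of payload bytes per record; no field names are stored in it. */
    int payloadSize() {
        return size;
    }

    ByteBuffer newPayload() {
        return ByteBuffer.allocate(size);
    }

    void writeLong(ByteBuffer payload, String field, long value) {
        payload.putLong(offsetOf(field), value);
    }

    /** Reads one field directly at its offset, without touching other fields. */
    long readLong(ByteBuffer payload, String field) {
        return payload.getLong(offsetOf(field));
    }

    private int offsetOf(String field) {
        Integer offset = offsets.get(field);
        if (offset == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        return offset;
    }
}
```

In this sketch a record with two long fields occupies 16 bytes regardless of how long the field labels are, and reading one field costs a map lookup plus a single absolute `getLong`.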


> Also may consider Erlang/ASN.1 BER style of packing booleans and/or enums (nice to have).

I have just checked BER style boolean and enum from https://docs.oracle.com/cd/E19476-01/821-0510/def-basic-encoding-rules.html#:~:text=The%20Basic%20Encoding%20Rules%20(BER,underlying%20mechanism%20for%20encoding%20message.
Is that what you are referring to?

BOOLEAN
The value of this element is always a single byte. If all the bits in that byte are set to zero (0x00), then the value is FALSE. If one or more of the bytes is set to one, then the value is TRUE. As a result, there are 255 different ways to encode a BOOLEAN value of TRUE, but in practice it is generally encoded as 0xFF (that is, all the bits are set to one).
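
For reference, the BOOLEAN rule quoted above can be sketched as follows. This assumes the standard BER tag-length-value framing (universal tag 0x01 for BOOLEAN, length 0x01, one content byte); the class name `BerBoolean` is made up for this example.

```java
/** Minimal BER BOOLEAN encoder/decoder sketch (tag, length, one content byte). */
final class BerBoolean {
    static final byte TAG_BOOLEAN = 0x01;

    /** Encodes TRUE as 0xFF, the conventional choice, and FALSE as 0x00. */
    static byte[] encode(boolean value) {
        byte content = (byte) (value ? 0xFF : 0x00);
        return new byte[] {TAG_BOOLEAN, 0x01, content};
    }

    /** Any non-zero content byte decodes as TRUE, as the quoted text describes. */
    static boolean decode(byte[] tlv) {
        if (tlv.length != 3 || tlv[0] != TAG_BOOLEAN || tlv[1] != 0x01) {
            throw new IllegalArgumentException("Not a BER BOOLEAN");
        }
        return tlv[2] != 0;
    }
}
```

Note that one boolean takes three bytes on the wire in this encoding, so it is not a compact way to store many flags.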


> Writing not only UTF-16 but also Latin-1/ASCII text fields see e.g. FlatBuffers.
We have not considered support for Latin-1/ASCII yet. 
The FlatBuffers docs say their strings are UTF-8. From https://google.github.io/flatbuffers/flatbuffers_internals.html :
Strings (TYPE_STRING) are similar to blobs, except they have an additional 0 termination byte for convenience, and they MUST be UTF-8 encoded (since an accessor in a language that does not support pointers to UTF-8 data may have to convert them to a native string type).
I wonder if it is exposed as a different type. I could not find any documentation about it.

And thanks for the feedback. 

Pranas Baliuka

Aug 27, 2020, 7:06:15 PM
to haze...@googlegroups.com
Thanks, not duplicating meta-data is a huge improvement. Thanks for considering!

Bit syntax in Erlang (a bit ancient, but the idea is clear): https://erlang.org/doc/programming_examples/bit_syntax.html
ASN.1 (again a bit ancient, and an ITU standard) intro: http://luca.ntop.org/Teaching/Appunti/asn1.html (see the BIT STRING data type)

It would be nice to be able to pack some values into a word, e.g. 64 bits into a single long (underlying type int/long).
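
A minimal sketch of that idea, with 64 boolean flags stored in one `long`. The class `BitFlags` is hypothetical and not part of any Hazelcast API:

```java
/** Packs up to 64 boolean flags into a single long word. */
final class BitFlags {

    /** Returns a copy of `word` with the flag at `index` set or cleared. */
    static long set(long word, int index, boolean value) {
        checkIndex(index);
        long mask = 1L << index;
        return value ? (word | mask) : (word & ~mask);
    }

    /** Returns the flag stored at `index`. */
    static boolean get(long word, int index) {
        checkIndex(index);
        return (word & (1L << index)) != 0;
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= Long.SIZE) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Stored this way, 64 booleans take 8 bytes of payload instead of 64 one-byte values plus per-field metadata.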

For a bit more advanced stuff, you can check the open source project HyPer (https://hyper-db.de/index.html#summary) and its storage scheme (http://db.in.tum.de/downloads/publications/datablocks.pdf), matching/beating both VoltDB (OLTP) and MonetDB (time series) in benchmarks.

Pranas Baliuka

Aug 27, 2020, 10:08:29 PM
to Hazelcast
> BOOLEAN The value of this element is always a single byte. If all the bits in that byte are set to zero (0x00), then the value is FALSE. If one or more of the bytes is set to one, then the value is TRUE. As a result, there are 255 different ways to encode a BOOLEAN value of TRUE, but in practice it is generally encoded as 0xFF (that is, all the bits are set to one).

Yes, it's not a very good example. I guess the IP packet definitions in ASN.1 and Erlang are better examples of meta-data syntax. Some inspiration for syntax from both ASN.1 and Erlang: http://erlang.org/documentation/doc-5.2/pdf/asn1-1.4.pdf

> Strings (TYPE_STRING) are similar to blobs, except they have an additional 0 termination byte for convenience, and they MUST be UTF-8 encoded (since an accessor in a language that does not support pointers to UTF-8 data may have to convert them to a native string type). I wonder if it is exposed as a different type. Could not find any doc about it.

The UTF-8 encoding/decoding overhead is still paid in FlatBuffers (designed to support junior gaming developers), but the implementation is otherwise quite tuned and impressive.
I'd strongly suggest using separate types (if possible) so as not to pay the encoding/decoding penalty (branch misprediction), especially since such serialization options would be used by advanced users, i.e. they would understand the difference.
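
A small sketch of what separate text types could look like, using only the JDK's standard charsets. The class `TextFieldCodec` and its method names are hypothetical; the point is that a Latin-1 field is a fixed one byte per character, while a UTF-16 field is two bytes per code unit, and the caller picks the type up front.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical codec with separate Latin-1 and UTF-16 text field types. */
final class TextFieldCodec {

    /** One byte per character; rejects anything outside U+0000..U+00FF. */
    static byte[] encodeLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("Not a Latin-1 string: " + s);
            }
        }
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Two bytes per UTF-16 code unit, big-endian, no byte-order mark. */
    static byte[] encodeUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }
}
```

For the 9-character label `longLabel` this gives 9 bytes as Latin-1 versus 18 bytes as UTF-16.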

For decent server-side storage you may get inspiration from HyPer (not trivial, but a bleeding-edge commercial and open source solution).
The whole design is described and the sources (not in Java) are available, and it is applicable in the JVM world starting from Java 9 (better SIMD support).
The stuff with swizzled pointers and super-fast Bloom filters can be applied in EE/off-heap ;-)

M. Sancar Koyunlu

Sep 1, 2020, 3:51:01 AM
to Hazelcast
Thanks for the advice. I will definitely check them out. 
