You can create an index file with vanilla Chronicle; however, indexed Chronicle stores only 4.5 bytes on average in addition to each message, so there is not a lot of redundancy.
--
You received this message because you are subscribed to the Google Groups "Chronicle" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-chronicl...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Btw, all of that is in the index file. The data file contains only the bytes you wrote.
The vanilla Chronicle file stores the length; however, as it writes to multiple files, you can't determine the original order. I.e. you could make an index file if you had only one writer.
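To illustrate the single-writer case, here is a minimal sketch of rebuilding an index from length-prefixed records. The layout (a 4-byte length followed by the payload) and the class name `LengthPrefixedIndexSketch` are assumptions made for the example; this is not Chronicle's actual file format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class LengthPrefixedIndexSketch {
    // Appends one record as [int length][payload bytes].
    // This layout is assumed for illustration only.
    static void append(ByteBuffer data, byte[] payload) {
        data.putInt(payload.length);
        data.put(payload);
    }

    // Scans the first `limit` bytes and returns the start offset of each
    // record. With a single writer, scan order is also write order.
    static List<Integer> buildIndex(ByteBuffer data, int limit) {
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos + Integer.BYTES <= limit) {
            int length = data.getInt(pos); // absolute read, position unchanged
            if (length < 0 || pos + Integer.BYTES + length > limit) {
                break; // truncated or corrupt tail, stop indexing here
            }
            offsets.add(pos);
            pos += Integer.BYTES + length;
        }
        return offsets;
    }

    // Reads the payload of the record that starts at the given offset.
    static byte[] read(ByteBuffer data, int offset) {
        int length = data.getInt(offset);
        byte[] payload = new byte[length];
        ByteBuffer view = data.duplicate(); // leave the caller's position alone
        view.position(offset + Integer.BYTES);
        view.get(payload);
        return payload;
    }
}
```

With several writers appending to separate files, each file can be scanned this way, but nothing in the length prefixes tells you how the records interleave across files.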
It can, to a limited degree. It is designed to be simpler and faster. In extreme cases you can turn off bounds checking and it just copies fields or raw data, which is unusual in Java.
It also supports reusing mutable objects in cases where you need to deserialize data. This is needed where variable length/optional fields are used to make the data more compact.
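As a rough sketch of what reusing a mutable object looks like, the example below deserializes many records into one instance instead of allocating a new object per message. The `Quote` type, its fields and the presence flag for the optional field are all made up for the illustration, and it uses a plain `ByteBuffer` rather than Chronicle's own API.

```java
import java.nio.ByteBuffer;

final class ReusableQuoteSketch {
    // Hypothetical mutable message with one optional field (ask).
    static final class Quote {
        long timestamp;
        double bid;
        double ask;
        boolean hasAsk;

        // Writes the optional field only when present, so the record is
        // variable length: 17 bytes without ask, 25 bytes with it.
        void writeTo(ByteBuffer out) {
            out.putLong(timestamp);
            out.putDouble(bid);
            out.put((byte) (hasAsk ? 1 : 0));
            if (hasAsk) {
                out.putDouble(ask);
            }
        }

        // Overwrites every field of this instance from the buffer.
        void readFrom(ByteBuffer in) {
            timestamp = in.getLong();
            bid = in.getDouble();
            hasAsk = in.get() != 0;
            ask = hasAsk ? in.getDouble() : Double.NaN;
        }
    }

    // Reads `count` quotes into the same object and sums the bids,
    // without allocating per message.
    static double sumBids(ByteBuffer in, int count, Quote reused) {
        double total = 0;
        for (int i = 0; i < count; i++) {
            reused.readFrom(in);
            total += reused.bid;
        }
        return total;
    }
}
```

Because the optional field changes the record length, you can't jump to a field by a fixed offset; the data has to be decoded, and reusing the target object keeps that decoding cheap.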
This is important as our data replication and data storage solutions are often limited by the raw bandwidth of the network and/or disk subsystem.
Fixed length messages work best when a) the data fits in memory, b) there is a high read to write ratio, c) there aren't simple opportunities to compact the data.
Our biggest use case, market data, fails on all counts, so we need an efficient means of serializing/deserializing the data.
However, I am working on a higher level abstraction where the same serialization / deserialization code could handle different wire formats, including:
log messages (text & binary),
XML,
JSON & BSON,
YAML & a binary form of YAML,
FIX,
a schema extendable raw binary format,
a variable length raw format and finally a fixed length raw format, e.g. a struct.
The idea being you provide the data, the serialization style and the library does the rest.
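A minimal sketch of that idea, under the assumption that the abstraction is an interface the message writes its fields to. The names `WireOut`, `TextWire`, `RawWire` and `writeOrder` are hypothetical and only show the shape; they are not an existing API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WireSketch {
    // Hypothetical abstraction: the message only names its fields and values.
    interface WireOut {
        void write(String name, long value);
        void write(String name, String value);
    }

    // Human readable style: one "name: value" line per field.
    static final class TextWire implements WireOut {
        final StringBuilder out = new StringBuilder();

        public void write(String name, long value) {
            out.append(name).append(": ").append(value).append('\n');
        }

        public void write(String name, String value) {
            out.append(name).append(": ").append(value).append('\n');
        }
    }

    // Terse raw style: field names are dropped, only the values are written.
    static final class RawWire implements WireOut {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        private final DataOutputStream out = new DataOutputStream(bytes);

        public void write(String name, long value) {
            try {
                out.writeLong(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public void write(String name, String value) {
            try {
                out.writeUTF(value); // 2-byte length followed by the encoded text
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // The same serialization code works for every wire format.
    static void writeOrder(WireOut wire, String symbol, long quantity) {
        wire.write("symbol", symbol);
        wire.write("quantity", quantity);
    }
}
```

Calling `writeOrder` with a `TextWire` gives something you can read and diff in a test, while the same call with a `RawWire` gives a compact form for production use.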
For formats with field ids you can have a name (string) or a number (stop bit encoded).
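For anyone unfamiliar with stop bit encoding, here is a minimal sketch of one common variant: 7 bits of the number per byte, lowest bits first, with the high bit set on every byte except the last. The class name `StopBitSketch` is made up, it only handles non-negative values, and the exact byte layout used by the library may differ.

```java
import java.nio.ByteBuffer;

final class StopBitSketch {
    // Writes a non-negative value using 7 data bits per byte.
    // The high bit is set on every byte except the final one.
    static void write(ByteBuffer out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        while ((value & ~0x7FL) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;
        }
        out.put((byte) value); // last byte, high bit clear
    }

    // Reads a value written by write(), stopping at the first byte
    // whose high bit is clear.
    static long read(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        byte b;
        while ((b = in.get()) < 0) { // negative byte means high bit set
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result | ((long) b << shift);
    }
}
```

Small field numbers (below 128) take a single byte, which is why this suits field ids better than a fixed 4-byte int.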
Values can be compressed: simple compression for short fields, and Snappy or LZW compression for long fields.
I would like to support not only zero copy but a self-describing zero copy format, i.e. you can still read it after schema changes, but if there are no schema changes you can read it straight.
It should support field name based indexing (subscriber driven) and Twitter style indexing (publisher driven).
For human readability it will support redundant information such as comments and formatting hints.
The idea is that you can test your protocol and export/import your data using the human readable format, then switch to more binary/terse formats, trading maintainability for performance.
You could even translate between text formats, like FIX <-> YAML, to make FIX more readable when dumping it.