Chronicle Queue recovery?


Andrei Pozolotin

Jan 11, 2015, 11:19:12 AM
to java-ch...@googlegroups.com
I am curious if Chronicle Queue queue.data can be recovered when queue.index is lost?
In other words, does queue.data entries contain size header? Is there official API for that? Test case? 
Thank you.

Peter Lawrey

Jan 11, 2015, 11:51:26 AM
to java-ch...@googlegroups.com

You can recreate an index file with Vanilla Chronicle; however, Indexed Chronicle stores only about 4.5 bytes per message on average in addition to the message itself. There is not a lot of redundancy.

--
You received this message because you are subscribed to the Google Groups "Chronicle" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-chronicl...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Lawrey

Jan 11, 2015, 12:02:51 PM
to java-ch...@googlegroups.com

Btw, all of that is in the index file. The data file contains only the bytes you wrote.

Andrei Pozolotin

Jan 11, 2015, 1:13:54 PM
to java-ch...@googlegroups.com
just to confirm: neither "vanilla" nor "indexed" stores the entry size as part of the entry in queue.data?
and to confirm again :-) if the queue.index file is lost, then queue.data is not recoverable?

Peter Lawrey

Jan 12, 2015, 12:09:16 AM
to java-ch...@googlegroups.com

The Vanilla Chronicle file does store the length; however, as it writes to multiple files, you can't determine the original order. I.e. you could rebuild an index file only if you had a single writer.
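For the single-writer case Peter mentions, rebuilding an index amounts to scanning the data file and recording where each entry starts. A minimal sketch, assuming (purely for illustration — this is not the actual Chronicle on-disk layout) that each entry is a 4-byte little-endian length followed by its payload:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: rebuild index offsets by scanning a
// length-prefixed data file (single writer, so order is preserved).
public class IndexRecovery {
    // Assumes each entry is a 4-byte little-endian length, then the payload.
    static List<Long> recoverOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            long entryStart = buf.position();
            int len = buf.getInt();
            if (len <= 0 || len > buf.remaining())
                break; // truncated or corrupt tail: stop scanning
            offsets.add(entryStart);
            buf.position(buf.position() + len); // skip the payload
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Build a tiny "data file" in memory with three entries.
        ByteBuffer buf = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        for (String s : new String[] { "one", "two", "three" }) {
            byte[] b = s.getBytes();
            buf.putInt(b.length).put(b);
        }
        byte[] data = new byte[buf.position()];
        buf.flip();
        buf.get(data);

        System.out.println(recoverOffsets(data)); // prints [0, 7, 14]
    }
}
```

With multiple writers, as Peter notes, this gives you each file's internal order but not the interleaving across files.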

Andrei Pozolotin

Jan 12, 2015, 7:47:06 AM
to java-ch...@googlegroups.com
got it, thank you.

what we are thinking about is using Chronicle Queue for both historical and real-time streams,
to avoid code and data-format duplication. carrying the index file in the history store seems redundant.

so the current options seem to be either to carry the index file, or to re-implement "indexed" chronicle with index-recovery support.

Peter Lawrey

Jan 12, 2015, 9:10:12 AM
to java-ch...@googlegroups.com
Since you can't recreate the index from the data files, it is not redundant data, nor would putting everything in one file save space.

What it would do is simplify the design conceptually and make it easier to maintain, i.e. you have one file instead of two. However, the one-file approach is likely to be the same size or larger.

Andrei Pozolotin

Jan 12, 2015, 9:49:18 AM
to java-ch...@googlegroups.com
agreed. the remaining uneasiness is "what if the 2 files are not in sync?"

Peter Lawrey

Jan 12, 2015, 11:06:24 AM
to java-ch...@googlegroups.com
Unless you are copying the files while they are being written to, the two files should be no more or less likely to be out of sync with each other than one file is with itself (due to out-of-order writes).
If you copy the index after the data file, the extra data will be truncated. If the data is old compared to the index, you will have an empty or possibly truncated entry.

Andrei Pozolotin

Jan 12, 2015, 11:23:01 AM
to java-ch...@googlegroups.com
ok, clear.

Peter Lawrey

Jan 12, 2015, 11:46:55 AM
to java-ch...@googlegroups.com
The consistency is assumed to be provided by the memory barriers. This doesn't guarantee the data has been written to disk on a power failure, in which case the only way to get the data back for sure is replication on a running host. Anything else is unreliable, though there is no need to make it more unreliable than necessary.

One thing we could look at doing is providing a C++ and C# implementation for memory-mapped data, replication, and remote access. We would need to take a more cross-platform approach to the file format, e.g. the handling of object serialization.

I am looking at a binary form of YAML as one option.

Andrei Pozolotin

Jan 12, 2015, 12:49:56 PM
to java-ch...@googlegroups.com
that format would have to be zero-copy, such as Cap'n Proto / SBE / FlatBuffers?
http://kentonv.github.io/capnproto/news/2014-06-17-capnproto-flatbuffers-sbe.html

Peter Lawrey

Jan 12, 2015, 2:54:24 PM
to java-ch...@googlegroups.com
I agree. We have had zero-copy, mutable, random access as part of Java-Lang for some time, at least 18 months.

Andrei Pozolotin

Jan 12, 2015, 5:41:35 PM
to java-ch...@googlegroups.com
is it something which can stand alone as a complete message codec and compete against Cap'n Proto / SBE / FlatBuffers?

Peter Lawrey

Jan 13, 2015, 2:53:31 AM
to java-ch...@googlegroups.com

It can, to a limited degree. It is designed to be simpler and faster. In extreme cases you can turn off bounds checking and it just copies fields or raw data, which is unusual in Java.
It also supports reusing mutable objects in cases where you need to deserialize data. This is needed where variable-length/optional fields are used to make the data more compact.
This is important as our data replication and data storage solutions are often limited by the raw bandwidth of the network and/or disk subsystem.
Fixed-length messages work best when a) the data fits in memory, b) there is a high read-to-write ratio, and c) there aren't simple opportunities to compact the data.
The biggest use case for us, market data, fails on all counts, so we need an efficient means of serializing/deserializing the data.
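The mutable-object reuse described above can be sketched as follows; `Quote` and `readFrom` are made-up names for illustration, not Chronicle API, and the wire layout is an assumption:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal sketch of the "reuse a mutable object" deserialization pattern:
// one instance is refilled for every message read, so steady-state
// reading allocates no new objects (no per-message garbage).
public class ReuseDemo {
    static final class Quote {
        long timestamp;
        double price;

        void readFrom(ByteBuffer in) { // overwrite fields in place
            timestamp = in.getLong();
            price = in.getDouble();
        }
    }

    public static void main(String[] args) {
        // Two messages on a pretend wire: (timestamp, price) pairs.
        ByteBuffer wire = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        wire.putLong(1L).putDouble(100.5);
        wire.putLong(2L).putDouble(101.0);
        wire.flip();

        Quote quote = new Quote(); // allocated once, reused per message
        while (wire.remaining() >= 16) {
            quote.readFrom(wire);
            System.out.println(quote.timestamp + " @ " + quote.price);
        }
    }
}
```

The trade-off is that callers must copy any fields they want to keep before the next `readFrom` overwrites them.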

However, I am working on a higher-level abstraction where the same serialization/deserialization code could handle different wire formats:
log messages (text & binary),
XML,
JSON & BSON,
YAML & a binary form of YAML,
FIX,
a schema-extendable raw binary format,
a variable-length raw format, and finally a fixed-length raw format, e.g. struct.

The idea being you provide the data and the serialization style, and the library does the rest.
For formats with field ids, you can have a name (string) or a number (stop-bit encoded).
Values can be compressed: simple compression for short fields, and Snappy or LZW compression for long fields.
I would like to support not only a zero-copy format but a self-describing zero-copy format, i.e. you can read it even under schema changes, but if there are no schema changes you can read it directly.
It should support field-name-based indexing (subscriber driven) and Twitter-style indexing (publisher driven).
For human readability it will support redundant information such as comments and formatting hints.
The idea is that you can test your protocol and export/import your data using the human-readable format, then switch to more binary/terse formats in a trade-off of performance vs maintainability.
You could even translate between text formats, like FIX <-> YAML, to make FIX more readable when dumping it.
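The stop-bit encoding mentioned above for numeric field ids can be sketched as a 7-bits-per-byte varint, with the high bit marking that more bytes follow, so small numbers take a single byte. This is a common scheme (LEB128-style); the exact Chronicle encoding may differ in detail:

```java
import java.nio.ByteBuffer;

// Sketch of stop-bit (variable-length integer) encoding: each byte
// carries 7 data bits; the high bit is set while more bytes follow.
public class StopBit {
    static void write(ByteBuffer out, long value) {
        while ((value & ~0x7FL) != 0) {       // more than 7 bits remain
            out.put((byte) (0x80 | (value & 0x7F)));
            value >>>= 7;
        }
        out.put((byte) value);                // final byte, high bit clear
    }

    static long read(ByteBuffer in) {
        long value = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        write(buf, 5);    // fits in 1 byte
        write(buf, 300);  // takes 2 bytes
        buf.flip();
        System.out.println(read(buf)); // prints 5
        System.out.println(read(buf)); // prints 300
    }
}
```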

Andrei Pozolotin

Jan 13, 2015, 10:52:05 PM
to java-ch...@googlegroups.com
wow, now that reads like a bold new OpenHFT project declaration, "a holy grail of financial data stream processing"

probably should actually go ahead and create the project, to have one more reason not to let a good idea go out of existence