Newbie Questions


Puiu Vlad

Jul 8, 2014, 5:22:33 PM
to java-ch...@googlegroups.com
Hi,

Reading the documentation, FAQs, and the replies to various questions, it is apparent that Chronicle is very good at what it is designed to do, namely to allow super-fast IPC on the same machine while persisting every message (a la Tib EMS, where a chronicle corresponds to an EMS queue). Via replication, it can be extended to IPC across machines. I am trying to figure out if I can push the envelope a little further and use it for RV-type subscriptions, where multiple publishers publish on the same subject. For that, I have a few questions:

1. The documentation mentions that "it is not thread safe for multiple threads to write to the same chronicle" and further that "you cannot safely write to the same chronicle from multiple processes." However, in a recent reply to Vijay Veeraraghavan's question, Peter wrote "you can have one appender per thread. A simple way to do this is to call createAppender() in each thread", which implies that it is now safe for multiple threads to write to the same chronicle. Is my understanding correct? Does the FAQ refer to an older version of Chronicle? And if it is now safe to have multiple writers in the same process, what is the impact on performance?

2. How performant is the chronicle replication? Obviously it depends on the network hardware. If the writer to a chronicle is on one machine, the chronicle is replicated to another machine, and the reader is on a different machine, what is the latency between the writer and the reader? Are there any benchmarks?

3. Assume that the writer comes up and starts writing to the chronicle. One reader comes up an hour later and is interested only in the entries published by the writer from that moment on; it is not interested in any messages published before it came up. Can this be achieved efficiently, i.e., without having to read all the messages in the chronicle and discard the older ones?

Thanks,
Vladimir

Peter Lawrey

Jul 8, 2014, 6:41:38 PM
to java-ch...@googlegroups.com

IndexedChronicle can have only one appender. VanillaChronicle supports multiple concurrent appenders (only one per thread).

The concurrent vanilla version supports rolling and is easier to use. You can get up to 3 million events per second, and the latency between processes is about 300 ns.

The indexed chronicle is single-threaded but gets up to 40 million events per second. The latency between threads is about 100 ns.

The benchmarks are in the test area of the chronicle repository.

3) yes. Use toEnd() before reading.
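
For example, here is a minimal sketch against the Chronicle 3.x API (the path, payload, and thread structure are illustrative, not taken from the docs): each writer thread creates its own appender on a VanillaChronicle, and a late-joining reader calls toEnd() before its read loop.

    import net.openhft.chronicle.Chronicle;
    import net.openhft.chronicle.ExcerptAppender;
    import net.openhft.chronicle.ExcerptTailer;
    import net.openhft.chronicle.VanillaChronicle;

    public class LateJoinDemo {
        public static void main(String[] args) throws Exception {
            Chronicle chronicle = new VanillaChronicle("/tmp/late-join-demo");

            // each writer thread creates its own appender (VanillaChronicle only)
            Thread writer = new Thread(() -> {
                try {
                    ExcerptAppender appender = chronicle.createAppender();
                    for (int i = 0; i < 10; i++) {
                        appender.startExcerpt(8);             // capacity in bytes
                        appender.writeLong(System.nanoTime());
                        appender.finish();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            writer.start();

            // a late-joining reader that only wants entries written from now on
            ExcerptTailer tailer = chronicle.createTailer();
            tailer.toEnd();                  // skip everything already in the queue
            while (true) {
                if (tailer.nextIndex()) {    // non-blocking: false if nothing new yet
                    long timestamp = tailer.readLong();
                    tailer.finish();
                }
            }
        }
    }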


Puiu Vlad

Aug 14, 2014, 10:01:46 PM
to java-ch...@googlegroups.com
Thanks Peter.

New question: I looked at the demo and demo2 samples and noticed that the reader thread usually goes in tight loops, continually trying to read the next excerpt. As a result I noticed that one processor (the one doing the reading presumably) has a close to 100% utilization. If I have N processes each with one such reader thread then I assume N processors will have 100% utilization. If N exceeds the number of processors then the CPU utilization will be 100% and there will be no more processing capability left for anything else. Do you see any flaw in my reasoning?

Thanks,
Vladimir

Peter Lawrey

Aug 15, 2014, 2:52:00 AM
to java-ch...@googlegroups.com

Your reasoning is fine. If you want to minimise latency you want a small number of dedicated CPUs. Most applications only need a small number of these critical threads; in fact, 6 is the most I have ever seen in a tuned system, and you can easily buy a server with many more cores than that.
Part of the reason you don't need more CPUs is that by this point your bottleneck has moved somewhere else, such as memory bandwidth (in synthetic tests), but more often it's something on the network or an upstream/downstream service over which you have no control, e.g. an exchange.
These critical threads can feed any number of non-critical threads.
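
To make the trade-off concrete, here is a hedged sketch of the two styles of read loop (the method names and pause strategy are illustrative, not Chronicle API): a busy-spinning loop that pins a core for minimum latency, and a yielding loop that frees the CPU at the cost of scheduler jitter.

    import net.openhft.chronicle.ExcerptTailer;

    public class ReadLoops {
        // latency-critical: burns a whole core, best run on an isolated/pinned CPU
        static void busySpin(ExcerptTailer tailer) {
            while (true) {
                if (tailer.nextIndex()) {
                    process(tailer);
                    tailer.finish();
                }
                // no pause: worst-case wake-up cost is one loop iteration
            }
        }

        // non-critical: hands the CPU back, accepting multi-millisecond delays
        static void yielding(ExcerptTailer tailer) {
            while (true) {
                if (tailer.nextIndex()) {
                    process(tailer);
                    tailer.finish();
                } else {
                    Thread.yield();  // or LockSupport.parkNanos(...) for a firmer back-off
                }
            }
        }

        static void process(ExcerptTailer tailer) {
            long value = tailer.readLong();  // illustrative payload
        }
    }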

Puiu Vlad

Aug 17, 2014, 6:06:25 AM
to java-ch...@googlegroups.com
Hi Peter,

I envision a distributed system with discrete components: in addition to the standard components (like an aggregator and a smart order router) I would have one line handler for each exchange, and the number of exchanges is not under my control. Ideally all data paths would be critical (e.g., market data must flow from one component to the next with minimum delay), which would require tight loops, but I guess one would need to experiment to see whether all of them really are critical. Of course, one could also allocate different line handlers to different machines, but this would introduce network latencies. I guess one needs to fine-tune the system by playing with tight loops, non-tight loops, and the allocation of processes to machines.

Another question: market data does not need to be persisted, in the sense that I do not need a backing store for it. I am interested in the latest tick, not necessarily everything that arrived while I was busy doing something else. Does persistence to the chronicle file affect the performance of the system in any way? If so, is there a way to disable persistence for some excerpts? Obviously, I cannot miss trades, so I would still need persistence for those.

Thanks,
Vladimir

Peter Lawrey

Aug 17, 2014, 1:31:27 PM
to java-ch...@googlegroups.com

If you have multiple sources of data, does each one need its own thread? Could you combine, say, 10 feeds in one thread and split out the ones which really need their own?

Puiu Vlad

Aug 19, 2014, 9:22:58 PM
to java-ch...@googlegroups.com
Hi Peter,

That's obviously a balancing act between flexibility and performance. It could be difficult to gauge which 10 of the, say, 17 feeds could be combined for optimum performance (i.e., crammed into the same process). And after being happy with the system for 3 months, when it's time to add 3 more feeds the tuning needs to be done anew. Also, if there is a problem with one of the 10 feeds and it needs a restart, then the remaining 9 will be restarted as well.

You have not answered my question about selectively persisting excerpts to the chronicle file. I assume the answer is no, but can you please confirm?

Thanks,
Vladimir

Peter Lawrey

Aug 20, 2014, 2:37:16 AM
to java-ch...@googlegroups.com

If you only need the latest value I would consider using a chronicle map, which stores only the latest value per key. If you don't need persistence you can write it to a tmpfs file system so it doesn't actually touch a disk (this trick works better for the map than the queue); doing so can save about 300 ns on a typical market update message.
You are right that chronicle queue isn't so easy to use if you don't need a copy of the data, but many organisations find a copy of the market data they received, with timestamps, useful in a latency-sensitive environment.
You can use packet capture technology and get more accurate results, however this is often far more complicated, often so complicated that many organisations I have seen don't use packet capture for much beyond recording the data.
If you have more active feed handlers than CPUs, getting the OS to do the scheduling is much easier, but you will get delays of up to 5 ms as a result. For most applications this is not an issue and might only happen once an hour, but if 5 ms is unacceptable you will want a busy-waiting solution on isolated CPUs.
The balance doesn't have to be perfect as you should be picking up data ASAP, so it should be rare that two pieces of data arrive in the same microsecond, or within say 10 microseconds. In any case, a 10 microsecond delay is surprisingly common if you use the OS scheduler; see my Vanilla Java blog article on micro-jitter.
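
A minimal sketch of the map idea (assuming the ChronicleMap 2.x builder API; the entry count, key/value types, and the /dev/shm path are illustrative):

    import java.io.File;
    import java.io.IOException;

    import net.openhft.chronicle.map.ChronicleMap;
    import net.openhft.chronicle.map.ChronicleMapBuilder;

    public class LatestTick {
        public static void main(String[] args) throws IOException {
            // /dev/shm is tmpfs on most Linux systems: the file is shared
            // between processes but never touches a disk
            ChronicleMap<String, Double> latest = ChronicleMapBuilder
                    .of(String.class, Double.class)
                    .entries(10_000)                              // expected number of symbols
                    .createPersistedTo(new File("/dev/shm/ticks"));

            latest.put("EURUSD", 1.3321);   // each update overwrites the previous tick
            System.out.println("latest EURUSD = " + latest.get("EURUSD"));
        }
    }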

Puiu Vlad

Aug 22, 2014, 5:52:22 AM
to java-ch...@googlegroups.com
Thanks Peter, another question: can I have one source with multiple sinks (on different machines) or do I need one source for each sink?

Peter Lawrey

Aug 22, 2014, 6:03:43 AM
to java-ch...@googlegroups.com
You can have any number of sinks on any number of machines in theory. In practice we don't test more than a few connections, and if you had hundreds of machines I would expect to see scalability issues.

Puiu Vlad

Oct 26, 2014, 12:27:08 AM
to java-ch...@googlegroups.com
Hi Peter,

In chronicle-1.9.1 there used to be a demo2 sample that showed how to use the source and sink. I cannot find that sample in chronicle-3.2.6. How are the source and sink used in the latest version?

Thanks,
Vladimir

Luca Burgazzoli

Oct 26, 2014, 7:34:44 AM
to java-ch...@googlegroups.com


- set up a source:
InetSocketAddress address = new InetSocketAddress("localhost", 9876);
Chronicle chronicle = new VanillaChronicle("/tmp/chronicle");
Chronicle source = new ChronicleSource(chronicle, address);

- set up an in-memory sink (does not replicate the source locally):
Chronicle sink1 = new ChronicleSink(address);

- set up a replica sink (creates a copy of the source in the given chronicle):
Chronicle chronicle1 = new VanillaChronicle("/tmp/chronicle-1");
Chronicle sink1 = new ChronicleSink(chronicle1, address);

- set up a shared-chronicle sink (shares a chronicle with the source and uses TCP/IP to query the source for new data, avoiding busy loops):
Chronicle sink2 = new ChronicleSink(chronicle, ChronicleSinkConfig.DEFAULT().clone().sharedChronicle(true), address);
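
Reading from any of these sinks then looks like reading from a plain chronicle. A hedged fragment in the same style as the snippets above (the payload format is illustrative):

    ExcerptTailer tailer = sink1.createTailer();
    while (true) {
        if (tailer.nextIndex()) {       // pulls the next replicated excerpt over TCP
            long value = tailer.readLong();
            tailer.finish();
        }
    }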




Puiu Vlad

Oct 29, 2014, 10:59:47 PM
to java-ch...@googlegroups.com
Thanks Luca, I gave the tests a cursory glance, but I will go over them again more carefully. In the meantime, maybe you could answer some quick questions.

Both the source and the sink (replica and shared) wrap chronicles. A chronicle has methods to create appenders and tailers, and so do sinks and sources (a sink does not create appenders, though).

What is the difference between the source appender and its underlying chronicle appender (and the source tailer and underlying chronicle tailer, and the sink tailer and the underlying chronicle tailer)? Are they interchangeable objects? Why am I writing to the source appender and not the underlying chronicle appender?

What I am trying to accomplish is to write to a source which disseminates records to the sinks connected to it, and which can also be read as a chronicle by local processes. And on the sink side, if the records get replicated as soon as the sink connects to the source, can I just read them (in a different process) from the underlying chronicle? Or am I misunderstanding the sinks and sources?

Thanks,
Vladimir

Peter Lawrey

Oct 30, 2014, 4:07:10 AM
to java-ch...@googlegroups.com

If you have two processes on the same machine, all you need is a chronicle, e.g. VanillaChronicle.
If you need distribution between machines, this requires replication; we only support TCP replication for chronicle.
When using replication you need to work with the wrapped source and sink, as these implement the notification and receiving of changes in an efficient, near-synchronous manner: a background thread is notified when you append an entry so it can send it ASAP without slowing the appending thread.
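
For example, a hedged sketch building on Luca's snippets above (hosts, ports, paths, and the payload are illustrative): the writing process appends through the ChronicleSource wrapper so the replication thread is notified, and the remote process reads through a ChronicleSink.

    // writer process, on host1
    Chronicle queue  = new VanillaChronicle("/tmp/orders");
    Chronicle source = new ChronicleSource(queue, new InetSocketAddress(9876));
    ExcerptAppender appender = source.createAppender();  // append via the wrapper
    appender.startExcerpt(8);
    appender.writeLong(42L);
    appender.finish();                                   // background thread ships it ASAP

    // reader process, on host2
    Chronicle replica = new VanillaChronicle("/tmp/orders-replica");
    Chronicle sink    = new ChronicleSink(replica, new InetSocketAddress("host1", 9876));
    ExcerptTailer tailer = sink.createTailer();
    if (tailer.nextIndex()) {
        long value = tailer.readLong();
        tailer.finish();
    }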

Puiu Vlad

Oct 31, 2014, 10:51:00 PM
to java-ch...@googlegroups.com
Hi Peter,

I am trying to use a tight loop to read from a set of chronicles, some of them local and some of them remote.

I am relying on the method nextIndex(), available for both AbstractPersistentSinkExcerpt and IndexedExcerptTailer. 

However, the semantics of the two methods are different. While IndexedExcerptTailer.nextIndex() returns false if there is no new record to read from the excerpt, AbstractPersistentSinkExcerpt.nextIndex() blocks (for a while) rather than returning false right away, in two circumstances:

(1) if there is no source connected to the sink, the debugger shows that the blockage is in method SinkConnector.open():

        public boolean open() {
            while (!closed) {
                try {
                    buffer.clear();
                    buffer.limit(0);

                    this.channel = SocketChannel.open(address);
                    this.channel.socket().setTcpNoDelay(true);
                    this.channel.socket().setReceiveBufferSize(config.minBufferSize());
                    logger.info("Connected to " + address);

                    return true;
                } catch (IOException e) {
                    logger.info("Failed to connect to {}, retrying", address, e);
                }

                try {
                    Thread.sleep(config.reconnectDelay());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }

            return false;
        }

SocketChannel.open(address) throws an IOException (connection refused), then the thread sleeps for 500 ms, and then SocketChannel.open(address) is retried. Moreover, SocketChannel.open(address) takes a noticeable amount of time to return, which is bad in a tight loop.

(2) if the sink is connected to a source, but the source has not published a record, the blockage is in SinkConnector.read().

        public boolean read(int threshod, int size) throws IOException {
            if(!closed) {
                int rem = buffer.remaining();
                if (rem < threshod) {
                    if (buffer.remaining() == 0) {
                        buffer.clear();
                    } else {
                        buffer.compact();
                    }

                    while (buffer.position() < size) {
                        if (channel.read(buffer) < 0) {
                            channel.close();
                            return false;
                        }
                    }

                    buffer.flip();
                }
            }

            return !closed;
        }

Here channel.read(buffer) again takes a noticeable time to return (which is bad in a tight loop), the buffer is flipped, and finally the method returns false, as it should.

So, it appears that reading from a sink in a tight loop is not such a good idea. Are there alternative solutions to reading from a set of local and remote chronicles? Or are there any settings that I missed?

Thanks,
Vladimir

Peter Lawrey

Nov 1, 2014, 3:11:22 AM
to java-ch...@googlegroups.com

At the moment, it is assumed that if you have a chronicle sink the thread won't be consuming another chronicle as well, i.e. it is based on blocking IO.

We could change this but we would need to understand the use case better.

Puiu Vlad

Nov 1, 2014, 7:55:12 AM
to java-ch...@googlegroups.com
The use case is as follows: process p1 sits on host h1 and publishes messages. Process p2 sits on host h2 and publishes messages. Process p3 sits on h1 and listens to messages published by p1 and p2.

P1 and p3 communicate via an indexed chronicle (for example), while p2 and p3 communicate via a source/sink pair.

Peter Lawrey

Nov 1, 2014, 9:13:46 AM
to java-ch...@googlegroups.com
With VanillaChronicle, you could have a p4 which reads from p2's queue and writes to the same queue as p1. This way p3 is reading one queue in a non-blocking way.
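
A hedged sketch of that p4 forwarder (fragment style; hosts, ports, paths, and the byte-for-byte copy are illustrative):

    // p4, running on h1: drains p2's replicated queue into the queue p1 writes to
    Chronicle fromP2 = new ChronicleSink(new VanillaChronicle("/tmp/p2-replica"),
                                         new InetSocketAddress("h2", 9876));
    Chronicle shared = new VanillaChronicle("/tmp/shared");  // p1 appends here as well

    ExcerptTailer in    = fromP2.createTailer();
    ExcerptAppender out = shared.createAppender();  // VanillaChronicle allows concurrent appenders

    while (true) {                        // blocking on the sink is fine in this thread
        if (in.nextIndex()) {
            byte[] payload = new byte[(int) in.remaining()];
            in.readFully(payload);        // copy the excerpt's bytes
            in.finish();
            out.startExcerpt(payload.length);
            out.write(payload);
            out.finish();
        }
    }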


Puiu Vlad

Nov 5, 2014, 7:39:01 PM
to java-ch...@googlegroups.com
Hi Peter,

Indeed, reading from a sink and writing to a local chronicle on a blocking thread, and then reading from the local chronicle (non-blocking) in a tight loop, seems like a good idea.

However, I am having trouble implementing it: it seems that in certain circumstances VolatileExcerptTailer.nextIndex() reads two records when only one has been published on its port (the second record may come from another sink on a different port?).

I have built a simplified test case that exemplifies this issue (see the attached file). The application consists of three processes: P1, P2 and P3. One needs to start P2 and P3 first in any order and then P1. The flow is as follows:

(1) in a loop, P1 writes a record to chr1 and waits to read a record from chr5.
(2) P2 reads a record from chr1 and writes it to a source chr2 on port2. On a separate thread, a NonBlockingSink reads from a source on port5 and writes to chr5.
(3) in its NonBlockingSink, on a separate thread, P3 reads from a source on port3 (same as port2) and writes to chr3. On its main thread, in a tight loop, P3 reads from chr3 and writes to source chr4 on port4 (same as port5).

P1, P2, and P3 are all running on the same machine, but the intent is to have P1 and P2 on one host and P3 on a different host (that's why P2 and P3 communicate via source/sink pairs).

P1 is supposed to send one record, wait until it reads it coming back, send another record and so on, for a total of 10 records. However, it receives only 1 record and then blocks.

I added some tracing to ChronicleSink.VolatileExcerptTailer.nextIndex() (see below) and I get the following trace in P3 (but a similar trace is obtained in P2):

excerptSize=20 Zamolxis/127.0.0.1:15311
receivedIndex=0 Zamolxis/127.0.0.1:15311 Thread-1
positionAddr=188696728
reading from sink 15311
read from sink 15311 { long=1000 string=foo bar int=0 }
wrote to local { long=1000 string=foo bar int=0 }
excerptSize=1000 Zamolxis/127.0.0.1:15311
receivedIndex=8029748840875687936 Zamolxis/127.0.0.1:15311 Thread-1

Notice the second excerptSize=1000 (I only send small messages) and receivedIndex=8029748840875687936 (while the first index was 0).

Am I doing something wrong? I am using version 3.2.6-SNAPSHOT of chronicle.

Thanks,
Vladimir


                int excerptSize = buffer.getInt();
                long receivedIndex = buffer.getLong();

                System.out.println("excerptSize="+excerptSize+" "+address);

                switch (excerptSize) {
                    case ChronicleTcp.IN_SYNC_LEN:
                    case ChronicleTcp.PADDED_LEN:
                    case ChronicleTcp.SYNC_IDX_LEN:
                        return false;
                }

                System.out.println("receivedIndex="+receivedIndex+" "+address+" "+Thread.currentThread().getName());

                if (excerptSize > 128 << 20 || excerptSize < 0) {
                    throw new StreamCorruptedException("Size was " + excerptSize);
                }

                if(buffer.remaining() < excerptSize) {
                    if(!connector.read(excerptSize)) {
                        return false;
                    }
                }

                index = receivedIndex;
                positionAddr = startAddr + buffer.position();
                limitAddr = positionAddr + excerptSize;
                lastSize = excerptSize;
                finished = false;
                System.out.println("positionAddr="+positionAddr);
[Attachment: test.artifact.zip]

Luca Burgazzoli

Nov 6, 2014, 5:16:23 AM
to java-ch...@googlegroups.com
Hi Vladimir,
I'm looking at it. 

Could you please open an issue on GitHub?

Luca
--
lb


Puiu Vlad

Nov 6, 2014, 6:55:39 PM
to java-ch...@googlegroups.com
One more thing: when I change all IndexedChronicles to VanillaChronicles, P3 never gets any message from P2 on its sink. Am I doing something wrong?