Not getting entire map on the client after restart

Arun Battu

Apr 22, 2015, 12:23:49 PM
to java-ch...@googlegroups.com

Hi,

 

I'm running a Chronicle Map server and client on different machines: the server on a Unix machine and the client on a Windows machine. Replication is enabled on both. The server persists the map to a file, but the client does not. The idea is to see how quickly the client can get the map into memory when it is started from scratch while the server is already running and has populated the map. I print the map size in a loop on the client to track the map building up.

With the code below, the client takes about 30 seconds to get all 5M keys from the server the first time I start it. The problem is that if I stop and restart the client, replication seems to stop after some time (the exact point varies, but it is usually near the end of the map, after about 4.9M keys). The map size printed by the client loop does not change after that point. If I then restart the server and then the client, all keys are replicated to the client. But if I restart the client again without restarting the server, it gets stuck again after about 4.9M keys.

How can I ensure that the client gets the entire map whenever it is restarted (without restarting the server)?

 

SERVER:

    long totalKeyCount = 5000000L;

    File file = new File(file1);

    TcpTransportAndNetworkConfig tcpConfig =
        TcpTransportAndNetworkConfig.of(port1, new InetSocketAddress(node2, port2))
            .heartBeatInterval(1L, TimeUnit.SECONDS)
            .tcpBufferSize(1000000);

    final ChronicleMap<String, String> serverMap =
        ChronicleMapBuilder.of(String.class, String.class)
            .putReturnsNull(true)
            .entries(totalKeyCount)
            .replication((byte) 1, tcpConfig)
            .createPersistedTo(file);

CLIENT:

    TcpTransportAndNetworkConfig tcpConfig =
        TcpTransportAndNetworkConfig.of(port2, new InetSocketAddress(node1, port1))
            .heartBeatInterval(1L, TimeUnit.SECONDS)
            .tcpBufferSize(1000000);

    final ChronicleMap<String, String> map =
        ChronicleMapBuilder.of(String.class, String.class)
            .putReturnsNull(true)
            .entries(totalKeyCount)
            .replication((byte) 2, tcpConfig)
            .create();

               

Printing map size from client:

    while (map.size() < totalKeyCount) {
        System.out.println(map.size() + " keys");
        Thread.sleep(2000);
    }
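
The loop above never exits if replication stops short of totalKeyCount. A minimal sketch of a bounded wait, using only JDK classes, is given here for illustration. `SizeWaiter` and `awaitSize` are hypothetical names, not part of Chronicle Map, and the map is represented by a `LongSupplier` so the helper can be tried without a running replica.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Chronicle Map: waits for a size to reach a
// target, but gives up after a deadline instead of looping forever.
final class SizeWaiter {

    /**
     * Polls {@code size} every {@code pollMillis} ms until it reaches
     * {@code target} or {@code timeoutMillis} ms have elapsed.
     *
     * @return true if the target was reached before the deadline
     */
    static boolean awaitSize(LongSupplier size, long target,
                             long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            long current = size.getAsLong();
            if (current >= target) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                System.out.println("Gave up at " + current + " of " + target + " keys");
                return false;
            }
            System.out.println(current + " keys");
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for map.size(): a counter that grows by 2 on every poll.
        long[] fake = {0};
        boolean complete = awaitSize(() -> fake[0] += 2, 10, 1_000, 10);
        System.out.println("complete = " + complete);
    }
}
```

With the client map from the snippet above, the call would look something like `SizeWaiter.awaitSize(() -> map.size(), totalKeyCount, 120_000, 2_000)`; the two-minute timeout is an arbitrary choice.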

 

Thanks.

Arun

Rob Austin

Apr 22, 2015, 12:52:29 PM
to java-ch...@googlegroups.com
Arun

It sounds like this could be a bug.

When a client connects to the server, it is first sent a flood of data (the updates it missed while it was down, called the bootstrap), followed by any new changes that occur from that point on. Could you try to identify whether it is the bootstrap that is failing, or whether it is failing to deliver new updates?
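
One way to tell the two phases apart, sketched with JDK types only: once the client has restarted, write a fresh marker key on the server and poll for it on the client. If the marker arrives while older keys are still missing, live updates are flowing and the bootstrap is the part that stalled. `ReplicationProbe`, its method names and the `__probe__` prefix are all hypothetical; because `ChronicleMap` implements `java.util.concurrent.ConcurrentMap`, the methods take a plain `Map`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical diagnostic helper, not part of Chronicle Map.
final class ReplicationProbe {
    static final String PREFIX = "__probe__";

    /** Run on the server: writes a marker key that is unlikely to clash with real data. */
    static String writeMarker(Map<String, String> serverMap) {
        String key = PREFIX + System.nanoTime();
        serverMap.put(key, "probe");
        return key;
    }

    /** Run on the client: polls until the marker is visible or the timeout expires. */
    static boolean markerArrived(Map<String, String> clientMap, String key, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (clientMap.containsKey(key)) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last check in case the marker landed during the final sleep.
        return clientMap.containsKey(key);
    }

    public static void main(String[] args) {
        // A single in-process map stands in for both replicas here; in a real
        // test the two calls would run in the server and client JVMs.
        Map<String, String> shared = new ConcurrentHashMap<>();
        String key = writeMarker(shared);
        System.out.println("marker arrived: " + markerArrived(shared, key, 100));
    }
}
```

Since the server and client run in separate JVMs, the marker key has to be agreed on out of band (for example, a fixed key passed on the command line), and it should be removed afterwards so it does not skew the size counts.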

Could you let me know if you still experience this problem with smaller volumes of data?

As a hack: when you restart the client, allocate it a different identifier. Does the problem go away?

Please check that the clock on your server is the same as on your client.

Rob



Arun Battu

Apr 22, 2015, 3:18:41 PM
to java-ch...@googlegroups.com
Rob, 

Thanks for the quick response. The clocks on the client and server are the same. Restarting the client with a different identifier does not solve the problem; it still gets stuck near the end. Regarding the bootstrap: when I start the client for the first time (after a server restart), the client gets all the data. The issue comes up only when I restart the client again without restarting the server. The point where the client gets stuck remains the same for as long as the server is running. See the results of the two tests below.

TEST 1:

Start server
Start client - client gets all data
Restart client - client stuck after 4995600 keys
Restart client - client stuck after 4995600 keys
any further restart of client - client gets stuck at same number, as explained above

TEST 2:

Start client - client gets all data
Restart client - client stuck after 4998411 keys
Restart client - client stuck after 4998411 keys
any further restart of client - client gets stuck at same number, as explained above

I will check if the new updates are flowing after client restart.

Thanks.
Arun

Rob Austin

Apr 22, 2015, 5:25:36 PM
to java-ch...@googlegroups.com
Does it get stuck if the client and server are on the same host, connected via loopback?

Arun Battu

Apr 23, 2015, 2:18:35 PM
to java-ch...@googlegroups.com
It does not get stuck if both client and server are running on the same host. The issue seems to be only with the bootstrapping part, when client and server are running on different hosts. New updates have no issue (i.e. both client and server can see each other's updates). The bootstrapping issue exists for smaller data sets as well.

Rob Austin

Apr 23, 2015, 4:55:14 PM
to java-ch...@googlegroups.com
Does it get stuck if both the server and client are on separate Unix machines?

Does it only fail if the server is on Unix and the client is on Windows?

Rob

Arun Battu

Apr 24, 2015, 5:50:11 PM
to java-ch...@googlegroups.com
Yes, it does get stuck if both the server and client are on separate Unix machines.

Rob Austin

Apr 24, 2015, 6:55:01 PM
to java-ch...@googlegroups.com
Does this issue still occur with a smaller number of records? If you have fewer than 4998411 keys, do you still see it?

Thanks

Rob

Arun Battu

Apr 25, 2015, 7:44:56 PM
to java-ch...@googlegroups.com
Yes, the issue still occurs even when I use a smaller number of records. I have also observed it with 20,000 and 50,000 keys.

Rob Austin

Apr 26, 2015, 2:57:16 AM
to java-ch...@googlegroups.com
Can you let me know which version you are using and I'll set up a test to reproduce the issue. 

Rob

Rob Austin

Apr 27, 2015, 8:07:00 AM
to java-ch...@googlegroups.com, Peter Lawrey
Arun

I’ve raised the following JIRA: HCOLL-325, "Issue when bootstrapping with a large number of entries".
As you will see from the JIRA, I’ve been able to reproduce a similar issue, which I have just fixed.
In addition, I’ve added the following test case: net.openhft.chronicle.map.TestReplication#testAllDataGetsReplicated

Rob

Arun Battu

Apr 27, 2015, 8:26:57 AM
to java-ch...@googlegroups.com
We are using chronicle-map-2.1.4

Rob Austin

Apr 27, 2015, 8:37:43 AM
to java-ch...@googlegroups.com
I’ll release version 2.1.6 today. Could you test with it and let me know if it fixes your issue?

Thanks

Rob

Arun Battu

Apr 28, 2015, 10:25:17 AM
to java-ch...@googlegroups.com
Thanks Rob. I'll test with version 2.1.6 and let you know.

Regards.
Arun

Rob Austin

Apr 28, 2015, 10:25:50 AM
to java-ch...@googlegroups.com
thanks

Arun Battu

Apr 29, 2015, 1:56:27 PM
to java-ch...@googlegroups.com
Hi Rob,

I tested with version 2.1.6 and found the issue still exists. I also noticed that if I use multiple maps, only one gets stuck. For example, here's the output from a client loop that checks sizes of two maps every 2 seconds:

TEST 1:
map1<String, String>
map2<String, String>
Number of keys = 200000

map1 = 0 keys, map2 = 0 keys,
map1 = 191867 keys, map2 = 13139 keys,
map1 = 196216 keys, map2 = 24560 keys,
map1 = 196216 keys, map2 = 55352 keys,
map1 = 196216 keys, map2 = 65629 keys,
map1 = 196216 keys, map2 = 86146 keys,
map1 = 196216 keys, map2 = 116940 keys,
map1 = 196216 keys, map2 = 137452 keys,
map1 = 196216 keys, map2 = 149195 keys,
map1 = 196216 keys, map2 = 161768 keys,
map1 = 196216 keys, map2 = 165407 keys,
map1 = 196216 keys, map2 = 174342 keys,
map1 = 196216 keys, map2 = 186926 keys,
map1 = 196216 keys, map2 = 190563 keys,
map1 = 196216 keys, map2 = 199499 keys,
map1 = 196216 keys, map2 = 200000 keys,
map1 = 196216 keys, map2 = 200000 keys,
map1 = 196216 keys, map2 = 200000 keys,
map1 = 196216 keys, map2 = 200000 keys,

TEST 2:
map1<Integer, String>
map2<Integer, String>
Number of keys = 200000

map1 = 0 keys, map2 = 0 keys,
map1 = 188285 keys, map2 = 11947 keys,
map1 = 188293 keys, map2 = 80576 keys,
map1 = 188293 keys, map2 = 162571 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,
map1 = 188293 keys, map2 = 200000 keys,

It looks like map1 starts off very well but gets stuck, while map2 keeps going. I found that map1 does eventually catch up and get all the data, but it takes several minutes to do so.
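
A stall like map1's can be flagged automatically rather than read off the log. The sketch below (JDK only) counts consecutive polls in which the size is unchanged and still below the expected key count. `StallDetector` and the threshold of 3 polls are hypothetical choices; the sample sizes are the map1 figures from TEST 1 above.

```java
// Hypothetical helper, not part of Chronicle Map: reports a stall when the
// observed size stays the same, below the expected count, for several polls.
final class StallDetector {
    private final long expected;
    private final int maxUnchangedPolls;
    private long lastSize = -1;
    private int unchangedPolls;

    StallDetector(long expected, int maxUnchangedPolls) {
        this.expected = expected;
        this.maxUnchangedPolls = maxUnchangedPolls;
    }

    /** Records one size reading; returns true once the size looks stalled. */
    boolean observe(long size) {
        if (size >= expected || size != lastSize) {
            // Either the map is complete or it made progress: reset the counter.
            lastSize = size;
            unchangedPolls = 0;
            return false;
        }
        unchangedPolls++;
        return unchangedPolls >= maxUnchangedPolls;
    }

    public static void main(String[] args) {
        long[] map1Sizes = {0, 191867, 196216, 196216, 196216, 196216};
        StallDetector detector = new StallDetector(200000, 3);
        for (int poll = 0; poll < map1Sizes.length; poll++) {
            if (detector.observe(map1Sizes[poll])) {
                System.out.println("map1 stalled at " + map1Sizes[poll]
                        + " keys (poll " + poll + ")");
                break;
            }
        }
    }
}
```

Because map1 did eventually catch up after several minutes, a flag from this detector indicates a long pause rather than proof that replication has stopped for good.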

Thanks.
Arun

Rob Austin

Apr 29, 2015, 2:34:13 PM
to java-ch...@googlegroups.com
Thanks for the info, I'll investigate it further. 

Kent Hoxsey

May 4, 2015, 12:34:09 PM
to java-ch...@googlegroups.com
I am experiencing something similar, although with much smaller maps. I use a ReplicationHub to coordinate between two servers, and am seeing the maps not fully synchronized when bootstrap completes. In addition, I have been logging the most-recent value entered, and notice that the client map is not keeping up with the server.

Today I am refactoring to use replication without the hub and will report results.

Rob Austin

May 4, 2015, 3:55:38 PM
to java-ch...@googlegroups.com
I'm sorry for not getting you a fix for this. I'll try to spend some time tomorrow looking at the problem again, and will hopefully have something for you shortly.

Rob

Kent Hoxsey

May 4, 2015, 7:27:08 PM
to java-ch...@googlegroups.com
No problem, Rob. I am only trying to add some information.

Today I confirmed that I encounter the same problems Arun reported, but with much smaller maps. I see the issue when configured to use a ReplicationHub as well as with simple point-to-point TCP replication.

I am using Chronicle Map 2.1.4 on Windows machines for both client and server.

Rob Austin

May 5, 2015, 12:02:13 AM
to java-ch...@googlegroups.com
>I am using Chronicle Map 2.1.4 on windows machines both client and server.

as this contains a bug fix around replication.

Arun Battu

May 5, 2015, 9:33:30 AM
to java-ch...@googlegroups.com
Rob, I'm using 2.1.6 and still see this issue.

Thanks.
Arun

Rob Austin

May 7, 2015, 3:54:28 PM
to java-ch...@googlegroups.com, arun.k...@gmail.com
Kent, Arun

I have been able to reproduce a similar issue, and it has just been fixed in:

    <dependency>
        <groupId>net.openhft</groupId>
        <artifactId>chronicle-map</artifactId>
        <version>2.1.7</version>
    </dependency>

It should be on Maven Central soon. Please retest and let me know if it fixes the problem in your environment.

Rob

Kent Hoxsey

May 8, 2015, 2:52:28 PM
to java-ch...@googlegroups.com
Thanks for publishing the update, Rob. I tried it out in two different situations:

- running point-to-point replication (each map configured with IP address and port) seems to work now
- using ReplicationHub does not seem to work

I am using the ReplicationHub as detailed in the Channels and ReplicationChannel example. It appears as though the bootstrap takes place, but then no further updates flow across the channel in either direction.

Right now I am attempting to replicate 3 maps, but intending to grow that to 7-ish. None are particularly large. Any advice about how to better debug this will be appreciated.

Kent

Rob Austin

May 8, 2015, 3:01:16 PM
to java-ch...@googlegroups.com
Can you confirm which version you are using? I just want to double-check that you are on the latest version.

Kent Hoxsey

May 9, 2015, 1:38:20 PM
to java-ch...@googlegroups.com
Using 2.1.7. Sorry I was not clear about that.

Arun Battu

May 12, 2015, 9:51:25 AM
to java-ch...@googlegroups.com
Still seeing the issue. Using replication hub. Chronicle map version 2.1.7.

Arun Battu

May 28, 2015, 5:19:47 PM
to java-ch...@googlegroups.com
Hi Rob,

Just checking to see if you got a chance to look into this further.

Thanks.
Arun