Hello there.
We run a 4-node Voldemort server cluster using voldemort-release-1.10.14.
We connect to the Voldemort servers from a SolrCloud cluster with five nodes.
Each SolrCloud node creates two Voldemort clients, one for each Solr core. In total we have 10 Voldemort clients across the 5 Solr nodes.
I have updated the Voldemort client jar to the latest release, 1.10.26.
2018-01-05 11:34:49.842 INFO (coreLoadExecutor-10-thread-2-processing-n:wp-np2-c0:8983_solr) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.c.s.ClientRegistryRefresher Initial version obtained from client registry: version(0:1, 1:1, 2:3, 3:4) ts:1515152089828
2018-01-05 11:34:49.847 INFO (coreLoadExecutor-10-thread-2-processing-n:wp-np2-c0:8983_solr) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.c.ZenStoreClient Client registry refresher thread started, refresh interval: 43200 seconds
2018-01-05 11:34:49.847 INFO (coreLoadExecutor-10-thread-2-processing-n:wp-np2-c0:8983_solr) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.c.ZenStoreClient Voldemort client created: .avro-uniprot@wp-np2-c0:/nfs/public/rw/homes/uni_adm/solrcloud/dist/solr-7.1.0/server
bootstrapTime=1515152089702
context=
deploymentPath=/nfs/public/rw/homes/uni_adm/solrcloud/dist/solr-7.1.0/server
localHostName=wp-np2-c0
sequence=0
storeName=avro-uniprot
updateTime=1515152089383
releaseVersion=null
clusterMetadataVersion=0
bootstrap_urls=[tcp://ves-oy-ea:6666]
max_connections=20
connection_timeout_ms=60000
socket_timeout_ms=60000
routing_timeout_ms=60000
client_zone_id=-1
failuredetector_implementation=voldemort.cluster.failuredetector.ThresholdFailureDetector
failuredetector_threshold=95
failuredetector_threshold_count_minimum=30
failuredetector_threshold_interval=300000
failuredetector_threshold_async_recovery_interval=10000
fetch_all_stores_xml_in_bootstrap=true
idle_connection_timeout_minutes=-1
Every 12 hours (the 43200-second refresh interval logged at startup), some clients try to update their connections:
2018-01-05 23:34:49.848 INFO (voldemort-scheduler-service1-t2) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.c.s.ClientRegistryRefresher updating client registry with the following info for client: .avro-uniprot@wp-np2-c0:/nfs/public/rw/homes/uni_adm/solrcloud/dist/solr-7.1.0/server
We then get the following errors for most of the Voldemort nodes:
2018-01-05 23:34:49.850 INFO (voldemort-niosocket-client-system-t1) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.s.s.c.ClientRequestExecutor IOException from Destination: ves-oy-ea:6666(vp1) , Socket: Socket[addr=ves-oy-ea/10.3.7.234,port=6666,localport=45426] with message - Connection reset by peer
2018-01-05 23:34:50.263 ERROR (voldemort-niosocket-client-system-t1) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.s.s.c.ClientRequestExecutorFactory$ClientRequestSelectorManager null
java.lang.ExceptionInInitializerError
at voldemort.serialization.VSlopProto$Slop.<clinit>(VSlopProto.java:495)
at voldemort.serialization.SlopSerializer.toBytes(SlopSerializer.java:41)
at voldemort.serialization.SlopSerializer.toBytes(SlopSerializer.java:35)
at voldemort.store.slop.HintedHandoff.sendHintParallel(HintedHandoff.java:113)
at voldemort.store.routed.action.PerformParallelPutRequests$1.requestComplete(PerformParallelPutRequests.java:191)
at voldemort.store.socket.clientrequest.NonblockingStoreCallbackClientRequest.invokeCallback(NonblockingStoreCallbackClientRequest.java:68)
at voldemort.store.socket.clientrequest.NonblockingStoreCallbackClientRequest.complete(NonblockingStoreCallbackClientRequest.java:87)
at voldemort.store.socket.clientrequest.ClientRequestExecutor.completeClientRequest(ClientRequestExecutor.java:430)
at voldemort.store.socket.clientrequest.ClientRequestExecutor.close(ClientRequestExecutor.java:250)
at voldemort.common.nio.SelectorManagerWorker.run(SelectorManagerWorker.java:125)
at voldemort.common.nio.AbstractSelectorManager.run(AbstractSelectorManager.java:243)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Generated message class "voldemort.serialization.VSlopProto$Slop" missing method "getStoreBytes".
at com.google.protobuf.GeneratedMessage.getMethodOrDie(GeneratedMessage.java:1971)
at com.google.protobuf.GeneratedMessage.access$1100(GeneratedMessage.java:61)
at com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularStringFieldAccessor.<init>(GeneratedMessage.java:2860)
at com.google.protobuf.GeneratedMessage$FieldAccessorTable.ensureFieldAccessorsInitialized(GeneratedMessage.java:2108)
at com.google.protobuf.GeneratedMessage$FieldAccessorTable.<init>(GeneratedMessage.java:2039)
at voldemort.serialization.VSlopProto$1.assignDescriptors(VSlopProto.java:531)
at com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(Descriptors.java:355)
at voldemort.serialization.VSlopProto.<clinit>(VSlopProto.java:539)
... 14 more
Caused by: java.lang.NoSuchMethodException: voldemort.serialization.VSlopProto$Slop.getStoreBytes()
at java.lang.Class.getMethod(Class.java:1786)
at com.google.protobuf.GeneratedMessage.getMethodOrDie(GeneratedMessage.java:1968)
... 21 more
2018-01-05 23:34:50.266 INFO (voldemort-niosocket-client-system-t1) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.s.s.c.ClientRequestExecutor IOException from Destination: ves-oy-ec.ebi.ac.uk:6666(vp1) , Socket: Socket[addr=ves-oy-ec.ebi.ac.uk/10.3.7.236,port=6666,localport=41358] with message - Connection reset by peer
2018-01-05 23:34:50.266 ERROR (voldemort-niosocket-client-system-t1) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.s.s.c.ClientRequestExecutorFactory$ClientRequestSelectorManager Could not initialize class voldemort.serialization.VSlopProto$Slop
java.lang.NoClassDefFoundError: Could not initialize class voldemort.serialization.VSlopProto$Slop
at voldemort.serialization.SlopSerializer.toBytes(SlopSerializer.java:41)
at voldemort.serialization.SlopSerializer.toBytes(SlopSerializer.java:35)
at voldemort.store.slop.HintedHandoff.sendHintParallel(HintedHandoff.java:113)
at voldemort.store.routed.action.PerformParallelPutRequests$1.requestComplete(PerformParallelPutRequests.java:191)
at voldemort.store.socket.clientrequest.NonblockingStoreCallbackClientRequest.invokeCallback(NonblockingStoreCallbackClientRequest.java:68)
at voldemort.store.socket.clientrequest.NonblockingStoreCallbackClientRequest.complete(NonblockingStoreCallbackClientRequest.java:87)
at voldemort.store.socket.clientrequest.ClientRequestExecutor.completeClientRequest(ClientRequestExecutor.java:430)
at voldemort.store.socket.clientrequest.ClientRequestExecutor.close(ClientRequestExecutor.java:250)
at voldemort.common.nio.SelectorManagerWorker.run(SelectorManagerWorker.java:125)
at voldemort.common.nio.AbstractSelectorManager.run(AbstractSelectorManager.java:243)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-01-05 23:34:50.267 INFO (voldemort-niosocket-client-system-t1) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.s.s.c.ClientRequestExecutor IOException from Destination: ves-oy-eb.ebi.ac.uk:6666(vp1) , Socket: Socket[addr=ves-oy-eb.ebi.ac.uk/10.3.7.235,port=6666,localport=52260] with message - Connection reset by peer
2018-01-05 23:34:50.267 ERROR (voldemort-niosocket-client-system-t1) [c:uniprot s:shard2 r:core_node7 x:uniprot_shard2_replica_n4] v.s.s.c.ClientRequestExecutorFactory$ClientRequestSelectorManager Could not initialize class voldemort.serialization.VSlopProto$Slop
java.lang.NoClassDefFoundError: Could not initialize class voldemort.serialization.VSlopProto$Slop
at voldemort.serialization.SlopSerializer.toBytes(SlopSerializer.java:41)
at voldemort.serialization.SlopSerializer.toBytes(SlopSerializer.java:35)
at voldemort.store.slop.HintedHandoff.sendHintParallel(HintedHandoff.java:113)
at voldemort.store.routed.action.PerformParallelPutRequests$1.requestComplete(PerformParallelPutRequests.java:191)
at voldemort.store.socket.clientrequest.NonblockingStoreCallbackClientRequest.invokeCallback(NonblockingStoreCallbackClientRequest.java:68)
at voldemort.store.socket.clientrequest.NonblockingStoreCallbackClientRequest.complete(NonblockingStoreCallbackClientRequest.java:87)
at voldemort.store.socket.clientrequest.ClientRequestExecutor.completeClientRequest(ClientRequestExecutor.java:430)
at voldemort.store.socket.clientrequest.ClientRequestExecutor.close(ClientRequestExecutor.java:250)
at voldemort.common.nio.SelectorManagerWorker.run(SelectorManagerWorker.java:125)
at voldemort.common.nio.AbstractSelectorManager.run(AbstractSelectorManager.java:243)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
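The "Caused by" lines in the first trace show protobuf's GeneratedMessage looking up getStoreBytes() on VSlopProto$Slop by reflection and not finding it. One possible explanation, which is only an assumption and not confirmed here, is that the protobuf runtime found on the Solr classpath is not the version the Voldemort generated classes were built against. Below is a minimal sketch for checking that: it reports which jar a class is loaded from and whether a given public no-arg method exists. ClasspathProbe and its helper names are hypothetical; it uses only JDK reflection, and it would have to run with the same classpath (or class loader) as the Solr core for the answer to be meaningful.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns the location a class was loaded from, or a note when it cannot be determined. */
    static String locate(String className) {
        try {
            // initialize=false: skip static initializers, which is where VSlopProto fails in the trace.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "JDK/bootstrap (no code source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    /** True if the class exposes a public no-argument method with the given name. */
    static boolean hasPublicNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            cls.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace above.
        String runtime = "com.google.protobuf.GeneratedMessage";
        String generated = "voldemort.serialization.VSlopProto$Slop";
        System.out.println(runtime + " <- " + locate(runtime));
        System.out.println(generated + " <- " + locate(generated));
        System.out.println("getStoreBytes() present: " + hasPublicNoArgMethod(generated, "getStoreBytes"));
    }
}
```

If the two classes come from jars built for different protobuf releases, that would be consistent with the NoSuchMethodException, but it would not prove it.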
After this point, the Voldemort clients on the SolrCloud nodes cause high CPU usage on the VM.
In the JVM flight recording we can see several voldemort-niosocket-client-system-t1 threads running. Here is the whole trace:
Stack Trace
voldemort-niosocket-client-system-t1 [79] (RUNNABLE)
java.lang.Throwable.fillInStackTrace line: not available [native method]
java.lang.Throwable.fillInStackTrace line: 783
java.lang.Throwable.<init> line: 265
java.lang.Exception.<init> line: 66
java.io.IOException.<init> line: 58
java.io.EOFException.<init> line: 62
voldemort.store.socket.clientrequest.ClientRequestExecutor.read line: 262
voldemort.common.nio.SelectorManagerWorker.run line: 105
voldemort.common.nio.AbstractSelectorManager.run line: 243
java.util.concurrent.ThreadPoolExecutor.runWorker line: 1142
java.util.concurrent.ThreadPoolExecutor$Worker.run line: 617
java.lang.Thread.run line: 745
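To confirm that these selector threads are the ones burning CPU, the per-thread CPU time can be read through the standard JMX thread bean. The following is a minimal sketch, not part of Voldemort: SelectorThreadCpu and cpuMillisByThread are hypothetical names, the thread-name prefix is taken from the trace above, and the code has to run inside the Solr JVM (or be adapted to a remote JMX connection) to see those threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

public class SelectorThreadCpu {

    /** Maps "name [id]" to consumed CPU milliseconds for live threads whose name starts with prefix. */
    static Map<String, Long> cpuMillisByThread(String prefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        // CPU timing is optional in the JVM; return an empty map when it is unavailable.
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return result;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info == null || !info.getThreadName().startsWith(prefix)) {
                continue; // thread already gone, or not one we care about
            }
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread has died in the meantime
            if (nanos >= 0) {
                result.put(info.getThreadName() + " [" + id + "]", nanos / 1_000_000L);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prefix taken from the flight recording above.
        cpuMillisByThread("voldemort-niosocket-client")
                .forEach((name, ms) -> System.out.println(name + ": " + ms + " ms CPU"));
    }
}
```

Calling it twice a few seconds apart and comparing the values shows which threads are accumulating CPU time between the two samples.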
To me, it looks like there is an infinite loop running at voldemort.common.nio.AbstractSelectorManager.run line: 243.
Could somebody help me look into this problem?
Thanks.