Getting exception voldemort.store.InsufficientOperationalNodesException in production systems

Anandh Kumar

Aug 9, 2017, 5:49:42 AM
to project-...@googlegroups.com
We are intermittently getting the following exception in our production system. For example, out of 84 million calls, 10,000 fail with this exception:

voldemort.store.InsufficientOperationalNodesException: 1 get alls required, but 0 succeeded. Failing nodes : []

I tried increasing the Voldemort ClientConfig connection timeout and socket timeout, but we are still getting this error.

Our config:

stores.xml:

<stores>
<store>
  <name>test</name>
  <persistence>bdb</persistence>
  <description>test</description>
  <owners>test</owners>
  <routing-strategy>consistent-routing</routing-strategy>
  <routing>client</routing>
  <replication-factor>1</replication-factor>
  <required-reads>1</required-reads>
  <required-writes>1</required-writes>
  <key-serializer>
    <type>string</type>
  </key-serializer>
  <value-serializer>
    <type>string</type>
  </value-serializer>
</store>
</stores>

server.properties:


# configs
admin.enable=true
admin.max.threads=40
bdb.cache.evictln=true
bdb.cache.size=12GB
bdb.checkpoint.interval.bytes=2147483648
bdb.checkpointer.off.batch.writes=true
bdb.cleaner.interval.bytes=15728640
bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.enable=true
bdb.evict.by.level=true
bdb.expose.space.utilization=true
bdb.lock.nLockTables=94
bdb.minimize.scan.impact=true
bdb.one.env.per.store=true
enable.server.routing=false
enable.verbose.logging=false
http.enable=true
nio.connector.selectors=100
num.scan.permits=2
request.format=vp3
restore.data.timeout.sec=1314000
scheduler.threads=24
socket.enable=true
storage.configs=voldemort.store.bdb.BdbStorageConfiguration, voldemort.store.readonly.ReadOnlyStorageConfiguration
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200


cluster.xml:

<cluster>
        <name>myprodcluster</name>
        <server>
                <id>0</id>
                <host>192.168.1.10</host>
                <http-port>8081</http-port>
                <socket-port>6666</socket-port>
                <admin-port>7777</admin-port>
                <partitions>0, 1</partitions>
        </server>
</cluster>

We have a single-node cluster.


Can you please help us fix this issue?

Arunachalam

Aug 11, 2017, 5:26:58 PM
to project-...@googlegroups.com
Try increasing the number of connections. From what I can infer, there are no failing nodes, so the requests were just waiting in the client queue and timing out.
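
For illustration, here is a minimal sketch of what raising the per-node connection limit could look like on the client side. It only builds a plain java.util.Properties object, so it runs without Voldemort on the classpath. The key names (bootstrap_urls, max_connections, connection_timeout_ms, socket_timeout_ms) are assumed to match the ClientConfig property names, and the values are placeholders rather than recommendations, so check both against your Voldemort version.

```java
import java.util.Properties;

// Sketch only: collects client-side settings as plain Properties.
// The key names are ASSUMED to match Voldemort's ClientConfig property
// names; verify them against the ClientConfig class you are running.
final class ClientTuningSketch {

    static Properties clientProperties(String bootstrapUrl,
                                       int maxConnectionsPerNode,
                                       int connectionTimeoutMs,
                                       int socketTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap_urls", bootstrapUrl);
        // The setting suggested above: more pooled connections per server node.
        props.setProperty("max_connections", Integer.toString(maxConnectionsPerNode));
        // The two timeouts that were already raised in the original post.
        props.setProperty("connection_timeout_ms", Integer.toString(connectionTimeoutMs));
        props.setProperty("socket_timeout_ms", Integer.toString(socketTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Host and port come from the cluster.xml above; the numbers are placeholders.
        Properties props = clientProperties("tcp://192.168.1.10:6666", 100, 500, 5000);
        props.list(System.out);
        // With Voldemort on the classpath, these would then be handed to the client, e.g.:
        //   StoreClientFactory factory = new SocketStoreClientFactory(new ClientConfig(props));
        //   StoreClient<String, String> client = factory.getStoreClient("test");
    }
}
```

Raising the limit only helps if the server has spare capacity; if the single node is already saturated, more connections just move the queue from the client to the server.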

The other possible reason is that you are throwing more load than the client or server can handle. You should monitor the p90/p99 response times to see whether there is a spike and where the contention is coming from.
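
If no metrics are being collected yet, a rough way to get p90/p99 numbers is to time each client call (for example with System.nanoTime() around the getAll) and compute percentiles over a window of samples. The sketch below uses the nearest-rank method on made-up latencies. The class name and the sample values are hypothetical, and nothing in it is part of Voldemort.

```java
import java.util.Arrays;

// Sketch only: nearest-rank percentiles over a window of recorded latencies.
final class LatencyPercentiles {

    // Returns the smallest sample such that at least p percent of all
    // samples are less than or equal to it (nearest-rank definition).
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up per-call latencies in milliseconds, with one slow outlier.
        long[] samples = {3, 4, 2, 5, 3, 250, 4, 3, 2, 4};
        System.out.println("p90 = " + percentile(samples, 90) + " ms");
        System.out.println("p99 = " + percentile(samples, 99) + " ms");
    }
}
```

A p99 far above the p90, as in this made-up window, is the kind of spike to look for. Comparing client-side and server-side percentiles for the same period can show on which side the time is being spent.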

Thanks,
Arun.


Anandh Kumar

Sep 1, 2017, 2:01:22 AM
to project-voldemort
Arun,

We are still getting this exception after increasing the connection timeout and socket timeout.

Can you please help us figure out how to fix this?

Regards,
-Anandh Kumar



Félix GV

Sep 1, 2017, 10:12:24 AM
to project-voldemort
Did you try increasing the number of connections like Arun suggested?