Hi,
We need to move our servers to production, and I have been doing some load testing with ZooKeeper as the HA backend for Vault. I am using the following script for the test.
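#!/bin/bash
# Load test driver: 1000 sequential key numbers per batch, 1 minute pause between batches.
counter1=1                      # running key number across all batches
while true
do
    counter=1                   # position within the current batch
    while [ "$counter" -le 1000 ]
    do
        echo "$counter1"
        ((counter++))
        ((counter1++))
    done
    echo "sleeping for 1 min(s)..."
    sleep 1m
done
# Never reached: the outer loop runs until the script is interrupted.
echo "All done"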
Everything works smoothly until we hit around 1 million keys (1069045). After that I see various errors from ZooKeeper and Vault:
{"errors":["1 error occurred:\n\t* zk: connection closed\n\n"]}
{"errors":["local node not active but active cluster node not found"]}
{"errors":["error performing token check: failed to read salt: zk: connection closed"]}
In between these errors, key creation still succeeds at random, but very slowly.
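To put numbers on "at random but very slow", the batch loop could report successes, failures and elapsed time per batch. This is only a sketch under assumptions: `write_key` and `run_batch` are hypothetical helper names, and the Vault command shown in the comment (path `secret/loadtest/...`, field `value`) is an illustration, not what my script actually runs.

```shell
# write_key KEY: placeholder for the real write. It must return 0 on
# success and non-zero on failure. Assumed example, not from the script above:
#   vault write "secret/loadtest/$1" value="v$1" >/dev/null
write_key() {
    return 0
}

# run_batch START COUNT: attempt COUNT writes for keys START..START+COUNT-1
# and print one summary line: start=<n> ok=<n> failed=<n> seconds=<n>
run_batch() {
    local start="$1"
    local count="$2"
    local ok=0
    local failed=0
    local i="$start"
    local end=$((start + count))
    local t0
    t0=$(date +%s)
    while [ "$i" -lt "$end" ]; do
        if write_key "$i"; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "start=$start ok=$ok failed=$failed seconds=$(( $(date +%s) - t0 ))"
}
```

Calling `run_batch "$start" 1000` from the outer loop, then adding 1000 to `start` and sleeping, would give one summary line per batch, so the point where failures begin and batches slow down shows up directly in the log.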
I have a cron job that runs ZkCleanup.sh every 4 hours to retain 3 snapshots, and I have increased the Java heap size to 1 GB.
I also see CPU usage spike as high as 170%.
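If it helps to correlate the CPU spike with ZooKeeper's own counters, the output of the `mntr` four-letter command could be sampled during the test. A minimal sketch: `zk_metric` is a hypothetical helper name, and the host and port in the usage comment are assumptions about a default local setup. On newer ZooKeeper releases the four-letter commands may need to be whitelisted before this works.

```shell
# zk_metric NAME: read `mntr` output (lines of "key<TAB>value") on stdin
# and print the value for the metric called NAME.
zk_metric() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Assumed usage against a local server on the default client port:
#   echo mntr | nc localhost 2181 | zk_metric zk_znode_count
#   echo mntr | nc localhost 2181 | zk_metric zk_avg_latency
#   echo mntr | nc localhost 2181 | zk_metric zk_outstanding_requests
```

Logging these values once per batch would show whether latency and outstanding requests climb as the znode count approaches the point where the errors start.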
I have the following questions: Is my testing approach right? Why does ZooKeeper start failing after about 1 million keys have been created?