stress voldemort performance


Paolo Forte

Jul 28, 2015, 6:28:54 AM
to project-voldemort
My goal is to stress a Voldemort store (v1.9.17) with a read-only workload. I have already read a similar thread ("squeeze voldemort performance with protobuf").
Currently I manage to achieve relatively few asynchronous reads per second (~70). I would like to raise this number, and I hope you can give me some guidance.
The keys stored in the store are 800K and the value size is 100K. Each reading picks a random key from the key space.
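For reference, each read goes through the plain Voldemort StoreClient API; a minimal sketch of what the read loop does (the bootstrap host, the key format and the loop count are placeholder assumptions) is:

import java.util.Random;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class RandomReadSketch {
    public static void main(String[] args) {
        // Bootstrap URL is a placeholder for the actual server address.
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://voldemort-host:6666"));
        StoreClient<String, String> client = factory.getStoreClient("test");

        Random random = new Random();
        for (int i = 0; i < 1000; i++) {
            // Pick a random key out of the 800K key space (the key format is an assumption).
            String key = Integer.toString(random.nextInt(800000));
            Versioned<String> value = client.get(key);
        }
        factory.close();
    }
}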

The store is composed of a single node connected to a client and a load generator; the machines are on the same LAN. Each machine has 64 GB of RAM, two 12-core AMD Opteron processors, a 500 GB hard disk, and a 1 Gb network controller.

The store is configured as follows:
#server.properties

# configs
admin.enable=true
admin.max.threads=40

bdb.cache.evictln=true
bdb.cache.size=20GB
bdb.checkpoint.interval.bytes=2147483648
bdb.checkpoint.interval.ms=3600000
bdb.checkpointer.off.batch.writes=true
bdb.cleaner.interval.bytes=15728640
bdb.cleaner.minUtilization=50
bdb.cleaner.min.file.utilization=0
bdb.cleaner.lazy.migration=false
bdb.cleaner.threads=5
bdb.flush.transactions=true
bdb.enable=true
bdb.evict.by.level=true
bdb.expose.space.utilization=true
bdb.lock.nLockTables=47
bdb.minimize.scan.impact=true
bdb.one.env.per.store=false

http.enable=true
nio.connector.selectors=1000
socket.enable=true

enable.server.routing=false
client.max.queued.requests=10000
num.scan.permits=2
request.format=vp3
restore.data.timeout.sec=1314000
scheduler.threads=24
storage.configs=voldemort.store.bdb.BdbStorageConfiguration, voldemort.store.readonly.ReadOnlyStorageConfiguration
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200
enable.verbose.logging=true

<store>
  <name>test</name>
  <persistence>bdb</persistence>
  <description>Test store</description>
  <routing-strategy>consistent-routing</routing-strategy>
  <routing>client</routing>
  <replication-factor>1</replication-factor>
  <required-reads>1</required-reads>
  <required-writes>1</required-writes>
  <key-serializer>
    <type>string</type>
  </key-serializer>
  <value-serializer>
    <type>string</type>
  </value-serializer>
</store>


In the following pictures I show how the system reacts to a Poisson workload following a sinusoidal pattern. Specifically, I show the number of requests sent per second by the load generator and the average response time measured by the client.


Any help would be much appreciated.


Félix GV

Jul 28, 2015, 12:10:09 PM
to project-voldemort
Hi Paolo,

Let's do a back-of-the-envelope calculation...

You have a 1Gb network card, so that means 125 MBps. One request round trip (key+value) is roughly 1 MB. At 70 QPS, that means you're already using up about 70 MBps. Not that bad...

Now, Voldemort under default settings is not very optimized for large payloads like that, so you might be able to squeeze out a few more QPS by tweaking buffer size, number of connections, timeout values and the like, but you will never be able to get more than 125 QPS or so, given the width of your network pipe and the size of your payload.
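(Spelled out, under the same ~1 MB-per-round-trip assumption: 1 Gbps ≈ 125 MB/s, and 125 MB/s ÷ 1 MB per request ≈ 125 requests/s, so 70 QPS is already more than half of that theoretical ceiling.)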

-F

Paolo Forte

Jul 31, 2015, 1:02:04 PM
to project-voldemort, fel...@gmail.com
Thanks Felix.

Looking at iftop, it doesn't seem to be a network problem, since it shows that the traffic is ~11 MBps for a workload of ~70 queries per second.
Raising the workload to ~100 qps, the network usage shown by iftop is around ~18 MBps and some exceptions are raised on the load generator:
Exception in thread "Thread-670" voldemort.store.InsufficientOperationalNodesException: 1 gets required, but only 0 succeeded Original replication set :[0] Known failed nodes before operation :[] Estimated live nodes in preference list :[0] New failed nodes during operation :[]
    at voldemort.store.routed.action.PerformSerialRequests.execute(PerformSerialRequests.java:142)
    at voldemort.store.routed.Pipeline.execute(Pipeline.java:212)
    at voldemort.store.routed.PipelineRoutedStore.get(PipelineRoutedStore.java:352)
    at voldemort.store.routed.PipelineRoutedStore.get(PipelineRoutedStore.java:258)
    at voldemort.store.routed.PipelineRoutedStore.get(PipelineRoutedStore.java:81)
    at voldemort.store.logging.LoggingStore.get(LoggingStore.java:106)
    at voldemort.store.DelegatingStore.get(DelegatingStore.java:65)
    at voldemort.store.stats.StatTrackingStore.get(StatTrackingStore.java:87)
    at voldemort.store.stats.StatTrackingStore.get(StatTrackingStore.java:40)
    at voldemort.store.serialized.SerializingStore.get(SerializingStore.java:107)
    at voldemort.store.DelegatingStore.get(DelegatingStore.java:65)
    at voldemort.store.versioned.InconsistencyResolvingStore.get(InconsistencyResolvingStore.java:51)
    at voldemort.client.DefaultStoreClient.get(DefaultStoreClient.java:157)
    at voldemort.client.DefaultStoreClient.get(DefaultStoreClient.java:314)
    at voldemort.client.LazyStoreClient.get(LazyStoreClient.java:103)
    at voldemort.performance.benchmark.VoldemortWrapper$1.run(VoldemortWrapper.java:110)
    at java.lang.Thread.run(Thread.java:745)

At the same time, the server shows these messages:
[17:55:55,851 voldemort.server.niosocket.AsyncRequestHandler] INFO Connection reset from Socket[addr=/192.168.0.6,port=60849,localport=6666] with message - Broken pipe [voldemort-nio-socket-server-t297]

Furthermore, I noticed a very interesting thing. If I query a small subset of the 800K keys (let's say the first 2K), this problem doesn't occur and I am able to issue a much larger number of operations; for example, I tried 300 qps.

Any idea of a solution?

Félix GV

Jul 31, 2015, 1:12:47 PM
to Paolo Forte, project-voldemort
Ah, I misunderstood your original email. You are saying you have 800K distinct keys. I thought your key size was 800KB (which seemed abnormally high but I didn't question it...).

So if your keys are small (a few bytes) as they should be, and your values are 100KB, that means your 1Gbps pipe can theoretically go up to 1250 QPS. You will likely not reach that because of the overhead of doing lots of small requests, but at least it puts things in perspective.

There is likely some tuning you can do to accommodate your fairly large value size, like increasing buffer sizes... Arun or Brendan are likely the best in terms of advising client configs for your workload.
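As a rough illustration only (a sketch; the exact setter names should be double-checked against the ClientConfig shipped with your release, and the values are arbitrary assumptions rather than recommendations), the kind of client-side tuning I mean looks like:

// Needs java.util.concurrent.TimeUnit plus the voldemort.client.* imports.
ClientConfig config = new ClientConfig()
        .setBootstrapUrls("tcp://voldemort-host:6666")   // placeholder address
        .setSocketBufferSize(256 * 1024)                 // bigger socket buffer for ~100 KB values
        .setMaxConnectionsPerNode(50)                    // connection pool size per server node
        .setConnectionTimeout(1500, TimeUnit.MILLISECONDS)
        .setSocketTimeout(5000, TimeUnit.MILLISECONDS);
StoreClientFactory factory = new SocketStoreClientFactory(config);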

What version of Voldemort are you running? 1.9.17+ is recommended as it has lots of improvements in these areas.

-F

Paolo Forte

Jul 31, 2015, 1:32:23 PM
to project-voldemort, forte....@gmail.com, fel...@gmail.com
The release notes say 1.9.17.
So I guess that 70 qps is a very strange number. I am starting to think that maybe it's a different kind of problem. I mean, how can it be that querying only a few thousand of the keys raises the achievable QPS?

Arunachalam

Jul 31, 2015, 2:11:01 PM
to project-...@googlegroups.com, Paolo Forte, fel...@gmail.com
Where is the bottleneck on your server? Are you running on spinning disks / SSDs? Do you have server-side metrics of how long it takes to service each request? Your number of selectors is way too high.

nio.connector.selectors=1000

Try to trim them down to 20. 

How are you instantiating the client (SocketStoreClientFactory)? How many connections does the client use?

Thanks,
Arun.

Paolo Forte

Jul 31, 2015, 4:24:53 PM
to project-voldemort, forte....@gmail.com, fel...@gmail.com, arunac...@gmail.com
Are you running on spinning disks / SSDs?
- I have no physical access to the server. However, cat /sys/block/sda/queue/rotational prints 1, so it is a spinning disk.

Your number of selectors is way too high. Try to trim them down to 20.
- OK, thanks. I thought I had to set as many selectors as the number of expected connections.

Do you have server-side metrics of how long it takes to service each request?
- What do you mean exactly? I collected the system statistics with sysstat/sar, if that is what you want to know. This should be the graph of the "await" time for the hard disk: "The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them."

Where is the bottleneck on your server? How are you instantiating the client (SocketStoreClientFactory)? How many connections does the client use?

- It may be the disk access. I am basically using your performance tool. The main change I made was to wrap line 64 of VoldemortWrapper in a new thread, so that each read is asynchronous:
    new Thread(new Runnable() {
        public void run() {
            Versioned<Object> returnedValue = voldemortStore.get(key, transforms);
        }
    }).start();
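(For comparison, a variant that bounds the number of concurrent reads with a fixed-size pool instead of starting a new Thread per request would look roughly like the sketch below; the pool size of 64 is an arbitrary assumption:)

    // Requires java.util.concurrent.ExecutorService and java.util.concurrent.Executors.
    final ExecutorService pool = Executors.newFixedThreadPool(64);

    pool.submit(new Runnable() {
        public void run() {
            Versioned<Object> returnedValue = voldemortStore.get(key, transforms);
        }
    });

    // ... and pool.shutdown() once the run is over.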

The parameter --num-connections-per-node is set to 500 but only 80 connections are established. However, I have noticed that they are not closed as I expected when the load is low:



Monitoring the traffic with Wireshark, I found that the unused sockets are kept alive by a message exchange exactly every 10 seconds. The exchange is composed of 3 packets, with sizes 128 B + 120 B + 66 B. The message reconstructed by Wireshark is:
%voldsys$_metadata_version_persistencemetadata-versions
,LBstores.xml=0
cluster.xml=0
test=0
Perhaps that is the problem?

Thanks for your help. I really appreciate it.

Paolo Forte

Aug 1, 2015, 8:44:34 AM
to project-voldemort, forte....@gmail.com, fel...@gmail.com, arunac...@gmail.com
Update: the problem seems to be correlated with the number of open TCP connections, though I don't know if that is the cause or just an effect.
Using "tcptrack" and "ss -s" on the server side, I see that:
1. For a low workload, the number of open TCP connections stays constant. Examples: ~40 QPS -> 14 sockets; ~50 QPS -> 35 sockets; ~60 QPS -> 67 sockets.
2. For a slightly higher workload, after ~20 seconds the number of TCP sockets rapidly increases and the load generator prints tons of "InsufficientOperationalNodesException". Example: ~70 QPS -> 229 sockets.
3. Two sockets are constantly kept open to check for metadata changes on the server every 10 seconds.
However, when the load decreases, the excess sockets are not closed. They simply stay idle.

Any idea?

Arunachalam

Aug 3, 2015, 3:58:37 PM
to Paolo Forte, project-voldemort, Félix GV
This is all expected behavior. Two sockets are kept for polling metadata changes, and the default polling interval is 10 seconds.

I am still not sure where your bottleneck is. I wish I could see what is going on, but 70 seems like way too low a number, by a factor of at least 100. Start with one client and one server and see what you can get, to narrow down the problem. There is some configuration which is terribly off.

Thanks,
Arun.

Paolo Forte

Aug 4, 2015, 8:27:50 AM
to project-voldemort, forte....@gmail.com, fel...@gmail.com
The current configuration already consists of a single client and a single server.
The bottleneck may be one or more of the following: wrong JVM parameters, wrong connection handling, or a hard disk that is too slow.

As I already wrote, if I query a smaller subset of the 800K keys, I achieve a higher number of qps, which is quite weird but makes me suspect the hard disk. However, 20 MBps should be a bearable load for a hard disk.
I don't know how to work it out.

arunac...@gmail.com

Aug 4, 2015, 9:11:51 AM
to project-...@googlegroups.com, Paolo Forte, fel...@gmail.com
If you are on a spinning disk and you are doing random access, at worst it drops to a few hundred kbps. You can start by trying to read directly from BDB and see what performance you are getting.
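A minimal sketch of such a direct read with the BDB JE API (the environment path, the database name and the raw key encoding are assumptions you need to verify against your server's data directory):

import java.io.File;
import java.nio.charset.StandardCharsets;

import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class BdbDirectReadSketch {
    public static void main(String[] args) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setReadOnly(true);
        envConfig.setAllowCreate(false);
        // Placeholder: point this at the server's BDB data directory.
        Environment env = new Environment(new File("/path/to/voldemort/data/bdb"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setReadOnly(true);
        dbConfig.setAllowCreate(false);
        // Assumption: the store is a database named after the store ("test").
        Database db = env.openDatabase(null, "test", dbConfig);

        // Assumption: the on-disk key bytes match the plain string key; verify the serialization.
        DatabaseEntry key = new DatabaseEntry("someKey".getBytes(StandardCharsets.UTF_8));
        DatabaseEntry value = new DatabaseEntry();
        OperationStatus status = db.get(null, key, value, LockMode.DEFAULT);
        System.out.println("status=" + status + ", bytes=" + (value.getData() == null ? 0 : value.getData().length));

        db.close();
        env.close();
    }
}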

Sent from Windows Mail


Félix GV

Aug 4, 2015, 11:57:28 AM
to arunac...@gmail.com, project-...@googlegroups.com, Paolo Forte
Yeah, sounds to me like disk IO problems... Random access is very slow on spindles.

And the fact that it goes faster on a smaller queried key range makes me think that more of your data can stay in the OS page cache, hence the improved speed.
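(Roughly, with the numbers from earlier in the thread: 800K keys × ~100 KB ≈ 80 GB of data, which does not fit in the 20 GB BDB cache or even the 64 GB of RAM, whereas a 2K-key subset is only ~200 MB and can be served from cache after the first pass.)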

We have been running Voldemort exclusively on SSDs for several years now, so we haven't done much tuning for hard drives.

If you invest in a Voldemort deployment, I strongly suggest investing in SSDs for your boxes.