what is the proper way to create more clients to test the reading performance?


xi...@tune.com

Mar 6, 2015, 1:00:14 PM3/6/15
to project-...@googlegroups.com
Hi,

We want to test the Voldemort cluster's maximum throughput (server-side), so we are trying to open as many clients as possible on our testing node to query the store.

But the problem I am having right now is that I can only create up to 47 clients in the program. What is the proper setup to increase that number to 200 or 500?

Thanks,

Xinyu

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 6, 2015, 1:38:48 PM3/6/15
to project-...@googlegroups.com
Hi Xinyu,

First of all, how are you creating your clients?

Second, I recommend against trying to create a bunch of clients in one JVM instance or even on one host. You'll hit all kinds of limiting factors both from the JVM and from the host.

For ideal testing, I would not recommend creating more than 4 clients per JVM instance, and no more than one or two JVMs per CPU core (depending on the CPU type). You'll need to set your client's max_connections and selectors somewhat high, depending on how many QPS you drive. Try starting out with max_connections at 100 and selectors at 50. You might need them higher than that, though. But the more connections you allow, the more memory you consume.

On the server side, you'll probably need to set the selectors higher than default too.

You'll also want to mix up your queries and have a large number of keys to work with to better ensure that you're not getting results purely from memory.
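One simple way to do this is to pre-generate a large key space and shuffle it per thread, so no two threads walk the keys in the same (cacheable) order. A minimal sketch in plain Java — the `key-N` format and the key count here are arbitrary assumptions for illustration, not something from this thread:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KeyMix {
    // Build a large key space and return the keys in randomized order, so
    // repeated reads don't keep hitting the same (already cached) keys.
    static List<String> randomizedKeys(int count, long seed) {
        List<String> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            keys.add("key-" + i);
        }
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }

    public static void main(String[] args) {
        // Each benchmark thread would iterate its own shuffled copy and issue
        // one store read per key (e.g. client.get(key) with the Voldemort client).
        List<String> keys = randomizedKeys(1_000_000, 42L);
        System.out.println("keys=" + keys.size());
    }
}
```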

Keep an eye on both the client and server JVM GC activity and make sure that you're not falling behind on freeing up memory or failing to allocate memory. If you are, tune your JVM(s) appropriately.

Brendan

xi...@tune.com

Mar 6, 2015, 1:53:58 PM3/6/15
to project-...@googlegroups.com
Hi Brendan,

Thanks for the answer. Currently I create one SocketStoreClientFactory in the main program and one client for each thread I use to query the server. With 20 clients we get about 30,000 requests/s, which is pretty low. The server side is not busy at all.
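For what it's worth, those numbers suggest the clients are latency-bound rather than saturating the servers: 30,000 requests/s across 20 synchronous clients is 1,500 requests/s per client, i.e. roughly 0.67 ms per round trip. A quick back-of-the-envelope check:

```java
public class ThroughputCheck {
    static double perThreadQps(double totalQps, int threads) {
        return totalQps / threads;
    }

    static double avgLatencyMs(double perThreadQps) {
        // A synchronous client doing N requests/s spends 1000/N ms per request.
        return 1000.0 / perThreadQps;
    }

    public static void main(String[] args) {
        double perThread = perThreadQps(30_000, 20); // 1,500 requests/s per client
        double latency = avgLatencyMs(perThread);    // ~0.67 ms per round trip
        System.out.printf("per-thread=%.0f req/s, implied avg latency=%.2f ms%n",
                perThread, latency);
    }
}
```

If each request really takes ~0.67 ms end to end, the only way to push total throughput higher is more concurrent requests in flight, which matches the advice in this thread to add connections, selectors, and more client JVMs/hosts.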

The testing node has 16 cores and 30GB memory.

Xinyu

Arunachalam

Mar 6, 2015, 2:41:50 PM3/6/15
to project-...@googlegroups.com
Xinyu,
         The SocketStoreClientFactory holds the resources used to talk to the server. There are many parameters to tune, but the most interesting ones are connections per node and selectors. The default is 5; increase it to 200 so that 200 parallel requests can happen. You might also need to tune the selectors on the server side.

But start with the client.

ClientConfig clientConfig = new ClientConfig()
        .setBootstrapUrls(url)
        .setMaxConnectionsPerNode(200)
        .setSelectors(30);

StoreClientFactory factory = new SocketStoreClientFactory(clientConfig);

Creating more clients does not add resources; in fact it makes no difference whether you reuse the same client or create a new client from the same factory. Try increasing the number of SocketStoreClientFactory instances. At some point your network is going to become the bottleneck, and then you will need to spread the clients across more machines.
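A minimal sketch of the pattern Arun describes — several factories, each driving its own group of worker threads. This is plain-JDK scaffolding only: `queryStore()` is a placeholder, not a Voldemort API, and the comments mark where the real SocketStoreClientFactory / StoreClient calls from this thread would go:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSketch {
    // Runs factories * threadsPerFactory worker threads; each performs
    // opsPerThread stubbed store reads. Returns the total completed count.
    static long run(int factories, int threadsPerFactory, int opsPerThread)
            throws InterruptedException {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(factories * threadsPerFactory);
        for (int f = 0; f < factories; f++) {
            // Real code: one SocketStoreClientFactory per group, built from a ClientConfig.
            for (int t = 0; t < threadsPerFactory; t++) {
                pool.submit(() -> {
                    // Real code: StoreClient<String, String> client = factory.getStoreClient("clicks");
                    for (int i = 0; i < opsPerThread; i++) {
                        queryStore(); // stub standing in for client.get(key)
                        completed.incrementAndGet();
                    }
                });
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    static void queryStore() { /* placeholder for the actual Voldemort read */ }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + run(4, 4, 1000));
    }
}
```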

Thanks,
Arun.



Felix GV

Mar 6, 2015, 3:07:32 PM3/6/15
to project-...@googlegroups.com
Hi,

I'd be curious to know how many client machines and server machines you are using for your test.

Thanks (:


--
 
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv


Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 7, 2015, 3:41:57 AM3/7/15
to project-...@googlegroups.com
Hi Xinyu,

Like Arun said, and like I said earlier, you're going to hit both JVM and host (possibly even network) limitations if you don't spin up more individual JVM instances and spread them out. Just because you have a lot of RAM and a few multicore CPUs in one host does not mean that you're going to be able to drive a true stress-test workload out of one JVM running a bunch of clients. Also, like I mentioned before, you're going to need to watch your JVMs' GC activity and will likely have to do some JVM tuning to get past those limiting factors.

And like Felix said, it would also help to know how many hosts you're working with (it looks like you're using just one client host? But how many Voldemort servers?). Can you also paste your cluster.xml, stores.xml and server.properties?


Brendan

xi...@tune.com

Mar 11, 2015, 2:06:52 PM3/11/15
to project-...@googlegroups.com
Hi Brendan,

I think I have reached the bottleneck on the client side. I am using 4 r3.2xlarge nodes (each with 8 cores and 61GB RAM) for the server. The testing node connecting to the servers is an r3.4xlarge (16 cores, 30GB RAM).

I have tried increasing the connections on the client side with setMaxConnectionsPerNode(500).setSelectors(100), but this didn't improve the performance at all.


Thanks,
Xinyu
cluster.xml
<cluster>
        <name>dataflow</name>
        <server>
                <id>0</id>
                <host>ec2-54-224-29-151.compute-1.amazonaws.com</host>
                <http-port>8085</http-port>
                <socket-port>6666</socket-port>
                <admin-port>6667</admin-port>
                <partitions>0,4,8,12,16,20</partitions>
        </server>
        <server>
                <id>1</id>
                <host>ec2-54-166-210-210.compute-1.amazonaws.com</host>
                <http-port>8082</http-port>
                <socket-port>6668</socket-port>
                <admin-port>6669</admin-port>
                <partitions>1,5,9,13,17,21</partitions>
        </server>
        <server>
                <id>2</id>
                <host>ec2-54-162-63-13.compute-1.amazonaws.com</host>
                <http-port>8083</http-port>
                <socket-port>6670</socket-port>
                <admin-port>6671</admin-port>
                <partitions>2,6,10,14,18,22</partitions>
        </server>
        <server>
                <id>3</id>
                <host>ec2-54-224-81-104.compute-1.amazonaws.com</host>
                <http-port>8084</http-port>
                <socket-port>6672</socket-port>
                <admin-port>6673</admin-port>
                <partitions>3,7,11,15,19,23</partitions>
        </server>
</cluster>

# The ID of *this* particular cluster node

max.threads=200
enable.repair=true

############### DB options ######################

http.enable=true
socket.enable=true
jmx.enable=true
enable.readonly.engine=true
file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher

# BDB
#bdb.write.transactions=false
#bdb.flush.transactions=false
#bdb.cache.size=1000MB

# Mysql
#mysql.host=localhost
#mysql.port=1521
#mysql.user=root
#mysql.password=3306
#mysql.database=olap

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 11, 2015, 2:42:36 PM3/11/15
to project-...@googlegroups.com
Hi Xinyu,

What about your stores.xml? Can you paste that too?


On Wednesday, March 11, 2015 at 11:06:52 AM UTC-7, xi...@tune.com wrote:
I think I have reached the bottleneck on the client side. I am using 4 r3.2xlarge nodes (each with 8 cores and 61GB RAM) for the server. The testing node connecting to the servers is an r3.4xlarge (16 cores, 30GB RAM).

Yeah, it kind of sounds like it. You'll need a lot of parallel clients.

Did you take a look at both your client and server JVM GC activity to see if your garbage collection is causing the app to hang periodically and if that is increasing as you increase the throughput?

Our servers have been able to steadily handle more than 20K qps per server sustained throughout multiple days. I'd be really surprised if you could not come close to that.

Let me show you what our server config looks like ...
ReadOnlyStorageEngine servers:
admin.streams.buffer.size=1024
bdb.enable=false
data.directory=${voldemort.data.dir}
enable.grandfather=false
enable.nio.connector=true
enable.readonly.engine=true
enable.repair=false
enable.server.routing=false
enable.verbose.logging=false
file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher
hdfs.fetcher.buffer.size=16MB
hdfs.fetcher.tmp.dir=${voldemort.home.dir}/voldemort
fetcher.retry.count=12
http.enable=false
jmx.enable=true
nio.connector.selectors=50
readonly.hadoop.config.path=${voldemort.home.dir}/config/hadoop
slop.enable=false
socket.buffer.size=65000
socket.enable=true
storage.configs=voldemort.store.readonly.ReadOnlyStorageConfiguration
voldemort.home=${voldemort.home.dir}

ReadOnly JVM config:
-server \ 
-Xms4096m \ 
-Xmx4096m \ 
-XX:NewSize=1024m \ 
-XX:MaxNewSize=1024m \ 
-XX:+UseConcMarkSweepGC \ 
-XX:+UseParNewGC \ 
-XX:CMSInitiatingOccupancyFraction=70 \ 
-XX:SurvivorRatio=2 \ 
-XX:+AlwaysPreTouch \ 
-XX:+UseCompressedOops \ 
-XX:+PrintTenuringDistribution \ 
-XX:+PrintGCDetails \ 
-XX:+PrintGCDateStamps \ 
-Xloggc:$LOG_DIR/gc.log \ 
-XX:+PrintGCApplicationStoppedTime \ 
-XX:+PrintGCTimeStamps \ 
-XX:+PrintGCApplicationConcurrentTime \ 
-Djava.net.preferIPv4Stack=true \ 

BdbStorageEngine servers:
admin.max.threads=40
bdb.cache.evictln=true
bdb.cache.size=20GB
bdb.checkpoint.interval.bytes=2147483648
bdb.cleaner.interval.bytes=15728640
bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.enable=true
bdb.evict.by.level=true
bdb.expose.space.utilization=true
bdb.lock.nLockTables=47
bdb.minimize.scan.impact=true
bdb.one.env.per.store=true
bdb.raw.property.string=je.cleaner.adjustUtilization=false
data.directory=${voldemort.data.dir}
enable.server.routing=false
enable.verbose.logging=false
http.enable=false
max.proxy.put.threads=50
nio.connector.selectors=50
num.scan.permits=2
restore.data.timeout.sec=1314000
retention.cleanup.first.start.hour=3
scheduler.threads=24
storage.configs=voldemort.store.bdb.BdbStorageConfiguration
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200
voldemort.home=${voldemort.home.dir}

BDB JVM config:
-server 
-Xms32684m
-Xmx32684m
-XX:NewSize=2048m
-XX:MaxNewSize=2048m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:SurvivorRatio=2
-XX:+AlwaysPreTouch
-XX:+UseCompressedOops
-XX:+PrintTenuringDistribution
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:$LOG_DIR/gc.log
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime

We have separate dedicated configs for each storage engine we support, as each storage engine has vastly different behavior from the others. And we do not load more than one storage engine inside any one JVM, otherwise it causes the JVM to be too unstable and makes it almost impossible to tune.

Our servers are each hit by hundreds to thousands of clients, maintaining thousands to tens of thousands of connections and individual servers often handle tens of thousands of queries a second. Clusters can often serve more than half a million queries per second in our production infrastructure.

Brendan

xi...@tune.com

Mar 11, 2015, 4:07:59 PM3/11/15
to project-...@googlegroups.com
Hi Brendan,

Here is the stores.xml. I will also go through your server setup and see if I can improve the performance.

Thanks,

Xinyu
<store>
  <name>clicks</name>
  <persistence>read-only</persistence>
  <routing>client</routing>
  <replication-factor>2</replication-factor>
  <required-reads>1</required-reads>
  <required-writes>1</required-writes>
  <key-serializer>
    <type>string</type>
  </key-serializer>
  <value-serializer>
    <type>string</type>
  </value-serializer>
  <retention-days>30</retention-days>
</store>