what is the proper way to create more clients to test the reading performance?


xi...@tune.com

Mar 6, 2015, 1:00:14 PM3/6/15
to project-...@googlegroups.com
Hi,

We want to test the Voldemort cluster's maximum throughput (server-side), so we are trying to open as many clients as possible on our testing node to query the store.

But the problem I am having right now is that I can only create up to 47 clients in the program. What is the proper setup to increase that number to 200 or 500?

Thanks,

Xinyu

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 6, 2015, 1:38:48 PM3/6/15
to project-...@googlegroups.com
Hi Xinyu,

First of all, how are you creating your clients?

Second, I recommend against trying to create a bunch of clients in one JVM instance or even on one host. You'll hit all kinds of limiting factors both from the JVM and from the host.

For ideal testing, I would not recommend creating more than 4 clients per JVM instance, and no more than one or two JVMs per CPU core (depending on the CPU type). You'll need to set your client's max_connections and selectors somewhat high, depending on how many QPS you drive. Try starting out with max_connections at 100 and selectors at 50. You might need them higher than that, though. But the more connections you allow, the more memory you consume.

On the server side, you'll probably need to set the selectors higher than default too.

You'll also want to mix up your queries and have a large number of keys to work with to better ensure that you're not getting results purely from memory.
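One simple way to do this is to pre-generate a large key space and shuffle it per thread, so no two threads walk the keys in the same (cacheable) order. A minimal sketch in plain Java — the `key-N` format and the key count here are arbitrary assumptions for illustration, not something from this thread:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KeyMix {
    // Build a large key space and return the keys in randomized order, so
    // repeated reads don't keep hitting the same (already cached) keys.
    static List<String> randomizedKeys(int count, long seed) {
        List<String> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            keys.add("key-" + i);
        }
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }

    public static void main(String[] args) {
        // Each benchmark thread would iterate its own shuffled copy and issue
        // one store read per key (e.g. client.get(key) with the Voldemort client).
        List<String> keys = randomizedKeys(1_000_000, 42L);
        System.out.println("keys=" + keys.size());
    }
}
```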

Keep an eye on both the client and server JVM GC activity and make sure that you're not falling behind on freeing up memory or failing to allocate memory. If you are, tune your JVM(s) appropriately.

Brendan

xi...@tune.com

Mar 6, 2015, 1:53:58 PM3/6/15
to project-...@googlegroups.com
Hi Brendan,

Thanks for the answer. Currently I create one SocketStoreClientFactory in the main program and one client for each thread I use to query the server. With 20 clients we get about 30,000 requests/s, which is pretty low. The server side is not busy at all.
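For what it's worth, those numbers suggest the clients are latency-bound rather than saturating the servers: 30,000 requests/s across 20 synchronous clients is 1,500 requests/s per client, i.e. roughly 0.67 ms per round trip. A quick back-of-the-envelope check:

```java
public class ThroughputCheck {
    static double perThreadQps(double totalQps, int threads) {
        return totalQps / threads;
    }

    static double avgLatencyMs(double perThreadQps) {
        // A synchronous client doing N requests/s spends 1000/N ms per request.
        return 1000.0 / perThreadQps;
    }

    public static void main(String[] args) {
        double perThread = perThreadQps(30_000, 20); // 1,500 requests/s per client
        double latency = avgLatencyMs(perThread);    // ~0.67 ms per round trip
        System.out.printf("per-thread=%.0f req/s, implied avg latency=%.2f ms%n",
                perThread, latency);
    }
}
```

If each request really takes ~0.67 ms end to end, the only way to push total throughput higher is more concurrent requests in flight, which matches the advice in this thread to add connections, selectors, and more client JVMs/hosts.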

The testing node has 16 cores and 30GB memory.

Xinyu

Arunachalam

Mar 6, 2015, 2:41:50 PM3/6/15
to project-...@googlegroups.com
Xinyu,
         The SocketStoreClientFactory holds the resources used to talk to the server. There are many parameters to tune, but the most interesting ones are connections per node and selectors. The default is 5; increase it to 200 so that 200 parallel requests can happen. You might also need to tune the selectors on the server side.

But start with the client.

ClientConfig clientConfig = new ClientConfig()
        .setBootstrapUrls(url)
        .setMaxConnectionsPerNode(200)
        .setSelectors(30);

StoreClientFactory factory = new SocketStoreClientFactory(clientConfig);

Creating more clients does not add resources; in fact it makes no difference whether you reuse the same client or create a new client from the same factory. Try increasing the number of SocketStoreClientFactory instances. At some point your network is going to become the bottleneck, and then you will need to spread the clients across more machines.
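A minimal sketch of the pattern Arun describes — several factories, each driving its own group of worker threads. This is plain-JDK scaffolding only: `queryStore()` is a placeholder, not a Voldemort API, and the comments mark where the real SocketStoreClientFactory / StoreClient calls from this thread would go:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSketch {
    // Runs factories * threadsPerFactory worker threads; each performs
    // opsPerThread stubbed store reads. Returns the total completed count.
    static long run(int factories, int threadsPerFactory, int opsPerThread)
            throws InterruptedException {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(factories * threadsPerFactory);
        for (int f = 0; f < factories; f++) {
            // Real code: one SocketStoreClientFactory per group, built from a ClientConfig.
            for (int t = 0; t < threadsPerFactory; t++) {
                pool.submit(() -> {
                    // Real code: StoreClient<String, String> client = factory.getStoreClient("clicks");
                    for (int i = 0; i < opsPerThread; i++) {
                        queryStore(); // stub standing in for client.get(key)
                        completed.incrementAndGet();
                    }
                });
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    static void queryStore() { /* placeholder for the actual Voldemort read */ }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + run(4, 4, 1000));
    }
}
```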

Thanks,
Arun.



Felix GV

Mar 6, 2015, 3:07:32 PM3/6/15
to project-...@googlegroups.com
Hi,

I'd be curious to know how many client machines and server machines you are using for your test.

Thanks (:


--
 
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv


Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 7, 2015, 3:41:57 AM3/7/15
to project-...@googlegroups.com
Hi Xinyu,

Like Arun said, and like I said earlier, you're going to hit both JVM and host (possibly even network) limitations if you don't spin up more individual JVM instances and spread them out. Just because you have a lot of RAM and a few multicore CPUs in one host does not mean that you're going to be able to drive a true stress-test workload out of one JVM running a bunch of clients. Also, like I mentioned before, you're going to need to watch your JVMs' GC activity and will likely have to do some JVM tuning to get past those limiting factors.

And like Felix said, it would also help to know how many hosts you're working with (it looks like you're using just one client host? But how many Voldemort servers?). Can you also paste your cluster.xml, stores.xml and server.properties?


Brendan

xi...@tune.com

Mar 11, 2015, 2:06:52 PM3/11/15
to project-...@googlegroups.com
Hi Brendan,

I think I have reached the bottleneck on the client side. I am using 4 r3.2xlarge nodes (each with 8 cores and 61GB RAM) for the server. The testing node connecting to the servers is an r3.4xlarge (16 cores, 30GB RAM).

I have tried increasing the connections on the client side with setMaxConnectionsPerNode(500).setSelectors(100), but this didn't improve the performance at all.


Thanks,
Xinyu
cluster.xml
<cluster>
        <name>dataflow</name>
        <server>
                <id>0</id>
                <host>ec2-54-224-29-151.compute-1.amazonaws.com</host>
                <http-port>8085</http-port>
                <socket-port>6666</socket-port>
                <admin-port>6667</admin-port>
                <partitions>0,4,8,12,16,20</partitions>
        </server>
        <server>
                <id>1</id>
                <host>ec2-54-166-210-210.compute-1.amazonaws.com</host>
                <http-port>8082</http-port>
                <socket-port>6668</socket-port>
                <admin-port>6669</admin-port>
                <partitions>1,5,9,13,17,21</partitions>
        </server>
        <server>
                <id>2</id>
                <host>ec2-54-162-63-13.compute-1.amazonaws.com</host>
                <http-port>8083</http-port>
                <socket-port>6670</socket-port>
                <admin-port>6671</admin-port>
                <partitions>2,6,10,14,18,22</partitions>
        </server>
        <server>
                <id>3</id>
                <host>ec2-54-224-81-104.compute-1.amazonaws.com</host>
                <http-port>8084</http-port>
                <socket-port>6672</socket-port>
                <admin-port>6673</admin-port>
                <partitions>3,7,11,15,19,23</partitions>
        </server>
</cluster>

# The ID of *this* particular cluster node

max.threads=200
enable.repair=true

############### DB options ######################

http.enable=true
socket.enable=true
jmx.enable=true
enable.readonly.engine=true
file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher

# BDB
#bdb.write.transactions=false
#bdb.flush.transactions=false
#bdb.cache.size=1000MB

# Mysql
#mysql.host=localhost
#mysql.port=1521
#mysql.user=root
#mysql.password=3306
#mysql.database=olap

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 11, 2015, 2:42:36 PM3/11/15
to project-...@googlegroups.com
Hi Xinyu,

What about your stores.xml? Can you paste that too?


On Wednesday, March 11, 2015 at 11:06:52 AM UTC-7, xi...@tune.com wrote:
I think I have reached the bottleneck on the client side. I am using 4 r3.2xlarge nodes (each with 8 cores and 61GB RAM) for the server. The testing node connecting to the servers is an r3.4xlarge (16 cores, 30GB RAM).

Yeah, it kind of sounds like it. You'll need a lot of parallel clients.

Did you take a look at both your client and server JVM GC activity to see if your garbage collection is causing the app to hang periodically and if that is increasing as you increase the throughput?

Our servers have been able to steadily handle more than 20K qps per server sustained throughout multiple days. I'd be really surprised if you could not come close to that.

Let me show you what our server config looks like ...
ReadOnlyStorageEngine servers:
admin.streams.buffer.size=1024
bdb.enable=false
data.directory=${voldemort.data.dir}
enable.grandfather=false
enable.nio.connector=true
enable.readonly.engine=true
enable.repair=false
enable.server.routing=false
enable.verbose.logging=false
file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher
hdfs.fetcher.buffer.size=16MB
hdfs.fetcher.tmp.dir=${voldemort.home.dir}/voldemort
fetcher.retry.count=12
http.enable=false
jmx.enable=true
nio.connector.selectors=50
readonly.hadoop.config.path=${voldemort.home.dir}/config/hadoop
slop.enable=false
socket.buffer.size=65000
socket.enable=true
storage.configs=voldemort.store.readonly.ReadOnlyStorageConfiguration
voldemort.home=${voldemort.home.dir}

ReadOnly JVM config:
-server \ 
-Xms4096m \ 
-Xmx4096m \ 
-XX:NewSize=1024m \ 
-XX:MaxNewSize=1024m \ 
-XX:+UseConcMarkSweepGC \ 
-XX:+UseParNewGC \ 
-XX:CMSInitiatingOccupancyFraction=70 \ 
-XX:SurvivorRatio=2 \ 
-XX:+AlwaysPreTouch \ 
-XX:+UseCompressedOops \ 
-XX:+PrintTenuringDistribution \ 
-XX:+PrintGCDetails \ 
-XX:+PrintGCDateStamps \ 
-Xloggc:$LOG_DIR/gc.log \ 
-XX:+PrintGCApplicationStoppedTime \ 
-XX:+PrintGCTimeStamps \ 
-XX:+PrintGCApplicationConcurrentTime \ 
-Djava.net.preferIPv4Stack=true \ 

BdbStorageEngine servers:
admin.max.threads=40
bdb.cache.evictln=true
bdb.cache.size=20GB
bdb.checkpoint.interval.bytes=2147483648
bdb.cleaner.interval.bytes=15728640
bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.enable=true
bdb.evict.by.level=true
bdb.expose.space.utilization=true
bdb.lock.nLockTables=47
bdb.minimize.scan.impact=true
bdb.one.env.per.store=true
bdb.raw.property.string=je.cleaner.adjustUtilization=false
data.directory=${voldemort.data.dir}
enable.server.routing=false
enable.verbose.logging=false
http.enable=false
max.proxy.put.threads=50
nio.connector.selectors=50
num.scan.permits=2
restore.data.timeout.sec=1314000
retention.cleanup.first.start.hour=3
scheduler.threads=24
storage.configs=voldemort.store.bdb.BdbStorageConfiguration
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200
voldemort.home=${voldemort.home.dir}

BDB JVM config:
-server 
-Xms32684m
-Xmx32684m
-XX:NewSize=2048m
-XX:MaxNewSize=2048m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:SurvivorRatio=2
-XX:+AlwaysPreTouch
-XX:+UseCompressedOops
-XX:+PrintTenuringDistribution
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:$LOG_DIR/gc.log
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime

We have separate dedicated configs for each storage engine we support, as each storage engine has vastly different behavior from the others. And we do not load more than one storage engine inside any one JVM, otherwise it causes the JVM to be too unstable and makes it almost impossible to tune.

Our servers are each hit by hundreds to thousands of clients, maintaining thousands to tens of thousands of connections and individual servers often handle tens of thousands of queries a second. Clusters can often serve more than half a million queries per second in our production infrastructure.

Brendan

xi...@tune.com

Mar 11, 2015, 4:07:59 PM3/11/15
to project-...@googlegroups.com
Hi Brendan,

Here is the stores.xml. I will also go through your server setup and see if I can improve the performance.

Thanks,

Xinyu
<store>
  <name>clicks</name>
  <persistence>read-only</persistence>
  <routing>client</routing>
  <replication-factor>2</replication-factor>
  <required-reads>1</required-reads>
  <required-writes>1</required-writes>
  <key-serializer>
    <type>string</type>
  </key-serializer>
  <value-serializer>
    <type>string</type>
  </value-serializer>
  <retention-days>30</retention-days>
</store>