Voldemort rebalance throughput


chinmay gupte

Mar 14, 2016, 5:22:13 PM
to project-...@googlegroups.com
Hi,

We are testing whether we can increase the throughput of the rebalance process, specifically for zone expansion, and were looking into these parameters:
stream.read.byte.per.sec
stream.write.byte.per.sec
We observed that even after setting these to non-default values in server.properties and bouncing the cluster to pick up the changes, the rebalance throughput stays at the default of 10 MBps.
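Concretely, this is the kind of override we put in server.properties (our current values; 209715200 bytes/s is roughly 200 MB/s and 78643200 bytes/s roughly 75 MB/s):

```
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200
```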
Also, setting a higher value of rebalance parallelism seems to degrade performance.
We are running Voldemort version 1.6.9.
Is there something we are missing? Also, what other parameters might dictate the rebalance throughput? (I see some mentioned at http://www.project-voldemort.com/voldemort/rebalance.html but want to get an opinion anyway.)
Thanks,
Chinmay

Arunachalam

Mar 14, 2016, 9:32:34 PM
to project-...@googlegroups.com
How many admin/scheduler threads do you have? The rebalance operation runs on a scheduler thread, and there are 6 scheduler threads by default.

Admin threads do not affect throughput to the same degree as scheduler threads, but if you run out of them, streaming could be impacted as well.

Thanks,
Arun.


chinmay gupte

Mar 14, 2016, 10:02:15 PM
to project-voldemort
Hi Arun,

Which of the following two settings should we try?

admin.max.threads=
async_job_thread_pool_size

We tried bumping up the admin.max.threads but it did not help.

Thanks,
Chinmay

Arunachalam

Mar 15, 2016, 1:41:02 AM
to project-...@googlegroups.com
I believe the setting is called scheduler.threads.

Thanks,
Arun.

chinmay gupte

Mar 15, 2016, 7:26:35 PM
to project-...@googlegroups.com
Hi Arun,

We bumped up scheduler.threads to 24 but it did not help. Here is what our server.properties looks like currently:

admin.enable=true
admin.max.threads=40

bdb.cache.evictln=true
bdb.cache.size=15GB

enable.bdb.engine=true
bdb.checkpoint.interval.bytes=2147483648

bdb.checkpointer.off.batch.writes=true
bdb.cleaner.interval.bytes=15728640
bdb.sync.transactions=false

bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.enable=true
bdb.evict.by.level=true

enable.readonly.engine=false
bdb.expose.space.utilization=true
bdb.lock.nLockTables=47
bdb.minimize.scan.impact=true
bdb.one.env.per.store=true

enable.server.routing=false
enable.verbose.logging=false
http.enable=true
nio.connector.selectors=50
num.scan.permits=2
request.format=vp3
restore.data.timeout.sec=1314000
scheduler.threads=24
socket.enable=true
storage.configs=voldemort.store.bdb.BdbStorageConfiguration

stream.read.byte.per.sec=209715200

stream.write.byte.per.sec=78643200

client.max.connections.per.node=100

As you can see, we have tried most of the settings we think affect the rebalance process, but we are not getting a sustained throughput above 10 MBps. In some cases the throughput on the nodes doing rebalance work rises to 20-30 MBps, but it settles back to 10 MBps as the jobs complete. Either way, it is far below the stream.write.byte.per.sec setting of 78 MBps. We have 10 Gbps of network bandwidth available. We would really like to know what we are missing here, so any help is highly appreciated.
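As a sanity check on the numbers: a stream.*.byte.per.sec limiter is, at least conceptually, a per-window byte budget. Here is my illustrative sketch of that model in Python; this is not Voldemort's actual throttler code, just the behavior we expected from the setting:

```python
import time

class ByteRateThrottler:
    """Fixed-window bytes-per-second throttle (illustrative sketch only,
    not Voldemort's actual implementation)."""

    def __init__(self, bytes_per_sec, window_sec=1.0):
        self.rate = bytes_per_sec
        self.window = window_sec
        self.window_start = time.monotonic()
        self.sent = 0

    def maybe_throttle(self, nbytes):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # window expired: start a fresh budget
            self.window_start = now
            self.sent = 0
        self.sent += nbytes
        if self.sent > self.rate * self.window:
            # budget exhausted: sleep out the rest of the window
            remaining = self.window - (now - self.window_start)
            if remaining > 0:
                time.sleep(remaining)
            self.window_start = time.monotonic()
            self.sent = 0
```

If something like this is in effect, then with a 78643200 B/s budget a sustained 10 MBps means the writers never even hit the throttle, which is why we suspect something else is capping us.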

Thanks,
Chinmay

Arunachalam

Mar 16, 2016, 4:25:04 PM
to project-...@googlegroups.com
Chinmay, 
     How many partitions do you have? Are you running on SSDs or spinning drives?

Can you also try increasing num.scan.permits to, say, 5?
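For what it's worth, the way I reason about num.scan.permits (my mental model, not the exact Voldemort implementation) is as a shared semaphore capping how many scan-heavy jobs can hit the store at once, so raising it lets more rebalance/streaming jobs make progress concurrently. Roughly:

```python
import threading

# analogous to num.scan.permits=5: at most 5 scan-heavy jobs run at once
SCAN_PERMITS = threading.Semaphore(5)

def scan_job(job_id, completed):
    # a rebalance/streaming job must hold a permit while it scans the store
    with SCAN_PERMITS:
        completed.append(job_id)  # stand-in for the actual disk scan

completed = []
threads = [threading.Thread(target=scan_job, args=(i, completed)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```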

Thanks,
Arun.

cgu...@apple.com

Mar 16, 2016, 8:37:48 PM
to project-voldemort
Hi Arun,

We are using 114 partitions for this test cluster, running on SSDs in RAID 1+0.

We bumped up the num.scan.permits setting but it did not help. Any other setting you can think of that we should try? Right now it looks like we have exhausted all the known settings and are slowly moving into the performance-profiling domain :)

Thanks,
Chinmay

Arunachalam

Mar 16, 2016, 8:53:44 PM
to project-...@googlegroups.com
At this point, I don't know what is going on; pinpointing the bottleneck is difficult. Profiling, together with CPU/GC/IO activity, will help narrow down the problem.

Also, 1.6.9 is an older version of the code base, around 3 years old. There are a few inefficiencies in the Admin code, but I don't believe they would be the bottleneck here.

Thanks,
Arun.

cgu...@apple.com

Mar 16, 2016, 9:33:34 PM
to project-voldemort
OK. Thanks for your help, Arun. Let me start digging deeper and get back with the cause of this bottleneck if I find anything interesting.

As for version 1.6.9: yes, we are slowly moving to 1.10 and will likely see performance improvements there. But the clusters we are zone-expanding are on 1.6.9, and it's not possible for us to upgrade them to 1.10 right now.

Cheers,
Chinmay

Arunachalam

Mar 16, 2016, 9:47:35 PM
to project-...@googlegroups.com
1.6.9 to 1.10 does not have any backward-incompatible changes. You should be able to shut a node down, upgrade the binary, and restart it on 1.10.

But you might want to run one node in this configuration for a week, to confirm you are not running into issues, before you upgrade your entire cluster.

Thanks,
Arun.