We bumped scheduler.threads up to 24, but it did not help. Here is what our server.properties currently looks like:
admin.enable=true
admin.max.threads=40
bdb.cache.evictln=true
bdb.cache.size=15GB
enable.bdb.engine=true
bdb.checkpointer.off.batch.writes=true
bdb.cleaner.interval.bytes=15728640
bdb.sync.transactions=false
bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.enable=true
bdb.evict.by.level=true
enable.readonly.engine=false
bdb.expose.space.utilization=true
bdb.lock.nLockTables=47
bdb.minimize.scan.impact=true
bdb.one.env.per.store=true
enable.server.routing=false
enable.verbose.logging=false
http.enable=true
nio.connector.selectors=50
num.scan.permits=2
request.format=vp3
restore.data.timeout.sec=1314000
scheduler.threads=24
socket.enable=true
storage.configs=voldemort.store.bdb.BdbStorageConfiguration
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200
client.max.connections.per.node=100
As you can see, we have tried adjusting most of the settings that we think affect the rebalance process, but we cannot get a consistent throughput above 10 MBps. In some cases, throughput on the nodes doing rebalance work rises to 20-30 MBps, but it settles back down to 10 MBps as the jobs complete. Either way, it is far below the stream.write.byte.per.sec setting of 78 MBps. We have 10 GBps of network bandwidth available. We would really like to know what we are missing here, so any help is highly appreciated.
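
To put the gap in numbers, here is a quick sketch in Python (a standalone calculation, not part of Voldemort) that converts the two stream throttle values from the properties above into MB/s and compares them with what we observe. The helper names are made up for illustration, and it assumes our observed 10 MBps figure is in decimal megabytes per second.

```python
# Throttle settings copied from the server.properties above (bytes per second).
SETTINGS = {
    "stream.read.byte.per.sec": 209715200,
    "stream.write.byte.per.sec": 78643200,
}

# Steady-state rebalance throughput we observe.
# Assumption: this is decimal MB/s (1 MB = 1,000,000 bytes).
OBSERVED_MB_PER_SEC = 10


def to_mb_per_sec(bytes_per_sec):
    """Convert bytes/s to decimal megabytes/s."""
    return bytes_per_sec / 1_000_000


def to_mib_per_sec(bytes_per_sec):
    """Convert bytes/s to binary mebibytes/s (1 MiB = 1,048,576 bytes)."""
    return bytes_per_sec / (1024 * 1024)


def utilization(observed_mb_per_sec, limit_bytes_per_sec):
    """Fraction of the configured limit that the observed rate reaches."""
    return observed_mb_per_sec / to_mb_per_sec(limit_bytes_per_sec)


if __name__ == "__main__":
    for name, limit in SETTINGS.items():
        print(
            f"{name}: {to_mb_per_sec(limit):.1f} MB/s "
            f"({to_mib_per_sec(limit):.0f} MiB/s), "
            f"observed rate is {utilization(OBSERVED_MB_PER_SEC, limit):.1%} of the limit"
        )
```

So the write throttle works out to 75 MiB/s (about 78.6 MB/s), and at 10 MBps we are only using roughly an eighth of it, which is why we do not think the throttle itself is what is holding us back.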