Tuning cluster for large value

Mital Parmar

Mar 8, 2016, 4:57:55 PM

to project-voldemort

Hi

My cluster's average put and get size is 90KB. I am running on SSD-based servers with a 60 GB JVM heap, and I am setting bdb.cache.evictln=false.

Is there any other setting that I need to change to optimize my puts?

Has anyone tried this before and experimented with various settings?

Thanks

Mital

Arunachalam

Mar 8, 2016, 6:15:05 PM

to project-...@googlegroups.com

What latency are you getting for writes? What version are you using on the client and the server?

We generally run with bdb.cache.evictln=true to get better performance.

Please read through the following to see what other settings you can try.

https://github.com/voldemort/voldemort/blob/master/bin/PREUPGRADE_FOR_1_1_X_README

-- Ability to move data off disk. This is very GC friendly, relying on OS page cache for the data and using the JVM heap only for index. This is achieved by setting "bdb.cache.evictln" server parameter to "true"

-- Ability to evict data brought into the cache during scans, minimize impact on online traffic (Restore, Rebalance, Retention). This is achieved by setting "bdb.minimize.scan.impact" to "true"

-- Thinner storage layer. eg: BdbStorageEngine.put() does not incur the cost of an additional delete()

Thanks,

Arun.


Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 8, 2016, 9:44:04 PM

to project-voldemort

Mital,

Like Arun said, you should set bdb.cache.evictln=true. Setting it to false is an optimization for slow spinning disks at the cost of increased JVM heap usage (and consequently longer GC times), but you're on SSD. So, you'll actually have better performance if you set this to true.

Can you also give us more details on your read and write patterns?

- put() per second

- get() per second

- getAll() per second

- delete() per second

- For put(), ratio of updates versus creates (overwrite versus new key)

- For getAll(), average and max number of keys per call

- Average and max key size

- Average and max value size

- How many stores on the cluster

You're best off making the JVM as small as possible, but you will need to have enough bdb cache to hold the hotset of the indexes of the stores.

A quick performance enhancement out of the box is to remove the ReadOnlyStorageEngineConfiguration from the storage.configs parameter. That way you only have the BdbStorageEngine running in your app.

This is one of our cluster configurations that hosts 50 stores and gets about a peak of 70,000 writes a second against billions of keys:

admin.max.threads=40

bdb.cache.evictln=true

bdb.cache.size=20GB

bdb.checkpoint.interval.bytes=2147483648

bdb.cleaner.interval.bytes=15728640

bdb.cleaner.lazy.migration=false

bdb.cleaner.min.file.utilization=0

bdb.cleaner.threads=1

bdb.enable=true

bdb.evict.by.level=true

bdb.expose.space.utilization=true

bdb.lock.nLockTables=47

bdb.minimize.scan.impact=true

bdb.one.env.per.store=true

bdb.raw.property.string=je.cleaner.adjustUtilization=false

data.directory=${voldemort.data.dir}

enable.server.routing=false

enable.verbose.logging=false

http.enable=false

max.proxy.put.threads=50

nio.connector.selectors=50

num.scan.permits=2

restore.data.timeout.sec=1314000

retention.cleanup.first.start.hour=3

scheduler.threads=24

slop.frequency.ms=600000

storage.configs=voldemort.store.bdb.BdbStorageConfiguration

stream.read.byte.per.sec=209715200

stream.write.byte.per.sec=78643200

voldemort.home=${voldemort.home.dir}

Some of the settings, like bdb.cleaner.threads, bdb.checkpoint.interval.bytes and bdb.cleaner.interval.bytes depend heavily on how frequently you create new keys and overwrite existing keys and how large the average and peak write sizes are.

We host that config in a 31gb Xms/Xmx JVM heap with UseCompressedOops set.
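The byte-valued settings in the configuration above are easier to reason about in binary units. The following is a minimal Python sketch, not part of Voldemort: the helper names `human` and `writes_per_interval` are made up for illustration, and the 90KB figure is the average value size from the original question. It assumes the interval settings count bytes written to the BDB log and ignores per-record overhead, so the write counts are only rough.

```python
# Byte-valued settings copied from the configuration above.
SETTINGS = {
    "bdb.checkpoint.interval.bytes": 2147483648,
    "bdb.cleaner.interval.bytes": 15728640,
    "stream.read.byte.per.sec": 209715200,
    "stream.write.byte.per.sec": 78643200,
}


def human(n):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:g} {unit}"
        value /= 1024


def writes_per_interval(interval_bytes, avg_write_bytes):
    """Rough number of writes of the given size that fill one interval."""
    return interval_bytes // avg_write_bytes


if __name__ == "__main__":
    avg_write = 90 * 1024  # assumed: the 90KB average value size from the question
    for name, n in SETTINGS.items():
        print(f"{name} = {n} ({human(n)})")
    for name in ("bdb.checkpoint.interval.bytes", "bdb.cleaner.interval.bytes"):
        count = writes_per_interval(SETTINGS[name], avg_write)
        print(f"{name}: roughly {count} writes of 90KB per interval")
```

With 90KB values, the cleaner interval above is crossed after only about 170 writes, which is one reason these settings depend so much on write size and rate.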

Mital Parmar

Mar 9, 2016, 12:50:35 PM

to project-voldemort

Thanks guys for your feedback.

Here are the changes that I made and the resulting response times:

(1) Earlier I was running on slow storage, where my put was taking 70 millisec.

(2) Next, I moved to SSD and it dropped to 27 millisec.

(3) Observed the GC pressure and bumped up the ParNew setting. Write latency dropped from 27 to 10 millisec.

(4) Bumped up the JVM from 40 to 60 GB (40 GB BDB cache) and disabled bdb.cache.evictln. Now my write latency is 6 millisec.

I do not know how Voldemort handles the 90KB writes. I think the default buffer size is 64KB. We are on 1.10.

My reads are also 80KB.

Do you think I should increase the default buffer size to 128KB or higher? Does this require a source code change or a property change?

Has anyone experimented with this in the past, or does anyone see any issue with it?

Thanks

Mital

Arunachalam

Mar 9, 2016, 1:45:44 PM

to project-...@googlegroups.com

Are the latency numbers you mention an average or a percentile? If a percentile, which one (90, 95, 99)?

Are the clients running on the latest version too? The clients should preferably be on 1.10 as well, so that they can make use of the many performance optimizations we did.

A Voldemort write normally involves two round trips to the server: first it reads the version of the record, then it writes a new version. So 6 ms end-to-end latency is good.

If you are always doing read/modify/write, you will already have the version, which avoids one round trip and cuts the two round trips to one.
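The round-trip saving can be illustrated with a toy model. This is a minimal Python sketch, not the Voldemort client API: `ToyServer`, `blind_put` and `versioned_put` are hypothetical names, and a plain integer stands in for the version (Voldemort itself uses vector clocks). It only counts how many requests each style of write sends.

```python
class ToyServer:
    """In-memory stand-in for a server that counts requests (not Voldemort)."""

    def __init__(self):
        self.data = {}        # key -> (version, value)
        self.round_trips = 0  # one per request received

    def get_version(self, key):
        self.round_trips += 1
        return self.data.get(key, (0, None))[0]

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key, (0, None))

    def put(self, key, version, value):
        self.round_trips += 1
        current = self.data.get(key, (0, None))[0]
        if version <= current:
            raise ValueError("obsolete version")
        self.data[key] = (version, value)


def blind_put(server, key, value):
    """Write without a known version: fetch the version, then write."""
    version = server.get_version(key)
    server.put(key, version + 1, value)


def versioned_put(server, key, known_version, value):
    """Write when the caller already holds the version from an earlier read."""
    server.put(key, known_version + 1, value)


if __name__ == "__main__":
    server = ToyServer()
    blind_put(server, "k", "a")
    print("blind put round trips:", server.round_trips)

    version, value = server.get("k")  # the read the application needed anyway
    before = server.round_trips
    versioned_put(server, "k", version, value + "b")
    print("versioned put round trips:", server.round_trips - before)
```

In the read/modify/write case the read is one the application had to do regardless, so the write itself costs a single request.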

Thanks,

Arun.


Mital Parmar

Mar 9, 2016, 4:40:59 PM

to project-voldemort

Sorry Arun, I forgot to mention the client version. Yes, the client is also running on 1.10.

The value that I mentioned was the average. The 95th percentile is 10 millisec and the 99th percentile is 18.7 millisec.
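The gap between an average and the tail percentiles is easy to see on a small sample. This is a minimal Python sketch using the nearest-rank percentile definition; the helper names are made up and the latency samples are invented for illustration, not measurements from this cluster.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean together with the 95th and 99th percentiles."""
    return {
        "mean": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


if __name__ == "__main__":
    # Hypothetical put latencies in milliseconds: mostly fast, a few slow outliers.
    latencies_ms = [4.0] * 90 + [10.0] * 8 + [50.0] * 2
    print(summarize(latencies_ms))
```

A handful of slow requests barely moves the mean but dominates the 99th percentile, which is why it matters which of the two a reported number is.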

In my other Voldemort implementation, I am getting latencies in microsec, so I am trying to see if I can further improve my write performance. By the way, for reads I am seeing 0.66 millisec average response time.

Would you recommend changing any other settings that might be worthwhile for large value sizes?

Thanks

Mital


Arunachalam

Mar 9, 2016, 5:33:22 PM

to project-...@googlegroups.com

Are you measuring the latency on the server side or the client side? Did you try the parameters that Brendan mentioned? What is your BDB checkpoint interval?

Also, do you have high contention on the keys being written? Brendan has more experience with BDB tuning than I do.

Thanks,

Arun.


Mital Parmar

Mar 9, 2016, 9:05:25 PM

to project-voldemort

Hi Arun

I am reviewing the parameters that Brendan suggested and comparing against my setup.

I am measuring server side put & get latency.

Thanks

Mital


Mital Parmar

Mar 14, 2016, 12:57:30 PM

to project-voldemort

I reviewed the parameters suggested by Brendan and found some differences.

I am not setting these properties in 1.10:

enable.bdb.engine=true

bdb.sync.transactions=false ( I am setting bdb.write.transactions=false & bdb.flush.transactions=false )

bdb.enable=true

bdb.evict.by.level=true

enable.server.routing=false

restore.data.timeout.sec=1314000

scheduler.threads=24

stream.read.byte.per.sec=209715200

stream.write.byte.per.sec=78643200

Are the above changes safe to make in a live cluster? Since I am not specifying enable.server.routing, it seems like I am using server routing, which is not recommended?

FYI, here is my server.properties. This is a read/write cluster.

max.threads=100

http.enable=true

socket.enable=true

# BDB

bdb.write.transactions=false

bdb.flush.transactions=false

bdb.cache.size=30g

bdb.lock.read_uncommitted=false

bdb.one.env.per.store=true

bdb.lock.nLockTables=47

bdb.checkpointer.off.batch.writes=true

bdb.cleaner.interval.bytes=15728640

bdb.cleaner.lazy.migration=false

bdb.cleaner.min.file.utilization=0

bdb.cleaner.threads=1

bdb.cache.evictln=false

bdb.minimize.scan.impact=true

enable.nio.connector=true

socket.keepalive=true

nio.connector.selectors=64

enable.readonly.engine=false

request.format=vp3

storage.configs=voldemort.store.bdb.BdbStorageConfiguration
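A comparison like this can be scripted rather than done by eye. This is a minimal Python sketch, not a Voldemort tool: `parse_properties` and `diff_properties` are made-up helper names, and the two sample inputs are short excerpts of the configurations posted in this thread. A key missing from one file falls back to whatever the server's default is, which this sketch does not know.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def diff_properties(mine, reference):
    """Return (keys only in reference, keys only in mine, differing values)."""
    only_reference = sorted(set(reference) - set(mine))
    only_mine = sorted(set(mine) - set(reference))
    changed = {
        key: (mine[key], reference[key])
        for key in sorted(set(mine) & set(reference))
        if mine[key] != reference[key]
    }
    return only_reference, only_mine, changed


if __name__ == "__main__":
    # Short excerpts of the two configurations posted in this thread.
    mine = parse_properties("""
    # BDB
    bdb.cache.size=30g
    bdb.cache.evictln=false
    bdb.cleaner.threads=1
    http.enable=true
    nio.connector.selectors=64
    """)
    reference = parse_properties("""
    bdb.cache.size=20GB
    bdb.cache.evictln=true
    bdb.cleaner.threads=1
    bdb.evict.by.level=true
    http.enable=false
    nio.connector.selectors=50
    """)
    only_reference, only_mine, changed = diff_properties(mine, reference)
    print("only in reference:", only_reference)
    print("only in mine:", only_mine)
    for key, (a, b) in changed.items():
        print(f"{key}: mine={a} reference={b}")
```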

Thanks

Mital


Abhay Kumar Dwivedi

Jul 5, 2018, 9:45:23 AM

to project-voldemort

Hi Zonia,

I am upgrading our production Voldemort to 1.10.25. It would be very helpful if you could answer my queries below:

1. How does Voldemort clean data from the cache?

2. How does Voldemort do memory management?

3. Can we increase/decrease the number of partitions from 2048?

4. How do we configure a local cache and a global cache in Voldemort?
5. Can we specify a replication factor of 3 (for example) for the global store while there is no replication factor for the local store?

My server.properties configuration for the 0.90.X version of Voldemort is:

node.id=0

max.threads=200

############### DB options ######################

http.enable=false

socket.enable=true

#jmx.enable=true

# BDB

bdb.write.transactions=false

bdb.flush.transactions=false

bdb.cache.size=2000MB

#New Token

#The number of threads to keep alive even when idle

core.threads = 100

#Essentially the amount of time to block on a low-level network operation before throwing an error.

socket.timeout.ms = 5000

#The total amount of time to wait for adequate responses from all nodes before throwing an error.

routing.timeout.ms = 5000

Thanks in advance.

Regards,

Abhay
