Tuning cluster for large values

Mital Parmar

Mar 8, 2016, 4:57:55 PM
to project-voldemort
Hi

The average put and get size on my cluster is 90 KB. I am running on SSD-based servers with a 60 GB JVM heap, and I have set bdb.cache.evictln=false.

Are there any other settings I need to change to optimize my puts?

Has anyone tried this before and experimented with different settings?

Thanks

Mital

Arunachalam

Mar 8, 2016, 6:15:05 PM
to project-...@googlegroups.com
What write latency are you seeing? What versions are you running on the client and the server?

We generally run with bdb.cache.evictln=true to get better performance.

Please read through the following for other settings you can try.


-- Ability to move data off heap. This is very GC friendly, relying on the OS page
   cache for the data and using the JVM heap only for the index. This is achieved
   by setting the "bdb.cache.evictln" server parameter to "true".
-- Ability to evict data brought into the cache during scans, minimizing the impact
   on online traffic (Restore, Rebalance, Retention). This is achieved by
   setting "bdb.minimize.scan.impact" to "true".
-- Thinner storage layer, e.g. BdbStorageEngine.put() does not incur the cost
   of an additional delete().
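To check a server against the first two items, here is a minimal Python sketch. It assumes server.properties contains only simple key=value lines. Full Java .properties syntax also allows ':' separators, line continuations, and escapes, and this parser ignores all of them. The function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and #/! comments.
    Assumption: no ':' separators, line continuations, or escapes."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# The two settings from the list above and the value Arun recommends for each.
RECOMMENDED = {
    "bdb.cache.evictln": "true",
    "bdb.minimize.scan.impact": "true",
}


def check_recommended(props):
    """Return {key: current value} for each recommended setting that is not
    set to the recommended value; None means the key is absent, so the
    server default applies."""
    return {
        key: props.get(key)
        for key, wanted in RECOMMENDED.items()
        if props.get(key, "").lower() != wanted
    }


sample = """
bdb.cache.size=30g
bdb.cache.evictln=false
bdb.minimize.scan.impact=true
"""
print(check_recommended(parse_properties(sample)))  # {'bdb.cache.evictln': 'false'}
```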


Thanks,
Arun.

--
You received this message because you are subscribed to the Google Groups "project-voldemort" group.
To unsubscribe from this group and stop receiving emails from it, send an email to project-voldem...@googlegroups.com.
Visit this group at https://groups.google.com/group/project-voldemort.
For more options, visit https://groups.google.com/d/optout.

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Mar 8, 2016, 9:44:04 PM
to project-voldemort
Mital,

Like Arun said, you should set bdb.cache.evictln=true. Setting it to false is an optimization for slow spinning disks, at the cost of increased JVM heap usage (and consequently longer GC times). You're on SSD, so you'll actually get better performance with it set to true.

Can you also give us more details on your read and write patterns?
- put() per second
- get() per second
- getAll() per second
- delete() per second
- For put(), ratio of updates versus creates (overwrite versus new key)
- For getAll(), average and max number of keys per call
- Average and max key size
- Average and max value size
- How many stores on the cluster
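The numbers in this list combine into a rough write-bandwidth estimate. Here is a minimal sketch with hypothetical field names and made-up example values, not measurements from this thread. It relies on BDB JE being log-structured: an overwrite appends a new record and leaves the old one in the log for the cleaner. That is why the overwrite ratio drives cleaner work.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical container for the per-cluster numbers Brendan asks for."""
    puts_per_sec: float
    overwrite_ratio: float  # fraction of puts that overwrite an existing key
    avg_key_bytes: int
    avg_value_bytes: int

    def record_bytes(self):
        # Payload only; BDB log headers and index entries are ignored.
        return self.avg_key_bytes + self.avg_value_bytes

    def write_bytes_per_sec(self):
        return self.puts_per_sec * self.record_bytes()

    def new_keys_per_sec(self):
        return self.puts_per_sec * (1.0 - self.overwrite_ratio)

    def obsolete_bytes_per_sec(self):
        # Each overwrite leaves the previous record in the log for the cleaner to reclaim.
        return self.puts_per_sec * self.overwrite_ratio * self.record_bytes()


# Example values only: 1,000 puts/s, half overwrites, 100-byte keys, 90 KB values.
w = Workload(puts_per_sec=1000.0, overwrite_ratio=0.5, avg_key_bytes=100, avg_value_bytes=90 * 1024)
print(w.write_bytes_per_sec())     # 92260000.0
print(w.new_keys_per_sec())        # 500.0
print(w.obsolete_bytes_per_sec())  # 46130000.0
```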

You're best off keeping the JVM heap as small as possible, but you need enough BDB cache to hold the hot set of each store's index.
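This sizing rule can be turned into a back-of-the-envelope estimate. With bdb.cache.evictln=true the cache mainly needs to hold the index, not the values. The sketch below is only an illustration. The per-entry overhead is an assumed input that you would have to measure for your own JE version and key layout, and all example numbers are hypothetical.

```python
GIB = 1024 ** 3


def index_cache_bytes(hot_keys, avg_key_bytes, per_entry_overhead_bytes):
    """Rough BDB cache size needed to keep the hot set of index entries resident.

    per_entry_overhead_bytes is an assumed figure you would have to measure
    for your own JE version and key layout; it is not a documented constant.
    """
    return hot_keys * (avg_key_bytes + per_entry_overhead_bytes)


# Hypothetical: 100 million hot keys, 40-byte keys, 60 bytes of assumed overhead each.
needed = index_cache_bytes(100_000_000, 40, 60)
print(round(needed / GIB, 2))  # 9.31
```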

A quick out-of-the-box performance gain is to remove ReadOnlyStorageEngineConfiguration from the storage.configs parameter. That way, only the BdbStorageEngine runs in your app.
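Here is a minimal sketch of that edit applied to a storage.configs value. It assumes the value is a comma-separated list of fully qualified class names. It treats any class whose simple name contains "ReadOnly" as the read-only engine, which is a hypothetical matching rule and not a Voldemort API. The input string is illustrative, so check the class names in your own server.properties.

```python
def drop_read_only_engine(storage_configs):
    """Remove read-only storage configuration classes from a storage.configs value.

    Assumptions: the value is a comma-separated list of fully qualified class
    names, and any class whose simple name contains "ReadOnly" is the
    read-only engine (a hypothetical matching rule, not a Voldemort API).
    """
    classes = [c.strip() for c in storage_configs.split(",") if c.strip()]
    kept = [c for c in classes if "ReadOnly" not in c.rsplit(".", 1)[-1]]
    return ",".join(kept)


# Illustrative input; check the class names in your own server.properties.
before = ("voldemort.store.bdb.BdbStorageConfiguration,"
          "voldemort.store.readonly.ReadOnlyStorageConfiguration")
print(drop_read_only_engine(before))  # voldemort.store.bdb.BdbStorageConfiguration
```

The result matches the storage.configs line in the configuration Brendan posts next.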

This is one of our cluster configurations. It hosts 50 stores and peaks at about 70,000 writes per second against billions of keys:
admin.max.threads=40
bdb.cache.evictln=true
bdb.cache.size=20GB
bdb.checkpoint.interval.bytes=2147483648
bdb.cleaner.interval.bytes=15728640
bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.enable=true
bdb.evict.by.level=true
bdb.expose.space.utilization=true
bdb.lock.nLockTables=47
bdb.minimize.scan.impact=true
bdb.one.env.per.store=true
bdb.raw.property.string=je.cleaner.adjustUtilization=false
data.directory=${voldemort.data.dir}
enable.server.routing=false
enable.verbose.logging=false
http.enable=false
max.proxy.put.threads=50
nio.connector.selectors=50
num.scan.permits=2
restore.data.timeout.sec=1314000
retention.cleanup.first.start.hour=3
scheduler.threads=24
storage.configs=voldemort.store.bdb.BdbStorageConfiguration
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200
voldemort.home=${voldemort.home.dir}

Some of these settings, such as bdb.cleaner.threads, bdb.checkpoint.interval.bytes, and bdb.cleaner.interval.bytes, depend heavily on how frequently you create new keys versus overwrite existing ones, and on how large the average and peak write sizes are.
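To see why bdb.checkpoint.interval.bytes depends on write size, assume for this sketch that a checkpoint is triggered each time that many bytes have been written to the BDB log. Under that assumption, the time between checkpoints is simply the interval divided by the log growth rate. The write rate below is hypothetical.

```python
def seconds_between_checkpoints(checkpoint_interval_bytes, log_bytes_per_sec):
    """Approximate time between byte-triggered checkpoints.

    Assumes log growth roughly equals the write payload rate; log record
    overhead and cleaner migration would make checkpoints more frequent.
    """
    return checkpoint_interval_bytes / log_bytes_per_sec


interval = 2147483648           # bdb.checkpoint.interval.bytes from the config above (2 GiB)
log_rate = 1000 * 90 * 1024     # hypothetical: 1,000 puts/s of 90 KB values
print(round(seconds_between_checkpoints(interval, log_rate), 1))  # 23.3
```

Doubling the value size or the put rate halves the time between checkpoints. That is why the same interval can suit one cluster and not another.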

We run that config in a 31 GB JVM heap (same -Xms and -Xmx) with UseCompressedOops set.
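The 31 GB figure keeps the heap below the point where the JVM can still use compressed object pointers. With the default 8-byte object alignment, a 32-bit compressed reference can address 2^32 × 8 bytes = 32 GiB. HotSpot therefore turns compressed oops off for heaps at or above that size, and may do so slightly below it. The 60 GB heap mentioned at the top of the thread is well above that ceiling. Below is a minimal sketch of the check, with hypothetical helper names. Passing it is necessary but not sufficient.

```python
import re

# Multipliers for the size suffixes accepted by -Xms/-Xmx.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

# 32-bit compressed references scaled by the default 8-byte object alignment.
COMPRESSED_OOPS_CEILING = 2 ** 32 * 8  # 32 GiB


def parse_heap_flag(flag):
    """Convert a flag such as '-Xmx31g' into a size in bytes."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kKmMgGtT]?)", flag)
    if match is None:
        raise ValueError(f"not an -Xms/-Xmx flag: {flag!r}")
    return int(match.group(1)) * UNITS[match.group(2).lower()]


def below_compressed_oops_ceiling(flag):
    """True if the heap is under 32 GiB; necessary, but not sufficient, for compressed oops."""
    return parse_heap_flag(flag) < COMPRESSED_OOPS_CEILING


print(below_compressed_oops_ceiling("-Xmx31g"))  # True
print(below_compressed_oops_ceiling("-Xmx60g"))  # False
```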

Mital Parmar

Mar 9, 2016, 12:50:35 PM
to project-voldemort
Thanks, guys, for your feedback.

Here are the changes I made and the resulting response times:

(1) Initially I was running on slow storage, where a put took 70 ms.

(2) Next, I moved to SSD, which reduced it to 27 ms.

(3) I observed GC pressure and bumped up the ParNew setting. Write latency dropped from 27 to 10 ms.

(4) I increased the JVM heap from 40 to 60 GB (with a 40 GB BDB cache) and disabled bdb.cache.evictln. My write latency is now 6 ms.

I do not know how Voldemort handles 90 KB writes. I think the default buffer size is 64 KB. We are on 1.10.

My reads are also 80 KB.

Do you think I should increase the default buffer size to 128 KB or higher? Is that a source code change or a property change?

Has anyone experimented with this in the past, or seen any issues with it?

Thanks

Mital

Arunachalam

Mar 9, 2016, 1:45:44 PM
to project-...@googlegroups.com
Are the latency numbers you mention averages or percentiles? If percentiles, which one (90th, 95th, 99th)?
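The question matters because an average and a percentile can describe very different latency profiles. This minimal sketch uses the nearest-rank definition of a percentile. That is one of several common definitions, and monitoring tools may interpolate instead. The samples are made up.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical put latencies in milliseconds: mostly fast, one slow outlier.
latencies_ms = [4, 5, 5, 5, 6, 6, 6, 7, 9, 27]
print(sum(latencies_ms) / len(latencies_ms))  # 8.0
print(percentile(latencies_ms, 50))           # 6
print(percentile(latencies_ms, 95))           # 27
```

A single slow write raises the mean above what most requests see and sets the tail percentiles entirely. This is why both figures are worth reporting.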

Are the clients running the latest version too? The clients should preferably be on 1.10 as well, so they can make use of the many performance optimizations we made.

A Voldemort write normally involves two round trips to the server: first it reads the version of the record, then it writes a new version. So 6 ms of end-to-end latency is good.

If you are always doing read/modify/write, you will already have the version, which avoids one round trip and cuts the write from two round trips to one.
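The round-trip arithmetic can be illustrated with a toy model. This is not the Voldemort client API. MockVersionedStore, put_without_version, and put_if_version are hypothetical names, and versions are plain integers rather than the vector clocks Voldemort actually uses. Each method call on the mock counts as one network round trip.

```python
class MockVersionedStore:
    """In-memory stand-in for a remote store; each method call counts as one
    network round trip. Versions are plain integers here, whereas Voldemort
    itself uses vector clocks."""

    def __init__(self):
        self._data = {}        # key -> (version, value)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key, (0, None))

    def put_if_version(self, key, value, expected_version):
        self.round_trips += 1
        current, _ = self._data.get(key, (0, None))
        if current != expected_version:
            raise RuntimeError("obsolete version: another writer got there first")
        self._data[key] = (current + 1, value)
        return current + 1


def put_without_version(store, key, value):
    """Write path when the caller holds no version: read it, then write (2 trips)."""
    version, _ = store.get(key)
    return store.put_if_version(key, value, version)


store = MockVersionedStore()
put_without_version(store, "user:1", b"v1")
print(store.round_trips)  # 2

# Read/modify/write: the read already returned the version, so the write is 1 trip.
version, value = store.get("user:1")
before = store.round_trips
store.put_if_version("user:1", value + b"+v2", version)
print(store.round_trips - before)  # 1
```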

Thanks,
Arun.

Mital Parmar

Mar 9, 2016, 4:40:59 PM
to project-voldemort
Sorry Arun, I forgot to mention the client version. Yes, the client is also running 1.10.

The values I mentioned were averages. The 95th percentile is 10 ms and the 99th percentile is 18.7 ms.

In my other Voldemort deployment I am getting latencies in microseconds, so I am trying to see whether I can further improve my write performance here. By the way, for reads I am seeing a 0.66 ms average response time.

Would you recommend changing any other settings that might be worthwhile for large values?

Thanks

Mital

On Tuesday, March 8, 2016 at 1:57:55 PM UTC-8, Mital Parmar wrote:

Arunachalam

Mar 9, 2016, 5:33:22 PM
to project-...@googlegroups.com
Are you measuring latency on the server side or the client side? Did you try the parameters Brendan mentioned? What is your BDB checkpointing interval?

Also, do you have high contention on the keys you write? Brendan has more experience with BDB tuning than I do.

Thanks,
Arun.

Mital Parmar

Mar 9, 2016, 9:05:25 PM
to project-voldemort
Hi Arun

I am reviewing the parameters Brendan suggested and comparing them against my setup.

I am measuring server-side put and get latency.

Thanks

Mital

Mital Parmar

Mar 14, 2016, 12:57:30 PM
to project-voldemort
I reviewed the parameters Brendan suggested and found some differences.

I am not setting these properties in 1.10:

enable.bdb.engine=true
bdb.sync.transactions=false (I am setting bdb.write.transactions=false and bdb.flush.transactions=false instead)
bdb.enable=true
bdb.evict.by.level=true
enable.server.routing=false  
restore.data.timeout.sec=1314000
scheduler.threads=24
stream.read.byte.per.sec=209715200
stream.write.byte.per.sec=78643200

Are the above changes safe to make on a live cluster? Since I am not specifying enable.server.routing, it seems I am using server routing, which is not recommended?

FYI, here is my server.properties. This is a read/write cluster.

max.threads=100
http.enable=true
socket.enable=true
# BDB
bdb.write.transactions=false
bdb.flush.transactions=false
bdb.cache.size=30g
bdb.lock.read_uncommitted=false
bdb.one.env.per.store=true
bdb.lock.nLockTables=47
bdb.checkpointer.off.batch.writes=true
bdb.cleaner.interval.bytes=15728640
bdb.cleaner.lazy.migration=false
bdb.cleaner.min.file.utilization=0
bdb.cleaner.threads=1
bdb.cache.evictln=false  
bdb.minimize.scan.impact=true
enable.nio.connector=true
socket.keepalive=true
nio.connector.selectors=64
enable.readonly.engine=false
request.format=vp3
storage.configs=voldemort.store.bdb.BdbStorageConfiguration
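Comparing two configurations by hand is error-prone. Here is a minimal sketch of a key-by-key diff. It takes already-parsed dicts, so parsing is left out, and the function name is hypothetical. Keys missing from one file fall back to server defaults, which this sketch does not know about.

```python
def diff_properties(mine, reference):
    """Compare two parsed property dicts.

    Returns (missing, extra, changed):
      missing: keys set in reference but not in mine
      extra:   keys set in mine but not in reference
      changed: {key: (mine, reference)} for keys set in both with different values
    """
    missing = sorted(reference.keys() - mine.keys())
    extra = sorted(mine.keys() - reference.keys())
    changed = {
        key: (mine[key], reference[key])
        for key in sorted(mine.keys() & reference.keys())
        if mine[key] != reference[key]
    }
    return missing, extra, changed


# A few entries from each configuration in this thread (subset for illustration).
mine = {"bdb.cache.evictln": "false", "bdb.cache.size": "30g", "http.enable": "true", "max.threads": "100"}
brendan = {"bdb.cache.evictln": "true", "bdb.cache.size": "20GB", "http.enable": "false", "scheduler.threads": "24"}

missing, extra, changed = diff_properties(mine, brendan)
print(missing)  # ['scheduler.threads']
print(extra)    # ['max.threads']
print(changed)
```

This is a purely textual comparison. Equivalent values written differently, such as 20GB and 20g, would also be reported as changed.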


Thanks

Mital

Abhay Kumar Dwivedi

Jul 5, 2018, 9:45:23 AM
to project-voldemort
Hi Zonia,

I am upgrading our production Voldemort to 1.10.25. It would be very helpful if you could answer the following questions:

1. How does Voldemort clean data from the cache?
2. How does Voldemort do memory management?
3. Can we increase or decrease the number of partitions from 2048?
4. How do we configure the local cache and the global cache in Voldemort?
5. Can we specify a replication factor of 3 (for example) for the global store while the local store has no replication factor?

My server.properties configuration for Voldemort version 0.90.x is:

max.threads=200

############### DB options ######################

http.enable=false
socket.enable=true
#jmx.enable=true

# BDB
bdb.write.transactions=false
bdb.flush.transactions=false
bdb.cache.size=2000MB

#New Token
#The number of threads to keep alive even when idle 
core.threads  = 100

#Essentially the amount of time to block on a low-level network operation before throwing an error.

#The total amount of time to wait for adequate responses from all nodes before throwing an error.

Thanks in advance.

Regards,
Abhay