Best Configuration for a Production Environment


ctasada

Mar 1, 2012, 1:03:50 PM
to project-...@googlegroups.com
Hi everyone,

For the past few weeks I've been involved in a project that uses Project Voldemort as a backend. We're having performance issues due to an increase in system load.

From what I've read in the group and seen in the code, I have some ideas, but I wanted to ask about your experience in the matter.

Current Configuration:
3 nodes with replication factor 2, hinted handoff enabled, and required reads/writes of 1. We're working with around 3 million keys and values of around 1MB. Each node stores more than 60GB of BDB data files (I need to check the BDB cleaner configuration, since this seems quite big). The servers are 8-CPU 64-bit Linux machines with 44GB of memory, with 22GB given to the JVM and 10GB to the BDB cache (which I know is really small right now).

I'm seeing BDB locks and high disk read/write load. My impression is that this is caused by the value size and the lack of enough cache for the BDB files, so this is what I'm planning to do:
- Rebalance the partitions across more nodes.
- Modify my code to decrease the value size to something smaller, probably between 100KB and 300KB (increasing the number of keys).

What I'm expecting to get are smaller BDB files that fit better in the cache, and smaller values that can be read from and written to disk faster.
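One way to cut ~1MB values down to the 100KB-300KB range is to split each value into numbered chunks stored under derived keys. The sketch below is a minimal illustration, not a Voldemort API: the `key:i` naming scheme, the manifest entry holding the chunk count, the 256KB chunk size and the helper names `split_value`/`join_value` are all hypothetical, and the actual client put/get calls are left out.

```python
from __future__ import annotations

CHUNK_SIZE = 256 * 1024  # hypothetical target inside the 100KB-300KB range


def split_value(key: str, value: bytes, chunk_size: int = CHUNK_SIZE) -> dict[str, bytes]:
    """Split one large value into chunk-sized pieces under derived keys.

    Hypothetical scheme: chunk i is stored under "<key>:<i>", and the
    original key holds a small manifest with the number of chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)] or [b""]
    pieces = {f"{key}:{i}": chunk for i, chunk in enumerate(chunks)}
    pieces[key] = str(len(chunks)).encode("ascii")  # manifest: chunk count
    return pieces


def join_value(key: str, pieces: dict[str, bytes]) -> bytes:
    """Reassemble a value from its manifest and numbered chunks."""
    count = int(pieces[key].decode("ascii"))
    return b"".join(pieces[f"{key}:{i}"] for i in range(count))


pieces = split_value("doc42", b"x" * (1024 * 1024))
print(len(pieces))  # 5: four 256KB chunks plus the manifest
```

The trade-off is that every logical read becomes one manifest get plus one get per chunk, so request counts grow even as per-request size shrinks.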

What I'm not really sure about is:
- What's the best balance between JVM memory, BDB Cache and BDB size?
- Can you think of any problems with running 2 Voldemort server instances on the same server? Would it be better to use separate VMs when they're on the same physical server?
- Is there any kind of hardware recommendation or preferred configuration when working with Voldemort?

Your help is really appreciated.

Best Regards,
Carlos.

Mickey Hsieh

Mar 1, 2012, 1:49:47 PM
to project-...@googlegroups.com
Given 3m (rows) * 1MB (data) * 2 (replicas) / 3 (nodes) = 2TB per node, how did you get a 60GB data size per node? Am I missing something?
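The arithmetic can be checked with a small helper, and run backwards to see what average value size a 60GB-per-node footprint would imply. This is a rough sketch assuming decimal units (1MB = 10^6 bytes), replicas spread evenly across nodes, and no BDB overhead. The helper names are illustrative only.

```python
MB = 10**6  # decimal units, matching the rough arithmetic in the post
GB = 10**9
TB = 10**12


def per_node_bytes(num_keys: int, value_bytes: int, replication_factor: int, num_nodes: int) -> float:
    """Live data per node, assuming even replica placement and no storage overhead."""
    return num_keys * value_bytes * replication_factor / num_nodes


def implied_value_bytes(per_node: int, replication_factor: int, num_nodes: int, num_keys: int) -> float:
    """Average value size implied by a given per-node footprint."""
    return per_node * num_nodes / (num_keys * replication_factor)


print(per_node_bytes(3_000_000, 1 * MB, 2, 3) / TB)          # 2.0
print(implied_value_bytes(60 * GB, 2, 3, 3_000_000) / 1000)  # 30.0 (KB)
```

If all 60GB were live data, the average value would be about 30KB rather than 1MB. BDB log files also contain obsolete records waiting to be cleaned, so the real average could be smaller still.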

I can share my experience with you. We have been using Project Voldemort for more than 2 years, and neither BDB nor MySQL could meet our requirements. We developed our own open-source storage plug-in, CacheStore (http://code.google.com/p/cachestore), to handle the load and performance.

Here is our use case:
600m (rows) * 1KB (data) * 2 (replicas) / 20 (nodes) = 60GB per node
1500 tps of random reads/writes (500 reads + 1000 writes per second)
Latency is around 5-7 ms per request on the client side and less than 1 ms on the server side.
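Worked out in decimal units, the storage figure in this use case is consistent. Note that the values here are about 1000 times smaller than the ~1MB values in the original post:

```latex
\frac{600\times10^{6}\ \text{rows} \times 1\ \text{KB} \times 2\ \text{replicas}}{20\ \text{nodes}}
  = \frac{1200\ \text{GB}}{20}
  = 60\ \text{GB per node}
```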

If you are interested, we can chat so I can understand your requirements.

Mickey

--
You received this message because you are subscribed to the Google Groups "project-voldemort" group.
To view this discussion on the web visit https://groups.google.com/d/msg/project-voldemort/-/mEfpkMzAsQkJ.
To post to this group, send email to project-...@googlegroups.com.
To unsubscribe from this group, send email to project-voldem...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/project-voldemort?hl=en.

Liam Slusser

Mar 1, 2012, 2:14:04 PM
to project-...@googlegroups.com
Have you looked into using SSDs for storage? We have around 5m keys of
~18KB each using BDB and had performance issues with an 8-node
cluster. I replaced that cluster with 4 servers, each with 16GB of RAM
and RAID10 SSD storage, and haven't looked back.

liam

Mickey Hsieh

Mar 1, 2012, 3:11:08 PM
to project-...@googlegroups.com
SSDs are definitely one possible solution, but they are a lot more expensive and make capacity growth harder. We are using RAID 10.

Mickey

Liam Slusser

Mar 1, 2012, 4:09:28 PM
to project-...@googlegroups.com
This is true, SSD drives are expensive. We use 4 x 50GB enterprise
Micron P300 SLC drives in a RAID10 set on each of our Voldemort servers.
Each drive was around $400. Having 4 servers, each with an additional
$1600 in SSD storage, is still cheaper than buying 12+ traditional
servers, and you'll save even more if you include
power/hosting/maintenance costs.

Honestly, it's pretty cheap performance. Micron lists each P300 as
being able to do sequential reads at 360 MB/s and up to 45K random
write IOPS. To get that much random IOPS performance from spinning
disks you would need a very large array. The power and space costs of
a large traditional disk array compared with a few SSDs over a single
year alone make it a smart choice.
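To put "a very large array" in numbers, here is a back-of-the-envelope sketch. The 45K figure is the per-drive P300 spec quoted above. The ~180 random IOPS for a 15K RPM spinning disk is an assumed rule of thumb, not a measured value. The estimate ignores the RAID10 write penalty (each logical write lands on two mirrored disks), controller limits and queueing, all of which would push the disk count higher. `disks_for_iops` is a hypothetical helper.

```python
import math


def disks_for_iops(target_iops: int, iops_per_disk: int) -> int:
    """Smallest number of disks whose combined random IOPS reaches the target.

    Ignores RAID write penalties, controller limits and queueing effects.
    """
    return math.ceil(target_iops / iops_per_disk)


# 45K random write IOPS: Micron's figure for one P300 (from the post).
# 180 IOPS per 15K RPM disk: an assumed rule of thumb.
print(disks_for_iops(45_000, 180))  # 250
```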

And unless you're storing huge values or have billions of keys, 100GB
per server is a huge amount of space. We fit 5 million ~18KB keys very
comfortably with lots of room to grow.
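The 100GB figure follows from the RAID10 layout, because mirroring halves the raw capacity of the 4 x 50GB drives. The raw value data is about 90GB before replication. The replication factor R is not given in the post, so the per-server line is a hypothetical illustration (for example, R = 2 would give 45GB per server across 4 servers):

```latex
\text{usable per server} = \frac{4 \times 50\ \text{GB}}{2} = 100\ \text{GB},
\qquad
5\times10^{6} \times 18\ \text{KB} = 90\ \text{GB},
\qquad
\text{data per server} \approx \frac{90\ \text{GB} \times R}{4}
```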

I've been using SSDs from both Intel and Micron for 3 years now in our
production enterprise environment with great results. The performance
is nothing short of outstanding. And as a bonus, I've found them to be
more reliable than spinning disks as well.

liam

ctasada

Mar 3, 2012, 11:04:01 AM
to project-...@googlegroups.com
Hi Mickey,

You're right. I'm revisiting the numbers since they don't match.

I'll take a look at the CacheStore project. It looks really interesting.

ctasada

Mar 3, 2012, 11:07:00 AM
to project-...@googlegroups.com
Thanks to everyone for the answers. I'll look into the possibility of SSDs with the technical department.

What's the maximum recommended size for the BDB data files on a single node? What's the maximum recommended BDB cache size?

I remember reading somewhere that it's much better to have multiple small nodes. Aside from the replication advantages, is there any performance benefit?

Thanks again.

Lei Gao

Mar 15, 2012, 11:32:34 PM
to project-...@googlegroups.com
Hi Carlos,

Our experience running Voldemort at LinkedIn has been good. We have had some GC issues after switching to SSDs, which are affecting the 95th and 99th percentile latency. But the average latency has improved greatly, and the throughput has improved a lot. We are still in the middle of evaluating the maximum capacity of a node on SSD.

If you run on spinning disks, you don't want the BDB files to be too big. We used to have 1GB .jdb files, and they didn't work well with the BDB cleaner. 60MB seems to work for us. We don't know the optimal .jdb file size for SSDs, but we have not seen any issues with 60MB .jdb files so far.
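Berkeley DB JE writes its log as a sequence of .jdb files. Its cleaner reclaims space one file at a time: it picks files whose utilization has dropped, copies the still-live records forward, and then deletes the old file. The file size is governed by JE's `je.log.fileMax` parameter. Exactly how Voldemort exposes that setting is not covered in this thread, so check the server configuration docs. The sketch below only shows how the file count for a ~60GB store changes with file size. Smaller files let each cleaning step work on less data. The helper name is hypothetical, and the sizes use binary units.

```python
import math

MB = 1024**2
GB = 1024**3


def jdb_file_count(store_bytes: int, file_max_bytes: int) -> int:
    """Number of log files needed to hold store_bytes at a given maximum file size."""
    return math.ceil(store_bytes / file_max_bytes)


print(jdb_file_count(60 * GB, 1 * GB))   # 60 files of 1GB
print(jdb_file_count(60 * GB, 60 * MB))  # 1024 files of 60MB
```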

Try a 22GB heap with a 2GB young generation and a 10GB BDB cache. Ultimately, though, you want the entire BDB index to fit in the cache. There is a BDB utility that computes the approximate index size given the number of entries.
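On the question of balancing JVM memory against the BDB cache, a rough budget for the suggested settings on the 44GB servers described at the start of the thread looks like this. The sketch assumes the BDB JE cache lives on the Java heap, which is JE's default. It uses the standard HotSpot flags `-Xms`/`-Xmx` (heap size) and `-Xmn` (young generation size). The helper names are hypothetical, and the memory outside the heap is shared by JVM native memory, the OS page cache and any other processes.

```python
from __future__ import annotations

GB = 1024**3


def memory_budget(physical: int, heap: int, young_gen: int, bdb_cache: int) -> dict[str, int]:
    """Split memory, assuming the BDB JE cache is allocated on the JVM heap."""
    if heap > physical:
        raise ValueError("heap is larger than physical memory")
    if young_gen + bdb_cache > heap:
        raise ValueError("young generation plus BDB cache exceed the heap")
    return {
        # old-generation space left for objects other than the BDB cache
        "heap_left_after_young_and_cache": heap - young_gen - bdb_cache,
        # JVM native memory, OS page cache, other processes
        "outside_heap": physical - heap,
    }


def jvm_flags(heap: int, young_gen: int) -> list[str]:
    """Fixed-size heap and young generation, expressed in whole gigabytes."""
    return [f"-Xms{heap // GB}g", f"-Xmx{heap // GB}g", f"-Xmn{young_gen // GB}g"]


budget = memory_budget(44 * GB, 22 * GB, 2 * GB, 10 * GB)
print({name: size // GB for name, size in budget.items()})
print(" ".join(jvm_flags(22 * GB, 2 * GB)))  # -Xms22g -Xmx22g -Xmn2g
```

Raising the BDB cache toward the full index size eats into the first bucket. Growing the heap instead shrinks the OS page cache that still serves the data files themselves.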

Thanks,

Lei

ctasada

Mar 20, 2012, 12:06:12 PM
to project-...@googlegroups.com
Thanks a lot Lei,

I'm double-checking my BDB files and the GC configuration.

I'll post the results as soon as I have some news.