|Scaling up or out||Ken Perkins||12/5/12 4:33 PM|
We're seeing enough thrashing and low memory on our production ring that we've decided to upgrade our hardware. The real question is whether we should scale up or out.
Currently our ring is 512 partitions. We know that it's a sub-optimal size but we can't easily solve that now. We're currently running a search-heavy app on 5 8GB VMs. I'm debating between moving the VMs up to 16GB, or adding a few more 8GB VMs.
Some of the talk in #riak has pushed me towards adding more machines (thus lowering the per node number of partitions) but I wanted to do a quick sanity check here with folks that it's better than scaling up my current machines.
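To make the "fewer partitions per node" argument concrete, here's the arithmetic for a 512-partition ring at a few cluster sizes (a quick sketch; Riak spreads any remainder, so actual ownership per node can differ by one, and `riak-admin member-status` shows the real percentages):

```shell
# vnodes per node for a 512-partition ring at various cluster sizes
# (integer division; the remainder is spread across nodes)
for n in 5 8 10; do
  echo "$n nodes: $((512 / n)) vnodes per node"
done
```

Going from 5 to 10 nodes halves the number of vnodes (and merge_index instances) each box has to keep warm, which is the core of the scale-out case.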
|Re: Scaling up or out||Michael Johnson||12/5/12 4:47 PM|
There are a lot of things that go into this, but in a hosted VM scenario I would tend to suggest that upping the RAM is likely the right solution.
You mention thrashing, but where is that thrashing coming from? I assume all the boxes are thrashing and not just one or two of them? Is it due to swapping, or is it just raw disk access? Maybe you're logging too aggressively?
Perhaps you are suffering from a bad-neighbor effect. If so, increasing the amount of RAM will likely land you on a physical host with fewer customers, and thus you would be less likely to have a bad neighbor.
Cost-wise in the VM world, you might be better off adding a few nodes rather than increasing the RAM in your existing VMs.
But then we are talking VMs, so it should be fairly painless to experiment. I would try adding RAM first and, if that doesn't work, add a few nodes. Someone else may have a different opinion, but that is my two cents.
|Re: Scaling up or out||Alexander Sicular||12/5/12 4:48 PM|
Ya, I would probably say more VMs means fewer vnodes per VM. I would most likely go in that direction...
|Re: Scaling up or out||Ken Perkins||12/5/12 5:41 PM|
Yes, we're thrashing on all of the boxes, due to disk access when looking through merge_index. It's not noisy neighbors, given how consistent the thrashing is. We had a box with a corrupted index (we had to remove merge_index and rebuild) and that machine instantly went to 0% thrashing. So we have a pretty good indication of the source.
The cost for 10 8GB VMs is roughly equivalent to 5 16GB ones.
Thanks for your input Michael!
|Re: Scaling up or out||Sean Carey||12/5/12 7:45 PM|
Are your VMs on different bare metal? Could they potentially be on the same bare metal?
Are you seeing any io contention?
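For an ongoing view, `iostat -x` or `vmstat` are the usual tools. For a quick one-shot read without installing anything, /proc/stat can be parsed directly (a sketch; assumes a Linux guest, where field 6 of the aggregate "cpu" line is cumulative iowait time since boot):

```shell
# One-shot iowait fraction since boot, read from /proc/stat.
# Field 6 of the "cpu" line is jiffies spent waiting on I/O.
awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i; printf "iowait: %.1f%%\n", 100*$6/t}' /proc/stat
```

Because this is cumulative since boot it smooths over spikes; sample it twice and diff, or use `iostat -x 5`, to see the live number.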
|Re: Scaling up or out||Ken Perkins||12/5/12 9:15 PM|
VMs, not on the same host; Rackspace has VM affinity rules to protect against that. We do see a fair amount of IO wait.
Rackspace has a new affinity-based SSD block device service that I plan to evaluate, but I'm not ready for that in production.
|Re: Scaling up or out||Sean Carey||12/5/12 9:22 PM|
Fair amount? < 5% or > 20%?
If there's iowait and memory pressure, adding nodes could alleviate that. If there's minimal or almost no iowait, adding memory will help. Also, tuning the vm.dirty* sysctls on Linux might get you more memory and less iowait, or at least more consistent iowait.
Which Linux distro are you on, and which I/O scheduler are you using?
|Re: Scaling up or out||Ken Perkins||12/6/12 8:49 AM|
We're at around 20% swapping, with IO wait in the 5-25% range depending on the machine.
We're running Lucid with the deadline scheduler. I'm leaning strongly towards adding a few more nodes, but I'm not married to it :)
|Re: Scaling up or out||Sean Carey||12/6/12 10:21 AM|
If you're jumping between 5-25% iowait, I'd add nodes. Also, tuning pdflush will help with the jumpy iowait.
Lucid by default lets dirty pages accumulate up to 20% of your RAM before forcing a flush.
So say you have 10 GB of RAM: your system could be trying to flush 2 GB of data at once, causing huge iowait spikes.
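To spell out that arithmetic, and the sysctls that control it (vm.dirty_ratio and vm.dirty_background_ratio are the real knobs; the lowered values below are hypothetical starting points to experiment with, not recommendations):

```shell
# With vm.dirty_ratio=20 (Lucid's default), a 10 GB box can accumulate
# this much dirty data before writeback is forced:
echo "$((10 * 1024 * 20 / 100)) MB of dirty pages before a forced flush"

# Hypothetical lower settings to trade peak write throughput for
# smaller, more frequent flushes and smoother iowait:
#   sysctl -w vm.dirty_ratio=10
#   sysctl -w vm.dirty_background_ratio=5
```

Lower ratios mean the kernel starts background writeback sooner and blocks writers earlier, so flushes stay small instead of arriving as 2 GB bursts.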