Hey guys,
So I've been resisting using RAID in EC2 for a long time now, mostly
because of things people have told me (e.g. one slow drive in your
array can slow everything down), and various articles I've read
online. For example:
http://www.nevdull.com/2008/08/24/why-raid-10-doesnt-help-on-ebs/
So overall, it seemed like RAID just wasn't worth the extra effort and
complexity. But more recently, I've been reading more positive things
about using RAID on EC2. I know that MongoDB recommends a RAID 10
configuration on production clusters now, and since it looks like
we're starting to hit an IO bottleneck here, I figured I should at
least give it a try in our testing environment.
So first of all, I was curious about how people were configuring RAID
in their environments. Anyone care to share their experiences and/or
mdadm commands? Based on what I've read so far (e.g.
http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/),
something like this seems reasonable:
# mdadm --create /dev/md0 --level 10 --chunk 256 --raid-devices 4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
# blockdev --setra 65536 /dev/md0
Though it looks like there are actually a bunch of different ways of
setting up RAID 10 with mdadm, and I wasn't sure if any one way was
more correct than the others (at least as it relates to MongoDB). It
would be nice if the recommended commands were somewhere in the
MongoDB manual (if they are there, I couldn't find them).
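For what it's worth, my (unverified) reading of the mdadm man page is that
the main difference between those setups is the RAID 10 layout, selected
with --layout (e.g. "near" vs. "far" copies), so the variants would look
something like:

# mdadm --create /dev/md0 --level 10 --layout n2 --chunk 256 --raid-devices 4 /dev/sdc /dev/sdd /dev/sde /dev/sdf

I haven't tested which layout (if any) is appropriate for EBS, so treat that
as a guess.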
So once the RAID device is created, is everyone using (or is it
recommended to use) LVM on top of that? I can see how that could be
useful to resize the volume. But then again, is it even possible to
resize the underlying RAID device once it's created? Excuse my
ignorance here. I've never actually tried to permanently add/remove
devices from a RAID device like this.
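Just to make the question concrete, the kind of stack I had in mind is LVM
layered over the md device, with some headroom left in the volume group so
the logical volume can be grown later (the volume group and LV names here
are made up):

# pvcreate /dev/md0
# vgcreate data_vg /dev/md0
# lvcreate -l 90%VG -n data_lv data_vg
# mkfs.ext4 /dev/data_vg/data_lv
# mount /dev/data_vg/data_lv /data

and then later, while the filesystem is still mounted:

# lvextend -l +100%FREE /dev/data_vg/data_lv
# resize2fs /dev/data_vg/data_lv

But I don't know whether that's what people are actually doing in practice.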
And lastly, when using RAID on EC2, is it necessary to keep track of
which volumes are attached as which devices? It seems like you'd need
to know this in case the instance ever got terminated or whatever. Or
can mdadm somehow automatically reassemble a RAID device given the
correct list of EBS volumes? I just want to make sure I'm keeping
track of everything I might need to know in case an instance
disappears.
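My guess (please correct me if I'm wrong) is that the md metadata lives on
the member volumes themselves, so on the original instance you'd just save
the array definition somewhere:

# mdadm --detail --scan >> /etc/mdadm.conf

and on a replacement instance, once the same EBS volumes are attached (under
whatever device names), something like this should bring the array back:

# mdadm --assemble --scan

or, being explicit about the members:

# mdadm --assemble /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf

But I'd love confirmation from someone who has actually done this recovery.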
This is great. Thanks! I have a couple questions though:

"We need to create physical partitions on the volumes that we mapped
from EBS"

Is this really true? I've found that this command works fine *without*
partitioning the underlying devices:

# mdadm --create /dev/md0 --force --metadata=1.1 --level 10 --chunk 256 --raid-devices 4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
Is there some reason why everyone creates partitions first? I've
actually moved away from doing this when I know I'm going to use the
entire device and/or when I think I might expand it someday. For the
latter, this means I can always skip the partition expansion step. I
can just grow the filesystem and be done. But maybe I'm missing
something here?
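To illustrate what I mean: for a single unpartitioned EBS volume, growing is
just "snapshot, create a bigger volume from the snapshot, attach it, resize
the filesystem" with no partition table to adjust (device name and mount
point are only examples):

# e2fsck -f /dev/sdg
# resize2fs /dev/sdg
# mount /dev/sdg /data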
Secondly, this guy reported that "Larger chunk sizes on the raid made
a (shockingly) HUGE difference in performance. The sweet spot seemed
to be at 256k."
This would mean your script should include the option "--chunk 256."
He also says "A larger read ahead buffer on the raid also made a HUGE
difference. I bumped it from 256 bytes to 64k." I believe the way to
set this is by running the following command:

# blockdev --setra 65536 /dev/md0

Any comments on what this guy has to say on his blog?
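One side note from my own testing (take it with a grain of salt): the
readahead value doesn't survive a reboot, so I've been re-applying it from
/etc/rc.local and sanity-checking it with --getra:

# blockdev --getra /dev/md0
# echo 'blockdev --setra 65536 /dev/md0' >> /etc/rc.local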
I also see a lot of people using the "--metadata=1.1" option. My
understanding is that this allows you to create much larger RAID
devices made up of many more physical devices. Any reason why we
*shouldn't* use this? I noticed that your script doesn't....
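In case anyone wants to compare notes, the metadata version of an existing
array is visible with either of these:

# mdadm --detail /dev/md0 | grep Version
# mdadm --examine /dev/sdc | grep Version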
This leads me to another question. At some point, your data is going to be
so large that you'll want to start sharding. So I'm guessing it
wouldn't really make sense to have a >2T disk when a single MongoDB
node won't perform well with that much data. So are there any
recommendations on how big our disks should be? Maybe a quick
rule-of-thumb formula of some sort? At the moment, I'm creating 500GB volumes
for each MongoDB server (which has ~35GB of memory).
You mentioned that you tested both with and without LVM ... and I was
just curious to know if there were any performance benefits to
using/not using LVM.
Many thanks,
Dominik
On May 19, 1:46 pm, "Brendan W. McAdams" <bren...@10gen.com> wrote:
> Sorry for the delay, been making sure I have all the right info and check a
> few things. See answers inline, below.
>
> On Tue, May 17, 2011 at 5:55 PM, Michael Conigliaro <
>
> mike.conigli...@livingsocial.com> wrote:
> > Hey guys,
>
> > So I've been resisting using RAID in EC2 for a long time now, mostly
> > because of things people have told me (e.g. one slow drive in your
> > array can slow everything down), and various articles I've read
> > online. For example:
>
> >http://www.nevdull.com/2008/08/24/why-raid-10-doesnt-help-on-ebs/
>
> Context matters, and these kinds of benchmarks are definitely all over the
> place (I mean there are lots of them, not that they vary in content per se,
> even if they do).
>
> Unfortunately, I haven't been able to pull up the original benchmark he
> links to (or find a copy in google cache) --- partly to get a feel for what
> his "F2" variant of RAID 10 is (I suspect the one I usually build out isn't
> "F2"; I know in general how F2 looks but not how to get it setup safely on
> EBS). [Addendum after I wrote this bit] It was pointed out in discussion
> here at our office that one of the classic problems faced with any disk
> management software such as RAIDs on EBS is that they incorrectly assume a
> standard physical disk layout and profile, which EBS is definitely not. Our
> brief look through F2 (and similar "far" configs) on this end suggests it is
> a software RAID variant designed very much to optimize for physical disk and
> spindle layouts; the concepts wouldn't hold up on EBS and are likely not a
> safe way to go.
>
> Many of these benchmarks push the *max throughput* numbers they see from
> various setups. The one linked in particular is pointing out that a single
> drive maxed out at 65 Mb/sec versus RAID 10 maxing out at 55 Mb/sec.
> Notably we are talking about *maxing out* as well as *single drive*. The
> reality of what we're most concerned with on EBS however is different:
> *inconsistent
> performance* and *failure tolerance*.
>
> That is to say --- the nature of RAID 10 is more likely to give us a
> consistently high *average throughput* versus a single disk. If the single
> disk slows down at all everything slows down. And of course, if that single
> disk fails any apprehensiveness about RAID on EBS quickly turns to regret
> for not having RAID ;)
>
> > So overall, it seemed like RAID just wasn't worth the extra effort and
> > complexity. But more recently, I've been reading more positive things
> > about using RAID on EC2. I know that MongoDB recommends a RAID 10
> > configuration on production clusters now, and since it looks like
> > we're starting to hit an IO bottleneck here, I figured I should at
> > least give it a try in our testing environment.
>
> Tying this into my previous block, keep in mind that Maximum Throughput is
> not comorbid with the needs of a typical Database Workload.
>
> What we've found, in general, is that for a Database Workload (and
> specifically MongoDB) RAID 10 on EBS makes the absolute most sense. The
> most important bit of RAID 10 (at least as done by Linux's software RAID
> tools) is that each disk access gets ultimately split into full speed disk
> accesses to different drives. You get a lot of the read/write performance
> of RAID 0 but you don't rely on the stripe being on both drives. This seems
> to fit particularly well with the EBS model and the question of "What does
> the physical layout of EBS' underlying disks look like in comparison to an
> actual physical raw, bought-it-at-Best-Buy-and-plugged-it-in disk". This,
> paired with the fact that underneath the stripe we have mirroring, also gives
> you redundancy: parts of the stripe can fail or ... more important on
> EBS: *get slower.*
> backups *without locking MongoDB* and use the journal to safely use that
> backup, thereby having a non-blocking, reliable backup of MongoDB using the
> system-level tools?‡". As part of that I spent a lot of time putting
> together scripts to bring up a full LVM RAID 10 quickly on top of 4 EBS
> volumes.
>
> I can't guarantee you they are the *most optimal* configuration but they're
> put together from a variety of best practices writeups on building RAID 10
> on Linux that I dug through online. These are using LVM on top of MDADM,
> but your mileage may vary. I have reused it recently to do testing of
> larger sharded MongoDB clusters for some new features in our Hadoop driver
> and been very pleased with the setup. I was able to migrate a disk array to
> a whole new instance at one point, too, when it was necessary because I fat
> fingered "terminate" instead of "stop" ;)
>
> I've attached my script; consider this a disclaimer:
>
> *Please don't use this script blindly, without reading through it and
> understanding it or assume it is the best way to do things. It is provided
> as a guide to something I've been using to test but not an "official 10gen
> RAID 10 LVM+MDADM" installer of any kind. It may become one later but for
> now just a pointer in the right direction. If you run my script without
> thinking about it or verifying the commands work on a test box, you'll make
> me very disappointed.*
>
> This was passed out to a few other people here to do a second round of
> testing where they ran some of the same tests I ran while I tested other
> things, so I rejiggered it (represented in its current form) to be slightly
> interactive.
>
> Here's the basic rundown.
>
> ***** *Remember that the capacity of RAID 10 is (N/2) * S(min), where N is
> the number of drives in the set and S(min) is the smallest volume size. For
> 4 40-gigabyte volumes you'll get 80 gigs of capacity. See my note below on
> sizing your physical volume (the view of a 'real' disk that MDADM constructs
> from the EBS volumes) versus your logical volume (the actual filesystem-
> bearing chunk of diskiness that LVM slaps on top of MDADM's physical). *
> slightly funky name. The...
>
>
> [Attachment: ebs_lvm.sh, 2K]
>
> [Attachment: RAID 10.png, 73K]
- Mike

On 11/4/11 8:09 AM, Alexandre Fouché wrote:
>>> - This script assumes & requires SFDISK is installed. SFDISK is a cute
So you have an 8 stripe set.. when one of those drives hits a
bottleneck.. the entire stripe will slow down and make your database
unresponsive.
So.. even if your data is 100% safe somewhere deep down in the guts of
that EBS drive.. it's locked away at the long end of this very narrow
pipe and you don't know when that drive will become responsive again.
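If anyone wants to see this happening in real time, per-device iostat is a
reasonable way to spot it: watch the await column for each member of the
array, and the EBS volume whose await climbs while the others stay flat is
the one holding everything up.

# iostat -x 5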
Dominik