RAID0 / Initial Sync is very slow

progolferyo

Dec 14, 2011, 6:46:03 PM

to mongodb-user

We have a replica set of 3 secondary servers and our primary. They
were on single drives and we were starting to become I/O bound, so
we expanded to 3 new boxes and set up RAID0 across 4 drives on each.
The new nodes were added to the replica set and the initial sync began.

After finally catching up, we get this line in the logs:

replSet initial sync query minValid

Then it has to do the final sync to catch up the rest of the way.
Unfortunately, something seems to be wrong here because the DBs are
just syncing really slowly. I am not sure if it has something to do
with the box that it's syncing from, the RAID0 setup, or what. At its
current pace, it is not catching up to the master; it's falling
behind.

What I want to do is upgrade to 2.0.2 and try to get it caught up
again, but I do not want to do this if the server is going to do an
entirely fresh sync again, because this has taken a long time
already. Will that happen, or will it just try to catch up since the
'initial sync query minValid' has been reached?

Also, any ideas on what to do about our RAID0 setup? It's just weird:
the new servers are literally falling more and more behind, and it just
seems like it shouldn't take this long to sync.

Scott Hernandez

Dec 14, 2011, 6:49:13 PM

to mongod...@googlegroups.com

What does iostat -xm 2 look like on those machines? Are they doing
lots of disk ops?

What does mongostat --discover look like? Are there lots of ops on the
primary? If so, that will show it.

What does rs.status() look like?

Please post all results to gist/pastie/pastebin so they are more readable.
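
For anyone reading along, the lag that rs.status() reveals can be worked out from each member's optimeDate. Below is a minimal sketch in plain JavaScript (Node.js), assuming a status document shaped like rs.status() output with `members[].name`, `members[].stateStr`, and `members[].optimeDate`. The function name, host names, and timestamps are made up for illustration and are not taken from the gists in this thread.

```javascript
// Compute how far each non-primary member trails the primary, in seconds,
// from a document shaped like rs.status() output.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY member found in status document");
  }
  const primaryMs = new Date(primary.optimeDate).getTime();
  return status.members
    .filter((m) => m !== primary)
    .map((m) => ({
      name: m.name,
      state: m.stateStr,
      lagSeconds: (primaryMs - new Date(m.optimeDate).getTime()) / 1000,
    }));
}

// Hypothetical sample data (hosts and times are invented).
const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: "2011-12-15T00:10:00Z" },
    { name: "db4:27017", stateStr: "SECONDARY", optimeDate: "2011-12-15T00:00:00Z" },
  ],
};

for (const row of replicationLagSeconds(sampleStatus)) {
  console.log(`${row.name} (${row.state}) is ${row.lagSeconds}s behind`);
}
```

Running this against two snapshots taken a few minutes apart shows whether a member's lag is shrinking or growing.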

Scott Hernandez

Dec 14, 2011, 7:48:10 PM

to mongodb-user

[responding back to the list]

I assume 4/5/6 (id from rs.conf) are the new ones, is that correct?

Where is iostat from, and what devices are what in those stats? Where
are the db files (which devices and what config)? (I assume md0 is a
stripe of xvd*)

It looks like there are a decent number of faults and the disk is
showing a good bit of use from that iostat.

It seems you have ~350GB of data files and only 24GB of memory (just
guessing; can you run 'free -ltm' on the new servers and post those
results as well?).

Please also run this:
db.printReplicationInfo()
db.printSlaveReplicationInfo()
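
For context, db.printReplicationInfo() reports the first and last oplog event times on the member it is run against. The gap between them is the oplog window, and a secondary that lags by more than that window can no longer catch up from the oplog and needs a full resync. Here is a rough sketch of that check in plain JavaScript; the function name and all timestamps are hypothetical.

```javascript
// Compare a secondary's lag with the oplog window of its sync source.
// firstEventTime / lastEventTime correspond to the oplog first and last
// event times; lagSeconds is how far the secondary is behind.
function oplogHeadroom(firstEventTime, lastEventTime, lagSeconds) {
  const windowSeconds =
    (new Date(lastEventTime).getTime() - new Date(firstEventTime).getTime()) / 1000;
  return {
    windowSeconds,
    headroomSeconds: windowSeconds - lagSeconds,
    // True when the lag has reached or exceeded the window.
    atRisk: lagSeconds >= windowSeconds,
  };
}

// Invented example: a 24-hour oplog and a secondary that is 6 hours behind.
console.log(oplogHeadroom("2011-12-14T00:00:00Z", "2011-12-15T00:00:00Z", 6 * 3600));
```

If the headroom keeps shrinking between checks, the secondary is on course to go stale.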

On Thu, Dec 15, 2011 at 12:24 AM, progolferyo <sfa...@gmail.com> wrote:
> Scott,
>
> iostat:
>
> https://gist.github.com/b09c19e00a615b5696ca
>
> mongostat:
>
> https://gist.github.com/8efd20511d04aa185873
>
> rs.status()
>
> https://gist.github.com/180fc682e1f793d2632d

progolferyo

Dec 14, 2011, 8:12:51 PM

to mongodb-user

> I assume 4/5/6 (id from rs.conf) are the new ones, is that correct?

Yes

> Where is iostat from, and what devices are what in those stats? Where
> are the db files (which devices and what config)? (I assume md0 is a
> stripe of xvd*)

iostat is coming from the new db server. md0 is just a RAID0
stripe of xvd*. I have /dev/md0 mounted as:

/dev/md0 2.0T 378G 1.6T 19% /data3

and the db files are in /data3/mongo

free -ltm gives:

https://gist.github.com/c6270ee227967b9a5b26

the print commands give:

https://gist.github.com/b45d9fdbeb3f91df5bcf

Scott Hernandez

Dec 14, 2011, 8:30:51 PM

to mongod...@googlegroups.com

It looks like many of them are catching up. For those which are
catching up you can stop and upgrade the binaries since they are just
applying the oplog changes now; it will not require a full re-sync.
I'd suggest testing one at a time just to verify that the upgrade goes
smoothly on each one.

progolferyo

Dec 14, 2011, 9:19:30 PM

to mongodb-user

OK, that's good. The issue is just that they are falling more and more
behind. I'm wondering if my RAID0 setup with EBS on AWS is just not
working out like I expected it to (as in getting 4X more I/O
throughput).

I'm gonna try one and see if upgrading will do anything. Do you think
upgrading from 2.0 will have any added effect?
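
One way to put a number on "falling more and more behind" is to sample the lag twice and look at the trend. A minimal sketch in plain JavaScript follows; the function name and the sample numbers are hypothetical and not measurements from these servers.

```javascript
// Given two lag readings (in seconds) taken intervalSeconds apart, report
// whether the secondary is catching up and, if so, a naive linear ETA.
function lagTrend(lagStartSeconds, lagEndSeconds, intervalSeconds) {
  if (intervalSeconds <= 0) {
    throw new Error("intervalSeconds must be positive");
  }
  const changePerSecond = (lagEndSeconds - lagStartSeconds) / intervalSeconds;
  if (changePerSecond >= 0) {
    // Lag is flat or growing: at this pace it never catches up.
    return { catchingUp: false, changePerSecond, etaSeconds: null };
  }
  return {
    catchingUp: true,
    changePerSecond,
    etaSeconds: lagEndSeconds / -changePerSecond,
  };
}

// Invented readings: lag grew from 3000s to 3600s over ten minutes.
console.log(lagTrend(3000, 3600, 600));
```

The ETA assumes the write load on the primary stays constant, which is rarely true, so treat it as a rough indicator only.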

Scott Hernandez

Dec 14, 2011, 9:26:06 PM

to mongod...@googlegroups.com

On Thu, Dec 15, 2011 at 2:19 AM, progolferyo <sfa...@gmail.com> wrote:
> Ok thats good. The issue is just that they are falling more and more
> behind. I'm wondering if my RAID0 setup with EBS on AWS is just not
> working out like I expected it to work out (as in getting 4X more I/O
> throughput).

In general you are limited to a max of 2gbs from your instance to the
EBS system (all volumes). Depending on your instance type that could
effectively be much worse.

I suspect you are basically getting the worst write performance of the
worst EBS volume.
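
To illustrate why striping 4 volumes need not give 4X throughput, here is a deliberately simplified model in plain JavaScript. It assumes that a stripe is paced by its slowest member and that all volumes share a single instance-to-EBS link. The function name and every number are hypothetical, and real EBS behaviour is more variable than this.

```javascript
// Simplified estimate of RAID0 throughput over a shared link:
// each stripe waits on the slowest volume, and the total cannot
// exceed the capacity of the link all volumes share.
function raid0ThroughputEstimate(volumeMBps, linkCapMBps) {
  if (volumeMBps.length === 0) {
    throw new Error("at least one volume is required");
  }
  const slowest = Math.min(...volumeMBps);
  const stripeMBps = slowest * volumeMBps.length;
  return Math.min(stripeMBps, linkCapMBps);
}

// Invented figures: one slow volume drags the whole stripe down.
console.log(raid0ThroughputEstimate([60, 55, 20, 58], 120));
// Four healthy volumes still hit the shared link cap.
console.log(raid0ThroughputEstimate([60, 60, 60, 60], 120));
```

Under this model, adding volumes helps only until either the slowest volume or the shared link becomes the limit.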

> I'm gonna try one and see if upgrading will do anything. Do you think
> upgrading from 2.0 will have any added effect?

No, probably not in this case, but it can't hurt.

progolferyo

Dec 14, 2011, 9:51:04 PM

to mongodb-user

Argh, shutdown times out. Are you sure that if I do a kill -9 on the
instance, I won't have to start again from scratch? Or is there a
better way to kill, update, start up?

Scott Hernandez

Dec 14, 2011, 9:57:53 PM

to mongod...@googlegroups.com

Are you running with journaling? If not, don't kill -9 it.

Did you issue a db.shutdownServer() command from the mongo shell? What
does the log say?

progolferyo

Dec 15, 2011, 1:16:18 AM

to mongodb-user

Yes, I am running with journaling. It did finally stop, and I updated
and then rebooted (and did not do a kill -9).

Unfortunately, with RAID0 it is falling more and more behind, and it
doesn't look like this is going to work. It's weird: I have two other
secondary servers in this replica set with single drives that can
catch up just fine. It sounds like there may be some issues with EBS
and RAID0. Both servers on RAID0 just cannot keep up.

Nathan D Acuff

Dec 15, 2011, 9:28:53 AM

to mongodb-user

We ultimately abandoned RAID0 and went back to single EBS volumes,
after essentially the same experience as you.

We have it on our to-do list to try RAID1+0 sometime soonish. After
all, if everyone says it's better, it must be, right?

progolferyo

Dec 15, 2011, 11:50:21 AM

to mongodb-user

Yeah. I have a RAID10 instance going. The throughput should be better
on this box, but again the new instance will take a while to catch up;
it's about halfway done now.
