sequence to bring up replSet on EC2 from snapshots


Michael

May 12, 2012, 2:04:56 PM
to mongod...@googlegroups.com
I'm running into some problems doing what the subject says. One twist is that the snapshots may not contain the same rs.conf that the new stack needs.

This is a simple set with a primary, a secondary, and an arbiter.

To that end, I have the following sequence of operations:

1) create volumes from snapshots        <= works fine
2) attach volumes                       <= works fine
3) mdadm --assemble .....               <= works fine
4) mount /dev/mdXXX /mountpoint         <= works fine
5) rm -rf /path/to/data/local.*         <= works fine
6) rm -rf /path/to/data/mongod.lock     <= works fine
7) service mongodb start                <= works fine
8) mongo script_which_does_rs.initiate() <= Not so good

This gets me the message that db2 in the set 'has data'.  db2 shows:
> show dbs;
admin   (empty)
local   (empty)
test    0.203125GB

So, for the sake of completeness, I stopped mongodb, removed the test.* files, and restarted mongodb. This gives:

> show dbs;
local   (empty)
>

and after waiting a few minutes, I see that the 'admin' database has appeared.

When I try rs.initiate() from db1, it tells me that db2 is 'not ok'. After waiting a few more minutes I try again, and it works.

So... from all this, how should I formulate an exact, programmatically
repeatable sequence of steps by which I can bring up a replSet from old
EC2 snapshots?

TIA.


Eliot Horowitz

May 12, 2012, 10:20:17 PM
to mongod...@googlegroups.com
Not quite sure I follow.
When you did an initiate, you should have created a single node
replica set, is that correct?
What was in the dbpath at that point?
Can you share the log?
> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/mongodb-user/-/e2ImFad8bIgJ.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/mongodb-user?hl=en.

Michael

May 13, 2012, 1:34:04 AM
to mongod...@googlegroups.com


On Saturday, 12 May 2012 22:20:17 UTC-4, Eliot Horowitz wrote:
> Not quite sure I follow.
> When you did an initiate, you should have created a single node
> replica set, is that correct?
> What was in the dbpath at that point?
> Can you share the log?

Ah, no. This is an AWS CloudFormation stack launch, so all three members are up and running, just not happy with their replSet configuration. The dbpath is fine, and the datafiles, etc. are all there.


Eliot Horowitz

May 13, 2012, 9:45:37 AM
to mongod...@googlegroups.com
What args did you call rs.initiate with?

Michael

May 13, 2012, 4:51:46 PM
to mongod...@googlegroups.com


On Sunday, 13 May 2012 09:45:37 UTC-4, Eliot Horowitz wrote:
> What args did you call rs.initiate with?

The following: (domain name changed to protect the guilty)

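// Replica set configuration for the new stack: two data-bearing members
// and one arbiter.
var c = {
    "_id" : "my_repl",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "db1.qa.example.com:27017"
        },
        {
            "_id" : 1,
            "host" : "db2.qa.example.com:27017"
        },
        {
            // The arbiter votes in elections but holds no data.
            "_id" : 2,
            "host" : "db.arb.qa.example.com:27017",
            "arbiterOnly" : true
        }
    ]
};

// Run against db1 only.
var res = rs.initiate(c);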
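
Since the first rs.initiate() attempt in this thread failed and a later one succeeded, one way to script step 8 is to retry the call until it returns ok: 1. The following is only a minimal sketch, not a confirmed fix for the underlying problem. `initiateWithRetry`, `maxAttempts`, and `delayMs` are hypothetical names introduced for illustration; the initiate and sleep functions are passed in as arguments so the helper can also be exercised outside the mongo shell.

```javascript
// Hypothetical helper: call initiateFn(config) until the result reports ok: 1,
// pausing delayMs between attempts and giving up after maxAttempts.
// In the mongo shell, initiateFn would wrap rs.initiate and sleepFn would be
// the shell's built-in sleep(ms).
function initiateWithRetry(config, initiateFn, sleepFn, maxAttempts, delayMs) {
    var res = null;
    for (var attempt = 1; attempt <= maxAttempts; attempt++) {
        res = initiateFn(config);
        if (res && Number(res.ok) === 1) {
            return { ok: 1, attempts: attempt, last: res };
        }
        // Do not sleep after the final failed attempt.
        if (attempt < maxAttempts) {
            sleepFn(delayMs);
        }
    }
    return { ok: 0, attempts: maxAttempts, last: res };
}
```

In the mongo shell this might be called as `initiateWithRetry(c, function (cfg) { return rs.initiate(cfg); }, sleep, 10, 30000)`, where the attempt count and delay are arbitrary illustrative values. The `last` field keeps the final command result so the error message can be logged if every attempt fails.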

Eliot Horowitz

May 14, 2012, 12:34:40 AM
to mongod...@googlegroups.com
Ah, ok.
Can you send the full log?
Might be an issue with them all trying to initialize at the same
second and something odd happening.
Hard to tell without log.

Michael

May 15, 2012, 12:02:45 AM
to mongod...@googlegroups.com


On Monday, 14 May 2012 00:34:40 UTC-4, Eliot Horowitz wrote:
> Ah, ok.
> Can you send the full log?
> Might be an issue with them all trying to initialize at the same
> second and something odd happening.
> Hard to tell without log.

To the list or you personally? 

I rather doubt it is a timing issue (but what do I know?)

Consider that the snapshots used to create the new RAID10 underneath the db have a different set of server names in them, and were taken from the secondary in a different deployment, say, 24 hours earlier. Hence the need to reconfig() with the new server CNAMEs.

Perhaps this is an "odd" case one doesn't normally plan for :-)

 

Spencer T Brody

May 15, 2012, 11:52:17 AM
to mongod...@googlegroups.com
If you can post the logs to the list (using something like pastebin), that'd be best.  If you aren't comfortable sharing your logs publicly, then you can create a ticket in 10gen's jira system (jira.mongodb.org) in the "Community Private" project and attach the logs there.

When you delete the local database files that should eliminate all references to the old server names.

Are you running rs.initiate() on all 3 nodes at the same time?  If so, what happens if you only run initiate() on one?
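
If initiate is run on one node only, a launch script can then poll rs.status() on that node until every member has settled before it continues. Below is a minimal sketch, assuming the rs.status() output fields `ok`, `members[].health`, and `members[].state` (1 = PRIMARY, 2 = SECONDARY, 7 = ARBITER). `replSetIsSettled` is a hypothetical helper name, not part of the mongo shell.

```javascript
// Hypothetical helper: returns true when an rs.status() document shows
// exactly one PRIMARY and every member healthy and in the PRIMARY,
// SECONDARY, or ARBITER state.
function replSetIsSettled(status) {
    if (!status || Number(status.ok) !== 1 || !status.members) {
        return false;
    }
    var SETTLED = { 1: true, 2: true, 7: true }; // PRIMARY, SECONDARY, ARBITER
    var primaries = 0;
    for (var i = 0; i < status.members.length; i++) {
        var m = status.members[i];
        var state = Number(m.state);
        if (Number(m.health) !== 1 || !SETTLED[state]) {
            return false;
        }
        if (state === 1) {
            primaries++;
        }
    }
    return primaries === 1;
}
```

In the mongo shell, a loop such as `while (!replSetIsSettled(rs.status())) { sleep(5000); }` would block until the set is ready; the 5-second interval is an arbitrary choice, and a real script would probably also want an overall timeout.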

Michael

May 15, 2012, 9:44:26 PM
to mongod...@googlegroups.com
On Tuesday, 15 May 2012 11:52:17 UTC-4, Spencer T Brody wrote:
> If you can post the logs to the list (using something like pastebin), that'd be best.  If you aren't comfortable sharing your logs publicly, then you can create a ticket in 10gen's jira system (jira.mongodb.org) in the "Community Private" project and attach the logs there.
>
> When you delete the local database files that should eliminate all references to the old server names.
>
> Are you running rs.initiate() on all 3 nodes at the same time?  If so, what happens if you only run initiate() on one?



I'm fairly sure that in this case CloudFormation brought up all 3 instances and then I ran rs.initiate() on db1 alone, not on all 3.

 