A mess of compatibility versions in a replica set... impossible to recover.


Olivier Hautecoeur

Oct 9, 2018, 10:25:22 AM
to mongodb-user
I have a replica set: a primary, an arbiter, and a secondary.
The secondary ran out of disk space and crashed, and I wanted to take that opportunity to upgrade to version 4.
I then realized that I should first recover the whole replica set on 3.6.

On the primary I get:
rsamv:PRIMARY> db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )
{
        "featureCompatibilityVersion" : {
                "version" : "3.4",
                "targetVersion" : "3.6"
        },
        "ok" : 1
}

On the secondary, when I try to start the server, I get:

2018-10-09T13:45:45.377+0000 I REPL     [replexec-1] Member localhost:27017 is now in state PRIMARY
2018-10-09T13:45:45.377+0000 I REPL     [replexec-2] Member localhost:27018 is now in state ARBITER
2018-10-09T13:45:46.055+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:49112 #1 (1 connection now open)
2018-10-09T13:45:46.071+0000 I NETWORK  [conn1] received client metadata from 127.0.0.1:49112 conn1: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.6.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "17.10" } }
2018-10-09T13:45:46.402+0000 I REPL     [replication-1] sync source candidate: localhost:27017
2018-10-09T13:45:46.402+0000 I STORAGE  [replication-1] dropAllDatabasesExceptLocal 1
2018-10-09T13:45:46.403+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to localhost:27017
2018-10-09T13:45:46.436+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to localhost:27017, took 33ms (1 connections now open to localhost:27017)
2018-10-09T13:45:46.488+0000 I REPL     [replication-0] Initial sync attempt finishing up.
2018-10-09T13:45:46.488+0000 I REPL     [replication-0] Initial Sync Attempt Statistics: { failedInitialSyncAttempts: 0, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1539092745369), initialSyncAttempts: [] }
2018-10-09T13:45:46.488+0000 E REPL     [replication-0] Initial sync attempt failed -- attempts left: 9 cause: IncompatibleServerVersion: Sync source had unsafe feature compatibility version: upgrading to 3.6

After several attempts, trying different server versions, it still fails and stops.

I tried the command db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } ) on the primary, but the command hangs and never returns, and nothing is reported in the log.

The database is about 150 GB.

How can I fix this issue to recover the replica set and upgrade it to the latest version?
I looked for the same issue on the forums, but I have not found anyone reporting this situation.

Thanks,
Olivier

Wan Bachtiar

Oct 10, 2018, 3:39:53 AM
to mongodb-user

> In fact, I realized that I should first recover the whole replica set in 3.6.

Hi Olivier,

To clarify: before the crash, was the replica set running MongoDB server version 3.6?

If so, the featureCompatibilityVersion output from the primary indicates that a prerequisite was missed in an earlier upgrade: the replica set does not have featureCompatibilityVersion set to 3.6 yet. I assume that the replica set was previously upgraded from MongoDB 3.4 and has not yet been enabled for the backwards-incompatible features of version 3.6.
Please review MongoDB: Backwards Incompatible Features in v3.6 for more information.

See also MongoDB: Upgrade a Replica Set to 3.6
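
As a rough illustration of how to read that output, here is a small sketch in plain JavaScript (it runs in Node.js; in the mongo shell, print would replace console.log). describeFcv is a hypothetical helper, not a shell built-in, and the handling of a plain-string result is an assumption about other server versions. The sample document is the one reported from the primary above.

```javascript
// Hypothetical helper: summarize the result of
//   db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
function describeFcv(result) {
  const fcv = result.featureCompatibilityVersion;
  // Assumption: some server versions return a plain string instead of a document.
  if (typeof fcv === "string") {
    return { version: fcv, inTransition: false, target: null };
  }
  // A targetVersion field means a version change was started but not completed.
  const target = fcv.targetVersion === undefined ? null : fcv.targetVersion;
  return { version: fcv.version, inTransition: target !== null, target: target };
}

// Output reported from the primary earlier in this thread.
const fromPrimary = {
  featureCompatibilityVersion: { version: "3.4", targetVersion: "3.6" },
  ok: 1
};
console.log(describeFcv(fromPrimary));
```

A result with inTransition set to true matches the "upgrading to 3.6" state that the secondary refuses to sync from.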

> how to fix this issue to recover the replica set and to upgrade it to the latest version ?

First complete the upgrade to the 3.6 series, then follow the upgrade steps in MongoDB: Upgrade a Replica Set to 4.0.
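
The order of operations can be sketched as a small decision helper. This is only an illustration in plain JavaScript: nextUpgradeStep and UPGRADE_PATH are hypothetical names, and the binary version 3.6.2 is taken from the client metadata in the log above. The upgrade guides remain the reference for the actual procedure.

```javascript
// Hypothetical list of release series covered by this thread.
const UPGRADE_PATH = ["3.4", "3.6", "4.0"];

// Hypothetical helper: suggest the next step given the binary version of the
// members and the featureCompatibilityVersion document from the primary.
function nextUpgradeStep(binaryVersion, fcv) {
  // "3.6.2" -> "3.6"
  const series = binaryVersion.split(".").slice(0, 2).join(".");
  const index = UPGRADE_PATH.indexOf(series);
  if (index < 0) {
    return "unknown release series " + series;
  }
  // A pending targetVersion has to be resolved before anything else.
  if (fcv.targetVersion !== undefined) {
    return 'retry setFeatureCompatibilityVersion: "' + fcv.targetVersion + '" until it completes';
  }
  // Binaries are newer than the enabled feature set.
  if (fcv.version !== series) {
    return 'run setFeatureCompatibilityVersion: "' + series + '" on the primary';
  }
  const next = UPGRADE_PATH[index + 1];
  return next ? "upgrade the binaries of all members to " + next
              : "no further step on this path";
}

console.log(nextUpgradeStep("3.6.2", { version: "3.4", targetVersion: "3.6" }));
console.log(nextUpgradeStep("3.6.2", { version: "3.6" }));
```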

In addition, please consider running a primary and two secondaries. This should provide two complete copies of the data set at all times in addition to the primary, and provide additional fault tolerance and high availability. See also Replica Set Deployment Architectures.
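
To illustrate the fault-tolerance point, here is a simplified model in plain JavaScript. It assumes every member has one vote and that a majority write needs acknowledgement from a majority of the voting members, counting only members that actually hold data. The function names are hypothetical.

```javascript
// members: array of { arbiter: boolean, up: boolean }; each member has one vote.
function majorityOf(members) {
  return Math.floor(members.length / 2) + 1;
}

// A primary can be elected while a majority of voting members is reachable.
function canElectPrimary(members) {
  return members.filter(m => m.up).length >= majorityOf(members);
}

// Majority writes need enough reachable members that hold data.
function canAcknowledgeMajorityWrite(members) {
  const dataBearingUp = members.filter(m => m.up && !m.arbiter).length;
  return dataBearingUp >= majorityOf(members);
}

// Primary + secondary + arbiter, with the secondary down.
const psa = [
  { arbiter: false, up: true },
  { arbiter: false, up: false },
  { arbiter: true, up: true }
];
// Primary + two secondaries, with one secondary down.
const pss = [
  { arbiter: false, up: true },
  { arbiter: false, up: false },
  { arbiter: false, up: true }
];

console.log(canElectPrimary(psa), canAcknowledgeMajorityWrite(psa));
console.log(canElectPrimary(pss), canAcknowledgeMajorityWrite(pss));
```

In this model both layouts keep a primary when one secondary is lost, but only the layout with two secondaries can still acknowledge majority writes.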

Regards,
Wan.

Olivier Hautecoeur

Oct 10, 2018, 1:10:26 PM
to mongodb-user
Thanks Wan for your answer.

I have made some progress since yesterday.
The full recovery from scratch kept crashing (out of memory), so I started from a copy saved two months ago. The oplog is large enough to cover that period.
So the PSA servers are all up and the secondary is trying to catch up on the missing period, but very, very slowly. I have been running this database for years and it used to be much faster.
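
One way to put a number on how far behind the secondary is would be to compare the optimeDate values from rs.status(). A minimal sketch in plain JavaScript, assuming the usual members[].name, stateStr and optimeDate fields. The helper name and the sample status document, including the secondary's host name and all timestamps, are made up for illustration.

```javascript
// Hypothetical helper: lag in seconds of each data-bearing non-primary member,
// computed from a document shaped like the output of rs.status().
function secondaryLagSeconds(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  const lag = {};
  if (!primary) {
    return lag; // no primary to compare against
  }
  for (const m of status.members) {
    // Arbiters hold no data and report no optimeDate, so they are skipped.
    if (m.stateStr !== "PRIMARY" && m.optimeDate instanceof Date) {
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

// Illustrative sample only; not taken from the real replica set.
const sampleStatus = {
  members: [
    { name: "localhost:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2018-10-10T16:00:00Z") },
    { name: "localhost:27018", stateStr: "ARBITER" },
    { name: "secondary.example:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2018-10-10T15:00:00Z") }
  ]
};
console.log(secondaryLagSeconds(sampleStatus));
```

Sampling this twice, a few minutes apart, would show whether the gap is actually shrinking.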

I still get a compatibility version mismatch.
I can now run the set command on the primary, so the primary answers 3.6 when asked for the version, while the arbiter and the secondary still answer 3.4. So the warning about the incomplete upgrade remains.
The secondary log reports:
2018-10-10T16:49:24.669+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to localhost:27017
2018-10-10T16:49:24.695+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to localhost:27017, took 26ms (2 connections now open to localhost:27017)
2018-10-10T16:50:24.695+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending idle connection to host localhost:27017 because the pool meets constraints; 1 connections to that host remain open
2018-10-10T16:50:52.848+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to localhost:27017
2018-10-10T16:50:52.874+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to localhost:27017, took 26ms (2 connections now open to localhost:27017)
2018-10-10T16:51:52.887+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending idle connection to host localhost:27017 because the pool meets constraints; 1 connections to that host remain open
2018-10-10T16:54:04.351+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Can not create config.system.sessions collection

I understand that the last message is related to version 3.6, but I ran the set command on the primary and the secondary remained on 3.4.
At the current rate, synchronizing the optime between the primary and the secondary will take several more days, and I am still not on 3.6. I need to fix this before adding a new secondary node and upgrading to version 4.0.
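
A small sketch in plain JavaScript of comparing the values reported by each member. The helper name is hypothetical and the input simply restates the answers described above, keyed by role rather than by host. An arbiter holds no data, so its reported value may not follow the primary's the way a secondary's eventually should once it has caught up.

```javascript
// Hypothetical helper: list the members whose reported
// featureCompatibilityVersion differs from the expected one.
function membersNotAt(reported, expected) {
  return Object.keys(reported).filter(member => reported[member] !== expected);
}

// Values as described in this message.
const reportedFcv = { primary: "3.6", secondary: "3.4", arbiter: "3.4" };
console.log(membersNotAt(reportedFcv, "3.6"));
```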

Best regards,
Olivier
