Hi guys,
We are trying to add replication to our existing deployment and are running into some issues. The application is very write-intensive. Some technical details:
size of all databases - 1.8TB
oplog size - 50G ("logSizeMB" : 51200)
timeDiffHours - 16.65
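(For reference, the oplog figures above are the "logSizeMB" and "timeDiffHours" fields from db.getReplicationInfo(); rs.printReplicationInfo() shows the same data in a readable form:)

    // In the mongo shell, on the primary:
    db.getReplicationInfo()      // returns logSizeMB, usedMB, timeDiffHours, etc.
    rs.printReplicationInfo()    // pretty-prints oplog size and first/last event times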
I tried standing up the new member and letting mongod do its initial sync, which took a fair amount of time, roughly 10 days. After it finished, the member went from STARTUP2 to RECOVERING, and I periodically see "RS102" errors in the logs:
2017-01-17T03:16:17.639+0000 I REPL [ReplicationExecutor] could not find member to sync from
2017-01-17T03:16:17.639+0000 I REPL [rsBackgroundSync] replSet error RS102 too stale to catch up
2017-01-17T03:16:17.639+0000 I REPL [rsBackgroundSync] replSet our last optime : Jan 5 07:15:36 586df298:721
2017-01-17T03:16:17.639+0000 I REPL [rsBackgroundSync] replSet oldest available is Jan 16 10:39:25 587ca2dd:437
If I were a betting man, I'd guess the problem is that if the initial sync takes longer than the oplog window (the timeDiffHours above), it can never succeed: by the time the data copy finishes, the oldest entry still in the primary's oplog is already newer than the point the new member needs to resume from. Is this accurate? It isn't explicitly called out anywhere I could find in the docs, and it seems like a pretty important detail, so I'm doubting myself a bit.
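If it helps anyone sanity-check that theory, here is one way to compare the two sides (the two optimes in the log lines above are exactly this gap):

    // On any member: the last optime each member has applied
    rs.status().members.forEach(function (m) {
        print(m.name, m.stateStr, m.optimeDate);
    });

    // On the primary: oldest entry still in the oplog (it's a capped
    // collection, so $natural order is insertion order)
    db.getSiblingDB("local").oplog.rs.find().sort({ $natural: 1 }).limit(1).next().ts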
If that is true, is there another way to do this? Using some back-of-the-napkin math (below), the "seed a secondary" approach of manually copying the data files would mean roughly 4 hours of downtime, assuming we could run the 1G NIC at line rate. We have no recent backups to seed from - part of this effort is to get into a position where we have them - but I don't think that's particularly relevant: if things ever got desynced and the backup fell outside the oplog window, we'd be right back here.
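The napkin math for the copy, in case I've slipped a decimal:

    // Rough wall-clock time to copy the data files over a 1 Gb/s NIC
    var dataBytes = 1.8e12;    // 1.8 TB total
    var lineRate  = 125e6;     // 1 Gb/s is ~125 MB/s, ignoring protocol overhead
    print(dataBytes / lineRate / 3600);    // ~4 hours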
Alternatively, is the correct option here to size the oplog much larger just for the initial sync and wait it out? With 50G giving ~16.65 hours, reaching a safety margin of, say, 12 days would require roughly 865G, call it a 1TB oplog (math below). And if my assumptions are correct, the required size is still a function of total data size, since a bigger dataset means a longer initial sync: 1TB might work now, but a year from now the same exercise might need 2TB.
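Same napkin, for the oplog sizing (a straight linear scale of the current window, which assumes our write rate stays roughly constant):

    // Oplog size needed for a 12-day window, extrapolated from current numbers
    var currentGB    = 50;         // current oplog size
    var currentHours = 16.65;      // current window (timeDiffHours)
    var targetHours  = 12 * 24;    // desired 12-day margin
    print(currentGB * targetHours / currentHours);    // ~865 GB, so ~1 TB with headroom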
Any thoughts and suggestions would be appreciated.
Thanks.