possible issue with long delays of primaries tracking replication position of secondaries?

Zardosht Kasheff

May 19, 2013, 10:22:49 PM
to mongo...@googlegroups.com
I think I see a potential issue with secondaries reporting the
position of other secondaries (through GhostSync), and I am wondering
if my understanding is correct.

Suppose we have a replication chain S3->S2->S1->P. P is the
primary, and S1, S2, S3 are secondaries that replicate down the chain.
S1's GhostSync will be responsible for percolating the slave positions
of S3 and S2. The problem I see is that the GhostSync::percolate calls
are serialized, and are done via getMore tailable cursor queries that
will wait if there is no new data. I think I am seeing the following
happening:
- S2 is up to date, and is percolating this fact through S1
- S1 relays this fact by doing a getMore query that is hanging,
because it is a tailable cursor awaiting data
- while the query is hanging, S3 percolates its slave position through
S2, and is now stuck on S1 behind the running query for S2.

As a result, S3's information is not immediately delivered to P,
because S1 is busy waiting on a getMore query while percolating S2's
information. S3's information can be delayed by seconds (whatever the
getMore timeout is).
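
To make the timing concrete, here is a toy model in Python. It is not
MongoDB code: the function name, the tuple layout, and the 2-second
await timeout are all assumptions made for illustration. It only models
a relay that forwards one update at a time, where a forward can block
for the full timeout.

```python
def delivery_times(requests, await_timeout):
    """Toy model of a relay (S1) that forwards position updates serially.

    requests: list of (name, arrival_time, blocks) in arrival order.
      blocks is True when the forwarding query finds no new data and
      waits for the whole await_timeout before returning.
    Returns {name: time at which the relay finishes that forward}.
    """
    free_at = 0.0  # time at which the relay can start the next forward
    done = {}
    for name, arrival, blocks in requests:
        start = max(arrival, free_at)          # wait behind the previous forward
        hold = await_timeout if blocks else 0.0
        free_at = start + hold
        done[name] = free_at
    return done


# Hypothetical numbers: S2's forward blocks, S3's update arrives 0.1s later.
times = delivery_times([("S2", 0.0, True), ("S3", 0.1, False)],
                       await_timeout=2.0)
print(times)
```

In this model S3's update cannot be forwarded until S2's blocked query
returns, so it leaves S1 at the timeout rather than at its arrival time.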

Is this issue valid?

On a side note, why does the GhostSync percolate code use a cursor to
update slave position through the chain and not use some custom
command? If it could use a command that said "my new position is X",
then this issue, along with something like SERVER-8073 would go away
(I think).
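
As a rough sketch of what I mean by a command, in Python rather than
the server's C++: the Node class and the update_position method are
hypothetical names, positions are plain integers instead of optimes,
and this is not a proposal for the actual wire format. The point is
only that each update is applied and passed upstream right away, with
no cursor to wait on.

```python
class Node:
    """Hypothetical replica set member that tracks reported positions."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream  # the node this member syncs from, if any
        self.positions = {}       # member name -> last reported position

    def update_position(self, member, position):
        # Record the report only if it moves the member forward.
        if position > self.positions.get(member, -1):
            self.positions[member] = position
        # Pass it up the chain immediately; nothing here blocks.
        if self.upstream is not None:
            self.upstream.update_position(member, position)


# Chain from the example above: S3 -> S2 -> S1 -> P.
p = Node("P")
s1 = Node("S1", upstream=p)
s2 = Node("S2", upstream=s1)

s2.update_position("S3", 42)  # S3 reports its position to S2
s1.update_position("S2", 42)  # S2 reports its own position to S1
print(p.positions)
```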

-Zardosht

Eric Milkie

May 20, 2013, 8:18:52 AM
to mongo...@googlegroups.com
Hi Zardosht.
You're correct, this is an issue as you describe. Matt is working on SERVER-6071 for 2.6 to implement a command to update slave locations, which will reduce locking on the local database and also be more timely, as you noted.
-Eric

Zardosht Kasheff

May 20, 2013, 8:28:39 AM
to mongo...@googlegroups.com
Thanks Eric.

Reading SERVER-6071 brings up another mystery I have not been able to
figure out. What is the purpose of the local.slaves collection
periodically storing the location of slaves? I see nothing that reads
this information. Why not just keep the data in the _slaves map? If it
is just informational, doesn't rs.status() serve the same purpose?

Thanks
-Zardosht

Eric Milkie

May 20, 2013, 8:31:36 AM
to mongo...@googlegroups.com
At this point it is only for debugging. It can be handy to have this data when the server is no longer actively running.