key seems to have moved in the index

80 views
Skip to first unread message

kaushik subramanian

unread,
Jun 24, 2015, 2:29:57 PM6/24/15
to mongod...@googlegroups.com
Hi,
We have a mongodb cluster where i see the following logs in the PRIMARY quite often.

[conn26285] key seems to have moved in the index, refinding. 5:f861000


I have tried looking up for any article that can help me better debug this issue but I have had no luck :(. Any help to make progress here would be greatly appreciated.


Some background on what's going on in the cluster:

- There is only one database that we use other than local database which is used by the cluster itself.

- A lot of writes into about 10-12 collections.

- Since there is a lot of data, there is a background job to delete data older than a specific time which runs periodically. The goal here was to keep removing data in smaller chunks so that deletes are not very heavy.

- There is a a lot of reads also going on in parallel for the purpose of real-time analytics.

- We do not use usepowerof2size setting.


With all of the above activity, there is this one moment in time where i see the "key seems to have moved in the index" show up in the log and when this happens hell just breaks lose. The Disk I/O shoots up to the point where the PRIMARY machine gets completely unresponsive having the following side effects,

- Clusters starts getting out of SYNC and PRIMARY starts logging 9001 socket exceptions

- inserts into the PRIMARY starts taking way more than 100ms and hence starts showing up in the logs.

- Existing connections to the PRIMARY starts getting timed out.

- Number of connections keep on increasing causing further load on the system.


Please let me know if there is any more information that I can provide to get further clarity into the issue. 


I use mongodb 2.4.9
Everything i have mentioned above is as observed from the PRIMARY logs.

Asya Kamsky

unread,
Jun 28, 2015, 12:37:31 PM6/28/15
to mongodb-user
Hi Kaushik:

First, the message itself: it is an informational message that
indicates that during an operation that was using an index traversal
when that operation yielded and then resumed, some key that it was
pointing at had moved within the index.

We have this issue in Jira
https://jira.mongodb.org/browse/SERVER-13131 which is tracking
improving the helpfulness of this message, to avoid the sort of
confusion that it currently can cause.

Now, regarding the *other* things that happen at the same time: I'm
quite certain that there is a common cause for both - occurrences of
this message and performance issues you see around the same time.

That common cause is likely contention or high load on the system
which is beyond what the system can handle from what you are
observing.

This message is likely the key: "The Disk I/O shoots up to the point
where the PRIMARY machine gets completely unresponsive" - so what can
be causing high disk IO? Well, not having enough RAM for the indexes
and/or documents which are being queries/operated on.

Does the delete job happen to be running when this sort of spike
happens? Are all the queries that delete commands use indexed? It
would be interesting to see what exactly is happening on the system
when this starts going on - the best way to debug it would be to look
at the logs from before and during this period - and best way to
visualize them would be using Mtools
https://github.com/rueckstiess/mtools/wiki in particular, mplotqueries
from it.

Last year I gave a talk about diagnostics and debugging performance
problems - I talk about how to use Mtools to help out - there's a
video of the talk here:
https://www.mongodb.com/presentations/diagnostics-and-debugging

Asya
> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user"
> group.
>
> For other MongoDB technical support options, see:
> http://www.mongodb.org/about/support/.
> ---
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mongodb-user...@googlegroups.com.
> To post to this group, send email to mongod...@googlegroups.com.
> Visit this group at http://groups.google.com/group/mongodb-user.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mongodb-user/62af7faf-7430-47d2-a0a4-3ebe188f1aee%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
Message has been deleted

kaushik subramanian

unread,
Jun 29, 2015, 7:57:27 PM6/29/15
to mongod...@googlegroups.com
Hi Asya,
Thank you very much for your response. I have few follow up questions.

--This message is likely the key: "The Disk I/O shoots up to the point 
--where the PRIMARY machine gets completely unresponsive" - so what can 
--be causing high disk IO?   Well, not having enough RAM for the indexes 
--and/or documents which are being queries/operated on. 
Question 1: Can "key seems to have moved in the index" recover by finding the new location of the key within the RAM itself? or will it always hit the DISK?

Question 2: I never see "found" in response to "refinding" messages. Does this mean that those keys got removed? Or perhaps the operation got timed out? Or is this hinting at something.

Question 3: Frequent interleaved reads and writes affecting the same index will cause this issue to occur. How can this be avoided?

Question 4: Is it OK for me to conclude that writes and deletes of documents are the ONLY operations that can cause the key to either move within an index or get deleted from an index? Are there any other possibilities we should look out for?

Really appreciate your time. Looking forward to hear from you.

Regards,
Kaushik

PS: I will be running mtools on the logs. Hope that the questions I have asked above are the ones outside of my possible findings from mtools.

Asya Kamsky

unread,
Jun 30, 2015, 12:16:21 AM6/30/15
to mongodb-user

1. It just continues and if the key is in RAM, it'll get it from RAM and go to disk of the page it needs isn't in RAM.

2. There is no "found" - it never prints that when it finds an entry in the index.

3. Yes, that's exactly what happens - a change in indexes value, or deletion will cause the index entry to move in the index. Same for document location (which is what index points to) when document moves as a result of update (in mmap only).

4. Yes, updates or removes.

On Jun 29, 2015 7:26 PM, "kaushik subramanian" <kaush...@gmail.com> wrote:
Hi Asya,
Thank you very much for your response. I have few follow up questions.

--This message is likely the key: "The Disk I/O shoots up to the point 
--where the PRIMARY machine gets completely unresponsive" - so what can 
--be causing high disk IO?   Well, not having enough RAM for the indexes 
--and/or documents which are being queries/operated on. 
Question 1: Can "key seems to have moved in the index" recover by finding the new location of the key within the RAM itself? or will it always hit the DISK?

Question 2: I never see "found" in response to "refinding" messages. Does this mean that those keys got removed? Or perhaps the operation got timed out? Or is this hinting at something.

Question 3: Let's say that a read operation started that is doing an index traversal and a write operation which will affect the same index caused this read to yield. Is it possible that such occurrences can cause the key (which the read was operating on before yielding) to move within the index?

Question 4: Is it OK for me to conclude that writes and deletes of documents are the ONLY operations that can cause the key to either move within an index or get deleted from an index? Are there any other possibilities we should look out for?

Really appreciate your time. Looking forward to hear from you.

Regards,
Kaushik

PS: I will be running mtools on the logs. Hope that the questions I have asked above are the ones outside of my possible findings from mtools.

On Sunday, June 28, 2015 at 9:37:31 AM UTC-7, Asya Kamsky wrote:

kaushik subramanian

unread,
Jun 30, 2015, 12:49:17 PM6/30/15
to mongod...@googlegroups.com
Hi asya,
It slipped my mind to put one more question in the same context as the rest of them.
The issue we are talking about is something i see on the PRIMARY. based on the logs I could verify that PRIMARY is only having oplog query, some document updates and inserts. There no reads here.
A write operation consists of two parts, one is to insert the document and the other is to update the index. Is the index updating part something that can yield if there is a lot of writes coming in?
Where I am getting with this is to understand if continuous writes which affect same index can cause the key to move within an index or not?
I am suspecting that write heaviness alone is causing this but wanted to get it clarified.

Looking forward to hear from you.

Regards,
Kaushik.

Asya Kamsky

unread,
Jul 1, 2015, 10:27:49 AM7/1/15
to mongodb-user
Kaushik:

Most likely it's the *updates* that are causing this because to update a record there has to be a read (find) of the appropriate record, hopefully in the index.

Asya


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
Reply all
Reply to author
Forward
0 new messages