Sporadically missing data from a query

26 views
Skip to first unread message

mdnorman

unread,
Jun 23, 2011, 4:00:52 PM6/23/11
to mongodb-user
I'm wondering if anyone has seen this issue as I think I've tried
everything, and we're still plagued by it. The behavior we are seeing
is that when we run a query, it will most of the time return a set of
results that make sense, and then once out of 1000 queries, it will be
missing a single record. I've tried sorting the results to see if
it's always the same time, using a snapshot query, setting the Java
client batch size to Integer.MAX_VALUE, and using the toArray() method
on the Java client DBCursor object to get it to quickly pull the
data. I've also removed any indexes that the queries might be using,
so it doesn't appear to be an issue with indexes. We are only querying
the master, though we saw this when querying slaves as well, though
the data missing was very old and it wouldn't have been a replication
issue.

I am only able to reproduce this issue on our production servers
(running Linux), and even with copying all of the files locally, I
have been unable to reproduce this issue on my local machine (using
replicates on Mac and Windows). All versions of the DB are 1.8.1.

Has anyone else seen this, or can someone explain how the data can
just be wrong?

We think we've tried everything, but I'd be happy to try other things
as well.

Scott Hernandez

unread,
Jun 23, 2011, 5:17:46 PM6/23/11
to mongod...@googlegroups.com
Can you reproduce this in the mongo javascript shell?

BTW. Setting the batch size to that value should have no effect.

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

mdnorman

unread,
Jun 23, 2011, 8:01:27 PM6/23/11
to mongodb-user
Setting the batch size means that all of the data comes in one
download to the client from the DB. Otherwise, the DB cursor stays
open and the next set of data is retrieved, because the batch size is
by default 4MB. Debugging into the driver code showed that for these
queries, the client would go back to the server at least 2 if not 3 or
more times. That said, it's very possible this shouldn't make any
difference in noticed behavior (and indeed, it hasn't).

It's harder to reproduce in the javascript shell. I've seen the
behavior once or twice, but not consistently.

Scott Hernandez

unread,
Jun 23, 2011, 8:12:13 PM6/23/11
to mongod...@googlegroups.com
Which version of the driver and server are you using? On the current
driver and server it should be something more along the lines of 32MB
by default. We can take a look at that if you are seeing more data
when you set the batch size to a large value. Yeah, either way it
shouldn't make a difference.

On Thu, Jun 23, 2011 at 8:01 PM, mdnorman <michael...@gmail.com> wrote:
> Setting the batch size means that all of the data comes in one
> download to the client from the DB.  Otherwise, the DB cursor stays
> open and the next set of data is retrieved, because the batch size is
> by default 4MB.  Debugging into the driver code showed that for these
> queries, the client would go back to the server at least 2 if not 3 or
> more times.  That said, it's very possible this shouldn't make any
> difference in noticed behavior (and indeed, it hasn't).
>
> It's harder to reproduce in the javascript shell. I've seen the
> behavior once or twice, but not consistently.

Do you have any sparse indexes? Or did you run journaling with 1.8.0
at any time?

When this happens do you have the server logs for the same period on
the affected server? If so those might help.

There are times when a cursor is killed on the server but you should
get an error in the java driver.

Is it possible that you are updating documents when the cursor is open?

mdnorman

unread,
Jun 23, 2011, 9:34:35 PM6/23/11
to mongodb-user


On Jun 23, 7:12 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> Which version of the driver and server are you using? On the current
> driver and server it should be something more along the lines of 32MB
> by default. We can take a look at that if you are seeing more data
> when you set the batch size to a large value. Yeah, either way it
> shouldn't make a difference.
>
> On Thu, Jun 23, 2011 at 8:01 PM, mdnorman <michaeldnor...@gmail.com> wrote:
> > Setting the batch size means that all of the data comes in one
> > download to the client from the DB.  Otherwise, the DB cursor stays
> > open and the next set of data is retrieved, because the batch size is
> > by default 4MB.  Debugging into the driver code showed that for these
> > queries, the client would go back to the server at least 2 if not 3 or
> > more times.  That said, it's very possible this shouldn't make any
> > difference in noticed behavior (and indeed, it hasn't).
>
> > It's harder to reproduce in the javascript shell. I've seen the
> > behavior once or twice, but not consistently.
>
> Do you have any sparse indexes? Or did you run journaling with 1.8.0
> at any time?

No sparse indexes. This server was new with 1.8.1, and journaling has
never been on in the server.

>
> When this happens do you have the server logs for the same period on
> the affected server? If so those might help.

I've looked at them and they aren't very useful. I did turn on query
logging for a bit but didn't see much more in them than what I can get
out of an explain:

Thu Jun 23 20:17:29 [initandlisten] connection accepted from
10.200.12.211:65184 #40
Thu Jun 23 20:17:29 [initandlisten] connection accepted from
10.200.12.211:65185 #41
Thu Jun 23 20:17:29 [initandlisten] connection accepted from
10.200.12.211:65186 #42
Thu Jun 23 20:17:30 [conn40] end connection 10.200.12.211:65184
Thu Jun 23 20:17:30 [initandlisten] connection accepted from
10.200.12.211:65191 #43
Thu Jun 23 20:17:30 [initandlisten] connection accepted from
10.200.12.211:65192 #44

>
> There are times when a cursor is killed on the server but you should
> get an error in the java driver.
>
> Is it possible that you are updating documents when the cursor is open?
>

I know that documents are being updated all the time, so it's very
possible that a document is modified while the cursor is open. But I
was still able to reproduce the issue when using a snapshot query,
which is supposed to guard against that. We can't use a snapshot for
everything, but if it solved this problem, we would be willing to use
it here because it is critical these queries are correct every single
time. Also, I'm certain that the missing document is not being updated
when we run into this problem. In fact, we have seen many times when
the missing document hasn't been changed since soon after its creation
weeks ago.

Scott Hernandez

unread,
Jun 23, 2011, 9:39:06 PM6/23/11
to mongod...@googlegroups.com
Can you describe your server environment a bit? Since you can't
reproduce this anywhere but there let's try looking at it from that
angle.

mdnorman

unread,
Jun 23, 2011, 9:43:38 PM6/23/11
to mongodb-user
Sorry, I didn't answer the driver and server version question:

Driver: Java 2.6.3
Server: 1.8.1

On Jun 23, 7:12 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> Which version of the driver and server are you using? On the current
> driver and server it should be something more along the lines of 32MB
> by default. We can take a look at that if you are seeing more data
> when you set the batch size to a large value. Yeah, either way it
> shouldn't make a difference.
>

mdnorman

unread,
Jun 23, 2011, 10:07:02 PM6/23/11
to mongodb-user
No problem. From /etc/redhat-release, it's "CentOS release 5.6
(Final)", running 2.6.22.19-vs2.3.0.34.1 x86_64. The server is a VM
running on the same physical hardware (though a different VM) as the
web application querying the DB. File system where the data is stored
is ufs with plenty of free space. Total system memory is 32GB, mostly
used by mongo with 1.5GB free.

Scott Hernandez

unread,
Jun 23, 2011, 10:18:17 PM6/23/11
to mongod...@googlegroups.com
What are you using for a hypervisor and virtualization?

mdnorman

unread,
Jun 23, 2011, 10:40:09 PM6/23/11
to mongodb-user
Not sure if this helps at all, but I turned on profiling logs and here
is the output for the 3 queries (full with 523 results, missing with
522 results, full again with 523 results):

{ "ts" : ISODate("2011-06-24T02:31:23.438Z"), "info" : "query
pacman.packages ntoreturn:2147483647 reslen:619920 nscanned:7110
\nquery: { query: { cluster: \"c1\", installableToEnvironment: { $in:
[ \"UAT\", \"STAGING\", \"PRODUCTION\" ] }, deployments: { $elemMatch:
{ environmentType: \"STAGING\", $or: [ { uninstallDate: null },
{ uninstallDate: { $gte: new Date(1308277883424) } } ] } } }, orderby:
{ _id: 1 } } nreturned:523 bytes:619904", "millis" : 17 }

{ "ts" : ISODate("2011-06-24T02:31:23.924Z"), "info" : "query
pacman.packages ntoreturn:2147483647 reslen:618733 nscanned:7110
\nquery: { query: { cluster: \"c1\", installableToEnvironment: { $in:
[ \"UAT\", \"STAGING\", \"PRODUCTION\" ] }, deployments: { $elemMatch:
{ environmentType: \"STAGING\", $or: [ { uninstallDate: null },
{ uninstallDate: { $gte: new Date(1308277883909) } } ] } } }, orderby:
{ _id: 1 } } nreturned:522 bytes:618717", "millis" : 19 }

{ "ts" : ISODate("2011-06-24T02:31:24.354Z"), "info" : "query
pacman.packages ntoreturn:2147483647 reslen:619920 nscanned:7110
\nquery: { query: { cluster: \"c1\", installableToEnvironment: { $in:
[ \"UAT\", \"STAGING\", \"PRODUCTION\" ] }, deployments: { $elemMatch:
{ environmentType: \"STAGING\", $or: [ { uninstallDate: null },
{ uninstallDate: { $gte: new Date(1308277884338) } } ] } } }, orderby:
{ _id: 1 } } nreturned:523 bytes:619904", "millis" : 19 }

mdnorman

unread,
Jun 23, 2011, 10:49:24 PM6/23/11
to mongodb-user
I'll have to find out for sure from the server admins tomorrow, but
we're using Linux-VServer for virtualization on most servers.

mdnorman

unread,
Jun 24, 2011, 12:35:55 PM6/24/11
to mongodb-user
As it appears that this is not widely seen, I've opened a ticket
https://jira.mongodb.org/browse/SERVER-3330.

mdnorman

unread,
Jun 27, 2011, 10:08:18 AM6/27/11
to mongodb-user
To close this out, as expected the problem was in our code. I was
finally able to reproduce the issue locally and I have found the root
cause: we were updating the data for an item in an array by removing
the item and inserting a new one in its place. Because that cannot be
done atomically, queries were coming into the middle of this
operation. In all of my repro scenarios, I was never touching the code
that did these updates because they were supposedly rarer than they
are.
Reply all
Reply to author
Forward
0 new messages