Phantom reads of count()

Showing 1-6 of 6 messages
Phantom reads of count() Giorgio Sironi 2/7/13 11:38 AM
I am using Mongo 2.2.0.
I populate a collection of 1M documents:
var documents = 1000000;
db.example.drop();
for (var i=0; i < documents; i++) {
    db.example.insert({_id:i, value:0});
}
Then I start to watch with an independent connection the number of documents inside the collection:
while (true) {
    print("find(filter).count(): " + db.example.find({value:0}).count());
    print("count(filter): " + db.example.count({value:0}));
    sleep(1000);
}
 Then I modify every single document, adding a new field. I'm simulating a migration here:
var affected = 0;
db.example.find().snapshot().forEach(function(d) {
    d.new_field = true;
    db.example.save(d);
    affected++;
});
print(affected);
I use .snapshot() to isolate the cursor from retrieving the documents it already has processed.
Its output is, correctly:
[20:19:45][giorgio@Desmond:~]$ mongo add_snapshot.js 
MongoDB shell version: 2.2.0
connecting to: test
1000000
However, the output of my watching script is:
MongoDB shell version: 2.2.0
connecting to: test
find(filter).count(): 1000000
count(filter): 1000000
find(filter).count(): 1000288
count(filter): 1008043
find(filter).count(): 1006630
count(filter): 1006200
find(filter).count(): 1006894
count(filter): 1006360
find(filter).count(): 1005347
count(filter): 1007082
find(filter).count(): 1004306
count(filter): 1008218
find(filter).count(): 1006303
count(filter): 1007049
find(filter).count(): 1007005
count(filter): 1006759
find(filter).count(): 1006258
count(filter): 1005788
find(filter).count(): 1005684
count(filter): 1003041
find(filter).count(): 1007427
count(filter): 1006692
find(filter).count(): 1006000
count(filter): 1006322
find(filter).count(): 1007484
count(filter): 1007345
find(filter).count(): 1007936
count(filter): 1004408
find(filter).count(): 1006841
count(filter): 1007516
find(filter).count(): 1007674
count(filter): 1007300
find(filter).count(): 1008179
count(filter): 1005257
find(filter).count(): 1006860
count(filter): 1006940
find(filter).count(): 1008112
count(filter): 1002686
find(filter).count(): 1005848
count(filter): 1006899
find(filter).count(): 1006854
count(filter): 1005871
find(filter).count(): 1006772
count(filter): 1007599
find(filter).count(): 1003431
count(filter): 1007896
find(filter).count(): 1005819
count(filter): 1007818
find(filter).count(): 1008379
count(filter): 1007536
find(filter).count(): 1007804
count(filter): 1007147
find(filter).count(): 1005792
count(filter): 1006068
find(filter).count(): 1007565
count(filter): 1006985
find(filter).count(): 1007409
count(filter): 1007327
find(filter).count(): 1007119
count(filter): 1007088
find(filter).count(): 1003996
count(filter): 1006486
find(filter).count(): 1006202
count(filter): 1007679
find(filter).count(): 1007214
count(filter): 1006594
find(filter).count(): 1005883
count(filter): 1004979
find(filter).count(): 1002196
count(filter): 1000000
find(filter).count(): 1000000
count(filter): 1000000
find(filter).count(): 1000000
count(filter): 1000000
// ...many identical rows...
find(filter).count(): 1000000
count(filter): 1000000
Since a modification to a single document is atomic by [1], I would think that there are always 1M documents in the collection: I modify a single document, save it, and it takes the place of its previous version.
Yet the output of the count operations is different: the number of document grows and eventually comes back to 1M.
What is happening?

Re: Phantom reads of count() Giorgio Sironi 2/8/13 11:40 AM
I reproduced the same behavior on 2.2.3 (Ubuntu x64 .deb), which afaik is the latest version.
Re: Phantom reads of count() Mirko Bonadei 2/9/13 10:07 AM
Yes, I confirm. Reproduced even on 2.2.3.

The strange thing is that if the count is performed without filter the number of documents is fixed at 1M. Only if we use a filter the strange behaviour happens.

Mirko
Re: Phantom reads of count() Rob Moore 2/9/13 11:08 AM

I think this is "expected" behavior.  I have not tested it but if you make the count()s also "snapshot" that may remove the phantom/double reads.

Here is what I think is happening.  

1) The 1M documents are inserted into the collection closely packed, i.e., no extra padding or space around the documents.
2) When the update is performed the document grows and does not fit in its original location.  That forces the document to be moved to the "end" of the collection's allocated space.
3) Since cursors that are not "snapshot" may return the same document more than once the result is the count is inflated as some documents are moved from the "start" of the collection to the "end" of the collection during the time to scan all 1M documents.  All of those moved documents are double counted.  There was a good discussion of this in one of Kristina Chodorow's MongoDB books but I don't have it handy to give a better reference.  Sorry.

The non-filter count works because it is probably using the _id index (or other optimization) which is not being mutated by the updates.

Rob.
Re: [mongodb-user] Re: Phantom reads of count() Giorgio Sironi 2/10/13 11:00 AM
On Sat, Feb 9, 2013 at 8:08 PM, Rob Moore <robert.a...@gmail.com> wrote:

I think this is "expected" behavior.  I have not tested it but if you make the count()s also "snapshot" that may remove the phantom/double reads.

I tried, but snapshotting a cursor of 1M documents blocked indefinitely.
 

Here is what I think is happening.  

1) The 1M documents are inserted into the collection closely packed, i.e., no extra padding or space around the documents.
2) When the update is performed the document grows and does not fit in its original location.  That forces the document to be moved to the "end" of the collection's allocated space.

This is our theory too.
 
3) Since cursors that are not "snapshot" may return the same document more than once the result is the count is inflated as some documents are moved from the "start" of the collection to the "end" of the collection during the time to scan all 1M documents.  All of those moved documents are double counted.  There was a good discussion of this in one of Kristina Chodorow's MongoDB books but I don't have it handy to give a better reference.  Sorry.
The non-filter count works because it is probably using the _id index (or other optimization) which is not being mutated by the updates.

I don't know how count() works internally, but it seems unlikely that it's doing a linear scan because it returns very quickly. It's possible that it has a similar "eventually consistent" behavior, though, for other reasons related to the movement of documents.

--
Giorgio Sironi (@giorgiosironi)
Re: [mongodb-user] Re: Phantom reads of count() Mirko Bonadei 2/11/13 12:52 AM

Here a possible solution of the problem. With Giorgio we have done this test and the results is that the find (and probably the count with the filter) suffer the same problem solved with the snapshot. So it seems that it fetches 1M documents but it counts some documents more than once. 
This is due probably because mongo is moving documents around and a documents counted the first time at the start of the collections is probably counted again later when it is moved at the end.

Loading documents:
var documents = 1000000;
for (var i=0; i<documents; i++) {
    db.bug.insert({_id:i, selected: true, value:1});
}

Setting up a connection to compute the sum of the value field of the documents (using snapshot):
while(true){
    var sum = 0;
    db.bug.find({selected:true}).snapshot().forEach(function(d) {
        sum = sum + d.value;
    });
    print(sum);
}

Setting up a connection to compute the sum of the value field of the documents (not using snapshot):
while(true){
    var sum = 0;
    db.bug.find({selected:true}).forEach(function(d) {
        sum = sum + d.value;
    });
    print(sum);
}

Launching an update which causes document size to increase:
db.bug.find({selected: true}).forEach(function(d) {
    d.newField = "Mongo DB 2.2.3";
    db.bug.save(d);
});

Output from sum performed with snapshot:
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
999924      <-------------- ???
1000000
1000000
1000000
1000000
1000000
1000000
1000000

Output from sum performed without snapshot:
1093797
1071762
1094572
1093719
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000

Conclusion:
It seems clear now. The only thing I can't understand is the less than 1M sum performed with the snapshot cursor count. Where are my 76 documents with value:1?
Is it a race condition?

----
@MirkoBonadei