I've noticed a performance regression after upgrading from 1.6.5 to
2.0.2, and I'm having some difficulty pinpointing the precise problem.
I have a 43-million-record database of documents similar to a contact
record. The name field is an array of documents containing keys for
last name (n.l), first name (n.f), last name soundex (n.ls), and
first name soundex (n.fs).
1) After upgrading to 2.0.2, mongod seems to have trouble loading
indexes into RAM and keeping them there. My RAM size is 16 GB and my
total index size is ~7.5 GB. I learned early on that I needed to
preload indexes into RAM to get any reasonable performance. With 1.6.5
I ran a $ne query with a count for each of the indexes, but that was
taking so long after upgrading to 2.0.2 that I switched to this:
def self.precache_indexes
  # Warm each index by counting prefix-anchored regex queries against it.
  10.times do |i|
    MongoMgr.extracts_collection.find({ 's4' => /^#{i}/ }).count
  end
  10.times do |i|
    MongoMgr.extracts_collection.find({ 'dl' => /^#{i}/ }).count
  end
  ('A'..'Z').each do |c|
    MongoMgr.extracts_collection.find({ 'n.l' => /^#{c}/ }).count
  end
end
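The warm-up above repeats the same prefix-scan pattern per field, so it can be collapsed into one helper. A minimal sketch, assuming only that the collection responds to find(...).count like the driver object above does; the FakeColl stub below is purely for illustration, and in real use MongoMgr.extracts_collection would be passed in instead:

```ruby
# Run a prefix-anchored count for each prefix, forcing the index for that
# field to be read (and therefore paged into RAM).
def precache_prefix_scans(coll, field, prefixes)
  prefixes.each { |p| coll.find(field => /^#{p}/).count }
end

# Stub standing in for a driver collection (illustration only); it records
# the queries it receives and returns a cursor-like object with a count.
FakeCursor = Struct.new(:n) { def count; n; end }
class FakeColl
  attr_reader :queries
  def initialize; @queries = []; end
  def find(q); @queries << q; FakeCursor.new(0); end
end

coll = FakeColl.new
precache_prefix_scans(coll, 's4',  (0..9).to_a)    # 10 prefix scans
precache_prefix_scans(coll, 'dl',  (0..9).to_a)    # 10 prefix scans
precache_prefix_scans(coll, 'n.l', ('A'..'Z').to_a) # 26 prefix scans
```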
I've also noticed an inexplicable drop in RAM usage occasionally (e.g.
from 12 GB down to 2 GB). It will then very slowly grow back up, but
in the meantime, performance is terrible.
2) I've noticed that explain shows only the last name is being used
(instead of last name and first name) when querying with last & first
names. For some common last names, this results in scanning nearly
100,000 documents. It's *possible* that this was also the case with
1.6.5, but if not, then it's likely the source of my performance
regression. If it was the same in 1.6.5, then 1.6.5 seems to be much
better at keeping indexes in RAM because the performance was
excellent.
Here are my indexes:
db.system.indexes.find({ns : 'nc1.extracts'})
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "nc1.extracts", "name" : "_id_" }
{ "v" : 1, "key" : { "dl" : 1 }, "ns" : "nc1.extracts", "name" : "dl_1" }
{ "v" : 1, "key" : { "s4" : 1 }, "ns" : "nc1.extracts", "name" : "s4_1" }
{ "v" : 1, "key" : { "n.l" : 1, "n.f" : 1 }, "ns" : "nc1.extracts", "name" : "n.l_1_n.f_1" }
{ "v" : 1, "key" : { "n.ls" : 1, "n.fs" : 1 }, "ns" : "nc1.extracts", "name" : "n.ls_1_n.fs_1" }
and here is an explain:
db.extracts.find({ n : { $elemMatch : { l : 'ADKINS', n : 'GEORGE' }}}).explain()
{
    "cursor" : "BtreeCursor n.l_1_n.f_1",
    "nscanned" : 14778,
    "nscannedObjects" : 14778,
    "n" : 0,
    "millis" : 16978,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "n.l" : [
            [
                "ADKINS",
                "ADKINS"
            ]
        ],
        "n.f" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    }
}
I have two questions:
1) What is the most efficient/effective way to get mongod to load all
indexes into RAM? Waiting for them to be cached as they are used
results in unacceptable performance (multi-minute response times).
2) Given my documents contain an array of name subdocuments with last,
first, last_soundex, and first_soundex fields, how can I query so that
the full compound index (last & first) is used instead of only (last)?
In other words, in the explain() above, instead of $minElement/
$maxElement bounds on n.f, I need the bound to be 'GEORGE'.
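For clarity, here is the array-matching semantics I'm after, simulated in plain Ruby (the sample document below is made up): with dotted field names each condition may be satisfied by a different array element, while $elemMatch requires a single element to satisfy all conditions, which is why I expect bounds on both n.l and n.f.

```ruby
# Hypothetical document with two name subdocuments, as described above.
doc = { 'n' => [{ 'l' => 'ADKINS', 'f' => 'MARY' },
                { 'l' => 'SMITH',  'f' => 'GEORGE' }] }

# Dotted-field style ('n.l' => ..., 'n.f' => ...): each condition may be
# satisfied by a *different* element of the array.
def dotted_match?(doc, last, first)
  doc['n'].any? { |e| e['l'] == last } &&
    doc['n'].any? { |e| e['f'] == first }
end

# $elemMatch style: one single element must satisfy both conditions.
def elem_match?(doc, last, first)
  doc['n'].any? { |e| e['l'] == last && e['f'] == first }
end

dotted_match?(doc, 'ADKINS', 'GEORGE')  # => true  (matched by different elements)
elem_match?(doc, 'ADKINS', 'GEORGE')    # => false (no one element has both)
```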
Thanks,
Brian
Here is some more diagnostic info:
> version()
version: 2.0.2
> db.version()
2.0.2
> db.stats()
{
    "db" : "nc1",
    "collections" : 8,
    "objects" : 43529343,
    "avgObjSize" : 354.8403949032725,
    "dataSize" : 15445969260,
    "storageSize" : 17605328848,
    "numExtents" : 50,
    "indexes" : 18,
    "indexSize" : 7524021232,
    "fileSize" : 29984030720,
    "nsSizeMB" : 16,
    "ok" : 1
}
> db.extracts.stats()
{
    "ns" : "nc1.extracts",
    "count" : 43528842,
    "size" : 15444986452,
    "avgObjSize" : 354.8219006607159,
    "storageSize" : 17602879440,
    "numExtents" : 38,
    "nindexes" : 5,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1.0099999998671518,
    "flags" : 0,
    "totalIndexSize" : 7523898592,
    "indexSizes" : {
        "_id_" : 2657666032,
        "dl_1" : 1015115808,
        "s4_1" : 865871104,
        "n.l_1_n.f_1" : 1457363824,
        "n.ls_1_n.fs_1" : 1527881824
    },
    "ok" : 1
}
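As a sanity check on the numbers above, the per-index sizes do add up to the reported totalIndexSize, about 7.0 GiB, so the full index set should in principle fit in 16 GB of RAM with room left over for data:

```ruby
# Per-index sizes in bytes, copied from the db.extracts.stats() output above.
index_sizes = {
  '_id_'          => 2657666032,
  'dl_1'          => 1015115808,
  's4_1'          => 865871104,
  'n.l_1_n.f_1'   => 1457363824,
  'n.ls_1_n.fs_1' => 1527881824,
}
total = index_sizes.values.sum
total                       # => 7523898592, matching "totalIndexSize"
(total / 2.0**30).round(1)  # => 7.0 (GiB)
```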
$ uname -a
Linux myxyz-P55-USB3 2.6.35-30-generic #61-Ubuntu SMP Tue Oct 11
17:52:57 UTC 2011 x86_64 GNU/Linux
$ free -m
             total       used       free     shared    buffers     cached
Mem:         16073      15988         85          0          3      14279
-/+ buffers/cache:       1705      14367
Swap:        12401          1      12400