How many reads per second are you able to achieve? It may be that you're hitting the open file descriptor limit. Are there any exceptions or errors being logged? You can also attach JConsole to the YCSB process to watch the size and growth of the eden, survivor, and tenured spaces, and how often the garbage collector runs on each.
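If attaching JConsole is inconvenient, the same per-pool numbers can be logged from inside the client via the standard java.lang.management API. This is a minimal sketch (pool names vary by collector, e.g. "PS Eden Space" vs "G1 Eden Space"):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Prints used/committed/max for every memory pool the JVM exposes --
// the same counters JConsole charts for eden, survivor, and tenured space.
public class HeapPoolDump {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            System.out.printf("%-30s used=%,d committed=%,d max=%,d%n",
                    pool.getName(), u.getUsed(), u.getCommitted(), u.getMax());
        }
    }
}
```

Calling this periodically from a background thread during the run gives a growth curve without a GUI attached.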
Hi,
I'm running some throughput tests with MongoDB. I used Yahoo's Cloud Serving Benchmark (YCSB) to drive a 100% read-only workload.
A single mongod is running on a system with 32 cores/190 GB of memory; the working set (about 10 million documents) fits entirely in memory.
Each document is about 1 KB.
The load generator, the YCSB client, runs on a separate server, also with 32 cores/190 GB of memory.
I'm running the client with 20 threads for 5 minutes, and I see heavy heap usage and garbage collection.
(using Java 1.7.0_03)
So, for example, if I set the heap to 150 GB with these options:
-Xms150G -Xmx150G -XX:+PrintGCTimeStamps -verbosegc
starttime, 07:14:38
34.443: [GC 39321600K->12117138K(150732800K), 19.7706060 secs]
101.010: [GC 51438738K->24023043K(150732800K), 26.4009280 secs]
190.959: [GC 63344643K->42406579K(150732800K), 34.5489460 secs]
293.656: [GC 81728179K->61900347K(150732800K), 41.8107320 secs]
endtime, 07:19:38
So about 2 minutes of a 5-minute run are spent garbage collecting, which hurts throughput.
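The four pause times in the log above do add up to roughly two minutes of a 300-second run:

```java
// Sums the pause durations reported by the -verbosegc lines above
// and expresses them as a fraction of the 5-minute (300 s) run.
public class GcPauseTotal {
    public static void main(String[] args) {
        double[] pauses = {19.7706060, 26.4009280, 34.5489460, 41.8107320};
        double total = 0;
        for (double p : pauses) total += p;
        System.out.printf("total GC pause: %.1f s of a 300 s run (%.0f%%)%n",
                total, 100.0 * total / 300.0);
        // -> total GC pause: 122.5 s of a 300 s run (41%)
    }
}
```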
[I used the same YCSB client against a MySQL database through a JDBC driver: with the same default JVM options, heap usage is very low and there is very little GC activity.]
If I use the G1 collector, the results are not much different (-Xms150G -Xmx150G -XX:+PrintGCTimeStamps -verbosegc -XX:+UseG1GC):
starttime, 07:24:25
106.665: [GC pause (young) 51200M->35976M(153600M), 76.4086490 secs]
278.610: [GC pause (young) 80776M->61051M(153600M), 63.5580800 secs]
endtime, 07:29:25
The read implementation is quite simple:

public int read(String key)
{
    try {
        // Look up the document by primary key
        DBObject q = new BasicDBObject().append("_id", key);
        DBObject queryResult = collection.findOne(q);
        // 0 = found, 1 = not found (YCSB convention)
        return queryResult != null ? 0 : 1;
    } catch (Exception e) {
        System.err.println(e.toString());
        return 1;
    }
}
Why is there so much heap consumption? Are there any tuning guidelines for the MongoDB Java driver?
I see one report of a performance regression caused by increased memory use; it is supposed to be fixed in 2.8, which is the version I'm using:
https://jira.mongodb.org/browse/JAVA-505
I saw some discussion of Java driver performance here, but it's not clear what the takeaway was:
http://groups.google.com/group/mongodb-user/browse_thread/thread/48a604703d9ffc61
32 cores/190 GB of physical memory.
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
mongod version: mongodb-linux-x86_64-2.0.6
Mongo Java driver version: 2.8.0
Here's the collection stats and sample document:
> db.usertable.stats()
{
"ns" : "ycsb.usertable",
"count" : 10000000,
"size" : 11559996120,
"avgObjSize" : 1155.999612,
"storageSize" : 12892401648,
"numExtents" : 38,
"nindexes" : 1,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 561143408,
"indexSizes" : {
"_id_" : 561143408
},
"ok" : 1
}
>
> db.usertable.find()
{ "_id" : "user0", "field5" : BinData(0,"NDsmISo0Nj8oOT08JTw8KyIwNzotMyoqPCkyIzMiKys0Lz0uOiAoLzMlNS09MSosKCkiIDQ6KywqJS8vJC8oKjw0OT87PCAvKygiJzI4KCw6PjYpIj08NSYqICUyNSsxOi0iMw=="), "field4" : BinData(0,"NDonLjM0Ny0kISYhPy4mMzs/JjEgNyctPygkLz00JSspKSQgJTYkIS8sLD03OSQ7PSgmMz8uOjkrKCo4PDQwITwgKz09JyM2LCw+NT86KSY5Kjo6NTQtLT47NzI/Lzg8NSc7JQ=="), "field3" : BinData(0,"PSwtLCkyIy0zODglLDYlLj48MTU9MTkzKjsqMzotJjU5MjQmJiM0KTItLj0tMzMjJiwmPTI3LCYzPS8jLyopJTEzMzsyOz02NjwzOjMkNDEhMjQuODIzMiMvISs8JSIkLDwwIg=="), "field2" : BinData(0,"MyY6PS8xLCAgJis8OzEhMDAqMDY5PTk9OywmNDA4Iig5Jyw+Jy8kISMwPTUmKiAhOS42KzI0Mys6PiwzNSs7OTo1LiI+OT85PjYxJT89KyArPDEwICsxIDEwODg3Izw8IT87KA=="), "field9" : BinData(0,"IyAqIy0uNCA/OyY/Ozs/LTk7IzE5Lz8vLyM7Pzk2MCUsODYyNCA1PSYpNDoqISM7NCY9JSwsLDA7IjMgNTMvODA4NTEoNzQ9PTc9OCI1PS8xPzw3ODguODw5Pis5Oyw0LjAuMA=="), "field8" : BinData(0,"NyY3PzImMCQ4JCwuISgqPyAqOTYyKCo2MykqJSo+IDAnIzwqLigiIyAjMCsoLzczMT86MTY5MSY7Ozg+OSA7OS40Oj8iKic7ICQpNz42Pyw6NSA4OjQ2LCo9KjY8Ii48PCohIg=="), "field7" : BinData(0,"MyMiLSApNCk8JTszLS0yIywnOCYrOzwtNyE2PDc7OSg0ICUsIzQwLDYtPTApNDwmJisiIiMzLDUsITYmMCglOz0qMjM9IS8gIiArKjMqJzM/JTMrMyA8KCsmPTUuMysuKyY2Mg=="), "field6" : BinData(0,"Pj8jMjM+ODw5JCE2KSomNCQ3LS0uIjIhJD4hOSMlKiQ9LSgyKD4iOzwnOSspLDstLjIzKSw3Kyw2PC4lPCwxIj82ODYwMi0wNTIrLjYgKCggPz8jMTIuIzQnOzM5IC8uICcqKA=="), "field1" : BinData(0,"JyQqIjkuLiMiNy8gNyQoLCoxIj05JigqIyo7Kzs0LSMzMzsmMCwrMTY5IC08IyUhOS8jKDs8JiAoMCg4NC8vNSE6NjAhOS8rJDA9LT0sMy0yPScxKi0yKC8oOywwNCMtJy05Kw=="), "field0" : BinData(0,"MCM4Jyc/KjM4ISwjIyw8MzUhKT0yOC49Iz45KDsuLysuOCg8JCwuLTwmJyEpKyspOz8pKSYvKyMzKSomOzw7PTchJSU8LzI0PDUxLiUsOy07Lz4+MDE0JDwoJSsnKjw5Oj04PQ==") }
Thanks much!
100K reads per second is pretty fast. It would be interesting to see how this rate compares to the MySQL database. The amount of work the garbage collector has to do is (roughly) proportional to the query rate: higher loads mean the GC has more objects to clean up per unit of time. Also, the time to complete a garbage collection cycle grows with the size of the heap, so although you want the heap to be as large as needed, make sure -Xmx is not too large. If you stop your load test while monitoring in JConsole, does the garbage collector catch up and recover heap?
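One way to check whether the collector catches up, without leaving JConsole attached, is to sample the cumulative GC counters before and after stopping the load. A sketch using the standard GarbageCollectorMXBean:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative collection counts and total pause time per collector --
// the same counters JConsole plots. Sample twice (under load, then after
// stopping the load) and compare the deltas.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

If the pause-time counter keeps climbing after the load stops, the heap is still being drained; if it flattens out quickly, the collector has caught up.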
Another one with the -histo:live
Jeff, how critical is the finalize() in com.mongodb.DBApiLayer$Result? It seems to be a performance-degrading method, and as far as I am aware finalize() is not considered reliable anyway. Can we safely remove it?
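If the point of that finalize() is releasing cursor resources, the usual finalize-free pattern on Java 7 is a PhantomReference drained from a ReferenceQueue. This is only a hypothetical sketch of the pattern, not the driver's actual code; it avoids the extra GC cycle and finalizer-thread work that overriding finalize() imposes on every instance:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical cleanup pattern: register a phantom reference for each
// resource holder; when the holder becomes unreachable, the collector
// enqueues the reference and a housekeeping pass can release the resource.
public class PhantomCleanup {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object holder = new Object();          // stands in for a Result/cursor holder
        PhantomReference<Object> ref = new PhantomReference<Object>(holder, queue);

        holder = null;                         // drop the last strong reference
        Reference<?> cleared = null;
        for (int i = 0; cleared == null && i < 50; i++) {
            System.gc();                       // hint; enqueueing is up to the collector
            cleared = queue.remove(100);       // wait up to 100 ms per attempt
        }
        System.out.println("cleaned up: " + (cleared == ref));
    }
}
```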