Slow JAVA DBCursor

Tom

Aug 19, 2013, 12:25:03 PM8/19/13
to mongod...@googlegroups.com
I am currently evaluating MongoDB, using the mongo-java-driver-2.9.3.jar file running against MongoDB version 2.4.5.

I have a collection of about a million documents. When I run the query with the explain command, it shows it is using the index on the collection. When I run the find query in the shell, I get results immediately after submitting the query... it is extremely fast. However, when I run the same query using the Java driver, it takes about 2 seconds to iterate the cursor. The entire collection fits in memory, and the shell query is instant, with no delay in outputting the results. It is only the Java driver DBCursor which is slow. I have set the limit and batch size on the Java query to 10, hoping to speed up the cursor, but with no luck. Any suggestions on improving the Java cursor performance?
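
Roughly, the relevant part of the query looks like this (simplified; persons is the collection and query holds the criteria):

    DBCursor cursor = persons.find(query)
            .limit(10)
            .batchSize(10);

    while (cursor.hasNext()) {
        cursor.next();   // iterating these 10 documents is what takes ~2 seconds
    }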

Here are the collection stats:

testperson
{
    "ns" : "TEST.testperson",
    "count" : 1000000,
    "size" : 496000000,
    "avgObjSize" : 496,
    "storageSize" : 547500032,
    "numExtents" : 15,
    "nindexes" : 4,
    "lastExtentSize" : 153874432,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 808025904,
    "indexSizes" : {
        "_id_" : 32458720,
        "alias_1_unique_" : 47968592,
        "email_1_unique_" : 61189184,
        "personAttributes_index" : 666409408
    },
    "ok" : 1
}

Rob Moore

Aug 19, 2013, 2:17:31 PM8/19/13
to mongod...@googlegroups.com

The shell is _not_ iterating/receiving all of the documents. It outputs the first few documents and then stops. Recent versions of the shell print a message indicating there are more results available.

2 seconds for a million documents (496MB of data) is not that unreasonable.

Lowering the batch size might have actually hurt performance, since the overhead of each request for a batch of documents is spread over fewer documents. You might see better results by increasing the batch size, but the law of diminishing returns applies (e.g., increasing the batch size from 10 to 20 might give a 5% improvement, while going from 20 to 30 might only give 0.1%; these numbers are completely made up but based on experience...).
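
For example, something along these lines (the collection and query objects here are just placeholders for your own):

    DBCursor cursor = collection.find(query)
            .batchSize(1000);   // each round trip to the server now returns up to 1000 documents

    while (cursor.hasNext()) {
        cursor.next();
    }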

There is also the "EXHAUST" query option, which might be useful depending on your use case. The major downside is that the request will tie up a socket until all of the documents are consumed. That is fine for a "utility" using only a single thread, but for a running service it might cause periodic, unexplained slowdowns.
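
If you want to experiment with it, the flag is set on the cursor. A sketch only (Bytes is com.mongodb.Bytes); I have not checked how completely the 2.9.x driver supports exhaust mode, so treat it as something to test rather than rely on:

    DBCursor cursor = collection.find(query)
            .addOption(Bytes.QUERYOPTION_EXHAUST);   // ask the server to stream all batches without waiting for getMore

    while (cursor.hasNext()) {
        cursor.next();
    }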

The only other way to remove the overhead of the synchronous batch requests is to use asynchronous requests. The Asynchronous Java Driver will submit a request for the next batch as soon as the client code starts to consume the previous batch. This all but removes the overhead of the batch requests if you can tune/balance the time to process documents against the time to request the next batch of documents.

Rob.

Jeff Yemin

Aug 19, 2013, 4:08:01 PM8/19/13
to mongod...@googlegroups.com
Are you sure you are not including socket connection time?  Unlike the shell, the Java driver lazily opens connections to the server.  Can you run this in a loop and throw out the first attempt?
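
Something along these lines is what I have in mind (persons and query standing in for your own collection and query object):

    for (int i = 0; i < 5; i++) {
        long start = System.currentTimeMillis();

        DBCursor cursor = persons.find(query).limit(10);
        while (cursor.hasNext()) {
            cursor.next();
        }

        // the first pass will include the time to open the connection, so throw it out
        System.out.println("Attempt " + i + ": " + (System.currentTimeMillis() - start) + " ms");
    }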

Also, just to confirm, you are seeing multi-second response times just to iterate 10 results?


Regards,
Jeff


Tom

Aug 20, 2013, 2:48:34 PM8/20/13
to mongod...@googlegroups.com
Jeff,

Yes, I am sure it is not including the socket connection time. And yes, it takes about 2 seconds to iterate over 10 results. I have uploaded the test Eclipse project to GitHub if you want to take a look. It contains the class to generate test documents and the class to search the test documents. The project can be found at https://github.com/tmurray3/testmongodb

Jeff Yemin

Aug 20, 2013, 3:14:53 PM8/20/13
to mongod...@googlegroups.com
I changed the code a bit to remove the overhead of logging, and also to make sure that the connection has already been established. I also removed the timing of the chained call to find, since the driver doesn't actually start executing the query until the first call to DBCursor.hasNext():

        // warm up: force the driver to open its connection before timing starts
        persons.findOne();

        long startquery = System.currentTimeMillis();

        logger.info("START FIND QUERY");

        cursor = persons.find(document)
                .sort(new BasicDBObject()
                        .append(PersonFields.ACCOUNT_STATUS, 1)
                        .append(PersonFields.LAST_ACTIVITY, -1))
                .limit(10);

        while (cursor.hasNext()) {
            cursor.next();
        }

        long endquery = System.currentTimeMillis();

        logger.info("Count:" + cursor.count() + "  Cursor Iterate Time (ms): " + (endquery - startquery));

In my tests, it's completing in around 10ms.  

Can you try re-running with these changes?


Regards,
Jeff





Tom

Aug 20, 2013, 5:03:43 PM8/20/13
to mongod...@googlegroups.com
Jeff,

Thanks for taking the time to look at my issue!!!  

I did implement your suggested changes and found no significant improvement.  Still hovering around 2 seconds.  

Here is the updated code:

package test;

import java.net.UnknownHostException;

import org.apache.log4j.Logger;
import org.joda.time.DateMidnight;

import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.QueryBuilder;

public class TestSearchMongoPerson {

    private Mongo mongo = null;
    private static final Logger logger = Logger.getLogger(TestSearchMongoPerson.class);

    public TestSearchMongoPerson() {
        try {
            mongo = new Mongo();
            DB db = mongo.getDB(Constants.DB_NAME);
            DBCollection persons = db.getCollection(Constants.COLLECTION_NAME);
            DBCursor cursor = null;

            // Build a query
            BasicDBObject document = new BasicDBObject();

            BasicDBList bodylist = new BasicDBList();
            bodylist.add(1);
            bodylist.add(2);
            bodylist.add(3);
            bodylist.add(5);

            BasicDBList ethnicityList = new BasicDBList();
            ethnicityList.add(1);
            ethnicityList.add(2);
            ethnicityList.add(5);
            ethnicityList.add(3);

            BasicDBList lookingfor = new BasicDBList();
            lookingfor.add(1);
            lookingfor.add(5);

            BasicDBList genderList = new BasicDBList();
            genderList.add(1);
            genderList.add(2);
            genderList.add(3);

            BasicDBList zipcodes = new BasicDBList();
            zipcodes.add(28214);
            zipcodes.add(28036);
            zipcodes.add(28269);
            zipcodes.add(28233);

            QueryBuilder qb = new QueryBuilder();
            qb.and(new QueryBuilder().put(PersonFields.ZIPCODE).in(zipcodes).get());

            DateMidnight dt = new DateMidnight();
            DateMidnight latest = dt.minusYears(20);
            DateMidnight earliest = dt.minusYears(40);
            qb.and(new QueryBuilder().put(PersonFields.DOB).greaterThanEquals(earliest.toDate()).get());
            qb.and(new QueryBuilder().put(PersonFields.DOB).lessThanEquals(latest.toDate()).get());
            qb.and(new QueryBuilder().put(PersonFields.GENDER).in(genderList).get());
            qb.and(new QueryBuilder().put(PersonFields.ETHNICITY).in(ethnicityList).get());
            qb.and(new QueryBuilder().put(PersonFields.LF_STATUS).in(lookingfor).get());
            qb.and(new QueryBuilder().put(PersonFields.DRINK).is(1).get());
            qb.and(new QueryBuilder().put(PersonFields.SMOKE).is(1).get());
            logger.info("Query:" + qb.toString());
            document.putAll(qb.get());
            logger.info(document.toString());

            persons.findOne();

            long startquery = System.currentTimeMillis();
            logger.info("START FIND QUERY");

            cursor = persons.find(document)
                    .sort(new BasicDBObject()
                            .append(PersonFields.ACCOUNT_STATUS, 1)
                            .append(PersonFields.LAST_ACTIVITY, -1))
                    .limit(10);

            while (cursor.hasNext()) {
                cursor.next();
            }

            long endquery = System.currentTimeMillis();
            logger.info("END FIND QUERY...Response Time (ms): " + (endquery - startquery));

        } catch (UnknownHostException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        mongo.close();
    }

    public static void main(String[] args) {
        new TestSearchMongoPerson();
    }
}

Jeff Yemin

Aug 20, 2013, 5:29:54 PM8/20/13
to mongod...@googlegroups.com
Just to make sure we're doing the same thing, can you drop the collection, run the Insert again, then re-time the query?  

Jeff

Tom Murray

Aug 21, 2013, 10:36:02 AM8/21/13
to mongod...@googlegroups.com
Hi Jeff,

I dropped the collection and ran TestMongoPojoInsert.java. It only creates 1,000 records, but I was testing against 1 million records. Modify the Constants.CREATE_NUMBER_OF_RECORDS variable, set it to 1 million, and rerun the TestMongoPojoInsert.java class. After the first search completes, the records are in memory (or you can load them into memory with the touch command). Run the search again, and it takes ~2 seconds to iterate over the cursor.


Here is the command I used to load the collection and index into memory prior to running the search:
> use TESTdb
> db.runCommand({touch:"testperson", data:true, index:true})
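
(For reference, the same command can be issued from the Java driver as well; an untested sketch, using the db object already created in the test class:)

    db.command(new BasicDBObject("touch", "testperson")
            .append("data", true)
            .append("index", true));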


Thanks again for your help!!

--Tom



