MongoDB C# driver blocks after fetching 100 records

1,934 views
Skip to first unread message

mycom...@gmail.com

unread,
May 1, 2013, 6:04:07 PM5/1/13
to mongodb...@googlegroups.com
Hi

I'm trying to fetch some records based on a simple query. The code snippet is posted below. The query is supposed to return 3000K records...but when I execute, the application blocks after getting 100 records. Then after couple of minutes it fetches the remaining records. Is there a config setting I'm missing? Has anyone experienced similar issue?

Running MongoDB 2.4.3
I have the right indexes and verified using explain that the query is using the same index.

            MongoClient client = new MongoClient("mongodb://myserver:27017"); // connect to localhost
            MongoServer server = client.GetServer();
            MongoDatabase omni = server.GetDatabase("MyDB");

            MongoCollection<BsonDocument> coll = omni.GetCollection<BsonDocument>("MyCollection");

            var query = Query.And(Query.Matches("category", "foo"),
                                  Query.Matches("refid", "bar"),
                                  Query.Exists("Topic_ID")
                                  );

            int count = 0;
            var cursor = coll.FindAll();//coll.Find(query);
            cursor.SetLimit(1000);

            foreach (var doc in cursor)
            {
                Console.WriteLine("i=" + (++count));
            }

Thanks
MyCom

mycom...@gmail.com

unread,
May 1, 2013, 6:06:07 PM5/1/13
to mongodb...@googlegroups.com
Forgot to mention that I'm using C# driver 1.8.1 (I tried with 1.7 and 1.8 but I've the same problem).

mycom...@gmail.com

unread,
May 1, 2013, 6:08:42 PM5/1/13
to mongodb...@googlegroups.com
I posted the wrong code....here's the actual code:

            MongoClient client = new MongoClient("mongodb://myserver:27017"); // connect to localhost
            MongoServer server = client.GetServer();
            MongoDatabase omni = server.GetDatabase("MyDB");

            MongoCollection<BsonDocument> coll = omni.GetCollection<BsonDocument>("MyCollection");

            var query = Query.And(Query.Matches("category", "foo"),
                                  Query.Matches("refid", "bar"),
                                  Query.Exists("Topic_ID")
                                  );

            int count = 0;
            var cursor = coll.Find(query);

craiggwilson

unread,
May 2, 2013, 10:23:56 AM5/2/13
to mongodb...@googlegroups.com
MongoDB operates in batches.  When you issue a query, the server will provide back a batch of documents (101 by default).  The driver will then let you iterate over them and once you have completed iterating the batch, we issue a GetMore request that will get the next batch.  This sounds exactly like your behavior.

I'm not sure what you mean by "a couple of minutes".  The code you provided doesn't wait "a couple of minutes" to get the next batch for me.  Are you doing something with those documents that could take "a couple of minutes" to iterate over the 100 documents you have received?

mycom...@gmail.com

unread,
May 2, 2013, 11:26:00 AM5/2/13
to mongodb...@googlegroups.com
Thanks for the response! If you look at the code I'm simply looping and incrementing the count. But every "getmore" call is taking ~ 30 sec - 60 secs (sometimes couple of minutes).
I tried the same code in Java (using the Java driver) and hitting the same MongoDB server and the loop finishes without any pause.

craiggwilson

unread,
May 2, 2013, 1:38:17 PM5/2/13
to mongodb...@googlegroups.com
Hmm.  I can't reproduce this at all.  I can drop your code in and it simply runs start to finish without any issues...  Can you provide a sample of how your documents look and the average size of each???  

This gist is the program I used.  If you can modify it to make it fail like you are talking about, please do so... https://gist.github.com/craiggwilson/5503913

mycom...@gmail.com

unread,
May 2, 2013, 2:16:49 PM5/2/13
to mongodb...@googlegroups.com
I updated the code on github. Please try it and let me know if you see the issue. 
The previous code was limiting the number of records to 1000 and that works as expected.

Thanks
Mahesh

craiggwilson

unread,
May 2, 2013, 7:59:00 PM5/2/13
to mongodb...@googlegroups.com
Not sure what you mean when you say you updated the code on github... could you provide a link?

mycom...@gmail.com

unread,
May 3, 2013, 11:20:30 AM5/3/13
to mongodb...@googlegroups.com

craiggwilson

unread,
May 3, 2013, 8:35:42 PM5/3/13
to mongodb...@googlegroups.com
Still not having any issues.  Perhaps you could provide your mongod logs when you are running this process and we could see what is going on on the server?

mycom...@gmail.com

unread,
May 6, 2013, 11:23:00 AM5/6/13
to mongodb...@googlegroups.com
Here's the mongod logs from the server. Notice the 4-5 sec lag between each "getmore" call:

Mon May 06 14:28:21.885 [initandlisten] connection accepted from 207.138.251.2:13950 #39 (1 connection now open)
Mon May 06 14:28:23.728 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1095 locks(micros) r:45112 nreturned:7051 reslen:4194524 1095ms
Mon May 06 14:28:29.190 [conn39] getmore omni.omnidata cursorid:1740362591354944 787 ntoreturn:0 keyUpdates:0 numYields: 1098 locks(micros) r:45451 nreturned:7096 reslen:4194449 1097ms
Mon May 06 14:28:34.650 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1103 locks(micros) r:45723 nreturned:7169 reslen:4194682 1102ms
Mon May 06 14:28:40.110 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1099 locks(micros) r:46066 nreturned:7078 reslen:4194724 1098ms
Mon May 06 14:28:43.966 [initandlisten] connection accepted from 207.138.251.2:16335 #40 (2 connections now open)
Mon May 06 14:28:45.541 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1098 locks(micros) r:46480 nreturned:7076 reslen:4194615 1097msMon May 06 14:28:50.965 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1101 locks(micros) r:52684 nreturned:7075 reslen:4194860 1105ms
Mon May 06 14:28:56.419 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1105 locks(micros) r:46543 nreturned:7086 reslen:4194747 1105ms
Mon May 06 14:29:01.845 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1099 locks(micros) r:45408 nreturned:7090 reslen:4194684 1098ms
Mon May 06 14:29:07.274 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1102 locks(micros) r:45617 nreturned:7086 reslen:4194736 1101ms
Mon May 06 14:29:12.773 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1099 locks(micros) r:46651 nreturned:7116 reslen:4194623 1098ms
Mon May 06 14:29:18.234 [conn39] getmore omni.omnidata cursorid:1740362591354944787 ntoreturn:0 keyUpdates:0 numYields: 1100 locks(micros) r:46038 nreturned:7110 reslen:4194571 1099ms

craiggwilson

unread,
May 6, 2013, 12:44:02 PM5/6/13
to mongodb...@googlegroups.com
Couple things I see here.  

1 - how big are your documents?  You are returning ~7000 documents per getmore and documents take time to transfer across the network which could account for the delay.  In addition, every single query takes at least 1 second for server to respond.  So, you'll need to subtract that second of time spent at the server as time spent on the client.

2 - The 5 seconds is far below your original claim of a couple minutes.  Has something changed?

mycom...@gmail.com

unread,
May 6, 2013, 2:08:01 PM5/6/13
to mongodb...@googlegroups.com
Hi

The avg object size as per the "stats" command is 623 bytes:

        "ns" : "omni.omnidata",
        "count" : 382598,
        "size" : 238637280,
        "avgObjSize" : 623.7285087742225,
        "storageSize" : 304115712,
        "numExtents" : 12,
        "nindexes" : 2,
        "lastExtentSize" : 85716992,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 31731056,
        "indexSizes" : {
                "_id_" : 12427520,
                "Report_Suite_nm_1_Business_Reference_cd_1" : 19303536
        },
        "ok" : 1

Regarding #2 : The original query that was taking minutes was as follows:

var query = Query.And(Query.Matches("Report_Suite_nm", "webmdp1corevista"),
                                  Query.Matches("Business_Reference_cd", "vitamin monograph"),
                                  Query.Exists("Topic_ID")
                                  );

            int count = 0;
            var cursor = coll.Find(query);

I have an index on "Report_Suite_nm" and "Business_Reference_cd". 

I wanted to simplify the example to illustrate the slowness issue so I posted the sample with a simple "findall()" (which takes 5 secs between each "getmore")

Thanks
Mahesh

craiggwilson

unread,
May 6, 2013, 6:15:16 PM5/6/13
to mongodb...@googlegroups.com
Ok, just to be clear, regardless of if you have an index on these fields, there is no index that will be used for this query because you are using Query.Matches.  This results in a regular expression query that is not anchored at the beginning of the word.  Hence, we cannot use an index for it.  This in turn results in a full collection scan everytime you issue the query.  So, to build up those ~7000 document batches, it might be comparing hundreds of thousands of documents to get this out.

Have you attempted to run the sample code I provided?  Do you get the same results you are suggesting here, or does everything go straight through?  I'm trying to figure out if this is related to your data or to the driver itself...

mycom...@gmail.com

unread,
May 7, 2013, 10:56:40 AM5/7/13
to mongodb...@googlegroups.com
When I use your code to create the objects I don't see the 4-5 sec delay. Does it mean that the delay is proportional to the document size? My document size is very small (620 bytes). So I'm really surprised that it would cause this delay.

Thanks
Mahesh

craiggwilson

unread,
May 7, 2013, 11:25:45 AM5/7/13
to mongodb...@googlegroups.com
I'm surprised it causes that delay as well.  I'm also not suggesting that yet, although I won't rule it out.  I simply wanted to confirm that the difference between my sample and yours is indeed the data, not the driver.

So, what is different about the data? 

1) Yes, it's bigger.  How much does this matter?  
  a) Well, ~600bytes * 7000 documents per batch means you are transferring about 4MB per batch across the wire.  I assume your network can handle 4MB without much issue, so we can ignore network latency.
  b) Because you are running an un-indexed query, it means that the server needs to go through (potentially) a lot of documents to find the matches.  In your test logs, it showed that this was taking roughly ~1 second to do this per batch. So that is the cause of about 20% of your delay.

2) Could you post how your data looks?  I can't imagine there is something there that would be throwing us off, but you never know.

mycom...@gmail.com

unread,
May 7, 2013, 3:52:33 PM5/7/13
to mongodb...@googlegroups.com
This is my document structure:
{
       "_id" : ObjectId("517fd4b4eb183ad2f7b6f594"),
       "Week_ID" : 381,
       "Week_desc" : "2013-04-14 to 2013-04-20",
       "Report_Suite_cd" : 4,
       "Report_Suite_nm" : "foooobaaaaar",
       "US_Only" : "All",
       "Health_Channel_nm" : "blahblah",
       "Business_Reference_cd" : "someBusRefId",
       "Topic_ID" : 6067,
       "Subject_cd" : "Skin Problems & Treatments",
       "Sponsor_Program_nm" : "ntc",
       "Num_Visitors" : 14639,
       "Num_Visits" : 14996,
       "Num_Pageviews" : 194308,
       "Num_Visits_seo" : 31,
       "Num_Pageviews_seo" : 36,
       "Num_Visits_nl" : 0,
       "Num_Pageviews_nl" : 0,
       "Num_Visits_organic" : 0,
       "Num_Pageviews_organic" : 0,
       "Added" : "2013-04-26 12:59:17.810000000"

craiggwilson

unread,
May 8, 2013, 10:57:03 AM5/8/13
to mongodb...@googlegroups.com
Ok, I have some more questions.  I know this is becoming large and arduous, so I'm still gathering information.  I've gone back over some of your earlier statements and want to clarify a few things.  First, I want to reiterate something that might be the basis for your problems.

You indicated this was the query you are using:

var query = Query.And(Query.Matches("category", "foo"),
   Query.Matches("refid", "bar"),
   Query.Exists("Topic_ID"));

I'm not sure what you expect the actual query to look like, but this is using a regular expression underneath to perform this query.  While regular expressions can be used in queries, they can only be used by an index when anchored to the beginning of a string.  Your query above will get translated to the below query in the shell:

var query = { "category", /foo/, "refid", /bar/, "Topic_ID" : { $exists: true } }

You mentioned in your first post that you ran this in the shell with explain and that it was using an index.  I would like to know what query you ran explain on in the shell.  If you could, please run the above query in the shell with .explain() and copy your output.

In addition, and very similarly, you mentioned that you ran this same query in the Java Driver.  Could you please post your java code so we can compare?

Thanks...

mycom...@gmail.com

unread,
May 8, 2013, 5:12:21 PM5/8/13
to mongodb...@googlegroups.com
Thanks for following up so far. I truly appreciate it. I'll try to summarize the whole thread:

I created an index on "category_refid". The query on the shell is db.collection.find( {category:"foo"}, {refid:"bar"} )
I verified that the above query is using the index "category_refid".
In the C# code I translated the above query to :

            var query = Query.And(
                Query.Matches("category", "foo"),
                Query.Matches("refid", "bar")
                //Query.Exists("Topic_ID")
            );
(Based on your response it looks like that's not the right way as it's doing a regular expression instead of a "select" query.)

When I run the code using the above C# query (using Matches) I see  2-3 minute wait.

I changed the C# code to this:
            var query = Query.And(
                Query.EQ("category", "foo"),
                Query.EQ("refid", "bar")
                //Query.Exists("Topic_ID")
            );
And now I see the ~ 3-4 sec pause between the loops. So the performance has improved greatly. I should have used EQ instead of matches.

For the Java comparison I simplified the code; I removed the Query and replaced it with "findall()".

C# code:
             var cursor = coll.FindAll();

            foreach (var doc in cursor)
            {
                Console.WriteLine("i=" + (++count));
            }

The equivalent Java code is :
      DB omni = mongoClient.getDB( "omni" );
      DBCursor cursor = omni.getCollection("omnidata").find();

        try {
            int i=0;
           while(cursor.hasNext()) {
               i++;
               System.out.println("i=" + i + " -- " + cursor.next());
           }
        } finally {
           cursor.close();
        }

I don't see any pauses in Java code. But I see the 3-4 sec pause in C# when using C# driver.

Hope this helps.

Thanks
Reply all
Reply to author
Forward
0 new messages