Hi Annu
It returns around 150K documents; each document has around 5 fields. Getting all the records in JSON array format takes around 8-10 seconds.
If I send 10-15 requests, performance degrades.
How do I find the optimal batch size?
Batch size is only part of the solution. The ideal batch size will depend on your use case (e.g. your document sizes), which apparently you have found through trial and error. I believe the main issue is point #3 you describe below.
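If you want to experiment with the batch size from a driver, here is a minimal PyMongo sketch (the connection string, database, and collection names are placeholders for illustration):

```python
from pymongo import MongoClient

# Placeholder connection details -- adjust to your deployment.
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["mycollection"]

# batch_size controls how many documents the server returns per
# getMore round trip; try a few values and measure total time.
cursor = collection.find({}).batch_size(1000)

count = 0
for doc in cursor:
    count += 1  # replace with your per-document processing
print(count)
```

Note that the batch size only changes the number of network round trips; the server still does the same amount of work to produce 150K documents.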
In the MongoDB APIs I could not find any way to get bulk data; I could only see the Find method, which returns a cursor, so I have to iterate over all the documents one by one. Is there any way I can do a bulk read?
I’m not aware of any method to do a bulk read like you described. In a query, it is reasonable to assume that you want to examine the results document by document, so a cursor to iterate through the result set is what a database query typically returns. What you are describing is not a find() query result; it is closer to a mongodump output.
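If the end goal is a JSON array of all 150K documents, one thing that may help is streaming the serialization instead of building the whole array in memory first. A rough sketch using the same placeholder collection as above:

```python
import sys

from bson.json_util import dumps
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["mycollection"]

# Write the JSON array element by element instead of building a
# 150K-document list in memory and serializing it in one shot.
out = sys.stdout
out.write("[")
first = True
for doc in collection.find({}).batch_size(1000):
    if not first:
        out.write(",")
    out.write(dumps(doc))  # bson.json_util handles ObjectId, dates, etc.
    first = False
out.write("]")
```

This keeps memory usage flat on the application side, although the server still has to read every document.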
With one request, performance is good; why does performance degrade heavily when I make concurrent requests? Is there any other way/parameter that can optimize the performance for me?
This is likely because you are doing multiple collection scans simultaneously. You are forcing MongoDB to load the whole collection into memory, but your collection is likely larger than your available memory. If you send only one request, the collection scan can proceed in a relatively linear fashion. However, once you send 10-15 requests at once, MongoDB has to juggle collection scans from 15 different threads simultaneously.
As an illustration, MongoDB will try to load the whole collection to respond to request #1. But since there is not enough memory, it is forced to evict documents that request #2 still needs, then reload the documents it just evicted to respond to request #3, and so on, multiplied across 15 requests and 150K documents each. The usual result is disk thrashing. Unless your memory is large enough to hold the whole collection, this “degradation” will occur every time; it is the natural consequence of not having enough RAM.
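You can confirm whether a collection scan is happening by looking at the query plan. A minimal sketch, again with placeholder names; a COLLSCAN stage in the winning plan means the whole collection is being read:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["mycollection"]

# explain() returns the query plan; a "COLLSCAN" stage under
# queryPlanner.winningPlan means the whole collection is read.
plan = collection.find({}).explain()
print(plan["queryPlanner"]["winningPlan"])
```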
Could you post some example documents? Also, could you elaborate on why the query needs to return the whole collection every time?
Best regards,
Kevin