Java Driver Insert Batch Size


Vitaly Peressada

Feb 7, 2011, 8:43:03 AM
to mongodb-user
mongo 1.6.5, java driver 2.4, morphia 0.98

Our app inserts a large number of records (400K+) using the Java driver.
Here is the relevant code.

public List<DBObject> toDBObjectCollection(Collection<T> records)
        throws Exception {
    Mapper mapper = ((DatastoreImpl) getDatastore()).getMapper();
    List<DBObject> dbObjects = new ArrayList<DBObject>();
    for (T record : records) {
        dbObjects.add(mapper.toDBObject(record));
    }

    return dbObjects;
}

public void saveAll(List<T> list) throws Exception {
    ...
    getCollection().insert(toDBObjectCollection(list));
}

Will batching the inserts in groups of 10K, 50K, etc. improve performance? Our
avgObjSize is 872 bytes.
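
For a rough sense of scale, the payload handed to each insert call can be estimated from avgObjSize. This is only a back-of-the-envelope sketch: it assumes avgObjSize is in bytes, takes 400K as the record count, and ignores BSON and wire-protocol overhead. The class and method names are made up for illustration.

```java
final class BatchPayloadEstimate {
    /** Approximate payload of one insert call, in bytes. */
    static long payloadBytes(long documents, long avgObjSizeBytes) {
        return documents * avgObjSizeBytes;
    }

    /** Number of insert calls needed for total documents (ceiling division). */
    static long callsNeeded(long total, long batchSize) {
        return (total + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long avgObjSize = 872;   // from the post, assumed to be bytes
        long total = 400_000;    // "400K+" records
        for (long batch : new long[] {10_000, 50_000, total}) {
            System.out.printf("batch=%d -> %d insert calls, ~%.1f MB each%n",
                    batch, callsNeeded(total, batch),
                    payloadBytes(batch, avgObjSize) / 1e6);
        }
    }
}
```

The numbers only show how much data each call has to serialize and send; they say nothing about which batch size is actually faster, which still has to be measured.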

Eliot Horowitz

Feb 7, 2011, 8:46:35 AM
to mongod...@googlegroups.com
It's likely.
The easiest way to figure that out is to try.


Scott Hernandez

Feb 7, 2011, 9:40:33 AM
to mongod...@googlegroups.com
There is no need to write all that code. Morphia (Advanced)Datastore
has a batch insert method:

AdvancedDatastore ads = ...;
ads.insert(listOfEntities);

http://code.google.com/p/morphia/source/browse/trunk/morphia/src/test/java/com/google/code/morphia/TestPerf.java#312


Yes, batching does speed things up. Doing this in parallel, in
multiple threads, can help too; considerable CPU time is spent
serializing each batch into binary packets to send to the server.
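
As a rough illustration of the multi-threaded approach, the sketch below splits a list into batches and submits each batch to a thread pool. It is not taken from Morphia or the driver: the `sink` parameter is a hypothetical stand-in for the real call (for example `batch -> ads.insert(batch)`), the class and method names are invented, and it assumes the real insert call is safe to invoke from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelBatchInsert {
    /**
     * Splits items into batches of at most batchSize and hands each batch to
     * sink on a fixed-size thread pool, then waits for all batches to finish.
     */
    static <T> void insertInParallel(List<T> items, int batchSize, int threads,
                                     Consumer<List<T>> sink)
            throws InterruptedException, ExecutionException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int from = 0; from < items.size(); from += batchSize) {
                int to = Math.min(from + batchSize, items.size());
                // Copy so each task owns its batch independently of the source list.
                List<T> batch = new ArrayList<>(items.subList(from, to));
                pending.add(pool.submit(() -> sink.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces any failure from a batch as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            records.add(i);
        }
        // Stand-in sink: just report the batch size instead of inserting.
        insertInParallel(records, 10, 4,
                batch -> System.out.println("inserting " + batch.size() + " records"));
    }
}
```

Batches may complete in any order, so this only suits inserts where ordering between batches does not matter.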

Vitaly Peressada

Feb 7, 2011, 1:18:09 PM
to mongodb-user
Thanks, Scott. Parallelizing inserts will help.

But my question was geared more towards the Java driver's DBCollection
insert(List<DBObject> list). With an average object size of about 1K, will it
be faster, from the Java driver's point of view, to insert one list of
500K objects or to do 10 inserts of 50K objects each?

And if parallelized inserts save considerable time,
I would think the right place to do that would be in the Java
driver and not in Morphia.

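What I have now is a recursive batch insert with a configurable batch
size:
public void saveAll(Collection<T> col) throws Exception {
    if (col == null || col.isEmpty()) {
        return;
    }

    // Copy only when the caller did not pass a List; a blind cast would
    // throw ClassCastException for other Collection types.
    List<T> list = (col instanceof List) ? (List<T>) col : new ArrayList<T>(col);
    if (list.size() > mongoBatchSize) {
        // Insert the first batch, then recurse on the remainder.
        saveAll(list.subList(0, mongoBatchSize));
        saveAll(list.subList(mongoBatchSize, list.size()));
        return;
    }

    getCollection().insert(toDBObjectCollection(list));
}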
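
The same batching can also be written as a plain loop, which avoids one level of recursion per batch. The sketch below is a standalone illustration rather than the code from this post: the `insert` parameter is a hypothetical stand-in for `getCollection().insert(toDBObjectCollection(batch))`, and the class name and the returned call count are additions for demonstration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

final class IterativeBatchSave {
    /**
     * Passes col to insert in consecutive batches of at most batchSize.
     * Returns the number of insert calls made.
     */
    static <T> int saveAll(Collection<T> col, int batchSize, Consumer<List<T>> insert) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        if (col == null || col.isEmpty()) {
            return 0;
        }
        // Reuse the caller's List when possible, otherwise copy once.
        List<T> list = (col instanceof List) ? (List<T>) col : new ArrayList<>(col);
        int calls = 0;
        for (int from = 0; from < list.size(); from += batchSize) {
            int to = Math.min(from + batchSize, list.size());
            insert.accept(list.subList(from, to)); // view of the batch, no copy
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            records.add(i);
        }
        int calls = saveAll(records, 10,
                batch -> System.out.println("insert of " + batch.size() + " records"));
        System.out.println(calls + " insert calls");
    }
}
```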



Scott Hernandez

Feb 7, 2011, 1:22:55 PM
to mongod...@googlegroups.com
On Mon, Feb 7, 2011 at 10:18 AM, Vitaly Peressada <vit...@ufairsoft.com> wrote:
> Thanks, Scott. Parallelizing inserts will help.
>
> But my question was geared more towards java driver's DBCollection
> insert(List<DBObject> list). Having average object size of 1K, will it
> be faster, from mongo java driver point of view, to insert list of
> 500K or 10 inserts with 50K objects?

The Morphia insert method does bulk inserts through the driver, so it
is up to you how much code you want to write.

> And if there were considerable time saving using parallelized inserts,
> I would think that the right place to do it will be in mongo java
> driver and not in morphia.

I would do it at the top level, in your application, on top of
Morphia. You can do it at either level, but at the driver level it will
require you to write more code.

Vitaly Peressada

Feb 7, 2011, 1:58:35 PM
to mongodb-user
> I would do it at the top level, in your application, on top of
> morphia; you can do either but at the driver it will require you
> writing more code.

I could certainly do that, but then faster large inserts would
benefit only my app. I would imagine that having this as a feature would
benefit at least Morphia?

Scott Hernandez

Feb 7, 2011, 2:24:32 PM
to mongod...@googlegroups.com
On Mon, Feb 7, 2011 at 10:58 AM, Vitaly Peressada <vit...@ufairsoft.com> wrote:
>> I would do it at the top level, in your application, on top of
>> morphia; you can do either but at the driver it will require you
>> writing more code.
>
> I could certainly do that but having a faster large size inserts will
> benefit my app only. I would imagine having this as a feature would be
> beneficial to at least morphia?

So, morphia already does batch inserts, the same as the driver...
since it just calls the driver.

To break up the inserts into parallel batches (using multiple threads)
you would have to do that in your app. I don't think the driver or
morphia has that kind of feature on the (short) list. Although, doing
this on reads (queries) would really have to happen in the driver, and
across multiple threads.

Scott Hernandez

Feb 6, 2012, 3:44:45 PM
to mongodb-user
There are two different loops in that code to compare performance, and
one batch insert command (see the last link, line 325).

http://code.google.com/p/morphia/source/browse/trunk/morphia/src/test/java/com/google/code/morphia/TestPerf.java#311
http://code.google.com/p/morphia/source/browse/trunk/morphia/src/test/java/com/google/code/morphia/TestPerf.java#325
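
To make the difference between a loop of single inserts and one batch insert concrete, here is a toy model. It does not use the driver at all: `FakeCollection` is an invented class that only counts how many messages a client would send, under the simplifying assumption that each insert call maps to exactly one message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RoundTripModel {
    /** Invented stand-in for a collection; it only counts messages "sent". */
    static final class FakeCollection {
        final List<String> docs = new ArrayList<>();
        int messagesSent = 0;

        /** Single-document insert: one message per document. */
        void insert(String doc) {
            docs.add(doc);
            messagesSent++;
        }

        /** Batch insert: one message for the whole list (in this model). */
        void insert(List<String> batch) {
            docs.addAll(batch);
            messagesSent++;
        }
    }

    public static void main(String[] args) {
        List<String> batchPush = Arrays.asList("a", "b", "c");

        FakeCollection looped = new FakeCollection();
        for (String doc : batchPush) {
            looped.insert(doc);
        }

        FakeCollection batched = new FakeCollection();
        batched.insert(batchPush);

        System.out.println("loop:  " + looped.messagesSent + " messages");
        System.out.println("batch: " + batched.messagesSent + " messages");
    }
}
```

Both variants end up storing the same documents; only the number of messages differs.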

On Mon, Feb 6, 2012 at 3:31 PM, Hung Lin <hun...@gmail.com> wrote:
> Hi Scott,
>
> I read the code at the link you posted; the batch mode is a loop that
> inserts one by one:
>
>     for (DBObject doc : batchPush)
>         c.insert(doc);
>
> So, can I say that the MongoDB server still receives each insert command
> one by one, and therefore it is not really a batch operation as far as the
> server is concerned? Please correct me if I'm wrong. Thanks for your time.
>
>
>
>
> Best,
> Hung
>
>

vermoid

Mar 13, 2012, 5:34:16 PM
to mongodb-user
One thing that is really confusing is that the bulk insert is only
exposed through AdvancedDatastore.
The regular Datastore has a save(T... entities) API, but
underneath that API is just doing the loop.

On Feb 6, 1:44 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> There are two different loops in that code to compare performance, and
> one batch insert command (see the last link, line 325).
>
> http://code.google.com/p/morphia/source/browse/trunk/morphia/src/test...
> http://code.google.com/p/morphia/source/browse/trunk/morphia/src/test...

Scott Hernandez

Mar 13, 2012, 7:17:36 PM
to mongod...@googlegroups.com
Insert and save are very different things: insert can be done as a batch
on the server, while save is just a client-side helper.