What is better: insert or upsert?


Yulias Stolin

May 1, 2012, 3:38:06 PM
to mongodb-user
I need to implement a user_profile collection.
I have a runtime component that reads constantly (~10,000 tps), and once a
day I receive a new file of user profiles that I have to load.

Documents in the collection look like {userId: "123", profile: [{"prof1":"p"},
{"prof2":"p2"}]}.
The collection could be very big.
I use sharding, so my shard key should probably be "userId"?

What is the better way to handle this requirement:
1. Drop the collection, create it again, and insert all the data into it
(I'm aware that for some period of time I will not have any data).
2. Load the new data by using an upsert for each document.

In the insert case, another question: is it better to insert the
documents one by one or in bulk?
And how can I calculate the best bulk size?
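
For reference, a minimal sketch of the two options above. It assumes the pymongo driver and a `collection` object obtained from it; the function names are hypothetical and only for illustration. Neither function is tuned, and on a sharded cluster a dropped collection would also have to be sharded again, which is not shown.

```python
def reload_by_drop_and_insert(collection, docs):
    """Option 1: drop the collection, then insert every document again.

    `collection` is assumed to be a pymongo Collection.
    Readers see no data between the drop and the end of the insert.
    """
    docs = list(docs)
    collection.drop()
    if docs:  # pymongo rejects an empty document list
        collection.insert_many(docs, ordered=False)


def reload_by_upsert(collection, docs):
    """Option 2: upsert each document, keyed on userId.

    A document whose userId already exists is replaced; an unknown
    userId is inserted, so the loader does not need to know which
    users are new and which are changed.
    """
    for doc in docs:
        collection.replace_one({"userId": doc["userId"]}, doc, upsert=True)
```

With option 2 the collection stays readable during the whole load, at the cost of one write per document.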

Eliot Horowitz

May 2, 2012, 12:35:17 AM
to mongod...@googlegroups.com
What % of the data is going to change every day?
If it's a small %, updates are better.
If it's most of the data, and you can take the time to create an entirely new
collection, then inserts should be faster.
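
One rough way to measure that percentage is to compare today's file with yesterday's before loading. The sketch below assumes both files fit in memory as lists of dicts keyed on "userId"; the function name `changed_fraction` is hypothetical.

```python
def changed_fraction(old_docs, new_docs, key="userId"):
    """Return the fraction of new_docs that are new or differ from old_docs.

    Both arguments are lists of dicts; documents are matched on `key`.
    """
    if not new_docs:
        return 0.0
    old_by_key = {doc[key]: doc for doc in old_docs}
    # A document counts as changed if its key is unknown or its content differs.
    changed = sum(1 for doc in new_docs if old_by_key.get(doc[key]) != doc)
    return changed / len(new_docs)
```

A low result points toward updates, a high one toward re-inserting; where the cut-off lies is something only a benchmark on the real data can settle.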

Yulias Stolin

May 2, 2012, 3:52:10 AM
to mongodb-user
I think the % of changed data could be 50% or 80%.
I also have a very similar problem (other profiles), and there the % of
changed data will be much lower, about 10%.

But in any case, when I receive the data in a new file, I don't
know whether a given user is new or changed.

Kyle Banker

May 2, 2012, 11:18:54 AM
to mongod...@googlegroups.com
For the 50% - 80% case, it'll probably be faster to drop and re-insert. For the 10%, you can probably update. You should benchmark both techniques to see which is fastest.

Bulk inserts are usually faster if the documents are small (e.g., < 10KB).
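
A minimal sketch of a batched load, assuming the pymongo driver. The helper names `batches` and `bulk_load` are hypothetical, and the default `batch_size` of 1000 is only a starting value to vary while benchmarking, not a recommendation.

```python
from itertools import islice


def batches(docs, batch_size):
    """Yield successive lists of at most batch_size documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def bulk_load(collection, docs, batch_size=1000):
    """Insert docs in batches and return how many were inserted.

    `collection` is assumed to be a pymongo Collection.
    """
    inserted = 0
    for batch in batches(docs, batch_size):
        # ordered=False lets the server continue past a failed document.
        result = collection.insert_many(batch, ordered=False)
        inserted += len(result.inserted_ids)
    return inserted
```

Timing `bulk_load` with a few different `batch_size` values on a sample of the real file is one way to pick a size.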

Yulias Stolin

May 4, 2012, 5:51:22 PM
to mongodb-user
I have benchmarked re-inserting and the result is terrible. I
tested it with a replica set (m+s+a) and also with 2 and 3 shards, where
each shard is a replica set (m+s+a), and during the load my tps
drops by more than half. I tried
bulk inserts of 1000 and 500 documents. The total number of documents
is 5 million (not that many).
I'm probably missing something, but it seems that Mongo's lock has a
very big impact on performance. My queries
are much slower while I'm loading with inserts, which affects
overall performance (tps, latency, ...).