Insert performance issue


Hanson Lu

Sep 29, 2012, 4:08:39 AM
to mongod...@googlegroups.com
I ran some insert performance tests against MongoDB. They show that inserts are slower when the collection has indexes and the SAFE write concern is set.

 Here are the test results:

   * Inserting with no index and SAFE write concern not set
       thread count     average speed
       1                10161
       2                17519
       3                15859
       (thread count means the number of concurrent inserting threads)
       The speed seems fairly high.
   * Inserting with index and SAFE write concern not set
       thread count     average speed
       1                6320
       2                6240

   * Inserting with index and SAFE write concern set
       thread count     average speed
       1                972
       2                1530
       5                2573
 
 Do these numbers look normal? With one thread, the speed of inserting with indexes and the SAFE write concern is only about 1/10 of the speed with no index and no SAFE write concern (972 vs. 10161).
 Is there any way to speed up inserts with indexes and the SAFE write concern?
 I only created two indexes on that collection.

Regards
Hanson






Rob Moore

Sep 29, 2012, 2:44:13 PM
to mongod...@googlegroups.com

Hanson,

Yes - there will be a huge impact to performance when moving from "NONE" to "SAFE" write concern. 

The reason for the impact is simple.  In the "NONE" case the driver simply gets the message on the wire and returns to the caller.  In the "SAFE" case you need to wait for the server to respond to the insert.  The round trip latency is usually orders of magnitude longer than pushing the bytes into a socket, even when connected to a MongoDB server on the same machine.
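
To put rough numbers on that round trip: if each thread blocks on one acknowledged insert at a time (an assumption about how the test client works, not something stated in the first post), the time a thread spends per insert is simply the thread count divided by the throughput. A minimal Python sketch using the figures from the first post; the function name is made up for illustration.

```python
def implied_round_trip_ms(inserts_per_second: float, threads: int = 1) -> float:
    """Average time one thread spends per acknowledged insert, in milliseconds.

    Assumes every thread issues one insert, waits for the reply, and repeats.
    """
    return 1000.0 * threads / inserts_per_second


# Figures from the first post: index present, SAFE write concern set.
safe_results = {1: 972, 2: 1530, 5: 2573}

for threads, rate in safe_results.items():
    print(threads, round(implied_round_trip_ms(rate, threads), 2))
# prints:
# 1 1.03
# 2 1.31
# 5 1.94
```

So under that assumption a single thread is spending about a millisecond per insert waiting, and the per-insert wait grows as threads are added.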

My team has written a new Java driver to allow developers to overcome this exact problem by using asynchronous method calls.  To get the full benefit the developer needs to do some work to either handle responses via a callback or defer processing results via a Future. 
     http://www.allanbank.com/mongodb-async-driver/

We have run a custom micro-benchmark to show the relative performance of the different write concerns.  (In our driver, Durability is the same as 10gen's WriteConcern, and we call "SAFE" "ACK" since we feel that is a more descriptive name.)
    http://www.allanbank.com/mongodb-async-driver/performance/performance.html

In addition, the 10gen driver only allows a single thread to use a connection at a time, which causes scalability problems, as demonstrated with YCSB.  Our driver does not have the same scalability problems since multiple threads can use a single connection concurrently.
    http://www.allanbank.com/mongodb-async-driver/performance/ycsb.html

If you have a specific question about using our driver feel free to ask on this list or via direct email.

Rob.

Osmar Olivo

Oct 1, 2012, 3:34:00 PM
to mongod...@googlegroups.com
So a few different things are going on when you perform an insert, and it is also very important to know the details of write concern.

1. Write Concern
When you set a safe write concern, the application blocks and waits for a response from the server confirming that it received the insertion command. When you do not set a write concern, the application just sends the message and returns, not awaiting any response from the server (you can think of it as fire and forget from an application perspective). So, to be clear, it is not that inserting without a write concern is "faster" but rather that it does not wait for confirmation that data has been written and returns so it does not have to deal with server processing time and network latency of the messages.
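
A toy Python sketch of that difference, using an in-memory stand-in rather than a real server or driver (the class and function names are invented for illustration). The server does the same work in both modes; the only difference is whether the client reads the reply, which also means an unacknowledged insert never learns about a failure.

```python
class FakeServer:
    """In-memory stand-in for a server; rejects duplicate _id values."""

    def __init__(self):
        self.docs = {}

    def handle_insert(self, doc):
        if doc["_id"] in self.docs:
            return {"ok": 0, "err": "duplicate key"}
        self.docs[doc["_id"]] = doc
        return {"ok": 1, "err": None}


def insert(server, doc, acknowledged):
    """Send an insert; only read the reply when the write is acknowledged."""
    reply = server.handle_insert(doc)  # the server processes it either way
    if not acknowledged:
        return None  # fire and forget: the reply is never looked at
    if reply["err"]:
        raise RuntimeError(reply["err"])
    return reply
```

With this sketch, inserting the same `_id` twice with `acknowledged=False` returns `None` both times, while the second call with `acknowledged=True` raises.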

2. Inserting with an index

Now, if you do not have an index and perform an insert, then all that happens is your insert goes to the server and gets written in the next allocated block of the disk. Things like fragmentation and growing file sizes can complicate this process as you may need to move files on disk in order to accommodate their size, so things like the state of your disk and the size of the files can invalidate many of these "performance tests".  But that aside, there is only one write operation and then the server returns a response if there is a write concern.

Now then, if you perform an insert with an index, then you are actually performing 2 write operations and a read: 1 write to insert your new document, 1 read to page the index into memory if it is not there already, and another write to add this new document to the index. And then if you are running with a write concern it will return a message to the application confirming it's been completed. So yes, having an index means slower write operations in exchange for much faster reads. This will get even slower if you have multiple indexes, because you will have to update all of them on every insert.
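
The bookkeeping described above can be sketched with a toy in-memory collection in Python. This is only an illustration of the counting argument (one write for the document plus one per index); it says nothing about how the server actually lays out data, and the class name and counter are invented.

```python
import bisect


class ToyCollection:
    """Toy collection that counts write operations per insert."""

    def __init__(self, indexed_fields=()):
        self.documents = []
        # One sorted list of (value, position) pairs per indexed field.
        self.indexes = {field: [] for field in indexed_fields}
        self.write_ops = 0

    def insert(self, doc):
        # Write 1: append the document itself.
        self.documents.append(doc)
        self.write_ops += 1
        position = len(self.documents) - 1
        # One more write per index, each keeping its entries in sorted order.
        for field, entries in self.indexes.items():
            bisect.insort(entries, (doc[field], position))
            self.write_ops += 1
```

Inserting 100 documents into a `ToyCollection()` with no indexes performs 100 writes; with two indexed fields, as in the original test, the same inserts perform 300.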


3. What does this all mean: 

What this effectively means is that if you run application side with no write concern, then your tests will always show consistent values whether there is an index or not, because you send out the write message but never actually wait for it to complete and get confirmation. However, if you ARE running with a write concern of 1, then you must wait for the operation to actually be performed and receive a response. With an index you are performing more operations on insert than without one, so it will naturally take longer to complete and receive a confirmation. Even more so if you run with higher write concerns for replication.

There really isn't any real way to speed up inserts from the application side if you want to wait for confirmation from the server that the insert was successful. And there's no real way to make inserting with an index any faster, because you can't get around the fact that all of these updates on the index need to be performed with every insertion. Hopefully it makes sense now why this is.

For more on write concern and Indices, you should check out the mongodb documentation. http://docs.mongodb.org/manual/applications/replication/#replica-set-write-concern

-Osmar

Bruce Zhao

Oct 7, 2012, 9:44:26 PM
to mongod...@googlegroups.com
Hi all:

In the shell, I can use db.getCollectionNames() to get an array of
collection names, but I don't know how to get the collection names with
the C driver. Can somebody help me?



Bruce Zhao
2012-10-08

Roman Alexis Anastasini

Oct 7, 2012, 9:58:04 PM
to mongod...@googlegroups.com
Morning Bruce,

here is a two-year-old post about this:
https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/Iu7noJL6LbA

In short: there is no getCollectionNames in the C driver. Dunno if the driver has been updated with this function yet. But as you can read in the post, it is quite easy to implement it on your own. :-)

"There is no getCollectionNames command. If you type db.getCollectionNames in the shell (note no parens) it will print out the source and you can see how it's implemented there."

Cheers,
Roman
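
For reference, a sketch of the filtering logic in Python. It assumes, as the shell helper of this era did as far as I know, that the raw names come from the database's system.namespaces collection; the function below only does the filtering on a plain list of namespace strings, and both the function name and the sample data are made up. The same logic should carry over to the C driver once you have queried that collection.

```python
def collection_names(db_name, namespaces):
    """Turn raw namespace strings into sorted collection names for one database."""
    prefix = db_name + "."
    names = []
    for ns in namespaces:
        # Skip index namespaces and other internals such as "test.users.$_id_".
        if "$" in ns and ".oplog.$" not in ns:
            continue
        if ns.startswith(prefix):
            names.append(ns[len(prefix):])
    return sorted(names)


# Hypothetical contents of system.namespaces for a database named "test".
sample = ["test.users", "test.users.$_id_", "test.system.indexes", "test.orders"]
print(collection_names("test", sample))
# prints: ['orders', 'system.indexes', 'users']
```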






--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user+unsubscribe@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Hanson Lu

Oct 7, 2012, 11:06:30 PM
to mongod...@googlegroups.com
Hi Osmar,
  I want to note that I measured the insert speed on both the client side and the server side (mongostat). The two are almost the same.
 
  BTW, here are the results of inserting with the SAFE write concern and no index:
       thread count     average speed
       5                4156
       10               4898
 
 So if the speed in my test with indexes and the SAFE write concern is normal, is MongoDB not suitable for apps that require high insert speed?
  It seems there is no big improvement over a relational DB when inserting with indexes and the SAFE write concern.

Hanson Lu 

Bruce Zhao

Oct 7, 2012, 11:43:50 PM
to mongod...@googlegroups.com
Roman,
    Thank you very much!! It's helpful!!


Roman Alexis Anastasini

Oct 7, 2012, 11:48:54 PM
to mongod...@googlegroups.com
Hey Bruce,

you're welcome!
If you don't mind my asking, are you located in or near Beijing?

I am asking because I was asked by 10gen to organize the official Beijing MongoDB office hours, which are going to start in November. For now I am looking for interested MongoDB users. :-)

Best,
Roman
Roman Alexis Anastasini
Senior Web Developer
CMUNE

http://uberstrike.cmune.com

Hanson Lu

Oct 7, 2012, 11:54:00 PM
to mongod...@googlegroups.com, ro...@cmune.com
Is this a Google Groups bug? I started this thread, but your posts seem unrelated.

Roman Alexis Anastasini

Oct 8, 2012, 1:10:20 AM
to mongod...@googlegroups.com, ro...@cmune.com
Well, that's odd...
I just replied via mail to another thread... dunno why it ended up in this thread here...

Sorry for that.

Max Schireson

Oct 8, 2012, 1:17:59 AM
to mongod...@googlegroups.com

A few comments:

1. You might try a few more threads to see if you keep getting more insert performance.

2. That said, if you're just counting inserts/second vs. the same with a single table in an RDBMS, I would not expect an advantage for mongo. Performance-wise I'd expect the advantage for mongo in two places:
  I) If the data requires child tables in the rdbms, for example orders with order headers and lines in an rdbms that can be stored as one object in mongo. This is quite common.
  II) Built in sharding which makes it easy to scale out. Need to do a few hundred thousand writes per second? No problem.

You might start a thread about tuning the insert with index and safe mode; I'm sure folks smarter than me about that would be happy to look at mongostat, iostat, settings etc. and give some advice.

-- Max

Bruce Zhao

Oct 8, 2012, 8:59:09 AM
to mongod...@googlegroups.com
Hey Roman:
    I'm in ShenZhen, not in Beijing.

Sorry, Hanson Lu, maybe it's my fault. :)

Hanson Lu

Oct 8, 2012, 9:39:14 PM
to mongod...@googlegroups.com
Hi Max
   I did test with more threads, but it did not give more insert performance; sometimes it got even worse.
   
Hanson

Max Schireson

Oct 8, 2012, 9:42:39 PM
to mongod...@googlegroups.com

Sounds like the right approach, add threads til you see a peak in throughput, glad you found the peak.

Feel free to post some additional details if you want some tuning help. New thread is probably best.

-- Max

Rob Moore

Oct 8, 2012, 11:01:21 PM
to mongod...@googlegroups.com


On Monday, October 8, 2012 9:39:15 PM UTC-4, Hanson Lu wrote:
   I did test with more threads, but it did not give more insert performance; sometimes it got even worse.

As I stated before the issue is you are blocking on each thread for a reply and also blocking other threads from using the connection to send more requests.  If you profile the application you will see the application is always asleep waiting for a response or waiting to get a connection from the pool.
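
A toy illustration of the difference, using Python's asyncio and a fake connection rather than any real driver (all names here are invented, and the 1 ms sleep is a stand-in for the round trip). In the blocking style only one request is ever outstanding on the connection; in the pipelined style many requests share it at once.

```python
import asyncio


class FakeConnection:
    """Simulated connection that tracks how many requests are outstanding."""

    def __init__(self, latency=0.001):
        self.latency = latency
        self.in_flight = 0
        self.max_in_flight = 0

    async def insert(self, doc):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(self.latency)  # simulated round trip to the server
        self.in_flight -= 1
        return {"ok": 1}


async def blocking_style(conn, docs):
    # Wait for each reply before sending the next request.
    return [await conn.insert(d) for d in docs]


async def pipelined_style(conn, docs):
    # Send everything, then collect the replies as they arrive.
    return await asyncio.gather(*(conn.insert(d) for d in docs))


def run(style, n=50):
    conn = FakeConnection()
    replies = asyncio.run(style(conn, [{"n": i} for i in range(n)]))
    return len(replies), conn.max_in_flight


print(run(blocking_style))   # (50, 1)
print(run(pipelined_style))  # (50, 50)
```

Both styles get 50 acknowledgements back, but the blocking one pays the round trip 50 times in sequence on that connection.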

You can clearly see this scalability wall in the YCSB graphs I pointed to before:
   http://www.allanbank.com/mongodb-async-driver/performance/ycsb.html
   e.g., http://www.allanbank.com/mongodb-async-driver/images/ycsb/YCSB-2012-09_workload_a_throughput.jpg

Note that the 10gen (legacy) driver's throughput (black bars) fairly quickly reaches a maximum for each of the connection counts.  If you need more throughput you have to increase both the thread count _and_ the connection count using the 10gen provided drivers.  If you have a small application deployment this scales fairly well.  If you need this to scale across even a medium size cluster serving a number of clients the connection explosion can start to cause scheduler and other problems.

The other option is to find a driver that doesn't block the connection waiting for the response.  The only production ready version I know of is the asynchronous Java driver my team created.  You can see in the chart above it scales fairly well[1] and tops out at ~13K operations/second with a "SAFE" write concern.  Much better than the 2K you are seeing.

I encourage you to give it a try. Feel free to ask questions directly to me or on this group.

If you are not using Java you can try increasing the thread and connection count and see if that gets you where you need to be.

Rob.

[1] Remember this was a benchmark so the client threads are doing no real work so I would expect for a real workload the scaling to be even better.

Hanson Lu

Oct 9, 2012, 5:10:56 AM
to mongod...@googlegroups.com
Rob 
   Thanks. I will try it as you suggest.

Hanson Lu 

Hanson Lu

Oct 9, 2012, 8:44:59 AM
to mongod...@googlegroups.com
Hi Rob.
  I have tried your driver. The fastest speed I saw is ~11k writes, with 200 threads.
  The strange thing is that this performance is achieved when the max connection count is set to 1 (mongo.getConfig().setMaxConnectionCount(1)).
  If I change the max connection count to 1000, the performance is about ~6k, slower than with a max connection count of 1.
   
Hanson 




On Tuesday, October 9, 2012 11:01:21 AM UTC+8, Rob Moore wrote:

Rob Moore

Oct 9, 2012, 8:38:28 PM
to mongod...@googlegroups.com


On Tuesday, October 9, 2012 8:44:59 AM UTC-4, Hanson Lu wrote:

  I have tried your driver. The fastest speed I saw is ~11k writes, with 200 threads.
  The strange thing is that this performance is achieved when the max connection count is set to 1 (mongo.getConfig().setMaxConnectionCount(1)).
  If I change the max connection count to 1000, the performance is about ~6k, slower than with a max connection count of 1.

Hanson,

Without knowing the specifics I'm not surprised.  1000 is a lot of connections.  Part of the goal with the driver was to keep the connection count low[1].  When sending a request it does a linear search for an idle connection to try and spread the requests as evenly as possible.  That search, with 1000 connections, is probably causing the slowdown.  Sitting here, there are probably some things that can be done to reduce the overhead.  I'll get the team to look into some of those.

Suggestion to get the very best throughput:
  • For an application that is not doing much processing and is feeding the driver to capacity, I would keep the thread to connection ratio around 10, e.g., 20 threads ==> 2 connections.
  • Cut the number of threads down, if possible.  The driver tries to keep the running threads running and not blocked.  If that helps, then the extra thread contention was hurting performance.
  • If you are willing to trade some CPU for shorter latency, switch the lock type:
    • mongo.getConfig().setLockType(LockType.LOW_LATENCY_SPIN);
    • This must be done before a connection to MongoDB is created (query, command, list, etc.); only connections created after setting the lock type will use the new lock type.

You're now able to push 11K inserts instead of 2.5K.  11K is in the ballpark of what you saw with the NONE write concern and no index using the legacy driver.  What is your goal? Can you provide some details on what you are actually doing so I can give more concrete suggestions?  Are you only ever going to insert documents?

Rob.

[1] My personal goal is to saturate a mongod instance with a single connection.  No one here thinks we can do it.  I like a good challenge.

mathgl

Oct 10, 2012, 3:51:35 AM
to mongod...@googlegroups.com


Hi

    Because MongoDB still uses a thread-per-request model, it's better to use connections sparingly.  I put a proxy based on Twisted in front of my MongoDB instances. All operations go to the proxy first and are dispatched smoothly from there. Because of the async nature of Twisted, I don't need many connections (I use 10 or 30).

    I hope one day the MongoDB team will implement an async model.

Regards

gelin yan

Hanson Lu

Oct 10, 2012, 5:07:40 AM
to mongod...@googlegroups.com
Hi Rob

I just want to know the maximum insert performance of MongoDB for my app.

 I ran a test with 2 connections and different thread counts. The results:
Threads   Insert ops/sec
1         ~400
2         ~700
5         ~2k
10        ~5k
20        ~8k
40        ~9k

The single-thread performance is ~400 ops/sec. Is that normal?
Hanson

Osmar Olivo

Oct 10, 2012, 5:19:21 PM
to mongod...@googlegroups.com
ops/sec is very dependent on your hardware, network, and disk. I wouldn't be able to tell you if that is normal or not, as the numbers by themselves don't really mean anything without the context of your machine. Though I can tell you the general trend you are seeing with increasing threads seems correct.
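
One way to read that trend from the numbers alone is to compare per-thread throughput against the single-thread run. A small Python sketch using the approximate figures from the previous post, treating each "~" value as exact for the sake of the arithmetic (the function name is made up):

```python
def scaling_efficiency(results):
    """Per-thread throughput relative to the run with the fewest threads."""
    base_threads = min(results)
    per_thread_base = results[base_threads] / base_threads
    return {
        threads: round(rate / (threads * per_thread_base), 2)
        for threads, rate in sorted(results.items())
    }


# threads -> insert ops/sec, from the 2-connection test above (approximate).
two_connection_test = {1: 400, 2: 700, 5: 2000, 10: 5000, 20: 8000, 40: 9000}

efficiency = scaling_efficiency(two_connection_test)
print(efficiency[20], efficiency[40])
# prints: 1.0 0.56
```

Up to 20 threads each thread still delivers about what the single thread did; at 40 threads each one delivers a little over half of that, which is where adding threads stops paying off in this test.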