How to manage 20k inserts per second with a MongoDB cluster


xiao mo

Sep 29, 2011, 10:44:44 PM
to mongodb-user
I'm dealing with storing a data stream of 20k documents per second.
The average document size is about 400 bytes, and several keys are indexed.
Documents are never updated. The DB stores one month's worth of documents,
and the system should be capable of long-term operation.

I have tried an auto-sharding cluster with MongoDB 2.0. The test
collection contains 80 million documents, each simplified to 200
bytes, with two keys indexed.
The shard key is uniformly distributed from 1 to 31.
The insert client is written in C++.
A comparison between 2-, 3-, 4- and 9-node clusters is as follows:

Nodes   Time consumed (minutes)
2       1120
3        629
4        612
9        495

I have also tried turning off the balancer; the time consumed
decreased to 379 minutes.
It's still far from what I need.
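For reference, the average throughput implied by those timings can be derived directly from the numbers above (80 million documents over the measured wall time). This is a quick sketch, not part of the original test harness:

```python
# Derive average writes/sec from the figures in this thread:
# 80 million documents inserted over the measured wall time.
DOCS = 80_000_000

def writes_per_sec(minutes: int) -> float:
    return DOCS / (minutes * 60)

for nodes, minutes in [(2, 1120), (3, 629), (4, 612), (9, 495)]:
    print(f"{nodes} nodes: {writes_per_sec(minutes):.0f} writes/sec")

# With the balancer turned off (379 minutes):
print(f"balancer off: {writes_per_sec(379):.0f} writes/sec")
```

Even the best case (balancer off) works out to roughly 3,500 writes/sec, well below the 20k/sec target.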

Thanks in advance

Sergei Tulentsev

Sep 29, 2011, 11:05:30 PM
to mongod...@googlegroups.com
MongoDB can do much better. Right now, in my app, it is performing at ~40k writes/sec. That said, it is a standalone server with journaling (no cluster), the documents are somewhat smaller, and there are no additional indexes.

I was wondering, how do you define "one month's worth of documents"? Is it a rolling 30-day window or a calendar month (that is, on Aug 1st you wipe the data for June)? How do you delete old data?

What does your shard key look like? Did you pre-split and pre-move the chunks?






--
Best regards,
Sergei Tulentsev

xiao mo

Sep 29, 2011, 11:46:44 PM
to mongodb-user
I mean a rolling 30-day window. I plan to split the collection by
day; that is, the collection's name will look like "doc20110930". The
earliest collection will be dropped.
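A minimal sketch of that naming scheme: one collection per day following the "doc20110930" pattern above, plus a helper that names the day's collection falling out of the 30-day window (the window length and helper names are assumptions for illustration):

```python
from datetime import date, timedelta

def collection_name(day: date) -> str:
    # e.g. date(2011, 9, 30) -> "doc20110930"
    return "doc" + day.strftime("%Y%m%d")

def collection_to_drop(today: date, window_days: int = 30) -> str:
    # The daily collection that falls out of the rolling window.
    return collection_name(today - timedelta(days=window_days))

print(collection_name(date(2011, 9, 30)))     # doc20110930
print(collection_to_drop(date(2011, 9, 30)))  # doc20110831
```

Dropping a whole collection is effectively free compared to deleting documents one by one, which is the usual motivation for this layout.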

The shard key is a number ranging from 1 to 31, chosen on the
assumption that the cluster will contain fewer than 31 nodes.
Pre-splitting and pre-moving chunks yields 3,500 writes/sec on average.

The performance of your standalone server is interesting. My result
is 2,110 writes/sec on average, and enabling journaling made it even
slower. Even the peak speed is less than 20k/sec. What's your hardware
configuration?

On Sep 30, 11:05 AM, Sergei Tulentsev <sergei.tulent...@gmail.com>
wrote:

Nat

Sep 29, 2011, 11:49:13 PM
to mongod...@googlegroups.com
Can you try with a more unique shard key and pre-sharding?

Sergei Tulentsev

Sep 30, 2011, 12:29:32 AM
to mongod...@googlegroups.com
My server is not a bad one :-) Two Xeons (16 cores), 48 GB RAM, 3 SATA disks in a RAID-0 array.

And just a thought: why don't you try a hash of your data as the shard key? That should give you well-distributed writes, and you won't run into problems with chunks that are unable to split (I suspect you have such problems with your current shard key).
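A sketch of the idea: MongoDB 2.0 has no built-in hashed shard keys, so the client would compute the hash itself and store it as an extra field to shard on. The choice of MD5 and of a per-document id as hash input are assumptions for illustration:

```python
import hashlib

def hashed_shard_key(doc_id: str) -> int:
    # MD5 the document's natural id and keep the first 8 hex digits
    # as an integer, so writes scatter evenly across the key space
    # instead of piling onto a handful of chunk ranges.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)

# Each document would then carry e.g. {"_id": ..., "h": hashed_shard_key(...)}
# and the collection would be sharded on that "h" field.
print([hashed_shard_key(f"doc-{i}") for i in range(5)])
```

Because the hash is deterministic, the same document always maps to the same chunk, while consecutive ids land on different chunks, which is exactly what helps parallel inserts.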

xiao mo

Sep 30, 2011, 1:21:02 AM
to mongodb-user
It's hard to find a well-distributed unique key in my app,
unfortunately.
I'm trying manual sharding; I hope it works :)

xiao mo

Sep 30, 2011, 1:31:40 AM
to mongodb-user
Your server is much more powerful than mine :)

As you mentioned, my shard key will have that problem. However, it's
not the final design; I will use a compound shard key later.
I'm still doing a feasibility study. For now I'm sure the data is well
distributed among the nodes, because the shard key evenly splits the
whole collection. I want to know whether Mongo can handle the load.

On Sep 30, 12:29 PM, Sergei Tulentsev <sergei.tulent...@gmail.com>
wrote:

Sergei Tulentsev

Sep 30, 2011, 1:36:35 AM
to mongod...@googlegroups.com
Now, what hardware do you have? :-)
Have you tried finding a bottleneck? Maybe your disks are saturated?

2011/9/30 xiao mo <mxaz...@gmail.com>

xiao mo

Sep 30, 2011, 2:03:51 AM
to mongodb-user
I suppose disk I/O is the bottleneck.

Only SCSI HDDs with no RAID.
One node has 16 GB RAM; the others have 4 GB.
Not sure about the CPUs: one node has an 8-core Intel, the others are 2-core AMDs.



On Sep 30, 1:36 PM, Sergei Tulentsev <sergei.tulent...@gmail.com>
wrote:
> Now, what hardware do *you* have? :-)
> Have you tried finding a bottleneck? Maybe your disks are saturated?
>
> 2011/9/30 xiao mo <mxazka...@gmail.com>

Sergei Tulentsev

Sep 30, 2011, 2:07:42 AM
to mongod...@googlegroups.com
Run iostat -x 2
What does it show?

2011/9/30 xiao mo <mxaz...@gmail.com>

xiao mo

Sep 30, 2011, 2:43:35 AM
to mongodb-user
Oops, my Red Hat Linux doesn't have the sysstat package installed, so
I can't use the iostat command.
I'll check it later.

Thank you for your help :)


On Sep 30, 2:07 PM, Sergei Tulentsev <sergei.tulent...@gmail.com>
wrote:
> Run iostat -x 2
> What does it show?
>
> 2011/9/30 xiao mo <mxazka...@gmail.com>