How I learned to stop worrying and love YCSB, part 1

Asya Kamsky

Mar 17, 2015, 1:57:42 PM

to mongodb-user

Hi guys,

We just published the first of several planned blog posts about
MongoDB 3.0 and various improvements in it. The first part is live
here:

http://www.mongodb.com/blog/post/performance-testing-mongodb-30-part-1-throughput-improvements-measured-ycsb

Much more to come. This is basically what I've been working on for
the last few *cough*mumbles*cough*. Let me know if you have any
questions - though many will be answered by the later posts in the
series.

Asya

--
MongoDB World is back! June 1-2 in NYC. Use code ASYA for 25% off!

s.molinari

Mar 18, 2015, 2:43:31 AM

to mongod...@googlegroups.com

Thanks Asya. An interesting read. Looks like WiredTiger is simply a great storage engine and catapults Mongo's performance considerably for the majority of use cases.

I understand the goal of the whole benchmark exercise, but what I'd personally like to see are benchmarks with a more complicated data and index landscape and much more extensive querying. I would think single-field documents are a non-existent use case. ;-)

Scott

MARK CALLAGHAN

Mar 19, 2015, 10:43:44 AM

to mongod...@googlegroups.com

Asya,

The results and the writeup are excellent.

Do you have any advice on micro-benchmark tools: write my own, use mongo-perf, etc.? Writing my own means I have to choose the programming language: use Python and get the code written fast, but maybe the client will be the bottleneck; use Java and have great docs for the client library, but worry about GC stalls as the source of lousy p99 response time; or use C/C++.

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/6bf6d19d-6289-4e5c-b73b-a227d3af4acd%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

Mark Callaghan
mdca...@gmail.com

Asya Kamsky

Mar 20, 2015, 1:33:35 AM

to mongod...@googlegroups.com

Hi Mark,

Great question.

I used to use benchRun a lot. It's a built-in function that comes with the mongo shell and lets you send a bunch of operations to the server, specifying how long the test should run and how many threads should be doing the work. It has some limited support for random number and random string generation, and of course it's all C++ as it's part of the server.

Where it's limited is (a) reporting (you get very coarse-grained metrics like ops/sec and average latency, which is not very useful since I want the 95th/99th percentiles, etc.) and (b) generating "interesting" data.
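
A minimal Python sketch of why average latency alone can mislead. The latency samples are made up for illustration, and `percentile` is a hypothetical helper using the nearest-rank method, not anything benchRun provides:

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 98 fast operations and 2 long stalls.
latencies_ms = [1.0] * 98 + [200.0] * 2

summary = {
    "avg": statistics.mean(latencies_ms),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With these numbers the average lands near 5 ms, a latency that no single operation actually saw, while the 99th percentile exposes the 200 ms stalls.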

Of course, if you want a microbenchmark, you're probably not that interested in "complex" or "interesting" data :)

Asya

Asya Kamsky

Mar 20, 2015, 1:37:03 AM

to mongod...@googlegroups.com

Hi Scott:

I will cover a lot more use cases. There are a couple of very specific reasons that I had to use a single field. I suppose I could have made it larger, but I wanted to make sure that IO wouldn't immediately become a bottleneck on the writes: if IO limits only impacted uncompressed data, that might amplify the disadvantage of MMAP, and if IO was so limited that it impacted all test configurations, it would minimize the differences between them. :)

There will be some future blog post specifically addressing some limitations of YCSB (or how it tends to get used) that will probably clarify some of the reasoning.

Asya

s.molinari

Mar 20, 2015, 9:44:34 AM

to mongod...@googlegroups.com

Thanks Asya. As always, I love it when you share your knowledge like you do. Looking forward to more blogs. :-)

Scott

Milind Shah

Jun 23, 2015, 7:24:09 PM

to mongod...@googlegroups.com

Excellent post, Asya. Do you have any more insights or numbers on benchmarks of a distributed (sharded) cluster that also uses replica sets?

I am currently planning to run the YCSB benchmark on a MongoDB cluster. I have 11 machines to run this benchmark, each with the following configuration:

RAM: 128GB

Disks: 8 disks, 7200 RPM, 1TB - in a RAID 0 array

CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz - 16 physical cores, 32 HT cores

Network: 1x10gbe

OS: CentOS 6.5

JVM: 1.7.0_67

I am thinking about deploying a 10-node sharded cluster with replica sets (1 primary, 2 secondaries) and keeping 1 node for the config servers. I am planning to run the YCSB workloads in the following order:

1) Load 2 TB of data, all unique keys.
2) Run zipfian get, read all fields
3) Run range scan
4) Run a mixed workload of 95% get and 5% update operations
5) Run a mixed workload of 50% get and 50% update operations
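
As a rough sketch, the run phases (steps 2 to 5) could be written down as YCSB CoreWorkload property files generated from Python. The property names are CoreWorkload settings; the record and operation counts are placeholders, the phase names are hypothetical, and using zipfian for every phase is an assumption, since only step 2 names a distribution:

```python
def workload_properties(read=0.0, update=0.0, scan=0.0,
                        record_count=1000, operation_count=1000):
    """Render one YCSB CoreWorkload phase as properties-file text."""
    # Sanity check for this sketch only: the operation mix should add up to 1.
    if abs(read + update + scan - 1.0) > 1e-9:
        raise ValueError("operation proportions must sum to 1")
    props = {
        "recordcount": record_count,        # placeholder, not the 2 TB load
        "operationcount": operation_count,  # placeholder
        "readallfields": "true",
        "readproportion": read,
        "updateproportion": update,
        "scanproportion": scan,
        "insertproportion": 0,
        "requestdistribution": "zipfian",   # assumed for every phase
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical phase names mapped to the operation mixes of steps 2 to 5.
phases = {
    "step2_zipfian_get": workload_properties(read=1.0),
    "step3_range_scan": workload_properties(scan=1.0),
    "step4_read_mostly": workload_properties(read=0.95, update=0.05),
    "step5_balanced": workload_properties(read=0.5, update=0.5),
}

for name, text in phases.items():
    print(f"# {name}\n{text}\n")
```

Keeping each phase in its own properties file makes it easy to rerun a single step against the same loaded data set.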

I am going to use the default 1K row size (10 fields/columns with 100 bytes of payload each), and the sharding will be based on the key, which will be in the format user+'long int'.
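
A back-of-the-envelope sizing sketch for this plan, in Python. It assumes decimal units (1 TB = 10^12 bytes), counts only the 10 x 100 byte payload per row (no keys, field names, or BSON overhead), ignores compression and indexes, and assumes the replica set members are spread evenly over the 10 data nodes:

```python
TB = 10**12
GB = 10**9

data_bytes = 2 * TB      # step 1: load 2 TB of data
row_bytes = 10 * 100     # 10 fields x 100 bytes of payload each
replicas = 3             # 1 primary + 2 secondaries per replica set
data_nodes = 10          # the 11th machine is kept for the config servers
ram_per_node_gb = 128

records = data_bytes // row_bytes             # number of documents to load
total_bytes = data_bytes * replicas           # all copies across the cluster
per_node_gb = total_bytes / data_nodes / GB   # data held on each machine
ram_ratio = per_node_gb / ram_per_node_gb     # data size relative to RAM

print(f"records to load: {records:,}")
print(f"data per node:   {per_node_gb:.0f} GB")
print(f"data / RAM:      {ram_ratio:.2f}x")
```

Under these assumptions each machine holds several times more data than it has RAM, so the results would depend heavily on disk IO and on how skewed the zipfian access pattern is.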

According to the MongoDB documentation, http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ and http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/, defining replica sets and sharding adds manual overhead during setup.

I would appreciate any guidance on the best way to set up the MongoDB cluster for this exercise and, if you have done similar testing in-house, on what kind of numbers (throughput/latency) I should expect as an outcome.

Thanks in advance.

Milind
