MongoDB in production and working great


tony tam

unread,
Apr 2, 2010, 2:44:50 PM4/2/10
to mongodb-user
A big thank you to the 10gen folks--we have moved the entire text
corpus of http://www.wordnik.com into MongoDB version 1.4 and pushed
it live this week. That's 1.2TB of data in over 5 billion records,
and query times against the corpus are now about a quarter of what
they were before the migration.

We use the Java driver and are really happy with how this has gone.
Thank you all again!

Tony

Eliot Horowitz

unread,
Apr 2, 2010, 3:16:11 PM4/2/10
to mongod...@googlegroups.com
Great!
Can we add you to http://www.mongodb.org/display/DOCS/Production+Deployments

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

tony tam

unread,
Apr 2, 2010, 3:17:29 PM4/2/10
to mongodb-user
Absolutely, thank you for asking


Ryan Angilly

unread,
Apr 2, 2010, 4:24:19 PM4/2/10
to mongod...@googlegroups.com
Hey Tony,

What did you migrate from?

Thanks!
Ryan

tony tam

unread,
Apr 2, 2010, 4:32:14 PM4/2/10
to mongodb-user
Hi Ryan, we were using MySQL 5.1 in master/slave mode for this data.

Grig Gheorghiu

unread,
Apr 2, 2010, 7:03:08 PM4/2/10
to mongod...@googlegroups.com
Tony -- congrats! Would you care to go into some detail about your architecture? Things such as how beefy your master server is, whether you used sharding (manual or auto), whether you split writes to the master and reads to the slaves, etc.

Thanks,

Grig

tony tam

unread,
Apr 2, 2010, 11:50:13 PM4/2/10
to mongodb-user
Hi Grig, I'd be happy to. I'll post something in the next couple of
days.

Tony

Andrew Kalek

unread,
Apr 3, 2010, 9:17:05 AM4/3/10
to mongodb-user
How, or with what tools, did you migrate your data?

tony tam

unread,
Apr 3, 2010, 12:26:34 PM4/3/10
to mongodb-user
Hi Andrew--migrating the data ended up being a cakewalk for our
scenario. There were two types of data structures: one was essentially
mirrored between MySQL and MongoDB, the other is highly hierarchical
and was splattered across more than a dozen MySQL tables.

Migrating the similarly structured data was easy--we already had a
data access object (DAO) written to grab the data from MySQL. It's
what our application server uses in production, so there was nothing
to rewrite. We simply wrote a new version of the DAO to access
MongoDB, with both implementing the same interface, then read with the
MySQL DAO and called save with the MongoDB DAO. It was really that
simple. Our DAO is pretty lightweight, with some logic to do batch
inserts: a background queue fills up with objects and we call
DBCollection.insert(List<DBObject>), keeping each List<DBObject> under
2M characters. This gave us an insert rate of 100,000 records per
second, which over 4.5B records is still a chunk of time (roughly 12.5
hours), but it worked well and I really can't complain--I think most
of the CPU was spent on the Java side, not in MongoDB.
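A minimal sketch of that dual-DAO pattern (the interface and class names here are hypothetical illustrations, not Wordnik's actual code): both stores implement one DAO interface, and the migration loop just reads a batch from one and saves it with the other.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the MySQL-backed and MongoDB-backed DAOs both
// implement this interface, so the migration is store-agnostic.
interface WordDao {
    // Read up to `limit` records starting at `offset`; an empty list means done.
    List<Map<String, Object>> readBatch(int offset, int limit);

    // Persist a batch (a MongoDB DAO would call DBCollection.insert(list)).
    void save(List<Map<String, Object>> records);
}

class Migrator {
    // Copy everything from `source` to `destination`, batch by batch;
    // returns the number of records migrated.
    static long migrate(WordDao source, WordDao destination, int batchSize) {
        long total = 0;
        int offset = 0;
        while (true) {
            List<Map<String, Object>> batch = source.readBatch(offset, batchSize);
            if (batch.isEmpty()) break;
            destination.save(batch);
            total += batch.size();
            offset += batch.size();
        }
        return total;
    }
}
```

Because both DAOs share the interface, the application code that was already tested against MySQL needed no changes; only a second implementation was written.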

The hierarchical data was a joy to migrate, since it let us take
advantage of the document features MongoDB supports. We loaded the
data from MySQL into our internal object structure, which happens to
be annotated so that it can be serialized for a REST web service. We
then used the Jackson JSON mapper to turn each object into a JSON
string and called JSON.parse() to create a new DBObject, and bam, it
went right into Mongo.

There are some additional details to this process, like how to query
the documents, but it really wasn't too complicated. I'm happy to
share more details if it helps you or others.

Tony

Joseph Wang

unread,
Apr 3, 2010, 12:53:08 PM4/3/10
to mongod...@googlegroups.com
Thanks for sharing such information.

To get 100,000 inserts/sec, do you run multiple insert/update threads
in Java? If so, how many connections/threads do you use? Is there
a data chunk size that would give high performance?

tony tam

unread,
Apr 3, 2010, 6:34:06 PM4/3/10
to mongodb-user
Hi Joseph, those numbers were with a single connection to the database
(one DB object) and the default connection-thread count, which I think
is 10. We ran one program to migrate the data. I tested loading a
single document at a time vs. 2M characters per insert statement--the
difference in speed between one record per insert and the maximum
insert size was about 5x, which I think is almost all TCP overhead. I
didn't tune much beyond that, since we easily beat our goal of 50k
records a second. Java was running at 100% CPU though, so it probably
could have loaded faster. During the loading we assigned unique _id
values as well.
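The size-capped batching described above can be sketched as a small buffer that flushes one insert whenever the next document would push the accumulated batch past the character cap (~2M in this thread). This is a hypothetical illustration, not Wordnik's code; the `Consumer<List<String>>` stands in for `DBCollection.insert(List<DBObject>)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a size-capped batch inserter: documents queue up
// and are flushed as a single insert before the batch exceeds `maxChars`.
class BatchInserter {
    private final int maxChars;
    private final Consumer<List<String>> insert; // stands in for the driver call
    private final List<String> buffer = new ArrayList<>();
    private int bufferedChars = 0;
    int flushes = 0;

    BatchInserter(int maxChars, Consumer<List<String>> insert) {
        this.maxChars = maxChars;
        this.insert = insert;
    }

    void add(String jsonDoc) {
        // Flush first if this document would push the batch over the cap.
        if (!buffer.isEmpty() && bufferedChars + jsonDoc.length() > maxChars) {
            flush();
        }
        buffer.add(jsonDoc);
        bufferedChars += jsonDoc.length();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        insert.accept(new ArrayList<>(buffer)); // one insert per batch
        buffer.clear();
        bufferedChars = 0;
        flushes++;
    }
}
```

The roughly 5x gap Tony saw between one-record inserts and maximal batches is why the flush happens per batch rather than per document: each round trip amortizes the TCP overhead over many records.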

We did another load into Mongo using multiple clients--I think we ran
up to 30. That loading process was a little different, though: there
was a lot of data processing going on as records were loaded. It had
other issues (not related to Mongo) due to how we generate our _id
field, so we decided to run a single thread for the mass migration.

Tony

tony tam

unread,
Apr 3, 2010, 6:45:44 PM4/3/10
to mongodb-user
Hi Grig, we use master/slave for the corpus data and will look into
replica pairs as we move the remainder of our data over to Mongo. The
master is a physical IBM blade server with two quad-core Intel CPUs,
32GB of RAM, and fibre-channel SAN storage, which is where all the
Mongo data lives. Slaves are the same server class. We have sharded
the data manually across 100 partitions and use a simple home-grown
connection-pooling layer that returns a singleton DB object for
read/write (master only), read-only (slave only), or any server
(master or slave via round-robin, with a 2x bias toward the slaves).
We're looking at bigger machines (64GB RAM) with DAS storage.
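One way the "any server, round-robin with a 2x slave bias" policy could be sketched (a hypothetical illustration, not Wordnik's pooling layer): build a schedule where the master appears once and each slave twice, then rotate through it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a biased round-robin server picker: writes go to
// the master; "any server" reads rotate over a schedule in which each
// slave appears twice per master entry (the 2x slave bias).
class ServerPool {
    private final String master;
    private final List<String> schedule = new ArrayList<>();
    private int next = 0;

    ServerPool(String master, List<String> slaves) {
        this.master = master;
        schedule.add(master);         // master appears once...
        for (String slave : slaves) { // ...each slave appears twice
            schedule.add(slave);
            schedule.add(slave);
        }
    }

    // Reads/writes that must see the latest data always go to the master.
    String readWriteServer() { return master; }

    // "Any server" reads round-robin over the biased schedule.
    synchronized String anyServer() {
        String server = schedule.get(next);
        next = (next + 1) % schedule.size();
        return server;
    }
}
```

With one master and two slaves the schedule has five entries, so over time reads land on each slave twice as often as on the master.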

I might also add, since I know people have a lot of interest in
running servers in the cloud, that the best performance I could get
out of an EC2 large instance was less than 1/5th that of this
machine. It was a very, very frustrating process trying to get the
EC2 VM to perform :(

Tony

Grig Gheorghiu

unread,
Apr 3, 2010, 6:50:19 PM4/3/10
to mongod...@googlegroups.com
Thanks, Tony, I appreciate your openness in furnishing all these
details. Great to know about your EC2 experience too.

Grig

Andrew Kalek

unread,
Apr 4, 2010, 11:53:39 AM4/4/10
to mongodb-user
Actually, the reason I ask is that I'm trying to build a Ruby gem
that migrates data for you via a translation file, so I was wondering
how you did it in case there was something I had to look out for.
My project is still really young and not yet fully running:
http://github.com/anlek/mongify

