mongoimport very slow


Billy Kimble

Sep 15, 2011, 4:04:40 PM
to mongodb-user
Hi all -- we're trying to import 3 million rows of test data into
mongo (1.8.2). Each line in the JSON file looks very similar to this:

{ "_id" : { "$oid" : "4e711ac4d413190fcc0000c2" }, "popularids" : [],
"listing_ids" : [], "email_optin" : true, "sms_age_verification" :
null, "created_at" : { "$date" : 1316035268000 }, "sms_alert" : false,
"updated_at" : { "$date" : 1316035268000 }, "categoryids" : [ 5118494,
5118936, 5118495, 5118487, 5118492, 5118924, 5118993, 5118498,
5118497, 5118521 ], "email_processed" : false, "mobile_number" : null,
"email_alert" : true, "last_name" : null, "storeid" : 2395871,
"email_age_verification" : true, "remind_time" : null, "sms_optin" :
true, "brandids" : [ -2099999884, -2099997271, -2099998575 ],
"email" : "te...@test.com", "newsletter_signup" : true, "first_name" :
null, "remind_timezone" : null, "sms_processed" : false }

We import the file via:

mongoimport -d (database) -c performance_test --file test_data.json

And we're seeing between 8 and 20 records per second for the import!

This collection has no indexes on it:

staging:PRIMARY> db.performance_test.getIndexes()
[ ]

and mongostat doesn't show anything funny going on:


insert query update delete getmore command flushes mapped vsize   res  faults locked % idx miss % qr|qw ar|aw netIn netOut conn     set repl     time
    *0    *0     *0     *0       0     1|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0   62b     1k    5 staging  SEC 19:51:37
     2     5      0      0       0       2       0  14.2g   29g  373m       0        0          0   5|1   9|1    1k     1k   29 staging    M 19:51:37

    *0    *0     *0     *0       0     3|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0  326b     1k    5 staging  SEC 19:51:38
     0     0      0      0       0       2       0  14.2g   29g  373m       0        0          0   7|1   8|1  431b     1k   29 staging    M 19:51:38

    *0    *0     *0     *0       0     1|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0   62b     1k    5 staging  SEC 19:51:39
     1     1      0      0       0       2       0  14.2g   29g  373m       0        0          0   2|1   4|1    1k     1k   29 staging    M 19:51:39

    *0    *0     *0     *0       0     3|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0  326b     1k    5 staging  SEC 19:51:40
     4     8      0      0       0       2       0  14.2g   29g  373m       0      0.1          0   4|1   5|1    3k     1k   29 staging    M 19:51:40

    *0    *0     *0     *0       0     1|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0   62b     1k    5 staging  SEC 19:51:41
     1     8      0      0       0       2       0  14.2g   29g  373m       0        0          0   6|1   8|1    1k     1k   29 staging    M 19:51:41

    *0    *0     *0     *0       0     3|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0  326b     1k    5 staging  SEC 19:51:42
     0     0      0      0       0       2       0  14.2g   29g  373m       0        0          0   7|1   8|1  194b     1k   29 staging    M 19:51:42

    *0    *0     *0     *0       0     2|0       0  14.2g 28.5g 1.07g       0        0          0   0|0   0|0  194b     1k    5 staging  SEC 19:51:43
     3     1      0      0       0       2       0  14.2g   29g  373m       0        0          0   7|1   7|1    2k     1k   29 staging    M 19:51:43


What could be causing such a dramatic slowdown? I'm used to seeing
20-50k records per second. Thanks in advance!

Billy Kimble

Sep 15, 2011, 4:28:59 PM
to mongodb-user
Here is a more readable version of the mongoimport:

https://gist.github.com/1220267

Scott Hernandez

Sep 15, 2011, 4:45:36 PM
to mongod...@googlegroups.com
Where did this json come from?

Did it come from another mongo instance?

What does iostat -xm 2 show during the import?

What server version is this?


Billy Kimble

Sep 15, 2011, 4:52:22 PM
to mongodb-user
Thanks for the quick response. In order:

1. Not sure how it was created. Another dev made it and he is out
until Monday. I assume he took one of the real documents from our
production collection and tweaked it for testing purposes.

2. Most likely, yes.

3. See below

4. Ubuntu 4.4.3

iostat output:

Linux 2.6.35.4-rscloud (staging-mongodb2)  09/15/11  _x86_64_  (4 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           3.05   0.00     1.70     0.03    0.04  95.17

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00    8.73  0.33  2.03   0.01   0.04     46.76      0.01   5.29   1.59   0.37
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00     51.42      0.00  22.07  19.95   0.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.83   0.00     2.00     0.00    0.15  83.02

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00   10.00  0.00  2.00   0.00   0.05     48.00      0.00   0.00   0.00   0.00
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.08   0.00     1.99     0.00    0.07  83.85

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00    8.50  0.00  1.50   0.00   0.04     53.33      0.01  10.00   3.33   0.50
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.92   0.00     1.99     0.00    0.08  83.01

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00    7.50  0.00  2.00   0.00   0.04     38.00      0.00   0.00   0.00   0.00
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.04   0.00     1.91     0.00    0.14  83.90

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00   11.50  0.00  3.50   0.00   0.06     34.29      0.00   0.00   0.00   0.00
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.30   0.00     2.24     0.00    0.29  83.18

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00    0.00  0.00  0.50   0.00   0.00      8.00      0.01  10.00  10.00   0.50
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.68   0.00     1.83     0.07    0.07  83.35

Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdap1     0.00    5.50  0.00  1.50   0.00   0.03     37.33      0.01   6.67   3.33   0.50
xvdap2     0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00



Billy Kimble

Sep 15, 2011, 4:55:19 PM
to mongodb-user
Sorry, #4 should be Ubuntu 10.04 LTS. Mis-pasted!

Scott Hernandez

Sep 15, 2011, 5:52:32 PM
to mongod...@googlegroups.com
Generally it is best to use mongodump/mongorestore; both utilities are
single-threaded, so it could be a client issue.

Can you split the files and try running a few in parallel?
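
For example, a rough sketch (GNU split; the chunk names and the
db/collection here are illustrative):

# split the 3M-line file into 100k-line chunks: chunk_aa, chunk_ab, ...
split -l 100000 test_data.json chunk_

# run a few imports in parallel, one mongoimport per chunk
for f in chunk_aa chunk_ab chunk_ac; do
  mongoimport -d performance -c load_test --file "$f" &
done
wait

And if the data originally came out of another mongo instance,
dump/restore avoids JSON parsing entirely:

mongodump -d performance -c performance_test -o dump/
mongorestore -d performance dump/performance/performance_test.bson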

Billy Kimble

Sep 15, 2011, 6:06:05 PM
to mongodb-user
I split the big file into files of 100k lines each, and ran one file
locally and one on staging. The local import performed like a champ,
but staging was still slow as hell:

local:

mongoimport -d performance -c load_test --drop --file foo.json
connected to: 127.0.0.1
dropping: performance.load_test
12622400/68600000  18%    18400   6133/second
39239200/68600000  57%    57200   9533/second
65170000/68600000  95%    95000  10555/second
imported 100000 objects


staging:

mongoimport -d performance -c load_test --drop --file foo.json
connected to: 127.0.0.1
dropping: performance.load_test
1646400/68600000   2%     2400    600/second
4870600/68600000   7%     7100    142/second
^C


Our staging setup is Master-Slave with an Arbiter. Could that be
causing the obscene slowdown?


Scott Hernandez

Sep 15, 2011, 6:21:00 PM
to mongod...@googlegroups.com

No. Writes do not wait for replication.
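
To illustrate the point, a minimal sketch in the shell (the collection
name here is just an example):

// a plain insert returns immediately; it does not wait for the slave
db.performance_test.insert({ probe : 1 })

// the client only blocks on replication if it explicitly asks to:
db.runCommand({ getlasterror : 1, w : 2, wtimeout : 5000 })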

I'd suggest looking at the hardware/os-config on your staging server.
There must be something there.

Can you post db.serverStatus() for the staging machine?

Billy Kimble

Sep 15, 2011, 6:45:25 PM
to mongodb-user
Sure thing :


staging:PRIMARY> db.serverStatus()
{
    "host" : "staging-mongodb2",
    "version" : "1.8.2",
    "process" : "mongod",
    "uptime" : 1002638,
    "uptimeEstimate" : 990068,
    "localTime" : ISODate("2011-09-15T22:43:31.504Z"),
    "globalLock" : {
        "totalTime" : 1002638078848,
        "lockTime" : 255674627,
        "ratio" : 0.00025500191185014854,
        "currentQueue" : {
            "total" : 0,
            "readers" : 0,
            "writers" : 0
        },
        "activeClients" : {
            "total" : 7,
            "readers" : 7,
            "writers" : 0
        }
    },
    "mem" : {
        "bits" : 64,
        "resident" : 366,
        "virtual" : 29821,
        "supported" : true,
        "mapped" : 14570
    },
    "connections" : {
        "current" : 26,
        "available" : 15974
    },
    "extra_info" : {
        "note" : "fields vary by platform",
        "heap_usage_bytes" : 53822768,
        "page_faults" : 23576
    },
    "indexCounters" : {
        "btree" : {
            "accesses" : 12294,
            "hits" : 12294,
            "misses" : 0,
            "resets" : 0,
            "missRatio" : 0
        }
    },
    "backgroundFlushing" : {
        "flushes" : 16710,
        "total_ms" : 90845,
        "average_ms" : 5.436564931178935,
        "last_ms" : 0,
        "last_finished" : ISODate("2011-09-15T22:42:57.080Z")
    },
    "cursors" : {
        "totalOpen" : 1,
        "clientCursors_size" : 1,
        "timedOut" : 1
    },
    "network" : {
        "bytesIn" : 2271718645,
        "bytesOut" : 11893085856,
        "numRequests" : 6664418
    },
    "repl" : {
        "setName" : "staging",
        "ismaster" : true,
        "secondary" : false,
        "hosts" : [
            "xxx.19:27017",
            "xxx.17:27017"
        ],
        "arbiters" : [
            "xxx.11:27017"
        ]
    },
    "opcounters" : {
        "insert" : 278888,
        "query" : 4973995,
        "update" : 18855,
        "delete" : 3941,
        "getmore" : 346287,
        "command" : 1038078
    },
    "asserts" : {
        "regular" : 2,
        "warning" : 141,
        "msg" : 0,
        "user" : 4824,
        "rollovers" : 0
    },
    "writeBacksQueued" : false,
    "dur" : {
        "commits" : 0,
        "journaledMB" : 0,
        "writeToDataFilesMB" : 0,
        "commitsInWriteLock" : 0,
        "earlyCommits" : 0,
        "timeMs" : {
            "dt" : 3007,
            "prepLogBuffer" : 0,
            "writeToJournal" : 0,
            "writeToDataFiles" : 0,
            "remapPrivateView" : 0
        }
    },
    "ok" : 1
}



Billy Kimble

Sep 15, 2011, 7:07:18 PM
to mongodb-user
I put the file on our dev environment, which is also
Master-Slave-Arbiter, and it also took a very long time to import. I
then put it on our production server, which is Master-Slave-Slave with
each server also being an arbiter, and it went through successfully
and quickly.

Something about the mongo setup on Dev/Staging is causing mongoimport
to run very, very slowly.

Riyad

Sep 16, 2011, 8:46:12 AM
to mongod...@googlegroups.com
Bill, you might try some basic I/O tests on the slow server and see if there is something odd about that machine in particular. Maybe a quick 'hdparm -t /dev/XXX' would do the trick?

I also didn't see who your host is; is all of this on AWS? What disk setup is hosting the data files for mongo? Does the Dev/Staging server have a different flushing/journaling setup than the faster servers?

All things to check I suppose.
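
A minimal sketch of such checks, assuming Rackspace-style xvda device
names and the default dbpath (adjust both to your machine):

sudo hdparm -t /dev/xvda1     # buffered sequential read throughput
iostat -xm 2                  # per-device utilization while an import runs

# direct-I/O write test against the data volume
dd if=/dev/zero of=/var/lib/mongodb/ddtest bs=1M count=512 oflag=direct
rm /var/lib/mongodb/ddtest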

Billy Kimble

Sep 16, 2011, 3:02:21 PM
to mongodb-user
Riyad -

Thanks for the reply. The hardware should be nearly identical between
environments. It's all Rackspace cloud. Our staging and Prod are sized
identically (16gig servers - the top end) and our dev is a 2gig
server. Here is the output for hdparm:

Prod:  Timing buffered disk reads: 686 MB in 3.01 seconds = 228.23 MB/sec
Stage: Timing buffered disk reads: 704 MB in 3.01 seconds = 234.09 MB/sec
Dev:   Timing buffered disk reads: 820 MB in 3.01 seconds = 272.30 MB/sec

Our mongo configurations are identical between the servers except for
the aforementioned arbiter being set up in dev/staging. All
mongodb.conf files look like:

# mongodb.conf
# http://www.mongodb.org/display/DOCS/File+Based+Configuration

# Where to store the data.

# Note: if you run mongodb as a non-root user (recommended) you may
# need to create and set permissions for this directory manually,
# e.g., if the parent directory isn't mutable by the mongodb user.
dbpath = /var/lib/mongodb

# Where to log
logpath = /var/log/mongodb/mongodb.log

# Whether the log file will be appended (true) or over-written (false)
logappend = true

# In replica set configuration, specify the name of the replica set
replSet = (environment name here)

# Turn on simple rest API
rest = true

# Enables periodic logging of CPU utilization and I/O wait
cpu = true

# Turn on/off security. Off is currently the default
# noauth = true
# auth = true

# Verbose logging output (same as v = true)
verbose = false

# Whether writes will be journaled or not. This feature enables fast recovery
# from crashes

What is odd, though, is that I thought each of the prod servers was
also an arbiter, but the repl section of db.serverStatus() does not
reflect that:

staging:

"repl" : {
"setName" : "staging", "ismaster" : true, "secondary" :
false,
"hosts" : [
"xxx.19:27017",
"xxx.17:27017"
],
"arbiters" : [
"xxx.11:27017"
]
}


dev:

"repl" : {
"setName" : "dev", "ismaster" : true, "secondary" : false,
"hosts" : [
"xxx.145:27017",
"xxx.240:27017"
],
"arbiters" : [
"xxx.27:27017"
]
}

production:

"repl" : {
"setName" : "production", "ismaster" : true, "secondary" :
false,
"hosts" : [
"xxx.162",
"xxx.163",
"xxx.168"
]
}




Billy Kimble

Sep 16, 2011, 5:34:55 PM
to mongodb-user
So I dropped the arbiter from the dev replica set configuration, and
the import ran fast once again -- score!
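
For anyone following along, dropping the arbiter from the primary
looks roughly like this (a sketch, not necessarily the exact steps
used here):

cfg = rs.conf()
// keep only the data-bearing members
cfg.members = cfg.members.filter(function (m) { return !m.arbiterOnly })
cfg.version++   // a reconfig must carry a higher version number
rs.reconfig(cfg)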

I tried putting the arbiter back via rs.addArb() from the master, and
I got:

dev:PRIMARY> rs.addArb("xxx.27:27017")
Fri Sep 16 21:22:15 DBClientCursor::init call() failed
Fri Sep 16 21:22:15 query failed : admin.$cmd { replSetReconfig: { _id: "dev", version: 8, members: [ { _id: 0, host: "xxx.240:27017" }, { _id: 1, host: "xxx.145:27017" }, { _id: 2.0, host: "xxx.27:27017", arbiterOnly: true } ] } } to: 127.0.0.1
shell got exception during reconfig: Error: error doing query: failed
in some circumstances, the primary steps down and closes connections on a reconfig
null
Fri Sep 16 21:22:15 trying reconnect to 127.0.0.1
Fri Sep 16 21:22:15 reconnect 127.0.0.1 ok

Though when I do rs.status(), it shows the third server added.

When I go to the arbiter and do:

> rs.status()

I get:

{
    "startupStatus" : 2,
    "errmsg" : "replSet error loading set config (BADCONFIG)",
    "ok" : 0
}

I've quadruple-checked my configuration file and everything is
identical to the master/slave node, except that it has fork=true and
rest=false in it.

It seems like something is amiss with the arbiter. Thanks for your
help, all -- I'm going to turn this over to our sysadmin.


Scott Hernandez

Sep 21, 2011, 5:43:20 PM
to mongod...@googlegroups.com
Did you get your configuration fixed? If not, what does rs.conf() show
on all replicas?

The arbiter has no role in reads or writes and is just a voting member
of the replica set.
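
If it is still broken, comparing the config on each reachable member
is a quick sanity check (a sketch):

rs.conf()     // setName, member list, and version should agree everywhere
rs.status()   // each member's view of the set

// if the arbiter stays stuck in BADCONFIG, one common remedy on 1.8
// (an assumption here, not something established in this thread) is to
// stop it, clear its small dbpath, and restart it so it re-fetches the
// set config from the other members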

