mongoimport large data sets


J Singh

Aug 18, 2012, 12:50:25 PM
to mongod...@googlegroups.com
I have to import a CSV file containing a couple of million rows. OK, it's large but not that large.

It starts out well enough but slows to a crawl as it progresses. Is there a different way to import a data set of this size?

Currently I am using:
mongoimport -d mydb -c myColl --file abc.csv --headerline --type csv
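
One way to see where the slowdown sets in, sketched here as an assumption rather than something tried in this thread, is to split the CSV into smaller files that each repeat the header row, then import and time them one at a time with the same command. The function name `split_csv` and the `chunk_0000.csv` naming are made up for illustration; only the Python standard library is used.

```python
import csv
import itertools
import os


def split_csv(path, rows_per_chunk, out_dir):
    """Split a CSV file into chunk files that each repeat the header row.

    Returns the list of chunk file paths in order. Hypothetical helper,
    not part of mongoimport or any MongoDB tool.
    """
    chunk_paths = []
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # first row holds the field names
        for index in itertools.count():
            # Pull at most rows_per_chunk rows without loading the whole file.
            rows = list(itertools.islice(reader, rows_per_chunk))
            if not rows:
                break
            chunk_path = os.path.join(out_dir, "chunk_%04d.csv" % index)
            with open(chunk_path, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerows(rows)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk can then be loaded with the command above (for example `--file chunk_0000.csv`, still with `--headerline --type csv`), and comparing the wall-clock time per chunk shows whether later chunks really take longer than earlier ones. Note that the chunks are re-written by Python's `csv` module, so quoting may differ slightly from the original file.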

Thanks.

markh

Aug 20, 2012, 5:53:28 AM
to mongod...@googlegroups.com
Hi,

Where did the file come from, another mongo instance or another db?

What server version is this?

If you're running Linux, what does "iostat -xm 2" show during the import?

It's generally better to use "mongorestore" but I appreciate that that's not possible with .csv files.

Mark

J Singh

Aug 20, 2012, 6:16:12 AM
to mongod...@googlegroups.com
MongoDB shell version: 2.0.7, running on Ubuntu 11.04.

The file was part of data I am analyzing. I don't have access to the original source, only the extracted CSV files. I know the server is under-powered, but I'm still curious about the slowing-down behavior as the import proceeds. The data is on the mounted device xvdf.

iostat, run while the import was running, shows this (excerpted):
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          46.22    0.00   12.89   20.89   18.67    1.33

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00   35.56  111.11     1.36     4.63    83.68     5.71   38.91    9.75   48.24   1.42  20.89

Thanks in advance, Mark.

markh

Aug 20, 2012, 7:54:22 AM
to mongod...@googlegroups.com
Hi,

Looking at the "avg-cpu" row, the CPU is pretty much pegged, sitting at 1.33% idle, and the device overall seems to be heavily loaded.
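
To make the reading of that row concrete, here is a small sketch that breaks the posted avg-cpu figures into categories. The numbers are copied from the iostat excerpt in this thread; the grouping into "working", "waiting" and "stolen" is only an illustrative way to look at them. %steal is the time the virtual CPU spent waiting on the hypervisor, which is worth noticing if this is a virtual machine (an assumption suggested by the xvd* device names).

```python
# Values copied from the avg-cpu row of the iostat excerpt in this thread.
columns = ["%user", "%nice", "%system", "%iowait", "%steal", "%idle"]
values = [46.22, 0.00, 12.89, 20.89, 18.67, 1.33]
cpu = dict(zip(columns, values))

# Time the CPU spent actually executing code for this guest.
working = round(cpu["%user"] + cpu["%nice"] + cpu["%system"], 2)
# Time spent idle while waiting for outstanding disk I/O.
waiting = cpu["%iowait"]
# Time the virtual CPU was ready to run but the hypervisor ran something else.
stolen = cpu["%steal"]

print("working: %.2f%%" % working)
print("waiting on I/O: %.2f%%" % waiting)
print("stolen by hypervisor: %.2f%%" % stolen)
print("idle: %.2f%%" % cpu["%idle"])
```

Read this way, roughly three fifths of the time is real work, and most of the rest is split between waiting on the disk and time taken by the hypervisor, which leaves almost nothing idle.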

Do you have any indexes? 

It's possible that pre-allocation is also having an effect, as each new data file is larger than the last until the file size hits 2 GB. The files are filled with zero bytes, and that initialisation can have an effect, especially on a lower-spec system.
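
As a rough illustration of how much zero-filling that implies, the sketch below lists the data file sizes needed to hold a given amount of data. It assumes the first file is 64 MB and that each subsequent file doubles in size up to the 2 GB cap; the 64 MB starting size is not stated in this thread and is an assumption about the default for this server version. It also ignores any extra file the server may allocate ahead of need. The function name is made up for the example.

```python
def preallocated_file_sizes_mb(total_data_mb, first_mb=64, cap_mb=2048):
    """List the data file sizes (in MB) allocated to hold total_data_mb.

    Assumes each file doubles in size until it reaches cap_mb, after
    which every further file is cap_mb. Illustrative model only.
    """
    sizes = []
    size = first_mb
    allocated = 0
    while allocated < total_data_mb:
        sizes.append(size)
        allocated += size
        size = min(size * 2, cap_mb)
    return sizes


if __name__ == "__main__":
    sizes = preallocated_file_sizes_mb(5000)
    print(sizes)
    print("total zero-filled: %d MB" % sum(sizes))
```

Under these assumptions the early files are small and cheap to initialise, while later in the import each new file means writing 1 GB or 2 GB of zeros, which would fit an import that starts quickly and then stalls at intervals.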

Thanks

Mark

J Singh

Aug 20, 2012, 9:08:11 AM
to mongod...@googlegroups.com
Good observations. No indexes except any that may be getting generated automatically during mongoimport.

So you're saying the slowdown during import is probably due to the pre-allocation and zero-filling of the data files? Makes sense. But since that is already done, hopefully things will be better going forward. Also, we plan to use a faster machine when this project really gets going.

There doesn't seem to be a way to pre-allocate a bunch of disk space before one starts an import. Might be a good practice for us to follow if such a way existed.

Thanks again.