Steps to import 1 TB of JSON data into MongoDB


KR

Aug 1, 2014, 6:31:08 AM
to mongod...@googlegroups.com
Hi,

I have 456 MB of data in JSON format (validated with JSONLint). I tried importing it with mongoimport from the command line, but even after 15 hours of running it had not finished:


mongoimport -d fastqdb -c seqcol --file /Users/loginname/FASTQ.json --jsonArray
connected to: 127.0.0.1
2014-07-29T18:06:54.311+0200        Progress: 31598/432568845   0%
2014-07-29T18:06:54.311+0200            200 66/second
2014-07-29T18:06:58.207+0200        Progress: 78998/432568845   0%
2014-07-29T18:06:58.208+0200            500 71/second
... (many similar progress lines omitted) ...
2014-07-30T13:42:32.027+0200        Progress: 138486966/432568845   32%
2014-07-30T13:42:32.027+0200            860200  12/second
2014-07-30T13:42:36.004+0200        Progress: 138534798/432568845   32%
2014-07-30T13:42:36.004+0200            860500  12/second
Details of my PC: 4 GB 1067 MHz DDR3 memory; Intel Core 2 Duo processor, 2.66 GHz.
This is my first interaction with MongoDB.

My questions are: what are the exact steps I should follow, and why is it taking so long to import the data into MongoDB?

Tyler Brock

Aug 1, 2014, 9:53:10 AM
to mongod...@googlegroups.com
Have you created any indexes on that collection prior to running mongoimport? Having many indexes can significantly slow things down during data loads.
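For reference, a quick way to check from the command line (a sketch, using the database and collection names from your mongoimport command; adjust them to match your setup):

    # List all indexes defined on the target collection.
    mongo fastqdb --eval 'printjson(db.seqcol.getIndexes())'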

Tim Hawkins

Aug 1, 2014, 9:55:36 AM
to mongod...@googlegroups.com

If you have indexes on the collection, remove them before loading the data. Maintaining a unique index while importing data of that size can eat a lot of memory: with 1 TB of data and only 4 GB of RAM, touching any significant number of index pages during inserts will exhaust memory very quickly.

Once the data is imported, add the indexes back in.
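A sketch of that workflow from the command line (the index key below is hypothetical; substitute whatever fields you actually query on):

    # Drop every index except the built-in _id index before the bulk load.
    mongo fastqdb --eval 'db.seqcol.dropIndexes()'

    # ... run mongoimport ...

    # Rebuild indexes once the import has finished (hypothetical key).
    mongo fastqdb --eval 'db.seqcol.ensureIndex({sequence_id: 1})'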

Asya Kamsky

Aug 1, 2014, 6:19:01 PM
to mongodb-user
You are importing with mongoimport, which is a single-threaded process.

So some of the delay is simply that the import is not taking advantage of the fact that mongod can handle thousands of connections and operations in parallel.
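One common workaround, sketched below, is to split the input into several files and run multiple mongoimport processes at once. This assumes the data has first been converted to one JSON document per line (see the note on --jsonArray below); the chunk size is arbitrary:

    # Split a newline-delimited JSON file into 500,000-line chunks.
    split -l 500000 FASTQ.ndjson part_

    # Import the chunks in parallel, one mongoimport process per chunk.
    for f in part_*; do
        mongoimport -d fastqdb -c seqcol --file "$f" &
    done
    wait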

It looks like your documents may be quite tiny: you say you have 456 MB of data, and at 32% of the file mongoimport has already inserted about 860,500 documents, which works out to only a few hundred bytes per document. Are you sure 456 MB is the size of the import? Your subject line says 1 TB. Also, input passed with --jsonArray is limited to 16 MB (MongoDB cannot parse more than that as a single document).

From mongoimport --help:
  --jsonArray                           load a json array, not one item per
                                              line. Currently limited to 16MB.
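One way around that limit, sketched here, is to convert the array into newline-delimited JSON (one document per line) and drop --jsonArray entirely, so mongoimport can stream the file. This uses the jq tool, which may or may not be installed on your machine; note that jq reads the whole input into memory, which is workable for a few hundred MB but not for 1 TB:

    # Turn a top-level JSON array into one document per line (NDJSON).
    jq -c '.[]' /Users/loginname/FASTQ.json > /Users/loginname/FASTQ.ndjson

    # Without --jsonArray, mongoimport reads one document per line.
    mongoimport -d fastqdb -c seqcol --file /Users/loginname/FASTQ.ndjson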

Can you double-check exactly what is in the file you are loading and the exact command line you are running? Please also include a sample document, your MongoDB version, and your OS and file-system details.

I would also check the output of mongostat and the disk stats on your system - I suspect you are limited by page faulting, or worse (check iostat as well as mongostat).
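For example (the 5-second interval is arbitrary; on OS X the iostat interval flag is -w rather than -x):

    # Print server statistics every 5 seconds while the import runs.
    mongostat 5

    # Watch disk utilization alongside it (Linux form shown).
    iostat -x 5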

Asya



