Steps to import 1 TB of JSON data into MongoDB


KR

Aug 1, 2014, 6:31:08 AM
to mongod...@googlegroups.com
Hi,

I have 456 MB of data in JSON format (validated with JSONLint). I tried importing it with mongoimport from the command line, but even after 15 hours of running it had not finished:


mongoimport -d fastqdb -c seqcol --file /Users/loginname/FASTQ.json --jsonArray
connected to: 127.0.0.1
2014-07-29T18:06:54.311+0200        Progress: 31598/432568845   0%
2014-07-29T18:06:54.311+0200            200 66/second
2014-07-29T18:06:58.207+0200        Progress: 78998/432568845   0%
2014-07-29T18:06:58.208+0200            500 71/second
... (many similar progress lines omitted) ...
2014-07-30T13:42:32.027+0200        Progress: 138486966/432568845   32%
2014-07-30T13:42:32.027+0200            860200  12/second
2014-07-30T13:42:36.004+0200        Progress: 138534798/432568845   32%
2014-07-30T13:42:36.004+0200            860500  12/second
Details of my PC: 4 GB 1067 MHz DDR3 memory; Intel Core 2 Duo processor, 2.66 GHz.
This is my first interaction with MongoDB.

My questions are: what are the exact steps I should follow, and why is it taking so long to import the data into MongoDB?

Tyler Brock

Aug 1, 2014, 9:53:10 AM
to mongod...@googlegroups.com
Have you created any indexes on that collection prior to running mongoimport? Having many indexes can significantly slow things down during data loads.
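For reference, a quick way to check from the command line (a sketch, using the database and collection names from your mongoimport command; adjust them to match your setup):

    # List all indexes defined on the target collection.
    mongo fastqdb --eval 'printjson(db.seqcol.getIndexes())'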

Tim Hawkins

Aug 1, 2014, 9:55:36 AM
to mongod...@googlegroups.com

If you have indexes on the collection, remove them before loading the data. Maintaining a unique index while importing data of that size can eat a lot of memory: with 1 TB of data and only 4 GB of RAM, touching any significant number of index pages during inserts will exhaust memory very quickly.

Once the data is imported, add the indexes back in.
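A sketch of that workflow from the command line (the index key below is hypothetical; substitute whatever fields you actually query on):

    # Drop every index except the built-in _id index before the bulk load.
    mongo fastqdb --eval 'db.seqcol.dropIndexes()'

    # ... run mongoimport ...

    # Rebuild indexes once the import has finished (hypothetical key).
    mongo fastqdb --eval 'db.seqcol.ensureIndex({sequence_id: 1})'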

Asya Kamsky

Aug 1, 2014, 6:19:01 PM
to mongodb-user
You are importing with mongoimport, which is a single-threaded process.

So some of the delay is simply that the import is not taking advantage of the fact that mongod can handle thousands of connections and operations in parallel.
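One common workaround, sketched below, is to split the input into several files and run multiple mongoimport processes at once. This assumes the data has first been converted to one JSON document per line (see the note on --jsonArray below); the chunk size is arbitrary:

    # Split a newline-delimited JSON file into 500,000-line chunks.
    split -l 500000 FASTQ.ndjson part_

    # Import the chunks in parallel, one mongoimport process per chunk.
    for f in part_*; do
        mongoimport -d fastqdb -c seqcol --file "$f" &
    done
    wait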

It looks like your documents may be quite tiny: you say you have 456 MB of data, and at 32% of the file mongoimport has already inserted about 860,500 documents, which works out to only a few hundred bytes per document. Are you sure 456 MB is the size of the import? Your subject line says 1 TB. Also, input passed with --jsonArray is limited to 16 MB (MongoDB cannot parse more than that as a single document).

From mongoimport --help:
  --jsonArray                           load a json array, not one item per
                                              line. Currently limited to 16MB.
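One way around that limit, sketched here, is to convert the array into newline-delimited JSON (one document per line) and drop --jsonArray entirely, so mongoimport can stream the file. This uses the jq tool, which may or may not be installed on your machine; note that jq reads the whole input into memory, which is workable for a few hundred MB but not for 1 TB:

    # Turn a top-level JSON array into one document per line (NDJSON).
    jq -c '.[]' /Users/loginname/FASTQ.json > /Users/loginname/FASTQ.ndjson

    # Without --jsonArray, mongoimport reads one document per line.
    mongoimport -d fastqdb -c seqcol --file /Users/loginname/FASTQ.ndjson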

Can you double-check exactly what is in the file you are loading and the exact command line you are running? Please also include a sample document, your MongoDB version, and your OS and file-system details.

I would also check the output of mongostat and the disk stats on your system - I suspect you are limited by page faulting, or worse (check iostat as well as mongostat).
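For example (the 5-second interval is arbitrary; on OS X the iostat interval flag is -w rather than -x):

    # Print server statistics every 5 seconds while the import runs.
    mongostat 5

    # Watch disk utilization alongside it (Linux form shown).
    iostat -x 5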

Asya



