Bulk loading of XML files into MongoDB as JSON


Shashidhar Rao

Jan 11, 2015, 5:19:23 AM
to mongod...@googlegroups.com
Hi,

Can someone suggest an open-source tool that can convert 1 terabyte of XML data to JSON and load it into MongoDB?

I have been trying to find an answer, but I haven't found any useful links.

Please help; any suggestions will be appreciated.

Regards
Shashi

Rob Moore

Jan 11, 2015, 2:01:56 PM
to mongod...@googlegroups.com

Shashidhar,

I don't know of any standard tools to import XML into MongoDB. I think the main reason is that there are loads of little decisions you have to make when converting XML to JSON:

* How do you encode the root document element? E.g., is <foo><bar a="a"/></foo> encoded as { foo : { bar : { a : "a" } } } or { bar : { a : "a" } }?
* Do you try to coerce the XML string values into integers, longs, dates, or binary? Do you coerce some? None? All?
* What do you do with tags that look like <tag>value</tag>? Does that become a subdocument or a field?
* What do you do with repeated tag names?

That is just a start...
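As one illustration of how those decisions might be pinned down, here is a minimal Java sketch that uses only the JDK's DOM parser. It is not Rob's utility: the class name `XmlToMap`, the `#text` key, and every policy choice listed in its header comment are assumptions made for this example. All values stay as strings, so it sidesteps the coercion question entirely.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * One possible set of answers to the questions above (a sketch, not the only valid policy):
 *  - the root element is kept as the single top-level key;
 *  - attributes become string fields, with no type coercion at all;
 *  - an element holding only text, e.g. <tag>value</tag>, becomes a plain string field;
 *  - text mixed with attributes or children is kept under the hypothetical key "#text";
 *  - repeated tag names under one parent are collected into a List, in document order.
 */
final class XmlToMap {

    static Map<String, Object> convert(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put(root.getTagName(), elementValue(root));
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Returns either a String (text-only element) or a Map of fields.
    private static Object elementValue(Element element) {
        Map<String, Object> fields = new LinkedHashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            fields.put(attr.getNodeName(), attr.getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                addField(fields, child.getNodeName(), elementValue((Element) child));
            } else if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        String trimmed = text.toString().trim();
        if (fields.isEmpty()) {
            return trimmed; // <tag>value</tag> becomes "value"
        }
        if (!trimmed.isEmpty()) {
            fields.put("#text", trimmed);
        }
        return fields;
    }

    // The first occurrence of a name is stored directly; a second one turns the field into a List.
    @SuppressWarnings("unchecked")
    private static void addField(Map<String, Object> fields, String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> list = new ArrayList<>();
            list.add(existing);
            list.add(value);
            fields.put(name, list);
        }
    }
}
```

With this policy, the XML <foo><bar a="a"/></foo> converts to a map that prints as `{foo={bar={a=a}}}`, the first of the two encodings in the list above; returning `elementValue(root)` directly instead would drop the root and give the second. The nested `Map` could then be wrapped in the Java driver's `org.bson.Document`, which has a `Map` constructor. Note that DOM parsing holds a whole file in memory, so this assumes the terabyte is split into many modestly sized files; a single huge file would need a streaming parser such as StAX.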

I did write a little utility to do the conversion in Java. Feel free to use it as a starting point for your own implementation. The test files I used are here

I wrote the code very quickly, so standard disclaimers apply.

It uses a number of threads to read and insert the documents, and it defers the evaluation of the inserts, so if you give the application enough threads it should be able to saturate either the disk (for reading the files) or MongoDB's ingest capability. You can adjust the number of threads by changing the number of connections; it creates one XML reading thread per connection. Here is an example URI with 5 connections:

  mongodb://localhost:27017/db.collection?maxConnectionCount=5
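The threading model described above (one reader thread per connection, with the connection count taken from the URI) can be sketched roughly as follows. This is not Rob's code: `ParallelLoader`, `DocumentSink`, the striping of inputs, and the batch size are hypothetical, the inputs are in-memory XML strings standing in for files, and `DocumentSink` is a placeholder for whatever batch insert your driver provides. Unlike the utility described, this sketch does not defer insert results; each worker simply waits for its own batch insert to return.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

final class ParallelLoader {

    /** Hypothetical stand-in for a driver's batch insert; not a real MongoDB API. */
    interface DocumentSink {
        void insertBatch(List<Map<String, Object>> batch);
    }

    /** Reads the maxConnectionCount query parameter from the URI, defaulting to 1. */
    static int connectionCount(String uri) {
        String query = URI.create(uri).getQuery();
        if (query != null) {
            for (String param : query.split("&")) {
                String[] kv = param.split("=", 2);
                if (kv.length == 2 && kv[0].equals("maxConnectionCount")) {
                    return Math.max(1, Integer.parseInt(kv[1]));
                }
            }
        }
        return 1;
    }

    /** Starts one reader/inserter thread per connection; returns the number of documents inserted. */
    static long load(String uri, List<String> inputs,
                     Function<String, Map<String, Object>> converter,
                     DocumentSink sink, int batchSize) {
        int threads = connectionCount(uri);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong inserted = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            futures.add(pool.submit(() -> {
                List<Map<String, Object>> batch = new ArrayList<>();
                // Stripe the inputs: worker k handles items k, k + threads, k + 2 * threads, ...
                for (int i = worker; i < inputs.size(); i += threads) {
                    batch.add(converter.apply(inputs.get(i)));
                    if (batch.size() == batchSize) {
                        sink.insertBatch(batch);
                        inserted.addAndGet(batch.size());
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) { // flush the final partial batch
                    sink.insertBatch(batch);
                    inserted.addAndGet(batch.size());
                }
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure from a worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return inserted.get();
    }
}
```

With the example URI above, `connectionCount` returns 5, so five workers share the inputs. Striping avoids a shared work queue, but if file sizes vary widely, a queue that workers pull from would balance the load better.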

Let me know if you have any questions.

Rob.

Maisnam Ns

Jan 12, 2015, 1:38:43 PM
to mongod...@googlegroups.com
Hi Rob,
Thanks so much for your reply; it is indeed a good place to start.
I appreciate your help.

Regards
Shashi