Shashidhar,
I don't know of any standard tools to import XML into MongoDB. I think the main reason for that is there are loads of little decisions that you have to make when converting XML to JSON:
* How do you encode the root document element? e.g., is <foo><bar a="a"/></foo> encoded as { foo : { bar : { a : "a" } } } or { bar : { a : "a" } }?
* Do you try to coerce the XML string values into integers, longs, dates, binary? Do you coerce some? None? All?
* What to do about tags that look like <tag>value</tag>? Does that become a sub document or a field?
* What do you do with repeated tag names?
That is just a start...
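To make those choices concrete, here is a minimal sketch of one possible set of answers, using only the JDK's DOM parser. It is not the utility mentioned in this post, and the class name XmlToMap and the "#text" key are made up for illustration. It assumes: the root element is kept as the outer key, attributes become fields, nothing is coerced (every value stays a string), <tag>value</tag> becomes a plain field, and repeated tag names become a list.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

// Hypothetical helper: converts XML into nested maps that mirror a JSON document.
final class XmlToMap {

    /** Parses the XML and returns a map keyed by the root element's name. */
    static Map<String, Object> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put(root.getTagName(), convert(root)); // choice: keep the root element
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns a String for text-only elements, otherwise a Map of fields. */
    static Object convert(Element element) {
        Map<String, Object> fields = new LinkedHashMap<>();

        // Choice: attributes become ordinary fields.
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            fields.put(attr.getNodeName(), attr.getNodeValue());
        }

        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                add(fields, child.getNodeName(), convert((Element) child));
            } else if (child instanceof Text) { // covers CDATA sections too
                text.append(child.getNodeValue());
            }
        }

        String value = text.toString().trim();
        if (fields.isEmpty()) {
            return value; // choice: <tag>value</tag> is a field, not a sub document
        }
        if (!value.isEmpty()) {
            fields.put("#text", value); // choice: mixed content goes under "#text"
        }
        return fields;
    }

    /** Choice: a repeated tag name turns the field into a list of values. */
    @SuppressWarnings("unchecked")
    static void add(Map<String, Object> fields, String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> list = new ArrayList<>();
            list.add(existing);
            list.add(value);
            fields.put(name, list);
        }
    }
}
```

With these choices, XmlToMap.parse("<foo><bar a=\"a\"/></foo>") yields a map that prints as {foo={bar={a=a}}}. Making the other choice at any step (dropping the root, coercing numbers, and so on) means changing the line marked "choice" for that step.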
I did write a little utility to do the conversion in Java. Feel free to use it as a starting point for your own implementation. The test files I used are here.
I wrote the code very quickly so standard disclaimers apply.
It uses a number of threads to read and insert the documents, and it defers the evaluation of the inserts, so if you give the application enough threads it should be able to saturate either the disk (for reading the files) or MongoDB's ingest capability. You can adjust the number of threads by changing the number of connections (it creates one XML reading thread per connection). Here is an example URI with 5 connections:
mongodb://localhost:27017/db.collection?maxConnectionCount=5
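The general shape of "one reader thread per connection, with deferred inserts" can be sketched with plain JDK concurrency classes. This is only an illustration of the pattern, not the utility's actual code: ParallelLoader and insertAsync are hypothetical names, and insertAsync stands in for whatever asynchronous insert call your driver provides.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: a fixed pool of reader threads that fire off inserts
// without waiting for each reply, then waits for all replies at the end.
final class ParallelLoader {

    /**
     * Submits every document to a pool of `connections` reader threads.
     * Each reader calls insertAsync (a stand-in for a driver's async insert)
     * and moves on immediately. Returns the number of inserts issued.
     */
    static int load(List<String> documents, int connections,
                    Function<String, CompletableFuture<Void>> insertAsync) {
        ExecutorService readers = Executors.newFixedThreadPool(connections);
        Queue<CompletableFuture<Void>> pending = new ConcurrentLinkedQueue<>();

        for (String document : documents) {
            // The reader does not block on the insert; it only records the future.
            readers.execute(() -> pending.add(insertAsync.apply(document)));
        }

        readers.shutdown();
        try {
            if (!readers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Only now evaluate the deferred results; join() rethrows any failure.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return pending.size();
    }
}
```

In a real loader each task would read and convert one XML file before calling the insert, so the pool size is what you would tune until either the disk or the server becomes the bottleneck.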
Let me know if you have any questions.
Rob.