how to insert XML into MongoDB ...?

Gregor Stich

Feb 18, 2010, 5:30:19 PM
to mongodb-user
Hi,

for one of our next projects I plan to use MongoDB, but there are some
questions left which I hope you can answer.
Some major requirements and key aspects of the before-mentioned
project are:

1 - We have lots of documents in *XML* style, following a DTD (see
details below). We would like to use a document-oriented DB like Mongo, as the
XML files are not suitable for storage in an old-fashioned RDBMS.
2 - Lots of documents means: about 100,000, each about 5-10 KB.
3 - We must be able to search the content using the DB's API.
4 - Everything will be done in Java (or Groovy). Fortunately we know
that there are JVM drivers :)

Most important to me is the following question:
==================================
How to insert XML content to MongoDB?
On http://www.mongodb.org/display/DOCS/Inserting it is written that
it's possible, but frankly I haven't seen any example using
XML, only JSON.
Could you give me an example of how to do that? (Preferably in Java :-))

Now to the XML's structure:
=====================
It's a recursive schema simply looking like this:
<record>
<item name="X"><value>42</value></item>
<item name="y"><value><item name="z"><value>inner node value</value></item></value></item>
</record>

So, every item's value can have another item, and so on.
We are able to convert the XML into the following form:
<record>
<X><value>42</value></X>
<y><value><z><value>inner node value</value></z></value></y>
</record>

... thus we make the function of each item clearer and will be able to
generate named attributes (instead of thousands of items).

Is this structure above suitable to be inserted into MongoDB?
I'd like to be able to query the DB like MongoDB.mydb.find({y: {z:
"inner node value"}});


Finally, there will be embedded XML content in the XML (CDATA stuff).
Can this cause any trouble for MongoDB?

Thanks!!
Greg

Kyle Banker

Feb 18, 2010, 6:16:15 PM
to mongod...@googlegroups.com
Which driver would you be using? If you have a library that can convert XML into whichever native type converts to BSON, then you could conceivably store the data, but not as XML.

The doc page you mentioned doesn't say that you can insert XML; it just provides XML as an example of a document format.



Daniel Friesen

Feb 18, 2010, 8:09:00 PM
to mongod...@googlegroups.com
Just for the record, there are actual xml document based databases out
there as well.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Mathias Stearn

Feb 18, 2010, 8:50:53 PM
to mongod...@googlegroups.com
If that is your XML schema, then it is rather simple to JSON/BSON-ize it.

{
x: 42,
y: {z: "inner node value"}
}
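
A minimal Java sketch of that conversion, using only the JDK's DOM parser. It assumes the recursive `<item name="..."><value>...</value></item>` layout from the original question, with every `<value>` properly closed; the class and method names (`RecordToMap`, `itemsToMap`, `valueOf`, `parse`) are made up for illustration. Leaf values stay strings here, so `42` arrives as `"42"` unless the application converts it. The resulting nested map is the kind of structure a Java driver can turn into a BSON document (for instance by wrapping it in the driver's document type, depending on the driver version); that step is not shown because it needs the driver on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class RecordToMap {

    // Each child <item name="..."> of the given element becomes one map entry.
    public static Map<String, Object> itemsToMap(Element parent) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "item".equals(n.getNodeName())) {
                Element item = (Element) n;
                result.put(item.getAttribute("name"), valueOf(item));
            }
        }
        return result;
    }

    // An item's <value> holds either nested items (sub-document) or plain text.
    public static Object valueOf(Element item) {
        for (Node n = item.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "value".equals(n.getNodeName())) {
                Map<String, Object> nested = itemsToMap((Element) n);
                return nested.isEmpty() ? n.getTextContent() : nested;
            }
        }
        return null; // item without a <value> child
    }

    public static Map<String, Object> parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return itemsToMap(root);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record>"
                + "<item name=\"X\"><value>42</value></item>"
                + "<item name=\"y\"><value><item name=\"z\">"
                + "<value>inner node value</value></item></value></item>"
                + "</record>";
        // prints {X=42, y={z=inner node value}}
        System.out.println(parse(xml));
    }
}
```

Using a `LinkedHashMap` keeps the fields in document order, which makes the stored document easier to compare with the XML it came from.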


Mitch Pirtle

Feb 18, 2010, 9:49:01 PM
to mongod...@googlegroups.com

Look around for xml2json converters, that way you can store your data
in JSON/BSON but still have your XML at the app layer. Here's just a
few ideas with a quick search (in different languages to boot):

* http://www.thomasfrank.se/xml_to_json.html
* http://www.ibm.com/developerworks/xml/library/x-xml2jsonphp/
* http://search.cpan.org/~ken/XML-XML2JSON-0.05/
* http://www.phdcc.com/xml2json.htm

What language are you programming in?

-- Mitch

Gregor Stich

Feb 19, 2010, 7:56:46 AM
to mongodb-user
Thanks for your replies.
As mentioned the project will be done using Java.

I don't see the big point in converting XML to JSON. This can be done
using a library, or "manually" by XSLT.

What about the CDATA stuff that is embedded in the XML?
Should it be embedded into its own (JSONized) document? I don't like
the idea of storing XML content in JSON.
What's an adequate approach here?

Thanks again! :-)
Greg


Julson Lim

Feb 19, 2010, 10:48:41 AM
to mongodb-user
I think the point for converting it into JSON is that the MongoDB
document structure is more or less similar to JSON,
and with the existence of xml-to-json converters as mentioned above,
it greatly simplifies the process.

mwaschkowski

Feb 19, 2010, 10:50:55 AM
to mongodb-user
> I don't see the big point in converting XML to JSON. This can be done
> using a library, or "manually" by XSLT.
Well, you *have* to convert it, as Mongo only stores JSON-like (BSON) documents. JSON is
simpler than XML:

'One word of warning. JSON is made for an easy data transfer between
systems and languages. It's syntax offers much less possibilities to
express certain data structures. It supports name/value pairs of
primitive data types, arrays and lists and allows to nest them - but
that's it! No references, no possibility for meta data (attributes),
etc.'

see: http://xstream.codehaus.org/faq.html#JSON_limitations

Your XML is simple, so there wouldn't be any problems with the examples
you provided.

>
> What about the CDATA-stuff that is embedded in the XML.
> Should it be embedded into its own (JSONized) document? I don't like
> the idea of storing XML content in JSON.
> What's an adequate approach here?

I'm not too sure about this...as a field in the json I guess, since
cdata stuff is just a string right?
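
To illustrate the "just a string" point, here is a small Java sketch, assuming the JDK's DOM parser is used to read the XML. The class name `CdataAsString` and the sample markup are invented for the example. After parsing, the CDATA wrapper is gone and the content is an ordinary `String`, including its newlines, tabs, quotes, and braces, so it can be stored as a normal field value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class CdataAsString {

    // Returns the text content of the root element, CDATA sections included.
    public static String textOf(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Merge CDATA sections into the surrounding text nodes.
        factory.setCoalescing(true);
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return root.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Editorial text with HTML markup, a newline, a tab, and JSON-looking content.
        String xml = "<value><![CDATA[<p>Line one\n\tsays \": { ...}\"</p>]]></value>";
        String text = textOf(xml);
        System.out.println(text);
    }
}
```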

Gregor Stich

Feb 19, 2010, 11:53:25 AM
to mongodb-user
Hi,

I'm aware of the abilities and "shortcomings" of JSON.


> > What about the CDATA-stuff that is embedded in the XML.
>
> I'm not too sure about this...as a field in the json I guess, since
> cdata stuff is just a string right?

Yes, my question is precisely this:
Is it recommended to "outsource" CDATA stuff (which can be something
like 2KB of data or more, because it's editorial text with some HTML
formatting) or can it be included as a field in that JSON document?
Is it likely to break the JSON structure? What about new lines (or
other control characters like tabs)? What if the CDATA content
contains something like ": { ...}"? Can that be escaped safely?

Best regards
Greg

Kristina Chodorow

Feb 19, 2010, 11:56:20 AM
to mongod...@googlegroups.com
Mongo can safely handle any CDATA string you throw at it. If it's not valid UTF-8, you could save it as the binary type instead of a string. A string containing {} is fine.
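
One way to make the string-or-binary decision in Java is sketched below, using only the JDK. The names `Utf8Check`, `isValidUtf8`, and `toStorableValue` are hypothetical, and the sketch assumes the raw CDATA bytes are available before decoding. BSON strings are stored as length-prefixed UTF-8, so braces, quotes, newlines, and tabs inside the value need no escaping; only bytes that are not valid UTF-8 call for the binary type. How a given driver maps `byte[]` to BSON binary should be checked against that driver's documentation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Strict check: malformed or truncated sequences are reported, not replaced.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns a String for valid UTF-8 input, otherwise the untouched byte[].
    public static Object toStorableValue(byte[] raw) {
        return isValidUtf8(raw) ? new String(raw, StandardCharsets.UTF_8) : raw;
    }

    public static void main(String[] args) {
        byte[] text = "<b>fine</b>: { ... }\n\t".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toStorableValue(text) instanceof String);   // valid UTF-8
        System.out.println(toStorableValue(latin1) instanceof byte[]); // lone 0xE9 byte
    }
}
```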

