For one of our next projects I plan to use MongoDB, but there are some
questions left which I hope you can answer.
Some major requirements and key aspects of this project are:
1 - We have lots of documents in *XML* style, following a DTD (see
below for details). We would like to use a document-oriented DB like
Mongo, as the XML files are not suitable to be stored in an
old-fashioned RDBMS.
2 - Lots of documents means: about 100,000, each about 5-10KB.
3 - We must be able to search the content using the DB's API.
4 - Everything will be done in Java (or Groovy). Fortunately we know
that there are JVM drivers :)
Most important to me is the following question:
==================================
How to insert XML content to MongoDB?
On http://www.mongodb.org/display/DOCS/Inserting it is written that
it's possible, but frankly speaking I haven't seen any example using
XML. Only JSON.
Could you give me an example of how to do that? (Best in Java :-))
Now to the XML's structure:
=====================
It's a recursive schema simply looking like this:
<record>
<item name="X"><value>42</value></item>
<item name="y"><value><item name="z"><value>inner node value</value></item></value></item>
</record>
So, every item's value can have another item, and so on.
We are able to convert the XML into the following form:
<record>
<X><value>42</value></X>
<y><value><z><value>inner node value</value></z></value></y>
</record>
... thus we make the function of each item clearer and will be able to
generate named attributes (instead of thousands of items).
Is this structure above suitable to be inserted into MongoDB?
I'd like to be able to query the DB with something like
MongoDB.mydb.find({y: {z: "inner node value"}});
Finally, there will be embedded XML content in the XML (CDATA stuff).
Can this cause any trouble for MongoDB?
Thanks!!
Greg
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
{
x: 42,
y: {z: "inner node value"}
}
On Thu, Feb 18, 2010 at 5:30 PM, Gregor Stich <grg...@googlemail.com> wrote:
Look around for xml2json converters, that way you can store your data
in JSON/BSON but still have your XML at the app layer. Here's just a
few ideas with a quick search (in different languages to boot):
* http://www.thomasfrank.se/xml_to_json.html
* http://www.ibm.com/developerworks/xml/library/x-xml2jsonphp/
* http://search.cpan.org/~ken/XML-XML2JSON-0.05/
* http://www.phdcc.com/xml2json.htm
What language are you programming in?
-- Mitch
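Since the original post mentions Java, here is a minimal sketch of such a conversion using only the JDK's DOM parser. It turns the recursive <item name="..."><value>...</value></item> form into a nested Map that has the same shape as the JSON document shown earlier in the thread. The class and method names (RecordToMap, parse, itemsOf, valueOf) are made up for illustration, and it is an assumption that the driver's document type can be filled from a map like this. Note that every leaf stays a String, so "42" is not converted to a number.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any MongoDB driver.
class RecordToMap {

    // Parses a <record> document and returns its items as a nested map.
    static Map<String, Object> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element record = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return itemsOf(record);
    }

    // Collects every <item name="..."> child of parent into a map keyed by name.
    private static Map<String, Object> itemsOf(Element parent) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "item".equals(n.getNodeName())) {
                Element item = (Element) n;
                result.put(item.getAttribute("name"), valueOf(item));
            }
        }
        return result;
    }

    // An item's <value> holds either nested items (a sub-map) or plain text.
    private static Object valueOf(Element item) {
        for (Node n = item.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "value".equals(n.getNodeName())) {
                Element value = (Element) n;
                Map<String, Object> nested = itemsOf(value);
                return nested.isEmpty() ? value.getTextContent() : nested;
            }
        }
        return null; // item without a <value> child
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record>"
            + "<item name=\"X\"><value>42</value></item>"
            + "<item name=\"y\"><value><item name=\"z\">"
            + "<value>inner node value</value></item></value></item>"
            + "</record>";
        System.out.println(parse(xml)); // prints {X=42, y={z=inner node value}}
    }
}
```

An XSLT step or one of the converters listed above would work just as well; the point is only that the recursion in the DTD maps directly onto nested documents.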
I don't see the big point in converting XML to JSON. This can be done
using a library, or "manually" by XSLT.
What about the CDATA stuff that is embedded in the XML?
Should it be embedded into its own (JSONized) document? I don't like
the idea of storing XML content in JSON.
What's an adequate approach here?
Thanks again! :-)
Greg
> Look around for xml2json converters, that way you can store your data
> in JSON/BSON but still have your XML at the app layer. Here's just a
> few ideas with a quick search (in different languages to boot):
>
> *http://www.thomasfrank.se/xml_to_json.html
> *http://www.ibm.com/developerworks/xml/library/x-xml2jsonphp/
> *http://search.cpan.org/~ken/XML-XML2JSON-0.05/
> *http://www.phdcc.com/xml2json.htm
'One word of warning. JSON is made for an easy data transfer between
systems and languages. It's syntax offers much less possibilities to
express certain data structures. It supports name/value pairs of
primitive data types, arrays and lists and allows to nest them - but
that's it! No references, no possibility for meta data (attributes),
etc.'
see: http://xstream.codehaus.org/faq.html#JSON_limitations
Your XML is simple, so there wouldn't be any problems with the examples
you provided.
>
> What about the CDATA-stuff that is embedded in the XML.
> Should it be embedded into its own (JSONized) document? I don't like
> the idea of storing XML content in JSON.
> What's an adequate approach here?
I'm not too sure about this... as a field in the JSON, I guess, since
CDATA content is just a string, right?
I'm aware of the abilities and "shortcomings" of JSON.
> > What about the CDATA-stuff that is embedded in the XML.
> I'm not too sure about this...as a field in the json I guess, since
> cdata stuff is just a string right?
Yes, my question is precisely this:
Is it recommended to "outsource" CDATA content (which can be 2KB of
data or more, because it's editorial text with some HTML formatting),
or can it be included as a field in that JSON document?
Is it likely to break the JSON structure? What about new lines (or
other control characters like tabs)? What if the CDATA content
contains something like ": { ...}"? Can that be escaped safely?
Best regards
Greg
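On the escaping question, a small sketch may help. It assumes the XML is read with the JDK's DOM parser, where a CDATA section comes back as ordinary characters, and it uses a hand-written jsonQuote helper (hypothetical, written only for this example) to show what the JSON string rules do with new lines, tabs, quotes and braces. Only quotes, backslashes and control characters need escaping in a JSON string; colons and braces are plain characters there. As far as I understand, the drivers send BSON rather than JSON text, so in practice the string would be passed as-is and this escaping only matters when writing JSON by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any MongoDB driver.
class CdataAsString {

    // Returns the text of the root element; CDATA content is included as plain characters.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Escapes a string for use as a JSON string literal.
    static String jsonQuote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        out.append(c); // everything else, including ':' '{' '}' and '<'
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) throws Exception {
        // CDATA with HTML, a new line, a tab, and the ": { ...}" pattern from the question.
        String xml = "<value><![CDATA[<p>line one</p>\n\t\": { ...}\"]]></value>";
        String text = textOf(xml);
        System.out.println(jsonQuote(text));
    }
}
```

So a 2KB block of editorial HTML can live in an ordinary string field of the same document; whether to move it into a separate document is a modelling choice rather than an escaping problem.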