Why a single root and duplicated xmlns/xsi attributes under metadata?


John Flatness

Oct 9, 2014, 3:30:46 PM
to oai...@googlegroups.com
When looking at the examples in the OAI-PMH spec document, and examples in the wild, the XML data within the "metadata" element for a record always contains separate xmlns and xsi:schemaLocation attributes. For a ListRecords response, this results in a great deal of duplication over simply defining the namespaces, prefixes and schema locations on the root OAI element (since, after all, the results are guaranteed to be all of the same metadata format).

Worse, this is taken to absurd levels when the data within the metadata element doesn't have a single root. This is the case with DSpace's "qdc" format, which simply includes the elements from the dcterms namespace directly under "metadata" with no wrapper. In this case, the output declares the "dc" or "dcterms" namespaces and the corresponding xsi:schemaLocations on every single element.

The results, for simply declaring three very short strings of metadata:

<metadata>
   <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd">Test Webpage</dc:title>
   <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd">cat</dc:subject>
   <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd">calico</dc:subject>
</metadata>

The OAI-PMH schema enforces a single element as the child of <metadata>, but I was all ready to note that I couldn't find the part of the standard that actually required the duplicate xmlns declarations, merely an example that noted their presence. Only on closer inspection did I realize that the section introduced as an example actually includes the relevant mandates: "every metadata part must include one or more xmlns prefixed attributes" and "every metadata part must include the attribute xmlns:xsi." Curiously, the actual xsi:schemaLocation attribute is mentioned, but without the "must" language. The presence of all this within a list introduced as "The following example" makes it a little tough to tell what's required and what isn't, but I suppose the bolded "must" is clear enough. What is the rationale here? Is it the ability to cater to harvesters that would simply chop out the text content of the metadata element and attempt to reparse it as its own XML document?

The OAI-PMH schema requires adherence to the single-child rule to maintain validity, so what am I to make of such a large-scale and high-visibility implementer as DSpace seemingly violating both the standard and the schema by omitting the "root" within <metadata>? For that matter, if it breaks the rule about a single child of <metadata>, why does DSpace then bother to adhere to the rules about duplicating xmlns attributes, which, combined, give the frankly ridiculous result excerpted above? If I'm assuming the "naive harvester" explanation for the spec's requirement, the lack of the "root" under metadata would seemingly cause serious problems for those harvesters anyway.
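
To make the "naive harvester" scenario concrete, here is a rough Python sketch. The helper names (chop_metadata, reparses) and the heavily trimmed sample responses are made up for illustration and are not taken from DSpace or from any real harvester. It cuts the text out of <metadata> with a regular expression and tries to reparse that text as a standalone XML document.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical, heavily trimmed responses (illustrative only).
# 1. Single root under <metadata>, namespaces declared on that root.
WRAPPED_LOCAL = (
    "<record><metadata>"
    f'<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">'
    "<dc:title>Test Webpage</dc:title>"
    "</oai_dc:dc>"
    "</metadata></record>"
)
# 2. Single root, but namespaces declared only on an outer element.
WRAPPED_HOISTED = (
    f'<record xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}"><metadata>'
    "<oai_dc:dc><dc:title>Test Webpage</dc:title></oai_dc:dc>"
    "</metadata></record>"
)
# 3. No root: sibling elements, each carrying its own declaration.
UNWRAPPED_LOCAL = (
    "<record><metadata>"
    f'<dc:title xmlns:dc="{DC}">Test Webpage</dc:title>'
    f'<dc:subject xmlns:dc="{DC}">cat</dc:subject>'
    "</metadata></record>"
)


def chop_metadata(response_text):
    """Return the raw text between <metadata> and </metadata>, without parsing."""
    match = re.search(r"<metadata>(.*?)</metadata>", response_text, re.DOTALL)
    return match.group(1)


def reparses(fragment):
    """Report whether the fragment parses as a standalone XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


for name, doc in [
    ("wrapped, local declarations", WRAPPED_LOCAL),
    ("wrapped, hoisted declarations", WRAPPED_HOISTED),
    ("unwrapped, local declarations", UNWRAPPED_LOCAL),
]:
    print(name, "->", reparses(chop_metadata(doc)))
```

Under these assumptions, only the first case survives the chop-and-reparse approach: the second fails on unbound prefixes, and the third fails because a standalone document cannot have more than one root element, no matter how many times the namespaces are repeated.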

-John Flatness

Simeon Warner

Oct 22, 2014, 4:52:42 AM
to oai...@googlegroups.com
Hi John,

I don't recall the details of why this ended up as a "must". It does seem very verbose and unnecessary, and nearing madness in the example you quote. XML tools were not very good when OAI-PMH was created, and I suspect that handling record extraction by "chopping out" part of the response was the motivation.

I expect (hope) everyone now uses a proper XML parser to deal with OAI-PMH responses. If that is the case then a single set of namespace declarations should yield identical results. I wonder whether there are data providers out there already doing this and finding that harvesters are OK with it (even if not technically following the spec)?
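
As a quick way to check the "identical results" expectation, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two sample strings and the helper name expanded are illustrative assumptions, with the schemaLocation attributes left out for brevity. It compares the namespace-qualified element names a parser reports when the declaration is repeated on every element versus declared once on an ancestor.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Declaration repeated on every element, as in the excerpt quoted earlier.
PER_ELEMENT = (
    "<metadata>"
    f'<dc:title xmlns:dc="{DC}">Test Webpage</dc:title>'
    f'<dc:subject xmlns:dc="{DC}">cat</dc:subject>'
    f'<dc:subject xmlns:dc="{DC}">calico</dc:subject>'
    "</metadata>"
)

# The same data with a single declaration on an ancestor element.
HOISTED = (
    f'<metadata xmlns:dc="{DC}">'
    "<dc:title>Test Webpage</dc:title>"
    "<dc:subject>cat</dc:subject>"
    "<dc:subject>calico</dc:subject>"
    "</metadata>"
)


def expanded(xml_text):
    """List (namespace-qualified tag, text) pairs for each child of the root."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]


# A namespace-aware parser resolves prefixes to URIs, so both forms
# produce the same expanded names and the same text content.
print(expanded(PER_ELEMENT) == expanded(HOISTED))
print(len(PER_ELEMENT), "characters vs", len(HOISTED))
```

This only shows that a namespace-aware parser sees the same data either way; it says nothing about how harvesters in the wild actually behave.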

Cheers,
Simeon