Wenzig, Knut <KWe...@diw.de>: Apr 10 08:58AM
Hi,
Thanks a lot for the initiative.
From my point of view, there is a need for standardized methods to publish metadata. Some of you may have seen our IASSIST Quarterly paper "State of the DDI Cloud" (https://doi.org/10.29173/iq1116), in which we analyzed 250,000 records of DDI-Codebook metadata harvested via OAI-PMH. With this protocol, a similar issue arises: one must specify a so-called metadataPrefix (https://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces), and we found a zoo of DDI-related prefixes—oai_ddi, oai_ddi25, oai_ddi25-de, oai_ddi25-en, oai_ddi32, ddi, ddi_c, ddi25, ddi33, oai_ddi31. This clearly demonstrates the need for guidance.
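To make the prefix problem concrete, here is a minimal sketch of how a harvester might discover which DDI-related prefixes a repository advertises, by parsing the response to the OAI-PMH `ListMetadataFormats` verb. The sample response is abbreviated and invented for illustration, the function name `ddi_prefixes` is hypothetical, and the "contains `ddi`" test is only a heuristic, which is exactly why guidance on prefixes would help.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated, made-up ListMetadataFormats response (schema and
# metadataNamespace elements are omitted to keep the example short).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ddi25</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>ddi_c</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
""".strip()


def ddi_prefixes(response_xml):
    """Return advertised metadataPrefix values that look DDI-related.

    Heuristic (an assumption, not part of any standard): a prefix counts
    as DDI-related if it contains the substring "ddi".
    """
    root = ET.fromstring(response_xml)
    prefixes = [
        el.text.strip()
        for el in root.findall(".//oai:metadataPrefix", OAI_NS)
        if el.text
    ]
    return [p for p in prefixes if "ddi" in p.lower()]


print(ddi_prefixes(SAMPLE_RESPONSE))
```

A harvester has to run this kind of guesswork per repository today, since nothing ties a given prefix to a specific DDI version.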
Since OAI-PMH requires metadata to be delivered as an "XML-encoded byte stream", the protocol is not well suited to other formats such as JSON. (At SOEP, we face another issue: existing OAI-PMH server software cannot deliver a single DDI-Codebook XML record larger than 500 MB, at least not out of the box.)
An alternative approach could be to use signposting (https://signposting.org/FAIR/) and insert a link into the HTML code of a DOI landing page, such as:
```html
<link rel="describedby" type="application/xml" href="link.to/some/ddi.xml">
```
If the MIME types that Olof suggested for XML, and that Pascal extended to other file formats, were adopted (I would also like to see version numbers), harvesting would become significantly easier. This would offer a straightforward way to publish metadata for a digital object without requiring a dedicated API (in this sense, I would respectfully disagree with Achim).
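As a rough illustration of how little a harvester would need on the landing-page side, the following sketch extracts `describedby` links from HTML and filters them by MIME type using only the Python standard library. The class and function names, the example.org URLs, and the sample page are made up for this example; a real harvester would first fetch the landing page over HTTP.

```python
from html.parser import HTMLParser


class DescribedByParser(HTMLParser):
    """Collect (href, type) pairs from <link rel="describedby"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "describedby" in rels and attr.get("href"):
            mime = (attr.get("type") or "").lower()
            self.links.append((attr["href"], mime))


def find_metadata_links(html, wanted_type=None):
    """Return describedby links, optionally restricted to one MIME type."""
    parser = DescribedByParser()
    parser.feed(html)
    if wanted_type is None:
        return parser.links
    return [link for link in parser.links if link[1] == wanted_type.lower()]


# Hypothetical landing page with two metadata serializations.
LANDING_PAGE = """
<html><head>
<link rel="describedby" type="application/xml" href="https://example.org/some/ddi.xml">
<link rel="describedby" type="application/ld+json" href="https://example.org/some/meta.jsonld">
</head><body></body></html>
"""

print(find_metadata_links(LANDING_PAGE, "application/xml"))
```

With the generic `application/xml` type, the harvester still cannot tell DDI-Codebook from any other XML vocabulary; a DDI-specific, versioned MIME type would make the `wanted_type` filter sufficient on its own.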
In that scenario, harvesting metadata would simply require a central registry for repositories (e.g., re3data.org, which already exists), and for those repositories to provide some sort of sitemap, as Dataverse does. (Example: https://dataverse.harvard.edu/sitemap_index.xml, Documentation: https://guides.dataverse.org/en/latest/installation/config.html)
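The sitemap side of such a harvester could be sketched as follows. This assumes the repository publishes sitemaps in the standard sitemaps.org XML format (a sitemap index pointing to one or more URL sets); the sample documents and example.org URLs are invented, and `extract_locs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sitemap index listing two child sitemaps.
SAMPLE_INDEX = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.org/sitemap1.xml</loc></sitemap>
  <sitemap><loc>https://example.org/sitemap2.xml</loc></sitemap>
</sitemapindex>
""".strip()

# Made-up child sitemap listing two dataset landing pages.
SAMPLE_URLSET = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
  <url><loc>https://example.org/dataset/2</loc></url>
</urlset>
""".strip()


def extract_locs(sitemap_xml):
    """Return every <loc> URL in a sitemap index or a URL set."""
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.findall(".//sm:loc", SITEMAP_NS)
        if el.text
    ]


# Step 1: the index yields the child sitemaps to fetch.
print(extract_locs(SAMPLE_INDEX))
# Step 2: each child sitemap yields landing pages to inspect for
# rel="describedby" links.
print(extract_locs(SAMPLE_URLSET))
```

Combined with a registry that lists repositories, these two steps would be enough to enumerate landing pages without any repository-specific API.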
The Model Context Protocol (MCP, https://modelcontextprotocol.io/introduction) also seems promising, though I’m not aware of the potential costs it might impose on repositories wishing to publish metadata.
In conclusion, I believe the DDI Alliance should recommend a standardized way to publish metadata, and the proposed MIME types would be a meaningful contribution toward this goal. (Ideally, they could also serve as the basis for a recommendation on OAI-PMH prefixes.)
Best,
Knut
Adrian Dușa <dusa....@gmail.com>: Apr 10 03:36PM +0300
On Thu, Apr 10, 2025 at 11:58 AM 'Wenzig, Knut' via DDI Developers <
> significantly easier. This would offer a straightforward way to publish
> metadata for a digital object without requiring a dedicated API (in this
> sense, I would respectfully disagree with Achim).
Agreed, a dedicated API should not be mandatory, but the recommendation
shouldn't exclude one either. If a REST API is available (REST, not just
any API), it would be a standard way to communicate (send/receive) with a
server. I see nothing wrong with discussing, agreeing on, and publishing
standard specifications for something like this.
Best,
Adrian