Re: [ddi-developers] Digest for ddi-developers@googlegroups.com - 3 updates in 1 topic


Pascal Heus

Apr 9, 2025, 10:37:05 PM
to ddi-dev...@googlegroups.com
Achim:
I couldn't agree more that APIs are more than needed ;-) 
LLMs, agents, and emerging specifications such as the Model Context Protocol (MCP) only add to their importance.
I'll chime in if I get the chance.
Cheers,
*P

On Wed, Apr 9, 2025 at 6:11 PM <ddi-dev...@googlegroups.com> wrote:
Pascal Heus <pas...@codata.org>: Apr 08 10:32PM -0600

Olof:
Thanks for looking into this. I find the double 'ddi' a bit odd/confusing.
I also ran this question through a couple of LLMs.
 
Some suggestions:
- Use application/vnd.ddialliance.* (instead of application/vnd.ddi.*)
+ This may apply to other Alliance products that do not start with DDI
(e.g. SDTL, CVs, future specs)
- Consider using parameters for the version (though it should be in the XML
as well). For example application/vnd.ddialliance.ddi-c+xml;version=2.5
- For CDI (and potentially Lifecycle), keep in mind that there are other
RDF serializations (JSON-LD, Turtle, N3), so these should be formalized as
well.
+ For example application/vnd.ddialliance.cdi+json
+ I'm also still hoping someday for an official codebook JSON (this is a
barrier to adoption)
- Consider official registration with IANA
Hope this helps.
Best,
*P
 
Olof Olsson <bor...@gmail.com>: Apr 09 01:49PM -0700

Hi Pascal!
Good point, it makes sense to use application/vnd.ddialliance.*
Having a version parameter as a recommended part is also a good point.
For CDI I think it should be ddi-cdi, to bring it into line with the naming
convention for ddi-c and ddi-l.
 
Updated suggestion (for XML):
 
 
application/vnd.ddialliance.ddi-c+xml
application/vnd.ddialliance.ddi-l+xml
application/vnd.ddialliance.ddi-cdi+xml
 
Example with version:
 
 
application/vnd.ddialliance.ddi-c+xml;version=2.6
application/vnd.ddialliance.ddi-l+xml;version=3.3
application/vnd.ddialliance.ddi-cdi+xml;version=1.0
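 
Just to illustrate how this could work in practice, a client could then ask for a specific flavour and version via content negotiation. A rough sketch (the repository URL and endpoint are made up, not part of any DDI specification):

```python
# Illustrative only: content negotiation using the MIME types suggested above.
# The repository URL/endpoint is hypothetical.
import requests

headers = {"Accept": "application/vnd.ddialliance.ddi-c+xml;version=2.6"}
response = requests.get("https://repository.example.org/studies/1234/metadata",
                        headers=headers)

# A server honouring the proposal would echo the negotiated type back, e.g.
# Content-Type: application/vnd.ddialliance.ddi-c+xml;version=2.6
print(response.headers.get("Content-Type"))
print(response.text[:200])
```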
 
Registering the prefix with IANA might be good to reserve it; I will talk to TC
about whether this is something we should look into.
 
An official JSON Schema is in the beta for DDI-L 4.0 and is also possible for
DDI-CDI.
DDI-C might get a JSON Schema in the future, but I don't think there are any
concrete plans for it yet.
 
Best regards,
Olof
Joachim Wackerow <joachim...@googlemail.com>: Apr 09 11:01PM +0200

Hi Olof, others,
 
Good exchange!
 
Reading about MIME types here, thoughts about a DDI REST API came to mind. I
think the vision of using distributed DDI can only become a reality with a
standard API. MIME types are one step, but an API also seems important. Could
this be another topic for the next meeting?
 
I have some old practical plans for a REST API in small ZIP files (MIME types
are also mentioned). Should I send them to the list?
 
Cheers,
Achim
 

Wenzig, Knut

Apr 10, 2025, 4:58:58 AM
to ddi-dev...@googlegroups.com
Hi,

Thanks a lot for the initiative.

From my point of view, there is a need for standardized methods to publish metadata. Some of you may have seen our IASSIST Quarterly paper "State of the DDI Cloud" (https://doi.org/10.29173/iq1116), in which we analyzed 250,000 records of DDI-Codebook metadata harvested via OAI-PMH. With this protocol, a similar issue arises: one must specify a so-called metadataPrefix (https://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces), and we found a zoo of DDI-related prefixes—oai_ddi, oai_ddi25, oai_ddi25-de, oai_ddi25-en, oai_ddi32, ddi, ddi_c, ddi25, ddi33, oai_ddi31. This clearly demonstrates the need for guidance.
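
For context, a typical harvesting request looks roughly like the sketch below (the endpoint URL is made up); the metadataPrefix parameter is exactly where this zoo of variants appears.

```python
# Rough sketch of an OAI-PMH ListRecords request; the endpoint URL is hypothetical.
# The metadataPrefix parameter is where the many DDI-related variants show up.
import requests

params = {
    "verb": "ListRecords",
    "metadataPrefix": "oai_ddi25",  # one of the many prefixes we encountered
}
response = requests.get("https://repository.example.org/oai", params=params)
print(response.status_code)
print(response.text[:500])  # the XML-encoded records required by the protocol
```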

Since OAI-PMH requires metadata to be delivered as an "XML-encoded byte stream", the protocol is not well suited for other formats like JSON or others. (At SOEP, we face another issue: existing OAI-PMH server software cannot deliver a single DDI-Codebook XML record larger than 500 MB—at least not out of the box.)

An alternative approach could be to use signposting (https://signposting.org/FAIR/) and insert a link into the HTML code of a DOI landing page, such as:

```html
<link rel="describedby" type="application/xml" href="link.to/some/ddi.xml">
```

If the MIME types Olof suggested for XML and Pascal extended to other file formats—I would also like to see version numbers—harvesting would become significantly easier. This would offer a straightforward way to publish metadata for a digital object without requiring a dedicated API (in this sense, I would respectfully disagree with Achim).
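
To make this concrete, a harvester following the signposting approach might do something like the sketch below; the landing page URL is made up, and the filter assumes the application/vnd.ddialliance.* types proposed earlier in this thread.

```python
# Sketch of harvesting via FAIR signposting: fetch a (hypothetical) DOI landing
# page, collect rel="describedby" links, and keep those advertising a DDI MIME type.
import requests
from bs4 import BeautifulSoup

landing_page = "https://doi.example.org/10.1234/some-dataset"
soup = BeautifulSoup(requests.get(landing_page).text, "html.parser")

for link in soup.find_all("link", rel="describedby"):
    mime_type = link.get("type", "")
    if mime_type.startswith("application/vnd.ddialliance."):
        print(link["href"], mime_type)  # fetch this URL to obtain the metadata record
```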

In that scenario, harvesting metadata would simply require a central registry for repositories (e.g., re3data.org, which already exists), and for those repositories to provide some sort of sitemap, like Dataverse does (example: https://dataverse.harvard.edu/sitemap_index.xml).

The Model Context Protocol (MCP, https://modelcontextprotocol.io/introduction) also seems promising, though I’m not aware of the potential costs it might impose on repositories wishing to publish metadata.

In conclusion, I believe the DDI Alliance should recommend a standardized way to publish metadata, and the proposed MIME types would be a meaningful contribution toward this goal. (It would be ideal if they could be used also as a basis for a recommendation on OAI-PMH prefixes.)

Best,
Knut


Adrian Dușa

Apr 10, 2025, 8:36:16 AM
to ddi-dev...@googlegroups.com
On Thu, Apr 10, 2025 at 11:58 AM 'Wenzig, Knut' via DDI Developers <ddi-dev...@googlegroups.com> wrote:
[...]
If the MIME types Olof suggested for XML and Pascal extended to other file formats—I would also like to see version numbers—harvesting would become significantly easier. This would offer a straightforward way to publish metadata for a digital object without requiring a dedicated API (in this sense, I would respectfully disagree with Achim).

I agree this should not be mandatory, but it shouldn't be excluded either. If a REST API is available (REST, not just any API), it would be a standard way to communicate (send/receive) with a server. I see nothing wrong with discussing, agreeing on, and publishing standard specifications for something like this.

Best,
Adrian

Pascal Heus

Apr 13, 2025, 5:24:04 PM
to DDI Developers
Knut:

Thanks for pointing out your "State of the DDI Cloud" paper, which I was unaware of. It's very interesting. Do you have a repository with the 250K documents you harvested? 

I did similar work around the IHSN NADA catalog application and have collected about 23K files from various countries (see server catalog). Since these use the IHSN version of the Nesstar Publisher, they are all based on that flavor of DDI-C (1.2.2 + template). They naturally all have variable-level information. I also uploaded some into a BaseX database for analysis (best open-source XML database option out there). 

I considered pushing all this into a public repository and database/API to be used as a reference for DDI developers and others but never found the time/resources. Maybe we could merge our collections?

I note the rare use of <fileDscr> and <dataDscr> mentioned in your paper. This is a major surprise to me as well. Note that this may also reflect a publication/dissemination policy in some cases. For example, ICPSR does not share variable-level metadata in their DDI (which is unfortunate given that it is the birthplace of DDI).

Note that one common issue is quality assurance. I find that many of the files out there do not properly validate against the schema. The Nesstar Publisher, for example, allows invalid DDI to be exported; the most notable issue is that it does not prevent the @ID attribute from starting with a number. But it's not only validation: given that the title is the only required DDI-C element, content is another QA-related aspect (e.g., missing IDs, descriptions that are too short, etc.). Is this something you investigated? Documenting the most common validation/QA issues, as well as the different DDI "flavors", would be quite useful, and validating documents should be a top-level recommendation. That being said, having tools and/or a free public API endpoint that supports this would be useful. This is something I'm happy to host, but it would be better under the umbrella of the Alliance.
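
As a minimal sketch of the kind of check I have in mind (file names are placeholders, and this assumes a local copy of the DDI-C 2.5 XSD):

```python
# Minimal QA sketch: validate a harvested DDI-Codebook file against the schema
# and flag @ID values that start with a digit. File paths are placeholders.
from lxml import etree

schema = etree.XMLSchema(etree.parse("ddi_codebook_2_5.xsd"))
doc = etree.parse("harvested_record.xml")

if not schema.validate(doc):
    for error in schema.error_log:
        print(f"line {error.line}: {error.message}")

# Content-level check: IDs should not start with a number (an xs:ID constraint).
for elem in doc.iter():
    id_value = elem.get("ID")
    if id_value and id_value[0].isdigit():
        print(f"suspicious ID '{id_value}' on <{etree.QName(elem).localname}>")
```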

Best,
*P

Wenzig, Knut

Apr 14, 2025, 4:39:48 AM
to ddi-dev...@googlegroups.com
Pascal:

No, we did not store the XML files. Instead, we archived a dataset containing the OAI-PMH endpoints, metadata prefixes, and record IDs required for harvesting via OAI-PMH: https://doi.org/10.5281/zenodo.13255674 (file: data.csv).
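For anyone who wants to re-harvest, the idea is simply to loop over that file and issue GetRecord requests. A rough sketch (the column names below are assumed for illustration; please check data.csv for the actual ones):

```python
# Rough sketch of re-harvesting from the archived data.csv. Column names are
# assumed for illustration; check the file on Zenodo for the actual headers.
import csv
import requests

with open("data.csv", newline="") as f:
    for row in csv.DictReader(f):
        params = {
            "verb": "GetRecord",
            "identifier": row["record_id"],            # assumed column name
            "metadataPrefix": row["metadata_prefix"],  # assumed column name
        }
        response = requests.get(row["endpoint"], params=params)  # assumed column name
        # ... parse or store the returned DDI-Codebook XML here
```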
Based on our approach using re3data.org and OAI-PMH, I estimate that over half a million documents are now accessible. This is largely due to the increasing number of Dataverse instances. However, there's an important caveat: Dataverse appears to manage two versions of DDI-Codebook metadata—one exposed via OAI-PMH and another via the user-facing web interface. Only the latter includes detailed, variable-level information. Unfortunately, the OAI-PMH version lacks validation; for example, we encountered a document with an incorrectly capitalized element name.
Our research focused on element usage; we did not look at other features of the documents, such as attributes or content. Checking for proper validation would have been a very good addition to our paper.
While we explored the REST APIs offered by various repositories, we encountered a highly fragmented landscape, and we could not access any additional sources of DDI metadata beyond what's available via the OAI-PMH approach. (Also: a short look into NADA's documentation did not tell me how I could access DDI metadata via its API.) That's why I remain somewhat skeptical about relying on new APIs and instead feel more optimistic about the Data on the Web / FAIR signposting approach, using qualified links on the DOI landing page. I was glad to read in this thread that registering a MIME type shouldn't be too complex.
At the moment we have some funding to develop metrics for DDI adoption and to use metadata harvesting to collect the numbers. I am happy to collaborate on this.

Best 

knut.


From: ddi-dev...@googlegroups.com <ddi-dev...@googlegroups.com> on behalf of Pascal Heus <pas...@codata.org>
Sent: Sunday, April 13, 2025 23:24
To: DDI Developers <ddi-dev...@googlegroups.com>
Subject: Re: [ddi-developers] mime types for DDI
 

Joachim Wackerow

Apr 16, 2025, 8:07:55 AM
to ddi-dev...@googlegroups.com

I appreciate all the contributions.

Based on this, I would first like to share some general thoughts and suggestions.

Looking at it from a high level, it seems to make sense to start a combined initiative for DDI MIME types and a DDI API (possibly based on REST, OpenAPI, ...). An API would be the formal agreement between the metadata provider and the metadata user; both sides should benefit from it. A standardized proposal from the DDI Alliance could form the basis of a distributed network of DDI resources. Registries will certainly also play an important role in this, but that seems to be one step further.

Such an initiative could start and be followed up in the DDI developers meeting, as Olof suggested. It would help if the goals of such an initiative were supported by the DDI Alliance; any more formal steps would need its involvement.

Goals:

  1. Definition of MIME types and application for formal approval at IANA
  2. Definition of the requirements of a DDI API. Ideally it would be REST-based and provide a first level for simple requirements; a second level could then cover more advanced requirements.

Action items:

  1. Discussion of the goals at the upcoming DDI developers meeting. The result could be provided to the DDI user community for further discussion.
  2. Suggestion as an idea for discussion at the upcoming DDI Alliance Meeting of the Scientific Community (there was a recent public call for this on the DDI users list). This would be the chance to get DDI Alliance bodies more involved. Who should send the suggestion?


Now I would like to address some individual messages:


Knut:

I think you raise a couple of important points. The idea of an API does not have to be something complicated, but any HTTP-based request and response for DDI resources should follow some agreed rules. This would basically be the "contract" between the metadata provider and the user. OAI-PMH is a great example of this, but its design was geared towards publications and XML. Nevertheless, ideas from OAI-PMH could be used in an approach for DDI.


Olof:

My old ideas (probably outdated) on MIME types and a RESTful service are available here: https://bitbucket.org/wackerow/restful_service/. The main suggestion from that time is visible here: https://htmlpreview.github.io/?https://bitbucket.org/wackerow/restful_service/raw/13e00b896f705e4b43a03a6eccf6fc8599f543e8/RESTService_2015-03-18/DDI_wadl.html


Pascal:

It would be great if you could contribute your experience to the API effort.


Paul:

Thanks a lot. This is really helpful. I was not aware that the procedure is easier nowadays.


Any thoughts?

Cheers,
Achim



Pascal Heus

Apr 16, 2025, 1:22:20 PM
to DDI Developers
Achim:
A few initial thoughts around APIs. Should we create a brainstorming shared document?
Best,
*P

General:
- I would highly recommend an API-first approach. We can plan, mock, and prototype in an environment like Postman
- We can go a long way by starting with consumer APIs (read-only)
- I'm curious to explore API endpoints that are DDI flavour/version agnostic (as well, of course, as endpoints specific to Codebook, Lifecycle, and CDI)
- AI Readiness: APIs should be designed with both humans and machines in mind (users and agents)

Protocols:
I agree that this needs to be REST- and JSON/JSON-LD-based. Most of the world understands and operates on this (see the 2023 survey).
- Streaming should be taken into consideration, as some DDI content can be quite large (particularly CDI / Lifecycle)
- SPARQL is the natural API for metadata in RDF formats/stores, and we do not need to implement it ourselves. But what would be helpful are reference SPARQL queries (a DDI cookbook)
- If we want to do something similar for XML, we can use XQuery (and BaseX as an OSS back-end database)
- Layering REST on top of SPARQL (or XQuery) can be a good implementation approach (see the sketch after this list)
- GraphQL could also be considered as an overlay that brings together data (SQL) and RDF backends
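
As a sketch of the "REST on top of SPARQL" idea (the endpoint URL, the route, and the query are all hypothetical placeholders, not an agreed DDI API design):

```python
# Sketch of layering a small REST endpoint over a SPARQL store. The endpoint URL,
# the /studies route, and the query are hypothetical placeholders; real DDI-CDI
# vocabulary would go in the query.
from flask import Flask, jsonify
from SPARQLWrapper import SPARQLWrapper, JSON

app = Flask(__name__)
SPARQL_ENDPOINT = "https://example.org/ddi/sparql"  # hypothetical SPARQL endpoint

@app.route("/studies")
def list_studies():
    sparql = SPARQLWrapper(SPARQL_ENDPOINT)
    # Placeholder query; a real one would select study-level DDI resources.
    sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 100")
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return jsonify(results["results"]["bindings"])

if __name__ == "__main__":
    app.run()
```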

Wendy Thomas

Apr 16, 2025, 1:31:25 PM
to ddi-dev...@googlegroups.com


On Wed, Apr 16, 2025 at 8:26 AM Wendy Thomas <w...@umn.edu> wrote:
Just to let everyone know, this issue was filed in the TC issue tracker at our request. This is an area where TC will be working with the developers group (taking advantage of its advice and expertise) to pursue what is needed to address this for the DDI Alliance. Olof and Oliver, as well as others, have also expressed interest in developing a section of implementation guides for the DDI site.

Wendy



--
Wendy L. Thomas                            
ISRDI [retired]

Joachim Wackerow

Apr 28, 2025, 7:58:15 AM
to ddi-dev...@googlegroups.com

Pascal:

A shared document on the API topic would be great. It can be used to prepare for the developers meeting.

All:
I have now sent a suggestion regarding content types and the API to the open call for the DDI Alliance community meeting. You should have received it if you are subscribed to the DDI users list:

Cheers,
Achim

