custom metadata blocks now easier to spin up and evaluate


Philip Durbin

Nov 19, 2019, 8:20:40 AM
to dataverse...@googlegroups.com
Yesterday I tested a new feature of dataverse-ansible that allows you to spin up an installation of Dataverse with as many custom metadata blocks as you like.

I posted some screenshots at https://github.com/IQSS/dataverse-ansible/issues/133#issuecomment-555227955 which I will attach as well.

The idea is that you can configure one or more URLs to custom metadata blocks like this:


Then, during the process of installing Dataverse (which is the main purpose of dataverse-ansible), these custom metadata blocks will be loaded and Solr will be automatically reconfigured to support them.

I think the ability to create custom metadata blocks is a powerful feature of Dataverse. It allows us to express metadata specific to a scientific field or to an organization. When we launched Dataverse 4 we had this idea that we would start with five metadata blocks and encourage the community to help us create more. Recently we documented the process of creating metadata blocks (which is tricky), which should help: http://guides.dataverse.org/en/4.18/admin/metadatacustomization.html

I'm hoping that now that the process of spinning up Dataverse with custom metadata blocks is completely automated, it will become even easier for the community to get involved in the process of creating them.

I'd like to thank Oliver Bertuch for automating the configuration of Solr after new metadata blocks have been added. I'd like to thank Don Sizemore for building on this work and making it so easy to spin up Dataverse (on AWS EC2, for example) with custom metadata blocks by simply adding one or more URLs, as described above.

Finally, I'd like to thank members of the community that are working on new metadata blocks. These are the ones I'm aware of:


In addition, I'm aware of some bugs reported against (or suggested improvements for) existing metadata blocks:


In short, I hope that recent documentation and automation make it easier for the community to contribute to making robust metadata support in Dataverse even better.

May a thousand metadata blocks bloom!

Thanks,

Phil
it-works.png

Geneviève Michaud

Nov 22, 2019, 8:29:04 AM
to Dataverse Users Community
Hi Phil,

I agree, the ability to create custom metadata blocks is a very powerful feature of Dataverse, and it was a key element in our decision to go for Dataverse (back in 2015).
Smoothing the process is a great improvement! (Metadata blocks blooming, so nice! :D)

Geneviève

Philipp at UiT

Nov 24, 2019, 3:32:49 AM
to Dataverse Users Community

I agree, customizable metadata blocks are very useful. There might, though, be some risk of a proliferation of individually customized metadata blocks across different (Dataverse) repositories. I think that unless the metadata fields you want to implement in Dataverse are unique to your community/installation, we should aim for metadata blocks that are integrated into the Dataverse software and common to all installations, in order to ensure interoperability. I think this is already being done; cf. e.g. GitHub issue 6359 and the work being done in the SSHOC project on metadata in linguistics.

Any thoughts about this?

Best, Philipp

Philipp at UiT

Dec 3, 2019, 6:59:45 AM
to Dataverse Users Community

Philip Durbin

Dec 3, 2019, 8:59:12 AM
to dataverse...@googlegroups.com
Apologies if I'm not being clear. I said this:

"When we launched Dataverse 4 we had this idea that we would start with five metadata blocks and encourage the community to help us create more."

What I mean is that Dataverse 4.0 shipped in May 2015 with support for these groupings of metadata fields:

- Citation Metadata (Required)
- Geospatial Metadata
- Social Science and Humanities Metadata
- Astronomy and Astrophysics Metadata
- Life Sciences Metadata
- Journal Metadata

(I guess it's actually six but "Journal Metadata" wasn't documented in the User Guide appendix until 4.17.)

My question is, are we covering enough of science with these groupings? Should Dataverse ship with support for more scientific disciplines? Should we work together to take standards such as Darwin Core and encode the fields in such a way that a future version of Dataverse ships with support for that standard? Do we want Dataverse to ship with support for CodeMeta for software citation? Are there other metadata standards in other scientific fields that are mature enough that we can imagine supporting them out of the box with Dataverse?

Or are the coming years not the right time? Should we wait on all of this? In the other thread we saw, "The DDI Alliance is now developing an information model for integrating data across domain boundaries." Great! When will this information model be ready to use? Does it have a name or a website? Is development of this model happening on GitHub? Elsewhere?

I'm fine with waiting. I'm also excited about documenting and automating the metadata block system we have. Documentation is crucial. Automation makes development easier. I was thinking the next logical step for dataverse-ansible would be to make it easy to swap in an updated version of any of the six metadata blocks above that ship with Dataverse (such as "Life Sciences Metadata" in issue 6359 that you mentioned) for dev and QA purposes.

I hope this helps! May metadata standards evolve! May a thousand metadata fields bloom! May every scientific discipline enjoy robust metadata support in data repositories!

Thanks,

Phil






--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/b0ef6d57-e95f-4314-b4da-dd114b062c99%40googlegroups.com.

Philipp at UiT

Dec 3, 2019, 11:02:06 AM
to Dataverse Users Community
Thanks Phil for clarifying this!

I think we definitely should stay tuned for new domain-specific metadata standards to be included in Dataverse. Both the SSHOC project and the DARIAH project are working on a Dataverse metadata schema for linguistic data, and I guess there are other fields that have metadata standards which are mature enough to be considered as candidates for implementation as metadata schemas in Dataverse. Maybe catalogs like the RDA Metadata Standards Catalog (https://rdamsc.bath.ac.uk/) can give an indication.

Best, Philipp

Crosas, Mercè

Dec 3, 2019, 11:57:54 AM
to dataverse...@googlegroups.com
A few of us (as Steve McEachern said) will start a discussion soon on this topic and then inform and engage the mailing list and community for feedback.

A couple of important issues that need to be addressed are:
  • Dataverse should provide more flexibility/scalability for supporting domain-specific and cross-domain metadata standards that can be reused by all Dataverse installations (and evaluate the currently supported metadata blocks; for example, the lifecycle metadata is not the most commonly used metadata standard in this domain at this point).
  • Dataverse should provide better support for reusable, standardized controlled vocabularies (standard data dictionaries and ontologies).
We haven't yet proposed the best way to achieve this, but we should review all the work that is going on in this area. I'm guessing that the solution will involve moving away from metadata blocks as they work now, but I'm not sure yet.

Merce


--
Mercè Crosas, Ph.D.
University Research Data Officer,  HUIT  |   Chief Data Science and Technology Officer, IQSS
Harvard University




Kamil Guryn

Jan 20, 2020, 8:03:34 AM
to Dataverse Users Community
Just to let you know, there is now one less metadata block left to create out of the thousand :)

James Myers

Jan 21, 2020, 12:10:25 PM
to dataverse...@googlegroups.com

Kamil,

Nice to see a block associated with an external vocabulary with an RDF mapping!

FWIW, the metadata block you've created can be more directly tied to the Darwin Core RDF terms with a couple minor changes. Your block made me realize that the info on how to do that isn't in the docs yet, so I just created a pull request (see https://github.com/IQSS/dataverse/pull/6548). The benefit to be had is that info from this block will be correctly serialized in JSON-LD in the OAI_ORE metadata export and the archival zipped Bags that Dataverse can create. The OAI_ORE file can be parsed by RDF/JSON-LD aware programs to pull out the Darwin Core info.

There are two options for a change.

- Since it looks like DWC prefixes all terms with http://rs.tdwg.org/dwc/terms/, you could set that as the blockURI and remove the 'dwc' from your term names (Dataverse appends the property name to the blockURI to create the term). Using this option, if you also renamed the block to dwc, the JSON export would look like:

  {
    "dwc:type": "<type value>",
    "@context": {
      "dwc": "http://rs.tdwg.org/dwc/terms/"
    }
  }

  i.e. the block name is used as the prefix, so the export would look similar to your current term naming with the dwc prefix.

- Alternately, you can define the termURI for each term, e.g. http://rs.tdwg.org/dwc/terms/type for 'dwcType'. This would allow you to keep the term names as they are, but you'd have to define the termURI for each property in the block. (In JSON-LD, this results in

  {
    "dwcType": "<dwcType value>",
    "@context": {
      "dwcType": "http://rs.tdwg.org/dwc/terms/type",
      …
    }
  }

  with additional entries for each term in the @context, so similar but a bit more verbose.)
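
To illustrate why the two options end up equivalent for RDF-aware consumers, here is a minimal Python sketch. It is not Dataverse code and not a real JSON-LD processor: the helper names (`term_from_block_uri`, `expand_key`, `expanded_terms`) are made up for this example, and only flat, string-valued @context entries are handled. The two documents mirror the exports shown above, with a single term each.

```python
import json

DWC_BASE = "http://rs.tdwg.org/dwc/terms/"


def term_from_block_uri(block_uri: str, property_name: str) -> str:
    """Option 1: the term URI is the blockURI with the property name appended."""
    return block_uri + property_name


def expand_key(key: str, context: dict) -> str:
    """Expand a JSON key to a full IRI using a flat @context (simplified)."""
    # Option 2 style: the key itself is mapped to a full term URI.
    if key in context:
        return context[key]
    # Option 1 style: "prefix:suffix", where the prefix is mapped in the context.
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    # Unknown keys are returned unchanged in this sketch.
    return key


def expanded_terms(doc: dict) -> dict:
    """Return the document's properties keyed by their expanded IRIs."""
    context = doc["@context"]
    return {expand_key(k, context): v for k, v in doc.items() if k != "@context"}


# Option 1: block renamed to "dwc", blockURI set to the Darwin Core base URI.
option1 = json.loads(
    '{"dwc:type": "<type value>",'
    ' "@context": {"dwc": "http://rs.tdwg.org/dwc/terms/"}}'
)

# Option 2: term names kept as they are, with a termURI defined per term.
option2 = json.loads(
    '{"dwcType": "<dwcType value>",'
    ' "@context": {"dwcType": "http://rs.tdwg.org/dwc/terms/type"}}'
)

print(term_from_block_uri(DWC_BASE, "type"))
print(list(expanded_terms(option1)))
print(list(expanded_terms(option2)))
```

In both cases the property expands to http://rs.tdwg.org/dwc/terms/type; the only difference is whether the mapping is declared once per block (via the prefix) or once per term.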

 

These are definitely optional changes, particularly given the discussion about further changes to Dataverse's metadata handling, but they might be useful in a new block.

Cheers,

-- Jim


Kamil Guryn

Jan 30, 2020, 10:12:39 AM
to Dataverse Users Community
Jim,

Thanks for that. I think I found an issue with the first option: after removing "dwc" from the term names, duplicate field names would appear in Solr (where they must be globally unique). For example, the field name "type" is already used in another metadata block. Please correct me if it works differently. If not, I'll choose the second option.

Best Regards,
Kamil


James Myers

Jan 30, 2020, 10:30:43 AM
to dataverse...@googlegroups.com

Kamil,

I think you're right, and that ought to be reported as an issue: metadata blocks must have unique terms, independent of any semantic web links. It looks like that is documented in http://guides.dataverse.org/en/latest/admin/metadatacustomization.html, but it seems like something that could/should be fixed in a metadata redesign. As you say, the second option is probably the best way to go then.
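
The collision Kamil describes can be sketched in a few lines of Python. This is only an illustration under assumptions: the block names and field lists are hypothetical sample data (not read from any real metadata block file), and `find_name_collisions` is a made-up helper, not part of Dataverse or Solr.

```python
from collections import defaultdict


def find_name_collisions(blocks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each field name used by more than one block to the blocks using it."""
    owners = defaultdict(list)
    for block_name, field_names in blocks.items():
        for field_name in field_names:
            owners[field_name].append(block_name)
    return {name: names for name, names in owners.items() if len(names) > 1}


# Hypothetical sample data: option 1 strips the "dwc" prefix from term names.
blocks_option1 = {
    "existing_block": ["type", "title"],
    "dwc": ["type", "scientificName"],
}

# Hypothetical sample data: option 2 keeps the prefixed names.
blocks_option2 = {
    "existing_block": ["type", "title"],
    "dwc": ["dwcType", "dwcScientificName"],
}

print(find_name_collisions(blocks_option1))  # "type" is claimed by both blocks
print(find_name_collisions(blocks_option2))  # no collisions
```

With the prefixed names the result is empty, which is why the second option (per-term termURI) avoids the problem without renaming fields.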



Oliver Bertuch

Feb 27, 2020, 7:10:14 AM
to Dataverse Users Community
Hey folks,

you guys might be interested in https://github.com/IQSS/dataverse/issues/6700 as a follow-up to my addition of that little shell script.

Please reach out there if you are interested.

Best,
Oliver