Dataverse 5.13 is here! Schema.org improvements, DVWebloader, NetCDF/HDF5 support, geospatial search, CodeMeta, and more!

Philip Durbin

Feb 14, 2023, 1:35:53 PM
to dataverse...@googlegroups.com
Hello Dataverse enthusiasts!

We just released Dataverse 5.13 and it's packed with good stuff.

To highlight a few features, there are Schema.org improvements, a new DVWebloader, NetCDF/HDF5 support, geospatial search, CodeMeta support, and many more! We've deployed the code to https://demo.dataverse.org if you'd like to kick the tires.

Enjoy! Thank you to our many contributors! As always, please let us know what you think!

Thanks,

Phil

p.s. Below are the highlights and major use cases (also at https://dataverse.org/blog/dataverse-software-513-release ) but you can find even more details and upgrade instructions at https://github.com/IQSS/dataverse/releases/tag/v5.13

# Dataverse 5.13

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

## Release Highlights

### Schema.org Improvements (Some Backward Incompatibility)

The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.

Please be advised that these improvements may break integrations that rely on the old, less compliant structure. For details, see the "backward incompatibility" section of the full release notes. (Issue #7349)
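If you maintain an integration that consumes this metadata, one quick check is to fetch the Schema.org export for one of your published datasets and compare it against what your tooling expects. A minimal sketch in Python, assuming the standard metadata export API with the `schema.org` exporter (the server URL and DOI below are placeholders):

```python
import requests

# Fetch the Schema.org JSON-LD export for a published dataset and spot-check
# the fields your integration relies on. Server URL and DOI are placeholders.
SERVER = "https://demo.dataverse.org"
PID = "doi:10.70122/FK2/EXAMPLE"  # hypothetical persistent identifier

resp = requests.get(
    f"{SERVER}/api/datasets/export",
    params={"exporter": "schema.org", "persistentId": PID},
)
resp.raise_for_status()
jsonld = resp.json()

print(jsonld.get("@type"), jsonld.get("name"), jsonld.get("license"))
```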

### Folder Uploads via Web UI (dvwebloader, S3 only)

For installations using S3 for storage and with direct upload enabled, a new tool called [DVWebloader](https://github.com/gdcc/dvwebloader) can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See [Folder Upload](https://guides.dataverse.org/en/5.13/user/dataset-management.html#folder-upload) in the User Guide for details. (PR #9096)

### Long Descriptions of Collections (Dataverses) are Now Truncated

Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)

### License Sorting

Licenses shown in the UI dropdown can now be sorted by superusers. See the [Sorting Licenses](https://guides.dataverse.org/en/5.13/installation/config.html#sorting-licenses) section of the Installation Guide for details. (PR #8697)
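The sorting itself is done through the licenses API. A rough sketch, assuming the `:sortOrder` endpoint shape described in the linked guide section (the license ID, sort value, server, and token are placeholders):

```python
import requests

SERVER = "http://localhost:8080"       # your own installation
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"  # superuser API token (placeholder)

# Assumed endpoint shape per the "Sorting Licenses" guide section:
# move the license with database ID 1 to sort position 2.
r = requests.put(
    f"{SERVER}/api/licenses/1/:sortOrder/2",
    headers={"X-Dataverse-key": API_TOKEN},
)
print(r.status_code, r.text)
```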

### Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search

Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally, this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)

### Support for NetCDF and HDF5 Files

NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and newer NetCDF 4 files are detected this way. Detection of older HDF4 files still relies on the ".hdf" file extension, as before.

For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the [dataverse-previewers](https://github.com/gdcc/dataverse-previewers) repo.

An [extractNcml](https://guides.dataverse.org/en/5.13/api/native-api.html#extract-ncml) API endpoint has been added, primarily for installations with existing NetCDF and HDF5 files. After upgrading, such installations can iterate through these files and attempt to extract an NcML file for each one.

See the [NetCDF and HDF5](https://guides.dataverse.org/en/5.13/user/dataset-management.html#netcdf-and-hdf5) section of the User Guide for details. (PR #9239)
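As a rough illustration of that workflow, a script along the following lines could loop over existing files and call the new endpoint for each one. The endpoint path and HTTP method shown are assumptions based on the guide section linked above, so verify them there; the server, token, and file IDs are placeholders.

```python
import requests

SERVER = "https://demo.dataverse.org"  # illustrative server
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"  # API token (placeholder)
FILE_IDS = [42, 43, 44]                # database IDs of existing NetCDF/HDF5 files

for file_id in FILE_IDS:
    # Assumed endpoint shape; see the "extract-ncml" section of the API Guide
    # for the exact path, method, and required permissions.
    r = requests.post(
        f"{SERVER}/api/files/{file_id}/extractNcml",
        headers={"X-Dataverse-key": API_TOKEN},
    )
    print(file_id, r.status_code)
```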

### Support for .eln Files (Electronic Laboratory Notebooks)

The [.eln file format](https://github.com/TheELNConsortium/TheELNFileFormat) is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, and more.

### Improved Security for External Tools

External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See [Authorization Options](https://guides.dataverse.org/en/5.13/api/external-tools.html#authorization-options) in the External Tools documentation for details. (PR #9001)

### Geospatial Search (API Only)

Geospatial search is supported via the Search API using two new [parameters](https://guides.dataverse.org/en/5.13/api/search.html#parameters): `geo_point` and `geo_radius`.

The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
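For example, a search limited to datasets whose indexed bounding boxes fall within a given distance (in kilometers) of a point could look like the sketch below; the coordinates and radius are arbitrary, and the server URL is illustrative.

```python
import requests

SERVER = "https://demo.dataverse.org"  # illustrative server

# geo_point is "latitude,longitude" and geo_radius is a distance in kilometers;
# see the Search API parameters linked above for the exact semantics.
resp = requests.get(
    f"{SERVER}/api/search",
    params={
        "q": "*",
        "type": "dataset",
        "geo_point": "42.3,-71.1",
        "geo_radius": "1.5",
    },
)
resp.raise_for_status()
for item in resp.json()["data"]["items"]:
    print(item["name"], item.get("global_id"))
```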

### Reproducibility and Code Execution with Binder

Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)

### CodeMeta (Software) Metadata Support (Experimental)

Experimental support for research software metadata deposits has been added.

By adding a metadata block for [CodeMeta](https://codemeta.github.io), we take another step toward adding first-class support for diverse FAIR objects, such as research software and computational workflows.

There is more work underway to make Dataverse installations around the world "research software ready."

**Note:** Like the computational workflows metadata block before it, CodeMeta is listed under [Experimental Metadata](https://guides.dataverse.org/en/5.13/user/appendix.html#experimental-metadata) in the guides. Experimental means it's brand new, opt-in, and may need tweaking based on real-world usage. We hope installations will share feedback on the new metadata block so it can be refined and lifted out of the experimental stage. (PR #7877)
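Since the block is opt-in, installations that want to try it need to load the block definition themselves. A minimal sketch, assuming you have downloaded the CodeMeta TSV (`codemeta.tsv` below is a placeholder filename) and assuming the standard admin API for loading metadata blocks; the authoritative steps are on the Experimental Metadata page linked above.

```python
import requests

SERVER = "http://localhost:8080"  # run against your own installation

# Load the CodeMeta block definition via the admin datasetfield API.
# "codemeta.tsv" is a placeholder for the TSV distributed with Dataverse.
with open("codemeta.tsv", "rb") as tsv:
    r = requests.post(
        f"{SERVER}/api/admin/datasetfield/load",
        data=tsv,
        headers={"Content-type": "text/tab-separated-values"},
    )
print(r.status_code, r.text)
```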

### Mechanism Added for Stopping a Harvest in Progress

It is now possible for a sysadmin to stop a long-running harvesting job. See [Harvesting Clients](https://guides.dataverse.org/en/5.13/admin/harvestclients.html#how-to-stop-a-harvesting-run-in-progress) in the Admin Guide for more information. (PR #9187)

### API Endpoint Listing Metadata Block Details has been Extended

The API endpoint `/api/metadatablocks/{block_id}` has been extended to include the following fields:

- `controlledVocabularyValues`: All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
- `isControlledVocabulary`: Whether or not this field has a controlled vocabulary.
- `multiple`: Whether or not the field supports multiple values.

See [Metadata Blocks](https://guides.dataverse.org/en/5.13/api/native-api.html#metadata-blocks-api) in the API Guide for details. (PR #9213)
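For instance, to list which citation-block fields have a controlled vocabulary and what values they accept (the block name `citation` and the demo server are used for illustration):

```python
import requests

SERVER = "https://demo.dataverse.org"  # illustrative server

resp = requests.get(f"{SERVER}/api/metadatablocks/citation")
resp.raise_for_status()
fields = resp.json()["data"]["fields"]

# Print each controlled-vocabulary field along with its allowed values.
for name, field in fields.items():
    if field.get("isControlledVocabulary"):
        print(name, field.get("controlledVocabularyValues"))
```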

### Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be `sslmode=require`. See the new [Database Persistence](https://guides.dataverse.org/en/5.13/installation/config.html#database-persistence) section of the Installation Guide for details. (PR #8915)

### Support for Cleaning up Leftover Files in Dataset Storage

Experimental feature: leftover files in a dataset's storage location that are not in the dataset's file list, but that follow the Dataverse naming convention for dataset files, can now be removed with the new [Cleanup Storage of a Dataset](https://guides.dataverse.org/en/5.13/api/native-api.html#cleanup-storage-of-a-dataset) API endpoint.
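A hedged sketch of calling it in dry-run mode first (the endpoint path and the `dryrun` parameter are assumptions based on the linked guide section; server, token, and DOI are placeholders):

```python
import requests

SERVER = "http://localhost:8080"        # your own installation
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"   # superuser API token (placeholder)
PID = "doi:10.70122/FK2/EXAMPLE"        # hypothetical dataset persistent ID

# Assumed endpoint shape; "dryrun" should report what would be deleted
# without actually removing anything. Verify details in the linked guide.
r = requests.get(
    f"{SERVER}/api/datasets/:persistentId/cleanStorage",
    params={"persistentId": PID, "dryrun": "true"},
    headers={"X-Dataverse-key": API_TOKEN},
)
print(r.json())
```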

### OAI Server Bug Fixed

A bug introduced in 5.12 that prevented the Dataverse OAI server from serving incremental harvesting requests from clients has been fixed in this release. (PR #9316)

## Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release not already mentioned above include:

- Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, See also [Configuration Guide](http://guides.dataverse.org/en/5.13/installation/config.html#temporary-upload-file-storage))
- To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
- Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
- A persistent identifier, [CSTR](https://www.cstr.cn/search/specification/), has been added to the Related Publication field's ID Type child field. For datasets published with CSTR IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
- Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.
