Questions about Dataverse for repository evaluation


julian...@g.harvard.edu

Jul 5, 2018, 9:11:41 AM
to Dataverse Users Community
Christy Grant from The University Corporation for Atmospheric Research is evaluating data repositories and in a support ticket last week asked questions that she couldn't find answers for in the Dataverse guides or in a repository comparative review Dataverse published last year. I let her know I'd open the discussion to the community. The questions are copied below with asterisks and I've answered what I can so far:

* 1. Customization
* We want to be able to completely customize pages or parts of pages - how is that done beyond the Dataverse Admin UI options?

* Can pages be customized beyond CSS? E.g., can we add custom sections of text, links, or other content to a page?  Can we change labels? Is there a way to add pages?  Add JavaScript?

* Can we override functionality with code by overriding Java files, etc? (but without forking)

* Metadata fields - Can we add metadata fields to the input form, schema, database?

Yes. The process for adding and editing metadata fields is being documented in this Google Doc: Dataverse 4.x Metadata Block Syntax and Semantics

* Does the Dataverse schema support multiple values (e.g. multiple Authors)?

Yes. For example, in Dataverse's citation "metadata block", depositors can add multiple authors, descriptions, alternative IDs, keywords and more. The Metadata Block Syntax and Semantics doc details how to control which fields are "repeatable".


* 2. ISO metadata - We need to support a specific custom dialect of ISO19139/19115 that is far richer than Dublin Core.  Is there a way to crosswalk ISO to the DC or whatever schema is used internally by Dataverse?

The nine metadata fields in Dataverse's geospatial metadata block are influenced by the geospatial fields in DDI Codebook 2.5:

Geographic Coverage
1. Country / Nation
2. State / Province
3. City
4. Other

5. Geographic Unit

Geographic Bounding Box
6. West Longitude
7. East Longitude
8. North Latitude
9. South Latitude

From my first brief look at the ISO 19115 standard, I would think mapping metadata from your custom ISO 19139/19115 dialect to these 9 fields would mean a lot of information loss. My first thought would be to edit Dataverse's geospatial metadata block instead, adding and editing fields you need, so the mapping is closer to 1 to 1. I hope that makes sense.
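To make the information-loss point concrete, here is a minimal sketch of what a crosswalk for just the bounding box could look like. It assumes the source record is ISO 19139 XML with a standard `gmd:EX_GeographicBoundingBox` element; the function name `bbox_from_iso` and the sample record are hypothetical, and the target keys are simply the field labels listed above, not Dataverse's internal field names. Anything in the ISO record outside this element would be dropped by a mapping like this.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 XML encodings of ISO 19115 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# ISO 19139 element name -> geospatial field label from the list above.
# The labels are for illustration only (an assumption, not internal names).
BBOX_MAP = {
    "westBoundLongitude": "West Longitude",
    "eastBoundLongitude": "East Longitude",
    "northBoundLatitude": "North Latitude",
    "southBoundLatitude": "South Latitude",
}


def bbox_from_iso(xml_text):
    """Return the four bounding box values found in an ISO 19139 record."""
    root = ET.fromstring(xml_text)
    box = root.find(".//gmd:EX_GeographicBoundingBox", NS)
    if box is None:
        return {}
    result = {}
    for iso_name, label in BBOX_MAP.items():
        node = box.find(f"gmd:{iso_name}/gco:Decimal", NS)
        if node is not None and node.text:
            result[label] = node.text.strip()
    return result


# Hypothetical sample record, trimmed to the extent element.
SAMPLE = """
<gmd:EX_Extent xmlns:gmd="http://www.isotc211.org/2005/gmd"
               xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:geographicElement>
    <gmd:EX_GeographicBoundingBox>
      <gmd:westBoundLongitude><gco:Decimal>-105.3</gco:Decimal></gmd:westBoundLongitude>
      <gmd:eastBoundLongitude><gco:Decimal>-105.2</gco:Decimal></gmd:eastBoundLongitude>
      <gmd:southBoundLatitude><gco:Decimal>39.9</gco:Decimal></gmd:southBoundLatitude>
      <gmd:northBoundLatitude><gco:Decimal>40.1</gco:Decimal></gmd:northBoundLatitude>
    </gmd:EX_GeographicBoundingBox>
  </gmd:geographicElement>
</gmd:EX_Extent>
"""

if __name__ == "__main__":
    print(bbox_from_iso(SAMPLE))
```

A richer custom metadata block would need one entry like this per ISO element you want to keep, which is why editing the block to bring the mapping closer to 1 to 1 seems preferable.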

What I'm not sure about is how customizing Dataverse's standard metadata fields, like its geo bounding box fields, affects interoperability: What would someone editing fields need to do to make sure that edited fields are still mapped well to the other metadata standards Dataverse exports (DDI, DC, DataCite and Schema.org)?

* 3. Reporting - Is there any metrics reporting for downloads, page hits, etc?  I found Curl commands (http://guides.dataverse.org/en/latest/api/metrics.html). Would we have to use those to build our own report?

The metrics API is pretty new, and I think the plan is to continue developing it. Some in the community running their own Dataverse installations have very recently shared how they're using the APIs to build custom reports, supplementing them with queries to the database. I know of three installations that use a mix of the new APIs, database queries and Google Analytics to generate reports: CGIAR's repositories, Scholars Portal and Texas Digital Repository.
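As a rough idea of what building your own report on top of the metrics API might involve, here is a minimal sketch. It assumes the endpoints follow the pattern `/api/info/metrics/{type}` and return JSON shaped like `{"status": "OK", "data": {"count": N}}`; both are assumptions to check against the metrics guide linked in the question, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Metric types assumed to be exposed by the metrics API.
METRIC_TYPES = ("dataverses", "datasets", "files", "downloads")


def metrics_url(base_url, metric):
    """Build the URL for one metric (path pattern is an assumption)."""
    return f"{base_url.rstrip('/')}/api/info/metrics/{metric}"


def parse_count(body):
    """Pull the count out of a response body (response shape is an assumption)."""
    payload = json.loads(body)
    return payload["data"]["count"]


def fetch_report(base_url, opener=urlopen):
    """Collect one count per metric type into a dict."""
    report = {}
    for metric in METRIC_TYPES:
        with opener(metrics_url(base_url, metric)) as resp:
            report[metric] = parse_count(resp.read().decode("utf-8"))
    return report


if __name__ == "__main__":
    # Requires network access to a running installation.
    for name, count in fetch_report("https://demo.dataverse.org").items():
        print(f"{name}: {count}")
```

Page hits are not covered by a script like this; those would still come from something like Google Analytics or database queries, as the installations mentioned above are doing.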

The Admin Guide includes a page on monitoring HTTP traffic, including using Google Analytics and Piwik: http://guides.dataverse.org/en/latest/admin/monitoring.html

* 4. Audit - Is there a way to audit data, e.g. check integrity of files and validity of URL links, limit folder sizes? Is there a reporting mechanism for infractions (e.g. email, diagnostics page)? Are checksums supported?

Dataverse developed an app called Miniverse that you can connect to your Dataverse installation. It has a dashboard with a lot of quality control information. You can see Harvard Dataverse's QA dashboard here: https://services.dataverse.harvard.edu/miniverse/metrics/metrics-links. It includes a report on checksums, which are supported (http://guides.dataverse.org/en/latest/installation/config.html?highlight=checksums#filefixitychecksumalgorithm). It also uses universal numerical fingerprints (http://guides.dataverse.org/en/latest/developers/unf/index.html).

Miniverse is no longer being developed. My understanding is that the metrics APIs will take over as a more sustainable and scalable method of metrics reporting.

* 5. Versioning - Does versioning support files or only datasets?


* 6. System Monitoring - Are there tools for logging, automated error notification via email, security/access breach notification?

The Monitoring page in the Admin guides includes some monitoring tools that we know work with Dataverse: http://guides.dataverse.org/en/latest/admin/monitoring.html. I'm hoping others in our community can write about other tools.


There are a few demo installations of Dataverse useful for evaluation. One is http://demo.dataverse.org/.

And here's a discussion from last summer about Dataverse and DSpace: https://groups.google.com/forum/#!msg/dataverse-community/xHCX9mWWqbo/_zv_KZ1GAwAJ

I hope this is all helpful!

Julian

Pete Meyer

Jul 5, 2018, 9:49:37 AM
to Dataverse Users Community


On Thursday, July 5, 2018 at 9:11:41 AM UTC-4, julian...@g.harvard.edu wrote:
Christy Grant from The University Corporation for Atmospheric Research is evaluating data repositories and in a support ticket last week asked questions that she couldn't find answers for in the Dataverse guides or in a repository comparative review Dataverse published last year. I let her know I'd open the discussion to the community. The questions are copied below with asterisks and I've answered what I can so far:

* 1. Customization
* We want to be able to completely customize pages or parts of pages - how is that done beyond the Dataverse Admin UI options?

* Can pages be customized beyond CSS? E.g., can we add custom sections of text, links, or other content to a page?  Can we change labels? Is there a way to add pages?  Add JavaScript?

The custom header / custom footer options can be used to add JavaScript and other HTML elements.  I don't think these were designed for large-scale customization, but they could be used for adding additional sections.
 

* Can we override functionality with code by overriding Java files, etc? (but without forking)

There have been discussions about several ways to approach this, but my understanding is that this isn't supported at the moment (others may be able to provide better insight than me, though).
 

* Metadata fields - Can we add metadata fields to the input form, schema, database?

Yes. The process for adding and editing metadata fields is being documented in this Google Doc: Dataverse 4.x Metadata Block Syntax and Semantics

* Does the Dataverse schema support multiple values (e.g. multiple Authors)?

Yes. For example, in Dataverse's citation "metadata block", depositors can add multiple authors, descriptions, alternative IDs, keywords and more. The Metadata Block Syntax and Semantics doc details how to control which fields are "repeatable".


* 2. ISO metadata - We need to support a specific custom dialect of ISO19139/19115 that is far richer than Dublin Core.  Is there a way to crosswalk ISO to the DC or whatever schema is used internally by Dataverse?

The nine metadata fields in Dataverse's geospatial metadata block are influenced by the geospatial fields in DDI Codebook 2.5:

Geographic Coverage
1. Country / Nation
2. State / Province
3. City
4. Other

5. Geographic Unit

Geographic Bounding Box
6. West Longitude
7. East Longitude
8. North Latitude
9. South Latitude

From my first brief look at the ISO 19115 standard, I would think mapping metadata from your custom ISO 19139/19115 dialect to these 9 fields would mean a lot of information loss. My first thought would be to edit Dataverse's geospatial metadata block instead, adding and editing fields you need, so the mapping is closer to 1 to 1. I hope that makes sense.

What I'm not sure about is how customizing Dataverse's standard metadata fields, like its geo bounding box fields, affects interoperability: What would someone editing fields need to do to make sure that edited fields are still mapped well to the other metadata standards Dataverse exports (DDI, DC, DataCite and Schema.org)?

I'm not sure about mappings to existing blocks, but my understanding is that custom blocks are generally not included in the various exports (although they are visible in the Dataverse native API).
 

* 3. Reporting - Is there any metrics reporting for downloads, page hits, etc?  I found Curl commands (http://guides.dataverse.org/en/latest/api/metrics.html). Would we have to use those to build our own report?

The metrics API is pretty new, and I think the plan is to continue developing it. Some in the community running their own Dataverse installations have very recently shared how they're using the APIs to build custom reports, supplementing them with queries to the database. I know of three installations that use a mix of the new APIs, database queries and Google Analytics to generate reports: CGIAR's repositories, Scholars Portal and Texas Digital Repository.

The Admin Guide includes a page on monitoring HTTP traffic, including using Google Analytics and Piwik: http://guides.dataverse.org/en/latest/admin/monitoring.html

* 4. Audit - Is there a way to audit data, e.g. check integrity of files and validity of URL links, limit folder sizes? Is there a reporting mechanism for infractions (e.g. email, diagnostics page)? Are checksums supported?


Checksums (MD5 or SHA1) are supported, and the infrastructure for automated checking of file integrity is available, but I don't believe this is built into the application.
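Since the checking itself would need to happen outside the application, an external fixity check could look something like the sketch below. It assumes you already have a mapping from file paths to the checksums recorded at deposit time; how you obtain that mapping (API or database) is not shown, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Compute a hex digest for a file, reading it in chunks."""
    digest = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(recorded, algorithm="md5"):
    """Return the paths whose current checksum differs from the recorded one.

    `recorded` maps a file path to the hex checksum stored at deposit time.
    """
    return [
        path
        for path, expected in recorded.items()
        if file_checksum(path, algorithm) != expected.lower()
    ]
```

A script like this could be run on a schedule, with any non-empty result from `find_mismatches` sent by email, which would cover the reporting part of the question.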
 