Digital preservation features in Dataverse


Sara Mannheimer

Feb 7, 2017, 3:23:26 PM
to Dataverse Users Community
Hi Dataverse community!

I'm the Data Management Librarian at Montana State University, and we're thinking of setting up a Dataverse pilot. I have a question about Dataverse's digital preservation features. I see that provenance metadata was added in a recent release. What other preservation features are available? For example, does the system run fixity checks when files are uploaded? Does it verify filetypes? Any other preservation features that you know of?

Thanks so much for your help,
Sara

Philip Durbin

Feb 7, 2017, 3:46:26 PM
to dataverse...@googlegroups.com
Hi Sara,

Sorry, provenance is planned for a *future* release but hasn't shipped yet. When looking at http://dataverse.org/goals-roadmap-and-releases please bear in mind that the current release of Dataverse is 4.6, the topmost release at https://github.com/IQSS/dataverse/releases

In terms of fixity checks, MD5 values are calculated on upload, but it's up to the people uploading files to double-check them against the MD5 values they compute on their own laptops. Something called a UNF (Universal Numerical Fingerprint) is also calculated for tabular files. You can read more at http://guides.dataverse.org/en/4.6/user/dataset-management.html#adding-a-new-dataset
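
For anyone who wants to do that manual comparison, here is a minimal sketch using only the Python standard library. It is not part of Dataverse; the function names `md5_of_file` and `matches_displayed_md5` are made up for illustration, and it assumes you have copied the MD5 value shown for the file in the Dataverse web interface.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so large data files do not have to
    fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_displayed_md5(path, displayed_md5):
    """Compare a local file's MD5 with a value copied from the file's page.

    `displayed_md5` is assumed to be the hex string shown by Dataverse;
    surrounding whitespace and letter case are ignored.
    """
    return md5_of_file(path) == displayed_md5.strip().lower()
```

Calling `matches_displayed_md5("survey.dta", "<value copied from the file page>")` (with a hypothetical file name) returns `True` only if the local copy and the uploaded copy hash to the same MD5 value.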

There's a feature in the works that does more fixity checks in the context of an rsync feature that's being added. Please look for "client-side checksums" on the roadmap link above.

Dataverse does its best to figure out file types. I want to say it uses JHOVE for this, but I'm not super familiar with this part of the code.

Another preservation feature that's top of mind is exporting the dataset metadata into standard formats such as Dublin Core and DDI. For more on this, please see http://guides.dataverse.org/en/4.6/user/dataset-management.html#supported-metadata

Oh, it'll take binary formats like Stata and create a text version from them. See http://guides.dataverse.org/en/4.6/user/dataset-management.html#tabular-data-files

I'm realizing that not all of this is at http://dataverse.org/software-features but it probably should be! And I'm sure there's stuff I'm forgetting.

I hope this helps!

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsubscribe...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/54c70a5a-6645-458d-9f38-1aa4a758d650%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sebastian Karcher

Feb 7, 2017, 4:05:12 PM
to dataverse...@googlegroups.com
Hi Sara,
you're everywhere! ;)
Amber would surely know more about this, but Alan Darnell gave a great presentation about how Scholars Portal use their Dataverse in a preservation workflow at PASIG last October, slides at https://doi.org/10.6084/m9.figshare.4141761.v1
This is also one of our biggest interests, so if people want to do a panel on this at the community meeting or so, that'd be great.
Sebastian

--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Sara Mannheimer

Feb 7, 2017, 5:10:16 PM
to Dataverse Users Community
Hi Sebastian! 

Actually, I think it's you who's everywhere! Thanks for the link to Alan Darnell's presentation—very helpful. I'm glad that the community is prioritizing preservation, even if many preservation features are still in the works.

Sara



Sara Mannheimer

Feb 7, 2017, 5:11:06 PM
to Dataverse Users Community, philip...@harvard.edu
Hi Philip,

Thanks for providing so much detail on Dataverse preservation features. This is very helpful!

Sara

