Fixity Checks and Checksum


Sherry Lake

Jan 16, 2025, 11:37:50 AM
to Dataverse Users Community

We now have a mix of checksums on our Dataverse installation.

The default UI (and DVUploader API) uses the default MD5.

We now have a lab that has files on their own S3 bucket. They are using the 3-step direct upload API commands to send those files to UVA Dataverse with SHA-1 as the checksum.

Those files in our Dataverse are now SHA-1 and the rest of our files are MD5.

Is there any way to update those SHA-1 files to MD5?

I've been told that S3 does not have "MD5" as a checksum choice?

Thanks,
Sherry Lake
University of Virginia http://dataverse.lib.virginia.edu


James Myers

Jan 16, 2025, 12:05:40 PM
to dataverse...@googlegroups.com

There’s an API for that 😊 https://guides.dataverse.org/en/latest/api/native-api.html#update-checksums-to-use-new-algorithm . Its original purpose was to let you update existing files after changing the :FileFixityChecksumAlgorithm setting (https://guides.dataverse.org/en/latest/installation/config.html#filefixitychecksumalgorithm) to a new value, but it also works for files that were uploaded using an algorithm other than the one you’ve configured. The API call won’t change files that already have the requested algorithm.
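For anyone scripting this, here is a minimal Python sketch of how the request might be built. The endpoint path (`/api/admin/updateHashValues/{alg}`), the optional `num` batch parameter, the POST method, the `X-Dataverse-key` header, and the list of allowed algorithm names are my reading of the guide page linked above and should be treated as assumptions; check them against the guide for your Dataverse version. The function name and the example server URL are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: algorithm names accepted by the API, per the linked guide.
ALLOWED_ALGORITHMS = ("MD5", "SHA-1", "SHA-256", "SHA-512")


def build_update_request(server_url, algorithm, api_token, batch_size=None):
    """Build (but do not send) a request to the update-checksums endpoint.

    The path and the 'num' parameter are assumptions taken from the guide;
    verify them before running this against a real installation.
    """
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    url = f"{server_url.rstrip('/')}/api/admin/updateHashValues/{quote(algorithm)}"
    if batch_size is not None:
        # Limit how many files are attempted in this call.
        url += f"?num={int(batch_size)}"
    return Request(url, method="POST", headers={"X-Dataverse-key": api_token})


if __name__ == "__main__":
    req = build_update_request("https://dataverse.example.edu", "MD5", "xxxx", 100)
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Working in small batches makes it easier to match the returned statistics against the server log.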

 

Other notes:

For S3, we do not rely on the hash computed by the S3 store; for multipart uploads, that hash depends on part size, etc. The checksum algorithm can be changed in all cases where the store allows Dataverse to read the file (some remote file stores and Globus stores do not).
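To illustrate why a store-side multipart hash is not a stable fixity value, here is a small Python sketch. It assumes the commonly described ETag scheme for S3 multipart uploads (MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count). That scheme is not stated in this thread and is used here only for illustration; the function name is hypothetical.

```python
import hashlib


def s3_style_multipart_etag(data: bytes, part_size: int) -> str:
    """Illustrative multipart ETag: MD5 over the per-part MD5 digests.

    Assumption: this mirrors the commonly described S3 behavior; it is
    a sketch, not a reference implementation.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    combined = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(combined).hexdigest()}-{len(parts)}"


if __name__ == "__main__":
    data = b"x" * 1000
    # The whole-file MD5 is the same no matter how the file was uploaded.
    print("whole-file MD5:", hashlib.md5(data).hexdigest())
    # The multipart-style value changes when only the part size changes.
    print("400-byte parts:", s3_style_multipart_etag(data, 400))
    print("500-byte parts:", s3_style_multipart_etag(data, 500))
```

The same bytes give different multipart-style values for different part sizes, whereas a hash computed over the whole file does not depend on how it was uploaded.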

 

When using the call above, you should watch the log. The API will only change the checksum if it is able to validate the existing one first, so it will fail for corrupt files.
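The validate-first behavior can be pictured with a short Python sketch. This is not Dataverse's code, only an illustration of the rule described above: recompute the stored algorithm's digest, and produce a new checksum only if it matches. All function names and the algorithm-name mapping are hypothetical.

```python
import hashlib

# Map the algorithm names used in this thread to hashlib names.
_HASHLIB_NAMES = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256", "SHA-512": "sha512"}


def file_digest(path, algorithm, chunk_size=1024 * 1024):
    """Stream the file through the requested hash and return the hex digest."""
    h = hashlib.new(_HASHLIB_NAMES[algorithm])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def rehash_if_valid(path, stored_algorithm, stored_checksum, target_algorithm):
    """Return the new checksum, or None if the file already uses the target.

    Raises ValueError when the stored checksum does not match the file,
    so a corrupt file is never given a fresh, misleading checksum.
    """
    if stored_algorithm == target_algorithm:
        return None  # already on the requested algorithm; nothing to change
    if file_digest(path, stored_algorithm) != stored_checksum.lower():
        raise ValueError("existing checksum does not validate; file left unchanged")
    return file_digest(path, target_algorithm)
```

Refusing to rehash a file whose stored checksum fails keeps corruption visible rather than replacing the old value with a checksum of the damaged bytes.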

 

There is also an API call to see what the Dataverse fixity algorithm is, i.e. what :FileFixityChecksumAlgorithm is set to. Uploader tools can call this to decide which algorithm to use. Recent versions of DVUploader do this, as does DVWebloader (I think; I didn’t check just now).
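A hedged Python sketch of how an uploader might ask the server which algorithm to use. Both the endpoint path (`/api/files/fixityAlgorithm`) and the response shape (`{"status": "OK", "data": {"message": "MD5"}}`) are assumptions on my part, not something stated in this thread; please confirm them in the API guide for your version. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fixity_url(server_url):
    """Assumed endpoint for reading the configured fixity algorithm."""
    return f"{server_url.rstrip('/')}/api/files/fixityAlgorithm"


def parse_fixity_response(body):
    """Extract the algorithm name from the assumed JSON response shape."""
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise RuntimeError(f"unexpected response: {payload}")
    return payload["data"]["message"]


def get_fixity_algorithm(server_url):
    """Fetch and parse the setting; requires network access to the server."""
    with urlopen(fixity_url(server_url)) as resp:
        return parse_fixity_response(resp.read().decode("utf-8"))
```

A client that checks this before a direct upload can send its checksum in the installation's configured algorithm, which avoids ending up with a mix of algorithms in the first place.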

 

Hope that helps,

  -- Jim

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dataverse-community/218f1195-0fed-4b20-8303-b374f2e1f9e2n%40googlegroups.com.

Sherry Lake

Jan 17, 2025, 8:58:11 AM
to dataverse...@googlegroups.com
Thanks, Jim.

I found that API endpoint, but wasn't sure it would do what I wanted. But it does.

Now to change/update those SHA-1 files. Then to re-bag those datasets.

I found out that ingest into our preservation system, APTrust, only accepts MD5 manifest files, so the files need to be MD5, not SHA-1.

Thanks again, Jim.
Sherry Lake

