How does UNF (and MD5) work in Dataverse?

Philipp at UiT

Jun 30, 2019, 4:43:27 AM
to Dataverse Users Community
Hi all!

I understand the basic logic behind UNFs as described in the Dataverse Guides and elsewhere. However, I have some questions about how UNF works in Dataverse:

1. Are UNFs only assigned to properly ingested tabular files?
2. Are MD5s assigned to all files that did not get a UNF?
3. In datasets that contain only one file with a UNF, is the UNF in the dataset citation identical to that file's UNF (see e.g. https://doi.org/10.18710/XG6CYW)?
4. In datasets that contain more than one file with a UNF, the UNF in the dataset citation seems to differ from every one of the file UNFs (see e.g. https://doi.org/10.18710/6JUYEY). I guess the UNF in the dataset citation is calculated from the individual file UNFs, as described in section IIb in the Dataverse Developer Guide. If this is correct, it seems somewhat strange that the UNF in the dataset citation is labeled "[fileUNF]". I would have expected this label only in the data file citation. From screenshots used in some of our presentations based on earlier versions of Dataverse, I see that UNFs in dataset citations did not carry the label "[fileUNF]". Maybe we should drop this label at the dataset level or replace it with "[datasetUNF]"?

Best, Philipp

Philip Durbin

Jul 1, 2019, 2:11:33 PM
to dataverse...@googlegroups.com
Yes, UNFs are only calculated for tabular files that are successfully ingested.

MD5s are assigned to all files regardless of whether they get a UNF. (Technically, you can use the ":FileFixityChecksumAlgorithm" setting to select a stronger fixity algorithm than MD5.) We attempt to explain this here: "Additionally, an MD5 checksum will be added for each file. If you upload a tabular file a Universal Numerical Fingerprint (UNF) will be added to this file." http://guides.dataverse.org/en/4.15/user/dataset-management.html#adding-a-new-dataset
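
For anyone who wants to check a downloaded file against the checksum that Dataverse displays, here is a minimal Python sketch. It is not Dataverse code: `file_checksum` and `matches_published_checksum` are hypothetical helper names, and the sketch assumes the installation uses MD5 or another algorithm that Python's `hashlib` recognizes by name (for example "sha256").

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(); "md5" is assumed
    here because the thread discusses MD5 checksums.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published, algorithm="md5"):
    """Compare a local file with a checksum copied from a file page."""
    # Normalize whitespace and case, since hex digests are case-insensitive.
    return file_checksum(path, algorithm) == published.strip().lower()
```

A call such as `matches_published_checksum("data.tab", "<checksum copied from the file page>")` returns True only if the bytes on disk hash to the published value. The file name here is just a placeholder.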


You're right, maybe in the citation there should be "[datasetUNF]" to complement "[fileUNF]", or one of the other solutions you're suggesting. Please feel free to create an issue at https://github.com/IQSS/dataverse/issues about this.

Phil


--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/d3a81c32-fea6-4e78-b462-7cf234f8e575%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Philipp at UiT

Jul 2, 2019, 12:23:16 AM
to Dataverse Users Community
Thanks for explaining/confirming, Phil!

"MD5s are assigned to all files regardless of whether they get a UNF."

I would also have expected that. But I can't find any MD5 on the landing page of data files that have been assigned a UNF; see the attachments. Am I looking in the wrong place?

Philipp



File_with_UNF.png
File_without_UNF.png

Philip Durbin

Jul 2, 2019, 6:49:58 AM
to dataverse...@googlegroups.com
I believe you have to go to the file level page, like this: https://dataverse.no/file.xhtml?persistentId=doi:10.18710/7TSABU/YNRZUY

If you scroll down you'll see "Original File MD5".

Philipp at UiT

Jul 3, 2019, 12:05:19 AM
to Dataverse Users Community
Ah..., thanks! Sometimes, scrolling is useful. Somewhat sloppy of me :-/ However, the discussion about it was rather SLOPI - to use Phil's words :-)


Philipp at UiT

Jul 3, 2019, 12:14:40 AM
to Dataverse Users Community
P.S.: I have just created a GitHub issue (#5988) for the suggested "datasetUNF" label at the dataset level.