Data curation tool (DCT) Qs


Janet McDougall - Australian Data Archive

Mar 12, 2021, 12:14:27 AM
to Dataverse Users Community
hi all

ADA doesn't have the DCT installed so I would like to know a bit more about how it works.

"The Data Curation Tool (DCT) allows data owners and curators to view summary statistics for variables and to create and edit variable-level metadata for any tabular file in a data set."

- I have tried the demo dataset, but can't save my edits.  Where is the metadata saved to?

- Can the variable-level metadata also be exported using the 'export metadata' button on published datasets?

- How is the extra metadata able to be viewed in a published dataset?  Does it also require the installation of the Data Explorer Tool?

- Can someone link me to a dataset I can look at to see how the variable-level metadata is rendered?

thanks
Janet

Amber Leahey

Mar 12, 2021, 12:06:09 PM
to Dataverse Users Community
Hi Janet! 

SP has a demo instance available for anyone to test this, it has both DCT and Data Explorer installed https://demodv.scholarsportal.info 

I'll try to answer some of your questions below:

- I have tried the demo dataset, but can't save my edits.  Where is the metadata saved to?
Only Admins/Dataset Admins can access the DCT for a deposited tabular datafile. The variable metadata is stored in the DV database and is versioned (in line with versioning in DV). This is a point of confusion in the workflow right now: every new edit and save in DCT creates a new version of the dataset in Dataverse. To view the latest edits in DCT, navigate back to the DV dataset, publish the latest version, and then view the changes in DCT or in Data Explorer using the updated published version. If you are still experiencing issues, let me know. Sometimes refreshing the DV page before navigating back to the DCT does the trick! (note: something to improve)

- Can the variable-level metadata also be exported using the 'export metadata' button on published datasets?
Yes, all of the variable-level metadata is exposed in the export metadata 'DDI HTML Codebook' or 'DDI' metadata options for a published dataset. 
You can see that the DDI variable level metadata including edits made using DCT are included (search for and find: <qstn><qstnLit>What is your age?</qstnLit>)
Here is a link to the DDI HTML Codebook example, which is structured so that it can be saved as a PDF, for example: https://demodv.scholarsportal.info/api/datasets/export?exporter=html&persistentId=doi%3A10.80240/FK2/WZS1RT 
All variable-level metadata is made public; to my knowledge, there is no way to restrict it once it has been published in DV (even if the files themselves are restricted access). 
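
For anyone who wants to script this, here is a minimal Python sketch that builds export links of the same shape as the one above. The function name `export_url` is made up for illustration; the only exporter values it has been checked against are the two that appear in this thread (`html` and `ddi`). It percent-encodes the slashes in the DOI as well, which is the form used in the dataset page link further down.

```python
from urllib.parse import urlencode


def export_url(site_url: str, persistent_id: str, exporter: str = "ddi") -> str:
    """Build a metadata export link shaped like the ones quoted in this thread.

    `export_url` is a hypothetical helper, not part of Dataverse itself.
    """
    # urlencode percent-encodes ':' and '/' in the persistent identifier.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{site_url.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    # Demo dataset from this thread; nothing is fetched, the link is only printed.
    print(export_url("https://demodv.scholarsportal.info",
                     "doi:10.80240/FK2/WZS1RT",
                     exporter="html"))
```

Swapping `exporter="html"` for `exporter="ddi"` gives the DDI XML link instead of the HTML codebook.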

- How is the extra metadata able to be viewed in a published dataset?  Does it also require the installation of the Data Explorer Tool?
Yes, Data Explorer is currently the way to view the variable metadata in the DV UI without viewing the DDI HTML Codebook or DDI metadata export from the Export Metadata options. Eventually, we'd like to build a discovery interface onto Dataverse that would allow researchers to search and interact with variables and variable metadata in a more detailed way but that is a future goal at this point. https://odesi.ca is currently connected to our Dataverse through the 'Canadian Dataverses' collection search option. We allow variable-level searching in this tool and can develop this further to be more closely tied to our Dataverse instance. We currently harvest and store all the DDI XML nightly in a MarkLogic database for indexing and on-demand search. 

- Can someone link me to a dataset I can look at to see how the variable-level metadata is rendered?
Here is a link to a dataset in demodv with minor DCT metadata edits: https://demodv.scholarsportal.info/dataset.xhtml?persistentId=doi%3A10.80240%2FFK2%2FWZS1RT&version=1.2 
You can view this in Data Explorer or via the export metadata options mentioned above. 
And here is a link to the DCT demo instance (no saving, since it's just a demo instance not tied to a DV or db instance): https://scholarsportal.github.io/Dataverse-Data-Curation-Tool/?dfId=40620&siteUrl=https://dataverse.scholarsportal.info


Hope that helps, let me know if you have any other questions!!

Philip Durbin

Mar 12, 2021, 1:34:43 PM
to dataverse...@googlegroups.com
With regard to this...

"All variable-level metadata is made public; to my knowledge, there is no way to restrict it once it has been published in DV (even if the files themselves are restricted access)."

The next version of Dataverse (probably 5.4) will have a fix for this. That is to say, if files are restricted, the summary statistics (the whole "dataDscr" section of DDI) are no longer available publicly. For more details, please see https://github.com/IQSS/dataverse/pull/7642


Janet McDougall - Australian Data Archive

Mar 25, 2021, 2:29:18 AM
to Dataverse Users Community
Hi Amber & Phil
Thanks for the detailed responses.  I have a few questions about the variable-level metadata that I didn't understand:

- I was going to check what 'question text' looks like in DDI-C, but I don't have Nesstar Publisher installed at the moment to test (without writing XML). I didn't notice question texts in the DDI exports, so I was wondering how they are being treated?

- What do the individual UNFs represent? There is one UNF per variable, as shown in this example:

<var ID="v24476" name="Citation" intrvl="discrete">
<location fileid="f12486"/>
<labl level="variable">Citation</labl>
<varFormat type="character"/>
<notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:Myef6XaQV00cE3So/4xkMQ==</notes>
</var>

I am in the process of looking at SDTL and DDI-CDI trying to understand how Dataverse variable level metadata may play into this space.   I meant to reply earlier but have only finished reading up on SDTL, and now venturing onto DDI-CDI...

Janet

Philip Durbin

Mar 25, 2021, 11:28:58 AM
to dataverse...@googlegroups.com
I am by no means an expert on UNF but it's basically a checksum for data. Perhaps this quote will help:

"Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed. The signature is thus independent of the storage format. E.g., the same data object stored in, say, SPSS and Stata, will have the same UNF."
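
To make the idea in that quote concrete, here is a toy Python sketch of "normalize, then hash". It is not the UNF v6 algorithm: the normalization here (convert to text and strip whitespace) and the name `toy_fingerprint` are assumptions made up for illustration, and the real rules for rounding numbers and handling strings are in the UNF documentation. It only shows why the result depends on the values rather than on the storage format.

```python
import base64
import hashlib


def toy_fingerprint(values):
    """Hash a canonical text form of the values, not the bytes of any file.

    Illustration only: this is NOT the UNF v6 normalization.
    """
    digest = hashlib.sha256()
    for value in values:
        # Assumed canonical form: stripped text, UTF-8 encoded, followed by a
        # terminator so that ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(str(value).strip().encode("utf-8") + b"\n\x00")
    # Keep the first 128 bits of the hash and base64-encode them.
    return base64.b64encode(digest.digest()[:16]).decode("ascii")


if __name__ == "__main__":
    # The same column as it might come out of two hypothetical readers:
    # one pads strings to a fixed width, the other does not.
    padded = ["yes ", "no  ", "yes "]
    plain = ("yes", "no", "yes")
    print(toy_fingerprint(padded) == toy_fingerprint(plain))
```

Because the hash is taken over the normalized values, the padded and unpadded versions of the column get the same fingerprint, while any change to the values themselves changes it.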


Janet McDougall - Australian Data Archive

Mar 31, 2021, 12:23:14 AM
to Dataverse Users Community
hi Phil
I understand how UNF is a checksum for a data file, but I was wondering how they are calculated for a variable.

Each variable has a unique UNF so I was wondering about that. I have found some doco describing how multiple UNFs are combined so presume the individual variables are already being calculated and now display as metadata per variable.  It's not really important to me, I just wondered about it. 

https://guides.dataverse.org/en/latest/developers/unf/unf-v6.html#ii-combining-multiple-unfs-to-create-unfs-of-higher-level-objects  

<var ID="v24476" name="Citation" intrvl="discrete">
<location fileid="f12486"/>
<labl level="variable">Citation</labl>
<varFormat type="character"/>
<notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:Myef6XaQV00cE3So/4xkMQ==</notes>
</var>
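
In case it helps anyone else looking at these, here is a small Python sketch that lists the per-variable UNFs from a fragment like the one above. The function name `variable_unfs` is made up for illustration, and the sketch assumes the elements carry no XML namespace, as in the snippet as pasted; a full DDI export may declare a namespace, in which case the tag names would need to be qualified.

```python
import xml.etree.ElementTree as ET

# The <var> fragment from this thread, pasted as-is (no namespace).
VAR_XML = """<var ID="v24476" name="Citation" intrvl="discrete">
<location fileid="f12486"/>
<labl level="variable">Citation</labl>
<varFormat type="character"/>
<notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:Myef6XaQV00cE3So/4xkMQ==</notes>
</var>"""


def variable_unfs(xml_text):
    """Map each variable name to the UNF recorded in its <notes> element."""
    root = ET.fromstring(xml_text)
    unfs = {}
    # iter() also yields the root itself, so a lone <var> fragment works too.
    for var in root.iter("var"):
        for note in var.findall("notes"):
            if note.get("type") == "Dataverse:UNF":
                unfs[var.get("name")] = (note.text or "").strip()
    return unfs


if __name__ == "__main__":
    for name, unf in variable_unfs(VAR_XML).items():
        print(name, unf)
```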

Amber Leahey

Apr 6, 2021, 10:57:09 AM
to Dataverse Users Community
Hi Janet, 

Just saw your message. Yes, I didn't really mark up this file well, but there is one question text added (search for: <qstn>) in this export: https://demodv.scholarsportal.info/api/datasets/export?exporter=ddi&persistentId=doi%3A10.80240/FK2/WZS1RT 

I'll work on some other examples! In the meantime I'm also attaching a heavily marked up DDI XML from our Nesstar repository if you are interested - this is what we are aiming for with DCT in Dataverse. Sounds like we might even be able to get more with help from Steve and others involved in DDI Alliance. 

Best, 
Amber

xml.zip

Janet McDougall - Australian Data Archive

Apr 6, 2021, 10:11:04 PM
to Dataverse Users Community
hi Amber
Thanks for following my question up - I should have further searched the example you gave...  I had planned to create a quick xml with Nesstar Publisher but I am waiting to have it installed again.  I presumed the DCT is aiming for the same functions as Nesstar re question and variable metadata.   I also saw the latest Dataverse 5.4 release does not export 'dataDscr' info for restricted files as Phil mentioned above.
Janet
