OAI: export of metadata field 'Controlled Access Files' available?


Janet McDougall - Australian Data Archive

May 13, 2019, 2:25:07 AM
to Dataverse Users Community
Hi All

I am mapping Dataverse OAI export (DDI) to allow metadata harvesting by an Australian research infrastructure service.  We would like to identify whether a dataset holds 'restricted' datafiles via harvested metadata.

Dataverse provides a metadata field that specifies whether there are restricted datafiles in a dataset, but I am unable to find this field mapped to any metadata standard, or discover whether it is able to be exported and harvested.  

See the field with sample content:
Controlled Access Files  There are 4 Controlled Access files in this dataset.

We could use an existing field, <restrctn> for example, to hold this information, but it would be useful to have access to the 'Controlled Access Files' field output, as it already calculates and holds the number of controlled access files in the dataset.

Does anyone have any suggestions?

Thanks
Janet

Philip Durbin

May 13, 2019, 7:55:41 PM
to dataverse...@googlegroups.com
Hi Janet,

I poked around https://dataverse.ada.edu.au but couldn't find a dataset that's using this "Controlled Access Files" metadata field you're talking about. Can you please link to one?

Thanks!

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/bb2e42ab-04fd-412d-b96d-05b455cbfe5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Janet McDougall - Australian Data Archive

May 13, 2019, 11:02:43 PM
to Dataverse Users Community
Hi Phil
If you scroll down under the TERMS tab towards the bottom you will see the field. This example dataset is typical of all ADA datasets at the moment, as they all hold at least one restricted datafile:

https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/R70QJY
Controlled Access Files. There are 4 Controlled Access files in this dataset.

This field is not mapped to any DDI elements, as I presume it is a generated field calculated from the restricted datafiles flagged in the database. I don't know how this is coded; I am only presuming.

Thanks
Janet

Steven McEachern

May 14, 2019, 5:44:12 AM
to Dataverse Users Community
Hi Phil

We have renamed “Restricted Access” to “Controlled Access” in our ADA installation.

Cheers
Steve

Philip Durbin

May 14, 2019, 6:28:35 AM
to dataverse...@googlegroups.com
Ah, thanks to both of you. I was having trouble finding an example.

In a "vanilla" non-forked installation of Dataverse, you can see something like "There are 2 restricted files in this dataset." under "Restricted Files + Terms of Access" under the "Terms" tab of a dataset like https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/RKEZEP

Here's where the number comes from in the code from a method called "getRestrictedFileCount":


It looks like you can get "dvcore:restricted":true (or false) at the file level from the "OAI_ORE" export format: https://dataverse.harvard.edu/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.7910/DVN/RKEZEP
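For anyone wanting to derive the dataset-level total from that file-level flag, here is a minimal sketch. The sample document is made up and heavily trimmed, and the "ore:describes" / "ore:aggregates" nesting in it is an assumption about the export layout; the helper name count_restricted is hypothetical. The walker itself only relies on the "dvcore:restricted" key mentioned above, so it does not depend on where in the JSON-LD the file entries sit.

```python
import json


def count_restricted(node):
    """Recursively count JSON objects that carry "dvcore:restricted": true."""
    if isinstance(node, dict):
        own = 1 if node.get("dvcore:restricted") is True else 0
        return own + sum(count_restricted(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_restricted(item) for item in node)
    return 0


# Hypothetical, trimmed stand-in for an OAI_ORE export (layout is assumed).
sample = json.loads("""
{
  "ore:describes": {
    "ore:aggregates": [
      {"schema:name": "a.zip", "dvcore:restricted": true},
      {"schema:name": "b.pdf", "dvcore:restricted": false},
      {"schema:name": "c.tab", "dvcore:restricted": true}
    ]
  }
}
""")

restricted_total = count_restricted(sample)
print(restricted_total)
```

Against a real export, the same function could be applied to the parsed response body instead of the sample.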

You can also get "restricted":true (or false) at the file level from the "dataverse_json" export format, but this is not based on a standard: https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.7910/DVN/RKEZEP
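As a rough illustration of "adding them up" on the harvesting side, the sketch below builds an export URL of the same shape as the one above and counts the file entries flagged as restricted. It assumes the file entries live under "datasetVersion" and "files" in the dataverse_json export, which should be checked against a real response; the function names and the sample document are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def export_url(base_url, exporter, persistent_id):
    """Build a dataset export URL shaped like the links in this thread."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def restricted_file_count(export_doc):
    """Count file entries with "restricted": true (assumed layout)."""
    files = export_doc.get("datasetVersion", {}).get("files", [])
    return sum(1 for entry in files if entry.get("restricted") is True)


# Hypothetical, trimmed stand-in for a parsed dataverse_json export.
sample_doc = {
    "datasetVersion": {
        "files": [
            {"label": "data.tab", "restricted": True},
            {"label": "codebook.pdf", "restricted": False},
        ]
    }
}

url = export_url(
    "https://dataverse.harvard.edu", "dataverse_json", "doi:10.7910/DVN/RKEZEP"
)
print(url)
print(restricted_file_count(sample_doc))
```

Note that urlencode percent-encodes the colon and slashes in the DOI, which is equivalent to the unencoded form used in the links above.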


It sounds like you want an OAI-PMH harvestable field that gives just the number of restricted files (rather than adding them up). Something like "numberOfRestrictedFiles: 2". I can't imagine this would be hard to add as long as there's a field to put it in. Finding the right field in the right standard is the tricky part. :)

I hope this helps,

Phil


Crosas, Mercè

May 14, 2019, 8:57:06 AM
to dataverse...@googlegroups.com
Perhaps the best place to add this is in the study description section of the DDI since that section provides a summary of all files in the dataset.



--
Mercè Crosas, Ph.D.
Harvard University's Research Data Officer, Office of Vice Provost for Research
Chief Data Science and Technology Officer, Institute for Quantitative Social Science

Janet McDougall - Australian Data Archive

May 15, 2019, 2:52:00 AM
to Dataverse Users Community
hi All

In the Dataverse DDI metadata export, the file descriptions are mapped to <otherMat> (Other Study-Related Materials). Even this definition does not really cater for datafiles, but rather for other materials, although the element is able to capture datafile information:

"Other Study-Related Materials may include: questionnaires, coding notes, SPSS/SAS/Stata setup files (and others), user manuals, continuity guides, sample computer software programs, glossaries of terms, interviewer/project instructions, maps, database schema, data dictionaries, show cards, coding information, interview schedules, missing values information, frequency files, variable maps, etc."

<otherMat ID="f2841442" URI="https://dataverse.harvard.edu/api/access/datafile/2841442" level="datafile">
<labl>
00974Mawardi-Radcliffe-Box 1 of 2.zip
</labl>
<txt>
Deidentified digitized paper data
</txt>
<notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">
application/zip
</notes>
</otherMat>
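To show what a harvesting client can currently read from such an element, here is a small sketch using Python's standard library XML parser on the snippet above. The function name describe_other_mat and the returned keys are hypothetical. The snippet is parsed without a namespace, exactly as quoted here; a full DDI export will most likely declare an XML namespace, in which case the element lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# The <otherMat> element quoted above, reproduced without a namespace.
snippet = """
<otherMat ID="f2841442" URI="https://dataverse.harvard.edu/api/access/datafile/2841442" level="datafile">
  <labl>00974Mawardi-Radcliffe-Box 1 of 2.zip</labl>
  <txt>Deidentified digitized paper data</txt>
  <notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">application/zip</notes>
</otherMat>
"""


def describe_other_mat(elem):
    """Collect the file-level details carried by one <otherMat> element."""
    mime_type = None
    for note in elem.findall("notes"):
        if note.get("type") == "DATAVERSE:CONTENTTYPE":
            mime_type = (note.text or "").strip()
    return {
        "id": elem.get("ID"),
        "uri": elem.get("URI"),
        "label": (elem.findtext("labl") or "").strip(),
        "description": (elem.findtext("txt") or "").strip(),
        "mime_type": mime_type,
    }


info = describe_other_mat(ET.fromstring(snippet))
print(info)
```

Nothing in the extracted fields says whether the file is restricted, which is the gap being discussed.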


It would be convenient to identify restrictions per datafile rather than at a dataset level element, but there doesn't seem to be an explicit 'datafile' element in DDI 2.5 that I am aware of.  I have not looked at DDI Lifecycle.  Is this under future consideration for Dataverse?  Steve knows a lot more about DDI 3&4 and may recommend I don't go there at this point...

As you say, the OAI_ORE and JSON outputs include "restricted":true. Is the JSON output built on any standard, or is it an export of all the metadata and database fields existing for a dataset?

Anyway, thanks for the responses so far, and I would like to hear further responses and/or corrections to my understanding.
Thanks
Janet  


Janet McDougall - Australian Data Archive

May 15, 2019, 5:22:56 AM
to Dataverse Users Community
Hi All
I got distracted by thinking about access info in datafile-level metadata, but as Phil says, it would also be good to have the calculated total of restricted files listed per dataset. If this were calculated content, would the DDI element/field need to be reserved for export use only, or be repeatable?

Otherwise, a potential element is under the Data Access <dataAccs> 2.4 Section, where we could manually enter the metadata, which is not as enticing as generated datafile metadata:

Extent of Collection
<collSize> 2.4.1.4
Summarizes the number of physical files that exist in a collection, recording the number of files that contain data and noting whether the collection contains machine-readable documentation and/or other supplementary files and information such as data dictionaries, data definition statements, or data collection instruments.
Example:
<collSize>1 data file + machine-readable documentation (PDF) + SAS data definition statements</collSize>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.

Anyway, am open to suggestions.

Thanks
Janet

Philip Durbin

May 16, 2019, 7:11:00 PM
to dataverse...@googlegroups.com
I'm open to suggestions as well but I wanted to respond to Janet asking about Dataverse's "native" JSON format: "Is the json output built on any standard or is it an export of all the metadata and database fields existing for a dataset?"

No, the native Dataverse JSON format is not based on any standard. It's meant to be a relatively high fidelity "dump" or backup of what's in the database. It's the same format that's used in the native API to create a dataset.

I hope this helps,

Phil


Janet McDougall - Australian Data Archive

May 16, 2019, 10:20:28 PM
to Dataverse Users Community
hi Phil
Thanks re native JSON format response  - I presumed this was the case considering the content, but wanted to verify. It is certainly useful. 

Sorry for the overload of examples - it would be great to have both the dataset (# restricted files) and file level access details generated and mapped to DDI.  I guess the datafile level details can also relate back to an earlier discussion around file level access  - https://groups.google.com/forum/#!msg/dataverse-community/FJaVCuVzkKM/Ty9XJlP7BQAJ;context-place=forum/dataverse-community.

I've had a further discussion with Steve - we have decided that we will use the <restrctn> element to textually describe whether there are restricted files in the dataset.  This metadata will be sufficient for the harvesting client in the interim.    
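As an illustration only, the interim text could be generated rather than typed by hand. The sketch below produces a <restrctn> element from a restricted-file count; the function names, the exact wording, and the "Controlled Access" default label are assumptions modelled on the sample field content earlier in this thread, not an existing Dataverse feature.

```python
import xml.etree.ElementTree as ET


def restriction_note(restricted_count, label="Controlled Access"):
    """Return a sentence describing how many files are restricted."""
    if restricted_count == 0:
        return f"There are no {label} files in this dataset."
    verb = "is" if restricted_count == 1 else "are"
    noun = "file" if restricted_count == 1 else "files"
    return f"There {verb} {restricted_count} {label} {noun} in this dataset."


def build_restrctn(restricted_count):
    """Wrap the sentence in a <restrctn> element and serialize it."""
    elem = ET.Element("restrctn")
    elem.text = restriction_note(restricted_count)
    return ET.tostring(elem, encoding="unicode")


print(build_restrctn(4))
```

A harvesting client would still have to read this as free text, so it is a stopgap rather than a machine-readable count.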

Thanks again, and I will ask Wendy at the iASSIST conference (SYDNEY!!! 27-31 May) for any suggestions.
Janet
