notes for 2019-09-10 Dataverse Community Call

Philip Durbin

Sep 13, 2019, 8:57:05 AM
to dataverse...@googlegroups.com
Great call the other day! Thanks for all the participation! Here are the notes from https://docs.google.com/document/d/1QjOBTlMuGQU_TSH3n3ZM9smOdYCZknwZN23LwTaAFI0/edit?usp=sharing

2019-09-10 Dataverse Community Call

Agenda

* Community Questions

Attendees

* Gustavo Durand (IQSS)
* Julian Gautier (IQSS)
* Slava Tykhonov (DANS/DataverseEU)
* Phil Durbin (IQSS)
* Courtney Mumma (TDL)
* Jim Myers (QDR, TDL)
* Tania Schlatter (IQSS)
* Jamie Jamison (UCLA)
* Sherry Lake (UVa)

Notes

* (Phil) Slava, can you please talk about https://github.com/IQSS/dataverse-ddi-converter-tool ?
   * (Slava) We Dockerized all the external tools from Jim in a separate Docker container: https://github.com/IQSS/dataverse-docker/tree/glam The DDI explorer from Scholars Portal is also included in the distribution.
   * We're going to deploy the infrastructure on our pre-production Kubernetes cluster. We want the community to help create mappings from DDI and to use the DDI converter tool to migrate their DDI files. There's a RESTful API. Contact Slava for details about how to test the DDI converter tool. We're going to concentrate on writing tests.
* (Slava) We started to work on a spreadsheet viewer to add to https://github.com/QualitativeDataRepository/dataverse-previewers
   * (Jim) Yes, please feel free to make a pull request.
* (Jamie) Now that we've taken in our first deposit I'm wondering how other installations check to make sure there is no identifiable data. Do you have a script for this? We have a tiny staff.
   * (Julian) JPAL has done some work on this, to programmatically check tabular data.
   * (Sherry) At UVa we don't check but we do use the popup publish text to put the responsibility on the person who has uploaded the data. They are indicating that their data is clean. See http://guides.dataverse.org/en/4.16/installation/config.html#datasetpublishpopupcustomtext
   * (Gustavo) We have workflows that can be triggered by the publish button. It's used by SBGrid to move files around on a filesystem after a user clicks "Publish" but before publication is complete. You could, in theory, add a workflow to check for identifiable data before publication is complete.
* (Gustavo) We released Dataverse 4.16.
* (Phil) We are focusing on automated testing in the current sprint. If you would like to help with https://github.com/IQSS/dataverse-jenkins please get in touch! Gustavo and I are meeting with Don Sizemore on Wednesdays at 3pm.

Don Sizemore

Sep 13, 2019, 9:07:59 AM
to dataverse...@googlegroups.com
Jamie,

We run Simson Garfinkel's Bulk Extractor against our Dataverse "files.dir" hierarchy.
In its voluminous output it correctly identified one datafile, which we already knew about.

RPMs are available here:

I hope this helps,
Donald

Sonia Barbosa

Sep 13, 2019, 12:29:12 PM
to dataverse...@googlegroups.com
Don, I'm interested in trying that tool on our production site as an initial check of data. 

Julian Gautier

Sep 13, 2019, 12:49:13 PM
to Dataverse Users Community
James Turrito from JPAL invited the community to check out (and, if possible, contribute to!) tools on GitHub, available in both Stata and R, for flagging variables in tabular data that might contain PII. "The R version is pretty basic in that it works on string matching of variables. The stata version is a little more "intelligent" and has some additional features to identify/flag variables that have high unique counts."
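
For anyone curious what those two heuristics (string matching on variable names, and flagging high unique counts) might look like, here is a minimal Python sketch. It is not JPAL's code and not a port of either tool. The keyword list, the 0.9 threshold, the sample data, and the function names are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical keyword list (an assumption, not taken from the JPAL tools).
# A real check would need a longer list reviewed by local curators.
PII_NAME_HINTS = ("name", "email", "phone", "address", "ssn", "birth", "gps")


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def flag_pii_candidates(rows, unique_ratio=0.9):
    """Return {column: [reasons]} for columns that might hold PII.

    unique_ratio is an assumed cutoff: a column is flagged when at least
    that share of its non-empty values are distinct.
    """
    flagged = {}
    if not rows:
        return flagged
    for column in rows[0].keys():
        reasons = []
        # Heuristic 1: the variable name contains a suspicious substring.
        lowered = column.lower()
        if any(hint in lowered for hint in PII_NAME_HINTS):
            reasons.append("name match")
        # Heuristic 2: almost every non-empty value is distinct.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        if values and len(set(values)) / len(values) >= unique_ratio:
            reasons.append("high unique count")
        if reasons:
            flagged[column] = reasons
    return flagged


# Made-up sample data for illustration only.
SAMPLE = """respondent_email,village,age_group
a@example.org,North,18-29
b@example.org,North,30-44
c@example.org,South,18-29
d@example.org,South,30-44
"""

if __name__ == "__main__":
    for column, reasons in flag_pii_candidates(load_rows(SAMPLE)).items():
        print(column, "->", ", ".join(reasons))
```

Both heuristics only produce candidates for a human to review. A unique ID column that is already anonymized would be flagged too, and a PII column with an innocuous name and repeated values would be missed.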