Notes from https://docs.google.com/document/d/1wnZuSMTZYsRMt52QrjoLZRYcH4FCJxr6FYb4EtzMZjE/edit?usp=sharing
2017-12-19 Dataverse Community Call
Agenda
* Community Questions
Attendees
* Danny Brooke (IQSS)
* Courtney Mumma (TDL)
* Pete Meyer (HMS)
* Julian Gautier (IQSS)
Notes
* Community Questions
* Sherry will miss the call today, but is still collecting text (paragraphs) for the Open Repository Panel Proposal:
https://docs.google.com/document/d/1ouoLnwLZK2pS8GiibHne_hKse0rxrZynJ13QQqxTsZw/edit Can someone from IQSS write a paragraph about the Dataverse community (maybe how Dataverse built a community and what the community has done to promote and enhance Dataverse)? If not, I can make something up.
* (Danny) Sure thing - I added some text
* Question about restricted files - I thought there was more info on the dataset page about what one needs to do (log in) before accessing (downloading) restricted files, which are marked with a lock. Now I do not see any text about how to get access to the restricted (locked) files.
* (Derek) I’ll shoot you an email and we can discuss this in more detail.
* (Courtney) TDR is upgrading to 4.8.4 and staying aligned with the Dataverse codebase (no forking); unlikely to upgrade again until 5.0; might use the script for moving content to S3 storage so they can handle larger files
* (Danny) Team working on script in #4321, planned for next two-week sprint
* (Courtney) TDR will also experiment with the DCM after the holidays to better handle larger datasets, e.g. datasets with upwards of 1000 small files and datasets with ~10 GB files
* (Pete) There is currently no way for one installation to support DCM and local uploads at the same time, but support for that is being planned.
* (Danny) Haven’t discussed timeline for this, but SBGrid/Harvard Medical School does have interest in it as a next step. Harvard Dataverse is also interested in this. Stay tuned for the 2018 roadmap.
* (Danny) What about other “backdoor” ways of uploading large datasets/files?
* (Courtney) They’re aware of methods for this. A link to the docs is in GitHub.
* Steps (a Python sketch of the server-side portion follows this list):
* Upload a small dummy file to the dataset that you’d like the large file to live in. Give the dummy file the same name as the file you want to upload. Repeat this process for as many files as you’d like to put into Dataverse. This way, you can organize the files in whatever way works best for curation/preservation/sharing purposes. Then get the big files to the Dataverse administrators on a thumb drive (or some other way) and provide the Dataverse links for each dummy file you’d like to replace. From there, we’ll:
* scp the large files to the production file system, copying each one over its corresponding small uploaded file
* recalculate the md5 checksum for each file
* recalculate file size
* adjust mime-type
* update these values in the db for the datafile
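* Purely to illustrate the server-side steps above, here is a minimal Python sketch. The storage path, the datafile table/column names, and the replace_dummy_file helper are assumptions made for illustration, not actual Dataverse admin tooling; check everything against your installation’s schema and storage layout before doing anything like this.
```python
# Hypothetical sketch of the server-side steps above: copy a large file over a
# dummy DataFile's storage location, recompute its md5/size/mime-type, and
# print the SQL an admin might run to update the datafile row. Table and
# column names are placeholders -- verify them against the installation's
# actual schema before running anything.
import hashlib
import mimetypes
import os
import shutil


def replace_dummy_file(large_file_path, dummy_storage_path, datafile_id):
    # 1. Copy the large file over the dummy file's location on the production
    #    file system (a plain local copy stands in for scp here).
    shutil.copyfile(large_file_path, dummy_storage_path)

    # 2. Recalculate the md5 checksum, streaming in chunks so large files
    #    don't have to fit in memory.
    md5 = hashlib.md5()
    with open(dummy_storage_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    checksum = md5.hexdigest()

    # 3. Recalculate the file size and guess the mime-type from the filename.
    filesize = os.path.getsize(dummy_storage_path)
    mime_type = mimetypes.guess_type(large_file_path)[0] or "application/octet-stream"

    # 4. Print the update for the datafile row (column names are assumptions).
    print(
        "UPDATE datafile SET checksumvalue = '{0}', filesize = {1}, "
        "contenttype = '{2}' WHERE id = {3};".format(
            checksum, filesize, mime_type, datafile_id
        )
    )
```
An admin would run something like this on the production server after receiving the large files, then apply the printed UPDATE (or whatever matches the real schema) in the database so the datafile record reflects the swapped-in file.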