notes from 2020-01-14 Dataverse Community Calls

Philip Durbin

Jan 14, 2020, 4:56:44 PM

to dataverse...@googlegroups.com

Great calls! Here are the notes from https://docs.google.com/document/d/1rpLVU_booGE1gnrlFeli-iu-hyFrt08Hb9PKASs7X2k/edit?usp=sharing

2020-01-14 Dataverse Community Call

Dataverse Community Call 1 Agenda

* 4.19 - OIDC, Python Installer
* Dataset and File Redesign Testing ( https://validately.com/unmoderated/6716a601-2d91-11ea-a2f1-42010af00531 )
* Move to Zoom for 1/28 Call
* Community Questions

Dataverse Community Call 1 Attendees

* Danny Brooke (IQSS)
* Gustavo Durand (IQSS)
* Marina McGale & Janet McDougall (ADA)
* Guang Yuan (NTU)
* Phil Durbin (IQSS)
* Jim Myers (QDR, TDL, GDCC)

Dataverse Community Call 1 Notes

* 4.19 - OIDC, Python Installer
* Dataset and File Redesign Testing ( https://validately.com/unmoderated/6716a601-2d91-11ea-a2f1-42010af00531 )
* Jump into the prototype AJPS dataset page: https://sketch.cloud/s/VYGYr/a/ZonZ7Z/play
* Jump into the prototype SBGrid dataset page: https://sketch.cloud/s/VYGYr/a/AMROE5/play
* All pages: https://sketch.cloud/s/VYGYr
* Move to Zoom for 1/28 Call
* Community Questions
* (Schusie NTU) 4.16 - Viewing Citations
* (Danny) We haven't turned on Make Data Count (MDC) yet at Harvard. We are tracking this at https://github.com/IQSS/dataverse.harvard.edu/issues/3
* (Jim) We haven't turned on MDC for *display* yet but we are logging, which I recommend. In 4.18 you can log without displaying the MDC metrics.
* (Schusie) Metrics panel for non-MDC installations?
* (Danny) Some installations use Handles, etc.
* (Schusie) From the 4.16 release notes: "As a user, I'll encounter fewer locks as I go through the publishing process." What does this mean?
* (Danny) This was a bug fix for https://github.com/IQSS/dataverse/issues/5979
* (Schusie) I looked through the 4.16 milestone on GitHub, but I'm wondering if the use cases could be linked to the GitHub issues?
* (Danny) Yes, we can do that for 4.19, for example.
* (Schusie) Rationale behind HTML Codebook Export? Human friendly. Who is the target audience? Does it help the batch metadata export (text serializations that are not human friendly)?
* (Danny) Partners with the data itself to describe the data. The target is researchers, people discovering the dataset and wanting to learn about the data.
* (Janet) That's a classic codebook from social sciences. It's exporting the metadata so you can see the variables and make decisions about which variables you're interested in.

Dataverse Community Call 2 Agenda

* 4.19 - OIDC, Python Installer
* Dataset and File Redesign Testing ( https://validately.com/unmoderated/6716a601-2d91-11ea-a2f1-42010af00531 )
* Move to Zoom for 1/28 Call
* Community Questions

Dataverse Community Call 2 Attendees

* Danny Brooke (IQSS)
* Craig Willis (NCSA/UIUC, Whole Tale)
* Deirdre Kirmis (Arizona State University)
* Julian Gautier (IQSS)
* Phil Durbin (IQSS)
* Slava (DANS/DataverseEU)
* Stefan Kasberger (AUSSDA)
* Tania Schlatter (IQSS)
* Gustavo Durand (IQSS)
* Frank Smutniak and Courtney Mumma (TDL)
* Oliver Bertuch (FZJ)
* Anna Dabrowski (Texas Advanced Computing Center, University of Texas at Austin)
* Jim Myers (GDCC, QDR, TDL)

Dataverse Community Call 2 Notes

* 4.19 - OIDC, Python Installer
* Hoping to release late this week or early next week.
* Dataset and File Redesign Prototype Testing
* Link to remote, unmoderated online prototype test: ( https://validately.com/unmoderated/6716a601-2d91-11ea-a2f1-42010af00531 )
* Participants MUST select “Yes” to the screener questions about using data and conducting research in order to pass through to the study.
* Designed mostly for researchers.
* Jump into the prototype AJPS dataset page: https://sketch.cloud/s/VYGYr/a/ZonZ7Z/play
* Jump into the prototype SBGrid dataset page: https://sketch.cloud/s/VYGYr/a/AMROE5/play
* All pages: https://sketch.cloud/s/VYGYr
* Let Tania know any feedback and questions – you don’t have to complete the study to give feedback: tschl...@g.harvard.edu
* Move to Zoom for 1/28 Call
* Community Questions
* (Courtney) I want to introduce Frank Smutniak, a developer at TDL. He'll be our lead developer for Dataverse and will attend the 2020 Dataverse Community Meeting.
* Welcome! See you on GitHub!
* (Deirdre) From Arizona State University. Piloting Dataverse. Any tricks to putting Solr on its own server?
* Don would love to throw together a “best practices in AWS” page, and welcomes input and suggestions/gotchas from the community.
* (Phil) There are some security considerations to take into account when running Solr on a different server.
* (Phil) I would suggest trying curl http://localhost:8983/solr/collection1/schema/fields or other curl commands mentioned at http://guides.dataverse.org/en/4.18.1/developers/tips.html#solr
* (Craig): I'm working on a project and want to look at dataset metrics for all datasets in a journal Dataverse (AJPS). I can scrape the values from the dataset page, but it was suggested that I file an issue to have a query run. I just want to understand the differences.
* (Danny) You are welcome to open a ticket to have us run a query by emailing sup...@dataverse.harvard.edu
* (Craig) I'm already scraping so maybe I'll keep doing that.
* (Julian) Scraping will get you the aggregate “download” metrics for each dataset (or file if you can scrape file downloads in the file table of each dataset page). Querying the database could get you additional information like when the downloads happened and what kind of “download” it was (when people click “Explore” on certain file types, e.g. to open an ingested tabular file in the Data Explore tool, that’s counted as a download as well).
* (Craig): This is exactly the info I was looking for -- so it seems to make sense to open the ticket. Thank you.
* (Stefan): ORCID integration → any plans, needs by others?
* (Danny) No plans from IQSS
* (Phil) … beyond the current integration for authentication which fills in the ORCID info in the dataset
* (Phil) There's "update users' ORCID record on dataset publication" at https://github.com/IQSS/dataverse/issues/3490
* (Jim) Isn't it also in DataCite? I think from ORCID you might be able to fetch your publications from DataCite. (It may only be Scopus and/or CrossRef, but it could be a model.)
* (Stefan) We’ll discuss it in Tromso. Gustavo will be there.
* (Slava): We’re working on Weblate as a service running on AWS to get all languages synchronized for Bundle.properties, etc. and interested to discuss this workflow with Canadian Dataverse Consortium.
* (Danny) Slava, I'll get you in touch with people from Scholars Portal.
* (Stefan): Are Jenkins tests from others available in a public repository or something similar?
* (Danny) Will ask Don to share anything relevant
* (Phil) There are some old Selenium tests here: https://github.com/IQSS/dataverse/tree/v4.18.1/tests
* (Phil) Some students from Zurich contributed tests using Cypress last year: https://github.com/IQSS/dataverse/tree/v4.18.1/tests/cypress
* (Stefan) I like the new conversation about standardizing test data: https://github.com/IQSS/dataverse-client-r/issues/44
* (Jim): Multiple stores and direct to S3 upload PRs are out
* (Multiple File Stores) https://github.com/IQSS/dataverse/pull/6488
* There used to be only one store, either "file://" or "s3://". In the pull request, by adding additional config, you can have multiple S3 stores, multiple file system stores, etc.
* The "direct upload to S3" pull request builds on the "multiple stores" pull request. You could direct big files to a certain S3 store.
* (Direct Upload) https://github.com/IQSS/dataverse/pull/6490
* Uploads the file directly to S3, then lets Dataverse know that the file is there.
* Slight change to the user interface when using direct upload.
* No unzipping in the current pull request when using direct upload.
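
On the Solr question above (Deirdre, Phil): the curl check Phil suggested can also be scripted, which may be handy once Solr lives on its own server and you want to confirm it is reachable from the Dataverse host. This is only a rough sketch, not anything from the Dataverse codebase. The function names are made up for illustration, the default host, port, and core simply mirror the localhost URL in the notes, and it assumes Solr's Schema API answers with JSON containing a "fields" list.

```python
import json
from urllib.request import urlopen


def solr_fields_url(host="localhost", port=8983, core="collection1"):
    """Build the schema/fields URL used in the curl command from the notes.

    The defaults mirror the localhost example; pass your own Solr host
    (hypothetical here) when Solr runs on a separate server.
    """
    return f"http://{host}:{port}/solr/{core}/schema/fields"


def field_names(url, opener=urlopen, timeout=10):
    """Fetch the schema fields from Solr and return their names.

    `opener` is injectable so the parsing can be exercised without a
    running Solr instance.
    """
    with opener(url, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the response wraps field definitions in a "fields" list.
    return [field["name"] for field in payload.get("fields", [])]
```

To try it, call field_names(solr_fields_url(host=...)) with your Solr host. A connection error (an OSError subclass) suggests a firewall or bind-address issue rather than a schema problem.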

Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
