notes for 2019-12-17 Dataverse Community Call

Philip Durbin

Dec 17, 2019, 8:44:15 PM
to dataverse...@googlegroups.com

2019-12-17 Dataverse Community Call

Dataverse Community Call 1 (2 AM UTC)

Dataverse Community Call 1 Agenda

* Community Questions

Dataverse Community Call 1 Attendees

* Danny Brooke (IQSS)
* Gustavo Durand (IQSS)
* Steve McEachern (ADA)
* Janet McDougall (ADA)
* Marina McGale (ADA)
* Guang Yuan (NTU)
* Phil Durbin (IQSS)

Dataverse Community Call 1 Notes

* Quick Announcements
   * 4.19, early next year (Danny)
   * 12/31 call cancelled (Danny)
   * GDCC Announcement - Jim Myers https://groups.google.com/forum/#!topic/dataverse-community/HW2YoEX_VQg (Danny)
* Community Questions
   * Flagging a concern about Kubernetes (Steve)
      * Gustavo/Steve to link here
   * Is the proper objecttype being sent to DataCite? (Susie)
      * We plan to address this as part of better support for software (Danny)

Dataverse Community Call 2 (3 PM UTC)

Dataverse Community Call 2 Agenda

* Large File Transfers
* Community Questions

Dataverse Community Call 2 Attendees

* Danny Brooke (IQSS)
* Gustavo Durand (IQSS)
* Oliver Bertuch (FZJ)
* Phil Durbin (IQSS)
* Leonid Andreev (IQSS)
* Don Sizemore (Odum)
* Sherry Lake (UVA)
* Courtney Mumma (TDL)
* Julian Gautier (IQSS)
* Tania Schlatter (IQSS)
* Jim Myers (QDR, TDL, GDCC)

Dataverse Community Call 2 Notes

* (Danny) Quick announcements
   * Dataverse 4.19 in the new year, maybe mid January.
   * Dec 31st community call is cancelled.
   * Jim Myers has joined the GDCC as senior developer and community architect: https://groups.google.com/d/msg/dataverse-community/HW2YoEX_VQg/9B0_WV5bBgAJ
* Community Questions
   * What was that question/concern about Kubernetes during call 1? (Oliver)
      * (Danny) Steve mentioned it in passing, plans to add a link.
      * (Gustavo) Huawei, the Chinese company, is very involved in Kubernetes. This would be a concern for the Australian government, especially for sensitive data.
   * How best should we answer the question "How many of the 53 installations of Dataverse are running the latest version?" (Phil)
      * (Phil) Add a "version" column to the crowdsourced spreadsheet and write a Python script as a check to see if it's out of date?
      * (Oliver) GitLab has an admin UI that tells you if you're running the latest version. It asks a web service on GitLab's side, so they can keep track of installations, even ones behind firewalls.
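
A minimal sketch of the version check Phil suggests above, in Python. It assumes each installation answers at `/api/info/version` with JSON that carries the version under `data.version`, and that version strings are plain dotted numbers. The function names and the sample installation names are hypothetical, and the "version" column of the spreadsheet is represented here as a simple dictionary.

```python
import json
import urllib.request


def parse_version(text):
    """Turn a dotted version string such as 'v4.18.1' into a tuple of ints.

    Assumes purely numeric parts; a string like '4.19-beta' raises ValueError.
    """
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_outdated(installed, latest):
    """Return True when `installed` is older than `latest`."""
    a, b = parse_version(installed), parse_version(latest)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.19' and '4.19.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def fetch_version(base_url, timeout=10):
    """Ask one installation for its version (assumed endpoint and JSON shape)."""
    url = base_url.rstrip("/") + "/api/info/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["data"]["version"]


def outdated_installations(versions, latest):
    """Given {installation name: version string}, list the names behind `latest`."""
    return sorted(name for name, v in versions.items() if is_outdated(v, latest))


if __name__ == "__main__":
    # Hypothetical rows standing in for the spreadsheet's "version" column.
    sample = {"Installation A": "4.18.1", "Installation B": "4.19", "Installation C": "4.9"}
    print(outdated_installations(sample, latest="4.19"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "4.9" would sort after "4.10". Unlike the GitLab approach Oliver describes, a script like this can only reach installations that are publicly accessible.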
* (Courtney & Jim) Large File Transfers
   * https://texasdigitallibrary.atlassian.net/wiki/spaces/TDRUD/pages/1001422849/Remote+Data+Storage+Design?atlOrigin=eyJpIjoiMTUwYTRiMjMyMTI2NDJiZTlhYTFiM2IxOGMwNDg3MWUiLCJwIjoiYyJ9
   * Jim has successfully attached remote data in S3 storage.
   * Need decisions on how the storage location is designed and designated in Dataverse. Two or more storage locations. What attribute should we use? Currently we are using the "affiliation" attribute but we want to switch to something else.
   * More institutions in Texas are interested in testing. The front end would be hosted by TDL. Storage and backups would be handled locally, by each institution.
   * Need decision about ingest. This is mentioned in the notes above.
   * (Jim) Can put content into S3 in its final location. This can be used with DVUploader. Working pretty well. In the GUI, I might be able to use the same panel. I'm close to getting this working. Do we want ingest on or off? Based on file size? What's a good community consensus on how to control these aspects?  
   * (Tania) Use cases? The data is on a user's machine?
   * (Jim) The data is too big to upload through Glassfish. Going direct to S3 avoids Glassfish and avoids a temporary copy of the file. In the case of Texas, where people might be generating big data in TACC, the data is in TACC, which has fast data transfers internally.
   * (Jim) If you do ingest, you need to pull a copy from S3 into Dataverse. Maybe it makes sense.
   * (Danny) Odum is working on modularizing ingest. Maybe someday it will run externally.
   * (Jim) As you think about shifting the architecture a bit, if ingest is happening closer to the access classes, that might be interesting. Interesting to think about how Globus would work.
   * (Courtney) Maybe we should just go ahead and do the pilot assuming all ingest functions so that we're not prohibiting development down the line. We could always flip the switch later. Maybe we should just assume ingest is on and stress test it.
   * (Jim) Maybe we could have a global switch.
   * (Jim) Accidental benefit: what if Dataverse could handle multiple stores? Using almost the same JVM options. Give it a name and a type. The database tables are pretty independent of the store. Instead of "kind of the store" it would be "the name of the store" like S3one, S3two, etc., instead of a generic S3. I hope this would be useful to people. Maybe people could be running a file store and then add some files to S3 in a second store. This could be a separate pull request.
   * Re: Ingest functions - you can already switch ingest on/off per size and/or file type
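
To make the ingest options discussed above concrete, here is a small Python sketch of one possible decision rule: a global switch, a default size limit, and optional per-file-type limits. Everything in it (the `IngestPolicy` class, its field names, and the convention that `None` means "no limit") is hypothetical and chosen for illustration. It does not model Dataverse's actual settings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IngestPolicy:
    """Hypothetical ingest policy: global switch plus size limits in bytes."""
    enabled: bool = True                    # global on/off switch
    default_limit: Optional[int] = None     # None means no size limit
    limits_by_type: Dict[str, int] = field(default_factory=dict)

    def should_ingest(self, file_type: str, size_bytes: int) -> bool:
        """Decide whether a file of this type and size would be ingested."""
        if not self.enabled:
            return False
        # A per-type limit, when present, overrides the default limit.
        limit = self.limits_by_type.get(file_type, self.default_limit)
        return limit is None or size_bytes <= limit


if __name__ == "__main__":
    two_gb = 2 * 1024 ** 3
    # A per-type limit of 0 effectively turns ingest off for that type.
    policy = IngestPolicy(default_limit=two_gb, limits_by_type={"rdata": 0})
    print(policy.should_ingest("csv", 10_000))
    print(policy.should_ingest("csv", 3 * 1024 ** 3))
    print(policy.should_ingest("rdata", 1))
```

With a rule of this shape, the pilot Courtney describes could start with ingest enabled everywhere and later "flip the switch" globally or tighten limits per type, without changing how files are stored.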
* Other Community Questions?
   * From TDL dev Nick Woodward -- checking in on an issue he reported: “there was a 'creator' key in the endpoint to retrieve a dataverse that has been replaced by an 'ownerId' key and it's unclear what it points to, either the authenticated users table or the built-in users table. In most cases it doesn't seem to point to either.” Courtney indicated that it's throwing off their reporting.
      * (Phil) You might be running into the "email address of dataverse creator visible in API" change in Dataverse 4.12: https://github.com/IQSS/dataverse/issues/5583 and https://github.com/IQSS/dataverse/pull/5655
      * (Gustavo) So Phil is correct above that the creator was removed (if you have hideEmail set to true) in PR 5655; however, it was not replaced by ownerId. ownerId has always been there and represents the parent dataverse (i.e., every object other than the Root Dataverse has an owner / parent).
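
As an illustration of Gustavo's point that ownerId refers to the parent object rather than to a user, the Python sketch below follows ownerId links up to the root. The records, ids, and aliases are made up for the example, and representing the root as a record without an `ownerId` key is an assumption of this sketch.

```python
def path_to_root(records, object_id):
    """Follow ownerId links from object_id upward; return the ids visited.

    `records` maps an object id to a dict that may contain "ownerId".
    """
    path = []
    seen = set()
    current = object_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in ownerId links")
        seen.add(current)
        path.append(current)
        # The root has no ownerId here, so .get() returns None and the loop ends.
        current = records[current].get("ownerId")
    return path


if __name__ == "__main__":
    # Hypothetical records: a root dataverse, a child, and a grandchild.
    records = {
        1: {"alias": "root"},
        2: {"alias": "university", "ownerId": 1},
        5: {"alias": "lab", "ownerId": 2},
    }
    print([records[i]["alias"] for i in path_to_root(records, 5)])
```

For reporting purposes, this means ownerId should be joined against other dataverses, not against the authenticated users or built-in users tables.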