notes from 2020-02-25 Dataverse Community Call


Philip Durbin

Feb 27, 2020, 7:01:58 PM
to dataverse...@googlegroups.com

2020-02-25 Dataverse Community Call

Dataverse Community Call 1 Agenda

* Potential Data Integrity Issue (https://groups.google.com/forum/#!topic/dataverse-community/mhlq6m-iyRY)
* 4.20 Release
* GDCC Updates
* Community Questions  

Dataverse Community Call 1 Attendees

* Danny Brooke (IQSS)
* Phil Durbin (IQSS)
* Julian Gautier (IQSS)
* Gustavo Durand (IQSS)
* Marina Mcgale (ADA)
* Janet McDougall (ADA)
* MingJing Peng (ADA)

Dataverse Community Call 1 Notes

* Potential Data Integrity Issue ( https://groups.google.com/forum/#!topic/dataverse-community/mhlq6m-iyRY )
* 4.20 Release
   * Multiple storage locations
   * Direct upload to S3
   * (Janet) We've been laying low but doing cool archival stuff, linked data, etc. Internally based at the moment.
* GDCC Updates
   * Not selected for Google Summer of Code: https://groups.google.com/d/msg/dataverse-community/2VyO0lfKB1g/ksaC9DqYCAAJ
   * Jim Myers worked on multiple storage locations, direct uploads to S3
* Community Questions
   * Question from Yuyun & Schusie, NTU, Singapore (sorry, can’t attend due to schedule clash):
      * Does Dataverse have a roadmap to make a Related Publication’s identifier (DOI, Handle) hyperlink automatically, without having to fill in the URL? Example: users key in DOI and the DOI number. Without the URL, this won’t be a clickable hyperlink.
         * (Phil) This issue (fixed in Dataverse 4.17) seems related: https://github.com/IQSS/dataverse/issues/6202 . https://researchdata.ntu.edu.sg seems to be running Dataverse 4.16.
      * Is there a way to display the statistics (number of downloads) for a sub-dataverse? We are thinking of a consortium model for Singapore (similar to Texas Data Repository: https://dataverse.tdl.org/) and would like to enable each institution (sub-dataverse) to view its own total downloads.
         * (Phil) I don't know. Maybe one of the tools listed at http://guides.dataverse.org/en/4.19/admin/reporting-tools.html would help?

Dataverse Community Call 2 Agenda

* Potential Data Integrity Issue ( https://groups.google.com/forum/#!topic/dataverse-community/mhlq6m-iyRY )
* 4.20 Release
* GDCC Updates
* Community Questions

Dataverse Community Call 2 Attendees

* Danny Brooke (IQSS)
* Oliver Bertuch (FZJ)
* Jim Myers (GDCC, QDR)
* Frank Smutniak (TDL)
* Phil Durbin (IQSS)
* Paul Boon (DANS)
* Menko de Ruijter (DANS)
* Sherry Lake (UVA)
* Jamie Jamison (UCLA)
* Tania Schlatter (IQSS)
* Anna Dabrowski (TACC)

Dataverse Community Call 2 Notes

* (Danny) Potential Data Integrity Issue ( https://groups.google.com/forum/#!topic/dataverse-community/mhlq6m-iyRY )
   * (Danny) It's likely you haven't been affected but we provide steps to run a script to detect any cases. If you do find any cases in your investigation, please feel free to reach out to sup...@dataverse.org if you'd like any help resolving them. We are adding database constraints in 4.20 to prevent this in the future. We are also planning more data integrity checks on publish.
   * (Paul) I used the script to check the database and we had something like 440 data files affected. So it's not a small problem, I guess. These are all from datasets from 2016 so I'm wondering if it may have something to do with the migration from DVN 3.x.
   * (Danny) Please email the details to sup...@dataverse.org
   * (Paul) Is this a problem now or only in 4.20?
   * (Danny) It's a problem now. The database constraint in 4.20 will prevent you from upgrading if you don't clean up first.
* (Danny) 4.20 Release
   * New role management APIs, adding, revoking.
   * Private URL enhancements: can now preview and explore.
   * Updating Solr (contributed by FZJ).
   * New API for getting the size of datasets.
   * Multiple stores for files (contributed by TDL)
      * (Jim) Files in more than one place, such as S3 and Swift. Note that there is a change required while upgrading. You need to set JVM options per store now. Release notes explain the details. With this feature you could put big files in one store, for example, and small files in another.
   * Direct S3 upload for files (also contributed by TDL).
      * (Jim) Mirrors what we have for downloads, forwarding the browser to the storage instead of pushing through the application. Browser gets a temp URL that it can use to send files to S3. Works the same as the current uploader, but files are not ingested (would kind of defeat the purpose). Intended for larger files going into Dataverse. 40-60 GB files have been tested through direct upload. Lightweight way to handle larger file transfers. Supported by the DV Uploader tool as well via a flag. Kudos to all the testing at IQSS. Lots of little bugs were found. I appreciate the help. The process and the people worked.
   * Hopefully this will go out in a few weeks.
* GDCC Updates
   * (Jim) Not selected for Google Summer of Code: https://groups.google.com/d/msg/dataverse-community/2VyO0lfKB1g/ksaC9DqYCAAJ . The competition is stiff. There are lots of large projects like Apache, Fedora, etc. The science projects were working with Google Earth or all of geology or astronomy.
   * (Jim) GDCC github repositories news - https://github.com/GlobalDataverseCommunityConsortium now has a bunch of new repositories that used to live elsewhere such as QDR. There's a new policy document explaining that these are community managed rather than GDCC managed. File Previewers can now be translated and there are only a few strings (maybe 15).
   * (Jon) GDCC is starting to develop workshops for the community meeting in June. We are accepting ideas. Please just send me an email.
   * (Jon) We had Portuguese added recently and Brazilian Portuguese is coming.
   * (Danny) We are working on a theme for the community meeting, keynote speakers, etc.
* Community Questions
   * (Oliver)
      * Things I’d like to talk about:
         * Jülich DATA coming (no. 55!)  https://data.fz-juelich.de
         * Slava also mentioned another installation is coming.
         * DataCite Test DOI URLs https://github.com/IQSS/dataverse/issues/6677
            * Willing to contribute. UI team trigger?
            * (Danny) The typical path is that Tania, Gustavo, and I have a meeting every Tuesday morning and then I'll make a decision about priorities.
         * Legal notice link https://github.com/IQSS/dataverse/issues/6676
            * Legal issue for Jülich DATA/Germany. Willing to contribute. Small thing. UI team trigger?
         * Favicon Branding (See https://groups.google.com/forum/#!topic/dataverse-community/wjH_dPaoSAY)
            * Willing to contribute. Also small, like legal notice 6676. UI team trigger?
         * Umlaut & Unicode bug https://github.com/IQSS/dataverse/issues/6675
            * Annoying bug. Willing to contribute, but no idea where to start. Help? Seems small.
      * Things I’m interested to talk about:
         * dvcli & first plugin in dev (K8s)
            * dvcli transferring to GDCC?
   * (Slava) New project called EOSC Synergy ( https://www.eosc-synergy.eu ). Partners from 10 countries will have new Dataverse installations coming, which means new countries on the Dataverse map.
   * (Oliver) dvcli came about because I'm often typing commands to manage my Dataverse installation. That way I don't have to look up the curl command every time. I use similar tools for Kubernetes and OpenStack with bash completion and all the other goodies. This could be extended to more things. I'm using Python so I don't have to write a bunch of boilerplate code. `pip install dvcli` and then install plugins with `pip install dvcli-kubernetes`. I have a KeePass store for my secrets. I wrote a script for dvcli to read those secrets and push them into the cloud.
      * (Phil) Can you talk more about profiles for developers?
      * (Oliver) Yes, maybe a plugin for developers. We can talk about the architecture for this. I was happy that Ana from IQSS sounds willing to help.
      * (Phil) What about a GUI? There are ways to spin up AWS instances (bash script) and branches using the command line but this is a barrier for some folks. If we can make it easier it would be great.
      * (Oliver) Really crazy idea: since the installer was just rewritten from Perl to Python, it could become a plugin. `pip install dvcli-installer` or something.
      * (Danny) What's the next step?
      * (Oliver) For now, let's collect ideas in the Google Doc linked from https://github.com/poikilotherm/dvcli/issues/1
      * (Oliver) Jim, do you or Jon feel like this should be transferred to GDCC?
      * (Jim) Sure, it's probably a good place for it. It's easy to move repos from one GitHub org to another. pyDataverse may come to GDCC.
      * (Oliver) Should I not compete with DVUploader?
      * (Jim) Don't worry about competing. People can use whatever tool they want. DVUploader is designed to run on a client machine. It won't curl into an admin API on localhost on a Dataverse server.
      * (Paul) It's also useful for curation purposes. You can automate metadata enhancements.
         * Interested in discussing curation use cases (Anna Dabrowski, adabr...@tacc.utexas.edu )
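
To illustrate Oliver's point above about not wanting to look up the curl command every time, here is a minimal sketch of a wrapper that maps short names to Dataverse API paths. This is not dvcli's actual code: `build_request`, `shortcut_request`, and the `SHORTCUTS` table are hypothetical names made up for this example, and `http://localhost:8080` is just an assumed default for a local installation. The request is only built, not sent.

```python
import urllib.request

# Hypothetical shortcut names mapped to Dataverse API paths.
SHORTCUTS = {
    "version": "info/version",     # public endpoint
    "settings": "admin/settings",  # admin endpoint, usually localhost only
}


def build_request(base_url, path, api_token=None):
    """Build (but do not send) a GET request for a Dataverse API path."""
    url = base_url.rstrip("/") + "/api/" + path.lstrip("/")
    request = urllib.request.Request(url, method="GET")
    if api_token:
        # API tokens are passed in the X-Dataverse-key header.
        request.add_header("X-Dataverse-key", api_token)
    return request


def shortcut_request(name, base_url="http://localhost:8080", api_token=None):
    """Look up a shortcut name and build the matching request."""
    return build_request(base_url, SHORTCUTS[name], api_token)
```

Against a running installation, something like `urllib.request.urlopen(shortcut_request("version"))` would send the request. A real tool would add argument parsing, plugins, and secret handling on top of this.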

Julian Gautier

Feb 27, 2020, 9:35:06 PM
to Dataverse Users Community
Does Dataverse have a roadmap to make a Related Publication’s identifier (DOI, Handle) hyperlink automatically, without having to fill in the URL? Example: users key in DOI and the DOI number. Without the URL, this won’t be a clickable hyperlink.

This question could be addressed in the GitHub issue at https://github.com/IQSS/dataverse/issues/5277. I wonder how depositors prefer to enter DOIs. What if they prefer entering the URL form? There was a push to promote the URL form of DOIs and discourage the display of the "canonical" form, so wouldn't we expect people to more easily grab a DOI URL? And then Dataverse could figure out that it's a DOI (some regex magic?) and populate the different fields in different metadata exports.
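
As a rough idea of what that "regex magic" might look like, here is a minimal sketch that accepts a DOI in URL form, `doi:` form, or bare form and returns both the bare DOI and its URL. It is only an illustration, not how Dataverse handles this: `normalize_doi` is a hypothetical helper, the pattern is a common heuristic rather than a full validator, and the DOI in the usage note uses a test prefix with a made-up suffix.

```python
import re

# Heuristic for modern DOIs: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

# Prefixes a depositor might type in front of the bare DOI.
PREFIX_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE
)


def normalize_doi(text):
    """Return (bare_doi, doi_url) if text looks like a DOI, else None."""
    candidate = PREFIX_PATTERN.sub("", text.strip())
    if DOI_PATTERN.fullmatch(candidate) is None:
        return None
    return candidate, "https://doi.org/" + candidate
```

With this, `normalize_doi("https://doi.org/10.5072/FK2/EXAMPLE")` and `normalize_doi("doi:10.5072/FK2/EXAMPLE")` both give the same pair, so the depositor could enter whichever form they grabbed and the metadata exports could still get the form they need.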