Community Call Tomorrow (12/4) Noon EST - NOTE New Connection Options


danny...@g.harvard.edu

Dec 3, 2018, 5:26:53 PM
to Dataverse Users Community
Hi everyone,

We'll have our regular call tomorrow. Check out the agenda here, which includes notes on how to participate:


Cheers,

Danny


Philip Durbin

Dec 4, 2018, 6:25:54 PM
to dataverse...@googlegroups.com
Great call today! I hear it was recorded, so hopefully we'll be able to put it on YouTube or somewhere similar. In the meantime, here are the notes from https://docs.google.com/document/d/1EXKqARyxdPWW7yg3TPi9-QvXME44mwcyXkogxRGoFXk/edit?usp=sharing

2018-12-04 Dataverse Community Call 
 
Agenda
 
* Code Deposit (GitHub integration)
* File Hierarchy
* Community Questions
 
Attendees
 
* Danny Brooke (IQSS)
* Oliver Bertuch (FZJ)
* Jim Myers (QDR, TDL)
* Julian Gautier (IQSS)
* Courtney Mumma (TDL)
* Craig Willis (NCSA/Whole Tale)
* Kacper Kowalik (NCSA/Whole Tale)
* Sherry Lake (UVA)
* Pete Meyer (HMS)
* Derek Murphy
* Jaime Jamison 
* Philip Durbin (IQSS)
* Jon Crabtree (Odum)
* Tania Schlatter (IQSS)
* Mike Heppler (IQSS)
 
Notes
 
* Code Deposit (GitHub integration): https://github.com/IQSS/dataverse/issues/2739 
   * Phase 1 - Import a GitHub repo into Dataverse as a .zip file. This simply gets files from GitHub into Dataverse, with no frills.
      * (Sherry) Will the zip file get unzipped?
         * (Danny) Phase 1: No.
   * Phase 2 - Similar to Zenodo’s functionality. It allows the researcher to tie their Dataverse account to their GitHub account and associate a dataset in Dataverse with one of their GitHub repos. When a release happens in GitHub, it triggers an update to the associated dataset in Dataverse.
      * (Jim) What happens if the researcher has new software, how do they specify which dataset it goes into?
         * (Danny) In phase 3, we hope to address this - software is its own dataset.
   * Phase 3 - Add software-specific metadata and align with the DataCite categories of data, (??), and software. A dataset can be added as software. Full support for software to be deposited AS software and sent to DataCite with the appropriate category.
      * (Craig) In phase 3, are we looking at the deposit as a .zip file? From the Whole Tale perspective, code in a zip file makes sense, but you miss out on file-specific metadata.
      * (Pete) Jon Crabtree left a comment at https://github.com/IQSS/dataverse/issues/4714#issuecomment-443345176 saying we should converge on a standard. The code deposit needs to play nicely with other integrations.
      * (Jim) Keep things together but annotate at the proper level. Keep code separate but be able to reference it, using metadata with a pointer to another object.
      * (Sherry) Jamie, were you talking about the R packages used in the R code?
         * (Jamie) R programmers create "data packages" and upload them to CRAN for use with their code. (This is distinct from recording the versions of "other" packages the code uses; both scenarios are possible.)
         * (Jon) Both are critical issues.
      * (Jon) The actionable items, the R code for example, are in the zip file that came from the GitHub repo. Phase 1 with the .zip is a good first step but we’ll want those files to be more actionable, maybe in phase 2 and 3.
      * (Jon) Linking between the data and code could be done using an external tool outside of Dataverse, and the result then put back in Dataverse.
      * (Phil) Code Ocean, as a graphical UI for creating Dockerfiles, is very useful.
      * (Jon) Looking at using GitLab; Git in general is good for versioning, etc. Repo to Docker (?)
* File Hierarchy: https://github.com/IQSS/dataverse/issues/2249 
   * (Danny) There are challenges in storage and versioning. We're considering storing the file hierarchy information as metadata in the database rather than keeping the files in a hierarchy on disk. Users would be able to view the hierarchy in a preview, but most of the file display would still be in a table in the typical way. Users may not be able to move things around from the UI. On download, the original structure is recreated in the zip file.
   * (Sherry) If I upload 5 files, can I create the structure on the Dataverse side?
      * (Danny) You could create the structure on the Dataverse side later.
   * (Jim) It sounds like you're trying to keep the main table. What about thousands of files? "Metadata only" sounds like a good first step but users might want to work with files in a tree.
   * (Oliver) You could run out of inodes and hit other file system limitations.
   * (Pete) Significant downsides if you want direct compute access. You'd need a level of translation that's very robust since you'll be doing IO through it.
   * (Jon) In our plan for TRSA, anything in it will have its own interface. The data will stay in the file hierarchy within TRSA.
   * (Jim) Somewhat analogous to what you're doing with S3.
   * (Phil) Maybe with S3 you can mount with FUSE or something and do computation? I don't know.
      * (Pete) Cloud Dataverse / Swift had a similar-sounding use case: access to the object store for compute outside Dataverse.
   * (Jim) Want to access using a DOI (code and data together). When accessing data via DOI maybe it doesn't matter how it's stored.
   * (Phil) In DVN 3 we had this feature of storing the path to each file in the database from a zip but it was buggy. I uploaded some screenshots to https://github.com/IQSS/dataverse-client-r/issues/18#issuecomment-327268497
   * (Jon) Our users killed us over the file hierarchy feature going away in Dataverse 4 so it would probably be good to bring it back.
   * (Oliver) Are you aware of Novell Filr? It maintains a file table of external drives using Lucene and stores metadata about the files. By using a database index, you can provide search, etc.
* Community Questions
   * Oliver: https://github.com/IQSS/dataverse/issues/5292 and related: small containers, option to bootstrap from code, non-persistent EJB timers, resources from code/config, Payara 5 vs. Eclipse Glassfish 5.1, dependency housekeeping
      * Interested in running on a Kubernetes deployment
      * (Craig) We've had some involvement with hacking on Dataverse in Docker and Kubernetes ( http://guides.dataverse.org/en/4.9.4/installation/prep.html#nds-labs-workbench-for-testing-only ). I've never heard of Payara. You've gone deep on the architecture of Dataverse, which is great but beyond what I know.
      * (Jim) Kubernetes and Docker in the future sounds like a good thing. I agree with not breaking the "classic" installation, as I've seen discussed in the IRC chat.
      * (Oliver) Sounds great so far. The community message seems to be "go ahead."
      * (Craig) Phil mentioned creating a separate git repo called "dataverse-kubernetes" or something: http://irclog.iq.harvard.edu/dataverse/2018-12-04#i_81390
      * (Oliver) A lot of the changes I need would have to be made in the main code base. Changes to the installer, for example. Maybe more bootstrapping from the code, if the root dataverse is not present.
      * (Jim) I'm in favor of changes to the code that don't break the classic installation but that provide benefit to people interested in Kubernetes.
      * (Danny) What are some good next steps? Part of the challenge is that we don't have experts in Kubernetes at IQSS.
      * (Jon) The folks in Australia have done some of this work. At Fudan, I learned that there are six universities using Dataverse. This is a challenging time for the community call for people in those time zones.
      * (Oliver) Leaving feedback in the issue above or sub issues is fine. I'm interested in connecting with core developers.

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/b20a74d2-c236-47cc-95d5-1521a83afd60%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--

Philipp at UiT

Dec 6, 2018, 1:15:27 AM
to Dataverse Users Community
Thanks for the call! Actually, I also attended it, for the first time :) Rotating the call time would be highly appreciated by European attendees. One more piece of feedback: using headphones and microphones greatly improves the sound quality!

Best,
Philipp (UiT)