Integrations dashboard for pulling in data from data management systems (iRODS, GitLab, GitHub, OSF, REDCap...)

Dieuwertje Bloemen

Feb 6, 2023, 12:12:26 PM
to Dataverse Users Community
Hi all,

In tomorrow's community call, our team at KU Leuven LIBIS will be demoing a new dashboard we've been working on for the last couple of months.
Instead of trying to set up Dataverse plug-ins in existing tools and systems to push data to a Dataverse installation, we've flipped it around and have created a dashboard from which we pull data from tools such as iRODS and GitHub into a dataset.
Its aim is to make integrations more flexible and less dependent on the cooperation of each integration's provider. You can use it either to create a dataset from scratch and add metadata after the files have been transferred, or to compare against what is already in an existing dataset, which makes updating files in datasets easier.
We're hoping to gather some feedback and input during tomorrow's call, as the dashboard is not completely finished yet. If you can't attend, a recording will be made available, and you can already check out https://github.com/libis/rdm-integration for documentation, the current code, and an example that you can install locally to test with demo.dataverse.org.

Our aim is to make it adjustable to your needs and easy to connect to other systems as well. For example, our next integrations will be REDCap and OSF, to cater to the needs of our biomedical researchers and our researchers in the humanities.

So, hopefully I'll see you tomorrow & kind regards,
Dieuwertje

Philip Durbin

Feb 7, 2023, 12:50:20 PM
to dataverse...@googlegroups.com
Hi Dieuwertje and Eryk,

Thank you for the amazing demo! GitHub, GitLab, iRODS, local filesystem... wow! The ability to pull data into Dataverse is very powerful.

I just put the video on DataverseTV: https://dataverse.org/dataversetv


(Please send me a link to the slides when you get a chance.)

As discussed, this thread is a good place for further questions and comments. I hope others join in.

As for me, as I said during the demo, I'm intrigued by how one can transfer files from a local computer, a laptop, for example. For years, thanks to Jim Myers, we've had DVUploader, but your tool has a GUI, which I'm sure many people will appreciate.

I like that once you start the transfer, you can close your web browser. Great feature.

We should go ahead and update the integrations page in the guides, especially to list tools like GitLab and iRODS that aren't currently listed: https://guides.dataverse.org/en/5.12.1/admin/integrations.html Please feel free to open an issue or pull request for this!

Yes, yes, we'll prioritize the pull requests you mentioned. Thanks for making them!

I'm wondering a bit about auth. You're using our established patterns but there's a new signed URLs option you might want to consider. This is a 5.13 feature (not released yet) but here's a preview of the docs: http://preview.guides.gdcc.io/en/develop/api/native-api.html#request-signed-url

We're also talking a lot about auth and how to enable a smoother experience for tools run outside of Dataverse. So let's keep talking. :)

I think it's cool that the tool is written in Go. I was wondering if you're using the "file replace" API but no big deal if you aren't.

Phew! I think that's enough from me. Again, great demo!

Thanks!

Phil

Dieuwertje Bloemen

Feb 8, 2023, 7:34:16 AM
to Dataverse Users Community
Hi all,
Here's a link to the PowerPoint we used, for anyone who's interested: 2023.02.07 (Dataverse community call - Integrations dashboard).pptx
Kind regards,
Dieuwertje

Eryk Kulikowski

Feb 8, 2023, 9:42:29 AM
to Dataverse Users Community
Hi Phil,

I did look at the URL signing, and it will improve the tool, so I have started implementing it. As for the "replace file(s)" native API call, I think it will have the same problem as the "add files" call: https://github.com/IQSS/dataverse/pull/9003 (resulting in "Dataset store configuration does not allow provided storageIdentifier" when not using s3 file system). Once the pull request is merged, I will replace the SWORD API "add files" with the "add file" and "replace file" native API calls.
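
To make the add/replace distinction concrete, here is a minimal sketch (not code from rdm-integration) of how a client might choose between the two native API endpoints, depending on whether the file already exists in the dataset. The endpoint paths follow the native API guide as I understand it. The helper names and the `existing_file_id` parameter are hypothetical, and no request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode


def add_file_url(server: str, persistent_id: str) -> str:
    """URL of the native "add file" call for a dataset given by its persistent ID."""
    query = urlencode({"persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/:persistentId/add?{query}"


def replace_file_url(server: str, file_id: int) -> str:
    """URL of the native "replace file" call for an existing file's database ID."""
    return f"{server.rstrip('/')}/api/files/{file_id}/replace"


def upload_url(server: str, persistent_id: str,
               existing_file_id: Optional[int] = None) -> str:
    """Pick "replace" when the file is already in the dataset, otherwise "add".

    existing_file_id is a hypothetical value that the caller would obtain by
    comparing the source against the dataset's current file listing.
    """
    if existing_file_id is not None:
        return replace_file_url(server, existing_file_id)
    return add_file_url(server, persistent_id)
```

Both calls are sent as a multipart POST with the file attached, so only the URL differs between the two cases.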

I would also like to replace the SWORD API call for deleting a file: http://preview.guides.gdcc.io/en/develop/api/sword.html#id17
However, I do not see a corresponding call in the native API. Did I miss something, or is it not implemented? If it is not implemented yet, maybe we should add it?

Eryk

Philip Durbin

Feb 8, 2023, 10:21:48 AM
to dataverse...@googlegroups.com
Sadly, a native "delete file" API doesn't exist: https://github.com/IQSS/dataverse/issues/3913

Pull requests welcome!! 😅

Thanks for your thoughts on "replace file" and signed URLs.

Go, go, go!

Phil

Eryk Kulikowski

Feb 8, 2023, 11:51:39 AM
to Dataverse Users Community
Hi,

I implemented the signed URLs, but I ran into some problems. I tested on the "develop" branch, and the signing itself works. I also know that I pass the correct user, because I get an error if the username does not exist. I know that the user created the DRAFT dataset, because it is me. And it works for downloading files (even from a DRAFT dataset):
- /api/access/datafile/xxx

However, these APIs give me errors (mostly "dataset not found", perhaps because the dataset is a DRAFT; the exception is "retrieve", which complains that the call is not authenticated, but I guess it is not meant to be used that way):
- /api/v1/mydata/retrieve
- /api/datasets/:persistentId/versions/:latest/files
- /api/datasets/:persistentId
- /api/admin/permissions/:persistentId

The username that I need for this to work comes in the header "Ajp_uid". We have our version of the integration tool deployed as a pilot behind a reverse proxy protected with Shibboleth, and I just needed to add "ShibUseHeaders on" to the configuration to get that header on the backend. I have added the frontend and backend configuration (with some documentation) so that it can work in the future. However, we first need URL signing to work for DRAFT datasets, and we need some alternative way of getting the list of datasets where the user has write permissions (/api/v1/mydata/retrieve is used for now, but it is not intended to be used outside the official Dataverse frontend). Otherwise, the signed URLs are easy to configure and implement. This is a nice feature.
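
As an illustration of the flow described above, the following sketch reads the username from the proxy header and builds the JSON body for a signing request. It is not the tool's actual code. The field names ("url", "timeOut", "httpMethod", "user") are my reading of the preview documentation for the request-signed-URL call and should be treated as assumptions to check against the guides. The helper names are hypothetical.

```python
import json
from typing import Mapping, Optional


def username_from_headers(headers: Mapping[str, str],
                          header_name: str = "Ajp_uid") -> Optional[str]:
    """Return the user name set by the reverse proxy, matching the header
    name case-insensitively; None when the header is absent."""
    wanted = header_name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def signing_request_body(url: str, user: str,
                         timeout_minutes: int = 10,
                         method: str = "GET") -> str:
    """JSON body asking the server to sign `url` on behalf of `user`.

    The field names are assumptions taken from the preview docs, not verified here.
    """
    return json.dumps({
        "url": url,
        "timeOut": timeout_minutes,
        "httpMethod": method,
        "user": user,
    })
```

Because the user name is taken from a header, this only makes sense when the backend is reachable exclusively through the Shibboleth-protected proxy.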

I will make an issue for the URL signing in the context of DRAFT datasets. I do not know if there is already something for the "retrieve" API for listing the datasets of the user?

Eryk

James Myers

Feb 8, 2023, 12:18:30 PM
to dataverse...@googlegroups.com

Hmm. The validation of signedURLs is all in one place, so it should work uniformly for most API calls (there could be ones that don’t use the common findUserOrDie() methods). One thing I’d note is that signedURLs should all be generated with the /v1/ in the URL. Calls to /api/datasets, for example, are all translated to /api/v1/datasets in a servlet before they are checked. Another is that the signing covers the URL and all its parameters, so one would sign “/api/v1/datasets/:persistentId?persistentId=doi:10.5072/FK2ABCDEF”, for example. Finally, signedURLs don’t change the extra security around the /api/v1/admin calls: those are still restricted to localhost or the extra admin key.
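
The three notes above can be sketched as a small helper that prepares the exact string to sign. This is only an illustration under the assumptions stated in this message (the /v1/ segment must be present, the query parameters are part of what is signed, and admin calls remain restricted). The function names are hypothetical and nothing here calls Dataverse.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def with_api_version(path: str, version: str = "v1") -> str:
    """Insert the version segment after /api/ unless it is already there."""
    prefix = "/api/"
    if not path.startswith(prefix):
        raise ValueError(f"not an API path: {path}")
    rest = path[len(prefix):]
    if rest == version or rest.startswith(version + "/"):
        return path
    return f"{prefix}{version}/{rest}"


def is_admin_path(path: str) -> bool:
    """True for /api/v1/admin calls, which signing does not unlock."""
    return with_api_version(path).startswith("/api/v1/admin/")


def url_to_sign(server: str, path: str,
                params: Optional[Mapping[str, str]] = None) -> str:
    """Full URL, including /v1/ and all query parameters, to hand to the signer."""
    url = server.rstrip("/") + with_api_version(path)
    if params:
        # Keep ':' and '/' readable so a DOI appears as in the example above.
        url += "?" + urlencode(params, safe=":/")
    return url
```

For example, url_to_sign("https://demo.dataverse.org", "/api/datasets/:persistentId", {"persistentId": "doi:10.5072/FK2ABCDEF"}) yields the /api/v1/ form with the persistentId parameter attached.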

 

If these notes don’t help, please post an issue (or issues) with the details of exactly what you are signing, and I can try to replicate that and figure out what’s wrong.

--   Jim

James Myers

Feb 8, 2023, 12:21:16 PM
to dataverse...@googlegroups.com

Re: This is a nice feature.

Thanks! I should also add that I think you’re the first person outside Harvard trying to use them. I hope we can switch all the Previewers to use them as well, but that hasn’t happened yet.

-- Jim

Eryk Kulikowski

Feb 8, 2023, 1:09:04 PM
to Dataverse Users Community
Hi Jim,

Adding "/v1/" did fix the problem. Only the "/api/v1/mydata/retrieve" does not work, but it does not use findUserOrDie().

Thanks!

Eryk

James Myers

Feb 8, 2023, 5:44:30 PM
to dataverse...@googlegroups.com

Cool! I reported the mydata calls in https://github.com/IQSS/dataverse/pull/9360 so hopefully the update there can include them, after which signedUrls should work for it.

Eryk Kulikowski

Feb 10, 2023, 12:30:18 PM
to Dataverse Users Community
Hi,

Quick update: I have replaced the SWORD API calls with native API calls in the integration tool. For that purpose, I have also created this pull request for deleting files using the native API:

Kind regards,
Eryk

Eryk Kulikowski

Feb 20, 2023, 9:36:48 AM
to Dataverse Users Community
Hi,

I have just released a new version of the dashboard (https://github.com/libis/rdm-integration/releases) with two new plugins: OSF and REDCap.

For some reason, I was not able to upload ZIP files with the add/replace native API. For example, when replacing, I get an error that I cannot replace one file with multiple files (it looks like the API tries to unzip the file and add its contents instead of simply adding the ZIP file). Maybe I am doing something wrong? All other file types work with the add/replace native API. For ZIP files, I have implemented a workaround via the SWORD API. However, the new release also supports URL signing, which cannot be activated until all SWORD calls are replaced (uploading ZIP files and deleting files via the native API).
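
If the server does unpack ZIP archives on upload, as the error message suggests, one possible alternative to the SWORD workaround is to wrap the archive in an outer ZIP, so that unpacking yields the original ZIP as a single file. The sketch below shows only the wrapping step. It assumes the server unpacks just the outer archive, which would need to be verified (for example against demo.dataverse.org). The function name is hypothetical.

```python
import io
import zipfile


def wrap_zip(inner_name: str, inner_bytes: bytes) -> bytes:
    """Return a new ZIP archive whose only entry is the given ZIP file.

    The inner archive is stored without recompression, since it is
    already compressed.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as outer:
        outer.writestr(inner_name, inner_bytes)
    return buffer.getvalue()
```

The returned bytes would then be uploaded in place of the original archive. Whether a replace of an existing ZIP file works this way is an open question that this sketch does not answer.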

The release notes:

Added two new plugins:
- OSF
- REDCap

Other changes:
- Replaced SWORD API file uploading with the add/replace native API (except for zip files: SWORD is still used, as the native API gives errors).
- URL signing (can be activated once the SWORD API is no longer used for deletes and file uploads).
- Error messages on job failure can be emailed to users (after configuring SMTP settings). Also, mails can be sent after successful completion, if chosen so by the user.
- Refactorings for reusability and readability.
- Several bug fixes.

Kind regards,
Eryk
