Rclone now supports Dataverse


Philip Durbin

Jul 7, 2025, 12:10:40 PM
to dataverse...@googlegroups.com
Rclone ("rsync for cloud storage") is a command-line program to sync files and directories to and from different cloud storage providers.

As of version 1.70[1] Rclone supports Dataverse! Check out the docs at https://rclone.org/doi/

We have the Renku team to thank for this new feature. (Flora Thiebaut coded[2] it up.) They added it as part of the integration between Renku and Dataverse, but we can also use Rclone directly with Dataverse without Renku in the middle. Rok Roškar (also from Renku) demo'ed this during the community call last week. You can watch the recording at https://groups.google.com/g/dataverse-community/c/2RZWXsKggSE/m/qBFGZqrKAwAJ

Thanks to the Renku team for this contribution!

Phil

p.s. I'm adding Rclone to the list of integrations in the Dataverse guides via https://github.com/IQSS/dataverse/pull/11609




Philip Durbin

Jul 7, 2025, 4:09:35 PM
to dataverse...@googlegroups.com
I played around a little with Rclone. I was able to access data from my dataset, but I still have more to learn! I tested Rclone v1.70.2.

Being a Mac user, I started with `brew install rclone`, but this eventually led to the following error when I tried the "mount" command Rok demo'ed: "2025/07/07 15:37:13 CRITICAL: Fatal error: failed to mount FUSE fs: rclone mount is not supported on MacOS when rclone is installed via Homebrew. Please install the rclone binaries available at https://rclone.org/downloads/ instead if you want to use the rclone mount command"

Yikes! Never mind! I ran `brew uninstall rclone` and downloaded the binary from the URL suggested in the error above.
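Since Dataverse support (the DOI backend) arrived in Rclone 1.70, it can be worth confirming which version ended up on your PATH after switching installs. The Python sketch below is one hypothetical way to script that check: `parse_rclone_version`, `installed_rclone_supports_doi`, and `MIN_VERSION` are illustrative names, not part of Rclone, and the code assumes the first line of `rclone version` output looks like `rclone v1.70.2`.

```python
import re
import subprocess

# Hypothetical threshold: Dataverse/DOI support arrived in rclone 1.70.
MIN_VERSION = (1, 70)


def parse_rclone_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `rclone version` output, e.g. 'rclone v1.70.2'."""
    match = re.search(r"rclone v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized rclone version output: {text!r}")
    # A missing patch component (e.g. 'rclone v1.69') is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def installed_rclone_supports_doi() -> bool:
    """Run `rclone version` (assumes rclone is on PATH) and compare with MIN_VERSION."""
    out = subprocess.run(
        ["rclone", "version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_rclone_version(out)[:2] >= MIN_VERSION
```

Running `rclone version` by hand and reading the first line does the same job; a helper like this only pays off when you are scripting a setup.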

To configure Rclone for Dataverse datasets, use the DOI remote. The documentation at https://rclone.org/doi/#configuration shows you what to do, but I'll highlight a couple of steps:

```
% rclone config
2025/07/07 15:33:18 NOTICE: Config file "/Users/PDurbin/.config/rclone/rclone.conf" not found - using defaults
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Enter name for new remote.
name> osh

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
```

(I chose "osh" simply as a shorthand for my "Open Source at Harvard" dataset.)

When presented with a loooong list of storage options, I picked "DOI datasets" by its number (13 as of this writing).

```
Storage> 13

Option doi.
The DOI or the doi.org URL.
Enter a value.
doi> https://doi.org/10.7910/DVN/TJCLKP
```

After quitting the config tool, I was able to see the config like this:

```
% rclone config show osh
[osh]
type = doi
doi = https://doi.org/10.7910/DVN/TJCLKP
```
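The stanza above is plain INI text stored in the config file named in the first `rclone config` prompt (`~/.config/rclone/rclone.conf` here; `rclone config file` prints the path in use). As a minimal sketch, assuming an unencrypted config file, the same remote could be written from Python with the standard library's `configparser`. `write_doi_remote` is a hypothetical helper; the interactive `rclone config` flow above remains the documented route.

```python
import configparser
from pathlib import Path


def write_doi_remote(conf_path: Path, name: str, doi: str) -> None:
    """Add or replace a DOI remote in an rclone config file (hypothetical helper).

    Produces a section like the one `rclone config show osh` printed:
        [osh]
        type = doi
        doi = https://doi.org/10.7910/DVN/TJCLKP
    Assumes the config file is NOT password-encrypted.
    """
    # Only "=" as delimiter and no %-interpolation, so values such as
    # URLs or JSON tokens already in the file are read back verbatim.
    config = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    config.optionxform = str  # keep key names exactly as written
    config.read(conf_path)  # a missing file is silently treated as empty
    config[name] = {"type": "doi", "doi": doi}
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with conf_path.open("w") as fh:
        config.write(fh)  # note: comments in the original file are not preserved
```

For example, `write_doi_remote(Path.home() / ".config" / "rclone" / "rclone.conf", "osh", "https://doi.org/10.7910/DVN/TJCLKP")` would reproduce the `[osh]` remote. Because `configparser` rewrites the whole file and drops any comments, the interactive tool is the safer choice for a config you have hand-annotated.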

And list files in my dataset like this:

```
% rclone ls osh:
    21119 data/2023-01-03.tsv
      590 code/language.py
   283920 data/primary/primary-data.zip
```
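`rclone ls` prints one line per file: the size in bytes, right-aligned, then the path relative to the remote root. To use that listing from a script, a small parser is enough. This is a sketch; `parse_rclone_ls` is a hypothetical name, and in practice you would feed it the captured output of `subprocess.run(["rclone", "ls", "osh:"], capture_output=True, text=True, check=True).stdout`. Rclone also offers `rclone lsjson`, which emits JSON and is probably the sturdier choice for anything beyond a quick script.

```python
def parse_rclone_ls(output: str) -> list[tuple[int, str]]:
    """Turn `rclone ls` text into (size_in_bytes, path) pairs.

    Each line is a right-aligned byte count, one space, then the path,
    e.g. "    21119 data/2023-01-03.tsv". Paths may contain spaces.
    """
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, path = line.lstrip().split(" ", 1)
        entries.append((int(size), path))
    return entries


# The listing from the message above, pasted in as sample input.
listing = """\
    21119 data/2023-01-03.tsv
      590 code/language.py
   283920 data/primary/primary-data.zip
"""
entries = parse_rclone_ls(listing)
total = sum(size for size, _ in entries)
print(f"{len(entries)} files, {total} bytes")  # 3 files, 305629 bytes
```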

I created a directory called "foo" to mount the files from my (remote) dataset into:

```
% mkdir foo
```

Again, the "rclone mount osh: foo" command didn't "just work" (see above), but at https://forum.rclone.org/t/mount-apple-m1-mbp-to-gsuite-drive-best-method-in-2024/44867/2 I found a suggestion to use nfsmount instead:

```
% rclone nfsmount osh: foo
2025/07/07 16:04:16 WARNING: context.Background: NFS writes don't work without a cache, the filesystem will be served read-only
2025/07/07 16:04:16 NOTICE: NFS Server running at 127.0.0.1:56555
```
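In the transcript above, `rclone nfsmount` keeps running in the first terminal, which is why the files are read from a second one. A script that starts the mount and then reads the files has to wait until the mount is actually live. One hedged way to do that from Python is to poll `os.path.ismount`; `wait_for_mount` and its timeout defaults are illustrative assumptions, not anything Rclone provides.

```python
import os
import time


def wait_for_mount(mountpoint: str, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until `mountpoint` is a mount point; give up after `timeout` seconds.

    Hypothetical helper: returns True once mounted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.ismount(mountpoint):
            return True
        time.sleep(interval)
    return os.path.ismount(mountpoint)  # one last check at the deadline
```

With the directory from this example, a script would call `wait_for_mount("foo")` before touching anything under `foo/`.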

Then, in a new terminal window, I was able to see and access my data:

```
% ls foo
code data
% find foo
foo
foo/code
foo/code/language.py
foo/data
foo/data/2023-01-03.tsv
foo/data/primary
foo/data/primary/primary-data.zip
% head -2 foo/data/2023-01-03.tsv
stars language updated issues size forks watchers repo created description
732 Java 2023-01-03 1345 123092 411 732 https://github.com/IQSS/dataverse 2013-11-01 Open source research data repository software
```
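Once mounted, the dataset's files behave like local files, so ordinary tools and libraries can read them. As a sketch, assuming `2023-01-03.tsv` is tab-delimited UTF-8 (the extension suggests so, though the tabs show up as spaces in the `head` output above), Python's `csv` module can read the first rows as dictionaries keyed by the header. `read_tsv_head` is a hypothetical helper.

```python
import csv
import itertools
from pathlib import Path


def read_tsv_head(path: Path, n: int = 1) -> list[dict[str, str]]:
    """Return the first `n` data rows of a tab-separated file, keyed by header column."""
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        # islice stops after n rows, so a large file is not read in full
        return list(itertools.islice(reader, n))
```

For example, `read_tsv_head(Path("foo/data/2023-01-03.tsv"))[0]["repo"]` would return the repository URL from the first data row, provided the columns are as shown above.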

Pretty neat!

Phil
