Dataverse Support in Python Library Pooch


Dominic Kempf

Mar 8, 2023, 10:30:40 AM
to Dataverse Users Community
Dear Dataverse community,

I am writing to let you know that the Python library Pooch recently added support for Dataverse repositories (in v1.7.0). Pooch is a BSD-3-licensed Python package for downloading data from within Python code. Its features include:
* On-demand download and caching in OS-specific paths
* Automatic checksum verification
* Multiple download protocols
* Built-in support for archive decompression upon download
* Optional progress bars and logging statements
* Automatic download from sources specified by DOIs
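To make the first two items in the list concrete, here is a minimal sketch of the cache-then-verify idea using only the standard library. It is an illustration, not Pooch's implementation: `fetch_cached`, `file_sha256`, and the `download` callable are hypothetical names introduced for this example, and SHA-256 is assumed as the hash.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(name, expected_hash, cache_dir, download):
    """Return the local path of `name`, downloading it only when needed.

    `download` is a hypothetical callable taking (name, target_path) that
    writes the file; it stands in for any real transfer protocol.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    # Reuse the cached copy only if its checksum still matches.
    if not (target.exists() and file_sha256(target) == expected_hash):
        download(name, target)
        # Verify the fresh download and discard it if it is corrupt.
        if file_sha256(target) != expected_hash:
            target.unlink()
            raise ValueError(f"checksum mismatch for {name}")
    return target
```

Calling `fetch_cached` twice with the same arguments should invoke `download` only once; the second call is served from the cache directory.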

For Dataverse users, the DOI resolution feature is the most interesting part. You can access data simply by specifying the DOI of a Dataverse-hosted dataset, for example:

```
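import pooch

# Create a Pooch instance that caches downloads in an OS-specific directory
data = pooch.create(base_url="doi:10.11588/data/TJNQZG", path=pooch.os_cache("myproject"))
# Fill the registry with the file names and checksums reported by the repository
data.load_registry_from_doi()

# Download the file on first use and get back its local path
datafile = data.fetch("nkd_fpl_valley_TF.json")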
```

This resolves the DOI, determines the data repository type (all Dataverse instances are supported), queries the Dataverse API for the contained data files and their checksums, then downloads the specified file on demand and stores its local path in the datafile variable. A second request for the same file returns the cached version.
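The "query the API for files and checksums" step can be sketched roughly as follows. This is a simplified illustration rather than Pooch's actual code: it assumes the dataset JSON lists files under `data.latestVersion.files`, each with a `dataFile` entry carrying `filename` and `md5`, and the sample payload (second file name and both checksums) is made up so the example runs offline. `registry_from_dataverse` is a hypothetical helper name.

```python
import json

# Made-up sample shaped like a Dataverse dataset response (assumed layout).
SAMPLE_RESPONSE = """
{
  "status": "OK",
  "data": {
    "latestVersion": {
      "files": [
        {"dataFile": {"filename": "nkd_fpl_valley_TF.json",
                      "md5": "0123456789abcdef0123456789abcdef"}},
        {"dataFile": {"filename": "README.txt",
                      "md5": "fedcba9876543210fedcba9876543210"}}
      ]
    }
  }
}
"""


def registry_from_dataverse(payload):
    """Map each file name to a 'md5:<hash>' string, as a registry would."""
    files = payload["data"]["latestVersion"]["files"]
    return {
        entry["dataFile"]["filename"]: "md5:" + entry["dataFile"]["md5"]
        for entry in files
    }


registry = registry_from_dataverse(json.loads(SAMPLE_RESPONSE))
```

With a registry like this in hand, a later fetch only needs the file name: the checksum to verify against is already known.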

For more information, see the following sources:
* GitHub repository: https://github.com/fatiando/pooch

Pooch is available from PyPI and conda-forge.

I am happy to answer your questions,
Best,
Dominic

Philip Durbin

Mar 10, 2023, 11:05:20 AM
to dataverse...@googlegroups.com
Dominic, this is fantastic! Thank you! Would you be interested in creating a pull request to add Pooch to our list of client libraries? I just opened an issue to explain how: https://github.com/IQSS/dataverse/issues/9433

Also, we're always looking for demos for our bi-weekly community calls. If you'd like to show us what Pooch can do, I'm sure there would be interest!

Thanks!

Phil



Dominic Kempf

Mar 10, 2023, 11:31:52 AM
to Dataverse Users Community
Hey Phil,

Thanks for your answer! I will prepare the PR.

Also, I have a set of slides on this ready, so I could give a short presentation at your community call. I can only attend the 3 PM UTC call. I could make either the next one or the one after that, depending on whether you already have enough content booked for the next one. Just let me know.

Best,
Dominic

Philip Durbin

Apr 12, 2023, 9:29:47 AM
to dataverse...@googlegroups.com
Dominic and I have scheduled the Pooch demo for May 2nd!

Please find the Zoom link, etc. at https://dataverse.org/community-calls

Thanks,

Phil

p.s. Dominic did make a pull request to add Pooch to the list of client libraries (thanks!) so it'll be easy to find it in the API Guide after the next release of Dataverse: https://github.com/IQSS/dataverse/pull/9445

p.p.s. In related news, my dog got a haircut yesterday. Please see attached.

Midnight.png

Philip Durbin

May 2, 2023, 2:20:11 PM
to dataverse...@googlegroups.com
Dominic, thanks for the fantastic presentation!

To all, the talk was called "Pooch - Easily interact with data repositories from Python code."

"Demo of the Pooch library for data downloading from Python library code including its capabilities to work directly with DOIs pointing to Dataverse repositories."

I just added it to DataverseTV: https://dataverse.org/dataversetv


Dominic, I did have one more thought after we hung up. If you want, you could also download auxiliary files and provenance files. That's right! In Dataverse your files can have files!

Thanks again!

Phil

ps. Docs on these:
