How to get datafiles of a specific dataset with the Dataverse API

nansu

Feb 25, 2016, 5:39:40 PM
to Dataverse Users Community
Hi:

I want to get all the datafiles of a specific dataset with the Dataverse API. I have tried the "subtree" parameter, but it does not work, since subtree only narrows the search down to a particular dataverse. So if I already have the metadata of a dataset, how can I extract the metadata of all the files in that dataset? Any help would be appreciated.

Philip Durbin

Feb 25, 2016, 7:00:30 PM
to dataverse...@googlegroups.com
Sadly, the answer is a bit complicated until https://github.com/IQSS/dataverse/pull/2893 is merged and tagged in a release.

Until then, please attempt to follow my advice at https://groups.google.com/d/msg/dataverse-community/1Rp0mWetf-0/bQ97_9StCQAJ

Especially this...

curl https://apitest.dataverse.org/api/datasets/10

... to get the metadata of the files once you know the dataset's database id (10 in this example). That pull request adds the ability to use the dataset's DOI instead, as a fix for https://github.com/IQSS/dataverse/issues/1837
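For anyone who prefers a script to curl, here is a minimal Python sketch of the same request using only the standard library. It assumes the response nests the files under data.latestVersion.files, with a datafile.id for each entry, as in the file metadata shown in a later message in this thread (older versions may differ). The names fetch_dataset_by_id and list_files are hypothetical helpers, not part of any Dataverse client library.

```python
import json
import urllib.request

# Demo installation used in the curl example above.
BASE_URL = "https://apitest.dataverse.org"


def fetch_dataset_by_id(base_url, dataset_id):
    """GET /api/datasets/{id} and return the decoded JSON (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/api/datasets/{dataset_id}") as resp:
        return json.load(resp)


def list_files(dataset_json):
    """Return (label, datafile id) pairs from data.latestVersion.files (assumed shape)."""
    files = dataset_json["data"]["latestVersion"]["files"]
    return [(f["label"], f["datafile"]["id"]) for f in files]


# Example (requires network):
#   for label, file_id in list_files(fetch_dataset_by_id(BASE_URL, 10)):
#       print(file_id, label)
```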

I hope this helps!

Phil



Philip Durbin

Mar 16, 2016, 2:47:10 PM
to dataverse...@googlegroups.com
Good news! This just got *way* easier. :)

We just deployed Dataverse 4.3 to https://dataverse.harvard.edu and here's how you can get the metadata for files using a dataset's DOI:

curl "https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI" | jq '.data.latestVersion.files[0,1,3]'

See the p.s. below for the output. I'm using `jq` to pick out the part of the JSON I'm interested in and to limit the output to a sane number of files (three: indices 0, 1, and 3). Note that the id of each datafile is exposed; you need it to download the files with the Data Access API: http://guides.dataverse.org/en/4.3/api/dataaccess.html
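A Python equivalent of the jq filter, plus the step from datafile id to download URL, might look like the sketch below. The /api/access/datafile/{id} path is taken from the Data Access API guide linked above; please verify it against the guide for your Dataverse version. pick_files and download_url are hypothetical helper names.

```python
# Installation used in the curl example above.
BASE_URL = "https://dataverse.harvard.edu"


def pick_files(dataset_json, indices):
    """Mimic jq '.data.latestVersion.files[0,1,3]': return the files at the given indices."""
    files = dataset_json["data"]["latestVersion"]["files"]
    return [files[i] for i in indices]


def download_url(base_url, file_entry):
    """Build a Data Access API URL from a file entry's datafile id.

    Assumes the /api/access/datafile/{id} form described in the Data Access guide.
    """
    return f"{base_url}/api/access/datafile/{file_entry['datafile']['id']}"


# Example with a dataset JSON fetched from the :persistentId endpoint:
#   for f in pick_files(dataset_json, [0, 1, 3]):
#       print(f["label"], download_url(BASE_URL, f))
```

Unlike jq, which yields null for an index past the end of the array, pick_files raises IndexError, so pass only indices that exist.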

In short, it looks like the fix for https://github.com/IQSS/dataverse/issues/1837 is a good one! (Prior to this you had to know the dataset's database id.) Docs at http://guides.dataverse.org/en/4.3/api/native-api.html#datasets
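To illustrate what the fix changes, this sketch builds the native API URL either from a database id (the only option before 4.3) or from a persistent identifier such as the DOI above. dataset_url is a hypothetical helper; it percent-encodes the DOI in the query string, on the assumption that the server decodes it like any other query parameter.

```python
from urllib.parse import urlencode


def dataset_url(base_url, identifier):
    """Return the native API URL for a dataset.

    An int is treated as a database id (/api/datasets/{id}); a string such as
    "doi:10.7910/DVN/ARKOTI" uses the :persistentId route.
    """
    if isinstance(identifier, int):
        return f"{base_url}/api/datasets/{identifier}"
    # urlencode percent-encodes ':' and '/' in the DOI.
    query = urlencode({"persistentId": identifier})
    return f"{base_url}/api/datasets/:persistentId?{query}"
```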

Please stay tuned for when Dataverse 4.3 is tagged and available for download from https://github.com/IQSS/dataverse/releases

Phil

p.s. Here's how the file metadata looks:

{
  "description": "Salta, Argentina field experiment on e-voting versus traditional voting. Citation: Alvarez, R. Michael, Ines Levin, Julia Pomares, and Marcelo Leiras. 2013. \"Voting Made Safe and Easy: The Impact of e-voting on Citizen Perceptions.\" Political Science Research and Methods 1(1):117-137.",
  "label": "alpl2013.tab",
  "version": 2,
  "datasetVersionId": 75170,
  "datafile": {
    "id": 2692294,
    "name": "alpl2013.tab",
    "contentType": "text/tab-separated-values",
    "filename": "14e664cd3c7-d64f88cca576",
    "originalFileFormat": "application/x-stata",
    "originalFormatLabel": "Stata Binary",
    "UNF": "UNF:6:d9ZNXvmiPfiunSAiXRpVfg==",
    "md5": "2132170a713e5a213ab87dcaea287250",
    "description": "Salta, Argentina field experiment on e-voting versus traditional voting. Citation: Alvarez, R. Michael, Ines Levin, Julia Pomares, and Marcelo Leiras. 2013. \"Voting Made Safe and Easy: The Impact of e-voting on Citizen Perceptions.\" Political Science Research and Methods 1(1):117-137."
  }
}
{
  "description": "National Survey of High School Biology Teachers. Citation: Berkman, Michael and Eric Plutzer. 2010. Evolution, Creationism, and the Battle to Control America's Classrooms. New York: Cambridge University Press.",
  "label": "BPchap7.tab",
  "version": 2,
  "datasetVersionId": 75170,
  "datafile": {
    "id": 2692295,
    "name": "BPchap7.tab",
    "contentType": "text/tab-separated-values",
    "filename": "14e664cd409-7a2dc0c380f9",
    "originalFileFormat": "application/x-stata",
    "originalFormatLabel": "Stata Binary",
    "UNF": "UNF:6:B3/HJbnzktaX5eEJA2ItiA==",
    "md5": "e8c62465ef6a1a8451a21a43ce7b264e",
    "description": "National Survey of High School Biology Teachers. Citation: Berkman, Michael and Eric Plutzer. 2010. Evolution, Creationism, and the Battle to Control America's Classrooms. New York: Cambridge University Press."
  }
}
{
  "description": "Replication code for Chapter 2 (Loading and Manipulating Data). Required data files: hmnrghts.txt, sen113kh.ord, hmnrghts.dta, pts1994.csv, and pts1995.csv.",
  "label": "chapter02.R",
  "version": 2,
  "datasetVersionId": 75170,
  "datafile": {
    "id": 2692206,
    "name": "chapter02.R",
    "contentType": "text/plain; charset=US-ASCII",
    "filename": "14e663269e2-61fc90d7afec",
    "originalFormatLabel": "UNKNOWN",
    "md5": "e9c536034e029450a79ce830e47dd463",
    "description": "Replication code for Chapter 2 (Loading and Manipulating Data). Required data files: hmnrghts.txt, sen113kh.ord, hmnrghts.dta, pts1994.csv, and pts1995.csv."
  }
}
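Each file entry above also carries an md5 checksum, so a download can be checked against the metadata. This is a minimal sketch with hypothetical helper names (download, md5_matches). One caveat, stated as an assumption: for ingested tabular files such as alpl2013.tab, whose originalFileFormat is application/x-stata, the checksum may describe the original upload rather than the .tab version served by default, so a mismatch there does not necessarily mean the download is corrupt.

```python
import hashlib
import urllib.request


def download(url):
    """Fetch a file's bytes, e.g. from a Data Access API URL (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def md5_matches(file_entry, content):
    """Compare the MD5 of downloaded bytes with the entry's datafile.md5 field.

    Returns None when the metadata carries no md5 value.
    """
    expected = file_entry["datafile"].get("md5")
    if expected is None:
        return None
    return hashlib.md5(content).hexdigest() == expected.lower()


# Example (requires network; the URL form is an assumption, see the Data Access guide):
#   entry = {"datafile": {"id": 2692206, "md5": "e9c536034e029450a79ce830e47dd463"}}
#   data = download("https://dataverse.harvard.edu/api/access/datafile/2692206")
#   print(md5_matches(entry, data))
```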

