data access api

Alan Darnell

Dec 4, 2015, 10:28:21 AM

to Dataverse Users Community

I'm just starting to look at the data access API and constructing curl requests in this format

curl -u api_token: https://myserver/api/access/datafile/1

I was expecting a JSON or XML response, but am getting an XHTML web page.

It isn't clear to me from the documentation, but am I guessing correctly that everything in DV 4 is treated as a file object -- the web pages for the interface, images, data files, etc -- and each file has a unique identifier? If so, am I guessing correctly too that the way I find out the file id of the object I'm interested in is to use the search API first and then grab an id value from the search results?

Or is this all completely wrong and I should not be seeing the XHTML page?

Alan

Philip Durbin

Dec 4, 2015, 3:02:18 PM

to dataverse...@googlegroups.com

Hi! Sorry to hear you're having trouble. Lots to unpack here. :)

You're right. Generally, you should not expect XHTML responses from API calls. From the Data Access API you should typically get either binary data or XML. Other APIs return JSON.

Please use `curl -i` (add -i to your command, I mean) to get helpful information for troubleshooting, especially the status code. I suspect you're getting a 404 rather than a 200.

curl -i https://apitest.dataverse.org/api/access/datafile/12 gives me this, for example:

HTTP/1.1 200 OK
Date: Fri, 04 Dec 2015 15:35:24 GMT
Access-Control-Allow-Origin: *
Content-disposition: attachment; filename="trees.png"
Content-Type: image/png; name="trees.png"
Content-Length: 8361
Connection: close

(then a bunch of binary data, the image)

The 200 response tells you that everything is ok.

(That "trees.png" file is published and unrestricted so no API token is required.)
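The status check described above can be scripted. The following is a minimal sketch, not part of the Dataverse tooling: `http_status` and `is_success` are hypothetical helper names, and the sketch assumes the response starts with a single status line, as in the `curl -i` output shown.

```shell
# Hypothetical helper: read "curl -i" output on stdin and print the HTTP
# status code from the first line (e.g. "HTTP/1.1 200 OK" gives 200).
http_status() {
  awk 'NR == 1 { sub(/\r$/, ""); print $2; exit }'
}

# Hypothetical helper: succeed only when the given code is in the 2xx range.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `curl -s -i https://apitest.dataverse.org/api/access/datafile/12 | http_status` should print the code for that request. curl can also report the code directly with `curl -s -o /dev/null -w '%{http_code}\n' URL`.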

Now on to the problem of knowing which file ID to download in the first place. It's extremely unlikely that the file ID will be 1 because files, datasets, and dataverses share a common database id namespace, if you will, in the dvobject table.

So what's the ID of the file I'm interested in? It's tricky to figure out, which I lamented about at https://github.com/IQSS/dataverse/issues/1837#issuecomment-121736332

(One excellent suggestion is to show it in the GUI: Hovering mouse over Download button does not reveal the URL of the file and the URL does not contain the file name #2416 https://github.com/IQSS/dataverse/issues/2416 .)

If you know the database id of the dataset, you can use something like this to get a list of files and their IDs:

curl https://apitest.dataverse.org/api/datasets/10

You should see something like this (12 is the file ID):

{
    "description": "",
    "label": "trees.png",
    "version": 1,
    "datasetVersionId": 1,
    "datafile": {
      "id": 12,
      "name": "trees.png",
      "contentType": "image/png",
      "filename": "14e2142e226-7a131c009897",
      "originalFormatLabel": "UNKNOWN",
      "md5": "0386269a5acb2c57b4eade587ff4db64",
      "description": ""
    }
}
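Pulling the file ID out of that JSON can be sketched in shell. This is a rough text match rather than a real JSON parser: `datafile_ids` is a hypothetical helper, and it assumes the response is pretty-printed with one key per line and "id" appearing before "name" inside each "datafile" object, as in the excerpt above. A JSON-aware tool such as jq would be more robust.

```shell
# Hypothetical helper: read dataset JSON on stdin and print "<id> <name>"
# for each "datafile" object. Assumes one key per line, "id" before "name".
datafile_ids() {
  awk '
    /"datafile"[ \t]*:/ { in_file = 1 }
    in_file && /"id"[ \t]*:/ { id = $0; gsub(/[^0-9]/, "", id) }
    in_file && /"name"[ \t]*:/ {
      name = $0
      sub(/^[^:]*:[ \t]*"/, "", name)   # drop everything up to the opening quote
      sub(/",?[ \t]*$/, "", name)       # drop the closing quote and comma
      print id, name
      in_file = 0
    }
  '
}
```

Usage would look like `curl -s https://apitest.dataverse.org/api/datasets/10 | datafile_ids`.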

Often, you don't know the database ID of your dataset, however. (Why should you?) You know the DOI. The most reliable way to get file IDs in this case is via SWORD. Here's an example from https://github.com/IQSS/dataverse/issues/2794 of figuring out that the file ID of Weather_data.tab is 91:

murphy:dataverse pdurbin$ curl -s -u $API_TOKEN: https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/statement/study/doi:10.5072/FK2/0MOPJM | xmllint -format -
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<id>https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/0MOPJM</id>
<link href="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/0MOPJM" rel="self"/>
<title type="text">Metadata mapping test study</title>
<author>
    <name>McLellan, Evelyn (Artefactual Systems Inc.)</name>
</author>
<updated>2015-12-03T20:10:59.966Z</updated>
<entry>
    <content type="image/jpeg" src="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/92/chelan_052.jpg"/>
    <id>https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/92/chelan_052.jpg</id>
    <title type="text">Resource https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/92/chelan_052.jpg</title>
    <summary type="text">Resource Part</summary>
    <updated>2015-12-03T21:14:37.333Z</updated>
</entry>
<entry>
    <content type="text/tab-separated-values" src="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/91/Weather_data.tab"/>
    <id>https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/91/Weather_data.tab</id>
    <title type="text">Resource https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/91/Weather_data.tab</title>
    <summary type="text">Resource Part</summary>
    <updated>2015-12-03T21:14:37.334Z</updated>
</entry>
<category term="isMinorUpdate" scheme="http://purl.org/net/sword/terms/state" label="State">true</category>
<category term="locked" scheme="http://purl.org/net/sword/terms/state" label="State">false</category>
<category term="latestVersionState" scheme="http://purl.org/net/sword/terms/state" label="State">RELEASED</category>
</feed>
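In the statement above, the file ID is the path segment after `edit-media/file/` in each `<content src="...">` attribute. A minimal sketch for extracting it follows; `sword_file_ids` is a hypothetical helper, and it assumes each `<content>` element sits on a single line, as in the xmllint-formatted output.

```shell
# Hypothetical helper: read a SWORD statement (Atom XML) on stdin and print
# "<file id> <file name>" for each <content src=".../edit-media/file/ID/NAME"> line.
sword_file_ids() {
  sed -n 's|.*<content [^>]*src="[^"]*/edit-media/file/\([0-9][0-9]*\)/\([^"]*\)".*|\1 \2|p'
}
```

The statement command shown earlier can be piped straight into it, in place of (or after) the xmllint step.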

You're right that if the file is published you can use the Search API to figure out the file ID. Please see http://guides.dataverse.org/en/4.2.1/api/search.html#basic-search-example

This email is already too long but I wanted to at least touch on using API tokens with the Data Access API. Again, if the file is published and not restricted, you don't need an API token. If it's unpublished or restricted (or both), you'll need an API token but you'll want to use "key" as a query parameter like this...

curl "https://$HOSTNAME/api/access/datafile/12?key=$API_TOKEN"

... rather than the `-u api_token:` form you showed, which is how SWORD works. Please see also http://guides.dataverse.org/en/4.2.1/api/dataaccess.html#authentication-and-authorization

(Currently, the Data Access API does not support X-Dataverse-key header, which we plan to fix in https://github.com/IQSS/dataverse/issues/2662 .)
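The query-parameter form can be wrapped in a small function so the URL is always built and quoted the same way. This is only a sketch: `datafile_url` is a hypothetical name, and it assumes the token contains no characters that need URL encoding.

```shell
# Hypothetical helper: build a Data Access API URL with the token passed
# as the "key" query parameter. Arguments: hostname, file ID, API token.
datafile_url() {
  printf 'https://%s/api/access/datafile/%s?key=%s\n' "$1" "$2" "$3"
}
```

A download would then be `curl -i "$(datafile_url "$HOSTNAME" 12 "$API_TOKEN")"`, with the double quotes keeping the shell from interpreting the `?`.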

Phew! Probably more information than you want but I hope it's helpful. Please let us know what would be most helpful to you in making everything easier!

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/adac96ee-9f29-4a17-8e7b-ac48ba64ad12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin

Alan Darnell

Dec 4, 2015, 5:19:26 PM

to dataverse...@googlegroups.com

Thanks Phil. I was in fact getting 404 errors as you thought. When I used the file id you suggested, I got back the result you described. Assuming that file ids are sequential and that there is probably a file id of 1 in the system, I’m guessing then that a 404 returned on a request for file id 1 is probably because I don’t have rights to that file.

The authentication token handling seems to be different from API to API — the search query is asking for a key= parameter when I try this:

https://apitest.dataverse.org/api/search?q=trees

I would have thought the search API would return all public data if an API key is not provided. Also, adding &key=token, where token is the API key for my account on apitest.dataverse.org, returns a 401 Unauthorized error, so I must be missing some information about using the search API.

I’ll work through your links and see if I can summarize what I figure out.

Thanks for the help.

Alan

Alan Darnell

Dec 4, 2015, 6:12:57 PM

to Dataverse Users Community

Found this error on my part -- to include the key= parameter in the URL I need to quote the whole URL. So while this fails:

curl -i https://apitest.dataverse.org/api/search?q=trees&key=XXX

this works:

curl -i "https://apitest.dataverse.org/api/search?q=trees&key=XXX"

My lack of knowledge re: curl, not an issue with the API.
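The reason the unquoted form fails is that the shell treats `&` as "run the preceding command in the background", so the program only receives the URL up to `q=trees`, and `key=XXX` becomes a separate shell variable assignment. The sketch below makes that visible offline, with `echo` standing in for curl; the variable names are illustrative only.

```shell
# Unquoted: the shell splits the line at '&'. echo (standing in for curl)
# sees only the part before it; "wait" lets the background job finish.
unquoted_seen=$(sh -c 'echo https://apitest.dataverse.org/api/search?q=trees&key=XXX; wait')

# Quoted: the whole URL, including "&key=XXX", is passed as one argument.
quoted_seen=$(sh -c 'echo "https://apitest.dataverse.org/api/search?q=trees&key=XXX"')

printf 'unquoted: %s\n' "$unquoted_seen"
printf 'quoted:   %s\n' "$quoted_seen"
```

This matches the behaviour reported above: without the key reaching the server, the Search API answers as if no token had been supplied.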

Alan

Philip Durbin

Dec 7, 2015, 3:01:43 PM

to dataverse...@googlegroups.com

Hi Alan,

Given your comments on the Search API, you may be interested in this issue: Consider options for opening APIs without tokens - https://github.com/IQSS/dataverse/issues/1838

You're right that there are three different ways to authenticate to Dataverse APIs (depending on the API) and I agree that this can be confusing. Here are the three ways (in the order in which they were added):

- HTTP Basic Auth (curl -u): SWORD API

- query parameter (?key=...): "native" APIs

- HTTP header (X-Dataverse-key): (most*) "native" APIs
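The three forms listed above can be sketched as curl wrappers. The function names are hypothetical, `API_TOKEN` is assumed to be set in the environment, and which form a given endpoint accepts depends on the API, as described in this thread (for example, the Data Access API did not accept the header form at the time).

```shell
# 1. HTTP Basic Auth: token as the username, empty password (SWORD API).
get_with_basic_auth() {
  curl -s -u "$API_TOKEN:" "$1"
}

# 2. Query parameter: token appended as "key".
#    Assumes the URL passed in has no query string yet.
get_with_query_key() {
  curl -s "$1?key=$API_TOKEN"
}

# 3. HTTP header: token sent as X-Dataverse-key.
get_with_header_key() {
  curl -s -H "X-Dataverse-key: $API_TOKEN" "$1"
}
```

Each function takes the target URL as its only argument, for example `get_with_query_key "https://$HOSTNAME/api/access/datafile/12"`.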

I've been sort of wondering if we should add an FAQ section to the API Guide.

I'm glad from reading your other note that you were able to auth to the Search API.

Phil

* https://github.com/IQSS/dataverse/issues/2662
