Hi! Sorry to hear you're having trouble. Lots to unpack here. :)
You're right. Generally, you should not expect XHTML responses from API
calls. From the Data Access API you should typically get either binary
data or XML. Other APIs return JSON.
Please use `curl -i` (add -i to your command, I mean) to get helpful information for troubleshooting, especially the status code. I suspect you're getting a 404 rather than a 200.
`curl -i https://apitest.dataverse.org/api/access/datafile/12` gives me this, for example:
HTTP/1.1 200 OK
Date: Fri, 04 Dec 2015 15:35:24 GMT
Access-Control-Allow-Origin: *
Content-disposition: attachment; filename="trees.png"
Content-Type: image/png; name="trees.png"
Content-Length: 8361
Connection: close
(then a bunch of binary data, the image)
The 200 response tells you that everything is ok.
(That "trees.png" file is published and unrestricted so no API token is required.)
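If you'd rather check the status code from a script than read the headers by eye, here's a minimal sketch. The `status_code` function name is my own invention, not part of any API. It only assumes that the first line of the `curl -i` output is the status line, as in the example above.

```shell
# Hypothetical helper: print the numeric status code from the first line
# of `curl -i` output read on standard input
# (for example "HTTP/1.1 200 OK" gives "200").
status_code() {
  head -n 1 | awk '{ print $2 }'
}

# Usage (needs network access, so it is left commented out here):
#   curl -s -i https://apitest.dataverse.org/api/access/datafile/12 | status_code
#
# curl can also print just the code on its own:
#   curl -s -o /dev/null -w '%{http_code}\n' https://apitest.dataverse.org/api/access/datafile/12
```

Anything other than 200 means the body is probably an error page rather than your file.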
Now on to the problem of knowing which file ID to download in the first place. It's extremely unlikely that the file ID will be 1 because files, datasets, and dataverses share a common database ID namespace, if you will, in the dvobject table.
You should see something like this (12 is the file ID):
{
  "description": "",
  "label": "trees.png",
  "version": 1,
  "datasetVersionId": 1,
  "datafile": {
    "id": 12,
    "name": "trees.png",
    "contentType": "image/png",
    "filename": "14e2142e226-7a131c009897",
    "originalFormatLabel": "UNKNOWN",
    "md5": "0386269a5acb2c57b4eade587ff4db64",
    "description": ""
  }
}
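If you want to pull the ID out of JSON like this in a script, here's a rough sketch. The `file_ids` name is hypothetical, and the approach assumes the JSON looks like the excerpt above, where the only key spelled exactly "id" is the file ID. A real JSON parser such as jq would be more robust if you have one installed.

```shell
# Hypothetical helper: print the numeric value of every "id" key found
# on standard input. This is plain text matching, not real JSON parsing.
# Keys such as "datasetVersionId" are skipped because the pattern
# requires a double quote directly before "id".
file_ids() {
  grep -o '"id": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Usage, assuming the JSON has been saved to a file called files.json:
#   file_ids < files.json
```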
This email is already too long but I wanted to at least touch on using API tokens with the Data Access API. Again, if the file is published and not restricted, you don't need an API token. If it's unpublished or restricted (or both), you'll need an API token, which you pass as the "key" query parameter like this:
curl "https://$HOSTNAME/api/access/datafile/12?key=$API_TOKEN"
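To make that a bit more reusable, here's a small sketch that builds the URL and adds the "key" parameter only when you have a token. The `datafile_url` function and its argument order are my own assumptions for illustration. It takes the server name as an argument rather than reading `$HOSTNAME`, because bash sets `HOSTNAME` to your local machine's name by default.

```shell
# Hypothetical helper: build a Data Access API download URL.
#   $1 = server name, $2 = file ID, $3 = API token (optional)
datafile_url() {
  server=$1
  file_id=$2
  token=${3:-}
  url="https://$server/api/access/datafile/$file_id"
  # Only unpublished or restricted files need the key parameter.
  if [ -n "$token" ]; then
    url="$url?key=$token"
  fi
  printf '%s\n' "$url"
}

# Usage (needs network access, so it is left commented out here):
#   curl -s -o trees.png "$(datafile_url apitest.dataverse.org 12 "$API_TOKEN")"
```

Quoting the URL matters because some shells treat an unquoted `?` as a filename wildcard.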
Phew! Probably more information than you want but I hope it's helpful. Please let us know what would be most helpful to you in making everything easier!
Phil