How do Signed URLs work?


Philipp Conzett

Jun 23, 2024, 7:59:09 AM
to Dataverse Users Community
Based on what I've read in the Dataverse API Guide and the release notes for v6.1, my understanding of Signed URLs in Dataverse has so far been that they replace API tokens as an authorization mechanism for file previewers to access non-public files. Given that, I'd expect that no API token is created when I, logged in as a user with access to a given dataset, open a previewer to view a non-public file in that dataset. However, when testing this in our v6.2 test environment as well as on demo.dataverse.org (v6.2 build "v6.2+10451+10463+10383-iqss"), an API token is created when opening a file previewer (I've tested this with a plain text file previewer, a markdown file previewer, and an image file previewer).

Is there a bug somewhere, or have I misunderstood how Signed URLs work?

Best,
Philipp

James Myers

Jun 23, 2024, 8:39:11 AM
to dataverse...@googlegroups.com

Philipp,

 

API Keys are still used by Dataverse in the signed URL mechanism. Signed URLs use the dataverse.api.signature-secret, which is optional but recommended and should be set to a fairly long (32-64 byte) value, plus the user’s API key as the key for signing URLs. That said, the API Key is no longer sent to the tool or otherwise exposed outside of Dataverse when using signed URLs, which is the main advantage.

 

* Note that whether to use signed URLs or API keys is managed via the previewer registration mechanism – if you haven’t reinstalled the previewers or updated your DB to make your previewers use signedURLs/have a list of allowedURLs in the json, you may still be using API Keys directly and having them sent to previewers/other tools. (Simply updating Dataverse itself doesn’t switch them).

 

FWIW: Adding a user-specific part (the API Key) to the overall key makes it harder for an attacker to collect enough info and limits them to compromising one person at a time. Conversely, adding the dataverse.api.signature-secret, which isn’t known by users, makes it harder for an attacker to use signed URLs at all, since the overall key they need to find is longer. The decision to pick the API key as the user-specific part was mostly one of convenience, since it often exists already and the code will auto-generate it when needed. The signed URL mechanism could be adapted to use a second hidden user key, or the API Key could still remain hidden unless/until the user requests access to it. Even as is, signed URLs are a big improvement in that the API Key is not being sent to the previewers, and the only thing an attacker able to see the browser history/network traffic can get is the signed URLs, which are short-lived and only allow reading the specific data/metadata being previewed (whatever the set of allowed URLs for a tool can do, limited to the specific datasets/files the tool is called on).
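
To make the idea of a combined signing key concrete, here is a minimal Python sketch. It is not Dataverse's actual code: the choice of HMAC-SHA512, the order in which the secret and the API key are concatenated, and the names sign_url and verify_url are all assumptions made for illustration. Only the query parameters (until, user, method, token) and the five-minute default mirror what appears in this thread.

```python
import hashlib
import hmac
from datetime import datetime, timedelta
from urllib.parse import parse_qsl, urlencode, urlsplit


def _signature(unsigned_url: str, api_key: str, secret: str) -> str:
    # Assumption: the signing key is the server-side secret concatenated with
    # the user's API key, and HMAC-SHA512 is the keyed hash.
    key = (secret + api_key).encode("utf-8")
    return hmac.new(key, unsigned_url.encode("utf-8"), hashlib.sha512).hexdigest()


def sign_url(base_url, user, method, api_key, secret, now, minutes=5):
    """Append an expiry, the user, the HTTP method, and a signature token."""
    until = (now + timedelta(minutes=minutes)).isoformat(timespec="milliseconds")
    query = urlencode({"until": until, "user": user, "method": method})
    unsigned = f"{base_url}?{query}"
    return f"{unsigned}&token={_signature(unsigned, api_key, secret)}"


def verify_url(signed_url, method, api_key, secret, now):
    """Accept the URL only if it is unexpired, unmodified, and used with the signed method."""
    unsigned, sep, token = signed_url.rpartition("&token=")
    if not sep:
        return False
    params = dict(parse_qsl(urlsplit(unsigned).query))
    if params.get("method") != method or "until" not in params:
        return False
    if now > datetime.fromisoformat(params["until"]):
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_signature(unsigned, api_key, secret), token)


# Hypothetical values, for illustration only.
issued_at = datetime(2024, 6, 28, 13, 8, 55, 187000)
api_key = "hypothetical-api-key"
secret = "hypothetical-server-secret"
signed = sign_url("https://dataverse.example.org/api/v1/files/1/metadata",
                  "alice", "GET", api_key, secret, issued_at)
print(signed)
print(verify_url(signed, "GET", api_key, secret, issued_at))
print(verify_url(signed, "GET", api_key, secret, issued_at + timedelta(minutes=6)))
```

In this sketch neither the API key nor the secret ever appears in the URL; a holder of the URL can only replay that exact request until it expires, and changing the path, the user, or the expiry invalidates the token.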

 

-- Jim


Philipp Conzett

Jun 24, 2024, 12:36:31 AM
to Dataverse Users Community
Thanks, Jim! It seems we didn't follow the instructions in the Known Limitations section of the Dataverse Previewers README file.
Best, Philipp

Philipp Conzett

Jun 28, 2024, 1:43:09 AM
to Dataverse Users Community
I realize now that I still might have misunderstood how Signed URLs work. I thought that the Signed URL mechanism was meant to avoid API tokens being created when previewers are used on files that are not public. To illustrate this, I've made a screen recording showing that an API token is created when a user previews a file in a dataset draft. I think this can be unfortunate, especially for a user with access rights to datasets other than their own, as the user might forget to revoke the API token after using previewers in dataset drafts.

Best,
Philipp

James Myers

Jun 28, 2024, 9:41:32 AM
to dataverse...@googlegroups.com

Philipp,

 

The main concern that signedUrls address is not that the user can see their own API key while logged in. Before signedUrls, as is still the case on demo.dataverse.org, invoking a previewer on a restricted file (or one in a draft dataset) was done by using the API key directly in the URL used to launch the previewer. That is not so obvious when you just view the previewer on the file page (although it is still true), but is clear if you launch the previewer on a separate page, e.g. clicking the “Explore on View Image” button that shows up on the file page of a jpg image on demo.dataverse.org. When I do this with an image in a draft dataset I have there, the URL is

https://gdcc.github.io/dataverse-previewers/previewers/v1.3/ImagePreview.html?fileid=2351193&siteUrl=https://demo.dataverse.org&key=4641d0af-8dd6-4aae-bfae-d25e178edc28&datasetid=2351191&datasetversion=:draft&locale=en

The key parameter was my API key, and it is now in my browser history and available for anyone to view if they have access to my machine. (The URL of the preview embedded in the file page doesn’t automatically appear in the browser history, but it could be visible to people who can see the network traffic, so they could still get the API key.)
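
To show how little effort it takes to recover the key from such a URL, here is a short Python sketch that parses the launch URL quoted above with the standard library. The variable names are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The pre-signed-URL launch URL quoted above (its API key has since been changed).
launch_url = (
    "https://gdcc.github.io/dataverse-previewers/previewers/v1.3/ImagePreview.html"
    "?fileid=2351193&siteUrl=https://demo.dataverse.org"
    "&key=4641d0af-8dd6-4aae-bfae-d25e178edc28"
    "&datasetid=2351191&datasetversion=:draft&locale=en"
)

# Every query parameter, including the API key, is readable as plain text.
params = {name: values[0]
          for name, values in parse_qs(urlsplit(launch_url).query).items()}
print(params["siteUrl"], params["key"])
```

Anyone holding that key and site URL could act as the user until the key is revoked, which is the exposure signed URLs are meant to remove.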

 

In contrast, launching a previewer when signed URLs are used results in a URL like

https://gdcc.github.io/dataverse-previewers/previewers/betatest/ImagePreview.html?callback=aHR0cHM6Ly9kdi5kZXYtYXdzLnFkci5vcmcvYXBpL3YxL2ZpbGVzLzgxODcvbWV0YWRhdGEvMTE1ODkvdG9vbHBhcmFtcy81P3VudGlsPTIwMjQtMDYtMjhUMTM6MTM6NTUuMTg3JnVzZXI9cXFteWVycyZtZXRob2Q9R0VUJnRva2VuPWQ5ZmE1MjcwNzMxNGI4ZmE0OGNjNTMxNjYyMDhkY2Y3N2MzODcxMTllN2QxOWI5ZDM0ODdlZGQxZmMwMDMyZDZlNjYzN2JjN2I2OGVjOGQ2Mzg1MmJjNGZmNmE2YTJhOTg0MmJkM2Y4MjYxMmUxOGRmMTNhNTAwMmMzMGJlM2Vj&locale=en

 

If you know how to decode that (the callback parameter is Base64-encoded), you’d be able to find another URL: https://dv.dev-aws.qdr.org/api/v1/files/8187/metadata/11589/toolparams/5?until=2024-06-28T13:13:55.187&user=qqmyers&method=GET&token=d9fa52707314b8fa48cc53166208dcf77c387119e7d19b9d3487edd1fc0032d6e6637bc7b68ec8d63852bc4ff6a6a2a9842bd3f82612e18df13a5002c30be3ec

which, if you had gotten it within 5 (configurable) minutes (and if this specific QDR dev machine weren’t also firewalled), would have allowed you to retrieve some json with a couple more signedURLs that would let you get the dataset metadata and the file bytes if you used them quickly. Since it took more than five minutes to write this email, the fact that this URL might be in the browser history or seen on the network doesn’t matter. All you could do with it is get the response:

{"status": "ERROR","message": "Bad signed URL"} (you can’t with the specific URL above because it is from a QDR dev machine that is behind a firewall.)
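
For anyone who wants to inspect a signed launch URL themselves, here is a small Python sketch that decodes the callback parameter from the example above and lists what the signed URL contains. It assumes standard Base64 as in this example (a URL-safe variant would need base64.urlsafe_b64decode instead), and decode_callback is just an illustrative helper name.

```python
import base64
from urllib.parse import parse_qs, urlsplit

launch_url = (
    "https://gdcc.github.io/dataverse-previewers/previewers/betatest/ImagePreview.html"
    "?callback=aHR0cHM6Ly9kdi5kZXYtYXdzLnFkci5vcmcvYXBpL3YxL2ZpbGVzLzgxODcvbWV0YWRhdGEvMTE1ODkvdG9vbHBhcmFtcy81P3VudGlsPTIwMjQtMDYtMjhUMTM6MTM6NTUuMTg3JnVzZXI9cXFteWVycyZtZXRob2Q9R0VUJnRva2VuPWQ5ZmE1MjcwNzMxNGI4ZmE0OGNjNTMxNjYyMDhkY2Y3N2MzODcxMTllN2QxOWI5ZDM0ODdlZGQxZmMwMDMyZDZlNjYzN2JjN2I2OGVjOGQ2Mzg1MmJjNGZmNmE2YTJhOTg0MmJkM2Y4MjYxMmUxOGRmMTNhNTAwMmMzMGJlM2Vj"
    "&locale=en"
)


def decode_callback(url: str) -> str:
    """Return the signed URL hidden in the Base64 'callback' parameter."""
    callback = parse_qs(urlsplit(url).query)["callback"][0]
    # Re-add any '=' padding that may have been stripped from the value.
    padded = callback + "=" * (-len(callback) % 4)
    return base64.b64decode(padded).decode("utf-8")


signed_url = decode_callback(launch_url)
info = {name: values[0]
        for name, values in parse_qs(urlsplit(signed_url).query).items()}
print(signed_url)
print(info["user"], info["method"], info["until"])
```

The decoded URL carries an expiry, a user name, an HTTP method, and a signature token, but no API key.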

 

The advantages here are that there is nothing left on the browser after the user logs out that can be used to do anything except download the public metadata and the restricted bytes of that one file, and, even for those two actions, the URLs are no good after the end of the timeout anyway. And I don’t have to be aware of or change my API key to be sure that no one can impersonate me after I leave the machine.

 

In contrast, if I hadn’t changed my API key on demo, you could have come to my computer several days later and picked it up from the history and then done anything I could do, including deleting my data.

 

While never showing the API key to the logged in user when it is only being used for signedURLs could stop an attacker who could look over their shoulder (assuming they viewed it at all – no reason they need to), it wouldn’t stop someone who could get onto the user’s machine while they were logged in (since they could then manually generate an API key and copy it). At some point, we might be able to completely replace the API key mechanism with some combination of signedURLs and OIDC offline tokens (as I talked about in Mexico). Until then, simply hiding the API key unless the user explicitly requests it could be done, but that would only stop attacks where someone could see the screen while the user is logged in and for some reason looks at their API key. If that is a concern, it would not be much programming.

Philipp Conzett

Jun 28, 2024, 10:41:21 AM
to Dataverse Users Community

Thanks for this clarification, Jim! I guess that to mitigate the risk of the API token being disclosed because a user forgets to revoke it and it is still displayed on the user account page, we just need to encourage users to revoke the token regularly, or once they've finished the task requiring the token.

Best, Philipp