Question on DRS AccessMethods

Bill Clifford

Jul 9, 2020, 3:01:29 PM
to Cloud Work Stream
The NCI Imaging Data Commons will make DICOM data available from both Google Cloud Storage and a Google Healthcare datastore. In case you are not familiar with the latter, it is a DICOM server that supports a REST interface called DICOMweb. We plan to register all DICOM instances as DRS blobs, and DICOM series and studies as DRS bundles. I'm looking for some advice on AccessMethod types.

DICOMweb is a REST API, and URLs to the Google Healthcare servers look like:
https://healthcare.googleapis.com/v1/projects/<project>/locations/<location>/datasets/<dataset>/dicomStores/<dicomStore>/dicomWeb/studies/<studyID>/series/<seriesID>/instances/<instanceID>
So, on the one hand, AccessMethods for such objects could have the 'https' type. However, I think that it is important that clients understand that the corresponding URL is a DICOMweb API request. With that knowledge, a user can do more than a simple GET of the object. The user can query about the object, query and/or get the containing objects (DICOM series, study) by truncating the instance or series and instance parts of the URL, specify how data is returned, etc. 
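To illustrate the truncation described above, here is a minimal Python sketch that derives the series and study URLs from an instance URL. The function name `dicomweb_parents` and the project, dataset, store, and UID values in the example are hypothetical placeholders, and the sketch assumes the URL follows the `studies/.../series/.../instances/...` layout shown above. It only manipulates strings and makes no HTTP request.

```python
def dicomweb_parents(instance_url: str) -> dict:
    """Return the series and study URLs that contain a DICOMweb instance URL."""
    # Everything before the last "/instances/" segment is the series resource.
    series_url, sep, instance_id = instance_url.rstrip("/").rpartition("/instances/")
    if not sep or not instance_id:
        raise ValueError("not a DICOMweb instance URL: " + instance_url)
    # Everything before the last "/series/" segment is the study resource.
    study_url, sep, series_id = series_url.rpartition("/series/")
    if not sep or not series_id:
        raise ValueError("no series segment in URL: " + instance_url)
    return {"series": series_url, "study": study_url}


# Hypothetical example URL; all identifiers are placeholders.
example = (
    "https://healthcare.googleapis.com/v1/projects/my-project/locations/us/"
    "datasets/my-dataset/dicomStores/my-store/dicomWeb/"
    "studies/1.2.3/series/4.5.6/instances/7.8.9"
)
parents = dicomweb_parents(example)
print(parents["series"])
print(parents["study"])
```

A client can only apply this kind of rewrite safely if it knows the URL is a DICOMweb request, which is the reason for wanting the AccessMethod to say so.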

That is why I asked in a previous Cloud WS meeting about inventing a new AccessMethod type, e.g. 'dcm' or 'dcmweb', for such objects. I was contemplating using such a type for the AccessMethod of these objects, but...it really is just HTTP. There is no DICOMweb CLI; you just make HTTP requests. So, what are your thoughts on whether it is appropriate to use a type other than 'https' in this case? If you think that the 'type' should be https, then does this suggest that the AccessMethod should also have an 'api' field that identifies the API to use for accessing an object? Perhaps there are other cases where the API should be known.

Another question: It seems clear that our DrsObjects should include an AccessMethod having the 'gs' type for the blobs in GCS, in which case the URL would be something like 'gs://idc-tcia-...'. However, those same blobs in GCS can be accessed with a GET to a URL like 'https://storage.googleapis.com/idc-tcia-...'. Do you think it makes sense to have both 'gs' and 'https' AccessMethods? Or is it expected that a client, given a 'gs' type AccessMethod, would be able to convert the gs:// style URL to an https:// URL?
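For concreteness, the "both AccessMethods" option might look roughly like the sketch below. The `type` and `access_url.url` fields follow the DRS AccessMethod schema; the object id, bucket, and object names are hypothetical placeholders, and `pick_access_method` is an illustrative helper, not part of any DRS client library.

```python
# A DrsObject fragment offering the same blob through two AccessMethods.
# The id, bucket, and object names are placeholders.
drs_object = {
    "id": "example-instance",
    "access_methods": [
        {"type": "gs",
         "access_url": {"url": "gs://example-bucket/example.dcm"}},
        {"type": "https",
         "access_url": {"url": "https://storage.googleapis.com/example-bucket/example.dcm"}},
    ],
}


def pick_access_method(obj, preferred_types):
    """Return the first access method whose type appears in preferred_types.

    preferred_types is ordered from most to least preferred; None is
    returned when the object offers none of them.
    """
    by_type = {m["type"]: m for m in obj.get("access_methods", [])}
    for wanted in preferred_types:
        if wanted in by_type:
            return by_type[wanted]
    return None


# A client that understands gs:// prefers it; a plain HTTP client falls back.
print(pick_access_method(drs_object, ["gs", "https"])["access_url"]["url"])
print(pick_access_method(drs_object, ["https"])["access_url"]["url"])
```

With both methods present, a client that cannot interpret gs:// URLs never has to convert one itself.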

Thanks.
Bill

Kaushik Ghose

Jul 9, 2020, 7:03:42 PM
to Bill Clifford, Cloud Work Stream
Hello Bill,

I see from this list of official IANA schemes that s3:// is a registered scheme, though I could not find gs:// (which doesn't mean the vendor can't add it to the list in the future).

My personal opinion is that formats like `gs://` and `s3://` are tied to particular vendors and are not, in a manner of speaking, durable or consistent. https://, on the other hand, is vendor-neutral and stable. I would say that an https:// format URL should be included.


Thanks
-Kaushik

Bill Clifford

Jul 10, 2020, 12:48:15 PM
to Cloud Work Stream, Kaushik Ghose, Cloud Work Stream, Bill Clifford
Hi Kaushik,

Thanks for pointing out the IANA list. Curiously, the iana.org table of URI schemes does not include s3! I do note that both s3 and gs are enumerated in the DRS spec.

>> I would say that an https:// format URL should be included.
Included in addition to a gs URL or as the only URL?

Bill

Fore, Ian (NIH/NCI) [E]

Jul 10, 2020, 1:14:45 PM
to Cloud Work Stream
I reviewed this thread to see whether it addresses the same issue as this proposed GA4GH Hackathon exercise. They are different, but closely enough related as part of an overall imaging use case that a GA4GH Hackathon exercise might explore them both.

The issue raised here is about protocol type, the other issue is about content type. Issue 106 is also relevant.

The style of the hackathon is that if anyone here wanted to pick up either of those issues and work it through, you would be most welcome! The hackathon might provide the opportunity to explore the issue you're raising here through hands-on work alongside DRS experts.

Kaushik Ghose

Jul 10, 2020, 1:26:16 PM
to Bill Clifford, Cloud Work Stream
Hello Bill,
 
>> I would say that an https:// format URL should be included.
Included in addition to a gs URL or as the only URL?


I'm thinking in addition to the gs:// link. The concern I have with gs://, s3://, and other schemes is that they basically ask the client to know the rules for converting the string to an https:// URL, while also allowing the client more power in how to download or seek the bytes given a special server (in the case of htsget://, for example).

What I worry about is clients going out of date because the scheme owner changes the scheme. So, suppose Google decides that the mapping is no longer gs://X/Y -> https://storage.googleapis.com/X/Y but something else. Now all clients will have to be updated. If the server does this resolution (i.e., hands back an https:// URL), only the server code has to be updated. I'm expecting there will be far fewer servers than clients, and servers will have the resources to continually maintain the code and keep track of which schemes have changed and so on.
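As a rough sketch of the mapping in question, the conversion from gs://X/Y to https://storage.googleapis.com/X/Y could be written as below. The function name `gs_to_https` is hypothetical, the mapping is the one stated in this thread rather than anything guaranteed to stay fixed, and the resulting URL would only be fetchable without credentials if the object is publicly readable.

```python
from urllib.parse import quote


def gs_to_https(gs_url: str) -> str:
    """Rewrite gs://BUCKET/OBJECT as https://storage.googleapis.com/BUCKET/OBJECT."""
    prefix = "gs://"
    if not gs_url.startswith(prefix):
        raise ValueError("not a gs:// URL: " + gs_url)
    # The first path segment is the bucket; the remainder is the object name.
    bucket, sep, object_name = gs_url[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError("expected gs://BUCKET/OBJECT, got: " + gs_url)
    # Percent-encode the object name but keep "/" so the path stays readable.
    return "https://storage.googleapis.com/" + bucket + "/" + quote(object_name)


# Placeholder bucket and object names.
print(gs_to_https("gs://example-bucket/series-1/instance 1.dcm"))
```

Whichever side runs this logic, client or DRS server, is the side that has to be updated if the mapping ever changes.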

If we have an https:// link, at the very least a server can hand the bytes over to the client in the traditional manner.

Thanks
-Kaushik

Anne Deslattes Mays

Jul 10, 2020, 1:30:31 PM
to Kaushik Ghose, Bill Clifford, Cloud Work Stream

Nextflow handles the sorting out of S3 versus GS, and I am not sure what the Azure storage prefix is, but it's important that we don't put things in a standard that make it overly complicated.

Bill Clifford

Jul 16, 2020, 12:37:59 PM
to Cloud Work Stream, Anne Deslattes Mays, Cloud Work Stream, Kaushik Ghose, Bill Clifford
In a separate thread, David Glazer said:
"My answer depends a little bit on how standard DICOMweb is as a standard. If it's reasonably well adopted, so that clients can indeed count on DICOMweb URLs from different sources all behaving the same way, I like the idea of adding a new AccessMethod type="dicom". (I'm agnostic on how we spell it.)

For your other question  on gs (and s3 for that matter) -- I think it's fine as is. Returning a GCS url as type="https" would be less useful to clients, and clients that understand type="gs" should be able to manipulate URLs as needed."
and Brian O'Connor commented:
"...I think I agree with David here.  To add the access method type of dicomweb, want to make a PR to add this, Bill?

For access methods related to gs and s3, I think it makes sense to not switch over to just https for everything since, like David said, we would need some sort of extra hint for clients to know how to deal with the URLs."

Thanks everyone for your help on this,