How to load S3 hosted private files into IGV


twe...@gmail.com

Feb 4, 2013, 9:54:01 PM
to igv-...@googlegroups.com
Hi,

We store our BAM and BAI files in AWS S3. It's trivial to load such BAM files into IGV if they are made public (readable by all). However, we have use cases where we can't grant public read permission on the BAM and BAI files. When a file is not publicly readable, how do we load it into IGV? If anyone has experience dealing with this scenario, please advise.

Thanks!

-Wei Tao

Jim Robinson

Feb 5, 2013, 1:14:16 PM
to igv-...@googlegroups.com
IGV supports basic authentication,  so if it gets a basic authentication challenge it should open a password dialog.   Theoretically it should also work with digest and other common authentication schemes, but we don't test those here.

Jim

Wei Tao

Feb 15, 2013, 8:39:45 PM
to Jim Robinson, igv-...@googlegroups.com
Hi Jim,

Sure.

Actually, all file types are affected. I just wanted to use a simple BED file to illustrate the problem. When I load the BAM files, I get the same error.

Thanks!

-Wei


On Fri, Feb 15, 2013 at 8:32 PM, Jim Robinson <jrob...@broadinstitute.org> wrote:
Hi Wei,

I'm happy to answer questions, but could you post them to the Google Group thread? It's helpful for others to see the answers.

This specific problem is not with a BAM but with a BED file,   http://sample.tsibiocomputing.com/target_bed_files/CCCP.bed.

Thanks

Jim

Hi Jim,

I was able to set up basic auth to access BAM and BAI files hosted privately on AWS S3. I can load the exact same file into IGV using "Load from URL" if it's publicly readable. But if I keep it private and load it using basic auth, I pass the username and password challenge and then get the following error: "Unable to parse header with error: Invalid Http response". The log file is attached. Could you please take a look and tell me what went wrong?

Thanks!

-Wei



--
 
---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



Jim Robinson

Feb 15, 2013, 8:46:40 PM
to igv-...@googlegroups.com
I'm actually not sure; we haven't tested basic auth on S3. Actually, I
was not aware that S3 even supported that. Basic authentication works
in general with IGV, many people use it and we ourselves use it
extensively. So this would appear to be S3 specific.

I think you need to grab all the details of these requests and responses
to see what is going on. I use Charles Proxy to do that, but I'm sure
there are other programs. One thing it would tell us, for example, is
what the response code is for the failed request.

Jim

Wei Tao

Feb 15, 2013, 8:57:35 PM
to igv-...@googlegroups.com
Hi Jim,

No, S3 does not support basic auth. We use a service at s3auth.com to work around that. I will give Charles Proxy a try and let you know.

Thanks!

-Wei



Wei Tao

Feb 15, 2013, 9:02:13 PM
to igv-...@googlegroups.com
Hi Jim,

Any suggestions on how to configure Charles to work with IGV?

Thanks!

-Wei

Jim Robinson

Feb 15, 2013, 9:05:45 PM
to igv-...@googlegroups.com
You shouldn't need to do anything; just start it up and start using IGV. If you don't see the request/response in Charles, configure the proxy in IGV.

Wei Tao

Feb 15, 2013, 9:29:52 PM
to igv-...@googlegroups.com
Hi Jim,

Attached is the session log file generated by Charles. The loading of a BAM file failed, but loading of a BED file was successful this time. Please let me know if you find the information you need to troubleshoot this.

Thanks!

-Wei
igv_charles.chls

Jim Robinson

Feb 15, 2013, 11:39:33 PM
to igv-...@googlegroups.com
A couple of things I noticed

The BAM requests are missing the byte-range (Range) headers. This doesn't make
sense to me; IGV never does a GET on a BAM file without these headers.

There are several "400" bad request responses.

The response for one of the "gets" for
http://sample.tsibiocomputing.com/Sample_724-082-048.aln.sorted.realned.recaled.bam.bai
is a 401 (unauthorized). So the server didn't like the credentials sent.

I don't understand how the authorization server you are using works; I
suppose it's a proxy of some kind? If it works by forwarding the request
be sure that it forwards the entire request, including all headers
(crucially the Range header). The fact that the non-indexed bed worked,
while BAM did not, suggests that this might be the root of the problem.
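One way to test whether a server or proxy passes the Range header through is to send a ranged GET by hand and look for a 206 (Partial Content) reply. The following is a minimal Python sketch of that check, not something taken from IGV; the function names are made up for illustration, and the URL you pass in is whatever endpoint you are debugging.

```python
import urllib.request


def build_range_request(url, start, end):
    """Return a GET request asking for bytes start..end (inclusive) of url."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Range", f"bytes={start}-{end}")
    return request


def honors_range(status, headers):
    """True if a response looks like a genuine partial-content reply."""
    return status == 206 and "Content-Range" in headers


def check_server(url, start=0, end=1023):
    """Send one ranged GET and report whether the range was honored.

    Needs network access. urlopen raises urllib.error.HTTPError for
    4xx/5xx replies such as 400 or 401, so those surface as exceptions.
    """
    request = build_range_request(url, start, end)
    with urllib.request.urlopen(request) as response:
        return honors_range(response.status, response.headers)
```

A plain 200 with the full body in response to a ranged GET would suggest that something between the client and S3 is dropping the header.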

Jim

Nik Krumm

Mar 30, 2014, 10:01:55 PM
to igv-...@googlegroups.com
Hi Jim, Wei, and anyone else stumbling upon this:

I, too, was looking to do this. What I have are a bunch of indexed BAM files in a private S3 bucket. Although IGV provides HTTP access to files and AWS S3 provides an HTTP API, the two are not immediately compatible:
- The S3 API expects "signed" requests, created by hashing the path and current time. While it is possible to create this signed URL and feed that to IGV, for indexed files, the subsequent GET request for the index (e.g., the .bam.bai in the same folder) will fail, as IGV does not correctly sign the URL. This is what is likely causing the 401 unauthorized request in the logs above.
- Range requests: as noted above, the byte-range header is missing. I'm not totally sure why, but it's possible that the hosted S3auth service (which effectively provides a proxy between S3 and HTTP) is not forwarding these appropriately.

In light of this I wrote up a quick and dirty Flask-based server ("S3proxy") which reads S3 objects via the boto AWS library and serves them up via HTTP/REST. From the limited amount of testing I've done so far, this seems to work quite well with IGV: by passing the S3proxy URL to IGV, IGV can manipulate the .bam/.bai endings and find the index file; additionally, as boto can handle byte-range requests, the streaming functionality works as expected.
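For anyone writing a similar proxy, the part that matters for indexed files is turning the incoming Range header into byte offsets and answering with a 206. Below is a rough Python sketch of just that step. It is not the S3proxy code; the function names are hypothetical, and it handles only the single-range forms ("bytes=a-b", "bytes=a-", "bytes=-n").

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, size):
    """Translate a single-range Range header into inclusive (start, end).

    Returns None when the header is absent, malformed or unsatisfiable;
    a proxy would then send the whole object or a 416 instead.
    """
    if not header:
        return None
    match = _RANGE_RE.match(header.strip())
    if not match:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the object.
        length = int(last)
        if length == 0:
            return None
        start, end = max(size - length, 0), size - 1
    else:
        start = int(first)
        # Open-ended or over-long ranges are clamped to the last byte.
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        return None
    return start, end


def partial_response_headers(start, end, size):
    """Headers that accompany a 206 reply for bytes start..end of size."""
    return {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
```

The proxy would then fetch only bytes start..end from the backing store and return them with status 206 and these headers.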

The app is hosted here and has a small readme with instructions: https://github.com/nkrumm/s3proxy

Thanks, and let me know if you test it!
~Nik

P.S. Jim, if you're willing, I have a few other questions regarding the HTTP interface for IGV. I'll start a separate thread to address those, mostly about the size of the byte-range request.

Jim Robinson

Mar 30, 2014, 11:27:35 PM
to igv-...@googlegroups.com
Hi Nik,

Nice work and thanks for posting it here! Another solution, which we
use here with our private S3 files, is to explicitly specify the urls
(signed in this case) to both the bam and index files. That is in fact
why the "index" parameter was added.

Jim

Mosh

May 18, 2014, 5:56:23 AM
to igv-...@googlegroups.com
Hi Jim and all,
I also encountered this problem. The solution provided by Nik is not appropriate for my needs since I can't use a proxy as it is not secure.
Jim, can you please elaborate on what you mean by using the "index" parameter?
Do you know if the IGV group is planning to add better support for this, e.g. by providing plugin support for IGV? Plugin support would allow customer-specific code to be written so that requests to S3 carry the necessary signature.

Best,
Mosh

James Robinson

May 18, 2014, 7:36:42 AM
to igv-help
Hi Mosh,

By "index" I mean we use the index parameter to explicitly specify the path to a signed URL for the index file.    Essentially the scheme is this:  a service runs on an Amazon instance which has credentials to access the files.   Customers authenticate with this service, which then generates signed URLs to the bam & index and redirects requests for "whatever.bam" and "whatever.bam.bai" to those.  The signed URLs have timeouts, of course.
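To make the scheme concrete, here is a rough Python sketch of how a service holding the credentials might produce a pair of expiring signed URLs, one for the BAM and one for its index. This is not the code we run. It uses the legacy S3 query-string signing (Signature Version 2) only because it fits in a few lines of standard library; AWS has since moved to Signature Version 4, so an AWS SDK is the practical choice. The bucket, key and credential values are placeholders, and the function names are invented for illustration.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote, urlencode


def presign_get(bucket, key, access_key, secret_key, expires_at):
    """Build a query-string-authenticated GET URL (legacy S3 SigV2)."""
    resource = f"/{bucket}/{quote(key)}"
    # Verb, empty Content-MD5, empty Content-Type, expiry, resource.
    string_to_sign = f"GET\n\n\n{expires_at}\n{resource}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    query = urlencode(
        {
            "AWSAccessKeyId": access_key,
            "Expires": str(expires_at),
            "Signature": signature,
        }
    )
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}?{query}"


def presign_bam_and_index(bucket, bam_key, access_key, secret_key,
                          lifetime_seconds=3600, now=None):
    """Return (bam_url, index_url); each URL carries its own signature."""
    issued = int(now if now is not None else time.time())
    expires_at = issued + lifetime_seconds
    bam_url = presign_get(bucket, bam_key, access_key, secret_key, expires_at)
    # Assumes the index sits next to the BAM as "<key>.bai".
    index_url = presign_get(
        bucket, bam_key + ".bai", access_key, secret_key, expires_at
    )
    return bam_url, index_url
```

Because the key is part of the signed string, the signature for "whatever.bam" is not valid for "whatever.bam.bai", which is why the index needs a URL of its own.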

What is your authentication scheme? You say you "encountered this problem" but don't say what it is. The addition of an explicit "index" parameter solves the problem referenced.

I do not anticipate adding plugin support in the near future,  there are higher priority tasks in the queue.   The source is open, of course, and I do try to accommodate pull requests.

Finally, I'm going on vacation for 2 weeks from Monday May 18, so responses will be delayed until my return.


Mosh

May 19, 2014, 2:05:06 AM
to igv-...@googlegroups.com
Hi Jim, thank you for your answer.
The problem I referred to was the one described by Nik:
"the S3 API expects "signed" requests, created by hashing the path and current time. While it is possible to create this signed URL and feed that to IGV, for indexed files, the subsequent GET request for the index (e.g., the .bam.bai in the same folder) will fail, as IGV does not correctly sign the URL. This is what is likely causing the 401 unauthorized request in the logs above."

I can generate the initial signed URL, but the subsequent requests to the BAM and index file should be signed as well, according to the Amazon protocol described here: http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
Since I don't have any control over the HTTP requests generated by IGV, I can't sign them properly. I will check the source and try adding the support myself.

Thanks again and enjoy your vacation,
Mosh

James Robinson

May 19, 2014, 8:11:18 AM
to igv-help
Hi Mosh, we are not talking about the same thing. We do not support Amazon credentials directly; I am referring to using signed URLs as described here: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-urls-overview.html. You create them outside of IGV for the "get" method for both the BAM and index files and supply both URLs to IGV. Typically these are used in a generated link with "file" and "index" parameters as described here: http://www.broadinstitute.org/software/igv/ControlIGV.

The problem you reference above was solved by adding the "index" parameter to the HTML link options. Before that, IGV would generate the index URL by appending ".bai" to the end of the signed BAM URL. This does not work, however, as the index URL also needs to be signed. Thus the explicit index parameter.

If you want to add support for direct passing of Amazon credentials look at the class org.broad.igv.util.HttpUtils.

Jim




Mark Monroe

Jun 24, 2015, 10:55:33 AM
to igv-...@googlegroups.com
I am running into this same issue. I can generate authenticated URLs to a BAM file (anyone with the URL can reach the file), but I need some way to pass two separate URLs to IGV: one for the BAM file and one for the BAI file.

Is there any way to specify the URL to be used for the BAI file? In general, programs should never assume the location of a file based on that of another file.

Thanks,
Mark

Jim Robinson

Jun 25, 2015, 9:47:05 AM
to igv-...@googlegroups.com
Yes, this was solved long ago; see the user guide for creating links: http://www.broadinstitute.org/software/igv/ControlIGV. If you are just pasting in from "Load from URL", the index load will fail and IGV will then prompt you for the index URL. I agree with your editorial comment, but that is how virtually all software around BAM files works.

Ben Brulotte

Sep 26, 2016, 3:42:54 PM
to igv-help
Is it possible to pass signed urls for the bam and index through a listener port without using a session file?

I'm not having success with this format: http://localhost:port/load?file=signedURLtobam&index=signedURLtoindex

I can get things to load when using a session file, but because of the quirkiness of the merge option I'd rather not depend on those if there's another way. 


Thanks,
Ben

James Robinson

Sep 27, 2016, 5:22:16 PM
to igv-help
Ben,  that should work,  how is it failing?   Could you post something closer to the actual urls?  You can change the name of the files but keep the extensions  (.bam) and parameters.  I realize they won't load for me but the problem is likely in the details of parsing the bits of the signed url and it would be helpful to have it.

James Robinson

Sep 27, 2016, 11:16:37 PM
to igv-help
Try URL-encoding the file strings; this is necessary as they will include parameters. For example
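A minimal Python sketch of the encoding step (not taken from the original post) follows. It assumes IGV is listening on localhost port 60151, the usual default; substitute whatever port you have configured. The function name is hypothetical, and the signed URLs in the usage comment are placeholders.

```python
from urllib.parse import urlencode


def igv_load_url(bam_url, index_url, port=60151, host="localhost"):
    """Build an IGV "load" link with percent-encoded file and index URLs.

    urlencode escapes the "?", "&" and "=" inside each signed URL, so
    IGV sees exactly two parameters rather than a fragment of each.
    """
    query = urlencode({"file": bam_url, "index": index_url})
    return f"http://{host}:{port}/load?{query}"


# Usage with placeholder signed URLs:
# igv_load_url(
#     "https://example-bucket.s3.amazonaws.com/a.bam?Expires=1&Signature=abc",
#     "https://example-bucket.s3.amazonaws.com/a.bam.bai?Expires=1&Signature=def",
# )
```

Without the encoding, the first "&" inside the signed BAM URL ends the file parameter early and the rest of the signature is lost.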




Ben Brulotte

Sep 28, 2016, 1:30:33 AM
to igv-help
Thanks Jim, 
that did the trick
-Ben