support for cram files


Venkata Suresh kumar S

May 6, 2014, 6:53:54 AM
to igv-...@googlegroups.com
Is the IGV team planning to support compressed alignment (and sequence) files, e.g. losslessly compressed CRAM files?

Jim Robinson

May 7, 2014, 2:11:39 PM
to igv-...@googlegroups.com
Hi,

We do plan to support it when Picard does, but I don't have an estimated time for that.  You can monitor the samtools mailing list for updates.

Jim

Nicolas Stransky

Sep 3, 2014, 10:28:12 AM
to igv-...@googlegroups.com
Hi Jim,
Any update on this? :)
Thanks,
Nico

Jim Robinson

Sep 3, 2014, 10:39:20 AM
to igv-...@googlegroups.com
Coincidentally, I pinged the Picard team about this recently. They are working on it, but it's not ready yet.

Jim

Erh-Chan Yeh

Sep 1, 2015, 5:28:09 AM
to igv-help
Hi, Jim,

Is there any update on CRAM support?

Thanks.

Erh-Chan

On Wednesday, September 3, 2014 at 10:39:20 PM UTC+8, Jim Robinson wrote:

James Robinson

Sep 4, 2015, 1:16:08 PM
to igv-help
No updates. There are higher priorities, and the htsjdk still does not support CRAM indexes.

--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/f46ecba6-0868-4fd3-9cd0-b5232b807859%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Brendan

Sep 14, 2015, 11:19:05 AM
to igv-help
Hi Jim,

I understand that this isn't currently a high priority, but I'd like to chime in and also express interest in this feature.

Brendan

Jim Robinson

Sep 14, 2015, 12:08:10 PM
to igv-...@googlegroups.com
Every request bumps it up a little.  

Erh-Chan Yeh

Sep 15, 2015, 4:20:08 AM
to igv-help
According to what I read in the GATK forum, the latest release (build 1.139) of HTSJDK can read .cram.crai in addition to .cram.bai. Just FYI.

jtra...@counsyl.com

Nov 17, 2015, 12:49:34 AM
to igv-help
Hi Jim,

Do you have a sense of how IGV will handle reference genomes?  I.e. say you point IGV to http://www.example.com/best_alignment_ever.cram and it references a custom genome, how will IGV handle finding that genome? Will it follow the same convention as samtools/htslib and use a REF_PATH and REF_CACHE setting?
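For readers unfamiliar with that convention: samtools/htslib looks up each reference sequence by the MD5 checksum recorded in the M5 tag of the @SQ header lines, consulting the REF_CACHE and REF_PATH environment variables. REF_CACHE may contain %Ns fields that each consume N characters of the checksum, with %s taking whatever remains. The Python sketch below only illustrates how such a pattern expands. The function names and the example directory are assumptions made up for illustration, and this is not htslib's own code.

```python
import os
from typing import Mapping, Optional


def expand_ref_pattern(pattern: str, md5: str) -> str:
    """Expand a REF_CACHE-style pattern for one MD5 checksum.

    "%Ns" consumes the next N characters of the checksum and
    "%s" consumes everything that is left. Other text is copied as-is.
    """
    out = []
    rest = md5
    i = 0
    while i < len(pattern):
        if pattern[i] == "%":
            j = i + 1
            while j < len(pattern) and pattern[j].isdigit():
                j += 1
            if j < len(pattern) and pattern[j] == "s":
                n = int(pattern[i + 1:j]) if j > i + 1 else len(rest)
                out.append(rest[:n])
                rest = rest[n:]
                i = j + 1
                continue
        out.append(pattern[i])
        i += 1
    return "".join(out)


def cached_reference_path(md5: str,
                          env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the expanded REF_CACHE location, or None if REF_CACHE is unset."""
    env = os.environ if env is None else env
    pattern = env.get("REF_CACHE")
    if not pattern:
        return None
    return expand_ref_pattern(pattern, md5)
```

With REF_CACHE set to /local/ref_cache/%2s/%2s/%s, a checksum is split into two nested two-character directories and a 28-character file name. A client that followed the convention would check that path first and fall back to the REF_PATH entries only when the file is missing.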

We're also very interested in getting IGV to work with CRAM files and it would make for a big improvement in our workflow.

Thank you all for providing an awesome product!

Jeff

Jim Robinson

Nov 17, 2015, 12:09:49 PM
to igv-...@googlegroups.com
Hi Jeff,

This is one of the problems I need to solve to support CRAM, which I hope to have by the end of the year. I was unaware of the convention you are referencing and will check it out, but ultimately I might have to ask the user, as there is no guarantee that a referenced genome will be reachable by the user loading the CRAM. There is also the problem of different "genomes" with identical sequence, so in the end I think we will have to at least allow the user to specify this.

Open to any suggestions on this, the question is timely.

Jim

Brendan

Dec 10, 2015, 12:08:08 PM
to igv-help
Hi Jim,

I'm not sure if this will be helpful, but since I'd still love to see this feature, I'll describe a typical use-case for myself and my collaborators.

We work almost entirely with human data, and have CRAM files indexed against an extended version of the human genome (adding contaminants etc.).  The coordinates do match up exactly to GRCh38.  I typically view data on a computer which does not have access to the original extended genome, but I don't really care about visualizing read coverage over the extra bits.  I've also started programmatically sending read data to IGV by way of hyperlinks, so being prompted for every file seems less than ideal.  I expect these are all representative of many other researchers.

I'm also not sure exactly how HTSLib works, but from my understanding of CRAM, the compression it gleans is from saving read coordinates and the difference between the read and reference, and throwing away the read sequence.  Therefore in loading reads into IGV, I'm not sure you'd even need to extract out the original read sequence - loading the coordinates should be enough.  In short, if the reads match a genome the user has in their IGV, I'm not sure you'd even need the original genome fasta.  There may be an engineering bottleneck I'm not aware of though.
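As a toy illustration of the idea in the paragraph above (and explicitly not the real CRAM encoding, which also handles insertions, deletions, clipping, qualities and more), a read can be stored as a position plus its differences from the reference. The function names below are invented for this sketch, and only substitutions are modelled.

```python
from typing import List, Tuple

# (0-based position, read length, [(offset within read, read base), ...])
Record = Tuple[int, int, List[Tuple[int, str]]]


def encode_read(reference: str, pos: int, read: str) -> Record:
    """Keep only the position, the length, and the mismatching bases."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs off the end of the reference")
    diffs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), diffs


def decode_read(reference: str, record: Record) -> str:
    """Rebuild the read bases from the reference plus the stored differences."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)
```

In this toy model the coordinates and the mismatch positions can be read straight from the record, but the matching bases come from whichever reference is supplied at decode time. That is why decoding against a local genome with identical sequence gives the same reads as decoding against the original.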

So to bring it all together, I think what would work is letting IGV users specify with a preference parameter which local genome to target for a given CRAM genome, as a full file path or using your resource management stuff.  If a preference parameter has not been set for a given CRAM genome, then you could prompt.  Additionally you could have a preference checkbox to use the environment paths described above.
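A rough sketch of that lookup order, in Python for brevity: check a stored preference first, then (if the checkbox is on) an environment-based lookup, and only prompt as a last resort, remembering the answer so the same CRAM genome is not asked about twice. Every name here (resolve_reference, the preferences mapping, the two callbacks) is hypothetical and does not correspond to any actual IGV code.

```python
from typing import Callable, MutableMapping, Optional


def resolve_reference(
    cram_genome: str,
    preferences: MutableMapping[str, str],
    prompt_user: Callable[[str], str],
    env_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Pick the local genome to use for the genome named in a CRAM file."""
    # 1. An explicit user preference always wins.
    if cram_genome in preferences:
        return preferences[cram_genome]
    # 2. Optional lookup via environment settings (the "checkbox" case).
    if env_lookup is not None:
        found = env_lookup(cram_genome)
        if found is not None:
            return found
    # 3. Fall back to asking, and remember the answer for next time.
    chosen = prompt_user(cram_genome)
    preferences[cram_genome] = chosen
    return chosen
```

Storing the prompted answer back into the preferences is what keeps files loaded by hyperlink from triggering a dialog each time.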

Thanks again for all your work, love IGV!
Brendan

Jim Robinson

Dec 10, 2015, 12:28:19 PM
to igv-...@googlegroups.com
Hi Brendan,

Thanks for your input. I have another thread going with the htsjdk
team on how to do this, but I think we are just going to take the IGV
genome the user has loaded as the reference, ignoring what's specified in
the CRAM, exactly as you are suggesting. This requires some changes to
the libraries, but work is underway on it.

Jim

Bob Handsaker

Apr 25, 2016, 10:38:00 AM
to igv-help
Am checking to see if IGV supports cram files yet and found this thread.
Is there current support for cram files in IGV?  If not, do you have an estimated time frame?
Thanks,
Bob

Jim Robinson

Apr 25, 2016, 11:25:32 AM
to igv-...@googlegroups.com
Bob, not yet. I made an attempt at this and it is really complex. The htsjdk has some support, but it doesn't support http and doesn't really support byte-range requests for sequence (it's coded in such a way that you have to read the entire sequence into memory). I have it on my calendar to have another go at it this spring; I'm finishing up a round of work on igv.js now.
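To make the byte-range point concrete: with a samtools-style .fai index (columns: name, length, offset, bases per line, bytes per line), the bytes holding any stretch of a FASTA sequence can be computed and requested on their own instead of reading the whole sequence. The Python sketch below shows that arithmetic under those assumptions. The helper names are invented here, and this is not how htsjdk or IGV is implemented.

```python
from typing import Dict, NamedTuple


class FaiEntry(NamedTuple):
    name: str
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    linebases: int   # bases per line
    linewidth: int   # bytes per line, including the line terminator


def parse_fai_line(line: str) -> FaiEntry:
    name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
    return FaiEntry(name, int(length), int(offset), int(linebases), int(linewidth))


def byte_offset(entry: FaiEntry, pos: int) -> int:
    """File offset of the base at 0-based position pos."""
    return entry.offset + (pos // entry.linebases) * entry.linewidth + pos % entry.linebases


def range_header(entry: FaiEntry, start: int, end: int) -> Dict[str, str]:
    """HTTP Range header covering bases [start, end), 0-based, end exclusive."""
    if not (0 <= start < end <= entry.length):
        raise ValueError("region outside sequence")
    first = byte_offset(entry, start)
    last = byte_offset(entry, end - 1)
    return {"Range": f"bytes={first}-{last}"}
```

The returned bytes still contain line terminators wherever the region crosses a line boundary, so a caller would strip those before using the sequence.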

If you know of another open-source Java library that reads CRAM files I would be interested in having a look,  otherwise I'll try to modify the htsjdk code.

Jim

Bob Handsaker

Apr 25, 2016, 11:32:05 AM
to igv-help
Ugh. Genome STRiP now supports cram (using htsjdk). I had to work around a couple of things, but it seems to be robust for us so far.
For the nonce, I guess I'm stuck with good old samtools view.
-Bob

Jim Robinson

Apr 25, 2016, 11:45:28 AM
to igv-...@googlegroups.com
Does Genome STRiP support http/https?

Bob Handsaker

Apr 25, 2016, 11:50:03 PM
to igv-help
It does, but I doubt this functionality is used very much.  It enables directly reading data from AWS S3, for example.
We didn't explicitly test cram over http.
-Bob

Yossi Farjoun

May 3, 2016, 3:19:56 PM
to igv-help
Bump!! It would be great to have support for CRAM. If there are specific features you need for this in htsjdk, perhaps you could open an issue or two in the github repository and we could try to help.

Simon

Nov 16, 2016, 12:41:27 PM
to igv-help

It's such a bad hack that I almost feel bad mentioning it, but for folks desperate enough, I did make a port of IGV that can read CRAM files some time ago. I made this because I needed it and wasn't particularly intending it for others to use, but at the time I didn't think it'd be so long before CRAM got officially supported.

Try it out at your own risk:


Please note it will *not* read network based files, it can only handle files read from the local file system.

Whatever you do please don't bother the official IGV team with bugs you find in it ... !

Cheers,

Simon

James Robinson

Nov 16, 2016, 1:24:09 PM
to igv-help
Hi Simon,  thanks.  CRAM files are now supported in the development branch of IGV.  Performance over the network is slow, but can be greatly sped up by downloading the fasta locally.  The CRAM support will be released by the end of the year, but can be accessed now via the snapshot build:  http://software.broadinstitute.org/software/igv/download_snapshot

Please open a git issue if you find bugs.

Jim



Simon

Nov 16, 2016, 3:32:47 PM
to igv-help
>  CRAM files are now supported in the development branch of IGV.

That is fantastic news! I will try it out. This is one of the few things holding us back from deploying CRAM much more widely.

Cheers / thanks!

Simon

James Robinson

Nov 16, 2016, 3:37:13 PM
to igv-help
As mentioned, performance is slow if accessing the fasta reference (i.e. the IGV "genome") over the network. This is mainly because the htsjdk requires the entire sequence to be downloaded before decoding records. Performance is much better if you first load a local fasta as the reference (Genomes > Load genome from file...) and then load the CRAM.
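For anyone who drives IGV from scripts, the same order (genome first, then the CRAM) can be sketched with IGV's batch commands sent to its command port. This is a minimal Python sketch under several assumptions: that the port is enabled in IGV's preferences and uses the default 60151, that the running IGV build accepts a local FASTA path in the genome command, and that the paths contain no spaces. The function names and file paths are placeholders.

```python
import socket
from typing import List, Optional


def build_commands(fasta_path: str, cram_path: str,
                   locus: Optional[str] = None) -> List[str]:
    """Batch commands that load the local reference before the CRAM."""
    commands = [f"genome {fasta_path}", f"load {cram_path}"]
    if locus:
        commands.append(f"goto {locus}")
    return commands


def send_commands(commands: List[str], host: str = "127.0.0.1",
                  port: int = 60151, timeout: float = 60.0) -> List[str]:
    """Send each command on its own line and collect one reply line per command."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r") as reader:
            for command in commands:
                sock.sendall((command + "\n").encode("utf-8"))
                replies.append(reader.readline().strip())
    return replies
```

For example, send_commands(build_commands("/data/ref/genome.fa", "/data/sample.cram", "chr1:1-10000")) would only work with an IGV instance already running and listening on that port.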
