mitochondrial variants

Mark Samuels

Jul 26, 2014, 10:43:50 AM
to igv-...@googlegroups.com
We generated a bam file for the mitochondrial genome from our whole exome data (off-target reads, but there are still a lot of them). I can visualize the alignment in IGV, but there seems to be no gene annotation, just the DNA sequence. Is there a way to see the genes (tRNA, rRNA, and protein-coding) so that we can assess the variants' potential function? There are too many to do manually. Note that UCSC also does not do a good job of annotating the mito genome, although ENSEMBL does.

If this issue has been covered previously, my apologies; just point me to the relevant discussion.

Thanks!

Jim Robinson

Jul 26, 2014, 11:26:29 AM
to igv-...@googlegroups.com
Hi, if you can find annotation files in a supported format (e.g. bed or gff), you can load them through the File menu.

Jim
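
For readers who want to build such a file by hand, here is a minimal sketch of writing bed lines from a list of gene coordinates. It assumes the source coordinates are 1-based and inclusive (as in a genbank feature table), whereas bed uses 0-based, half-open intervals. The function name and the feature shown are hypothetical placeholders, not real mitochondrial annotations.

```python
def to_bed_line(chrom, start_1based, end_1based, name, strand="+"):
    """Convert a 1-based inclusive interval into one 6-column bed line."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    bed_start = start_1based - 1  # bed starts are 0-based
    bed_end = end_1based          # bed ends are exclusive, so the number is unchanged
    return "\t".join([chrom, str(bed_start), str(bed_end), name, "0", strand])


# Hypothetical feature for illustration only; substitute real coordinates.
features = [("MT", 100, 200, "example_gene", "+")]
bed_text = "\n".join(to_bed_line(*f) for f in features) + "\n"
print(bed_text, end="")
```

The chromosome name in the first column has to be one that IGV can match to the loaded genome, and the start shift is worth checking whenever annotations appear to be off by 1 bp.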

--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/8203dd27-ffb0-4e69-a1cd-49d28808bf81%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mark Samuels

Jul 28, 2014, 8:43:43 AM
to igv-...@googlegroups.com
Thanks Jim, I'm not familiar with gff format but will look into it. A bed file wouldn't have translation information, would it? Just start/stop positions, which I suppose could be translated by hand. I'm surprised at how inconvenient this is, given how important mitochondrial genetic diseases are.

Mark

Jim Robinson

Jul 28, 2014, 10:51:15 AM
to igv-...@googlegroups.com
If you can give me some links to sources, I'll see what I can do. The UCSC gene tables would have translation information, but if their annotations are not adequate, that wouldn't help.

Mark Samuels

Aug 11, 2014, 4:08:37 PM
to igv-...@googlegroups.com
Jim, the UCSC annotation is funny, and RefSeq is worse; I don't know why. But at this web site (see under my name below) there is a complete annotation of the mito genome. It may be off by 1 bp from the one you currently use, but it should fit either hg19 or b37. If this annotation could be added to IGV, that would help all medical geneticists in the world, all of whom, so far as I know, are struggling with calling mitochondrial variants in relevant disease patients. There are some commercial solutions, but they are pricey.

Thanks!
Mark

Jim Robinson

Aug 11, 2014, 4:36:22 PM
to igv-...@googlegroups.com
Hi Mark,

I think it would be preferable to add this as a distinct genome, as it includes a reference sequence.   I'm concerned it would cause some confusion if it was tacked onto hg19 or b37.   What do you think?

Jim

Jim Robinson

Aug 11, 2014, 11:13:58 PM
to igv-...@googlegroups.com
Mark,

I couldn't find any downloadable annotation files, but I did manage to create a genbank file by copy/paste. Could you have a look and tell me if it's what you expect? To do so, download the file below to your local disk, then load it as a genome (menu "Genomes > Load from file..."). Be sure to view the annotations as expanded or squished (right-click to select those options). There are a lot of overlapping annotations and you won't see them all unless you do that.

If this file looks good I can add it to the hosted list of genomes.

File is at:  http://www.broadinstitute.org/igvdata/nc_012920.gbk

Jim

Mark Samuels

Aug 18, 2014, 2:41:23 PM
to igv-...@googlegroups.com
We will look at this, Jim. I agree it seems easiest to create this as a separate genome. I can tell you that this problem has not been trivially resolved by anyone else: in the literature, people who do mitochondrial genomics seem to create their own tools. A central solution would be optimal, since medical geneticists everywhere use IGV for visualization when they need to, and all medical genetics units get some mitochondrial disorder patients, who may carry mutations in either the nuclear or the mitochondrial genome. I was surprised to see how difficult it's been to do this. I'll let you know; if it seems to work, it would be worth some general information distribution so people know the annotation is available, but not yet.

Thanks!
Mark

Jim Robinson

Aug 20, 2014, 9:25:21 PM
to igv-...@googlegroups.com
Mark, my first question for you is does the file referenced below contain the annotations you expect?  This is the genbank record you referenced, but I can't judge the quality of the annotations.   If we confirm they are correct we can add them as a separate genome and/or part of our "hg19" reference, perhaps as an optional track.

Jim

Mark Cowley

Oct 29, 2014, 7:47:02 PM
to igv-...@googlegroups.com
Hi,
I've just noticed the same thing. I've aligned my reads to b37 (thus the chromosome is 'MT'), and I suspect that the standard gene annotations in IGV might use 'chrM', 'M' or similar?

cheers,
Mark

Jim Robinson

Oct 29, 2014, 9:28:54 PM
to igv-...@googlegroups.com
Hi Mark and Mark,

Actually I'm not sure what happened to the genes; we take our default annotations from UCSC. I'll look into it, but maybe UCSC refSeq is not the best source of annotations for chrM? If you can supply or point me to another annotation file in a supported format (gff, gtf, bed), I can add it to the "Load from server" menu. This will enable you to easily load the annotations. I should add that you can also do that by just loading the file, but it would benefit the larger community to have it in "Load from server".

The MT vs chrM thing shouldn't cause a problem,  IGV recognizes those as aliases.

Jim

Jim Robinson

Oct 29, 2014, 10:27:41 PM
to igv-...@googlegroups.com
OK, further update. I think the best solution for this is to list the mito sequence as a separate "genome" and use the genbank record for sequence and annotations. This would be unambiguous and hopefully provide better annotations. To that end I added the genbank record "NC_012920". To load it, select "More..." at the bottom of the genome dropdown list, then search for "Mito", select the entry, and click OK. The sequence will show as NC_012920; however, I've defined this to be an alias for chrM and MT, so bams and annotations using those names should work.

Right click in the annotations track and select "expand" to see them all.   IGV is not picking up a name field from the genbank record,  I will look into that,  but you can mouse over the annotations to see what they are.

If this  isn't what you need I can add other genbank records, just send me the record id.

Jim
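
The alias idea described above can be illustrated with a small sketch. This is not IGV's implementation; it only shows how a group of synonymous sequence names can be resolved to one canonical name, and how the same groups could be written out as tab-separated lines. The names `ALIAS_GROUPS`, `build_alias_lookup` and `alias_file_text` are hypothetical, and whether your IGV version accepts a user-supplied alias file, and where it expects it, should be checked in the IGV user guide.

```python
# Each tuple lists names that refer to the same sequence; the first is canonical.
ALIAS_GROUPS = [("NC_012920", "chrM", "MT")]


def build_alias_lookup(groups):
    """Map every name in each group to that group's canonical (first) name."""
    lookup = {}
    for group in groups:
        canonical = group[0]
        for name in group:
            lookup[name] = canonical
    return lookup


def alias_file_text(groups):
    """Render the groups as tab-separated lines, one group of synonyms per line."""
    return "".join("\t".join(group) + "\n" for group in groups)


lookup = build_alias_lookup(ALIAS_GROUPS)
print(lookup["MT"])    # canonical name for a b37-style bam
print(lookup["chrM"])  # canonical name for a UCSC-style annotation file
```

Note that an alias only maps names. It does not make two different reference sequences equivalent, so coordinates still have to come from the same underlying sequence.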

Mark Cowley

Nov 10, 2014, 6:52:14 AM
to igv-...@googlegroups.com
Hi Jim,
Thanks for making this change. This is a pretty good workaround, which just requires you to have the same data loaded into 2 sessions.

There appears to be an annotation bug for chrM at UCSC genome browser, as there are no RefSeq genes on chrM… There is data in the UCSC known Gene and GENCODE tracks. I’ve attached a screenshot.

cheers,
Mark


Jim Robinson

Nov 10, 2014, 10:24:17 AM
to igv-...@googlegroups.com
You can load those tracks in IGV as well (for hg19).  Select "File > Load from Server... Annotations".

Mark Samuels

Jan 20, 2015, 12:13:23 PM
to igv-...@googlegroups.com, Dan Spiegelman
Hi Jim,

It's been a few months; sorry, I got preoccupied by other grant proposals. I have gone back to this and there is still a problem: once I have NC_012920 loaded as the genome, IGV does not recognize my bam files from the GATK analysis. Here is the file name; is there a trivial formatting issue in the name that I could perhaps hand-edit (the S16069 is our internal sample id number)? MT.S16069.ref_MT.MT_L30.bam

For the other approach you suggested, when I try to look at the mitochondrial genome in the complete genome view, there are also some complications. The various gene annotation files that you suggested loading manually (UCSC, ENSEMBL, GENCODE, but only up to v18; v19 is seemingly not available) do include mitochondrial genes, but they are not all in alignment with each other or with the various genome assemblies. In particular, the ENSEMBL annotation is ok with hg19 but not with b37, whereas our bam file is aligned correctly with b37, not with hg19!

So things are still not quite sorted out, although there is progress. If we could figure out how to get the NC_012920 genome to recognize our bam files (without having to repeat the entire GATK analysis using that as the reference genome, which is not feasible with our semi-automated pipeline), then I think it could work.

Thanks,
Mark
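
One possible explanation for the hg19 versus b37 mismatch, offered here as background rather than something confirmed in this thread: the UCSC hg19 chrM sequence is the older NC_001807 record (16,571 bp), while b37 MT is the revised Cambridge Reference Sequence NC_012920 (16,569 bp), so coordinates do not line up between the two. A minimal sketch for telling them apart by contig length follows; the dictionary and function names are hypothetical.

```python
# Lengths of the two mitochondrial references commonly seen with hg19-era data.
KNOWN_MITO_LENGTHS = {
    16569: "NC_012920 (rCRS), the MT sequence in b37",
    16571: "NC_001807, the chrM sequence in UCSC hg19",
}


def identify_mito_reference(contig_length):
    """Guess which mitochondrial reference a contig is, based only on its length."""
    return KNOWN_MITO_LENGTHS.get(contig_length, "unknown mitochondrial reference")


print(identify_mito_reference(16569))
print(identify_mito_reference(16571))
```

The contig length is reported in the bam header, so this check does not require realigning anything.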

Jim Robinson

Jan 20, 2015, 12:24:25 PM
to igv-...@googlegroups.com, Dan Spiegelman
Hi,

I can't answer the question about your bam file because I don't know what you aligned it against. The genbank file I loaded on the server has a single sequence, named NC_012920. This of course must match the sequence name in your bam file. If it doesn't, you can download the genbank file yourself and change the sequence name to one that matches the corresponding sequence in your bam file.

Jim

Mark Samuels

Jan 23, 2015, 12:20:54 PM
to igv-...@googlegroups.com, Dan.Spi...@crchum.qc.ca
Jim, thanks for the response. We aligned against b37, I think. I presume that includes the mitochondrial genome, even if it is not annotated with genes, since IGV shows the reads in correct alignment with the b37 genome loaded. I don't quite understand the bam file naming convention, but I've asked my local bioinformaticist (Dan Spiegelman) if he can see a way to rename his output file so that IGV would recognize it using the NC_xxx genome you uploaded. I'll let you know if some version of that works; if so, it would potentially be useful to many other groups. I understand there is still a problem of mito genome sequences duplicated non-functionally in the nuclear genome, so SNPs could potentially be called in those duplicates. But those would presumably be systematic, and hence would appear in multiple exome samples from any one group's pipeline and would not be unique to cases from a particular phenotype study.

Mark

Jim Robinson

Jan 23, 2015, 12:23:10 PM
to igv-...@googlegroups.com
Hi Mark,

If you can ask your bioinformaticist to output the bam header and send
it to me I can make this work without the need to modify your bam
file. What I need to know is the sequence name for the mitochondria in
your bam file. This information is in the header of the bam file.

Jim
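
For anyone following along, the header can be printed with `samtools view -H your.bam`, and the sequence names are the `SN:` fields on the `@SQ` lines. The sketch below parses that header text and lists each sequence name with its length. The function name and the example header are hypothetical; the header shown is made up for illustration and is not taken from the bam file discussed here.

```python
def sequence_names_from_header(header_text):
    """Return (name, length) pairs from the @SQ lines of a SAM/BAM header."""
    sequences = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type look like TAG:value, separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        length = int(fields["LN"]) if "LN" in fields else None
        sequences.append((fields.get("SN"), length))
    return sequences


# Hypothetical header text, in the layout that `samtools view -H` prints.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:MT\tLN:16569\n"
    "@RG\tID:sample1\tSM:sample1\n"
)
print(sequence_names_from_header(example_header))
```

The bam file name itself does not matter to IGV; what has to match the loaded genome (directly or through an alias) is the `SN` value.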