Why does hg19 reference genome have outdated mitochondrial sequence?


Dan Richards

Sep 17, 2012, 6:26:36 PM
to gen...@soe.ucsc.edu

Can the hg19 UCSC human reference genome sequence be updated to use the current NC_012920.1 mitochondrial sequence (same as http://www.ncbi.nlm.nih.gov/nuccore/J01415.2 from MitoMap), which has been in the GRCh37 reference genome build since way back in patch release 2?

 

Currently hg19 contains the outdated NC_001807 human mitochondrial sequence instead, which causes many issues for people using hg19 as the reference sequence for aligning and calling variants from NGS DNA resequencing data. Specifically, because the two mitochondrial sequences differ in length, mitochondrial gene positions differ as well. To annotate mitochondrial variants correctly, researchers therefore need to know more than whether they used the current human reference genome: they need to know whether they used hg19 or GRCh37, a detail that many scientists analyzing variant data do not know.
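One way to tell which mitochondrial sequence a given reference FASTA carries is to compare the length of its chrM (or MT) record against the two known records. The following is a minimal sketch, not an official UCSC or NCBI tool. It assumes the lengths 16,571 bp for NC_001807 and 16,569 bp for NC_012920.1, which come from the public GenBank records rather than from this thread, and the function names are hypothetical.

```python
# Sequence lengths in bp. Assumption: taken from the public GenBank records,
# not stated anywhere in this thread.
KNOWN_MITO_LENGTHS = {
    16571: "NC_001807 (original hg19 chrM)",
    16569: "NC_012920.1 (rCRS)",
}


def fasta_lengths(handle):
    """Return {record_name: sequence_length} for a FASTA text stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record name is the first whitespace-delimited token of the header.
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def classify_mito(handle, mito_names=("chrM", "MT")):
    """Guess which mitochondrial record a FASTA contains, by length only."""
    lengths = fasta_lengths(handle)
    for candidate in mito_names:
        if candidate in lengths:
            return KNOWN_MITO_LENGTHS.get(
                lengths[candidate], "unknown mitochondrial sequence"
            )
    return "no mitochondrial record found"
```

A length match is only a quick heuristic. It does not prove the sequence is identical to either record, so a checksum or a direct comparison against the GenBank entry would be needed for certainty.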

 

The GRCh38 build of the reference genome next year will include NC_012920.1 too, so could the default hg19 download be updated to include the current mitochondrial sequence now, to improve the accuracy and consistency of human genome variant interpretation? If hg19 must remain fixed, could a warning at least be displayed to alert folks that the mitochondrial sequence is out of date, along with a download URL for hg19 with the current sequence? Many folks do not realize they have outdated alignment data until long after they have aligned data relative to hg19 and discovered alignment issues.
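For data that has already been aligned, the reference used can often be inferred from the alignment header rather than from the analyst's memory. The sketch below reads SAM header text (for example, the output of `samtools view -H`) and returns the length recorded for the mitochondrial contig in its `@SQ` line. The function name is hypothetical. Interpreting the result relies on the assumption that NC_001807 is 16,571 bp and NC_012920.1 is 16,569 bp, figures taken from the GenBank records and not from this thread.

```python
def mito_length_from_sam_header(header_text, mito_names=("chrM", "MT")):
    """Return the LN value of the mitochondrial @SQ line, or None if absent."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs, e.g. SN:chrM and LN:16571.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if fields.get("SN") in mito_names and "LN" in fields:
            return int(fields["LN"])
    return None
```

Under the stated assumption, a result of 16571 would suggest the alignments were made against the original hg19 chrM, and 16569 would suggest the rCRS.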

 

Sincerely,

Dan Richards

 

 

Brooke Rhead

Sep 19, 2012, 5:59:40 PM
to Dan Richards, gen...@soe.ucsc.edu
Hi Dan,

We do display a note about chrM on our hg19/GRCh37 gateway page
(http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19):

"Note on chrM
Since the release of the UCSC hg19 assembly, the Homo sapiens
mitochondrion sequence (represented as "chrM" in the Genome Browser) has
been replaced in GenBank with the record NC_012920. We have not replaced
the original sequence, NC_001807, in the hg19 Genome Browser. We plan to
use the Revised Cambridge Reference Sequence (rCRS) in the next human
assembly release."

Unfortunately, we do not have the resources to devote to rebuilding the
hg19 genome browser (including all of the tracks) with the preferred
mitochondrial genome sequence. We would be reluctant to alter only
downloads files for hg19, as that might exacerbate the confusion about
which version of chrM is used where.

Your point about documenting the problem is well taken, however. We
have just added the chrM note to the README files on these three
downloads pages:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/

If you see some more places that documentation would be helpful, or if
you have other suggestions on how to mitigate the problem, please reply
to gen...@soe.ucsc.edu.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 9/17/12 4:26 PM, Dan Richards wrote:
> Can the hg19 UCSC human reference genome sequence be updated to use the
> current NC_012920.1 mitochondrial sequence (same
> as http://www.ncbi.nlm.nih.gov/nuccore/J01415.2 from MitoMap) which has
> --
>
>
>

Brooke Rhead

Sep 19, 2012, 7:14:01 PM
to Dan Richards, gen...@soe.ucsc.edu
Hi again, Dan,

I want to clarify one thing regarding this comment:

> they need to know if they used hg19 or GRCh37 which is a detail many
> scientists who are analyzing variant data do not know

hg19 and GRCh37 (before any patches) are the same thing. We don't have
*any* patches in our downloads, and we don't display any patches in the
Genome Browser (aside from in the GRC Patch Release track on the
hg19/GRCh37 assembly).

--
Brooke Rhead
UCSC Genome Bioinformatics Group

