CDS does not start with ATG (140/36558)


wan...@genetics.ac.cn

May 8, 2013, 11:35:55 PM
to gen...@soe.ucsc.edu
Hi, all!

         I downloaded the refGene sequence file from the Table Browser (sequence type: genomic) with only CDS exons included. But I found that, of the 36558 entries in total, 140 do not start with "ATG", which is supposed to be the first codon of a CDS. For example, the file has:

hg19_refGene_NM_001271872 range=chr1:206516200-206579957 5'pad=0 3'pad=0 strand=+ repeatMasking=none   GAAGGATCAGAATGTTCTCTCTCCAGTCAACTGCTGGAATCTCCTCTTAAACCAGGTGAAGCGGGAAAGCAGGGACCATACCACCCTGAGTGACATCTACCTGAATAATATCATTCCTCGATTTGTACAAGTCAGCGAGGACTCAGGAAGACTCTTTAAAAAGAGTAAAGAAGTCGGCCAGCAGCTCCAAGATGATTTGATGAAGGTCCTGAACGAGCTCTACTCGGTGATGAAGACATATCACATGTACAATGCCGACAGCATCAGTGCTCAGAGCAAACTAAAGGAGGCGGAGAAGCAGGAGGAGAAGCAAATTGGTAAATCGGTAAAGCAGGAGGACCGGCAGACCCCACGCTCCCCTGACTCCACGGCCAACGTTCGCATTGAGGAGAAACATGTCCGGAGGAGCTCAGTGAAGAAGATTGAGAAGATGAAGGAGAAGCGTCAAGCCAAGTACACGGAGAATAAGCTGAAGGCCATCAAAGCCCGGAATGAGTACTTGCTGGCTTTGGAGGCAACCAATGCATCTGTCTTCAAGTACTACATCCATGACCTATCTGACCTTATTGATTGTTGTGACTTAGGCTACCATGCAAGTCTGAACCGGGCTCTACGCACCTTCCTCTCTGCTGAGTTAAACCTGGAACAGTCGAAGCATGAGGGTCTGGATGCCATCGAGAATGCAGTAGAAAACCTGGATGCCACCAGTGACAAGCAGCGCCTCATGGAGATGTACAACAACGTCTTCTGCCCCCCTATGAAGTTTGAGTTTCAGCCCCACATGGGGGATATGGCTTCCCAGCTCTGTGCCCAGCAGCCTGTCCAGAGTGAGCTGGTACAGAGATGCCAACAACTGCAGTCTCGCTTATCCACTCTAAAGATTGAAAACGAAGAGGTAAAGAAGACAATGGAGGCCACCCTGCAAACCATCCAGGACATTGTGACTGTCGAGGACTTTGATGTGTCTGACTGCTTCCAGTACAGCAACTCCATGGAGTCCGTCAAGTCCACGGTCTCTGAAACCTTCATGAGCAAGCCCAGCATTGCTAAGAGGAGAGCCAACCAGCAAGAGACAGAGCAGTTTTATTTCACAGTAAGGGAGTGCTATGGCTTTTAA

        Is something wrong here, or did I miss something? Could someone please explain it? Thank you very much!

        Another question is about repeat masking (the "repeatMasking=none" field in the header). What exactly does this mean?

Best, 

Yi



Pauline Fujita

May 9, 2013, 7:04:41 PM
to wan...@genetics.ac.cn, UCSC Genome Browser discussion list
Hi Yi,

In the case of NM_001271872 the problem is that the human reference
hg19 has a deletion that includes the first exon of NM_001271872. This
has been repaired in a patch which you can see by turning on the "GRC
patch" and "GRC incident" tracks.

The rest of the discrepancies you're seeing are likely caused by a
difference between the hg19 reference sequence and the sequence of
whoever contributed the mRNA to RefSeq. RefSeq mRNAs come from many
different individuals whose genome sequence doesn't necessarily match
the reference sequence, either because there is an error in the
assembly of the reference sequence or because of polymorphism in the
population.
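
To list exactly which entries are affected, a check along the following
lines could be used. This is only a minimal sketch, not a UCSC tool: it
assumes the Table Browser output is standard FASTA with one record per
gene and headers like the one quoted above, and that each sequence is
already given in transcript orientation. The function names and the two
short sample records are hypothetical, made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_atg_records(lines: Iterable[str]) -> List[str]:
    """Return the IDs (first header word) of records not starting with ATG."""
    return [
        header.split()[0]
        for header, seq in parse_fasta(lines)
        # Compare case-insensitively in case some bases are lower-cased.
        if not seq.upper().startswith("ATG")
    ]


# Hypothetical sample input (not real refGene records).
sample = [
    ">example_rec1 range=chrN:1-9 strand=+",
    "ATGAAATAA",
    ">example_rec2 range=chrN:20-28 strand=+",
    "GAAGGATAA",
]
flagged = non_atg_records(sample)
print(flagged)
```

For a real download, the same function can be passed an open file
handle, e.g. `non_atg_records(open("refGene_cds.fa"))`, where the file
name is a placeholder. The flagged IDs can then be looked up in the
browser one by one to see whether a patch or a sequence difference
explains each case.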

Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu