Hi Yi,
I'll try to answer each of your questions:
> I noticed that the [cdsStart, cdsEnd] interval could be just covered
> by the first exon alone in some records in refGene, whose transcript
> actually includes more than one exon. I thought this should be a
> rare phenomenon, or I misunderstood the field meaning? Does it have
> something to do with the exonFrames field?
Yes, this is possible: the coding sequence can lie entirely within the
first exon, with the other exons consisting of untranslated sequence.
The exonFrames field gives the reading frame (0, 1, or 2) of each
coding exon, and is -1 for an exon that has no frame.
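
As a rough illustration (this is not UCSC code), the Python sketch below
lists which exons of a record overlap the coding region. The helper names
`parse_list` and `coding_exons` and the sample coordinates are made up for
the example, and it assumes the exonStarts/exonEnds fields are
comma-separated lists describing half-open intervals.

```python
def parse_list(field):
    """Split a comma-separated list such as '100,500,900,' into ints."""
    return [int(x) for x in field.split(",") if x]


def coding_exons(exon_starts, exon_ends, cds_start, cds_end):
    """Return the indices of exons that overlap [cds_start, cds_end)."""
    starts = parse_list(exon_starts)
    ends = parse_list(exon_ends)
    hits = []
    for i, (start, end) in enumerate(zip(starts, ends)):
        # Two half-open intervals overlap when the larger start
        # is strictly less than the smaller end.
        if max(start, cds_start) < min(end, cds_end):
            hits.append(i)
    return hits


# Hypothetical three-exon transcript whose CDS sits inside the first exon.
example = coding_exons("100,500,900,", "400,600,1000,", 150, 350)
print(example)  # [0]
```

For a record like this one, the exons that are not in the result are the
ones you would expect to carry -1 in exonFrames.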
> I also noticed that the cdsStart and cdsEnd could actually be the
> same in some records. How to understand this?
When cdsStart equals cdsEnd, the coding region is empty, which means
that the record is for a non-coding gene.
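
If you want to pull such records out of a dump of the table, something
along these lines should work. It is only a sketch: it assumes a
tab-separated file with the columns in the order shown in the table
description, and the two rows, the accessions, and the helper names
`parse_row` and `is_noncoding` are invented for illustration.

```python
FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_row(line):
    """Turn one tab-separated refGene line into a dict keyed by field name."""
    row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        row[key] = int(row[key])
    return row


def is_noncoding(row):
    """A record with an empty coding region has cdsStart == cdsEnd."""
    return row["cdsStart"] == row["cdsEnd"]


# Two made-up rows: the first has an empty CDS, the second does not.
rows = [
    "\t".join(["585", "NR_000001", "chr1", "+", "1000", "2000", "2000", "2000",
               "1", "1000,", "2000,", "0", "GENE_A", "unk", "unk", "-1,"]),
    "\t".join(["585", "NM_000001", "chr1", "+", "3000", "5000", "3100", "4900",
               "1", "3000,", "5000,", "0", "GENE_B", "cmpl", "cmpl", "0,"]),
]

noncoding = [parse_row(r)["name"] for r in rows if is_noncoding(parse_row(r))]
print(noncoding)  # ['NR_000001']
```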
> The third question is what the values of the cdsStartStat and
> cdsEndStat fields mean?
The information comes from the CDS field of the GenBank records. Here
is an example record:
http://www.ncbi.nlm.nih.gov/nuccore/NM_000454?report=GenBank
Here is an explanation from NCBI of the notation used in the CDS field:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord#CDSB
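
For quick reference, the four values these two fields can take (none,
unk, incmpl, cmpl, as listed in the table description) can be glossed
roughly as in the sketch below. The wording of each gloss is a paraphrase
rather than official text, and `describe_cds` is a hypothetical helper
written only for this example.

```python
# Paraphrased meanings of the cdsStartStat / cdsEndStat values.
CDS_STATUS = {
    "none": "no CDS (non-coding)",
    "unk": "CDS status unknown",
    "incmpl": "CDS is incomplete at this end",
    "cmpl": "CDS is complete at this end",
}


def describe_cds(cds_start_stat, cds_end_stat):
    """Return a one-line description of the two status fields."""
    return "start: {}; end: {}".format(
        CDS_STATUS[cds_start_stat], CDS_STATUS[cds_end_stat]
    )


# The example row in the table description has incmpl / cmpl.
print(describe_cds("incmpl", "cmpl"))
```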
> The fourth question is about the score. What does this mean?
The score field is not used in this table.
> Is there a more detailed and systematic description of the refGene
> table?
Our documentation of the track is here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene
The data comes from the RefSeq project:
http://www.ncbi.nlm.nih.gov/RefSeq/
I hope this is helpful. If you have further questions for UCSC, please
contact us again at gen...@soe.ucsc.edu.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
On 10/10/12 3:21 AM, wan...@genetics.ac.cn wrote:
>
> Dear Friends,
>
>
> I read the table file and checked the FAQs without finding an answer
> to my question.
>
> I noticed that the [cdsStart, cdsEnd] interval could be just
> covered by the first exon alone in some records in refGene, whose
> transcript actually includes more than one exon. I thought this should
> be a rare phenomenon, or I misunderstood the field meaning? Does it have
> something to do with the exonFrames field?
>
> I also noticed that the cdsStart and cdsEnd could actually be
> the same in some records. How to understand this?
>
> The third question is what the values of the cdsStartStat and
> cdsEndStat fields mean?
>
> The fourth question is about the score. What does this mean?
>
> Is there a more detailed and systematic description of the
> refGene table?
>
> Thank you!
>
> Best Wishes,
>
> Yi
>
>
> *Database:* hg19 *Primary Table:* refGene *Row Count:* 43,726
> *Format description:* A gene prediction with some additional info.
>
> field         example                         SQL type                               description
> bin           612                             smallint(5) unsigned                   Indexing field to speed chromosome range queries.
> name          NM_006781                       varchar(255)                           Name of gene (usually transcript_id from GTF)
> chrom         chr6_apd_hap1                   varchar(255)                           Reference sequence chromosome or scaffold
> strand        -                               char(1)                                + or - for strand
> txStart       3614545                         int(10) unsigned                       Transcription start position
> txEnd         3654041                         int(10) unsigned                       Transcription end position
> cdsStart      3614545                         int(10) unsigned                       Coding region start
> cdsEnd        3653868                         int(10) unsigned                       Coding region end
> exonCount     14                              int(10) unsigned                       Number of exons
> exonStarts    3614545,3617947,3618422,361...  longblob                               Exon start positions
> exonEnds      3614566,3617968,3618443,361...  longblob                               Exon end positions
> score         0                               int(11)                                score
> name2         C6orf10                         varchar(255)                           Alternate name (e.g. gene_id from GTF)
> cdsStartStat  incmpl                          enum('none', 'unk', 'incmpl', 'cmpl')  enum('none','unk','incmpl','cmpl')
> cdsEndStat    cmpl                            enum('none', 'unk', 'incmpl', 'cmpl')  enum('none','unk','incmpl','cmpl')
> exonFrames    1,1,1,1,1,1,1,1,2,1,1,1,1,0,    longblob                               Exon frame {0,1,2}, or -1 if no frame for exon