Question regarding multiple entries with same transcript ID (refseq ID)


Dr.Rahul Nahar

Dec 7, 2012, 1:21:44 AM
to gen...@soe.ucsc.edu

Hi,

I downloaded the refGene file (hg19 coordinates) from the UCSC Table Browser two days ago.

However, I see multiple entries in it with the same transcript ID and gene symbol but different chromosomal locations, and sometimes even different strands. I am not sure which one is correct or most up to date, or which one I should use for my annotations. Some examples are below; there are approximately 600 such transcripts.

NM_001005277    chr1    +    367658       368597       OR4F16
NM_001005277    chr1    -    621095       622034       OR4F16
NM_001005277    chr5    +    180794287    180795226    OR4F16

NM_001001722    chrY    -    19990139     19992099     CDY2B
NM_001001722    chrY    +    20137667     20139627     CDY2B

NM_000513       chrX    +    153485202    153499470    OPN1MW
NM_000513       chrX    +    153448084    153462352    OPN1MW

NM_001001435    chr17   +    34538467     34540274     CCL4L1
NM_001001435    chr17   +    34640033     34641840     CCL4L1

Could you please suggest how to obtain a file with a single chromosomal location for each RefSeq transcript ID?
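As an editorial illustration (not a UCSC tool), the multi-mapped transcripts could be separated out locally with a short Python script. This is a minimal sketch that assumes a tab-separated Table Browser export of refGene whose first line is a header starting with `#` and containing `name`, `chrom`, `strand`, `txStart` and `txEnd` columns; the script name `split_refgene.py` and the output file names are hypothetical. It writes transcripts with exactly one alignment to standard output and lists the multi-mapped ones on standard error for review.

```python
import csv
import io
import sys
from collections import defaultdict


def split_by_mapping(lines):
    """Split refGene rows into transcripts with one alignment and with several.

    `lines` is any iterable of text lines (e.g. an open file). The first line is
    assumed to be a tab-separated Table Browser header such as '#bin name chrom ...'.
    Returns (header, unique, multi); unique/multi map transcript ID -> list of rows.
    """
    it = iter(lines)
    header = next(it).lstrip("#").rstrip("\r\n").split("\t")
    rows_by_id = defaultdict(list)
    for row in csv.DictReader(it, fieldnames=header, delimiter="\t"):
        rows_by_id[row["name"]].append(row)
    unique = {tid: rows for tid, rows in rows_by_id.items() if len(rows) == 1}
    multi = {tid: rows for tid, rows in rows_by_id.items() if len(rows) > 1}
    return header, unique, multi


def format_rows(header, groups):
    """Render the rows in `groups` as tab-separated text with a '#' header line."""
    buf = io.StringIO()
    buf.write("#" + "\t".join(header) + "\n")
    writer = csv.DictWriter(buf, fieldnames=header, delimiter="\t", lineterminator="\n")
    for rows in groups.values():
        writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python split_refgene.py refGene.txt > unique.txt
    with open(sys.argv[1], newline="") as handle:
        header, unique, multi = split_by_mapping(handle)
    sys.stdout.write(format_rows(header, unique))
    # List each multi-mapped transcript and its locations on stderr.
    for tid, rows in sorted(multi.items()):
        places = ", ".join(
            f"{r['chrom']}:{r['txStart']}-{r['txEnd']}({r['strand']})" for r in rows
        )
        print(f"{tid}\t{len(rows)}\t{places}", file=sys.stderr)
```

The sketch deliberately does not pick a "winner" for multi-mapped transcripts: whether to drop them, keep all of their rows, or choose one copy by some rule of your own is a decision the table by itself cannot make.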


Thanks

Regards

--

Rahul Nahar, PhD

Scientist

Ocimum Biosolutions

Hyderabad, India.



This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions that are unlawful. This e-mail may contain viruses. Ocimum Biosolutions has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment.
The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

OCIMUMBIO SOLUTIONS (P) LTD

Luvina Guruvadoo

Dec 7, 2012, 6:12:54 PM
to Dr.Rahul Nahar, gen...@soe.ucsc.edu
Hi Rahul,

The duplicate entries you are seeing come from RefSeq transcripts that aligned to more than one location in the genome; there are many such entries. Note this part of the RefSeq track description (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene): "RefSeq RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept." In other words, every alignment that meets these criteria is kept, so a single transcript can appear in several rows.
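The filtering rules quoted above can be sketched in a few lines of Python. This is an illustrative reconstruction, not UCSC's actual pipeline code: the `Alignment` record is hypothetical, and it reads "an alignment of less than 15%" as a hit covering less than 15% of the RNA, and "within 0.1% of the best" as at most 0.001 below the best identity fraction. Both readings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one BLAT hit of a RefSeq RNA."""
    chrom: str
    strand: str
    coverage: float  # assumed: fraction of the RNA covered by the hit (0..1)
    identity: float  # fraction of aligned bases matching the genome (0..1)


def keep_alignments(alns, min_cover=0.15, min_identity=0.96, near_best=0.001):
    """Return the hits that would survive the quoted track-description rules."""
    # Discard hits covering less than 15% of the RNA.
    alns = [a for a in alns if a.coverage >= min_cover]
    if not alns:
        return []
    # Identify the best base identity among the remaining hits.
    best = max(a.identity for a in alns)
    # Keep hits within 0.1% of the best that are also at least 96% identical.
    return [
        a for a in alns
        if a.identity >= best - near_best and a.identity >= min_identity
    ]
```

Under these rules, copies of a transcript that align with equal or near-equal identity all pass the filter, which is consistent with the same accession appearing on several chromosomes or strands.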

This previously answered question provides some more information:
https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024242.html

I hope this helps. If you have further questions please feel free to contact the mailing list again at gen...@soe.ucsc.edu.

---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group