Question regarding multiple entries with same transcript ID (refseq ID)


Dr.Rahul Nahar


Dec 7, 2012, 1:21:44 AM

to gen...@soe.ucsc.edu

I downloaded the refGene table (hg19 coordinates) from the UCSC Table Browser two days ago.

However, it contains multiple entries with the same transcript ID and gene symbol but different chromosomal locations, and in some cases different strands. I am not sure which entry is correct or most up to date, or which one I should use for my annotations. Some examples are below; there are approximately 600 such transcripts.

RefSeq ID     chrom  strand  txStart    txEnd      symbol
NM_001005277  chr1   +       367658     368597     OR4F16
NM_001005277  chr1   -       621095     622034     OR4F16
NM_001005277  chr5   +       180794287  180795226  OR4F16

NM_001001722  chrY   -       19990139   19992099   CDY2B
NM_001001722  chrY   +       20137667   20139627   CDY2B

NM_000513     chrX   +       153485202  153499470  OPN1MW
NM_000513     chrX   +       153448084  153462352  OPN1MW

NM_001001435  chr17  +       34538467   34540274   CCL4L1
NM_001001435  chr17  +       34640033   34641840   CCL4L1

Could you please suggest how to obtain a file with a single chromosomal location for each RefSeq transcript ID?
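For anyone who needs to handle this locally, here is a minimal Python sketch that groups rows by transcript ID, lists the IDs that map to more than one location, and (optionally) keeps one location per ID. It assumes the simplified six-column layout used in the examples (ID, chrom, strand, txStart, txEnd, symbol); the full refGene table has more columns in a different order, so `parse_rows` would need adjusting. All function names are hypothetical. Keeping one location per ID throws away real alignments, so the rule in `one_per_transcript` (lowest chromosome name, then lowest start) is only an arbitrary, deterministic placeholder, not a judgement about which copy is "correct".

```python
from collections import defaultdict
from typing import NamedTuple


class Locus(NamedTuple):
    """One row in the simplified six-column layout (hypothetical type)."""
    name: str       # RefSeq transcript ID, e.g. NM_000513
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    symbol: str     # gene symbol, e.g. OPN1MW


def parse_rows(lines):
    """Parse whitespace-separated rows: ID chrom strand txStart txEnd symbol."""
    loci = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header lines
        name, chrom, strand, start, end, symbol = fields[:6]
        loci.append(Locus(name, chrom, strand, int(start), int(end), symbol))
    return loci


def group_by_transcript(loci):
    """Map each transcript ID to every location it was aligned to."""
    groups = defaultdict(list)
    for locus in loci:
        groups[locus.name].append(locus)
    return dict(groups)


def multi_mapped(groups):
    """Sorted transcript IDs that appear at more than one location."""
    return sorted(name for name, hits in groups.items() if len(hits) > 1)


def one_per_transcript(groups):
    """Keep one location per ID: the lowest (chrom name, start) pair.
    This tie-break is arbitrary; it does not pick a 'best' alignment."""
    return [min(hits, key=lambda l: (l.chrom, l.tx_start)) for hits in groups.values()]


if __name__ == "__main__":
    rows = [  # rows taken from the examples in the question
        "NM_001005277 chr1 + 367658 368597 OR4F16",
        "NM_001005277 chr1 - 621095 622034 OR4F16",
        "NM_001005277 chr5 + 180794287 180795226 OR4F16",
        "NM_001001722 chrY - 19990139 19992099 CDY2B",
        "NM_001001722 chrY + 20137667 20139627 CDY2B",
        "NM_000513 chrX + 153485202 153499470 OPN1MW",
        "NM_000513 chrX + 153448084 153462352 OPN1MW",
        "NM_001001435 chr17 + 34538467 34540274 CCL4L1",
        "NM_001001435 chr17 + 34640033 34641840 CCL4L1",
    ]
    groups = group_by_transcript(parse_rows(rows))
    print(multi_mapped(groups))
    # ['NM_000513', 'NM_001001435', 'NM_001001722', 'NM_001005277']
```

For most annotation work it is probably safer to keep every location and flag the multi-mapped IDs than to collapse them silently.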

Thanks

Regards

Rahul Nahar, PhD

Scientist

Ocimum Biosolutions

Hyderabad, India.


Luvina Guruvadoo


Dec 7, 2012, 6:12:54 PM

to Dr.Rahul Nahar, gen...@soe.ucsc.edu

Hi Rahul,

The duplicate entries you are seeing are RefSeq transcripts that aligned to more than one genomic location; there are many such entries. Note this part of the RefSeq track description (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene): "RefSeq RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept." In other words, every alignment that passes these thresholds is kept, so a transcript with several near-identical genomic copies gets one refGene row per copy.
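To make the quoted rule concrete, here is a minimal Python sketch of how those thresholds would decide which alignments of one RNA survive. It is an interpretation, not UCSC's pipeline code: it assumes "less than 15%" refers to the fraction of the RNA covered by the alignment, that "within 0.1% of the best" means 0.1 percentage points of base identity, and that the coverage filter runs before the best identity is chosen. The `Alignment` type and its fields are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One candidate BLAT alignment of a single RefSeq RNA (hypothetical fields)."""
    chrom: str
    start: int
    coverage: float   # assumed: fraction of the RNA covered by the alignment, 0..1
    identity: float   # base identity with the genome, 0..1


def kept_alignments(alns, min_coverage=0.15, min_identity=0.96, near_best=0.001):
    """Return the alignments that pass the thresholds, as interpreted above."""
    # Discard alignments covering less than 15% of the RNA.
    candidates = [a for a in alns if a.coverage >= min_coverage]
    if not candidates:
        return []
    # Identify the highest base identity among the remaining alignments.
    best = max(a.identity for a in candidates)
    # Keep every alignment within 0.1 percentage points of the best
    # that also has at least 96% base identity.
    return [a for a in candidates
            if a.identity >= best - near_best and a.identity >= min_identity]
```

With illustrative numbers (not real alignment scores), two copies at 100% and 99.95% identity are both kept while a third at 99% is dropped, which is exactly how a single transcript ends up with several rows in refGene.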

This previously answered question provides some more information:
https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024242.html

I hope this helps. If you have further questions please feel free to contact the mailing list again at gen...@soe.ucsc.edu.

---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group