Query regarding Annotation variation

Meenakshi Bagadia

Jul 5, 2016, 11:27:49 AM
to gen...@soe.ucsc.edu
Hi


I have downloaded the ensGene file for horse for the equCab2 assembly. This file contains Ensembl transcript IDs with their coordinates.

Example : 734 ENSECAT00000028519.1 chrUn - 19588855 19588985 19588985 19588985 1

However, when I download the same data from the Ensembl genome browser, the same transcript ID is reported on a different sequence, with different coordinates.

Example : ENSECAG00000026507 ENSECAT00000028519 Un0020 122388 122517 -1 122388 122517 122517




These are the cases where the chromosome name is not numeric (e.g. chrUn).

I looked into the chromosome lists too. In UCSC there is only one chromosome starting with chrUn; the rest are either numeric or chrM, chrX.

But in the Ensembl database there are multiple sequences whose names start with Un (e.g. Un0020), each with a smaller length.

It seems that in Ensembl the chrUn is split into multiple smaller fragments.


Can you please shed some light on how exactly I should convert Ensembl coordinates to UCSC coordinates?
Is transcript ENSECAT00000028519 at Un0020 122388 122517 -1 the same, in terms of coordinates, as the one at chrUn - 19588855 19588985?


This annotation difference is present in other genomes too, for example dog.

ensGene file from UCSC for canFam3
585  ENSCAFT00000004562.2  chrUn_AAEX03021281 - 2065 11350 2065 11350  ENSCAFG00000002853.4
585  ENSCAFT00000015155.3  chrUn_AAEX03022712 + 14069 15009 15009 15009  ENSCAFG00000009543.3

Ensembl genome browser
ENSCAFG00000002853 ENSCAFT00000004562 AAEX03021281.1 21 11350 -1
ENSCAFG00000009543 ENSCAFT00000015155 AAEX03022712.1 14070 15009 1
 
The second transcript has the same coordinates in both versions, but the first one differs.
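
A note on the one-base difference in the second transcript's start (14069 at UCSC, 14070 at Ensembl): UCSC database tables store 0-based, half-open coordinates, while Ensembl reports 1-based, inclusive coordinates. The sketch below is only an illustration of that convention using the dog rows above; the function name is hypothetical and not part of any UCSC or Ensembl tool.

```python
def ensembl_to_ucsc_interval(start, end):
    """Convert a 1-based inclusive interval (Ensembl style) to a
    0-based half-open interval (UCSC table style).

    Only the start shifts by one; the end value stays the same.
    """
    return start - 1, end


# ENSCAFT00000015155: Ensembl 14070-15009 becomes the UCSC 14069-15009 row.
second = ensembl_to_ucsc_interval(14070, 15009)

# ENSCAFT00000004562: Ensembl 21-11350 does NOT become the UCSC 2065-11350 row,
# so that difference is not explained by the coordinate convention alone.
first = ensembl_to_ucsc_interval(21, 11350)
```

So the second transcript agrees once the convention is taken into account, while the first one still differs at its start for some other reason.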


I want the coordinates of Ensembl genes in terms of UCSC chromosome names and coordinates, as the rest of my data uses the UCSC coordinate system.



Meenakshi Bagadia
PhD student
IISER Mohali




Christopher Lee

Jul 14, 2016, 12:38:40 PM
to Meenakshi Bagadia, UCSC Genome Browser Discussion List

Hi Meenakshi,

Thank you for your question about Ensembl transcripts in terms of UCSC coordinates.

The file you downloaded, http://hgdownload.soe.ucsc.edu/goldenPath/equCab2/database/ensGene.txt.gz, does
indeed contain the correct UCSC coordinates of Ensembl transcripts. The reason you do not see chromosomes
like chrUn0020 at UCSC is because we lumped all unplaced transcripts onto a single chromosome, which we
call chrUn.
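
As a rough sketch of what that lumping means for coordinates: a position on an Ensembl scaffold becomes a position on chrUn by adding the scaffold's starting offset within chrUn, and by switching from Ensembl's 1-based inclusive numbering to the 0-based half-open numbering used in UCSC tables. The offset below is an assumption inferred only from the single ENSECAT00000028519 example in this thread (19588855 - (122388 - 1) = 19466468), not taken from an official layout table, and the function and dictionary names are hypothetical. The sketch also assumes the scaffold sits in forward orientation on chrUn, which matches the strands in the example (-1 and -).

```python
# 0-based start of each scaffold within UCSC chrUn.
# ASSUMPTION: this single value is inferred from the example in this thread;
# a real table would have to come from the layout of scaffolds on chrUn.
SCAFFOLD_OFFSETS = {"Un0020": 19466468}


def scaffold_to_chrun(scaffold, start, end, offsets=SCAFFOLD_OFFSETS):
    """Map a 1-based inclusive scaffold interval (Ensembl style) to a
    0-based half-open interval on chrUn (UCSC table style).

    Assumes the scaffold is placed on chrUn in forward orientation.
    """
    offset = offsets[scaffold]
    return "chrUn", offset + start - 1, offset + end


# Ensembl: Un0020 122388 122517  ->  UCSC: chrUn 19588855 19588985
result = scaffold_to_chrun("Un0020", 122388, 122517)
```

With that offset, the Ensembl interval on Un0020 lands exactly on the chrUn interval in ensGene, and both intervals are 130 bases long, so the two records appear to describe the same location.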

That being said, we are still unsure of what exactly you are asking, and what you need these
coordinates for. Is it possible for you to provide a little more clarification as to what
you are looking for? You can reply to me personally if you don't want to share with the list.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further
questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible forum. If your question includes sensitive data, you may send it instead
to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute

