Re: [genome] danRer10 - UCSC Table browser - Ensembl Genes (Track)

Luvina Guruvadoo

Sep 8, 2016, 4:45:22 PM
to Neel Aluru, gen...@soe.ucsc.edu
Hello Neel,

Thank you for your question. There are a couple of ways to do this. The most efficient method would be to download the original GTF file from Ensembl, then use 'sed' to convert the Ensembl names into UCSC names with the following command:

wget -O /dev/stdout \
  ftp://ftp.ensembl.org/pub/release-85/gtf/danio_rerio/Danio_rerio.GRCz10.85.gtf.gz \
  | zcat \
  | sed -e 's/^\([0-9]\)/chr\1/; s/^MT/chrM/; s/^\(KN[0-9][0-9]*\)\.\([0-9]\)/chrUn_\1v\2/;' \
  | gzip -c > danRer10.ensGene.gtf.gz
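
As a quick sanity check before downloading the full file, the substitutions from the command above can be tried on a few made-up sequence names. This is only a sketch: the function name ensembl_to_ucsc and the sample names are illustrative assumptions, and the dot in the scaffold pattern is escaped here so that it matches a literal '.'.

```shell
# Substitutions from the command above, wrapped in a function
# (hypothetical name) so they can be tested on their own.
ensembl_to_ucsc() {
  sed -e 's/^\([0-9]\)/chr\1/; s/^MT/chrM/; s/^\(KN[0-9][0-9]*\)\.\([0-9]\)/chrUn_\1v\2/;'
}

# Made-up input lines: a sequence name, a tab, and a placeholder
# standing in for the remaining GTF columns.
printf '%s\trest\n' '4' '25' 'MT' 'KN149696.2' | ensembl_to_ucsc | cut -f1
```

With these inputs the first column should come out as chr4, chr25, chrM and chrUn_KN149696v2. Comment lines in the Ensembl file that begin with '#' are left unchanged, since none of the patterns match them.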

Alternatively, you could use the Table Browser and select "selected fields from primary and related tables" as your output format, then join the fields from the ensGtp or ensemblToGeneName tables. This output, however, will not be in GTF format.
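
If a GTF file is still needed with this second approach, one possibility is to save the joined output as a two-column, tab-separated file (transcript ID, then gene ID) and use it to rewrite the gene_id attribute of the GTF from the Table Browser. The following is a rough sketch, not a tested recipe: the function name fix_gene_ids and the file names are hypothetical, and it assumes the transcript IDs in the mapping file match those in the GTF exactly, including any version suffix such as ".1".

```shell
# Usage: fix_gene_ids map.tsv ensGene.gtf > fixed.gtf
#   map.tsv     (hypothetical name) transcript ID <TAB> gene ID, one pair per line
#   ensGene.gtf (hypothetical name) GTF output from the Table Browser
fix_gene_ids() {
  awk -F'\t' -v OFS='\t' '
    # First file: load the transcript -> gene map.
    NR == FNR { gene[$1] = $2; next }
    {
      # Second file: pull the transcript ID out of the attribute column (9).
      if (match($9, /transcript_id "[^"]+"/)) {
        tx = substr($9, RSTART + 15, RLENGTH - 16)
        # Replace gene_id only when the transcript is present in the map.
        if (tx in gene)
          sub(/gene_id "[^"]+"/, "gene_id \"" gene[tx] "\"", $9)
      }
      print
    }
  ' "$1" "$2"
}
```

Lines whose transcript ID is missing from the mapping file are passed through unchanged, so it is worth checking the result for any gene_id values that still begin with ENSDART.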

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Regards,
Luvina

--
Luvina Guruvadoo
UCSC Genome Browser

http://genome.ucsc.edu




On Thu, Sep 1, 2016 at 12:45 PM, Neel Aluru <nal...@gmail.com> wrote:
Hello,

I am using the UCSC Table Browser to obtain Ensembl Genes in GTF format. I am choosing "Genes and Gene Predictions" (group), "Ensembl Genes" (track) and "ensGene" (table). But the resulting output contains ENSDART IDs in both the gene_id and transcript_id fields. Is it possible to get the gene IDs (ENSDARG) instead?

I would really appreciate it if you could point out how to get the ENSDARG IDs.

Here is sample output; the ENSDART IDs appear in both the gene_id and transcript_id fields:

chr10	danRer10_ensGene	exon	33790782	33790808	0.000000	+	.	gene_id "ENSDART00000161430.1"; transcript_id "ENSDART00000161430.1"; 
chr10	danRer10_ensGene	start_codon	33800325	33800327	0.000000	+	.	gene_id "ENSDART00000161430.1"; transcript_id "ENSDART00000161430.1"; 
chr10	danRer10_ensGene	CDS	33800325	33800330	0.000000	+	0	gene_id "ENSDART00000161430.1"; transcript_id "ENSDART00000161430.1"; 
chr10	danRer10_ensGene	exon	33800238	33800330	0.000000	+	.	gene_id "ENSDART00000161430.1"; transcript_id "ENSDART00000161430.1"; 
chr10	danRer10_ensGene	CDS	33800334	33800346	0.000000	+	0	gene_id "ENSDART00000161430.1"; transcript_id "ENSDART00000161430.1"; 


Thank you,
Neel



--
Neel Aluru, Ph.D.
Assistant Scientist
Biology Department
Woods Hole Oceanographic Institution
Redfield building (Room # 304), MS#32
Woods Hole, MA 02543
USA
