[Genome] get genes informations having ensemblID with mysql query

84 views
Skip to first unread message

Rispoli Rossella

unread,
Oct 5, 2011, 12:31:58 PM10/5/11
to gen...@soe.ucsc.edu
Hi,
We have a local installation of UCSC that was updated yesterday, and I
have problem querying the mysql DB and I would like to know if you can
help me.

I want use the mysql tables to retrieve informations starting from a
list of ensembl gene ID. To do this I use the tables: ensGene,
knownToEnsembl, kgXref with the following query:

>>select QUERY.name,QUERY.name2,QUERY.geneSymbol,QUERY.refseq from
(select X.*, G.* from ensGene as G, knownToEnsembl as KE, kgXref as
X where G.name=KE.value
and KE.name=X.kgID and G.name2='ensID') as QUERY;

For some of this ensemblId if I query the hg19 DB I don't get any
results, instead of if I query hg18 DB they can be found.

But when I search the same ensemblID through the web interface I see
that they are present in the ensembl gene track.
although in the visualization in hg19 the results are titled
EnsemblGene, in the hg18 EnsGene (I don't know if this may be relevant).

Is there anything missing in my query?

here are some of the ensemblID for which I see this problem:
ENSG00000026103
ENSG00000030110
ENSG00000104725
ENSG00000104774

Thanks in advances, best regard

Rossella

----------------------------------------------------------------------------
Rossella Rispoli,
Bioinformatics Core Group
Telethon Institute of Genetics and Medicine (TIGEM)
Via P. Castellino 111, 80131, Naples, Italy
Tel: +39 081 6132 498
Fax: +39 081 6132 351
Web: http://bioinformatics.tigem.it/
-----------------------------------------------------------------------------

Hiram Clawson

unread,
Oct 5, 2011, 1:40:46 PM10/5/11
to Rispoli Rossella, gen...@soe.ucsc.edu
Good Morning:

The examples you mention I believe exist both in hg18 and hg19.
This shell procedure obtains results from each database:

for G in ENSG00000026103 ENSG00000030110 ENSG00000104725 ENSG00000104774
do
echo -n "hg19 ${G}: "
hgsql -N -e "select X.*, G.* from ensGene as G, knownToEnsembl as KE,
kgXref as X where G.name=KE.value and KE.name=X.kgID and
G.name2=\"${G}\" limit 1;" hg19
echo -n "hg18 ${G}: "
hgsql -N -e "select X.*, G.* from ensGene as G, knownToEnsembl as KE,
kgXref as X where G.name=KE.value and KE.name=X.kgID and
G.name2=\"${G}\" limit 1;" hg18
done

However, please keep in mind. The Ensembl gene track has more
annotations than in the UCSC gene track.
Not all Ensembl gene annotations have a corresponding UCSC gene.
Not all UCSC genes have a corresponding Ensembl gene.

The counts are:

hg18 Ensembl genes v54 May 2009: 63,280, UCSC genes Aug 2009: 66,803, knownToEnsembl: 60,456
hg19 Ensembl genes v63 Jun 2011: 173,742, UCSC genes Oct 2009: 77,614, knownToEnsembl: 75,160

The knownToEnsembl counts are the number of UCSC genes that correspond to
an Ensembl transcript ID. A single UCSC gene can correspond to a number of
different Ensembl transcript IDs. The counts of the unique number of Ensembl transcript
IDs in the knownToEnsembl tables are: hg18: 30,209, hg19: 46,319
and the number of Ensembl transcripts in the table ensPep are: hg18: 47,509, hg19: 90,720
Therefore, the coverage of Ensembl genes via knownToEnsembl is:
hg18: 30209/47509 == %63, hg19: 46319/90720 == %51

You will not always find UCSC genes for Ensembl transcripts.

--Hiram
Reply all
Reply to author
Forward
0 new messages