Gene Ontologies annotation incomplete in Biomart?


Joe Dougherty

Jan 29, 2014, 9:28:12 AM
to biomar...@googlegroups.com
Hi,

I was querying BioMart through R to pull down all Ensembl gene IDs associated with a particular GO term, in this case muscle development,
e.g.:
library(biomaRt)
# Connect to the human gene dataset in the Ensembl mart
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
gene.data <- getBM(attributes = c('hgnc_symbol', 'ensembl_gene_id', 'go_id'), filters = 'go_id', values = "GO:0060537", mart = ensembl)

I noticed that this returns only one gene. However, looking up the term GO:0060537 at the AmiGO portal returns a much larger number of Homo sapiens genes.

I next checked the BioMart web portal, again querying with just GO:0060537, and again it returns only one gene (SRPK3). Does BioMart not contain the full GO annotation? Or is there some fluke with just this one GO term?

Joe



Arek Kasprzyk

Jan 30, 2014, 8:09:38 AM
to Joe Dougherty, biomar...@googlegroups.com
Hi Joe,
A quick lookup on the Ensembl site with GO:0060537 returns a bunch of transcripts that belong to the SRPK3 gene. So it appears that the BioMart Ensembl data is in sync with the Ensembl site.

If you need clarification on why Ensembl is not showing the same data as the AmiGO portal, it is best to contact the Ensembl experts directly.



a

Thomas Maurel

Jan 30, 2014, 8:13:30 AM
to Joe Dougherty, biomar...@googlegroups.com
Dear Joe,

If you use the "go_id" filter, BioMart will return all the genes linked directly to this GO term in Ensembl. You can see on the AmiGO website that GO:0060537 itself is also linked only to the human gene SRPK3: http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi?term=GO:0060537&speciesdb=all&taxid=9606.
The other genes that you get from the AmiGO website are actually genes linked to children of the GO term GO:0060537. You can get these genes back in BioMart by using the "go_parent_term" filter, which returns all the genes linked to GO:0060537 as well as the genes linked to its child terms.

gene.data <- getBM(attributes = c('hgnc_symbol', 'ensembl_gene_id', 'go_id'), filters = 'go_parent_term', values = "GO:0060537", mart = ensembl)

> gene.data[1:10, ]
   hgnc_symbol ensembl_gene_id      go_id
1        FOXL2 ENSG00000183770 GO:0002074
2        PITX2 ENSG00000164093 GO:0002074
3      CACNA1S ENSG00000081248 GO:0002074
4         TBX3 ENSG00000135111 GO:0003167
5       NKX2-5 ENSG00000183072 GO:0003168
6         RYR2 ENSG00000198626 GO:0003220
7       NKX2-5 ENSG00000183072 GO:0003221
8        ZFPM2 ENSG00000169946 GO:0003221
9         DLL4 ENSG00000128917 GO:0003222
10       FOXH1 ENSG00000160973 GO:0003222
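
To make the difference between the two filters concrete, here is a minimal base-R sketch on a toy ontology. It does not query BioMart; the term names ("T:1" and so on), the gene names, and the `descendants` helper are all made up for illustration, and the real "go_parent_term" filter may be implemented differently on the server side.

```r
# Toy ontology: each row says `child` is a direct child of `parent`.
# All term and gene names here are hypothetical.
edges <- data.frame(
  child  = c("T:2", "T:3", "T:4"),
  parent = c("T:1", "T:1", "T:2"),
  stringsAsFactors = FALSE
)

# Direct annotations only, one term per gene for simplicity.
annot <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  term = c("T:1",   "T:2",   "T:4",   "T:9"),
  stringsAsFactors = FALSE
)

# Collect a term plus all of its descendants by walking the edges downward.
descendants <- function(term, edges) {
  found <- term
  frontier <- term
  while (length(frontier) > 0) {
    kids <- edges$child[edges$parent %in% frontier]
    frontier <- setdiff(kids, found)  # only terms not seen yet
    found <- c(found, frontier)
  }
  found
}

# Analogous to a direct-term filter: exact matches only.
direct_genes <- annot$gene[annot$term == "T:1"]

# Analogous to a parent-term filter: the term and everything below it.
inherited_genes <- annot$gene[annot$term %in% descendants("T:1", edges)]
```

Here `direct_genes` holds only the gene annotated to "T:1" itself, while `inherited_genes` also picks up the genes annotated to its child and grandchild terms, which mirrors the one-gene versus many-genes discrepancy discussed above.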

Hope this helps,
Thomas
--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Bernard Thienpont

Jun 11, 2014, 9:19:52 AM
to biomar...@googlegroups.com, jod...@gmail.com, mau...@ebi.ac.uk
Hi Thomas,

I am experiencing a similar issue: I would like to annotate each gene with the parents of its GO terms as well, and not only with the GO terms it is directly annotated with.
gene.data <- getBM(attributes = c('hgnc_symbol', 'ensembl_gene_id', 'go_parent_term'), filters = 'go_parent_term', values = "GO:0060537", mart = ensembl)
doesn't work because go_parent_term is not available as an attribute. Would you happen to know a simple workaround?

Thanks heaps in advance for your help. I have been struggling with this for a few hours now...

Bernard

Thomas Maurel

Jun 11, 2014, 10:44:12 AM
to Bernard Thienpont, biomar...@googlegroups.com, jod...@gmail.com
Dear Bernard,

I am afraid that this is not possible with the Ensembl mart at the moment. The "go_parent_term" filter actually uses another mart, called "ontology_mart", in the background to get all the GO child terms. This is why there is no "go_parent_term" attribute in the Ensembl mart.
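
One possible client-side approach, sketched here under stated assumptions, is to fetch the direct go_id annotations and expand them to ancestor terms locally using a separate child-to-parent table. The example below uses toy data in base R only; the term names, gene names, the `parents` table and the `ancestors` helper are hypothetical. In practice the parent relationships would have to come from an ontology resource outside the Ensembl mart (the Bioconductor GO.db package might be one option, though that is an assumption and not something tested in this thread).

```r
# Toy child -> parent table; term and gene names are hypothetical.
parents <- data.frame(
  child  = c("T:2", "T:3", "T:4"),
  parent = c("T:1", "T:1", "T:2"),
  stringsAsFactors = FALSE
)

# Direct annotations, shaped like a getBM() result with gene and go_id columns.
gene_terms <- data.frame(
  ensembl_gene_id = c("geneA", "geneB"),
  go_id           = c("T:4",   "T:3"),
  stringsAsFactors = FALSE
)

# Walk upward from one term and return every ancestor (nearest first).
ancestors <- function(term, parents) {
  found <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    ups <- parents$parent[parents$child %in% frontier]
    frontier <- setdiff(ups, found)  # only terms not seen yet
    found <- c(found, frontier)
  }
  found
}

# Build one row per gene and ancestor term.
expanded <- do.call(rbind, lapply(seq_len(nrow(gene_terms)), function(i) {
  anc <- ancestors(gene_terms$go_id[i], parents)
  data.frame(
    ensembl_gene_id = rep(gene_terms$ensembl_gene_id[i], length(anc)),
    go_parent_term  = anc,
    stringsAsFactors = FALSE
  )
}))
```

The resulting `expanded` table pairs each gene with every term above its direct annotation, and could be combined with the direct annotations if both are needed.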

Regards,
Thomas

Bernard Thienpont

Jun 11, 2014, 10:50:41 AM
to biomar...@googlegroups.com, bernardt...@yahoo.com, jod...@gmail.com, mau...@ebi.ac.uk
OK, thanks for your help.