[GOe database] EnsEMBL / Affymetrix outdated?


Gaj Stan (BIGCAT)

Oct 29, 2008, 5:28:29 AM
to go-e...@googlegroups.com, Eijssen L (BIGCAT)

Hello Nathan,

 

I’m experimenting a bit with the GO-Elite databases (more specifically, the files Ensembl.txt and Ensembl-Affymetrix.txt). My idea was to generate an annotation table that I could import into R, to synchronize the Affymetrix/EnsEMBL annotation used in GO-Elite with my human data.

 

During this I found some oddities and I hope you can shed some light on them:

 

- For 207 Affymetrix IDs the GO-Elite tool is able to associate 170 of these unique IDs with an EnsEMBL ID coupled to 150 unique GO-processes. I’m running the latest GO-conversion tables, etc. Using my own table (derived from your databases), I can only associate 149 unique reporters with one or more EnsEMBL IDs. I’ve uploaded the files (input, denominator, mappfinder results, databases used) to http://ftp2.bigcat.unimaas.nl/~stan.gaj/goelite/bugs/102908/

 

- Regarding this, I thought there might have been an error in my script. During debugging I noticed that the ‘Ensembl-Affymetrix’ table contains EnsEMBL IDs that are not present in Ensembl.txt. I can’t seem to find a way to update the EnsEMBL database through the GO-Elite GUI, and will look in the documentation to see if I can do it manually. A few examples of EnsEMBL IDs I can’t find:

 

ENSG00000197316      1555498_at

ENSG00000203964      1555547_at

ENSG00000165012      234686_at
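
The check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it treats both files as tab-delimited with the EnsEMBL ID in the first column (as in the examples listed here) and no header row, and the helper names `read_column` and `find_orphaned_ids` are hypothetical. The in-memory rows are toy stand-ins for Ensembl-Affymetrix.txt and Ensembl.txt, not the real file contents.

```python
import csv
import io


def read_column(handle, column=0):
    """Collect the non-empty values of one column from a tab-delimited handle."""
    reader = csv.reader(handle, delimiter="\t")
    return {
        row[column].strip()
        for row in reader
        if len(row) > column and row[column].strip()
    }


def find_orphaned_ids(relation_handle, gene_handle):
    """Return IDs used in the relation table but absent from the gene table."""
    related = read_column(relation_handle, column=0)  # assumed: gene ID first
    known = read_column(gene_handle, column=0)        # assumed: gene ID first
    return sorted(related - known)


# Toy stand-in for Ensembl-Affymetrix.txt (assumed layout: gene ID, probeset ID).
relations = io.StringIO(
    "ENSG00000197316\t1555498_at\n"
    "ENSG00000165012\t234686_at\n"
    "ENSG00000196914\t1555498_at\n"
)
# Toy stand-in for Ensembl.txt (assumed layout: gene ID, then other columns).
genes = io.StringIO("ENSG00000196914\tplaceholder description\n")

orphans = find_orphaned_ids(relations, genes)
print(orphans)
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles; if the files carry a header row, it would need to be skipped first.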

 

If I look up those genes in the (current) EnsEMBL database, I always get a message like the following:

 

Ensembl gene: ENSG00000165012
gene ENSG00000165012 is no longer in the Ensembl database and it has not been mapped to any newer identifiers

 

Looking up the AffyIDs in EnsEMBL got me the following:

 

1555498_at     ENSG00000196914

1555547_at     No Hit

234686_at       ENSG00000218045

 

For me, this clearly indicates that the EnsEMBL-Affymetrix file is somehow outdated. I’ve tried updating the Affymetrix databases through the GUI, but that part claims that there is no new annotation file available.

 

Any suggestions on how to proceed?

 

  -- Stan

Nathan Salomonis

Oct 29, 2008, 11:57:19 AM
to go-e...@googlegroups.com, Eijssen L (BIGCAT)
Hey Stan,

What you found is expected, but you can quickly customize it on your end, if you like, in one of two ways. First, the reason there are probeset-Ensembl relationships that are unsupported is that, as you know, Ensembl IDs get retired or changed. Affy-Ens is extracted from (A) BioMart or (B) Affymetrix CSV annotation files. For the DBs provided with GO-Elite, if you update the databases by adding new Affymetrix CSV files for that species to the BuildDB directory and select the ‘Include gene associations from gene and uid-gene files in current directories’ option, all old and new relationships will be present in the new file.

The same is true for the Ensembl gene and EntrezGene tables, but NOT for any of the gene-GO or gene-MAPP lists. The latter two are the most important, because they tell us how many genes are in a pathway/GO term. Thus, even if there were a discrepancy in the Affy-Ens table, it shouldn’t matter, since the Ensembl ID has to exist in the most up-to-date Ens-GO table.
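
To make the point about stale IDs concrete, here is a minimal sketch of that kind of join. It is not GO-Elite’s actual code: the function name `probes_with_go`, the tuple layouts, and all identifiers in the toy data are placeholders chosen for illustration. It only shows that a probeset-gene pair whose gene is missing from the gene-GO table contributes nothing to the result.

```python
from collections import defaultdict


def probes_with_go(affy_ens_pairs, ens_go_pairs):
    """Map each probeset to the GO terms reachable through an annotated gene."""
    go_by_gene = defaultdict(set)
    for gene, term in ens_go_pairs:
        go_by_gene[gene].add(term)

    go_by_probe = defaultdict(set)
    for gene, probe in affy_ens_pairs:
        if gene in go_by_gene:  # pairs with a gene absent from gene-GO drop out
            go_by_probe[probe] |= go_by_gene[gene]
    return dict(go_by_probe)


# Placeholder data: (gene ID, probeset ID) and (gene ID, GO term) pairs.
affy_ens = [
    ("ENSG_RETIRED", "probe_1"),
    ("ENSG_CURRENT", "probe_1"),
    ("ENSG_RETIRED", "probe_2"),
]
ens_go = [("ENSG_CURRENT", "GO:term_A")]

result = probes_with_go(affy_ens, ens_go)
print(result)
```

In this toy case `probe_1` still reaches a GO term through its current gene, while `probe_2`, linked only to a retired gene, does not appear in the result at all.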

If you want to make sure you are dealing with the most up-to-date annotations from Ensembl, download the Ensembl relationships from BioMart as described at:
http://groups.google.com/group/go-elite/web/how-to-make-new-species-databases-for-go-elite?
See “Method B)”

However, these lack reliable Affymetrix-Ensembl relationships (we can talk about why later), so to augment or replace them you can download CSV files from Affymetrix for that species and run the updater to make a new database of just these IDs. Still, some of the Ensembl relationships on Affymetrix’s end will be out of date compared with those downloaded directly from Ensembl.

Does this address everything?

Best,
Nathan




Gaj Stan (BIGCAT)

Oct 30, 2008, 2:44:42 AM
to go-e...@googlegroups.com

Hi Nathan,

 

Yup. Your answer covers it all. I understood the cause of this, but I was not able to update the Ens-Affy table using the CSV Affymetrix updater. I didn’t know that you made GO-Elite also compatible with BioMart output (but come to think of it, I actually should have thought about that (-; ) and will give that approach a try.

 

Thanks for the quick reply!

  -- Stan

Nathan Salomonis

Oct 30, 2008, 2:52:10 AM
to go-e...@googlegroups.com
Was there a problem with the Affymetrix CSV update? This should work fine.
Best,
Nathan

Gaj Stan (BIGCAT)

Oct 30, 2008, 3:05:23 AM
to go-e...@googlegroups.com

Well, grabbing updates from WikiPathways works fine. But, if I select:

- Include Gene associations ... : YES

- Extract gene-GeneOntology information from Affy: YES

- Save options: Overwrite previous

- Build gene-MAPP associations from WikiPathways: NO

And species: Homo sapiens.

 

When I start the update, I immediately (in under 1 second) get the message “Finished Parsing the Latest Affymetrix CSV Annotations”. I don’t get the impression that it actually makes a connection to the Affymetrix website, especially since both my firewall and TCPView only show GO-Elite connecting to something if I include the WikiPathways option. Is this normal behaviour? And no, the firewall is not the problem: GO-Elite is allowed to make internet connections (-;

Nathan Salomonis

Oct 30, 2008, 3:14:54 AM
to go-e...@googlegroups.com
Hey Stan,

This is an important clarification. GO-Elite can’t actually grab the Affymetrix CSV annotations, since the Affymetrix website requires a user to log in before downloading. Thus, you need to go to the Affymetrix website and download the files to the appropriate folder prior to running the update. See “Option 1” at:
http://groups.google.com/group/go-elite/web/how-to-make-new-species-databases-for-go-elite?
Best,
Nathan

Gaj Stan (BIGCAT)

Oct 30, 2008, 3:21:01 AM
to go-e...@googlegroups.com

Hi Nathan,

 

I started to figure that out while I was reading the database instructions. Updating the AFFX databases now!

Thanks for the quick replies!
