EnsEMBL updates may require some minor changes in GO-Elite?


Gaj Stan (BIGCAT)

Oct 30, 2008, 6:49:20 AM
to go-e...@googlegroups.com

Hey Nathan,

 

While manually updating my current EnsEMBL relationship databases, I came across a few changes in EnsEMBL that might need to be addressed:

 

- Changes on webpage:

·         You can no longer select a single ‘GO ID’ box under the ‘EXTERNAL’ parameter in BioMART. It has been split into three separate ‘GO ID’ boxes, each corresponding to a main GO category. This means you have to download 3 separate files and merge them into one!!

·         “External Gene Symbol” under the ‘Attributes’ tag is now renamed to “Associated Gene Name”

o   Textual changes should be made at ‘Method B’ on the website.
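The merge of the three per-category downloads can be sketched as follows. This is only a minimal sketch under assumptions: each export is taken to be tab-delimited with a header row and two columns (Ensembl gene ID, GO ID), and the function name `merge_go_exports` and all IDs in the sample data are made up for illustration. It is not part of GO-Elite.

```python
import csv
import io

def merge_go_exports(sources):
    """Merge tab-delimited (gene ID, GO ID) exports into one list.

    Duplicate pairs and rows without a GO ID are dropped.
    The two-column layout is an assumption, not a BioMART guarantee.
    """
    seen = set()
    merged = []
    for handle in sources:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header row of each export
        for row in reader:
            if len(row) < 2:
                continue
            pair = (row[0].strip(), row[1].strip())
            if not pair[0] or not pair[1]:
                continue  # gene without an annotation in this category
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged

# In-memory stand-ins for the three downloads (placeholder IDs).
bp = io.StringIO("Ensembl Gene ID\tGO ID\nENSG00000000001\tGO:0000001\nENSG00000000002\t\n")
cc = io.StringIO("Ensembl Gene ID\tGO ID\nENSG00000000001\tGO:0000002\n")
mf = io.StringIO("Ensembl Gene ID\tGO ID\nENSG00000000001\tGO:0000001\nENSG00000000003\tGO:0000003\n")

merged = merge_go_exports([bp, cc, mf])
for gene_id, go_id in merged:
    print(gene_id, go_id, sep="\t")
```

With real downloads, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.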

 

- The layout of the EnsEMBL table (/databases/<species>/gene/Ensembl.txt) may change if you export it directly from BioMART (column order):

          GO-Elite v1.19:   ID                Symbol        Description            Chromosome   Species   Date   Remarks

          build 48:         Ensembl Gene ID   Description   External Gene ID

          build 50:         Ensembl Gene ID   Description   Associated Gene Name

 

   If this is not done properly, the final output file will be messy (i.e. it uses descriptions instead of gene names; confirmed). This is easily fixed by explaining that the fields have to be selected in a specific order within BioMART (that is, point out to people that they have to tick the gene name box first and then the description box), and then all will be fine (:
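For exports that were already downloaded in the wrong order, the columns could also be rearranged afterwards by header name. The following is a minimal sketch, assuming a tab-delimited export with the header names quoted above for builds 48 and 50; the function name `reorder_ensembl_export` and the sample row are hypothetical, and the remaining GO-Elite columns (Chromosome, Species, Date, Remarks) are simply left out.

```python
import csv
import io

# Header names for the gene symbol column as reported in this thread
# (build 50 first, then build 48).
SYMBOL_HEADERS = ("Associated Gene Name", "External Gene ID")

def reorder_ensembl_export(handle):
    """Return rows as (ID, Symbol, Description), whatever the export order was."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    id_col = header.index("Ensembl Gene ID")
    desc_col = header.index("Description")
    sym_col = next(header.index(name) for name in SYMBOL_HEADERS if name in header)
    rows = [("ID", "Symbol", "Description")]
    for row in reader:
        row = row + [""] * (len(header) - len(row))  # pad rows with missing fields
        rows.append((row[id_col], row[sym_col], row[desc_col]))
    return rows

# Export in the "wrong" order: description before gene name (placeholder data).
export = io.StringIO(
    "Ensembl Gene ID\tDescription\tAssociated Gene Name\n"
    "ENSG00000000001\texample description\tGENE1\n"
)

rows = reorder_ensembl_export(export)
for row in rows:
    print("\t".join(row))
```

Looking columns up by name rather than by position means the same sketch handles either selection order in BioMART.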

 

Besides this, I’m still not able to create the EnsEMBL-MAPP.txt using the ‘Affymetrix CSV’ update window. It works perfectly when I put the Affymetrix CSV file in the proper BuildDB directory. But doing this overwrites the Ensembl-Affymetrix relationship table, and I only want to include the relationships from BioMART in my database (thus keep the old file). Although this is part of the local pathway analysis, and not a high priority, I would love to know how I can recreate those gene-MAPP files without using the Affymetrix CSV (I don’t understand why this file is needed for that anyhow).

 

Best wishes,

 

  -- Stan

 

 

 

Nathan Salomonis

Oct 30, 2008, 1:41:03 PM
to go-e...@googlegroups.com
Hi Stan,

You make a number of good points about how to properly make the BioMart database files, which I will add to the documentation. You can only make the Ensembl-MAPP if you select the WikiPathways update option when also building databases using the Affymetrix CSV. Did you select this option when analyzing your CSV files?

Best,
Nathan

Gaj Stan (BIGCAT)

Oct 31, 2008, 3:03:13 AM
to go-e...@googlegroups.com

Yes, I thought so too. The CSV updater is initiated with the following options:

* Species: Homo Sapiens

* Include gene associations from gene and uid-gene files in current directories: YES

* Extract gene-GeneOntology information from Affymetrix: NO

* Save options: OVERWRITE PREVIOUS

* Build Gene-MAPP associations from WikiPathways: YES

 

The EnsEMBL-Affy relationship table has been exported from BioMART and I would like to use that (since this has been updated more recently than the CSV on the Affy website). So for this exercise there is no CSV file present in my /BuildDBs/-dir, but we do have a fresh BioMART table in the /uid-gene/-dir.

 

After selecting the above options, a wikipathways TAB file is downloaded and then I get the following: http://ftp2.bigcat.unimaas.nl/~stan.gaj/goelite/screens/Screen01.jpg

 

  -- Stan

Nathan Salomonis

Oct 31, 2008, 3:18:11 AM
to go-e...@googlegroups.com
Hi Stan,

If there are no actual valid Affymetrix CSV files in the directory “BuildDBs/Affymetrix/<species>/” then no gene-MAPP associations will be generated and it sounds like the existing gene-MAPP file will be over-written. I think I corrected this last issue (over-writing the file with nothing) in the most recent code I have here (not updated in the latest installer).

I realized today, though, that there is a way to build a gene-MAPP file from WikiPathways without the Affymetrix CSV. The main issue is that WikiPathways supplies only one gene ID, which can be an EntrezGene, Ensembl, SwissProt, RefSeq or MOD ID. This causes a problem if you want to convert all these IDs to Ensembl or EntrezGene, so GO-Elite uses the gene system relationships for probesets as the equivalent of the GenMAPP gene database. This allows us to convert SwissProt to Ensembl, etc. I recommend downloading a bunch of Affymetrix CSV files for a species just for this purpose and choosing not to over-write the previous files (this will save to the directory “NewDatabases”). Then you can just use the gene-MAPP associations. Your issue is that you don’t want the Ensembl and EntrezGene relationships supplied by Affymetrix, since they may not match the BioMart associations. To solve this, I may allow a custom relationship file to be supplied by the user (with a default version built from the Affy files) that will allow GO-Elite to make these inferred relationships from the WikiPathways file.
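To illustrate the idea of inferring Ensembl IDs through shared probesets, here is a minimal sketch. It is not GO-Elite's actual code: the data layout (probeset, gene system, gene ID triples), the function names `build_index` and `gene_mapp`, and all IDs and pathway names are assumptions made up for the example.

```python
from collections import defaultdict

def build_index(relationships):
    """Map (system, gene ID) -> set of Ensembl IDs linked to the same probeset."""
    by_probeset = defaultdict(set)
    for probeset, system, gene_id in relationships:
        by_probeset[probeset].add((system, gene_id))
    index = defaultdict(set)
    for ids in by_probeset.values():
        ensembl = {gid for system, gid in ids if system == "Ensembl"}
        for key in ids:
            index[key] |= ensembl
    return index

def gene_mapp(pathway_entries, index):
    """Translate (pathway, system, gene ID) entries into (Ensembl ID, pathway) pairs."""
    pairs = set()
    for pathway, system, gene_id in pathway_entries:
        # Entries whose ID has no known relationship are silently dropped.
        for ensembl_id in index.get((system, gene_id), ()):
            pairs.add((ensembl_id, pathway))
    return sorted(pairs)

# Probeset relationships, e.g. as derived from an annotation file (placeholder IDs).
relationships = [
    ("probe_1", "Ensembl", "ENSG00000000001"),
    ("probe_1", "SwissProt", "P00001"),
    ("probe_2", "Ensembl", "ENSG00000000002"),
    ("probe_2", "EntrezGene", "1002"),
]

# Pathway entries, each carrying a single ID from one gene system.
entries = [
    ("Pathway A", "SwissProt", "P00001"),
    ("Pathway A", "EntrezGene", "1002"),
    ("Pathway B", "RefSeq", "NM_000000"),  # no relationship available
]

result = gene_mapp(entries, build_index(relationships))
for ensembl_id, pathway in result:
    print(ensembl_id, pathway, sep="\t")
```

A user-supplied relationship file, as proposed above, would in this sketch simply be another source for the `relationships` list.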

By the way, the reason I’m not too hot on using ONLY the BioMart associations, is that their method for linking probesets to Ensembl IDs is flawed.  See “Response from Ensembl Help Desk”  at:
http://groups.google.com/group/GenMAPP/browse_thread/thread/053a9238b341d1ca

By the way, is the screen shot from Linux? Does GO-Elite work OK for Linux? I haven’t tried yet.

Best,
Nathan

Gaj Stan (BIGCAT)

Oct 31, 2008, 3:41:47 AM
to go-e...@googlegroups.com

Hey Nathan,

 

I was not aware of the EnsEMBL mapping issues and didn’t know WikiPathways supplies only one identifier (I always thought it was backed up by a gene database containing ID cross-links). Thanks for bringing these points up, since it clarifies a lot!

 

So, from the official GO-Elite point of view, the Affy CSV file is the ‘highly recommended’ input then? If so, I’d better stick to that as well and trust your judgement on this (since I don’t really have much time to dig into the details here).

 

Thank you again for the wonderful support after my waves of daily spam (:

 

Oh, and no. The screenshot was taken under Windows, but I do use GO-Elite under Ubuntu 6.06 LTS, and so far I’ve encountered no real problems! Usually, if I come across a bug on my notebook (Windows), I try to reproduce it on the Linux system (workstation).

Nathan Salomonis

Oct 31, 2008, 1:17:52 PM
to go-e...@googlegroups.com
Hey Stan,

From the stand-point of getting the richest GO and pathway links, it looks like EntrezGene is maybe the best (more gene associations to GO), followed by Ensembl. When just looking at Ensembl and links to probesets, those derived from the Affymetrix CSV seem to be more thorough than those from BioMart. However, I’m not sure if these rules hold up for all Affymetrix arrays (e.g., Gene 1.0 arrays).

I still try to grab the Ensembl-GO relationships from BioMart, since these should be better than those derived from the Affymetrix CSV file.

Best,
Nathan