Hey all,
I set up a local Blast2GO database for internal use and came
across some "tricks n' tips" I wanted to share.
There's a document describing how to set up a local B2G database at
http://bioinfo.cipf.es/blast2go --> Downloads --> Local B2G database
installation guide (howto_v1.3.tar.gz).
I followed all of the steps in the document (v1.3 - 29.08.2007) with
the following changes:
1. Step 5c of "Importing the data used for the mapping":
Compiling with sun-java6-jdk on Ubuntu 7.10 gave the following messages:
$ javac -classpath . ImportPIR.java
Note: ImportPIR.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
After some searching online, I found that these notes come from how
JDK 5 and later handle generics: the code uses raw types or unchecked
casts. They are warnings rather than errors, so the class file is still
produced. Running javac -Xlint:unchecked -classpath . ImportPIR.java
lists each warning in detail, and the class compiles (with warnings).
2. I downloaded a newer ec2go file from www.geneontology.org/external2go/ec2go
(Feb 23, 2008).
Importing it (Step 3) fails with the following error:
mysqlimport: Error: Duplicate entry 'EC:3.6.3.14 GO:00469-'
for key 1, when using table: ec2go
Checking that EC number:
$ cat ec2go | grep 3.6.3.14
EC:3.6.3.14 GO:0046961
EC:3.6.3.14 GO:0046933
These GO terms are "hydrogen ion transporting ATPase activity, rotational
mechanism" and "hydrogen ion transporting ATP synthase activity, rotational
mechanism"; they differ only in ATPase vs. ATP synthase.
A closer look at the ec2go file turned up other EC numbers that map to
two GO terms:
EC:2.5.1.46 GO:0034038
EC:2.5.1.46 GO:0050983
EC:2.7.1.119 GO:0008904
EC:2.7.1.119 GO:0034071
EC:3.5.99.7 GO:0008660
EC:3.5.99.7 GO:0034100
EC:3.6.3.15 GO:0046962
EC:3.6.3.15 GO:0046932
EC:3.6.5.1 GO:0003924
EC:3.6.5.1 GO:0005834
EC:3.6.5.3 GO:0003924
EC:3.6.5.3 GO:0006412
EC:3.6.5.4 GO:0003924
EC:3.6.5.4 GO:0005786
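Rather than grepping one EC number at a time, a short script can list every EC number that maps to more than one GO term, which is what triggers the duplicate-key error. This is a minimal sketch, assuming each data line holds an EC number first and a GO ID last, separated by whitespace and/or ";" (as in the pairs listed above), and that header lines start with "!"; the function name find_duplicate_ecs is hypothetical.

```python
import re
from collections import defaultdict


def find_duplicate_ecs(lines):
    """Return {ec: [go, ...]} for every EC number mapped to more than one GO term.

    Assumption: the EC number is the first field and the GO ID is the last
    field on each line, separated by whitespace and/or ';'. This also works
    on raw external2go lines such as
    "EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022".
    Lines starting with '!' are treated as header comments and skipped.
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        fields = [f for f in re.split(r"[\s;]+", line) if f]
        if len(fields) < 2:
            continue  # not an EC/GO pair
        ec, go = fields[0], fields[-1]
        mapping[ec].append(go)
    # Keep only EC numbers that appear on more than one line.
    return {ec: gos for ec, gos in mapping.items() if len(gos) > 1}
```

For example, `find_duplicate_ecs(open("ec2go"))` would return entries such as `"EC:3.6.3.14": ["GO:0046961", "GO:0046933"]`, giving the full list of EC numbers to review before re-running the import.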
After manually comparing the two GO terms for each EC number, I deleted
one of them. (WARNING: doing so may break previous mappings.) With the
duplicates removed, the ec2go file imported into the database without
errors (Step 3).
3. Importing the annex file into the database threw a similar
duplicate-entry error. I left it as is and stopped there. For the
Annex, I will just reset the database connection to the default.
General:
Fixes for the guide (Importing data for the Enzyme Code retrieval):
In Step 2a, instead of replacing every ">" with ";", replace both " > "
and " ; " with ";" so that the surrounding spaces are removed as well.
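The Step 2a substitution can be sketched as below. This assumes the input lines follow the GO external2go layout ("EC:... > GO:name ; GO:id"); the guide's own instructions are not reproduced here, and the function name normalize_separators is hypothetical.

```python
def normalize_separators(line: str) -> str:
    """Replace the spaced separators " > " and " ; " with a bare ";".

    Replacing only ">" would leave stray spaces around the fields
    (e.g. "EC:1.1.1.1 ; GO:..."), which is what this fix avoids.
    """
    return line.replace(" > ", ";").replace(" ; ", ";")


line = "EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022"
print(normalize_separators(line))
# EC:1.1.1.1;GO:alcohol dehydrogenase activity;GO:0004022
```

Matching the spaces together with the separator, rather than stripping whitespace afterwards, leaves the spaces inside the GO term name untouched.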
Additionally, in Step 2c, remove the 2nd COLUMN (not row) as well as
the header information.
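A minimal sketch of the Step 2c cleanup, assuming the lines have already been through Step 2a (so they look like "EC:1.1.1.1;GO:alcohol dehydrogenase activity;GO:0004022") and that header lines start with "!", as in GO's external2go files. The function name drop_second_column is hypothetical.

```python
from typing import Iterable, List


def drop_second_column(lines: Iterable[str]) -> List[str]:
    """Drop header lines and the second ';'-separated column of each row.

    Assumption: header lines start with '!' and columns are separated by
    ';' with no surrounding spaces (the result of the Step 2a fix).
    """
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        columns = line.split(";")
        if len(columns) > 1:
            del columns[1]  # remove the GO term name column
        result.append(";".join(columns))
    return result
```

Applied to a converted file, this leaves only "EC;GO ID" pairs such as "EC:1.1.1.1;GO:0004022", ready for mysqlimport.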
Hope that helps others.
JP