Error loading Pfam data files

YTWU

Oct 23, 2013, 1:36:35 AM
to jbiowh-...@googlegroups.com
 Hi,

I used the most recent 4.0.2 snapshot to load Pfam data files into MySQL under Windows.
Everything went fine until I encountered the following error:

INFO [main] 2013-10-22 12:38:54,814 -        MySQL:  Executing: INSERT INTO PfamARegFullSignifica
  FROM pfamA_reg_full_significant
   INFO [Thread-0] 2013-10-22 12:39:59,811 - JPA Check: 2013/10/22 12:39:59 message: connection OK
   INFO [main] 2013-10-22 12:44:47,997 -        MySQL: 39407591 elements modified
   INFO [main] 2013-10-22 12:44:47,998 -        MySQL:  Executing: INSERT INTO PfamARegFullSignifica
 model_end,domain_bits_score,domain_evalue_score,sequence_bits_score,sequence_evalue_score,cigar,in_
 SELECT t.WID,p.WID,q.Protein_WID,s.auto_pfamseq,s.seq_start,s.seq_end,s.ali_start,s.ali_end,s.model
 s.domain_evalue_score,s.sequence_bits_score,s.sequence_evalue_score,s.cigar,s.in_full,s.tree_order,
  INNER JOIN PfamSeq_has_Protein q on q.auto_pfamseq = s.auto_pfamseq s.auto_pfamA_reg_full
  ERROR [main] 2013-10-22 12:44:48,257 - Duplicate entry '828119572' for key 'PRIMARY'

How can I fix this? Thank you

YT Wu


Roberto Vera Alvarez

Oct 25, 2013, 5:11:46 AM
to jbiowh-...@googlegroups.com

Hi,

Please download our latest JAR file from this page:

https://code.google.com/p/jbiowh/wiki/DownloadWiki?tm=2

Please note that we introduced some changes to the JBioWH framework after version 5.0.0.

You will have one JAR file (jbiowh-parser-<version>.jar) for inserting the data into the relational schema and another JAR file (jbiowh-desktop-<version>.jar) for the Desktop Client.

Try this new version and let me know the result.

Regards,

Roberto

--
You received this message because you are subscribed to the Google Groups "jbiowh-discuss" group.



YTWU

Oct 29, 2013, 10:34:02 PM
to jbiowh-...@googlegroups.com
The same error occurs again, pointing to a duplicate primary key. Please refer to the attached log file.
I am now trying to reinstall from the beginning.

Regards,
YT



On Wednesday, October 23, 2013, at 1:36:35 PM UTC+8, YTWU wrote:
jbiowh-loader.log

YTWU

Oct 31, 2013, 1:02:12 AM
to jbiowh-...@googlegroups.com
I reinstalled the Pfam data into a clean database successfully, but the table "PfamARegFullSignificant" named in the previous error is empty.
I then installed the DrugBank data without problems. However, in the next step, loading GenBank, the parser finished after only 6 seconds with the following messages, and no GenBank data was inserted into the database tables.

F:\temp\jbiowh>java -jar e:\MySQL_wb\jbiowh-parser-5.0.2.jar -i e:\MySQL_wb\biowh\genebank.xml
 INFO [main] 2013-10-31 12:44:32,259 - Setting variables from the XML file
   INFO [main] 2013-10-31 12:44:32,269 - Opening JPA connection to:
   INFO [main] 2013-10-31 12:44:32,279 -        Driver: com.mysql.jdbc.Driver
   INFO [main] 2013-10-31 12:44:32,279 -        URL: jdbc:mysql://localhost:3309/biowh
   INFO [main] 2013-10-31 12:44:32,279 -        User: cheminfor_rw
   INFO [main] 2013-10-31 12:44:32,279 - Adding the WHDBMSFactory to the Map
   INFO [main] 2013-10-31 12:44:35,718 - Open EntityManagerFactory: org.eclipse.persistence.internal.jpa.EntityManagerFactoryImpl@157eec5b
   INFO [main] 2013-10-31 12:44:35,727 - Parsing a GeneBank data source
   INFO [Thread-0] 2013-10-31 12:44:38,374 - JPA Check: 2013/10/31 12:44:38 message: connection OK
   INFO [main] 2013-10-31 12:44:38,394 - DataSet: GenBank is inserted with WID = 2
   INFO [main] 2013-10-31 12:44:38,404 - Global WID = 66004425
   INFO [main] 2013-10-31 12:44:38,404 - Truncating table: GeneBank
   INFO [main] 2013-10-31 12:44:38,404 -        MySQL: Open the MySQL connection
   INFO [main] 2013-10-31 12:44:38,424 -        MySQL: Connection 'com.mysql.jdbc.JDBC4Connection@52398cda' successfully open
   INFO [main] 2013-10-31 12:44:38,424 -        MySQL:  Executing: TRUNCATE TABLE GeneBank
   INFO [main] 2013-10-31 12:44:38,434 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,434 - Truncating table: GeneBankAccession
   INFO [main] 2013-10-31 12:44:38,434 -        MySQL:  Executing: TRUNCATE TABLE GeneBankAccession
   INFO [main] 2013-10-31 12:44:38,434 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,434 - Truncating table: GeneBankCDS
   INFO [main] 2013-10-31 12:44:38,434 -        MySQL:  Executing: TRUNCATE TABLE GeneBankCDS
   INFO [main] 2013-10-31 12:44:38,444 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,444 - Truncating table: GeneBankCDSTemp
   INFO [main] 2013-10-31 12:44:38,444 -        MySQL:  Executing: TRUNCATE TABLE GeneBankCDSTemp
   INFO [main] 2013-10-31 12:44:38,444 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,444 - Truncating table: GeneBankCDSDBXref
   INFO [main] 2013-10-31 12:44:38,444 -        MySQL:  Executing: TRUNCATE TABLE GeneBankCDSDBXref
   INFO [main] 2013-10-31 12:44:38,454 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,454 - Truncating table: GeneBankFeatures
   INFO [main] 2013-10-31 12:44:38,454 -        MySQL:  Executing: TRUNCATE TABLE GeneBankFeatures
   INFO [main] 2013-10-31 12:44:38,454 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,454 - Truncating table: GeneBankCDS_has_GeneInfo
   INFO [main] 2013-10-31 12:44:38,454 -        MySQL:  Executing: TRUNCATE TABLE GeneBankCDS_has_GeneInfo
   INFO [main] 2013-10-31 12:44:38,464 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,464 -        MySQL:  Executing: ALTER TABLE GeneBankCDS_has_GeneInfo DISABLE KEYS
   INFO [main] 2013-10-31 12:44:38,464 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,464 -        MySQL:  Executing: ALTER TABLE GeneBank DISABLE KEYS
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL:  Executing: ALTER TABLE GeneBankAccession DISABLE KEYS
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL:  Executing: ALTER TABLE GeneBankCDS DISABLE KEYS
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL:  Executing: ALTER TABLE GeneBankCDSDBXref DISABLE KEYS
   INFO [main] 2013-10-31 12:44:38,474 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,484 -        MySQL:  Executing: ALTER TABLE GeneBankFeatures DISABLE KEYS
   INFO [main] 2013-10-31 12:44:38,484 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,574 -        MySQL:  Executing: ALTER TABLE GeneBankCDS_has_GeneInfo ENABLE KEYS
   INFO [main] 2013-10-31 12:44:38,574 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,574 -        MySQL:  Executing: ALTER TABLE GeneBank ENABLE KEYS
   INFO [main] 2013-10-31 12:44:38,584 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,584 -        MySQL:  Executing: ALTER TABLE GeneBankAccession ENABLE KEYS
   INFO [main] 2013-10-31 12:44:38,584 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,584 -        MySQL:  Executing: ALTER TABLE GeneBankCDS ENABLE KEYS
   INFO [main] 2013-10-31 12:44:38,584 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,584 -        MySQL:  Executing: ALTER TABLE GeneBankCDSDBXref ENABLE KEYS
   INFO [main] 2013-10-31 12:44:38,594 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,594 -        MySQL:  Executing: ALTER TABLE GeneBankFeatures ENABLE KEYS
   INFO [main] 2013-10-31 12:44:38,594 -        MySQL: 0 elements modified
   INFO [main] 2013-10-31 12:44:38,604 - Updating DataSet: GenBank with WID = 2
   INFO [main] 2013-10-31 12:44:38,614 - Setting Global WID = 66004425
   INFO [main] 2013-10-31 12:44:38,624 - Total elapsed time: 6 s

Roberto Vera Alvarez

Oct 31, 2013, 5:14:14 AM
to jbiowh-...@googlegroups.com

Hi,

The table "PfamARegFullSignificant" is filled using the Protein table. So, if you insert the Pfam database before the UniProt database, which belongs to the Protein module, the table PfamARegFullSignificant will remain empty.

You should insert at least the UniProt Swiss-Prot database into the Protein module and then reload the Pfam database.

For the GenBank database, you need the *.seq.gz files from the GenBank release directory (ftp://ftp.ncbi.nlm.nih.gov/genbank/). For these files, run the jbiowh-parser JAR with this in the config file: <type>GeneBank</type>

After that, put the daily files (*.flat.gz) from the GenBank daily directory (ftp://ftp.ncbi.nlm.nih.gov/genbank/daily-nc/) in another directory. For these files, run the jbiowh-parser JAR with this in the config file: <type>GeneBankUpdate</type>

Please see this wiki page:

http://code.google.com/p/jbiowh/wiki/GenBankCF

You can't load the GenBank release files together with the daily updates, because the updates modify many existing GenBank entries. So you need to load the release first and then run the update.

Let me know how it goes.

Regards,

Roberto
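
The split between release files and daily update files described above can be sketched in a few lines of Java. This is only an illustration and not JBioWH code: the class GenBankFileSorter and its method select are hypothetical, and the file name nc1031.flat.gz is an invented example of a daily file. The sketch only shows how the two suffixes mentioned in the message separate the files into the two directories that are loaded in two runs (GeneBank first, GeneBankUpdate second).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GenBankFileSorter {
    // Suffixes taken from the message above; everything else is hypothetical.
    static final String RELEASE_SUFFIX = ".seq.gz";
    static final String DAILY_SUFFIX = ".flat.gz";

    // Returns the names that end with the given suffix, keeping their order.
    static List<String> select(List<String> names, String suffix) {
        List<String> selected = new ArrayList<String>();
        for (String name : names) {
            if (name.endsWith(suffix)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("gbbct1.seq.gz", "nc1031.flat.gz", "README.genbank");

        // Run 1: files for the config with <type>GeneBank</type>
        System.out.println("release: " + select(names, RELEASE_SUFFIX));

        // Run 2: files for the config with <type>GeneBankUpdate</type>
        System.out.println("daily:   " + select(names, DAILY_SUFFIX));
    }
}
```

Keeping the two groups in separate directories, as the message recommends, means each parser run only ever sees one kind of file.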

Roberto Vera Alvarez

Oct 31, 2013, 5:33:02 AM
to jbiowh-...@googlegroups.com

Hi,

I fixed a bug in the GenBank parser related to the release file extensions.

Please download the latest version (5.0.3) from our Download page:

https://code.google.com/p/jbiowh/wiki/DownloadWiki?tm=2

Regards,

Roberto

YTWU

Oct 31, 2013, 5:37:14 AM
to jbiowh-...@googlegroups.com
Hi,

Here is my config file for GenBank. All the *.seq.gz files are in "E:\MySQL_wb\biowh\genebank". This worked for me with the 4.0.2 snapshot.

<warehouse>
    <name>GenBank</name>
    <type>GeneBank</type>
    <version>198.0</version>
    <homeurl>http://www.ncbi.nlm.nih.gov/genebank</homeurl>
    <releaseDate>10/15/2013</releaseDate>
    <database>biowh</database>
    <dbuser>cheminfor_rw</dbuser>
    <dbpassword>labywu-2</dbpassword>
    <directory>E:\MySQL_wb\biowh\genebank</directory>
    <temporal>F:\temp\jbiowh\</temporal>
    <driver>com.mysql.jdbc.Driver</driver>
    <url>jdbc:mysql://localhost:3309/</url>
    <xsdfiledef></xsdfiledef>
    <verbose>info</verbose>
    <droptables>true</droptables>
    <runlinks>true</runlinks>
</warehouse>

Roberto Vera Alvarez

Oct 31, 2013, 5:39:31 AM
to jbiowh-...@googlegroups.com

Hi, try the new version. I tested it here with the latest GenBank release.

Your config file is fine. It should work with the new version.

Regards,

Roberto

YTWU

Oct 31, 2013, 5:44:29 AM
to jbiowh-...@googlegroups.com
Hi. It gets further now, but fails at the point shown below. The loader for the daily updates (*.flat.gz) fails in the same way.

   INFO [main] 2013-10-31 17:36:40,900 - Creating file: GeneBank.tsv
   INFO [main] 2013-10-31 17:36:40,901 - Creating file: GeneBankAccession.tsv
   INFO [main] 2013-10-31 17:36:40,902 - Creating file: GeneBankCDS.tsv
   INFO [main] 2013-10-31 17:36:40,903 - Creating file: GeneBankCDSTemp.tsv
   INFO [main] 2013-10-31 17:36:40,903 - Creating file: GeneBankCDSDBXref.tsv
   INFO [main] 2013-10-31 17:36:40,904 - Creating file: GeneBankFeatures.tsv
   INFO [main] 2013-10-31 17:36:40,904 - Creating file: GeneBankCDS_has_GeneInfo.tsv
   INFO [main] 2013-10-31 17:36:40,905 - File: 0 of: 1901
   INFO [main] 2013-10-31 17:36:40,907 - Parsing file 1 E:\MySQL_wb\biowh\genebank\gbbct1.seq.gz
  ERROR [main] 2013-10-31 17:36:40,913 - Unparseable date: "15-MAY-2009"
  ERROR [main] 2013-10-31 17:36:40,913 - Error: java.text.ParseException: Unparseable date: "15-MAY-2009"

Roberto Vera Alvarez

Oct 31, 2013, 5:52:18 AM
to jbiowh-...@googlegroups.com

Hi.

I just loaded that file into my relational schema. Look at this:

mysql> select WID,LocusName,SeqLengh,MolType,Division,ModDate from GeneBank limit 10;
+-----------+-----------+----------+---------+----------+---------------------+
| WID       | LocusName | SeqLengh | MolType | Division | ModDate             |
+-----------+-----------+----------+---------+----------+---------------------+
| 389707821 | AB000100  |     2992 | DNA     | BCT      | 2009-05-15 00:00:00 |
| 389707825 | AB000106  |     1343 | rRNA    | BCT      | 1999-02-05 00:00:00 |
| 389707826 | AB000111  |    14024 | DNA     | BCT      | 2006-12-02 00:00:00 |
| 389707854 | AB000126  |     1728 | DNA     | BCT      | 1999-05-26 00:00:00 |
| 389707857 | AB000176  |      241 | DNA     | BCT      | 1999-05-26 00:00:00 |
| 389707860 | AB000177  |      241 | DNA     | BCT      | 1999-05-26 00:00:00 |
| 389707863 | AB000178  |      241 | DNA     | BCT      | 1999-05-26 00:00:00 |
| 389707866 | AB000179  |      241 | DNA     | BCT      | 1999-05-26 00:00:00 |
| 389707869 | AB000180  |      241 | DNA     | BCT      | 1999-05-26 00:00:00 |
| 389707872 | AB000181  |      241 | DNA     | BCT      | 1999-05-26 00:00:00 |
+-----------+-----------+----------+---------+----------+---------------------+
10 rows in set (0.14 sec)

mysql>

That seems to be a problem with Java on Windows. Could you send me your Java version? Run this in your CMD:

proteincuda1:config> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
proteincuda1:config>

Regards,

Roberto

Roberto Vera Alvarez

Oct 31, 2013, 6:07:30 AM
to jbiowh-...@googlegroups.com

Hi, I ran jbiowh-parser-5.0.3.jar on a virtual machine with Windows XP and this Java version:

C:\Documents and Settings\roberto\Desktop>java -version
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) Client VM (build 24.0-b56, mixed mode, sharing)

Later I updated Java to the latest version:

C:\Documents and Settings\roberto\Desktop>java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Client VM (build 24.45-b08, mixed mode, sharing)

The parser runs well in both cases.

Please check your Java version.

Regards,

Roberto

On Thursday, October 31, 2013 02:44:29 AM YTWU wrote:

YTWU

Oct 31, 2013, 6:29:35 AM
to jbiowh-...@googlegroups.com
Here is the Java version.
Also, could the cause be the "Language and Region" settings in my Windows?
After unzipping the *.seq.gz file, I can see a line with "15-MAY-2009" in it.

F:\temp\jbiowh>java -version
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

Roberto Vera Alvarez

Oct 31, 2013, 6:36:05 AM
to jbiowh-...@googlegroups.com

Hi,

Could you update Java to the latest release from this site:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

The first entry in that file starts with the line:

LOCUS AB000100 2992 bp DNA linear BCT 15-MAY-2009

As you can see, the modification date is 15-MAY-2009. The problem is that Java fails to parse a date in that format.

Yes, it could be due to your "Language and Region" settings. Could you change them to US English and try again?

Let me know the result.

Regards,

Roberto.

YTWU

Oct 31, 2013, 7:00:36 AM
to jbiowh-...@googlegroups.com
Hi Roberto,

After changing the language and region to US English, along with the date and time format, it is now loading the GenBank data files.
Thanks.

YT

Roberto Vera Alvarez

Oct 31, 2013, 7:02:03 AM
to jbiowh-...@googlegroups.com

OK, fine.

Thanks to you. I will try to fix the problem with the date parser and the time format on Windows.

Regards,

Roberto
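
The locale dependence found in this thread can be reproduced outside JBioWH. The following is a minimal sketch and not the JBioWH parser code: the class LocusDateDemo and the method parseLocusDate are hypothetical, and the pattern dd-MMM-yyyy is an assumption based on the LOCUS line quoted earlier. It shows that SimpleDateFormat resolves the MMM month name through a locale, so passing an English locale explicitly makes parsing "15-MAY-2009" independent of the Windows "Language and Region" settings. The zh_TW locale used in main is only a guess at a non-English default.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class LocusDateDemo {
    // Assumed pattern for dates such as 15-MAY-2009 on a GenBank LOCUS line.
    static final String LOCUS_DATE_PATTERN = "dd-MMM-yyyy";

    // Parses the text using the month names of the given locale.
    static Date parseLocusDate(String text, Locale locale) throws ParseException {
        return new SimpleDateFormat(LOCUS_DATE_PATTERN, locale).parse(text);
    }

    public static void main(String[] args) throws ParseException {
        String text = "15-MAY-2009";

        // English month names match "MAY"; the match ignores case.
        Date date = parseLocusDate(text, Locale.ENGLISH);
        System.out.println(new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH).format(date));

        // Hypothetical non-English locale: "MAY" may not be a known month name there.
        try {
            parseLocusDate(text, Locale.TRADITIONAL_CHINESE);
            System.out.println("zh_TW: parsed");
        } catch (ParseException e) {
            System.out.println("zh_TW: " + e.getMessage());
        }
    }
}
```

Passing the locale to the SimpleDateFormat constructor avoids relying on the JVM default locale, which is why this approach would not require changing any Windows setting.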
