Info: Genome Elem not found

Ayush Gupta

Apr 28, 2013, 7:33:06 PM
to bio4j...@googlegroups.com
So I'm importing the DB with the modules GO, EnzymeDB, and Uniprot (both trembl and sprot). It's been running for 12 hours now with the message "INFO: GenomeElem not found for:" followed by some protein accession. I was wondering if this is normal, and how much longer I can expect it to run. Is there anywhere I can check the progress?

Thanks,
Ayush

Ayush Gupta

Apr 29, 2013, 3:46:28 AM
to bio4j...@googlegroups.com
I opened all the log files, but I just see a bunch of INFO messages. The GO log actually has INFO messages about nodes and says DONE :) but all the others are just continuous INFO messages. How do I know how much is left to import and how long it will take?

Pablo Pareja Tobes

Apr 29, 2013, 4:36:42 AM
to bio4j...@googlegroups.com
Hi Ayush,

You are not importing the RefSeq module, right?
If that's the case, you should be getting a lot of such messages, since in the current version of Bio4j all RefSeq associations found in Uniprot KB entries are automatically searched for in the DB so that the connection between the entry and the Genome Element can be created. (In the next version there's a specific config XML file for importing Uniprot KB where you can specify which resources/subsets you want to include.)

As for checking the progress of the import, there's no 'official' way to do so, but a message is printed for every 10,000 or 100,000 entries imported into the DB (I don't remember the exact amount). You can check on Uniprot's website the total number of proteins for the release you're importing and figure out how many of them are still missing.

HTH

Pablo


Ayush Gupta

Apr 29, 2013, 5:06:32 AM
to bio4j...@googlegroups.com
According to my calculation, it will take 700 to 800 more hours to complete. Is this right?

Pablo Pareja Tobes

Apr 29, 2013, 5:33:23 AM
to bio4j...@googlegroups.com
Mmmm that sounds weird...
Could you share with us the files:
  • executions.xml
  • batchInserter.properties
As well as your computer specifications and the importing process command you launched?

Cheers,

Pablo

Ayush Gupta

Apr 29, 2013, 5:47:01 AM
to bio4j...@googlegroups.com

Command (in Cygwin):

jdk1.6.0_45/bin/java -jar ExecuteBio4jTool.jar executionsBio4j.xml &

Computer:
Processor: Intel i5-2450M, 2.50 GHz
RAM: 7.89 GB

My calculations could be wrong. sprot has finished importing, and that is about 500k proteins; trembl has 34m proteins (I believe). sprot started at 2:50 am and ended around 10:30 pm, so about 20 hours. trembl is 68 times larger (34m / 500k), so 20 hours times 68 is 1360 hours. However, I also did some estimates by comparing the sizes of the log files, and those come out between 600 and 800 hours.

Attachments: batchInserter.properties, executionsBio4j.xml

Ayush Gupta

Apr 30, 2013, 2:14:23 AM
to bio4j...@googlegroups.com
After 24 hours of importing trembl (which started after 18 hours of importing sprot), 6.1m proteins were added. trembl has 34 million, so it's going to take a total of 5-6 days?
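
The estimate above can be checked with a simple linear extrapolation. This is only a sketch: it assumes the import rate stays constant for the rest of the run, which may not hold, and the function name is made up for illustration. The input figures (6.1m imported in 24 hours, 34m total) are the ones quoted in this message.

```python
def estimate_remaining_hours(imported, total, elapsed_hours):
    """Linear extrapolation: assumes a constant import rate (an assumption)."""
    if imported <= 0 or elapsed_hours <= 0:
        raise ValueError("need a positive count and a positive elapsed time")
    rate = imported / elapsed_hours          # entries per hour so far
    remaining = (total - imported) / rate    # hours still needed at that rate
    return rate, remaining


# Figures quoted in the thread: 6.1m of 34m trembl entries after 24 hours.
rate, remaining_hours = estimate_remaining_hours(6100000, 34000000, 24.0)
total_days = (24.0 + remaining_hours) / 24.0

print("rate: %.0f entries/hour" % rate)
print("remaining: %.1f hours" % remaining_hours)
print("total for trembl: %.1f days" % total_days)
```

With these inputs the extrapolation gives roughly 110 more hours, or about 5.6 days in total for trembl, which agrees with the 5-6 day figure.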

Pablo Pareja Tobes

Apr 30, 2013, 3:34:24 AM
to bio4j...@googlegroups.com
Hi Ayush,

Entries from SwissProt take longer to import since, on average, they include much more information than the ones from TrEMBL.
Regarding the command, it looks like you forgot to add -Xmx7G?

Cheers,

Pablo

Ayush Gupta

Apr 30, 2013, 4:03:57 AM
to bio4j...@googlegroups.com
OK. And are the Java libraries imported along with the database? If not, how do I get them? If so, how do I get them with the necessary source attachments?

Pablo Pareja Tobes

Apr 30, 2013, 4:09:15 AM
to bio4j...@googlegroups.com
Hi Ayush,

To either import the DB or use the API to traverse it, you just need to include the jar file Bio4j-0.8-jar-with-dependencies.jar in your project.
You can download the source code for all files from the Bio4j repository: https://github.com/bio4j/Bio4j

Cheers,

Pablo

Pablo Pareja Tobes

May 7, 2013, 4:46:38 AM
to bio4j...@googlegroups.com
Hi Ayush,

I just looked it up, and it should print a message every 10,000 proteins.

Cheers,

Pablo


On Mon, Apr 29, 2013 at 10:42 AM, Ayush Gupta <theayu...@gmail.com> wrote:
So the log file has a message for every 10,000 or 100,000 proteins?
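
Given one progress message per 10,000 proteins, the number of such messages in a log can be turned into a rough completion percentage. The sketch below is an illustration only: the thread does not show the exact text of the progress message, so `marker` is a placeholder the reader must replace with whatever string actually appears in their log, and both function names are hypothetical.

```python
ENTRIES_PER_MESSAGE = 10000  # one progress message per 10,000 proteins, per the reply above


def count_progress_messages(log_lines, marker):
    """Count lines containing `marker`.

    `marker` is a placeholder: the real wording of the progress message
    is not shown in this thread, so it has to be supplied by the caller.
    """
    return sum(1 for line in log_lines if marker in line)


def percent_done(messages_seen, total_entries,
                 entries_per_message=ENTRIES_PER_MESSAGE):
    """Return (entries imported so far, percentage of the total)."""
    imported = messages_seen * entries_per_message
    return imported, 100.0 * imported / total_entries


# Made-up log lines for illustration; only the "GenomeElem" text comes from the thread.
sample_log = [
    "INFO: GenomeElem not found for: <accession>",
    "INFO: <progress marker> ...",
    "INFO: <progress marker> ...",
]
messages = count_progress_messages(sample_log, "<progress marker>")
imported, pct = percent_done(messages, 34000000)
print("%d entries imported (%.2f%% of the total)" % (imported, pct))
```

For a real log, read the file line by line and pass the lines to `count_progress_messages`; the total entry count comes from the Uniprot release statistics, as suggested earlier in the thread.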