Database Creation

Gabriel Amorim

Jul 27, 2020, 2:40:02 PM
to CLARK Users
Hello,
I am trying to use CLARK, more specifically CLARK-S, for the first time, and I've been having some issues while creating the database. I downloaded the genomes for Bacteria, Fungi, Viruses and Protozoa, and when I tried constructing the 31-mer database through classify_metagenome.sh, it ran for over a day and then the server suddenly crashed with no information whatsoever. I tried a second time and it crashed again without any error messages. On the third attempt, after about 2 days, it managed to finish without crashing. Then I ran buildSpacedDB.sh, and again after about a day it crashed without any warning. Since it was crashing so often, I gave up and tried a smaller database: I built only a Fungi and Viral database and used the same metagenome as before with classify_metagenome.sh. It went fine, with no errors, warnings or crashes, and CLARK-S ran on this database with no issues. I ran everything with default parameters.
I suspect it's a problem with RAM usage, but I'm using a server with 480 GB of RAM, and as far as I've read that should be more than enough.
Is it not enough RAM or some other problem?  Can someone help me?
Thanks!

Rachid

Jul 27, 2020, 2:42:33 PM
to CLARK Users
Hello Gabriel,

Please could you follow the guidelines for reporting issues? here:

Best,
Rachid

Gabriel Amorim

Jul 27, 2020, 7:55:45 PM
to CLARK Users
Hello, sorry I didn't include this in the first place.

1) Gabriel Amorim de Albuquerque Silva, gabrielam...@gmail.com
2) Bacteria, Fungi, Viral and Protozoa standard databases. As I've stated, I had no problems running the classification on the smaller database; the problem was with the creation of the larger database. If necessary, though, I can ask the PI for authorization to share the metagenomic data.
3) ./classify_metagenome.sh -P <FORWARD> <REVERSE> -R <OUTPUT> -m 0 -n 5 --gzipped
There was no log; the server crashed and had to be rebooted.
4) Linux mendel-main 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux. It has 490GB of RAM and 48 cores.

In case it's worth mentioning: while it was still running I checked the RAM usage, and it was about 220 GB.
Thanks for your response.

Rachid

Jul 27, 2020, 7:57:08 PM
to CLARK Users
Thank you! 
Please provide the log.
It is printed on the terminal when you execute the scripts. If nothing is printed then please contact your Helpdesk.
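If the server goes down before the terminal output can be copied, one option is to write the log to a file while the script runs. This is only a minimal sketch, assuming a POSIX shell with tee available; run_logged and the log file name are hypothetical and not part of CLARK.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not part of CLARK): runs COMMAND, shows its output on
# the terminal and appends the same output (stdout and stderr) to LOGFILE.
run_logged() {
    logfile=$1
    shift
    # 2>&1 merges stderr into stdout so error messages reach the log too.
    # tee -a appends, so repeated runs accumulate in the same file.
    "$@" 2>&1 | tee -a "$logfile"
}
```

For example, the command would be run as: run_logged clark_build.log ./classify_metagenome.sh followed by the usual options. After a hard crash the last few lines may still be missing from the file, since nothing here forces the output to disk.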

Gabriel Amorim

Aug 6, 2020, 9:20:39 PM
to CLARK Users
Hello again Rachid.

I tried creating the database again, but this time I selected only Bacteria, so as to reduce RAM usage. It took 32.1 h to complete "./classify_metagenome.sh -P <FORWARD> <REVERSE> -R <OUTPUT> -m 0 -n 5 --gzipped", but it did so with no errors. Then I ran the buildSpacedDB.sh script, which took 47.82 h and also completed with no errors this time.

I must add, though, and ask whether this is normal: the Software and System Requirements section of the CLARK documentation says CLARK takes about 156 GB to build the database from Bacteria, but in my run it took a maximum of 335.62 GB for the 31-mer database construction and 227.16 GB for the spaced k-mers. It also says CLARK-S classification takes about 101 GB, and even though mine has not finished yet, it has already taken over 250 GB of RAM.
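For anyone who wants to check peak figures like these on Linux: the kernel records a process's peak resident memory as VmHWM in /proc/<PID>/status. Below is a minimal sketch for reading it. It is not necessarily how the numbers above were measured, and peak_rss_gb is a hypothetical helper name.

```shell
# peak_rss_gb STATUS_FILE
# Hypothetical helper: prints the peak resident memory (VmHWM) recorded in a
# Linux /proc/<PID>/status file, converted from kB to GB.
peak_rss_gb() {
    # The VmHWM line looks like "VmHWM:   351920652 kB"; field 2 is the value.
    # 1 GB is taken here as 1024 * 1024 kB.
    awk '/^VmHWM:/ { printf "%.2f\n", $2 / (1024 * 1024) }' "$1"
}

# Usage (replace <PID> with the process ID of the running CLARK executable,
# not of the wrapper script that launched it):
#   peak_rss_gb /proc/<PID>/status
```

The value has to be read while the process is still alive, because the status file disappears when the process exits.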

Thank you for your support.

Rachid

Aug 6, 2020, 9:26:57 PM
to CLARK Users
Hello Gabriel,

Thank you for sharing this.
The discrepancy you describe is due to the fact that the NCBI RefSeq database has been updated and is now a much bigger database to process than when CLARK was published in 2015.
If you retry with all the databases you selected, the RAM usage and time should not be very far from those you reported for Bacteria only.
Note that the database creation is a single-threaded process.

Best,
Rachid

Gabriel Amorim

Aug 17, 2020, 4:38:48 PM
to CLARK Users
Hi,

I imagined that could be the reason. If I may suggest, it would be nice to update the Software and System Requirements in the documentation with the new RAM usage estimates, so that other users can better judge whether they can run CLARK.
Thank you again for the support.

Regards,
Gabriel.