Trouble generating database of kmers

Joe Russell

Jun 24, 2016, 3:30:19 PM
to CLARK Users
Hello,

I am trying to set up CLARK-S on our cluster and am running the Step 0 and Step 1 scripts as outlined in the README. I ran the set_targets.sh script as follows:

[jrussell@pandora]$ /home/src/CLARK-S/CLARKSCV1.2.3/set_targets.sh /home/src/CLARK-S/CLARKSCV1.2.3/DIR_DB bacteria viruses human --species

As far as I can tell, this completed successfully.

I am now trying to build the database of discriminative 31-mers by classifying a paired-end Illumina metagenomic dataset of ~20 million reads. I launched this job via Torque on a node with 16 CPUs and ~132 GB of RAM. It has been running for over 24 hours and I do not see any output files being generated. Since CLARK is supposed to be fast, I assume something is not right. Below is my Torque script:

#PBS -d /home/src/CLARK-S/CLARKSCV1.2.3
#PBS -e /home/jrussell
#PBS -o /home/jrussell
#PBS -m abe
#PBS -M jrus...@mriglobal.org
#PBS -l nodes=n004.cluster.com:ppn=16,walltime=604800
#PBS -q default

/home/src/CLARK-S/CLARKSCV1.2.3/classify_metagenome.sh -P /home/src/CLARK-S/CLARKSCV1.2.3/R1.fastq /home/src/CLARK-S/CLARKSCV1.2.3/R2.fastq -n 16 -R /home/src/CLARK-S/CLARKSCV1.2.3/result2

Could you please let me know where I may have gone wrong? Where should the k-mer database be built? A folder named 'bacteria_viruses_human_0' was generated in /DIR_DB during set_targets.sh, but it is still empty.
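
For reference, one way to tell whether anything is being written to that folder is to count its files and measure its size a few minutes apart. The following is only a minimal sketch: the helper names count_files and dir_size_kb are made up for illustration, and it assumes nothing about which files CLARK actually creates there.

```shell
# Hypothetical helpers (not part of CLARK) for watching a directory for output.

# Print the number of regular files anywhere under the given directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print the directory's total size on disk in kilobytes.
dir_size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}
```

Calling count_files and dir_size_kb on the 'bacteria_viruses_human_0' folder, waiting a few minutes, and calling them again shows whether the job is producing anything: if both numbers stay unchanged over a long period, the build is probably stalled rather than slow.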


I am very interested in having CLARK-S as part of my workflow and look forward to any advice you can provide.


Thanks!


-joe

Rachid Ounit

Jun 24, 2016, 4:41:15 PM
to Joe Russell, CLARK Users
Hello Joe!

Thank you for your interest! CLARK is indeed fast, and it should have finished in less than 24 hours. You said you have access to a server with 16 cores and 132 GB of RAM, is that correct?

If so, 132 GB is not going to be enough: about 160 GB or more is needed. Please take a look at our BMC Genomics paper or our latest preprint on bioRxiv (the supplementary data has this information). Without enough RAM, your server may be stuck swapping data between memory and disk, which could explain why the program has not finished.
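
To check a node before submitting, the total RAM and the swap in use can be read from /proc/meminfo. This is a minimal sketch that assumes a Linux node; the function names are made up for illustration and are not part of CLARK. Note that /proc/meminfo reports sizes in units of 1024 bytes, so the 160 figure is treated here as an approximate threshold in GiB.

```shell
# Hypothetical helpers (not part of CLARK). Each takes an optional path to a
# meminfo-style file and defaults to /proc/meminfo (Linux only).

# Print total physical memory in whole GiB.
mem_total_gib() {
    awk '/^MemTotal:/ { print int($2 / 1048576) }' "${1:-/proc/meminfo}"
}

# Print swap currently in use (SwapTotal - SwapFree) in whole GiB.
swap_used_gib() {
    awk '/^SwapTotal:/ { swap_total = $2 }
         /^SwapFree:/  { swap_free = $2 }
         END { print int((swap_total - swap_free) / 1048576) }' "${1:-/proc/meminfo}"
}

# Exit status 0 only if the node has at least the requested GiB of RAM.
# Usage: has_enough_ram REQUIRED_GIB [MEMINFO_FILE]
has_enough_ram() {
    required_gib=$1
    [ "$(mem_total_gib "${2:-/proc/meminfo}")" -ge "$required_gib" ]
}
```

For example, `has_enough_ram 160 || echo "less than 160 GiB of RAM on this node"` flags an undersized node, and a swap_used_gib value that keeps growing while the job runs is consistent with the swapping described above.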

Let me know, I am sure there is a way (e.g., we are working with a team to make CLARK run online).

Cheers,
Rachid