30 min run time just for demo.fastq


hxz...@gmail.com

Jun 22, 2016, 2:14:06 PM
to HUMAnN Users
Hi, my name is Husen and I'm a new user of HUMAnN2.

On our cluster, I requested 20 threads:

humann2 --input demo.fastq --output demo_output --threads 20


It takes about 30 min for the program to finish (successfully). Not bad, but our Illumina NextSeq generates 250,000,000 reads versus the 12,500 in demo.fastq.

I observed that the "running diamond" step takes a long time.


Any tips to speed things up?

Thank you
Husen

Ali Rahnavard

Jun 22, 2016, 2:53:28 PM
to humann...@googlegroups.com
Hi Husen,

Thank you for using HUMAnN2. The demo file is designed to be executable on your desktop or laptop, but for real samples we suggest the following:

1) The number of threads requested should be no more than the number of cores on the machine/server running the job; otherwise the overhead of switching threads between cores can cause delays.
2) You are probably using the full DIAMOND database with the demo; it needs ~30 minutes to load regardless of the size of the read file.
3) Use a server with multiple cores and enough memory.
4) Some parts of the program are parallelized and will actually use all of the requested threads.
5) For a sample of 250,000,000 reads run with --threads 8 on 8 cores, expect ~30 hours of CPU time and ~50 GB of memory (as 8 cores are used).
6) The "--bypass-translated-search" option can speed up the run if you decide to skip the translated search (see the example commands after this list).

Thanks!
Ali

Lauren McIver

Jun 22, 2016, 4:09:02 PM
to HUMAnN Users
Hi Husen,

For a quick demo run (around 3 minutes), run demo.fastq with the demo translated search database. Running demo.fastq with the full databases will take about 30 minutes. Please note the run time does not scale linearly with read count: the step you saw taking a while (the translated alignment step) uses DIAMOND, which is not efficient for a small number of reads.

Running 250 million reads with 8 cores is estimated to take 10-45 hours and use 30-60 GB of memory, depending on the read set. The more reads that map to pangenomes, the faster the run and the smaller the memory footprint. The time and memory requirements will be reduced if you use a filtered translated search database. However, a filtered translated database will not allow you to identify uncharacterized proteins.

Thanks!
Lauren

hxz...@gmail.com

Jun 22, 2016, 4:23:20 PM
to HUMAnN Users
Hi Lauren, Ali,

Thank you for your prompt response! I am very interested in the option of using a "filtered translated search database". Any advice on how to generate such a database? I assume one would use UniRef50 as a starting point?

Husen

Lauren McIver

Jun 22, 2016, 5:36:15 PM
to HUMAnN Users
Hi Husen,

We have two filtered translated search databases available for HUMAnN2 (UniRef50 and UniRef90). To download and install the recommended filtered database (UniRef90), run the following (replacing $DIR with the location you would like to store the database):

$ humann2_databases --download uniref uniref90_ec_filtered_diamond $DIR

This command will update the humann2 config file to run with this database as the default selection.
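
To double-check that the default was updated, you can print the current configuration with the humann2_config utility that ships with HUMAnN2; the database folders listed in the output should show the new protein database location:

$ humann2_config --print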

Please let me know if you have any other questions!

Thanks!
Lauren


Husen Zhang

Jun 23, 2016, 9:33:48 AM
to humann...@googlegroups.com

Lauren McIver

Jun 23, 2016, 12:06:15 PM
to HUMAnN Users
Hi Husen,

It looks like you might have an older version of HUMAnN2 (v0.6.2 or prior). The new EC filtered databases were added in v0.7.0. If you update to the latest version ("$ pip install humann2 --upgrade"), you should see the filtered databases in the list of available databases.

Thanks!
Lauren

Husen Zhang

Jun 23, 2016, 5:00:17 PM
to humann...@googlegroups.com
Hi Lauren,

Yes, you are correct that I have v0.5.0. I couldn't do "pip install humann2 --upgrade" since the upgrade seems to be trying to write to directories where I don't have permission. Is there a way I can download the "uniref90_ec_filtered_diamond" database directly using wget?

Otherwise I'll stick to what I have, which is uniref50_GO_filtered.dmnd.

Husen

Lauren McIver

Jun 23, 2016, 7:19:20 PM
to HUMAnN Users
Hi Husen,

If you do not have permission to upgrade the global install, try adding the "--user" option to the pip command. This will install HUMAnN2 to $HOME/.local/, so you will have the latest release, which would be great!
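
For example (these are standard pip flags; you may also need to add $HOME/.local/bin to your PATH so the shell finds the upgraded scripts):

$ pip install humann2 --upgrade --user
$ export PATH=$HOME/.local/bin:$PATH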

Alternatively, you can download the database manually from this folder: http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/ . After downloading the database, decompress it, then provide the folder with the option "--protein-database <folder>" when running humann2 (since you downloaded it manually, the config file will not be updated to use this database as the default). Please note that when running a UniRef90 database with an older version of HUMAnN2, you will want to include the options "--identity-threshold 90.0 --annotation-gene-index 7". In newer versions of HUMAnN2, these modes are set for you by default based on the translated search database selected.
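
A sketch of the manual route with wget (the archive name below is a placeholder; use the actual filename listed in the folder above):

$ wget http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/<archive>.tar.gz
$ mkdir uniref90_filtered
$ tar xzf <archive>.tar.gz -C uniref90_filtered
$ humann2 --input sample.fastq --output sample_output --protein-database uniref90_filtered --identity-threshold 90.0 --annotation-gene-index 7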

Please let me know if you have any additional questions!

Thanks!
Lauren


Husen Zhang

Jun 23, 2016, 11:08:35 PM
to humann...@googlegroups.com
Hi Lauren:

Well, I followed the steps you outlined and upgraded to v0.7.0. "humann2_test" said everything is OK, but things that worked in v0.5.0 are no longer working. The command I ran was:

humann2 -i demo.fastq -o out --nucleotide-database /path/to/chocophlan

Then the error:

"No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty."

Any suggestions?
Thanks again,

Husen

Lauren McIver

Jun 24, 2016, 3:01:40 PM
to HUMAnN Users
Hi Husen,

It looks like the species found in the prescreen were not found in the ChocoPhlAn folder. Is the folder you are providing possibly a demo folder or a subset of the full folder? Now that you have the latest HUMAnN2 version (which is great!), you could delete your existing databases (nucleotide and protein) and download the latest full databases with the humann2_databases command. This will write to your config file so your default databases are all set. Alternatively, you could search the existing ChocoPhlAn folder to see if the species found in the prescreen are included; the file names include the species names (see the example below).
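
For example, if one of the species reported in your prescreen log were Bacteroides dorei (an illustrative name; substitute the species from your own log and your own ChocoPhlAn path):

$ ls /path/to/chocophlan | grep -i "s__Bacteroides_dorei"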

Thanks!
Lauren

Husen Zhang

Jun 24, 2016, 5:33:28 PM
to humann...@googlegroups.com
Hi Lauren,

You are exactly right! The problem was that I used an overly simple test.fastq which contained no targets in ChocoPhlAn! The program works well with demo.fastq. Thank you very much for your patience and help all along!
Husen

Lauren McIver

Jun 24, 2016, 5:58:19 PM
to HUMAnN Users
Hi Husen,

I am happy to help and glad to hear you are all set! Please let me know if you have any questions in the future.

Thanks!
Lauren
