HUMAnN2 questions


hutlab...@gmail.com

Feb 26, 2018, 4:50:40 PM
to HUMAnN Users
Dear Eric, Lauren and HUMAnN2 team,

I have learnt a lot from the forum and would like to say thank you for always taking time to respond to questions.

I am new to HUMAnN2 so please forgive me for any obvious errors on my part.

I ran the demo successfully, so I know I should get three output files.
I then ran the script below on my "kneaded" stool sample fastq files; an excerpt of the log showing the errors follows. It says the files (MinPath and UniRef) were not found, even though they are there.
As for the output, I cannot get past the *genefamilies.tsv file.
Why would the demo run but not my own files? Could it be related to the full databases?
I am running this on a supercomputer: a shared-memory machine on Red Hat Enterprise Linux with 60 physical cores (120 logical cores with HyperThreading turned on), 64-bit Intel processors, and 512 GB of memory. It takes a long time (~8 hours) to reach the error for just one sample, and I have about 203 more samples to run.

Other related questions:
1) Is there a function to show levels above genus, such as family, order and class, for the gene families output?
2) I read that fungi are included in the UniRef database. Could the absence of eukaryotes be due to the way the samples were sequenced?
3) Is there a more efficient way of running large fastq files without waiting a whole day? I am currently using the --threads option. Do you have any other suggestions?

Many thanks in advance for your help!
'Dupe

##SCRIPT###
humann2 \
--input xxxx_kneaddata.trimmed.fastq \
--output xxx/output \
--metaphlan xxxx/metaphlan2 \
--nucleotide-database xxxxxr/chocophlan \
--protein-database xxxx/uniref \
--o-log /xxx.log \
--memory-use maximum \
--threads 10


#####EXCERPT OF LOG#####
02/26/2018 03:25:44 AM - humann2.utilities - CRITICAL: Can not find file /xxxx/.local/lib/python2.7/site-packages/humann2/data/misc/map_uniref50_name.txt.bz2
02/26/2018 03:25:44 AM - humann2.store - DEBUG: Unable to read Names file: /xxxx/.local/lib/python2.7/site-packages/humann2/data/misc/map_uniref50_name.txt.bz2
02/26/2018 03:25:46 AM - humann2.humann2 - INFO: TIMESTAMP: Completed


02/26/2018 03:39:14 AM - humann2.utilities - CRITICAL: Can not find python module /xxxx/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py
02/26/2018 03:39:14 AM - humann2.utilities - CRITICAL: Can not find python module /xxxx/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py
02/26/2018 03:39:14 AM - humann2.utilities - CRITICAL: Can not find python module /xxxxr/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py

Modupe Coker

Feb 26, 2018, 4:53:44 PM
to HUMAnN Users
Sorry, I am not sure why my post shows the hut...@gmail.com account as the sender. My computer must have remembered the account from a workshop.


Eric Franzosa

Feb 27, 2018, 9:06:39 AM
to humann...@googlegroups.com
Hmmm, it looks like something might've changed in your environment between running the demo and running on real data. Out of curiosity, if you run the demo again, does it still work?
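
One way to test whether the environment has changed, offered as a sketch rather than an official HUMAnN2 procedure: pull the paths out of the "Can not find" lines in the log and check whether each one exists from the same shell (and the same account) used for the real run. The function name check_missing and the log path in the usage comment are hypothetical; the grep pattern assumes the log wording shown in the excerpt above.

```shell
# Hypothetical helper: list every path that a HUMAnN2 log reports as
# "Can not find" and say whether it exists in the current environment.
# Usage: check_missing /path/to/sample.log
check_missing() {
  grep -E 'CRITICAL: Can not find (file|python module) ' "$1" \
    | sed -E 's/.*Can not find (file|python module) //' \
    | sort -u \
    | while IFS= read -r path; do
        if [ -e "$path" ]; then
          echo "present $path"
        else
          echo "missing $path"
        fi
      done
}
```

If a path is reported as present here but HUMAnN2 still cannot find it, the job may be running under a different account, Python installation, or node than the interactive shell; that is a guess to check, not a confirmed diagnosis.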

What sort of samples are you working with (environment + read depth)? For reference, a human gut sample of 10M reads takes about an hour to analyze using 8 cores. HUMAnN2's runtime is essentially linear in the fraction of reads that get passed to translated search.

To your specific questions:

1) Yes, the infer_taxonomy script can show you stratifications at higher taxonomic levels (as well as guessing about the taxonomy of unclassified UniRef gene families). Check out this section of the manual:


2) My guess is that the missing fungal reads are probably an artifact of extraction procedures rather than of sequencing per se.

3) If your samples are very deeply sequenced (say, 100M reads), and a lot of reads are being passed to translated search, then this can still take a while to run. You have the option in HUMAnN2 of bypassing the translated search and just focusing on the reads that map to pangenomes. This will be much faster, but you won't be using all of your data. It's a more reasonable step (especially as a first pass) if your samples are well characterized.
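
As a rough sketch of that first-pass approach: the wrapper below calls humann2 with --bypass-translated-search, so only the reads that map to pangenomes are used. The function name run_fast_pass and the DRY_RUN and THREADS variables are hypothetical helpers for illustration; the file pattern in the commented loop is a placeholder modeled on the script earlier in this thread, and database and MetaPhlAn2 locations are assumed to be already configured (otherwise add --nucleotide-database and --metaphlan as in that script).

```shell
# Hypothetical wrapper: nucleotide-level first pass for one sample.
# Usage: run_fast_pass <input.fastq> <output_dir>
run_fast_pass() {
  # With DRY_RUN set to a non-empty value, the command is printed instead of run.
  ${DRY_RUN:+echo} humann2 \
    --input "$1" \
    --output "$2" \
    --bypass-translated-search \
    --threads "${THREADS:-8}"
}

# Example loop over many samples (commented out; the file pattern is a placeholder):
# for fq in *_kneaddata.trimmed.fastq; do
#   run_fast_pass "$fq" "output/$(basename "$fq" .fastq)"
# done
```

Setting DRY_RUN=1 before calling the function is a cheap way to confirm the exact command line before committing hours of compute to it.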

Thanks,
Eric

Modupe Coker

Feb 27, 2018, 2:18:52 PM
to HUMAnN Users
Thanks a lot, Eric!
I will run the demo again, use the --bypass-translated-search option on my files, try out the infer_taxonomy script, and let you know how it goes.

In the meantime, what's your best guess as to what the problem might be, and what should I do to "reset" my environment?

Modupe Coker

Mar 1, 2018, 11:01:35 AM
to HUMAnN Users
Hi Eric, 
The demo run worked well!

bash-4.2$ humann2 --input examples/demo.fastq --output out --metaphlan /xxxx/metaphlan2 --protein-database /xxxx/uniref

Output files will be written to: /xxxx/humann2-0.11.1/out

Running metaphlan2.py ........

Found g__Bacteroides.s__Bacteroides_dorei : 59.18% of mapped reads
Found g__Bacteroides.s__Bacteroides_vulgatus : 40.82% of mapped reads

Total species selected from prescreen: 2

Selected species explain 100.00% of predicted community composition

Creating custom ChocoPhlAn database ........

Running bowtie2-build ........

Running bowtie2 ........

Total bugs from nucleotide alignment: 2
g__Bacteroides.s__Bacteroides_vulgatus: 7336 hits
g__Bacteroides.s__Bacteroides_dorei: 8820 hits

Total gene families from nucleotide alignment: 3685

Unaligned reads after nucleotide alignment: 23.0666666667 %

Running diamond ........

Aligning to reference database: uniref90_annotated.1.1.dmnd

Total bugs after translated alignment: 3
g__Bacteroides.s__Bacteroides_vulgatus: 7336 hits
unclassified: 1079 hits
g__Bacteroides.s__Bacteroides_dorei: 8820 hits

Total gene families after translated alignment: 3785

Unaligned reads after translated alignment: 18.2476190476 %

Computing gene families ...

Computing pathways abundance and coverage ...

Output files created:
/xxxx/humann2-0.11.1/out/demo_genefamilies.tsv
/xxxx/humann2-0.11.1/out/demo_pathabundance.tsv
/xxxx/humann2-0.11.1/out/demo_pathcoverage.tsv

jiao...@163.com

Mar 7, 2018, 8:59:48 PM
to HUMAnN Users
On Tuesday, February 27, 2018 at 5:50:40 AM UTC+8, hutlab...@gmail.com wrote:
Dear Eric,
I am also wondering why HUMAnN2 takes so long to run.
My sample is a gut microbiome with about 2G reads from standard shotgun sequencing.
It has been running for about 12 hours and still has not completed.
The log file says it is currently running bowtie2-build.

I look forward to your reply. Thank you.