I have learnt a lot from the forum and would like to say thank you for always taking time to respond to questions.
I am new to HUMAnN2 so please forgive me for any obvious errors on my part.
I successfully ran the demo, so I know I should get three output files.
I ran the script below on my "kneaded" stool sample FASTQ files, and an excerpt of the log showing the errors follows. It says the files (MinPath and UniRef) were not found, even though they were there.
As regards the output, I am not able to get past the *genefamilies.tsv files.
Why would I be able to get the demo to run but not my files? Could it be related to the full databases?
I am running this on a shared-memory supercomputer running Red Hat Enterprise Linux, with 60 physical cores (120 logical cores with Hyper-Threading turned on), 64-bit Intel processors, and 512 GB of memory. It takes a long time (~8 hours) to reach the error for just one sample, and I have about 203 more samples to run.
Other related questions:
1) Is there a function to show levels above the genus like family, order and class for the gene families output?
2) I read that fungi are included in the UniRef database. Could the absence of eukaryotes in my results be due to the way the samples were sequenced?
3) Is there a more efficient way of running large FASTQ files without waiting a whole day? I am currently using the --threads option. Do you have any other suggestions?
Many thanks in advance for your help!
'Dupe
#####SCRIPT#####
humann2 \
--input xxxx_kneaddata.trimmed.fastq \
--output xxx/output \
--metaphlan xxxx/metaphlan2 \
--nucleotide-database xxxxxr/chocophlan \
--protein-database xxxx/uniref \
--o-log /xxx.log \
--memory-use maximum \
--threads 10
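With about 203 more samples to go, the command above could be wrapped in a loop. This is only a minimal sketch, assuming all kneaded FASTQ files sit in one directory and follow the `*_kneaddata.trimmed.fastq` naming used in the script. `run_all`, `RUN`, and the two directory arguments are hypothetical names, the database paths are the redacted ones from the script, and by default the function only prints the commands it would run:

```shell
# Sketch: one humann2 run per kneaded FASTQ file in a directory.
# run_all and RUN are hypothetical names; database paths are the redacted ones above.
# RUN defaults to "echo" (dry run); call with RUN= to execute the commands.
# The output directory is assumed to exist already (the log file is written there).
run_all() {
    in_dir=$1
    out_dir=$2
    for fq in "$in_dir"/*_kneaddata.trimmed.fastq; do
        [ -e "$fq" ] || continue   # skip if the glob matched nothing
        sample=$(basename "$fq" _kneaddata.trimmed.fastq)
        ${RUN-echo} humann2 \
            --input "$fq" \
            --output "$out_dir/$sample" \
            --metaphlan xxxx/metaphlan2 \
            --nucleotide-database xxxxxr/chocophlan \
            --protein-database xxxx/uniref \
            --o-log "$out_dir/$sample.log" \
            --memory-use maximum \
            --threads 10
    done
}
```

Calling `run_all /path/to/kneaded xxx/output` would print one command per sample, and `RUN= run_all /path/to/kneaded xxx/output` would execute them one after another. This does not make any single run faster; it only removes the manual step between samples.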
#####EXCERPT OF LOG#####
02/26/2018 03:25:44 AM - humann2.utilities - CRITICAL: Can not find file /xxxx/.local/lib/python2.7/site-packages/humann2/data/misc/map_uniref50_name.txt.bz2
02/26/2018 03:25:44 AM - humann2.store - DEBUG: Unable to read Names file: /xxxx/.local/lib/python2.7/site-packages/humann2/data/misc/map_uniref50_name.txt.bz2
02/26/2018 03:25:46 AM - humann2.humann2 - INFO: TIMESTAMP: Completed
02/26/2018 03:39:14 AM - humann2.utilities - CRITICAL: Can not find python module /xxxx/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py
02/26/2018 03:39:14 AM - humann2.utilities - CRITICAL: Can not find python module /xxxx/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py
02/26/2018 03:39:14 AM - humann2.utilities - CRITICAL: Can not find python module /xxxxr/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py
#####DEMO OUTPUT#####
Output files will be written to: /xxxx/humann2-0.11.1/out
Running metaphlan2.py ........
Found g__Bacteroides.s__Bacteroides_dorei : 59.18% of mapped reads
Found g__Bacteroides.s__Bacteroides_vulgatus : 40.82% of mapped reads
Total species selected from prescreen: 2
Selected species explain 100.00% of predicted community composition
Creating custom ChocoPhlAn database ........
Running bowtie2-build ........
Running bowtie2 ........
Total bugs from nucleotide alignment: 2
g__Bacteroides.s__Bacteroides_vulgatus: 7336 hits
g__Bacteroides.s__Bacteroides_dorei: 8820 hits
Total gene families from nucleotide alignment: 3685
Unaligned reads after nucleotide alignment: 23.0666666667 %
Running diamond ........
Aligning to reference database: uniref90_annotated.1.1.dmnd
Total bugs after translated alignment: 3
g__Bacteroides.s__Bacteroides_vulgatus: 7336 hits
unclassified: 1079 hits
g__Bacteroides.s__Bacteroides_dorei: 8820 hits
Total gene families after translated alignment: 3785
Unaligned reads after translated alignment: 18.2476190476 %
Computing gene families ...
Computing pathways abundance and coverage ...
Output files created:
/xxxx/humann2-0.11.1/out/demo_genefamilies.tsv
/xxxx/humann2-0.11.1/out/demo_pathabundance.tsv
/xxxx/humann2-0.11.1/out/demo_pathcoverage.tsv
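The two paths that the log excerpt above reports as missing can be checked directly from the shell, under the same user account that runs the job. A minimal sketch: `check_path` is a hypothetical helper, and the `/xxxx` prefix is the redacted one from the log, so it would need replacing with the real path:

```shell
# Hypothetical helper: report whether a path exists and is readable
# by the current user.
check_path() {
    if [ -r "$1" ]; then
        echo "OK      $1"
    else
        echo "MISSING $1"
    fi
}

# Paths copied from the CRITICAL lines of the log (prefix is redacted).
check_path /xxxx/.local/lib/python2.7/site-packages/humann2/data/misc/map_uniref50_name.txt.bz2
check_path /xxxx/.local/lib/python2.7/site-packages/humann2/quantify/MinPath12hmp.py
```

A `MISSING` line for a file that is visible in a directory listing would point to a permissions or path mismatch rather than an absent file.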