My colleague ran
humann2 --input $fastafile --output $workdir --metaphlan /home/test/humann2/metaphlan2/biobakery-metaphlan2-1f76aaeb2f66 --bowtie2 /home/test/metagenome/bowtie2/bowtie2-2.2.9 --diamond /home/test/.local/bin --threads 16
without specifying --search-mode.
In the folder "humann2/DB/uniref", we have:
uniref50_annotated.dmnd
uniref90_annotated.dmnd
but your documentation says:
"--search-mode {uniref50,uniref90}
search for uniref50 or uniref90 gene families
[DEFAULT: based on translated database selected]"
How can we select the translated database?
Moreover, in the _genefamilies.tsv output file,
we have some lines like this:
UniRef50_A0A015T442 6015.917931
UniRef50_A0A015T442|unclassified 6015.917931
UniRef90_A0A015T442 6015.917931
UniRef90_A0A015T442|unclassified 6015.917931
where the same entries are repeated with identical values under both UniRef90 and UniRef50,
or sometimes we have only data from UniRef90:
UniRef50_R6N0W6|unclassified 7.902537394
UniRef90_J9CWL1 3515.841598
UniRef90_J9CWL1|unclassified 3515.841598
UniRef50_R6HM03 3515.658306
UniRef50_R6HM03|g__Bacteroides.s__Bacteroides_dorei 3077.908288
Does this mean that humann2 ran on both databases, or do some data come from only UniRef50?
Do we need to rerun humann2, and how can we clearly specify the database that we want to use?
Regards,
Tiphaine
Do you know if it is possible to split the humann2 output files into two files (UniRef50 and UniRef90), or do we need to rerun from the FASTA files on only one database? Or could we restart from the MetaPhlAn data?
Sorry to ask, but it was run on 300 samples. It took about 3 days per sample on a node with 64 GB of RAM and 16 CPUs.
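To make the question concrete, here is a minimal sketch of the purely mechanical split we have in mind, grouping rows of a _genefamilies.tsv file by the prefix of the feature ID. The function name split_by_uniref_prefix is hypothetical (it is not part of humann2), and it assumes the ID is the first whitespace-separated field on each row. We do not know whether abundances split this way are valid when both databases were searched in the same run.

```python
def split_by_uniref_prefix(lines):
    """Group gene-family rows by the UniRef prefix of their feature ID.

    Hypothetical helper, not part of humann2. Assumes each data row starts
    with the feature ID (e.g. "UniRef90_J9CWL1|unclassified") followed by
    whitespace and the abundance value.
    """
    groups = {"UniRef50": [], "UniRef90": [], "other": []}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#"-prefixed header
        feature = line.split(None, 1)[0]   # first field: the feature ID
        prefix = feature.split("_", 1)[0]  # text before the first "_"
        key = prefix if prefix in ("UniRef50", "UniRef90") else "other"
        groups[key].append(line)
    return groups


# Rows taken from the output shown above (separator assumed to be a tab).
rows = [
    "UniRef50_A0A015T442\t6015.917931",
    "UniRef90_A0A015T442\t6015.917931",
    "UniRef90_J9CWL1|unclassified\t3515.841598",
    "UniRef50_R6HM03|g__Bacteroides.s__Bacteroides_dorei\t3077.908288",
]
groups = split_by_uniref_prefix(rows)
for name, members in groups.items():
    print(name, len(members))
```

Rows whose ID has neither prefix (for example an unmapped row) end up in the "other" group, so nothing is silently dropped.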
Regards,
Tiphaine