Re: MetaPhlAN2 quality trimming

Francesco Asnicar

Apr 13, 2018, 11:06:54 AM

to Lu Yang, MetaPhlAn-users

Hello,

Generally, trimming and merging are not critical for MetaPhlAn2. However, you may need these steps for other downstream analyses, so it may be better to perform them before starting.

MetaPhlAn2 accepts more than one input file, so there is no need to merge the reads in advance.

In my opinion, quality filtering your metagenomes before running them through MetaPhlAn2 is a good idea, so I would recommend it.

In detail, low-quality or short reads are not a problem for MetaPhlAn2. For short reads, there is a length threshold below which reads are discarded ("--read_min_len", set to 70 by default).
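
To see how many reads that threshold would affect, a quick count can help. The following is only a sketch, assuming uncompressed FASTQ files with exactly four lines per record; count_short_reads is a hypothetical helper and not part of MetaPhlAn2.

```shell
# Hypothetical helper (not part of MetaPhlAn2): count the reads in a FASTQ
# file that are shorter than a minimum length (70 by default, mirroring the
# --read_min_len default mentioned above).
# Assumes an uncompressed FASTQ with exactly four lines per record.
count_short_reads() {
    fastq=$1
    min_len=${2:-70}
    awk -v min="$min_len" '
        NR % 4 == 2 && length($0) < min { n++ }   # line 2 of each record is the sequence
        END { print n + 0 }                       # print 0 when nothing matched
    ' "$fastq"
}

# Example call (file name taken from the thread):
# count_short_reads D2_1.fastq 70
```

If the count is a large share of the sample, it may be worth checking the trimming settings before profiling.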

What can be a problem are low-complexity reads: when MetaPhlAn2 maps the reads with bowtie2, bowtie2 can get stuck trying to find the best hit for them.

I hope this helps you.

Many thanks,

Francesco

On Wed, Apr 11, 2018 at 3:29 PM Lu Yang <luyan...@gmail.com> wrote:

Hi,
I am new to MetaPhlAn, but I have scanned the manual online. It seems amazing.

The software has been installed. However, before running my data, I came across a question: do I need to trim, quality filter, and merge the reads before running MetaPhlAn2?

My data are shotgun data sequenced on Illumina 1.9 with paired-end reads of 100 bp at each end. They are environmental samples. Each sample has 99,000,000 reads. My initial goal is to get a glimpse of the taxonomic profile of the data.

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lu Yang

Apr 13, 2018, 1:55:03 PM

to MetaPhlAn-users

Hi,

Thanks for the suggestion.

Then I ran the command below.
metaphlan2.py --input_type fastq <(zcat R1.fastq.gz R2.fastq.gz) --bowtie2out.bowtie2.bz2 --nproc 10 > profiled.txt &

However, I got an error in the output profile. The output file is attached.

But if I use the following command, there is no error in the resulting profile.
metaphlan2.py D2_1.fastq,D2_2.fastq --bowtie2out m.bowtie2.bz2 --nproc 55 --input_type fastq > profiled_metagenome.txt &

As far as I can tell, there is no difference between the two commands above. Is there anything wrong?

Thanks a lot.

profiled.txt

Francesco Asnicar

Apr 16, 2018, 3:49:00 AM

to Lu Yang, MetaPhlAn-users

Hi,

The error you got should look like this:

"""
Help message for read_fastx.py
Traceback (most recent call last):
  File "../../../../scratch/users/f.asnicar/hg/metaphlan2-install/metaphlan2/utils/read_fastx.py", line 123, in <module>
    read_and_write_raw(f, opened=False, min_len=min_len)
  File "../../../../scratch/users/f.asnicar/hg/metaphlan2-install/metaphlan2/utils/read_fastx.py", line 88, in read_and_write_raw
    with fopen(fd) as inf:
  File "../../../../scratch/users/f.asnicar/hg/metaphlan2-install/metaphlan2/utils/read_fastx.py", line 47, in fopen
    return open(fn)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/fd/63'
"""

That's because the read_fastx.py utility is not able to handle streams.

We will update the MetaPhlAn2 README to advise against streaming the input files.
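
One way to avoid streaming is to decompress the gzipped reads into regular files first and pass those files as a comma-separated list. This is only a minimal sketch, assuming gzip is available and that no path contains a comma; decompress_fastq is a hypothetical helper and not part of MetaPhlAn2.

```shell
# Hypothetical helper (not part of MetaPhlAn2): decompress each gzipped FASTQ
# into the directory given as the first argument, then print the resulting
# paths as a comma-separated list with no spaces.
decompress_fastq() {
    out_dir=$1
    shift
    list=""
    for gz in "$@"; do
        out="$out_dir/$(basename "$gz" .gz)"
        gzip -dc "$gz" > "$out" || return 1
        list="${list:+$list,}$out"   # add a comma only between entries
    done
    printf '%s\n' "$list"
}

# Example with the file names from the thread (the output directory is an assumption):
# mkdir -p uncompressed
# inputs=$(decompress_fastq uncompressed R1.fastq.gz R2.fastq.gz)
# metaphlan2.py "$inputs" --bowtie2out m.bowtie2.bz2 --nproc 10 --input_type fastq > profiled.txt
```

The decompressed copies take extra disk space, so they can be deleted once the profile has been written.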

Many thanks,

Francesco

Lu Yang

Apr 17, 2018, 11:53:56 PM

to MetaPhlAn-users

On Monday, April 16, 2018 at 3:49:00 AM UTC-4, Francesco Asnicar wrote:

Hi,

May I follow up with another question on quality control before running MetaPhlAn2?

HUMAnN2 suggests running it on quality-filtered reads.

May I also use the kneaddata results as input to MetaPhlAn2? The outputs from kneaddata include paired_R1.fastq, paired_R2.fastq, unmatched_R1.fastq, and unmatched_R2.fastq. Should I cat all these files into one fastq file as the MetaPhlAn2 input, or just provide paired_R1 and paired_R2? If I cat all these files into one file, the paired-end information will be useless, is that correct?

Looking forward to your suggestions.

Thanks in advance.

Francesco Asnicar

Apr 18, 2018, 3:07:10 AM

to Lu Yang, MetaPhlAn-users

Hi,

Yes, you can definitely use the results from kneaddata. There is no need to cat all the results into one file: you can provide all of them to MetaPhlAn2 as a comma-separated list (take care not to put spaces between the commas and the file names), like this:

$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt

Many thanks,

Francesco

Lu Yang

Apr 18, 2018, 2:52:37 PM

to Francesco Asnicar, MetaPhlAn-users

Hi,

Thanks for the answer. So that means the unmatched_1.fastq and unmatched_2.fastq files will not be passed to MetaPhlAn2, am I right?

Again, thanks.☺

Best.

Francesco Asnicar

Apr 19, 2018, 5:21:25 PM

to Lu Yang, MetaPhlAn-users

Hi,

No, on the contrary, I suggest you use them as well in the MetaPhlAn2 analysis. The command above may have been misleading, as it showed only two input files. You can extend it to four inputs:

$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq,metagenome_3.fastq,metagenome_4.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
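
The comma-separated argument can also be assembled from a list of files, which helps avoid stray spaces. A small sketch, assuming the kneaddata output names quoted earlier in the thread (the real names depend on the kneaddata run); join_by_comma is a hypothetical helper and not part of MetaPhlAn2.

```shell
# Hypothetical helper: join all arguments with commas and no spaces.
# The function body runs in a subshell, so the IFS change does not leak
# into the calling shell.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Example with the kneaddata file names mentioned in the thread:
# inputs=$(join_by_comma paired_R1.fastq paired_R2.fastq unmatched_R1.fastq unmatched_R2.fastq)
# metaphlan2.py "$inputs" --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
```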

Many thanks,
Francesco
