strainphlan errors in step 2

JQL

Sep 13, 2016, 8:44:14 AM

to MetaPhlAn-users

Hi,

I tried to follow the tutorial for strainphlan. This thread is a bit too long so I decided to open a new thread.

I have samtools 0.1.19 and added dump_file.py to the path. It now runs a bit longer before raising a different error. The error message doesn't give me much useful information. Interestingly, when I ran the last "failed" dump_file.py command, it finished without error.

Can someone please help? Thanks, John

$ sample2markers.py --ifn_samples ES4_MetaG_S5_L001_R1_001.sam.bz2 --input_type sam --output_dir marker/ --nproc 8 &
[1] 114110

$ Traceback (most recent call last):
File "/usr/local/metaphlan2/strainphlan_src/ooSubprocess.py", line 244, in wrapper
return f(*args, **kwargs)
File "/usr/local/metaphlan2/strainphlan_src/sample2markers.py", line 381, in run_sample
quiet=args['quiet'])
File "/usr/local/metaphlan2/strainphlan_src/sample2markers.py", line 306, in sam2markers
stderr=error_pipe)
File "/usr/local/metaphlan2/strainphlan_src/ooSubprocess.py", line 181, in chain
%(' | '.join(self.chain_cmds), return_code))
ooSubprocessException: Failed when executing the command: dump_file.py --input_file ES4_MetaG_S5_L001_R1_001.sam.bz2 | samtools view -bS - | samtools sort -o - marker/ES4_MetaG_S5_L001_R1_001.sam.bz2.bam.sorted | samtools mpileup -u - | bcftools view -c -g -p 1.1 -
return code: 255

Duy Tin Truong

Sep 13, 2016, 8:58:14 AM

to JQL, MetaPhlAn-users

Hi John,

I have added the options "samtools_exe" and "bcftools_exe" to the latest version of sample2markers.py so that you can specify the correct samtools version. Can you explicitly specify samtools and bcftools 0.1.19 and rerun that step?

Cheers,

Tin

--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

JQL

Sep 13, 2016, 11:34:10 AM

to MetaPhlAn-users, ql1...@gmail.com

Hi Tin,

Thanks for the reply. I don't see options for "samtools_exe" or "bcftools_exe"; see below. Does that mean I don't have the latest version? We installed the software not long ago.

Is "--sam2file_ext" what you meant?

John

[johnny@usslrlx0002 strainphlan]$ /usr/local/metaphlan2/strainphlan_src/sample2markers.py -h
usage: sample2markers.py [-h] --ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]
[--ifn_markers IFN_MARKERS] --output_dir OUTPUT_DIR
[--nprocs NPROCS] [--min_read_len MIN_READ_LEN]
[--min_align_score MIN_ALIGN_SCORE]
[--min_base_quality MIN_BASE_QUALITY]
[--error_rate ERROR_RATE]
[--marker2file_ext MARKER2FILE_EXT]
[--sam2file_ext SAM2FILE_EXT] [--verbose]
--input_type {fastq,sam}

optional arguments:
-h, --help show this help message and exit
--ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]
--ifn_markers IFN_MARKERS
--output_dir OUTPUT_DIR
--nprocs NPROCS
--min_read_len MIN_READ_LEN
--min_align_score MIN_ALIGN_SCORE
--min_base_quality MIN_BASE_QUALITY
--error_rate ERROR_RATE
--marker2file_ext MARKER2FILE_EXT
--sam2file_ext SAM2FILE_EXT
--verbose Show all information. Default "not set".
--input_type {fastq,sam}
The input type: fastq, sam. Sam files can be obtained
from the previous run of this script or
strainphlan.py).

Johnny Li

Sep 13, 2016, 1:52:32 PM

to MetaPhlAn-users

I tried the following. I'm not sure that is what you meant. Thanks,

$ /usr/local/metaphlan2/strainphlan_src/sample2markers.py --ifn_samples ES4_MetaG_S5_L001_R1_001.sam.bz2 --input_type sam --output_dir marker/ --nproc 8 --samtools_exe 0.1.19 --bcftools_exe 0.1.19

usage: sample2markers.py [-h] --ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]
[--ifn_markers IFN_MARKERS] --output_dir OUTPUT_DIR
[--nprocs NPROCS] [--min_read_len MIN_READ_LEN]
[--min_align_score MIN_ALIGN_SCORE]
[--min_base_quality MIN_BASE_QUALITY]
[--error_rate ERROR_RATE]
[--marker2file_ext MARKER2FILE_EXT]
[--sam2file_ext SAM2FILE_EXT] [--verbose]
--input_type {fastq,sam}
sample2markers.py: error: unrecognized arguments: --samtools_exe 0.1.19 --bcftools_exe 0.1.19

Duy Tin Truong

Sep 14, 2016, 2:13:33 AM

to Johnny Li, MetaPhlAn-users

Hi Johnny,

I added those options only yesterday to solve your problem, so please update to the latest version. Also, you should specify the path to the executable for the version you want to use, e.g. --samtools_exe /usr/bin/samtools.0.1.19.

Thanks,

Tin

JQL

Sep 14, 2016, 5:29:18 AM

to MetaPhlAn-users, ql1...@gmail.com

Thanks, Tin. I will be out of town for a couple of weeks. I will ask our IT to install the new version and will let you know.

Have a nice day!
John

JQL

Sep 26, 2016, 9:56:47 AM

to MetaPhlAn-users, ql1...@gmail.com

Hi Tin,

I updated to the latest sample2markers.py and reran the step as you suggested. Any thoughts?

[1]+ Running sample2markers.py --ifn_samples $home/ES4_MetaG_S5_L001_R1_001.sam.bz2 --input_type sam --output_dir $home/marker/ --nproc 8 --samtools_exe /usr/local/bin/samtools --bcftools_exe /usr/local/bin/bcftools &

$ Traceback (most recent call last):
File "/usr/local/metaphlan2/strainphlan_src/ooSubprocess.py", line 244, in wrapper
return f(*args, **kwargs)
File "/usr/local/metaphlan2/strainphlan_src/sample2markers.py", line 393, in run_sample
bcftools_exe=args['bcftools_exe'])
File "/usr/local/metaphlan2/strainphlan_src/sample2markers.py", line 314, in sam2markers
stderr=error_pipe)
File "/usr/local/metaphlan2/strainphlan_src/ooSubprocess.py", line 181, in chain
%(' | '.join(self.chain_cmds), return_code))
ooSubprocessException: Failed when executing the command: dump_file.py --input_file /data/I_data/shotgun/strainphlan/ES4_MetaG_S5_L001_R1_001.sam.bz2 | /usr/local/bin/samtools view -bS - | /usr/local/bin/samtools sort -o - /data/I_data/shotgun/strainphlan/marker/ES4_MetaG_S5_L001_R1_001.sam.bz2.bam.sorted | /usr/local/bin/samtools mpileup -u - | /usr/local/bin/bcftools view -c -g -p 1.1 -
return code: 255

The samtools appears to be the correct version:
$ samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-44428cd

thanks
John
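
The check above covers samtools only, while the failing pipeline also calls bcftools. A minimal Python sketch for reading the version of both executables is given here. It assumes that running the tool with no arguments prints a usage banner containing a "Version:" line, as the samtools output above does; parse_version and tool_version are hypothetical helper names, not part of strainphlan.

```python
import re
import subprocess


def parse_version(banner):
    """Return the token after 'Version:' in a usage banner, or None if absent."""
    match = re.search(r"^Version:\s*(\S+)", banner, flags=re.MULTILINE)
    return match.group(1) if match else None


def tool_version(exe_path):
    """Run the executable with no arguments and parse the banner it prints.

    Assumption: the banner may go to stdout or stderr, so both are captured.
    """
    proc = subprocess.run(
        [exe_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_version(proc.stdout + proc.stderr)


# Example (paths taken from the command line above):
# for exe in ("/usr/local/bin/samtools", "/usr/local/bin/bcftools"):
#     print(exe, tool_version(exe))
```

If the two tools report different versions, that mismatch is worth ruling out before looking elsewhere.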

Duy Tin Truong

Sep 27, 2016, 2:33:45 AM

to JQL, MetaPhlAn-users

Hi John,

Can you try to execute this command alone to see if everything is fine:

dump_file.py --input_file /data/I_data/shotgun/strainphlan/ES4_MetaG_S5_L001_R1_001.sam.bz2 | /usr/local/bin/samtools view -bS - | /usr/local/bin/samtools sort -o - /data/I_data/shotgun/strainphlan/marker/ES4_MetaG_S5_L001_R1_001.sam.bz2.bam.sorted | /usr/local/bin/samtools mpileup -u - | /usr/local/bin/bcftools view -c -g -p 1.1 -

Thanks,

Tin
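
One way to see which stage of the command above fails is to run the stages as a chain and report every stage's exit code instead of only the last one. The sketch below does this with Python's subprocess module. run_chain is a hypothetical helper, not part of strainphlan, and the stage list in the trailing comment simply splits the command above at each pipe.

```python
import subprocess


def run_chain(commands):
    """Run the commands as a pipeline and return each stage's exit code, in order.

    Each stage's stderr is inherited, so error messages appear on the terminal.
    The output of the final stage is discarded; only exit codes are of interest.
    """
    procs = []
    upstream = None
    for index, cmd in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=upstream,
            stdout=subprocess.DEVNULL if is_last else subprocess.PIPE,
        )
        if upstream is not None:
            # Close the parent's copy so the pipe belongs to the two stages only.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]


# Example, using the stages of the command above:
# stages = [
#     ["dump_file.py", "--input_file",
#      "/data/I_data/shotgun/strainphlan/ES4_MetaG_S5_L001_R1_001.sam.bz2"],
#     ["/usr/local/bin/samtools", "view", "-bS", "-"],
#     ["/usr/local/bin/samtools", "sort", "-o", "-",
#      "/data/I_data/shotgun/strainphlan/marker/ES4_MetaG_S5_L001_R1_001.sam.bz2.bam.sorted"],
#     ["/usr/local/bin/samtools", "mpileup", "-u", "-"],
#     ["/usr/local/bin/bcftools", "view", "-c", "-g", "-p", "1.1", "-"],
# ]
# print(run_chain(stages))
```

The first non-zero entry in the returned list points at the stage to investigate. Later stages may also fail simply because their input was cut short.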

Duy Tin Truong

Nov 1, 2016, 3:04:31 AM

to Johnny Li, metaphl...@googlegroups.com

Hi John,

There were errors with raxml; maybe you can try another phylogenetic tree builder, such as fasttree (http://www.microbesonline.org/fasttree/), on the s__Lactobacillus_johnsonii.fasta file. In addition, you should use the parameter combinations set by "--relaxed_parameters", "--relaxed_parameters2", or "--relaxed_parameters3" to reduce the stringency, because "--marker_in_clade 0.2" needs to be combined with other parameters as well.

Cheers,

Tin

On Fri, Oct 21, 2016 at 9:06 PM Johnny Li <ql1...@gmail.com> wrote:

Hi Tin,

I have 128 sequencing files belonging to 8 samples. That is, each sample was sequenced in 8 lanes, paired-end (see below). They are most likely from the same strain. I understand you said that with fewer than 3 strains it can't build the tree, but why did it say "ERROR: Problem reading number of species and sites"?

Have a good weekend,
John

$ grep -c '>' s__Lactobacillus_johnsonii.fasta
17

$ grep '>' s__Lactobacillus_johnsonii.fasta
>MOCK_MetaG_S1_L003_R2_001
>MOCK_MetaG_S1_L004_R2_001
>MOCK_MetaG_S1_L006_R2_001
>MOCK_MetaG_S1_L008_R1_001
>MOCK_MetaG_S1_L001_R1_001
>MOCK_MetaG_S1_L005_R2_001
>MOCK_MetaG_S1_L002_R1_001
>MOCK_MetaG_S1_L002_R2_001
>MOCK_MetaG_S1_L006_R1_001
>MOCK_MetaG_S1_L004_R1_001
>MOCK_MetaG_S1_L003_R1_001
>MOCK_MetaG_S1_L007_R2_001
>MOCK_MetaG_S1_L008_R2_001
>GCF_L_johnsonii
>MOCK_MetaG_S1_L007_R1_001
>MOCK_MetaG_S1_L005_R1_001
>MOCK_MetaG_S1_L001_R2_001

ooSubprocess: raxmlHPC-PTHREADS-SSE3 -s /data/IBM_data/shotgun/strainphlan/trees/L_johnsonii2/s__Lactobacillus_johnsonii.fasta -w /data/IBM_data/shotgun/strainphlan/trees/L_johnsonii2 -n s__Lactobacillus_johnsonii.tree -p 1234 -m GTRCAT -T 16
ERROR: Problem reading number of species and sites
2016-10-21 13:36:46,734 | INFO | __main__ | build_tree | 1122 | Cannot build the tree! The number of samples is too few or there is some error with raxmlHMP
2016-10-21 13:36:47,118 | INFO | __main__ | strainer | 1511 | Finished!

On Fri, Oct 21, 2016 at 11:34 AM, Duy Tin Truong <duytin...@gmail.com> wrote:
Hi John,

How many samples did you use, and can you count how many strains could be reconstructed:
grep -c '>' s__Lactobacillus_johnsonii.fasta

If there are fewer than 3 strains, raxml cannot build the tree. Besides, you can use the --relaxed_parameters? options to reduce the stringency.

Cheers,
Tin
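
The same count can be taken in Python, which also makes it easy to list the record names. This is a minimal sketch: fasta_record_names and enough_for_tree are hypothetical helpers, and the minimum of 3 comes from the statement above that raxml cannot build a tree from fewer than 3 strains.

```python
def fasta_record_names(lines):
    """Return the names found on FASTA header lines (lines starting with '>')."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def enough_for_tree(names, minimum=3):
    """True when at least `minimum` records are available for tree building."""
    return len(names) >= minimum


# Example:
# with open("s__Lactobacillus_johnsonii.fasta") as handle:
#     names = fasta_record_names(handle)
# print(len(names), enough_for_tree(names))
```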

On Fri, Oct 21, 2016 at 6:19 PM Johnny Li <ql1...@gmail.com> wrote:
forgot the link
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Lactobacillus_johnsonii/all_assembly_versions/GCF_001714745.1_ASM171474v1

On Fri, Oct 21, 2016 at 11:14 AM, Johnny Li <ql1...@gmail.com> wrote:

Hi Tin,

I ran strainphlan.py to create trees; see below. The script ran fine with no errors, but it didn't create the tree file (RAxML_bestTree.s__Eubacterium_siraeum.tree) mentioned in the tutorial. Instead, I got the following files: arguments.txt, s__Lactobacillus_johnsonii.fasta, s__Lactobacillus_johnsonii.info, s__Lactobacillus_johnsonii.marker_pos, and s__Lactobacillus_johnsonii.polymorphic. Do you know what I have done wrong?

In addition, here is where I got the genomic seq. from the NCBI refseq database (in gz, not bz2). Can you help?

thanks
John

strainphlan.py --ifn_samples marker/*.markers --ifn_markers marker_fasta_files/s__Lactobacillus_johnsonii_markers.fasta --ifn_ref_genomes GCF_L_johnsonii.fna.gz --output_dir trees/L_johnsonii --clades s__Lactobacillus_johnsonii --marker_in_clade 0.2 --nprocs_main 16

On Fri, Oct 21, 2016 at 2:11 AM, Duy Tin Truong <duytin...@gmail.com> wrote:
Hi John,

Blastn is required for adding the reference genomes, and muscle is for marker alignment. Besides, you can produce all trees without adding reference genomes by specifying "--clades all"; otherwise you have to run strainphlan for each species. To reduce the stringency, you can look at some suggestions in "--relaxed_parameters?".

Cheers,
Tin

On Thu, Oct 20, 2016 at 6:04 PM Johnny Li <ql1...@gmail.com> wrote:
Sorry I sent too soon.

Is blastn required for the muscle alignment? I looked at muscle, and it doesn't seem to require blastn. I wonder what blastn is for here?

thanks
John

$ strainphlan.py --ifn_samples *.markers --ifn_markers s__Salmonella_enterica_markers.fasta --ifn_ref_genomes GCF_000353585.1_S._enterica_Tennessee_CDC07-0191_cds_from_genomic.fna.gz --output_dir . --clades s__Salmonella_enterica --marker_in_clade 0.2
2016-10-20 10:41:08,843 | ERROR | __main__ | check_dependencies | 1529 | Cannot find blastn in the executable path!

On Thu, Oct 20, 2016 at 10:56 AM, Johnny Li <ql1...@gmail.com> wrote:
Hi again,

I was doing this step, but got the error for

On Thu, Oct 20, 2016 at 10:36 AM, Johnny Li <ql1...@gmail.com> wrote:

Hi Tin,

I have a question that is perhaps quite naive. If I identified 10 species in clades.txt, can I extract markers for all 10 species at once? In the tutorial, it was shown one at a time.

It seems to me that strainphlan does one species at a time for tree generation and viewing?

thanks
John

On Wed, Oct 19, 2016 at 10:48 AM, Duy Tin Truong <duytin...@gmail.com> wrote:
Hi John,

Yes, that is "--print_clades_only". In addition, you can change --marker_in_clade to 0.5.

Cheers,
Tin

On Wed, Oct 19, 2016 at 5:41 PM Johnny Li <ql1...@gmail.com> wrote:
Tin,

Thanks. I don't see a --print-clades option. There is one called --print_clades_only, which is probably not what you were referring to. What is your recommendation for the cutoff for the --marker_in_clade option? 0.8 does seem to be quite stringent.

I may have more questions when I get to building trees.

thanks
John

On Wed, Oct 19, 2016 at 9:59 AM, Duy Tin Truong <duytin...@gmail.com> wrote:
Hi John,

Running extract_markers.py is necessary only if you need to add reference genomes to your trees; otherwise you can skip it. You need to run that step for each clade to which you want to add reference genomes. Besides, the "--print_clades" option will only print the clades whose ratio of present markers is greater than 0.8 (this can be changed with --marker_in_clade), i.e. if a clade X has 200 markers and the number of markers present in a sample is 100, then the clade will not be printed. Other options that reduce the stringency are listed by strainphlan.py "-h".
Finally, the all_markers.fasta contains all the markers of the database. If you want to extract consensus markers for each sample, you have to open the "*.markers" files produced by sample2markers.py by using msgpack.load(open("the_sample.markers")).

Hope this helps.

Cheers,
Tin
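
The marker-ratio rule described above can be written as a one-line check. This is a sketch of the arithmetic only, not strainphlan's actual code: clade_is_printed is a hypothetical name, and the strict "greater than" comparison follows the wording above.

```python
def clade_is_printed(markers_present, markers_total, marker_in_clade=0.8):
    """True when the fraction of a clade's markers present in a sample
    is greater than the --marker_in_clade threshold (default 0.8)."""
    return markers_present / markers_total > marker_in_clade


# The example above: 100 of 200 markers present gives a ratio of 0.5.
# clade_is_printed(100, 200)       -> not printed at the default 0.8
# clade_is_printed(100, 200, 0.2)  -> printed once the threshold is lowered to 0.2
```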

On Wed, Oct 19, 2016 at 4:36 PM Johnny Li <ql1...@gmail.com> wrote:
Hi Tin,

Thank you so much for being so patient. It works now. I now have a question on extracting clade-specific markers.

After one identifies all clades in the samples with: strainphlan.py --ifn_samples *.markers --output_dir . --print_clades_only > clades.txt
if clades.txt has 100 species, does one need to run "extract_markers.py --mpa_pkl mpa_v20_m200.pkl --ifn_markers all_markers.fasta --clade s__Eubacterium_siraeum --ofn_markers s__Eubacterium_siraeum.markers.fasta" 100 times (replacing s__Eubacterium_siraeum with a different clade name each time)?

I guess one can use a for loop ...

BTW, I ran one of my samples, and it generated a clades.txt file containing the following. Is it normal to have no bacteria? If there are no bacteria, can I still go ahead and extract strain-specific markers? In other words, does all_markers.fasta contain only bacterial markers?

$ more clades.txt
s__Abelson_murine_leukemia_virus
s__Avian_endogenous_retrovirus_EAV_HP
s__Avian_myelocytomatosis_virus
s__Porcine_type_C_oncovirus
s__Saccharomyces_cerevisiae_killer_virus_M1

thanks
John
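
The for loop mentioned above could look like the following Python sketch, which builds one extract_markers.py command per clade listed in clades.txt. The flags and default file names are the ones quoted in the question; extract_marker_commands is a hypothetical helper, and the output naming (clade name plus ".markers.fasta") just mirrors the quoted example.

```python
def extract_marker_commands(clades, mpa_pkl="mpa_v20_m200.pkl",
                            all_markers="all_markers.fasta"):
    """Build one extract_markers.py argument list per non-empty clade name."""
    commands = []
    for clade in clades:
        clade = clade.strip()
        if not clade:
            continue  # skip blank lines in clades.txt
        commands.append([
            "extract_markers.py",
            "--mpa_pkl", mpa_pkl,
            "--ifn_markers", all_markers,
            "--clade", clade,
            "--ofn_markers", clade + ".markers.fasta",
        ])
    return commands


# Example:
# import subprocess
# with open("clades.txt") as handle:
#     for cmd in extract_marker_commands(handle):
#         subprocess.check_call(cmd)
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before launching all of them.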

On Sat, Oct 8, 2016 at 3:55 AM, Duy Tin Truong <duytin...@gmail.com> wrote:
Hi John,

It was included in the samtools 0.1.19:
https://sourceforge.net/projects/samtools/files/samtools/0.1.19/samtools-0.1.19.tar.bz2/download

Cheers,
Tin

Fan Li

Jun 12, 2019, 4:04:41 PM

to MetaPhlAn-users

I ran into this error just now and found that it was due to having bcftools version 1.8 installed. Fixed by manually installing the old 0.1.19 version of bcftools from Sourceforge and providing the --bcftools_exe argument.

This issue on the bcftools Github was helpful:
https://github.com/samtools/bcftools/issues/391
