Pplacer issues with zero likelihood and differing query lengths

Rowan Softley

Dec 20, 2023, 2:53:38 PM
to pplacer users
Hi,

I'm trying to use pplacer to phylogenetically place V1-V2 16S rRNA fragments into a tree of full-length (or nearly full-length) 16S rRNA genes. The full-length reference genes were identified in genomic bins following WGS, and the V1-V2 query sequences are ASVs from a QIIME2/DADA2 analysis.

I am using aligned and masked sequences from QIIME2 (MAFFT) and a tree built from the aligned reference sequences with FastTree. I then created a reference package using taxit with the aligned sequences, tree, and tree stats:

fasttree -gamma -nt -gtr -log reftree.txt ./aligned_pacbio_illum_inoc.fasta > reftree.nwk

taxit create -l 16s_rRNA -P inoc.refpkg \
    --aln-fasta aligned_pacbio_illum_inoc.fasta \
    --tree-stats reftree.txt \
    --tree-file reftree.nwk

When I try to place the query sequences into the tree, I get one of two errors, depending on whether I include the reference sequences in the query alignment:

$ pplacer -c inoc.refpkg ./query_ref_aligned_seq/query_ref_aligned.fasta
Running pplacer v1.1.alpha19-0-g807f6f3 analysis on ./query_ref_aligned_seq/query_ref_aligned.fasta...
Found reference sequences in given alignment file. Using those for reference alignment.
WARNING: your tree has zero pendant branch lengths. This can lead to zero likelihood values which will keep you from being able to place sequences. You can remove identical sequences with seqmagick.
Pre-masking sequences... sequence length cut from 1682 to 394.
Determining figs... figs disabled.
Allocating memory for internal nodes... done.
Optimizing site categories... 1 Uncaught exception: Failure("Site 199 has zero likelihood.")
Fatal error: exception Failure("Site 199 has zero likelihood.") 

Or with only aligned query sequences:

$ pplacer -c inoc.refpkg ./query_only/query_aligned.fasta
Running pplacer v1.1.alpha19-0-g807f6f3 analysis on ./query_only/query_aligned.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
WARNING: your tree has zero pendant branch lengths. This can lead to zero likelihood values which will keep you from being able to place sequences. You can remove identical sequences with seqmagick.
query 0d247c8f8647fe246d5199122431758b is not the same length as the reference alignment (got 376; expected 1614)
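
The message above reports a query of 376 columns against a reference alignment of 1614, so the query-only file was apparently not aligned to the same columns as the alignment stored in the refpkg. A minimal sketch of how the aligned lengths in two FASTA files could be tallied and compared before running pplacer is shown here. The helper names (`read_fasta`, `length_report`) and the toy sequences are illustrative assumptions, not part of pplacer, taxit, or QIIME2.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_report(handle):
    """Count how many sequences have each aligned length (gaps included)."""
    return Counter(len(seq) for _, seq in read_fasta(handle))


# Toy data standing in for the real reference and query alignments.
reference = StringIO(">ref1\nACGT--ACGT\n>ref2\nACGTTTACGT\n")
query = StringIO(">asv1\nACGTAC\n>asv2\nAC-TAC\n")

ref_lengths = length_report(reference)
query_lengths = length_report(query)

# Compatible only if each file has a single aligned length and they agree.
compatible = len(ref_lengths) == 1 and set(ref_lengths) == set(query_lengths)

print("reference lengths:", dict(ref_lengths))
print("query lengths:", dict(query_lengths))
print("compatible:", compatible)
```

On real data the two `StringIO` objects would be replaced with `open(...)` on the reference alignment (`aligned_pacbio_illum_inoc.fasta`) and on the query alignment. More than one key in either report would mean that file is not a proper alignment on its own.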


I have removed any duplicate sequences and still encounter the same issues - any suggestions?
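
Because the zero-pendant-branch warning persists after removing duplicates, it may help to see which tips in the tree actually have a zero branch length. The following is a rough sketch, assuming an unquoted Newick string such as FastTree writes; the function name `zero_length_leaves` and the toy tree are hypothetical, and quoted labels containing special characters are not handled.

```python
import re

# A leaf is a label that directly follows "(" or ",", then ":" and a length.
# Internal nodes follow ")" and are therefore skipped, including
# support-value labels such as ")0.95:0.03".
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")


def zero_length_leaves(newick, tol=0.0):
    """Return names of leaves whose pendant branch length is <= tol."""
    return [name for name, length in LEAF.findall(newick) if float(length) <= tol]


# Toy tree standing in for the real reference tree.
toy_tree = "((A:0.0,B:0.12)0.95:0.03,(C:0.2,D:0.00000)0.80:0.01,E:0.3);"
print(zero_length_leaves(toy_tree))
```

For the tree in this post, the call would be something like `zero_length_leaves(open("reftree.nwk").read())`. Any names it returns are the tips behind the warning, which could then be compared against the alignment to see whether they are identical over the columns that were used to build the tree.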

Thanks,
R

 