Didn't find any reference sequences in given alignment file.

lpi...@berkeley.edu

Jan 12, 2018, 6:50:24 PM
to pplacer users
Hello,

I have used pplacer successfully on other datasets, but I am running into a problem with this one. I built the reference package from an MSA of 37,119 reference sequences, and I used FastTreeDbl to produce the tree and the tree statistics file. 12S_taxids.txt lists every taxid in the MSA, and 12S_seq_info.csv has two columns: seqname, then tax_id. The seqname values are the same identifiers used in the MSA file. I had to pass --no-reroot because I got a stack overflow error without it. I also had to convert combo.sto to combo.fa because I received a Tokenizer error when using the .sto file. This is what I have done so far:

FastTreeDbl -nt -gtr -log 12Stree.log < 12S_MSA.fa > 12Stree.tre

taxit taxtable ncbi_taxonomy.db -f 12S_taxids.txt -i 12S_seq_info.csv -o 12S_taxonomyfromtaxids.csv

taxit create -l 12S -P 12S.refpkg --taxonomy 12S_taxonomyfromtaxids.csv --aln-fasta 12S_MSA.fa --seq-info 12S_seq_info.csv --tree-stats 12Stree.log --tree-file 12Stree.tre --aln-sto 12S_MSA.sto --no-reroot

seqmagick convert --alphabet dna --input-format fasta --output-format stockholm 12S_MSA.fa 12S_MSA.sto

hmmbuild refseqs.hmm 12S_MSA.sto

hmmalign -o combo.sto --mapali 12S_MSA.sto refseqs.hmm 12S_reads.fasta

seqmagick convert --alphabet dna --input-format stockholm --output-format fasta combo.sto combo.fa

pplacer --mrca-class -c 12S.refpkg/ combo.fa

Running pplacer v1.1.alpha19-0-g807f6f3 analysis on combo.fa...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
WARNING: your tree has zero pendant branch lengths. This can lead to zero likelihood values which will keep you from being able to place sequences. You can remove identical sequences with seqmagick.
query 0000000|GQ279545.1 is not the same length as the reference alignment (got 1098; expected 874)
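
The zero pendant branch length warning in the log above mentions identical sequences. As a rough way to see which reference rows are affected, here is a minimal Python sketch that groups names sharing an identical aligned row. The function names are hypothetical, the comparison is on the rows exactly as written in the FASTA file (gaps included, case ignored), and it only reports duplicates; it does not remove them.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    The name is the first whitespace-delimited token of the header.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(records):
    """Return lists of names whose sequences are identical (case-insensitive)."""
    by_sequence = defaultdict(list)
    for name, sequence in records:
        by_sequence[sequence.upper()].append(name)
    return [names for names in by_sequence.values() if len(names) > 1]


# Example use on the reference alignment named in this thread:
# with open("12S_MSA.fa") as handle:
#     for group in duplicate_groups(parse_fasta(handle)):
#         print("\t".join(group))
```

Each printed line would hold one set of names that share the same row, which is the kind of input that can give zero-length pendant branches.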


My combo.fa and combo.sto files are very large (3.5GB and 8GB, respectively) so I am not attaching them.

Any advice is appreciated. Many thanks in advance!

-Lenore

Erick Matsen

Jan 13, 2018, 1:09:14 PM
to pplace...@googlegroups.com
My guess is that something happened to the reference sequence names through your process. Make triple sure that the names in your tree and the reference names that appear in combo.fa are identical.

You can use nw_labels from the newick utilities to dump out the taxon names, or one of many other packages.
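
One way to run that comparison, sketched in Python under a few assumptions: the tree's leaf labels have already been written one per line to a file (for example from nw_labels; its -I option should leave out inner-node labels such as support values), and a sequence's name is the first whitespace-delimited token of its FASTA header. The function names and the tree_labels.txt file name are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names (first token of each '>' header) from FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def label_set(lines):
    """Collect non-empty labels from a one-label-per-line listing."""
    return {line.strip() for line in lines if line.strip()}


def compare_names(tree_labels, alignment_names):
    """Return (shared, missing): tree labels found in, and absent from, the alignment."""
    shared = sorted(tree_labels & alignment_names)
    missing = sorted(tree_labels - alignment_names)
    return shared, missing


# Example use (tree_labels.txt is a hypothetical file holding the dumped labels):
# with open("tree_labels.txt") as labels, open("combo.fa") as alignment:
#     shared, missing = compare_names(label_set(labels), fasta_names(alignment))
# print(len(shared), "tree labels found in combo.fa;", len(missing), "missing")
# print(missing[:10])
```

If no tree labels are shared, printing a few of the missing labels next to a few headers from combo.fa should show how the names differ, for instance an added prefix or a changed character.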

--
Frederick "Erick" Matsen, Associate Member
Fred Hutchinson Cancer Research Center
http://matsen.fredhutch.org/

lpi...@berkeley.edu

Jan 16, 2018, 4:54:08 PM
to pplacer users
When I checked with nw_labels, it looked like FastTreeDbl was not writing the correct labels. I switched to RAxML to build the tree, and nw_labels now shows the correct labels, but I am still getting the same error:


Running pplacer v1.1.alpha19-0-g807f6f3 analysis on combo.fa...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
query 0000000|KY815244.1 is not the same length as the reference alignment (got 920; expected 314)
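
The last log line compares a row length in combo.fa (920) with the reference alignment's length (314). A minimal Python sketch for tallying row lengths in a FASTA alignment follows, so the same count can be run on combo.fa and on the reference alignment in the refpkg. The function names are hypothetical, and the length counted is every character on the sequence lines, gaps included.

```python
from collections import Counter


def aligned_lengths(lines):
    """Yield (name, row_length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def length_counts(lines):
    """Count how many rows have each length."""
    return Counter(length for _, length in aligned_lengths(lines))


# Example use; the file is streamed line by line, so a multi-GB file is not
# loaded into memory at once:
# with open("combo.fa") as handle:
#     print(length_counts(handle))
```

A single length in combo.fa that differs from the one in the reference alignment would match what the log reports.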