Error related to input tree labels?


Benjamin lin

Oct 13, 2017, 8:24:54 AM
to raxml
Dear Alexis,

I have encountered a RAxML bug that is probably related to long tree labels.
I am using a tree with ~12000 leaves and the corresponding 1.5 kb multiple alignment.

If I use the modes -f v (EPA) or -f e (optimization of an input tree), I get the following output and a core dump error (last lines):

IMPORTANT WARNING
Found 34 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.
An alignment file with sequence duplicates removed has already
been printed to file
../mmod_LTPs128_SSU_aligned_99.5_trim.hmm_1000.aln.fasta.reduced
Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs"
Found a total of 12953 taxa in tree file ../mmmod_LTPs128_SSU_tree.newick
This is RAxML version 8.2.9 released by Alexandros Stamatakis on July 20 2016.
With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Sarah Lutteropp   (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)
Charlie Taylor    (UF)
Alignment has 1979 distinct alignment patterns
Proportion of gaps and completely undetermined characters in this alignment: 32.50%
RAxML likelihood-based placement algorithm
Using 1 distinct models/data partitions with joint branch length optimization
All free model parameters will be estimated by RAxML
ML estimate of 25 per site rate categories


Partition: 0
Alignment Patterns: 1979
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR


RAxML was called as follows:


raxmlHPC-SSE3 -f v -G 0.015625 -m GTRCAT -n test_1000 -s ../1000.aln.fasta -t ../tree.newick


raxmlHPC-SSE3: treeIO.c:836: treeFindTipByLabelString: Assertion `! tr->nodep[lookup]->back' failed.
Aborted (core dumped)


The assertion fails after RAxML has parsed the alignment (where a few identical sequences are detected) and the tree.
It occurs with RAxML 8.2.9 through 8.2.11 (I didn't try earlier versions).
The bug doesn't appear in the rapid bootstrapping + ML inference mode (-f a).
So the error might be related to the -t option (used in both the -f v and -f e modes).

I tested the integrity of the tree and the alignment with other software, and everything looks fine.
Tree labels look as follows:
- Erwinia_oleae__GU810925__Enterobacteriaceae
- Erwinia_gerundensis__FJ611848__Enterobacteriaceae
- Pantoea_intestinalis__KP326384__Enterobacteriaceae
- Pantoea_theicola__AB907776__Enterobacteriaceae
- Yersinia_enterocolitica_subsp_enterocolitica__AF366378__Enterobacteriaceae
- Yersinia_enterocolitica_subsp_palearctica__FR729477__Enterobacteriaceae

I can send you the files if you want to reproduce the error.
In the meantime I will look into the treeFindTipByLabelString function to try to understand the issue.

Thanks,

Alexandros Stamatakis

Oct 14, 2017, 8:24:17 AM
to ra...@googlegroups.com
Yes, please send the files to my personal email.

alexis


--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

Benjamin lin

Nov 14, 2017, 11:07:24 AM
to raxml
Hi,

I think I found the cause of this core dump.
In the phylogenetic placement (-f v) and tree optimisation (-f e) modes, RAxML seems to reduce the input alignment when it shouldn't.
This tree indeed contains around 20 identical sequences, and RAxML appears to use the .reduced version of the alignment.
Some leaves of the tree are consequently missing from the alignment, which leads to the error:

> raxmlHPC-SSE3: treeIO.c:836: treeFindTipByLabelString: Assertion `! tr->nodep[lookup]->back' failed.
> Aborted (core dumped)

After adding --no-seq-check, the issue disappears.
Maybe --no-seq-check should be activated automatically in these modes.
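
For anyone who wants to check for this kind of mismatch before running -f v or -f e, here is a minimal Python sketch that compares the tip labels of a Newick tree with the sequence names of a FASTA alignment (for example the .reduced file). It is not part of RAxML; the function names are made up for illustration, and the parser assumes unquoted labels without spaces or bracketed comments, like the labels shown earlier in this thread.

```python
import re


def newick_tip_labels(newick):
    """Return the set of tip labels in a Newick string.

    Assumption: labels are unquoted and contain no spaces or [comments].
    A tip label directly follows "(" or ","; internal node labels follow ")"
    and are therefore not matched.
    """
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def fasta_names(fasta_text):
    """Return the set of sequence names (first token of each '>' header)."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def compare_labels(newick, fasta_text):
    """Return (labels only in the tree, names only in the alignment), sorted."""
    tips = newick_tip_labels(newick)
    names = fasta_names(fasta_text)
    return sorted(tips - names), sorted(names - tips)


# Toy example: tip D is in the tree but has no sequence in the alignment.
tree = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"
alignment = ">A\nACGT\n>B\nACGT\n>C\nACGT\n"
only_in_tree, only_in_alignment = compare_labels(tree, alignment)
print("only in tree:", only_in_tree)
print("only in alignment:", only_in_alignment)
```

With real files, the two strings would be read from the tree file and the alignment file. Any label reported as "only in tree" is a leaf that has no matching sequence in the alignment.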

I hope this is helpful,

benjamin

Benjamin lin

Nov 14, 2017, 11:14:40 AM
to raxml
OK, unfortunately I wrote that too quickly.
The optimisation does start, but after some time I get a new error:


standard-RAxML-8.2.11/raxmlHPC-PTHREADS-SSE3 -f e -m GTRGAMMA -s mod_LTPs128_SSU_aligned_99.5_trim.phylip -t mmmod_LTPs128_SSU_tree.newick -n optimization_LTPs128_SSU --no-seq-check

Likelihood problem in model optimization l1: -inf l2: -2309304.9874063567258417606353759765625000000000 tolerance: 0.0000023093049874063564833547284455006476

What could cause this infinite value?

Best regards,

benjamin

Alexandros Stamatakis

Nov 15, 2017, 4:49:55 AM
to ra...@googlegroups.com
Dear Benjamin,

Thanks for the update. The problem you are observing is due to numerical
issues with the Gamma model of rate heterogeneity that are described
here:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-470

This known issue has been solved in RAxML-NG; the only problem is that
RAxML-NG does not yet interface with EPA-NG, the better, faster, more
scalable redesign of EPA.

So there are basically three solutions:

1. Wait for the integration of RAxML-NG with EPA-NG (it might become
available by the end of the year).
2. Reduce the number of taxa in your reference tree.
3. Try using the GTRCAT model, which should be less susceptible to those
numerical issues.
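
As a generic illustration of how a log-likelihood can end up as -inf (this is not RAxML's code, and it does not reproduce the specific Gamma-model mechanism discussed in the paper): multiplying many small probabilities underflows to 0.0 in double precision, whereas summing their logarithms stays finite. The function names and the toy values are assumptions chosen for the example.

```python
import math


def naive_log_likelihood(probs):
    """Multiply all probabilities first, then take the log.

    The running product can underflow to 0.0, in which case the
    log-likelihood is reported as -inf.
    """
    product = 1.0
    for p in probs:
        product *= p
    # math.log(0.0) raises ValueError, so the underflow case is handled here.
    return math.log(product) if product > 0.0 else float("-inf")


def stable_log_likelihood(probs):
    """Sum the logs of the individual probabilities instead."""
    return sum(math.log(p) for p in probs)


# Toy values (assumed): 100 factors of 1e-5 give a product of 1e-500,
# which is far below the smallest positive double.
probs = [1e-5] * 100
print("naive: ", naive_log_likelihood(probs))
print("stable:", stable_log_likelihood(probs))
```

The first value printed is -inf and the second is finite, even though both functions describe the same quantity mathematically.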

Hope this helps,

Alexis
