Performance decrease for long alignments, even with the same site patterns

Leonardo de Oliveira Martins

Oct 23, 2019, 6:13:33 AM
to IQ-TREE

Hello!

I ran a small simulation to confirm that, under a simple model, the execution time should be approximately constant for a fixed number of SITE PATTERNS, irrespective of the sequence length. This holds up to a certain length, but for longer alignments the iq-tree execution time increases linearly with the alignment length (more details below). The only explanation I can find is that the log-likelihood becomes very small and so needs to be rescaled more often. However, from what I understand, rescaling is done at the internal nodes and matters when the number of taxa is very large (the pll library rescales by default), and the weights (pattern frequencies) are applied only in the last step of the log-likelihood calculation. What is going on? What am I forgetting?

sincerely,
Leo

More details on the simulation:

The alignment snp1.aln starts as a 32bp alignment over 31 taxa, and at every step of the simulation it doubles in length by concatenation with itself (goalign concat). I confirmed that the number of "distinct patterns" is 26 at every iteration:
for i in `seq 1 16`; do  mv snp1.aln o.aln; goalign concat -i o.aln o.aln > snp1.aln; echo -n $i "  " >> timing2.txt; iqtree -m HKY+G -s snp1.aln -redo | grep wall >> timing2.txt; done
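
The expectation behind this experiment can be illustrated with a minimal sketch. It assumes a toy three-taxon alignment (made up for illustration, not the snp1.aln data) and shows that concatenating an alignment with itself leaves the set of distinct site patterns unchanged and only doubles each pattern's weight. The function names are hypothetical and this is not how iq-tree or goalign implement it.

```python
from collections import Counter


def site_patterns(alignment):
    """Map each distinct alignment column (site pattern) to its frequency."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) yields one tuple of states per column.
    return Counter(zip(*alignment))


def concat(left, right):
    """Concatenate two alignments taxon by taxon (same taxon order assumed)."""
    return [a + b for a, b in zip(left, right)]


# Toy alignment, invented for this example.
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]

before = site_patterns(toy)
after = site_patterns(concat(toy, toy))

print(len(before), "patterns before,", len(after), "patterns after doubling")
for pattern, weight in before.items():
    print("".join(pattern), weight, "->", after[pattern])
```

A likelihood routine that works on patterns only needs one evaluation per distinct column, multiplied by its weight, so its cost should not depend on how often each column is repeated.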

These are the timings I observe:
1   Total wall-clock time used: 0.996 sec (0h:0m:0s)
2   Total wall-clock time used: 0.959 sec (0h:0m:0s)
3   Total wall-clock time used: 0.869 sec (0h:0m:0s)
4   Total wall-clock time used: 0.724 sec (0h:0m:0s)
5   Total wall-clock time used: 1.015 sec (0h:0m:1s)
6   Total wall-clock time used: 0.689 sec (0h:0m:0s)
7   Total wall-clock time used: 0.832 sec (0h:0m:0s)
8   Total wall-clock time used: 0.774 sec (0h:0m:0s)
9   Total wall-clock time used: 0.986 sec (0h:0m:0s)
10   Total wall-clock time used: 1.631 sec (0h:0m:1s)
11   Total wall-clock time used: 2.536 sec (0h:0m:2s)
12   Total wall-clock time used: 4.208 sec (0h:0m:4s)
13   Total wall-clock time used: 7.764 sec (0h:0m:7s)
14   Total wall-clock time used: 15.750 sec (0h:0m:15s)
15   Total wall-clock time used: 32.115 sec (0h:0m:32s)
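
As a rough check of the "linear in alignment length" claim, the sketch below computes the ratio between successive timings from the table above. It assumes that step i corresponds to an alignment of 32 * 2^i bp (the 32bp alignment is doubled once before the first run); a ratio close to 2 per doubling is what linear scaling would look like.

```python
# Wall-clock times in seconds, copied from the table above.
timings = {
    1: 0.996, 2: 0.959, 3: 0.869, 4: 0.724, 5: 1.015,
    6: 0.689, 7: 0.832, 8: 0.774, 9: 0.986, 10: 1.631,
    11: 2.536, 12: 4.208, 13: 7.764, 14: 15.750, 15: 32.115,
}


def alignment_length(step, start=32):
    """Alignment length in bp at a given step (assumes one doubling per step)."""
    return start * 2 ** step


# Ratio of each timing to the previous one; the length doubles every step.
ratios = {i: timings[i] / timings[i - 1] for i in range(2, 16)}

for step in sorted(ratios):
    print(f"step {step:2d}  {alignment_length(step):8d} bp  "
          f"{timings[step]:7.3f} s  ratio {ratios[step]:.2f}")
```

With these numbers the ratio stays near 1 for the early steps and approaches 2 for the last few, which is consistent with a fixed overhead dominating short alignments and a length-proportional cost dominating long ones.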

And this is a figure showing the timings (up to ~10kbp the run time behaves as expected, and beyond that it grows in proportion to the alignment length). The same behaviour happens without the Gamma model.


iqtree_timing.png


Leonardo de Oliveira Martins

Oct 23, 2019, 9:20:07 AM
to iqt...@googlegroups.com
To (partially) answer my own question: PLL's parsimony calculation (both the initial tree and the generation of 98 parsimony trees) was responsible for the slowdown. When I avoid it ("-t PARS -ninit 2" for example), the timing is constant, as expected. Why PLL is not using the site frequency info is another story...

cheers
Leo
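
To make the difference concrete, here is a minimal sketch of Fitch parsimony scoring on a fixed tree, computed once per site and once per distinct pattern with weights. It is only an illustration under assumed names and a made-up toy alignment; it is not PLL's vectorized implementation.

```python
from collections import Counter


def fitch_score(tree, column):
    """Fitch parsimony score of one column on a rooted binary tree.

    tree: nested 2-tuples whose leaves are integer taxon indices.
    column: tuple with one character state per taxon.
    """
    def visit(node):
        if isinstance(node, int):
            return {column[node]}, 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        common = left_states & right_states
        if common:
            return common, left_cost + right_cost
        # Disjoint state sets imply one extra change at this node.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


def score_per_site(tree, alignment):
    """Visit every column: cost grows with the alignment length."""
    return sum(fitch_score(tree, col) for col in zip(*alignment))


def score_per_pattern(tree, alignment):
    """Visit each distinct column once and weight it by its frequency."""
    weights = Counter(zip(*alignment))
    return sum(w * fitch_score(tree, col) for col, w in weights.items())


# Toy data, invented for this example.
toy_tree = ((0, 1), 2)
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
doubled = [seq + seq for seq in toy]

print(score_per_site(toy_tree, doubled), score_per_pattern(toy_tree, doubled))
```

Both functions return the same score, but the per-site version does work proportional to the number of columns, which matches the linear growth seen in the timings once the parsimony step dominates.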

--
You received this message because you are subscribed to the Google Groups "IQ-TREE" group.
To unsubscribe from this group and stop receiving emails from it, send an email to iqtree+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/iqtree/b9e99ace-f446-43c7-b1f0-0a28e2963a1a%40googlegroups.com.


--

Minh Bui

Oct 23, 2019, 6:16:20 PM
to IQ-TREE, Alexandros Stamatakis
Hi Leo, CC Alexis,

Thanks for looking into this. Alexis is better placed to comment on this, but I vaguely remember that PLL indeed does not use site frequencies, and for a reason. In your extreme case, the parsimony step might dominate the run time, leading to this behaviour.

Cheers,
Minh

IQ-TREE

Oct 24, 2019, 1:31:42 AM
to IQ-TREE
As far as I remember (I implemented this a long time ago), the site frequencies are not used because they complicate the vectorization of the parsimony code.

Alexis