BUG: Unable to process alignment by training of PMSF model

Alexey Neverov

Jun 11, 2020, 4:21:24 AM
to IQ-TREE
Hi all!
I tried to train the PMSF model for my alignments and obtained the same error messages for all files:

iqtree -nt AUTO -s cox2.G10.fas -m LG+C20+F+G -ft MEGAtree_mtREV_F.newick -n 0

ERROR: phylotreepars.cpp:63: void PhyloTree::computePartialParsimonyFast(PhyloNeighbor *, PhyloNode *): Assertion `!aln->ordered_pattern.empty()' failed.
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: atp6.G10.fas.log

The alignment and the log files are attached.
I would very much appreciate any assistance.

Yours sincerely, Alexey Neverov
PMSF_error.zip

Alexey Neverov

Jun 11, 2020, 7:52:00 AM
to IQ-TREE
Sorry,
I forgot to attach the guide tree file.

On Thursday, June 11, 2020 at 11:21:24 UTC+3, Alexey Neverov wrote:
MEGAtree_mtREV_F.newick

Minh Bui

Jun 12, 2020, 5:58:15 PM
to IQ-TREE, Alexey Neverov
Hi Alexey,

Which IQ-TREE version did you use? 

This looks like an old bug that was fixed some time ago. Please try the latest version, e.g. the stable 1.6.12 or the latest v2. If it still crashes, let me know.

Minh

-- 
You received this message because you are subscribed to the Google Groups "IQ-TREE" group.
To unsubscribe from this group and stop receiving emails from it, send an email to iqtree+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/iqtree/2b7ec7d1-1685-4e66-a8b6-8f2bede8ad4eo%40googlegroups.com.
<MEGAtree_mtREV_F.newick>

Minh Bui

Jun 12, 2020, 6:18:36 PM
to IQ-TREE, Alexey Neverov
I just saw that you used v2.0.3. So that’s OK.

However, what concerns me is that your alignment is very short:

Alignment has 4350 sequences with 225 columns, 225 distinct patterns
224 parsimony-informative, 1 singleton sites, 0 constant sites

In that case, you shouldn’t use any complex models. I’d be very cautious even with a simple model, because there is very little phylogenetic information in this alignment. So either add [many] more genes or reduce the number of sequences before phylogenetic inference.

Sorry about that,
Minh

Alexey Neverov

Jun 28, 2020, 2:43:42 PM
to IQ-TREE
The problem is somewhere in the '-nt AUTO' option. I successfully processed this dataset when I used a fixed number of cores by specifying the '-nt 16' option. The results of the analysis are attached.
You mentioned that the number of sites in the alignment might not be sufficient to reliably estimate model parameters. My alignment contains 4350 sequences. How can I estimate the reliability of the obtained results?
The log file contains the warning:
'WARNING: The mixture model might be overfitting because some mixture weights are estimated close to zero'. In total, 6 out of 20 weights are zero. Is this really a problem? Can I deal with it by reducing the number of classes in the model?

Thank you for your reply.
Yours, Alexey.


atp6.results.zip

Rob Lanfear

Jun 30, 2020, 7:56:46 PM
to IQ-TREE
Hi Alexey,

> You mentioned that the number of sites in the alignment might not be sufficient to reliably estimate model parameters. My alignment contains 4350 sequences. How can I estimate the reliability of the obtained results?

You could simulate data that resemble the data you have (i.e. under the PMSF-type models, and alignments of the same length, number of rows, site rates, and information content that you have), and then test how well you can recover trees when using PMSF models (as you have for your empirical data) when compared to other, simpler models. This is a standard procedure called a parametric bootstrap (just to help you find things via google!).
 
> The log file contains the warning:
> 'WARNING: The mixture model might be overfitting because some mixture weights are estimated close to zero'. In total, 6 out of 20 weights are zero. Is this really a problem?

Yes.
 
> Can I deal with it by reducing the number of classes in the model?

Potentially. But as Minh said, 225 informative sites is very very little information. At the very least I would recommend comparing a large range of models (focussing particularly on the most simple models) using standard approaches like the AICc, BIC, etc. This will give you a good feeling for whether you can realistically justify parameter-rich models. 
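As a rough illustration of the comparison suggested above, the criteria can be computed from three quantities that a tree inference program typically reports: the maximised log-likelihood, the number of free parameters k, and the sample size n. This is a minimal sketch, assuming n is taken as the number of alignment sites (225 here). The log-likelihoods and parameter counts below are hypothetical placeholders, not values from this dataset.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return (AIC, AICc, BIC) for a model with k free parameters and sample size n."""
    aic = 2 * k - 2 * log_likelihood
    if n - k - 1 <= 0:
        # The small-sample correction divides by n - k - 1, so it needs n > k + 1.
        raise ValueError("AICc is undefined unless n > k + 1")
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    return aic, aicc, bic

# Hypothetical placeholder values, NOT taken from the thread's dataset.
simple_model = information_criteria(log_likelihood=-1000.0, k=10, n=225)
rich_model = information_criteria(log_likelihood=-990.0, k=40, n=225)

for label, (aic, aicc, bic) in (("simple", simple_model), ("rich", rich_model)):
    print(f"{label}: AIC={aic:.1f} AICc={aicc:.1f} BIC={bic:.1f}")
```

Lower values are better. With these placeholder numbers the richer model gains 10 log-likelihood units but pays for 30 extra parameters, so all three criteria prefer the simpler model. Note that when k approaches n, the AICc correction grows very large or becomes undefined, which is itself a sign that the data cannot support the model.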
 
Rob



--
Rob Lanfear
Division of Ecology and Evolution,
Research School of Biology,
The Australian National University,
Canberra

Heiko Schmidt

Jul 1, 2020, 5:30:02 AM
to IQ-TREE Forum
Dear Alexey,

Two comments in addition to those of Minh and Rob:

> On 28 Jun 2020, at 20:43, Alexey Neverov <neva...@gmail.com> wrote:
>
> The problem is somewhere in the '-nt AUTO' option. I successfully processed this dataset when I used a fixed number of cores by specifying the '-nt 16' option. The results of the analysis are attached.

With only 225 sites in the alignment, parallelisation will not work efficiently (that is, it will not save time) either.

Typically, batches of sites are distributed to different processors. This costs some extra time, e.g., for distributing the data and collecting the results. Hence, the batches have to be sufficiently large for the time saved to exceed this overhead.

We have seen people enforce high numbers of threads (cores) and then end up with longer running times than when using only 1 core :(
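The trade-off described above can be sketched with a toy cost model. This is only an illustration: the function `runtime` and its parameters `cost_per_site` and `overhead_per_thread` are hypothetical, the numbers are arbitrary units, and nothing here reflects how IQ-TREE actually schedules or measures its work.

```python
def runtime(n_sites, threads, cost_per_site=1.0, overhead_per_thread=30.0):
    """Toy estimate: site work is split evenly, each extra thread adds a fixed overhead."""
    compute = cost_per_site * n_sites / threads
    overhead = overhead_per_thread * (threads - 1)
    return compute + overhead

# Compare a short alignment (225 sites, as in this thread) with a long one.
for n_sites in (225, 100_000):
    for threads in (1, 2, 4, 16):
        print(n_sites, threads, runtime(n_sites, threads))
```

Under these made-up costs, 16 threads are slower than 1 thread for 225 sites, while the same 16 threads give a large speed-up for 100,000 sites, because only there is each batch large enough to outweigh the overhead.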

> You mentioned that the number of sites in the alignment might not be sufficient to reliably estimate model parameters. My alignment contains 4350 sequences. How can I estimate the reliability of the obtained results?
> The log file contains the warning:

Just one comment to think about. Assume all your sites were informative for resolving parts of the tree. Furthermore, assume that each site can resolve one split (= branch) in the tree (something you typically cannot achieve with real data). Then your dataset would only be able to resolve 225 out of the 4347 internal branches in your final tree. All other branches would be more or less randomly resolved and might differ from run to run, because there is no information in the data.
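The arithmetic behind this argument can be written out in a few lines. A fully resolved unrooted tree with n taxa has n - 3 internal branches, which gives 4347 for 4350 sequences. The sketch below keeps the deliberately optimistic assumption from the paragraph above that every site resolves exactly one split; the function name `internal_branches` is just for illustration.

```python
def internal_branches(n_taxa):
    """Number of internal branches (splits) in a fully resolved unrooted tree."""
    return n_taxa - 3

n_taxa, n_sites = 4350, 225
splits = internal_branches(n_taxa)

# Optimistic upper bound: one resolved split per alignment site.
resolvable_fraction = n_sites / splits

print(splits)                        # 4347
print(f"{resolvable_fraction:.1%}")  # 5.2%
```

So even in the best case, roughly 95% of the internal branches would have no site supporting them.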

The only way to resolve a higher percentage of branches reliably is either to add more sequence data to each taxon, e.g. more genes, or to reduce the number of sequences. The first thing to do here is typically to remove identical sequences. IQ-TREE already does this automatically and re-adds them at the end, because it is clear in what region of the tree they would end up; however, there is no way to determine their branching order in the tree.

And as stressed by the others: with such short sequences you should not use complex models. You do not have enough information in the data for this.
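To check how many sequences in an alignment are exact duplicates before running an analysis, identical sequences can be grouped with a few lines of code. This is a minimal sketch of the general idea only, not IQ-TREE's implementation. The function `collapse_identical` and the toy alignment are hypothetical, and the alignment is assumed to be already loaded as a dictionary mapping taxon names to aligned sequences.

```python
from collections import defaultdict

def collapse_identical(alignment):
    """Group taxon names by sequence; the first name in each group is the representative."""
    groups = defaultdict(list)
    for name, seq in alignment.items():
        groups[seq].append(name)
    # One representative per distinct sequence.
    unique = {names[0]: seq for seq, names in groups.items()}
    # Representative -> names of the identical sequences that were set aside.
    duplicates = {names[0]: names[1:] for names in groups.values() if len(names) > 1}
    return unique, duplicates

# Hypothetical toy alignment for illustration.
toy = {"t1": "MKV", "t2": "MKV", "t3": "MRV", "t4": "MKV"}
unique, duplicates = collapse_identical(toy)
print(unique)      # {'t1': 'MKV', 't3': 'MRV'}
print(duplicates)  # {'t1': ['t2', 't4']}
```

The taxa listed in `duplicates` can only be placed next to their representative; their order relative to each other cannot be inferred from the data.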

Best regards,
Heiko

-----------------------------------------------------------------------------
Heiko Schmidt
Center for Integrative Bioinformatics Vienna (CIBIV)
University of Vienna / Max Perutz Labs
http://www.cibiv.at/
-----------------------------------------------------------------------------
