Missing taxa error in the "per site log Likelihoods" computation (-f G)

Yan Wang

Jun 5, 2017, 6:15:58 PM
to raxml
Dear RAxML community,

I encountered a problem when using the "per site log Likelihoods" computation (-f G) in RAxML v8.2.8. I am testing two topologies on genome-wide markers that allow gaps. When I analyze individual markers, RAxML keeps reporting an error and refuses to run whenever a taxon is missing the entire marker: "ERROR: Sequence ABC consists entirely of undetermined values which will be treated as missing data...ERROR: Found X sequences that consist entirely of undetermined values, exiting..."

I made it work by adding a universal starting site, and the results seem fine. Is there any way to avoid this error and allow missing taxa with the -f G option, as other options do? Any advice is appreciated.

Best regards,
Yan Wang, PhD 
University of California, Riverside 
1207D Genomics Building 
Riverside, CA 92521
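
The error quoted above fires when a sequence has no determined character at all. A minimal sketch for finding such taxa in a marker alignment before running RAxML, so you can see in advance which markers will trigger the error. It assumes a sequential, relaxed PHYLIP file, and it assumes that for DNA the characters `- ? N O X` count as "undetermined"; both are assumptions to check against your data. The function names are hypothetical helpers, not part of RAxML.

```python
"""Pre-screen a per-marker alignment for taxa with no data (hypothetical helper)."""

import sys

# Assumption: for DNA, gaps and these ambiguity codes count as "undetermined".
UNDETERMINED = set("-?NOXnox")


def read_relaxed_phylip(text: str) -> dict[str, str]:
    """Parse sequential relaxed PHYLIP: a header '<ntaxa> <nsites>' followed by
    one '<name> <sequence>' line per taxon (the sequence may contain spaces)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntaxa, nsites = (int(x) for x in lines[0].split()[:2])
    seqs = {}
    for line in lines[1 : ntaxa + 1]:
        name, *chunks = line.split()
        seq = "".join(chunks)
        if len(seq) != nsites:
            raise ValueError(f"{name}: expected {nsites} sites, got {len(seq)}")
        seqs[name] = seq
    return seqs


def all_undetermined(seqs: dict[str, str]) -> list[str]:
    """Names of taxa whose sequence contains only undetermined characters."""
    return [name for name, seq in seqs.items() if set(seq) <= UNDETERMINED]


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        missing = all_undetermined(read_relaxed_phylip(handle.read()))
    print(f"{len(missing)} taxa with no data: {', '.join(missing) or 'none'}")
```

Running it once per marker file lists the markers (and taxa) that would need the padding workaround or another way around the check.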

Alexandros Stamatakis

Jun 7, 2017, 11:25:08 PM
to ra...@googlegroups.com
Dear Yan,

> I encountered one problem when using the "per site log Likelihoods"
> computation (-f G) in RAxML v8.2.8. I am testing two topologies on
> genome-wide markers that allow gaps. When I compute individual markers,
> the RAxML keeps reporting an error and refuse to run "ERROR: Sequence
> ABC consists entirely of undetermined values which will be treated as
> missing data...ERROR: Found X sequences that consist entirely of
> undetermined values, exiting..." when it detects the taxa missing this
> entire marker.
>
> I made it work by adding a universal starting site and the result seems
> fine. May I ask is there any way to avoid such error and allow missing
> taxa in the (-f G) option like other options do?

Doesn't --no-seq-check work?
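
For scripting this over many markers, a minimal sketch that assembles a RAxML v8 invocation for -f G with the suggested flag. The binary name (`raxmlHPC`), the model (`GTRGAMMA`), the file names, and the helper name are assumptions for illustration; use the binary and model that match your own installation and analysis.

```python
import shlex


def raxml_per_site_cmd(alignment, trees, run_name, model="GTRGAMMA",
                       binary="raxmlHPC"):
    """Hypothetical helper: argv for per-site log likelihoods (-f G) on the
    fixed topologies in `trees`, skipping RAxML's sequence check."""
    return [
        binary,
        "-f", "G",          # per-site log likelihoods for the trees given via -z
        "-s", alignment,    # per-marker alignment (may contain all-gap taxa)
        "-z", trees,        # file holding the fixed topologies to evaluate
        "-m", model,
        "-n", run_name,     # suffix for the output file names
        "--no-seq-check",   # flag suggested in this thread
    ]


if __name__ == "__main__":
    cmd = raxml_per_site_cmd("marker01.phy", "topologies.nwk", "marker01")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to execute it
```

Building the argument list instead of a shell string avoids quoting problems when alignment file names contain spaces.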

However, building trees on MSAs containing completely undetermined
sequences is something I would absolutely not recommend, since the
phylogenetic position of those sequences will be completely random and
will distort your signal.

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org
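
Why the position of a completely undetermined sequence is arbitrary can be illustrated with a small worked example. The sketch below is our own illustration, not RAxML code: it assumes a single DNA site, the JC69 model, and made-up branch lengths. Under Felsenstein's pruning algorithm a leaf with no data has an all-ones likelihood vector; because every row of a transition matrix sums to 1, that vector stays all ones across its pendant branch, so the site likelihood equals that of the tree with the taxon pruned, wherever the taxon is attached.

```python
import math

STATES = "ACGT"


def jc69(t):
    """4x4 JC69 transition-probability matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip(char):
    """Leaf likelihood vector; '-' stands for a taxon with no data here."""
    if char == "-":
        return [1.0] * 4
    return [1.0 if s == char else 0.0 for s in STATES]


def up(vec, t):
    """Carry a child's vector across a branch of length t: P(t) @ vec."""
    p = jc69(t)
    return [sum(p[i][j] * vec[j] for j in range(4)) for i in range(4)]


def node(*children):
    """Pruning step: multiply the propagated (vector, branch length) children."""
    out = [1.0] * 4
    for vec, t in children:
        out = [x * y for x, y in zip(out, up(vec, t))]
    return out


def site_likelihood(root_vec):
    """Average over root states using JC69's uniform base frequencies."""
    return sum(0.25 * x for x in root_vec)


a, b, c, d = tip("A"), tip("G"), tip("A"), tip("-")  # d: taxon missing the marker

# D pruned: star tree A:0.1, B:0.2, C:0.3.
pruned = site_likelihood(node((a, 0.1), (b, 0.2), (c, 0.3)))
# D attached inside A's branch (0.04 + 0.06) with a long pendant branch.
on_a = site_likelihood(node((node((a, 0.06), (d, 0.5)), 0.04), (b, 0.2), (c, 0.3)))
# D attached inside B's branch (0.15 + 0.05) with a short pendant branch.
on_b = site_likelihood(node((a, 0.1), (node((b, 0.05), (d, 0.01)), 0.15), (c, 0.3)))

print(pruned, on_a, on_b)  # equal up to floating-point rounding
```

Since every attachment point gives the same likelihood, the data cannot favor any placement of such a taxon during a tree search.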

Yan Wang

Jun 12, 2017, 5:20:43 PM
to ra...@googlegroups.com
Dear Alexis,

Thank you very much for your suggestions! The --no-seq-check option worked very well in this case, although it is not listed in the current v8.2.X manual.

I agree that completely undetermined sequences should be avoided when building trees. In this case, I am computing the per-site log likelihoods for a few fixed topologies. In the meantime, I also ran a few comparison tests with simulated data filling in the missing taxa, and the total likelihoods rank the given topologies similarly. It looks like these completely undetermined sequences contribute "equally" little to the ranking of the fixed topologies. I will run tests at multiple levels to confirm this, though.

Best regards,
Yan
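
The comparison above rests on summing per-site log likelihoods into a total per topology, and across markers. A minimal sketch, assuming the per-site output layout is a header line with the number of trees and sites followed by one labelled line per tree (check this against your own RAxML output files, since the layout is an assumption here), and assuming every marker file lists the trees under the same labels. The function names are hypothetical.

```python
import sys


def read_per_site_lls(text):
    """Parse '<ntrees> <nsites>' then one '<label> <ll_1> ... <ll_n>' line per
    tree (assumed layout of the per-site log-likelihood file)."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    ntrees, nsites = int(rows[0][0]), int(rows[0][1])
    per_tree = {}
    for fields in rows[1 : ntrees + 1]:
        values = [float(x) for x in fields[1:]]
        if len(values) != nsites:
            raise ValueError(f"{fields[0]}: expected {nsites} values, got {len(values)}")
        per_tree[fields[0]] = values
    return per_tree


def rank_topologies(per_tree):
    """Total log likelihood per tree, highest (best) first."""
    totals = {name: sum(vals) for name, vals in per_tree.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    grand = {}
    for path in sys.argv[1:]:  # one per-site file per marker
        with open(path) as handle:
            for name, total in rank_topologies(read_per_site_lls(handle.read())):
                grand[name] = grand.get(name, 0.0) + total
    for name, total in sorted(grand.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}\t{total:.4f}")
```

Ranking by summed log likelihood only orders the topologies; it does not test whether the differences between them are significant.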
