How exactly should one create ­­an ascertainment bias corrected ML using asc­corr=stamatakis option?kis

732 views
Skip to first unread message

Jay T

unread,
Mar 31, 2021, 7:44:29 PM3/31/21
to raxml
From the manual it is fairly clear that you need a partition file. Since I am interested in the stamatakis correction I need a file with the invariant sites listed for A C G T. 

ex: p1.txt
250 300 400 100 (Raxml 8.2.X manual)

But its not really clear how one goes about creating this partition file. My typical workflow is to generate a whole genome SNP alignment then mask the areas of recombination with Gubbins. I will take the recombination masked alignment and run it through RaxML. 

How do you count these invariant sites? I've fed my Gubbins output .fasta to FABOX so I could try to extract the invariant sites and then I was going to use grep to count the A C G Ts. But FABOX just returns a white blank page when I upload my fasta file.

Is there a guide or tutorial that covers how to generate this partition file - perhaps some handy scripts? I've looked everywhere including this group and haven't had much luck. Be safe and thanks!

Alexandros Stamatakis

unread,
Apr 1, 2021, 5:36:26 AM4/1/21
to ra...@googlegroups.com
Well the easiest way really is to not do the filtering step at all but
just use the entire MSA as is and without ASC bias correction.

What will happen in this case is that RAxML will automatically compress
all sites containing only As, Cs, Gs, or Ts each into a single site and
compute as well as assign the weights/counts you would put into the file
you indicate below automatically to the respective single site
containing only As, Cs, Gs, or Ts.

So if you have the full alignment anyway there's no need to use
ascertainment bias correction.

Please also consider switching to RAxML-NG
https://github.com/amkozlov/raxml-ng as RAxML8 is not supported any more.

RAxML-NG will also handle this in the way I describe above.

Alexis
> --
> You received this message because you are subscribed to the Google
> Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/6bbbf50d-1e1b-49bd-8a4b-f4f4f40c4e3dn%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/6bbbf50d-1e1b-49bd-8a4b-f4f4f40c4e3dn%40googlegroups.com?utm_medium=email&utm_source=footer>.

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

Jay T

unread,
Apr 1, 2021, 7:10:46 AM4/1/21
to raxml
Thanks for the prompt response Dr. Stamatakis -

Just to clarify - In our previous report we first generated a whole genome SNP alightment against a reference using Snippy.  Then we filtered the polymorphic areas of recombination with Gubbins 2.3.1.  The recombination filtered alignment was then used in RaxML 8.2.9 with Stamatakis ascertainment bias to model the evolution of our bacterial genomes and accurately estimate the substitution rate, hence provide more accurate branch lengths. 

Are you saying that the correction is unnecessary? Forgive me for asking but in what scenario is a correction even needed? Is it only when you generate the a non-recombination filtered MSA?  Or when using a RaxML generated alignment for time-scaled phylogenetic analyses such as BEAST?

Thanksforyourresponse.

Alexandros Stamatakis

unread,
Apr 1, 2021, 11:27:10 PM4/1/21
to ra...@googlegroups.com
Dear Jay,

> Thanks for the prompt response Dr. Stamatakis -

Please call me Alexis.

> Just to clarify - In our previous report we first generated a whole
> genome SNP alightment against a reference using Snippy.  Then we
> filtered the polymorphic areas of recombination with Gubbins 2.3.1.  The
> recombination filtered alignment was then used in RaxML 8.2.9 with
> Stamatakis ascertainment bias to model the evolution of our bacterial
> genomes and accurately estimate the substitution rate, hence provide
> more accurate branch lengths.
>
> Are you saying that the correction is unnecessary?

If you do have the full alignment (i.e., not only the SNPs) including
the invariable sites the correction is indeed unnecessary.

> Forgive me for asking
> but in what scenario is a correction even needed?

When you don't have the full MSA but only the SNPs.


> Is it only when you
> generate the a non-recombination filtered MSA?  Or when using a RaxML
> generated alignment for time-scaled phylogenetic analyses such as BEAST?

RAxML doesn't generate any alignments. Again, if you have the full
genome including the SNPs AND all the invariable sites the best way to
analyze this is to just use the full dataset without any correction.

Alexis
> <https://groups.google.com/d/msgid/raxml/6bbbf50d-1e1b-49bd-8a4b-f4f4f40c4e3dn%40googlegroups.com?utm_medium=email&utm_source=footer
> <https://groups.google.com/d/msgid/raxml/6bbbf50d-1e1b-49bd-8a4b-f4f4f40c4e3dn%40googlegroups.com?utm_medium=email&utm_source=footer>>.
>
>
> --
> Alexandros (Alexis) Stamatakis
>
> Research Group Leader, Heidelberg Institute for Theoretical Studies
> Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
>
> www.exelixis-lab.org <http://www.exelixis-lab.org>
>
> --
> You received this message because you are subscribed to the Google
> Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/5e239c8f-c05f-41cb-b1a0-1899e7df9861n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/5e239c8f-c05f-41cb-b1a0-1899e7df9861n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Jay T

unread,
Apr 2, 2021, 12:00:45 AM4/2/21
to raxml
Apologies Alexis! First on the name and also on RaxML alignment thing (misspoke trying to write on mobile and I cannot edit).

Let me clarify. First we use snippy to build a core genome alignment and perform variant calling against a reference. Once the full whole genome alignment is generated, we use gubbins to remove areas of recombination. This gubbins trimmed SNP alignment is then used for phylogenetic reconstruction using RaxML (we are currently testing raxml-ng too). 

So we provide the invariant/monomorphic sites to RaxML for the asc correction because we are not using the original full alignment from raxML. I felt I had made some progress on this but it is still somewhat new to me. I used a tool called raxml_ascbias (https://github.com/btmartin721/raxml_ascbias) to extract and count the invariant sites from the original snippy full alignment. It seems like it works well. 

Does this workflow seem valid and have you used that particular tool? Thanks for sharing your tool and your expertise. I learn something new everyday.

Alexandros Stamatakis

unread,
Apr 2, 2021, 12:25:36 AM4/2/21
to ra...@googlegroups.com
Dear Jay,

> Apologies Alexis! First on the name and also on RaxML alignment thing
> (misspoke trying to write on mobile and I cannot edit).

no problem :-)

> Let me clarify. First we use snippy to build a core genome alignment and
> perform variant calling against a reference. Once the full whole genome
> alignment is generated, we use gubbins to remove areas of recombination.
> This gubbins trimmed SNP alignment is then used for phylogenetic
> reconstruction using RaxML (we are currently testing raxml-ng too).

Okay, now I see.

> So we provide the invariant/monomorphic sites to RaxML for the asc
> correction because we are not using the original full alignment from
> raxML. I felt I had made some progress on this but it is still somewhat
> new to me. I used a tool called raxml_ascbias
> (https://github.com/btmartin721/raxml_ascbias
> <https://github.com/btmartin721/raxml_ascbias>) to extract and count the
> invariant sites from the original snippy full alignment. It seems like
> it works well.
>
> Does this workflow seem valid and have you used that particular tool?

This seems to be valid, but I have never used this tool. In my lab we do
mostly method development and only occasionally analyze empirical data.

> Thanks for sharing your tool and your expertise. I learn something new
> everyday.

:-)

Alexis
> <http://www.exelixis-lab.org <http://www.exelixis-lab.org>>
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "raxml" group.
> > To unsubscribe from this group and stop receiving emails from it,
> send
> > an email to raxml+un...@googlegroups.com
> > <mailto:raxml+un...@googlegroups.com>.
> > To view this discussion on the web visit
> >
> https://groups.google.com/d/msgid/raxml/5e239c8f-c05f-41cb-b1a0-1899e7df9861n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/5e239c8f-c05f-41cb-b1a0-1899e7df9861n%40googlegroups.com>
>
> >
> <https://groups.google.com/d/msgid/raxml/5e239c8f-c05f-41cb-b1a0-1899e7df9861n%40googlegroups.com?utm_medium=email&utm_source=footer
> <https://groups.google.com/d/msgid/raxml/5e239c8f-c05f-41cb-b1a0-1899e7df9861n%40googlegroups.com?utm_medium=email&utm_source=footer>>.
>
>
> --
> Alexandros (Alexis) Stamatakis
>
> Research Group Leader, Heidelberg Institute for Theoretical Studies
> Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
>
> www.exelixis-lab.org <http://www.exelixis-lab.org>
>
> --
> You received this message because you are subscribed to the Google
> Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/7cf99737-a64a-4eb9-98d5-61963228766en%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/7cf99737-a64a-4eb9-98d5-61963228766en%40googlegroups.com?utm_medium=email&utm_source=footer>.
Reply all
Reply to author
Forward
Message has been deleted
0 new messages