Removed sequences


Roni L

Feb 28, 2025, 3:24:18 PM
to bali-phy-users
Hi Ben, 

I am now up and running and successfully using the software. 

One issue that we cannot work out is the removal of sequences from the input. In my last run (around 55 sequences in the input), 8 sequences did not appear anywhere in the output. They are definitely in the input file, in the same format as the others. They are also not repeated sequences, and I cannot identify any pattern or reason for their removal.

I would appreciate your thoughts on this,
Thanks,
Roni

Benjamin Redelings

Feb 28, 2025, 3:34:54 PM
to bali-ph...@googlegroups.com
Hi Roni,

Nothing immediately jumps to mind about why sequences would be
removed.   The easiest way to identify the issue would be to e-mail me
the sequences, or another group of sequences that have the same
problem.   Are you comfortable letting me see them?

-BenRI
> --
> You received this message because you are subscribed to the Google
> Groups "bali-phy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to bali-phy-user...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bali-phy-users/e9da59ba-15f3-4bc5-948c-30c532b16b9fn%40googlegroups.com
> <https://groups.google.com/d/msgid/bali-phy-users/e9da59ba-15f3-4bc5-948c-30c532b16b9fn%40googlegroups.com?utm_medium=email&utm_source=footer>.

Benjamin Redelings

Mar 8, 2025, 3:07:20 PM
to bali-ph...@googlegroups.com
Hi Roni,

Any followup? Without seeing the sequences I suspect that you have made some kind of mistake in the sequence file, for example merging two sequences. If I have a test case, I can verify whether or not the problem is in bali-phy or the sequences. Without a test case my hands are tied.

One thing you can try is to run the bali-phy tool 'alignment-info' on the sequence file. You can also try opening the sequences in (for example) AliView or SeaView to check that they are read correctly.
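
As an independent cross-check alongside those tools, here is a minimal Python sketch that lists sequence names in a FASTA file and compares two lists of names. It is not part of bali-phy; the function names `fasta_names` and `compare_names` are hypothetical, and it assumes plain FASTA where each record starts with a `>` header line and the name is the first word of the header.

```python
from collections import Counter


def fasta_names(lines):
    """Return sequence names (first word after '>') in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # An empty header gets an empty name so it is still counted.
            names.append(header.split()[0] if header else "")
    return names


def compare_names(input_names, output_names):
    """Return (names missing from the output, names duplicated in the input)."""
    seen_in_output = set(output_names)
    missing = [n for n in input_names if n not in seen_in_output]
    duplicated = sorted(n for n, c in Counter(input_names).items() if c > 1)
    return missing, duplicated


# Small in-memory example; with real files, pass open(path) to fasta_names.
example_input = [">seqA human", "ACGT", ">seqB", "AC--", ">seqC", "ACGA"]
example_output = [">seqA", "ACGT", ">seqB", "AC--"]
missing, duplicated = compare_names(fasta_names(example_input),
                                    fasta_names(example_output))
print("missing from output:", missing)
print("duplicated in input:", duplicated)
```

If the number of names found in the input is already smaller than expected, the sequences were lost before bali-phy ever read the file.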

-BenRI

Roni L

Mar 13, 2025, 3:52:20 PM
to bali-phy-users
Hi Ben, 

It turns out my FASTA formatter was limited to 60 sequences, so the remainder were being removed! Thankfully this issue is now resolved.

We have a few variable regions for which we were hoping to try a protein mixture CAT model. Is this available in the current download? I noticed a previous post from 2017 about using the model, but I can't see any notes in the User Guide.

Let me know if you have any notes on this,
Thanks
Roni

Benjamin Redelings

Mar 13, 2025, 4:22:56 PM
to bali-ph...@googlegroups.com
Hi Roni,

Glad to hear the first issue is resolved!

The exact CAT model isn't available.  The CAT model is a mixture of an
unknown number of F81 models with different frequencies, and works on
amino acids.  There are a few options to do something similar:

- I can add back the C10 and C20 models (with fixed frequencies and
fixed number of components) for bali-phy 4.1 if you want. These are
simplified versions of the CAT models with pre-estimated amino-acid
frequencies in each component.  Estimating under these models just needs
to estimate the weights for each component.

- You can specify a fixed number of components and estimate the
frequencies.  For example, to do 10 components, you would do:

    bali-phy amino-acids.fasta --smodel "mixture([f81,f81,f81,f81,f81,f81,f81,f81,f81,f81]) +> Rates.gamma(n=4)"

With 10 mixture components and 4 gamma rate categories, this will have
10 x 4 = 40 components in total, and so will be quite slow.

- You can also try something like --smodel
"mixture([lg08,lg08,lg08,lg08,lg08]) +> Rates.gamma(n=4)". This is a
mixture of 5 LG08 models with different frequencies. F81 is a fairly
simple model, so LG08 can lead to higher likelihoods.

- Each of these models works with alignments estimated or fixed.

- I think the PhyloBayes software may have some tricks to speed up the
likelihood evaluation specifically for the CAT model. bali-phy doesn't
have those tricks, so it's going to be slower even if the alignment is fixed.

- If you want a model that operates on CODONS (instead of amino acids),
then you want something different from the CAT model.  For example,

    bali-phy bglobin.fasta --smodel 'mixture([m +> mut_sel_aa, m +> mut_sel_aa, m +> mut_sel_aa, m +> mut_sel_aa, m +> mut_sel_aa]) +> Rates.gamma where {m=gtr +> x3 +> dNdS}'

Since this works on codons, it will be much slower.
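
To avoid typing the repeated component names by hand, the `--smodel` strings for the fixed-size mixtures above can be generated. This is only a sketch: `mixture_smodel` and `total_components` are hypothetical helper names, not part of bali-phy, and the string syntax is copied from the examples above rather than taken from a formal grammar.

```python
def mixture_smodel(base, n_components, n_rates=4):
    """Build a string like 'mixture([f81,f81]) +> Rates.gamma(n=4)'."""
    if n_components < 1 or n_rates < 1:
        raise ValueError("n_components and n_rates must be at least 1")
    parts = ",".join([base] * n_components)
    return f"mixture([{parts}]) +> Rates.gamma(n={n_rates})"


def total_components(n_components, n_rates=4):
    """Each mixture component is combined with each gamma rate category."""
    return n_components * n_rates


# The 10-component F81 example and the 5-component LG08 example from above.
for base, n in [("f81", 10), ("lg08", 5)]:
    print(f'--smodel "{mixture_smodel(base, n)}"')
    print("total components:", total_components(n))
```

The total is a rough guide to cost: more components means more likelihood work per site, so a smaller mixture may be a sensible first run.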

-BenRI

roni...@icloud.com

Mar 20, 2025, 12:49:50 PM
to bali-ph...@googlegroups.com
Hi Ben, 

Apologies, I am studying part time hence the spaced out responses. 

Re-adding the C20 models (with fixed frequencies and a fixed number of components) for bali-phy 4.1 sounds good. In the meantime I will try the mixture models.

Thanks,
Roni
