Hi!
> I'm not convinced that we need to do clustering rather than proper
> phylogenetic analysis, but it would be nice to get any advice!
As always, I'd start out by asking "What's the question?"
(1) If you want to ask 'How similar are these proteins?' then clustering on a distance metric describes similarity directly.
(2) If you want to ask 'How are these proteins related?' then formal alignment followed by model-based phylogenetic inference is the way forward.
Of course, (2) is quite likely to answer (1) as well, unless there's a lot of convergence. And you might use (1) on the way to (2), as a rough approximation to define the limits of the analysis (i.e. to guess at orthology).
I would only cluster based on distance if you explicitly expect similarity and phylogeny to be decoupled (e.g. lots of convergence, or many gained or lost domains), or if phylogenetic analysis is intractable, either because of the computational requirements or because evolutionary models can't be usefully applied (i.e. no meaningful alignment due to domain gains/losses or re-ordering).
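To make option (1) concrete, here is a minimal sketch of distance-based clustering in plain Python. Everything in it is an illustrative assumption rather than a recommendation: the toy sequences, the choice of p-distance (proportion of differing sites, ignoring gapped columns), the single-linkage rule, and the 0.2 threshold are all made up for the example. It describes similarity only and says nothing about ancestry.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    # With no comparable columns, treat the pair as maximally distant.
    return differing / compared if compared else 1.0


def single_linkage_clusters(seqs, threshold):
    """Merge sequences whose p-distance is <= threshold (single linkage)."""
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical aligned toy sequences, invented for this example.
toy = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYIAKQK",
    "protC": "MKT-YIGKQR",
    "protX": "GSHWLLPEDV",
}

clusters = single_linkage_clusters(toy, threshold=0.2)
print(clusters)
```

Note that protB and protC end up in the same cluster only because each is close to protA; that chaining is a property of single linkage, and a different linkage rule or distance would give a different grouping.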
For (2), my current favourite would be T-Coffee (t_coffee -mode accurate), followed by IQ-TREE 2.
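As a rough sketch of how those two steps might be chained, the snippet below only builds and prints the command lines; it does not run them. Apart from `-mode accurate`, every option is my assumption about a reasonable setup (FASTA alignment output, ModelFinder model selection with `-m MFP`, 1000 ultrafast bootstraps with `-B 1000`), and the file names and the `build_pipeline` helper are hypothetical, so check the flags against the versions you have installed.

```python
import shlex


def build_pipeline(fasta, prefix):
    """Return (alignment command, tree command) as argument lists."""
    alignment = f"{prefix}.aln.fasta"
    align_cmd = [
        "t_coffee", fasta,
        "-mode", "accurate",
        "-output", "fasta_aln",   # assumed: write the alignment as FASTA
        "-outfile", alignment,
    ]
    tree_cmd = [
        "iqtree2",
        "-s", alignment,          # input alignment
        "-m", "MFP",              # assumed: ModelFinder, then tree search
        "-B", "1000",             # assumed: 1000 ultrafast bootstrap replicates
        "--prefix", prefix,       # prefix for IQ-TREE output files
    ]
    return align_cmd, tree_cmd


# Hypothetical input file and output prefix.
align_cmd, tree_cmd = build_pipeline("proteins.fasta", "proteins")
for cmd in (align_cmd, tree_cmd):
    print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` in order.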
Regards!
D
--
Darren Obbard
darren...@ed.ac.uk
Institute of Evolutionary Biology
University of Edinburgh
Ashworth Laboratories, Charlotte Auerbach Road
Edinburgh EH9 3FL
Office 0131 651 7781
Mobile: 07968 838 635
http://obbard.bio.ed.ac.uk/
> -----Original Message-----
> From: ashworth-c...@googlegroups.com <ashworth-code-mon...@googlegroups.com> On Behalf Of Edward Wallace
> Sent: 24 May 2021 15:48
> To: ashworth-c...@googlegroups.com
> Subject: [ashworth-code-monkeys] Clustering proteins in multiple sequence alignments?