Best strategy for phylogeographic analysis with thousands of sequences

geo

Apr 1, 2026, 1:53:03 AM
to beast-users

Dear all,

I recently faced the need to perform a phylogeographic analysis on several viral strains circulating across different European countries. In this context, my goal is both to identify significant state-to-state transition rates (i.e., migration links) and to assess potential predictors using a GLM framework.

Given the availability of several thousand sequences, I initially opted for a two-step approach to keep the analysis computationally tractable. First, I performed a phylodynamic analysis to estimate the posterior tree distribution; I then computed the maximum clade credibility (MCC) tree and ran the phylogeographic inference on it with the topology fixed. To maintain consistency, I used BEAST X v10 for both steps, specifying operators appropriate for a fixed-topology analysis in the second run.

However, even under this setup, the phylogeographic analysis appears to proceed rather slowly and does not seem to reach convergence as quickly as expected. This was somewhat surprising, given that only a single fixed topology was used.
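One way to make "does not seem to reach convergence" concrete is to look at the effective sample size (ESS) of each logged parameter, as Tracer reports it. Below is a minimal, stdlib-only Python sketch of an autocorrelation-based ESS estimate that truncates the autocorrelation sum at the first non-positive lag. It is an approximation and is not claimed to match Tracer's estimator exactly. The function name `ess` and the toy traces are hypothetical illustrations, not real BEAST output.

```python
import random
from statistics import fmean


def ess(samples):
    """Rough effective sample size of one MCMC trace (illustrative sketch).

    ESS = N / (1 + 2 * sum_k rho_k), where rho_k is the lag-k
    autocorrelation, summed until the first non-positive value.
    """
    n = len(samples)
    mean = fmean(samples)
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float("nan")  # constant trace: ESS is undefined
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Hypothetical toy traces: independent draws vs. a "sticky" chain in which
# each value is repeated 20 times, mimicking poor mixing.
rng = random.Random(1)
independent = [rng.gauss(0, 1) for _ in range(2000)]
sticky = [v for v in independent[:100] for _ in range(20)]
print(round(ess(independent)))  # close to the 2000 samples drawn
print(round(ess(sticky)))       # far below 2000
```

A chain with 2000 logged states but an ESS of roughly 100, like the sticky trace, has effectively explored far fewer independent states than it logged, which is the pattern to look for in the slow-mixing rate and indicator parameters.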

I was therefore wondering:

  • Is this behavior expected for this type of analysis? So far I have only tried the BSSVS (Bayesian stochastic search variable selection) approach, not the GLM.
  • Is there an established or recommended way to automate phylogeographic inference over a subset of trees sampled from the posterior (rather than relying on a single MCC tree)?
  • Alternatively, is it necessary to run independent analyses for each selected tree?
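For the second and third questions, one way to move beyond a single MCC tree is to draw a random subset of post-burn-in trees from the posterior tree file. That subset can then serve as the input for the phylogeographic run, or for independent runs, one per tree. The plain-Python sketch below covers only this subsampling step. It assumes a BEAST-style NEXUS `.trees` file in which each tree sits on one line starting with `tree ` and the block ends after the last tree line. The function name, burn-in fraction, and seed are hypothetical choices, and the trees themselves are not parsed.

```python
import random


def subsample_nexus_trees(text, n_trees, burnin_frac=0.1, seed=42):
    """Keep n_trees randomly chosen post-burn-in trees from a NEXUS string.

    Hypothetical helper: assumes one tree per line, each starting with
    'tree ' (case-insensitive), as in BEAST's .trees output.
    """
    lines = text.splitlines()
    tree_idx = [i for i, ln in enumerate(lines)
                if ln.lstrip().lower().startswith("tree ")]
    if not tree_idx:
        raise ValueError("no tree lines found")
    header = lines[:tree_idx[0]]        # #NEXUS, taxa, Translate block, ...
    footer = lines[tree_idx[-1] + 1:]   # e.g. 'End;'
    trees = [lines[i] for i in tree_idx]
    kept = trees[int(len(trees) * burnin_frac):]  # discard burn-in
    if n_trees > len(kept):
        raise ValueError("fewer post-burn-in trees than requested")
    rng = random.Random(seed)
    # Sample without replacement, then restore the original chain order.
    chosen = sorted(rng.sample(range(len(kept)), n_trees))
    return "\n".join(header + [kept[i] for i in chosen] + footer) + "\n"


# Hypothetical toy input with 10 identical trees logged every 1000 states.
toy = "\n".join(
    ["#NEXUS", "Begin trees;"]
    + [f"tree STATE_{i * 1000} = ((A:1,B:1):1,C:2);" for i in range(10)]
    + ["End;"]
)
out = subsample_nexus_trees(toy, n_trees=3, burnin_frac=0.2)
print(out)
```

For a real analysis the input would be read from the posterior `.trees` file and the result written back out. How the subset is then supplied to BEAST is outside the scope of this sketch.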

Any clarification or suggestions on how to proceed would be greatly appreciated.

Many thanks in advance.

nicolad...@gmail.com

Apr 2, 2026, 3:04:11 AM
to beast-users
I would suggest using Thorney BEAST, which should scale to something like 10k sequences and be more accurate than fixing the bifurcating topology (which might not be consistent with the geographic data).
If that also doesn't scale (but it should) then another option could be for example PastML.

Nicola

geo

Apr 3, 2026, 8:47:46 PM
to beast-users
Thank you, I'll try the approaches you suggested.

geo

Apr 7, 2026, 4:14:49 AM
to beast-users

Dear Nicola,

Sorry for the stupid question.

I am currently having some difficulties using BEASTGen, starting with locating the Java executable (JAR file) required to run it. In the most recent version available online (v1.10.5pre_thorney_v0.1.1), I was unable to find the compiled JAR file.

Additionally, I was wondering whether there is a template or a proper manual for including continuous phylogeographic analyses and GLM models. The template provided in "Approaches for analyzing large phylogenetic datasets" does not include these kinds of analyses. Specifically, I would like to understand whether there are existing functions or workflows that allow these components to be implemented automatically.

I apologize for the rather naive nature of this question, but any guidance based on your experience would be greatly appreciated.

Best regards,
Giovanni
