Hi everyone,
I'm seeking advice on dataset subsampling and MCMC convergence for a single segment of an endemic arbovirus. I am trying to reconstruct its spatiotemporal history but have hit a bottleneck.
Dataset & Subsampling:
Original: ~2,500–3,000 full-length sequences.
Cleaning: Excluded all intra-segment recombinants and inter-segment reassortants.
Subsampling: Removed identical sequences clustered by year/location/host, then applied spatiotemporal stratified subsampling (capping the number of sequences per province/year cell).
Final size: ~500 representative sequences.
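For clarity, the capping step works roughly like this (a minimal sketch with made-up sequence IDs and metadata; in the real pipeline the province/year come from the sequence headers):

```python
import random
from collections import defaultdict

random.seed(42)

# Hypothetical metadata: (seq_id, province, year)
records = [
    ("A1", "north", 2010), ("A2", "north", 2010), ("A3", "north", 2010),
    ("B1", "south", 2011), ("B2", "south", 2011),
    ("C1", "east", 2015),
]

CAP = 2  # max sequences retained per province/year cell

# Group records into province/year cells
cells = defaultdict(list)
for rec in records:
    cells[(rec[1], rec[2])].append(rec)

# Randomly downsample any cell that exceeds the cap
subsample = []
for members in cells.values():
    if len(members) > CAP:
        members = random.sample(members, CAP)
    subsample.extend(members)

print(len(subsample))  # prints 5: north/2010 is capped from 3 to 2
```

The real pipeline also does the identity-based deduplication first, but the stratified cap above is the step I suspect of flattening the temporal structure.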
The Issues:
Weak Temporal Signal: A root-to-tip regression on the 500-taxon dataset in TempEst is very weak (R² ≈ 0.02–0.05).
MCMC Non-convergence: I ran a baseline BEAST analysis (no discrete spatial traits yet) with a Skygrid demographic model. To compensate for the weak temporal signal, I set an informative clock-rate prior (Normal: mean = 1.0E-4, stdev = 5.0E-6). Even so, after 300 million generations the MCMC has not converged (ESS < 100 for the posterior and most parameters).
Hardware & Performance:
The run is exceedingly slow. I am using a Windows 11 mini-PC (AMD Ryzen 7 PRO 6850H, 8 cores, 32 GB RAM, integrated AMD Radeon Graphics) with the BEAGLE library via OpenCL.
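For reference, these are the invocations I have been using or considering (flags as I understand them from the BEAST documentation; corrections welcome):

```shell
# List the BEAGLE resources BEAST detects (CPU vs. the integrated GPU)
beast -beagle_info

# What I'm considering: force the vectorized CPU (SSE) code path and split
# the likelihood across instances, instead of OpenCL on the integrated GPU
beast -beagle_SSE -beagle_instances 2 analysis.xml
```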
My Questions:
Subsampling: Could my spatiotemporal stratified subsampling have inadvertently flattened the temporal structure? Are there more robust subsampling strategies for large endemic datasets to preserve a sufficient temporal signal?
Convergence: Is the weak temporal signal actively preventing the Skygrid model from converging despite the informative prior? Would pruning specific outliers, reducing Skygrid grid points, or fixing the substitution rate entirely be a better approach here?
Performance: Do you have any suggestions for modifying the priors/operators, or specific BEAGLE command-line flags to run a 500-sequence Skygrid model more efficiently on an integrated AMD GPU setup?
Thanks in advance for your time and insights!