MCMC non-convergence (Skygrid, 500 taxa) & weak temporal signal after subsampling


Si-Qian Wu

Mar 15, 2026, 2:47:28 PM
to beast-users

Hi everyone,

I'm seeking advice on dataset subsampling and MCMC convergence for a single segment of an endemic arbovirus. I am trying to reconstruct its spatiotemporal history but have hit a bottleneck.

Dataset & Subsampling:

  • Original: 2,500 to 3,000 full-length sequences.

  • Cleaning: Excluded all intra-segment recombinants and inter-segment reassortants.

  • Subsampling: Collapsed identical sequences within each year/location/host cluster, then applied spatiotemporal stratified subsampling (capping the number of sequences per province/year cell).

  • Final size: ~500 representative sequences.
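For anyone wanting to reproduce or vary the two subsampling steps, here is a minimal Python sketch. It assumes a hypothetical record layout of (id, sequence, year, province, host) tuples and uses random sampling within each province/year cell; the function names and the `cap` parameter are illustrative only, not part of BEAST or any other tool, and the actual pipeline used for this dataset may differ.

```python
import random
from collections import defaultdict

# Hypothetical record layout: (seq_id, sequence, year, province, host).

def dedupe_identical(records):
    """Keep the first of any identical sequences within a year/province/host cluster."""
    seen = set()
    kept = []
    for rec in records:
        seq_id, seq, year, province, host = rec
        key = (year, province, host, seq)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

def stratified_cap(records, cap, seed=1):
    """Randomly keep at most `cap` records per (province, year) cell."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    cells = defaultdict(list)
    for rec in records:
        _, _, year, province, _ = rec
        cells[(province, year)].append(rec)
    sampled = []
    for cell in sorted(cells):  # sorted for a deterministic iteration order
        members = cells[cell]
        if len(members) > cap:
            members = rng.sample(members, cap)
        sampled.extend(members)
    return sampled
```

Because the cap applies per province/year cell, the best-sampled years lose the most sequences; it may be worth checking that the retained tip dates still cover the original sampling window.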

The Issues:

  1. Weak Temporal Signal: Root-to-tip regression of the 500-taxon dataset in TempEst shows very weak temporal signal (R² ≈ 0.02–0.05).

  2. MCMC Non-convergence: I ran a baseline BEAST analysis (no discrete spatial traits yet) with a Skygrid coalescent tree prior. To compensate for the weak temporal signal, I placed an informative Normal prior on the clock rate (mean = 1.0E-4, stdev = 0.5E-5). Even after 300 million generations, the run has not converged: the posterior and most parameters still have ESS < 100.
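Two quick numerical checks help frame these issues. The sketch below assumes root-to-tip distances and decimal sampling dates have already been exported (for example from TempEst) and computes the ordinary least-squares fit behind the reported R² (the squared Pearson correlation), the implied rate (slope) and root date (x-intercept). It also computes the central 95% interval implied by the Normal clock-rate prior. The function names are hypothetical, not TempEst or BEAST APIs.

```python
def root_to_tip_regression(dates, distances):
    """OLS regression of root-to-tip distance on sampling date.

    Returns (slope, intercept, r_squared, root_date). The slope is the
    crude rate estimate and the x-intercept is the implied root date.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    root_date = -intercept / slope      # date where the fitted distance is 0
    return slope, intercept, r_squared, root_date

def normal_prior_interval(mean, sd, z=1.959964):
    """Central ~95% interval of a Normal(mean, sd) prior."""
    return mean - z * sd, mean + z * sd

if __name__ == "__main__":
    lo, hi = normal_prior_interval(1.0e-4, 0.5e-5)
    print(f"95% prior interval: {lo:.3e} to {hi:.3e}")
    print(f"coefficient of variation: {0.5e-5 / 1.0e-4:.0%}")
```

With the prior above, the interval is roughly 9.02E-5 to 1.10E-4, a coefficient of variation of 5%, so the prior alone confines the rate to about ±10% of its mean. Root-to-tip points share internal branches and are not independent, so R² is best read as an exploratory diagnostic rather than a formal test.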

Hardware & Performance:

The run is exceedingly slow. I am using a Windows 11 mini-PC (AMD Ryzen 7 PRO 6850H, 8 cores, 32 GB RAM, integrated AMD Radeon Graphics) with the BEAGLE library via OpenCL.

My Questions:

  1. Subsampling: Could my spatiotemporal stratified subsampling have inadvertently flattened the temporal structure? Are there more robust subsampling strategies for large endemic datasets that preserve sufficient temporal signal?

  2. Convergence: Could the weak temporal signal be what prevents the Skygrid model from converging, despite the informative prior? Would pruning outlier sequences, reducing the number of Skygrid grid points, or fixing the substitution rate entirely be a better approach here?

  3. Performance: Do you have suggestions for adjusting the priors/operators, or for specific BEAGLE command-line flags that would make a 500-sequence Skygrid analysis run more efficiently on an integrated AMD GPU?
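For context on the ESS < 100 figures, the sketch below shows one common way an effective sample size is estimated from a post-burn-in trace: N divided by the integrated autocorrelation time 1 + 2Σρ_k, with the sum truncated at the first non-positive autocorrelation. This is a simplified estimator under that truncation assumption; Tracer's implementation may differ in detail, and the function name is hypothetical.

```python
def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum_k rho_k).

    rho_k is the lag-k autocorrelation of the (post-burn-in) trace; the sum
    stops at the first lag whose autocorrelation is <= 0. The double loop is
    O(N^2), so this is only practical on a thinned trace.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

Under this estimator, a low ESS after a long run usually reflects strong autocorrelation between logged samples (slow mixing) rather than too short a chain.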

Thanks in advance for your time and insights!
