I would appreciate some advice regarding the "proportion invariable" (I) parameter in the Site Model specification panel in BEAST2.
I typically run BEAST2 using a SNP alignment that contains only variable sites, as I work with bacterial genomes and including all sites would require a prohibitive amount of RAM. To account for the excluded constant sites, I manually add the constant-site correction to the XML, following Remco's guidance in "Correcting for constant sites in BEAST2".
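For context, here is a minimal sketch of how the per-base constant-site counts for that correction could be tallied from a full alignment. It is illustrative only: the function name `constant_site_counts` and the toy alignment are hypothetical, and I am assuming the correction takes one count per base in A, C, G, T order.

```python
from collections import Counter

def constant_site_counts(sequences):
    """Return the number of constant columns per base, in A, C, G, T order.

    `sequences` is a list of equal-length aligned strings. Columns that are
    variable, or constant for a non-ACGT character (gap, N), are skipped.
    """
    counts = Counter()
    for column in zip(*sequences):
        bases = {b.upper() for b in column}
        if len(bases) == 1:
            base = bases.pop()
            if base in ("A", "C", "G", "T"):
                counts[base] += 1
    return [counts[b] for b in "ACGT"]

# Hypothetical toy alignment: five constant columns and one variable column.
toy = ["ACGTAA", "ACGTAC", "ACGTAG"]
weights = constant_site_counts(toy)
print(" ".join(map(str, weights)))  # prints "2 1 1 1"
```

The printed string is the kind of space-separated list of counts I paste into the XML, alongside the SNP-only alignment.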
Conceptually, it seems reasonable to use a proportion-invariable model because the vast majority of sites in my bacterial genomes are invariant (e.g., >99.999%). My intuition is that I should estimate this parameter and perhaps provide a starting value such as 0.9.
However, in practice, BEAST2 only receives an alignment containing variable sites, with the constant sites represented indirectly through the constant-site correction. Given this setup, I am unsure whether it is appropriate to estimate the proportion-invariable parameter at all, or whether doing so would effectively double-count information already accounted for by the constant-site correction.
Could you advise on the recommended approach in this situation?
Cheers,
Koen Vdl