Hello Xiaobo,
I would go with option 3.
Another (perhaps better) way to think of this is that what matters is L * mu, where mu is the mutation rate at sites that can enter your analysis. So you can think of L as being fixed at the length of the target region, with a lower effective mutation rate because you only count mutations at sites that also pass the other criteria.
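Here is a minimal Python sketch of that equivalence. All numbers (target length, variant counts, mu) are made up for illustration, and the function names are hypothetical. It also assumes that the fraction of called variants you keep is a reasonable proxy for the fraction of sites that pass the criteria.

```python
import math


def effective_length(target_length, n_selected, n_called_in_target):
    """Option 3: scale the target length by the fraction of called variants kept."""
    if n_called_in_target <= 0:
        raise ValueError("n_called_in_target must be positive")
    if not 0 <= n_selected <= n_called_in_target:
        raise ValueError("n_selected must lie between 0 and n_called_in_target")
    return target_length * n_selected / n_called_in_target


def effective_mutation_rate(mu, n_selected, n_called_in_target):
    """Alternative view: keep L fixed and scale the mutation rate instead."""
    if n_called_in_target <= 0:
        raise ValueError("n_called_in_target must be positive")
    return mu * n_selected / n_called_in_target


# Hypothetical inputs, not taken from any paper.
target_length = 100_000_000      # length of the target region in bp (assumed)
n_called_in_target = 2_000_000   # variants called inside the target region (assumed)
n_selected = 1_000_000           # variants passing the neutral-site criteria (assumed)
mu = 1.25e-8                     # per-site mutation rate (assumed)

# View A: shrink L, keep mu.
view_a = effective_length(target_length, n_selected, n_called_in_target) * mu
# View B: keep L, shrink mu.
view_b = target_length * effective_mutation_rate(mu, n_selected, n_called_in_target)

# Both views give the same product L * mu, which is the quantity that matters.
print(view_a, view_b, math.isclose(view_a, view_b))
```

Either way the product L * mu is the same, so it does not matter which of the two you choose to scale.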
Best,
Ryan
> On Apr 22, 2024, at 7:12 AM, Xiaobo Qian (Xb) <qianxia...@gmail.com> wrote:
>
> Hi Ryan,
>
> Thank you for your reply!
>
> I have one more question that came up while reading another paper. Since the author has not replied, I am bothering you again :(
>
> The paper describes its approach as follows:
> "To avoid the bias caused by the coding sequence regions, we selected intergenic, synonymous, and intronic sites from the target region as the neutral sites for analysis." (Unfortunately, the authors did not say what the target region is.)
>
> If that makes sense, suppose 10M variants were called from the whole genome (~28000M), and 1M variants satisfying the criteria were then selected from the target region or the whole genome (I don't know which). Which of the following would L be?
> 1. (The total length of whole genome) * 1M / 10M
> 2. (The total length of target region) * 1M / 10M
> 3. (The total length of target region) * 1M / (the number of variants called from the target region only, possibly a subset of the 10M variants)
>
> Best,
> Xiaobo