From your earlier discussions with other users, I understand that theta = 4*Nref*mu*L.
I am working with human genotype data in VCF format. My dataset has 60 million SNPs, which drops to 1 million after LD pruning. I am using these 1 million SNPs to build a folded frequency spectrum in dadi. In this scenario, what should L be: 60 million or 1 million?
My next question: can I use a mutation rate of mu = 10^(-8) for humans in the equation for theta (theta = 4*Nref*mu*L)?
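To make the first two questions concrete, here is a minimal sketch of how I understand the conversion from a fitted theta to Nref. It only rearranges theta = 4*Nref*mu*L; the theta value is a made-up placeholder (not a result from my data), and the function name is my own, not part of dadi.

```python
def nref_from_theta(theta, mu, L):
    """Solve theta = 4 * Nref * mu * L for Nref."""
    if mu <= 0 or L <= 0:
        raise ValueError("mu and L must be positive")
    return theta / (4.0 * mu * L)


theta_example = 4000.0  # hypothetical fitted theta, for illustration only
mu = 1e-8               # the mutation rate I am asking about

# Compare the two candidate values of L from my question.
for label, L in (("60 million", 60e6), ("1 million", 1e6)):
    nref = nref_from_theta(theta_example, mu, L)
    print(f"L = {label}: Nref = {nref:.1f}")
```

Since Nref scales as 1/L, the two candidate values of L give estimates that differ by a factor of 60, which is why I want to be sure which one is correct.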
My third question: which approach is better for the overall workflow and for the uncertainty analysis?
a) frequency spectrum from the full data, with the Godambe information matrix
or b) frequency spectrum from the LD-pruned data, with the Fisher information matrix?
Arghya.