Running dadi using variant-only VCF as input?

Isaac Linn
Feb 19, 2026, 5:32:15 PM
to dadi-user
Sorry if this is a simple question, but I haven't found a clear answer so far.

I'm going through the process of running dadi on a population dataset for which I have a SNP-only VCF. I could redo the variant calling, but at present I don't have access to a VCF with both variant and invariant sites. I'm wondering whether I can use dadi with this input alone, whether I need to pad the dataset with monomorphic sites, or whether it doesn't matter. So far, when I run dadi with L = the length of the callable input sequence, the population size estimates look far too low, but when I use L = the number of sites in the VCF (again, mostly polymorphic), the estimates are a reasonable order of magnitude (judging by nucleotide diversity).

I think I understand that the monomorphic sites would not make it into the allele frequency spectrum, but I'm concerned that leaving them out would affect the underlying model or the estimation of theta.
Thanks,
Isaac

Ryan Gutenkunst
Feb 20, 2026, 12:32:45 PM
to dadi...@googlegroups.com
Hello Isaac,

Yes, dadi will work fine with a variant-only VCF; it will not affect the model or the estimation of theta compared to an all-sites VCF.
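A minimal pure-Python sketch of why monomorphic sites drop out, assuming a single population, an unfolded spectrum, and dadi's default behavior of masking the corner entries (0 and n derived copies) and fitting theta as the optimal scaling of a theta = 1 model over the unmasked entries (roughly what `dadi.Inference.optimal_sfs_scaling` does). The helper names, the SNP counts, and the neutral 1/k placeholder model are hypothetical, not dadi's API.

```python
from typing import List, Sequence

def unfolded_sfs(derived_counts: Sequence[int], n: int) -> List[int]:
    """Count sites by number of derived alleles k (0..n) among n sampled chromosomes."""
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs

def unmasked(sfs: Sequence[float]) -> List[float]:
    """Drop the corner entries k = 0 and k = n (monomorphic in the sample)."""
    return list(sfs[1:-1])

def neutral_model(n: int) -> List[float]:
    """Placeholder model: standard neutral expectation 1/k for theta = 1, corners zeroed."""
    return [0.0] + [1.0 / k for k in range(1, n)] + [0.0]

def optimal_theta(data: Sequence[int], model: Sequence[float]) -> float:
    """Scale factor matching a theta = 1 model to the data, over unmasked entries only."""
    return sum(unmasked(data)) / sum(unmasked(model))

n = 10
variants_only = [1, 1, 1, 2, 2, 3, 5, 9]            # hypothetical SNP derived-allele counts
padded = variants_only + [0] * 100_000 + [n] * 500  # same SNPs plus monomorphic sites

sfs_snps = unfolded_sfs(variants_only, n)
sfs_all = unfolded_sfs(padded, n)
model = neutral_model(n)

print(unmasked(sfs_snps) == unmasked(sfs_all))                    # True
print(optimal_theta(sfs_snps, model) == optimal_theta(sfs_all, model))  # True
```

Padding only fills the two masked corner bins, so the fitted theta is identical either way. The number of monomorphic sites matters only afterwards, when theta is converted to a population size through L.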

Estimating L is sometimes tricky, depending on how the sequencing was done. It may be that certain regions were uncallable or masked (for example, repetitive regions), so L is rarely the full genome size. But setting L equal to the number of variant sites is a mistake: L should count all callable sites, monomorphic as well as polymorphic.
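To see how much the choice of L matters, here is a sketch assuming dadi's usual relation theta = 4 * Nref * mu * L, a hypothetical per-site mutation rate, and a BED-style list of callable intervals (0-based, half-open) that already excludes masked regions such as repeats. `callable_length` and `nref_from_theta` are hypothetical helpers, not dadi functions, and all numbers are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style: 0-based, half-open

def callable_length(intervals: Iterable[Interval]) -> int:
    """Total base pairs covered by the intervals, merging overlaps within each chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:   # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                 # overlap or adjacency: extend the block
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total

def nref_from_theta(theta: float, mu: float, L: int) -> float:
    """Rearranges theta = 4 * Nref * mu * L to give Nref."""
    return theta / (4.0 * mu * L)

callable_bed = [("chr1", 0, 600_000), ("chr1", 500_000, 900_000), ("chr2", 0, 100_000)]
L_callable = callable_length(callable_bed)  # 900_000 + 100_000
theta_hat = 1000.0         # hypothetical fitted theta
mu = 1e-8                  # hypothetical per-site, per-generation mutation rate
n_variant_sites = 50_000   # hypothetical number of SNPs in the VCF

print(L_callable)                                               # 1000000
print(round(nref_from_theta(theta_hat, mu, L_callable), 1))      # 25000.0
print(round(nref_from_theta(theta_hat, mu, n_variant_sites), 1)) # 500000.0
```

Because Nref scales as 1/L, substituting the variant count (much smaller than the callable length) inflates Nref by the ratio of the two lengths, 20x in this made-up example. That is the direction of the shift between the two runs Isaac describes, even though only the callable-length version is the meaningful one.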

Best,
Ryan
