Hi Michael,
In response to your query (which was by no means the first), I've just implemented a method to use freebayes as a pooled frequency-based caller. I believe this will bring it in line with your experimental needs.
Here is how you can use freebayes as a frequency-based caller:
freebayes -f ref.fa --pooled-continuous alignments.bam | vcfkeepinfo - AO RO TYPE
....
chr20_bit 196 . AT A 0 . AO=17;RO=4547;TYPE=del
chr20_bit 203 . GA G 0 . AO=10;RO=4767;TYPE=del
chr20_bit 212 . C G 0 . AO=9;RO=5022;TYPE=snp
chr20_bit 220 . C A 0 . AO=8;RO=5037;TYPE=snp
chr20_bit 223 . C CA 0 . AO=2;RO=5040;TYPE=ins
chr20_bit 228 . T A 0 . AO=8;RO=5074;TYPE=snp
chr20_bit 233 . T C 0 . AO=6;RO=5148;TYPE=snp
Now you have counts for the reference allele (RO) and each alternate allele (AO) that pass the input filters (by default -F 0.2 -C 2). Make sure to set these lower (e.g. -F 0 -C 1 is completely open), or you will not detect any allele that is below 20% frequency or has fewer than 2 supporting observations in the same sample. That noted, adjusting the input filters is the major way to improve the specificity of the called alleles in this case.
You can now use the AO and RO results to establish per-allele, per-site observation frequencies, which should approximate the frequency in your pool. To make this even easier, I'm considering adding another optional field when --pooled-continuous is specified, EAF (estimated allele frequency). Do not use the AF field in the output, as this will correspond to the AF under the model.
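As a rough sketch of that calculation (this is not part of freebayes, and the function names are made up for illustration), you could turn the AO and RO counts into observation frequencies with a few lines of Python. It assumes the frequency of each alternate is AO / (RO + sum of all AO at the site), and that AO may be comma-separated at multi-allelic sites:

```python
def allele_frequencies(vcf_line):
    """Return (chrom, pos, ref, [(alt, freq), ...]) for one VCF data line.

    Assumption: freq = AO / (RO + sum of all AO values at the site).
    """
    fields = vcf_line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    # INFO is the 8th column, e.g. "AO=17;RO=4547;TYPE=del"
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    ao = [int(x) for x in info["AO"].split(",")]  # one count per alternate
    ro = int(info["RO"])
    total = ro + sum(ao)
    freqs = [(alt, (n / total) if total else 0.0) for alt, n in zip(alts, ao)]
    return chrom, pos, ref, freqs


def frequencies_from_stream(lines):
    """Yield allele_frequencies() for each data line, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield allele_frequencies(line)


# Two of the records shown above, used as sample input.
sample = [
    "chr20_bit 196 . AT A 0 . AO=17;RO=4547;TYPE=del",
    "chr20_bit 212 . C G 0 . AO=9;RO=5022;TYPE=snp",
]
for chrom, pos, ref, freqs in frequencies_from_stream(sample):
    for alt, freq in freqs:
        print(f"{chrom}\t{pos}\t{ref}\t{alt}\t{freq:.5f}")
```

To run it on real output, pass sys.stdin to frequencies_from_stream() and pipe the freebayes | vcfkeepinfo command into the script. Observations that failed the input filters are not counted in AO or RO, so treat the result as an approximation.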
In some cases, you may find that opening up the filters will slow runtime dramatically by increasing the computational complexity of the analysis. To avoid this increase, add --report-genotype-likelihood-max.
Additionally, you can use the --max-complex-gap setting to call haplotypes spanning alleles that are separated by no more than a given number of base pairs. In noisy data, this can help to improve specificity, since in your pooled context you won't be able to rely on the Bayesian model to exclude artifacts.
Notes:
- vcfkeepinfo is in vcflib/ in the root of the freebayes source tree.
- The old "pooled" behavior is retained as --pooled-discrete, and can be mixed with --pooled-continuous.
Hope this helps!
Best,
Erik