Reference alleles in VCF files do not match reference genome


Jack Boyle

Feb 12, 2019, 1:08:26 PM
to Stacks
Hello,

I am calling SNPs from GBS data using ref_map.pl and populations.

When I output a VCF file from populations, the "reference" (REF) allele does not always match the reference sequence I provided: in my data set, about 20% of SNPs have the REF allele switched.

How does Stacks decide which allele is the reference allele when writing VCF files?

Thanks for your help!

Jack

PS: the populations call:

populations \
  --in_path $stacks_files_directory \
  --popmap $population_map \
  --min_maf 0.02 \
  --max_obs_het 0.60 \
  --write_random_snp \
  --ordered_export \
  -r 0.8 \
  --genepop \
  --vcf
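One way to measure the mismatch rate described above is to compare each VCF record's REF column with the reference base at the same position. The sketch below is illustrative only: it assumes a hypothetical tab-separated table (CHROM, POS, BASE) holding the reference base for each VCF site, extracted beforehand (for example with samtools faidx), and the function name count_ref_mismatches is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report how many VCF records have a REF allele that differs from the
# reference genome base at the same position.
# ASSUMPTION: ref_table is a hypothetical tab-separated file CHROM<TAB>POS<TAB>BASE
# with one line per VCF site, extracted beforehand (e.g. with samtools faidx).

count_ref_mismatches() {
  local vcf="$1" ref_table="$2"
  awk -F'\t' '
    # First file (ref_table): remember the reference base for each CHROM:POS.
    FILENAME == ARGV[1] { ref[$1 ":" $2] = toupper($3); next }
    # Second file (VCF): skip meta-information and header lines.
    /^#/ { next }
    {
      key = $1 ":" $2
      if (!(key in ref)) next            # site missing from the table: ignore
      total++
      if (toupper($4) != ref[key]) mismatched++
    }
    END { printf "%d of %d sites have REF != reference base\n", mismatched, total }
  ' "$ref_table" "$vcf"
}

# Example:
#   printf '1\t100\tA\n1\t200\tC\n' > ref_bases.tsv
#   printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t200\t.\tT\tC\n' > demo.vcf
#   count_ref_mismatches demo.vcf ref_bases.tsv
#   # prints: 1 of 2 sites have REF != reference base
```

Sites absent from the table are skipped rather than counted, so the reported total only covers positions where a reference base was available.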

Catchen, Julian

Feb 14, 2019, 9:41:04 AM
to stacks...@googlegroups.com, Jack Boyle
Hi Jack,

Stacks does not use the reference genome for SNP calling, only for
ordering the loci. All SNP calls are based only on the RAD data
itself, and the allele that ends up as the 'REF' allele in the exported
VCF is (I think) the allele with the highest frequency in the
populations at that position.

We designed it this way because what many people use as their reference
is not a high-quality sequence but a draft, or even something built
from previous RAD data or another source.
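The majority-allele rule described above (hedged with "I think" in the reply) can be illustrated with a small sketch that picks, at each site, the allele observed most often across genotyped samples. This is not Stacks code: the input format (genotypes written as bases rather than the VCF's 0/1 allele indices), the function name major_allele_per_site, and the tie-breaking (the first allele to reach the top count wins) are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the majority-allele rule: at each site, choose as REF the allele
# observed most often across all genotyped samples.
# ASSUMPTIONS (hypothetical, not Stacks code): input lines are
# CHROM<TAB>POS<TAB>GT1<TAB>GT2..., with genotypes written as bases (A/G, G/G)
# and ./. for a missing call. Ties go to the allele that reached the top count first.

major_allele_per_site() {
  awk -F'\t' '
    {
      split("", count)                   # reset allele counts for this site
      best = ""; best_n = 0
      for (i = 3; i <= NF; i++) {        # genotype columns start at field 3
        n = split($i, gt, "/")
        for (j = 1; j <= n; j++) {
          a = gt[j]
          if (a == ".") continue         # skip missing alleles
          count[a]++
          if (count[a] > best_n) { best = a; best_n = count[a] }
        }
      }
      print $1 "\t" $2 "\t" best
    }
  '
}

# Example: even if the genome carried A at 1:100, G is the majority allele
# there, so this rule would report G as REF.
#   printf '1\t100\tA/G\tG/G\tA/G\n1\t200\tC/C\tC/T\t./.\n' | major_allele_per_site
#   # prints (tab-separated):
#   # 1  100  G
#   # 1  200  C
```

Under a rule like this, any site where the genome carries the minor allele in the sampled populations would show a "switched" REF, which is consistent with the kind of mismatch reported in the question.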

julian

Jack Boyle wrote on 2/12/19 12:08 PM: