Hello,
I've aligned short-read (RAD-capture) data to a reference genome and run gstacks on the alignments, and now I'm trying to get a single SNP per catalog locus. When I use the "--write-single-snp" option in populations, my populations.snps.vcf still has many SNPs that are very close together, within tens or hundreds of bp of each other. Also, sites aren't always written out in ascending order, for example:
OmyA_01 1386933
OmyA_01 1386890
...
OmyA_01 8564993
OmyA_01 8564862
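Cases like these can be flagged with a short script. Below is a minimal sketch in Python, assuming a standard tab-delimited VCF with the chromosome in column 1 and the position in column 2. The function name flag_close_or_unordered and the 500 bp default threshold are hypothetical choices for illustration, not anything provided by Stacks.

```python
def flag_close_or_unordered(vcf_lines, max_gap=500):
    """Flag consecutive VCF records on the same chromosome that are either
    out of ascending order or within max_gap bp of each other.

    Returns a list of (chrom, previous_pos, pos, kind) tuples, where kind
    is "descending" or "close". max_gap is an arbitrary illustrative value.
    """
    flagged = []
    prev_chrom, prev_pos = None, None
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom == prev_chrom:
            if pos < prev_pos:
                flagged.append((chrom, prev_pos, pos, "descending"))
            elif pos - prev_pos <= max_gap:
                flagged.append((chrom, prev_pos, pos, "close"))
        prev_chrom, prev_pos = chrom, pos
    return flagged
```

Passing an open file handle for populations.snps.vcf as vcf_lines would return every adjacent pair of records that is out of order or closer together than max_gap.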
Anyway, the populations help page describes the --write-single-snp option as "restrict data analysis to only the first SNP per locus", so I have two questions:
1) Will running
populations --vcf --write-single-snp
produce a VCF file that contains only one SNP per catalog locus (as opposed to just restricting the summary statistics to a single SNP per locus)? And related:
2) How can I tell which genomic coordinates are covered by each Stacks catalog locus? I can't find this information in any of the catalog.* files.
This is also confusing because I'd like to use the populations.haps.vcf file of linked SNPs. Although each haplotype allele in that file is made up of multiple sites, only a single position is given per locus. I was trying to make sense of all the positions by cross-referencing the populations.snps.vcf and populations.haps.vcf files...
Thanks in advance for your help!
Jared