Dear all,
I am using STACKS to analyze GBS data on Sorghum for which we have a reference genome. I went through the pipeline to call SNPs (demultiplexing, mapping with bowtie, ref_map.pl and Populations program). Everything went well, but I observed in my output VCF file that duplicated SNP were present. At the same genomic position, the same SNP can be present twice and have the same alleles. These data seem to come from the same read that have been sequenced in both forward and reverse sense. As a result of that, I have data for one SNP for some given individuals and some data for the other duplicated SNP for some other individuals.
Here is an example from my VCF file:
#CHROM |
POS |
ID |
REF |
ALT |
QUAL |
FILTER |
INFO |
Sb01 |
27283 |
144413 |
C |
G |
. |
PASS |
NS=35;AF=0.786,0.214 |
Sb01 |
27283 |
144398 |
C |
G |
. |
PASS |
NS=43;AF=0.721,0.279 |
I am wondering why STACKS pipeline did not merged these duplicated SNP? Furthermore, I would like to merge them, to conserve only one SNP when necessary in order to merge information of depth and through the different individuals (i.e. I have in total 62 individuals, in the example above I have SNP data for 35 individuals for the first SNP and for 43 individuals for the second SNP).
Is someone has already meet this problem? Am I using STACKS the wrong way? Please, is anyone can point me out to a solution?
Many thanks,
Baptiste
--
Stacks website: http://creskolab.uoregon.edu/stacks/
---
You received this message because you are subscribed to a topic in the Google Groups "Stacks" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stacks-users/Ag8YyEFe7z0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stacks-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
$ cat GATKplusStacks_1.map | awk '{print $1 "_" $4}' | uniq -c | grep ^" 2" | head
# 2 Scaffold0_188976
# 2 Scaffold0_189003
# 2 Scaffold0_189006
# 2 Scaffold0_1337593
# 2 Scaffold0_1694204
# 2 Scaffold0_1805958
# 2 Scaffold0_1805978
# 2 Scaffold1_213295
# 2 Scaffold1_213362
# 2 Scaffold1_364081
$ cat GATKplusStacks_1.map | awk '{print $1 "_" $4}' | uniq -c | grep ^" 2" | wc -l
#1185
This means that merging does not seem to be doing what I had hoped, even in the VCF output. I tried putting both enzymes into this command and this doesn't work, so I'm not sure how to get these loci to merge.