Dear list,
I would like to extract the SNPs from a VCF file that are located in CDS regions. The CDS positions are specified in an external GFF3 file. I considered using intersectBed to perform this step. However, with the command below, the same SNP can be reported multiple times. When I add the -u option, there are no duplicates in the output VCF anymore, but I do not clearly see why intersectBed behaves this way without -u, since there are no true "intervals" in my VCF, only single-base variant positions.
bedtools intersect \
-a input.vcf \
-b cc.gff3 \
-header > CDS.vcf
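In case it helps to reproduce the issue, here is a minimal sketch of one way to list the repeated records in the output. The helper name dup_records is hypothetical, and the sketch assumes that a record is identified by its CHROM, POS, REF and ALT columns (columns 1, 2, 4 and 5 of a tab-delimited VCF).

```shell
# Hypothetical helper: print each variant that occurs more than once in a VCF.
# Assumption: a variant is identified by CHROM, POS, REF and ALT.
dup_records() {
    # Skip header lines (starting with '#'), keep the four key columns,
    # then let `uniq -d` print one copy of every repeated key.
    awk -F'\t' '!/^#/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' "$1" | sort | uniq -d
}

# Usage: dup_records CDS.vcf
```

An empty result would mean that no variant is reported more than once.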
Could you explain why intersectBed outputs redundant SNPs when using the basic intersection shown in the command above?
Thank you very much for your insights,
Sincerely,
Rémi