Hi Eric and Thierry,
As you mention, the changes are now quite old, but we don't want to
break scripts. After discussing it with Nicolas, I think we could modify
the ID field. Part of the change was to better record the alignment
positions in ref-based data. Previously, the first three columns of the
VCF in a de novo analysis looked like:
un 4123284 cloc_col0
Where the chromosome was always 'un' for unknown, the basepair was the
running length of all RAD loci (so not meaningful), and the ID was the
catalog locus and column of the SNP (where 'col' was zero-based, which is
how it is tracked internally in the software).
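
For scripts that still read the old de novo ID, a minimal sketch of the conversion, assuming 'cloc' and 'col0' are plain integers joined by an underscore (e.g. '12_34'). The function name is hypothetical and not part of Stacks:

```python
def parse_old_denovo_id(vcf_id):
    """Split an old-style ID such as '12_34' into (locus, 1-based column).

    Assumption: the ID is '<catalog locus>_<zero-based column>', both integers.
    """
    locus, col0 = vcf_id.split("_")
    # The old ID stored a zero-based column; add 1 to get the 1-based position.
    return int(locus), int(col0) + 1


print(parse_old_denovo_id("12_34"))  # (12, 35)
```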
The change for de novo was:
cloc col1 .
Where the catalog locus is now the 'chromosome', and the position is the
1-based column of the SNP (standards compliant).
Ref-map went to:
chrX 241663 cloc:col1:clocstrand
Where the chromosome and basepair are the alignment position, and the
alignment strand has been added to the ID.
So, we could change it for de novo:
cloc col1 cloc:col1
and ref-based to
chrX 241663 cloc:col1:strand
So, still slightly different, but should just be a small change to a script.
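
As a rough sketch of what that small script change might look like, assuming the proposed ID is colon-separated with integer 'cloc' and 'col1', and an optional third field holding the strand for ref-based data. The function name is hypothetical:

```python
def parse_proposed_id(vcf_id):
    """Parse 'cloc:col1' (de novo) or 'cloc:col1:strand' (ref-based).

    Assumption: cloc and col1 are integers; strand, when present, is kept
    as the raw string from the ID. Returns (locus, col1, strand_or_None).
    """
    parts = vcf_id.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected ID format: {vcf_id!r}")
    locus = int(parts[0])
    col1 = int(parts[1])  # 1-based column of the SNP within the locus
    strand = parts[2] if len(parts) == 3 else None
    return locus, col1, strand


print(parse_proposed_id("12:35"))    # (12, 35, None)
print(parse_proposed_id("12:35:+"))  # (12, 35, '+')
```

With this layout a single parser covers both analysis types, and the strand is simply absent for de novo data.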
What do you guys think?
julian
'Thierry Gosselin' via Stacks wrote on 2/6/20 10:13 AM:
> Salut Eric
>
> Nicolas and Julian made changes to the first 3 Stacks VCF columns
> (CHROM, POS, ID) starting in version 1.42 (Aug 05, 2016);
> the biggest changes then came in version 2 Beta 7 (Dec 29, 2017) and
> Beta 10 (Apr 10, 2018).
>
> All this to make the VCF more standards compliant... It's now similar to
> VCFs produced by: bcftools, freeBayes, GATK, ipyrad, platypus, etc.
>
> For /de novo/, the CHROM column contains the LOCUS info, and the
> zero-based position of the SNP on the read is POS minus 1.
>
> I'm reading all these versions and VCFs in stackr and it's a lot of if
> ... else...
>
> Best
> Thierry
>