VCF ID column empty in populations output (STACKS 2.5)

Eric Normandeau

Feb 5, 2020, 3:59:26 PM
to Stacks
I'm using STACKS 2.5 and finding that my filtering and summarizing scripts no longer work. Digging into the problem, I find that the VCF format seems to have changed. Namely, the ID column now contains only dots, whereas it used to contain useful information about the locus number and position.

I'm not seeing this change listed in the change log. Was this planned?

I'd strongly advocate putting this information back in the ID column, for backwards compatibility, for all of us who have scripts that parse the VCFs from STACKS to extract meaningful information.

Take care,

Eric

Thierry Gosselin

Feb 6, 2020, 11:14:42 AM
to Stacks
Hi Eric

Nicolas and Julian made changes to the first three Stacks VCF columns (CHROM, POS, ID) starting in version 1.42 (Aug 05, 2016),
then the biggest changes came in version 2 Beta 7 (Dec 29, 2017) and Beta 10 (Apr 10, 2018).

All this was to make the VCF more standards-compliant... It's now similar to VCFs produced by: bcftools, freeBayes, GATK, ipyrad, platypus, etc.

For de novo, the CHROM column contains the LOCUS info, and the zero-based position of the SNP on the read is POS minus 1.

I'm reading all these versions and VCFs in radiator and it's a lot of if ... else... 

Best
Thierry

Julian Catchen

Feb 6, 2020, 11:28:01 AM
to 'Thierry Gosselin' via Stacks, eric.norm...@gmail.com
Hi Eric and Thierry,

As you mention, the changes are now quite old, but we don't want to
break scripts. After discussing it with Nicolas, I think we could modify
the ID field. Part of the change was to better record the alignment
positions in ref-based data. Previously, the first three columns of the
VCF in a de novo analysis looked like:

un 4123284 cloc_col0

Where the chromosome was always 'un' for unknown, the basepair was the
running length of all RAD loci, so not meaningful, and the ID held the
catalog locus and column of the SNP (where 'col' was zero-based, which is
how it is tracked internally in the software).

The change for de novo was:

cloc col1 .

Where the catalog locus is now the 'chromosome', and the position is the
1-based column of the SNP (standards compliant).
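
For scripts that still expect the old identifier, here is a minimal Python sketch of rebuilding it from the current de novo layout. It assumes the old ID was literally the catalog locus, an underscore, and the zero-based column, as described above; the function name and the example values are hypothetical.

```python
def legacy_id_from_denovo(vcf_line):
    """Rebuild an old-style 'cloc_col0' ID from a current de novo VCF record.

    Assumes CHROM holds the catalog locus and POS is the 1-based column
    of the SNP, as described in this thread.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    catalog_locus = fields[0]
    one_based_column = int(fields[1])
    # The old ID used a zero-based column, so subtract 1 from POS.
    return f"{catalog_locus}_{one_based_column - 1}"
```

Header lines starting with '#' would need to be skipped before calling this on each record.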

Ref-map went to:

chrX 241663 cloc:col1:clocstrand

Where the chromosome and basepair give the alignment position, and the
alignment strand has been added to the ID.

So, we could change it for de novo:

cloc col1 cloc:col1

and ref-based to

chrX 241663 cloc:col1:strand

So, still slightly different, but should just be a small change to a script.
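
As a rough illustration of how small that script change could be, the Python sketch below accepts the old ID, the current '.' ID, and the proposed colon-separated IDs. It is only a sketch under the layouts listed in this message: the function name is hypothetical, the '+'/'-' strand symbols in the example are an assumption, and a '.' ID is treated as de novo output.

```python
def locus_and_column(chrom, pos, vcf_id):
    """Return (catalog_locus, one_based_column, strand_or_None).

    Handles the three ID styles discussed in this thread; anything
    else is outside the scope of this sketch.
    """
    if vcf_id == ".":
        # Current de novo layout: CHROM is the locus, POS the 1-based column.
        return chrom, int(pos), None
    if ":" in vcf_id:
        # Colon-separated layout: cloc:col1 or cloc:col1:strand.
        parts = vcf_id.split(":")
        strand = parts[2] if len(parts) > 2 else None
        return parts[0], int(parts[1]), strand
    # Old layout: cloc_col0, with a zero-based column.
    catalog_locus, zero_based_column = vcf_id.rsplit("_", 1)
    return catalog_locus, int(zero_based_column) + 1, None
```

For example, `locus_and_column("un", "4123284", "1234_56")` and `locus_and_column("1234", "57", "1234:57")` both give locus 1234 at 1-based column 57.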

What do you guys think?

julian

Eric Normandeau

Feb 6, 2020, 11:44:35 AM
to Julian Catchen, stacks...@googlegroups.com
Hi,

I like any version that has the same info in column 3 for reference and
denovo (minus the strand potentially).

Is there a reason why "un" is not good for the Chromosome? I'd probably
go with:

un  whatever  cloc:col1

But this is also good:

whatever  whatever cloc:col1

Thanks Julian and Nicolas for all the work on STACKS. Whatever you
decide, including the status quo, I'll just adjust my scripts :)

Take care,

Eric

Thierry Gosselin

Feb 6, 2020, 11:46:45 AM
to Stacks
I like the way it is for both de novo and reference-based!!
🤔
But then it might be because I don't want to add another if ... else... based on stacks version >2.6 😜

All kidding aside, I would try to follow the latest VCF spec (4.3?) and go for what's easiest for you guys.
I'll follow and adapt to whichever path is taken.

Eric Normandeau

Feb 11, 2020, 8:47:47 AM
to Stacks
I modified my script, so if no one else needs the change, feel free to discard my suggestion!