Unable to parse whitelist

Michael Crossley

Nov 5, 2015, 5:28:34 PM
to Stacks
I generated a vcf file with Stacks and did some post-processing in R to whittle things down to some well-behaved SNPs. Now I'd like to go back and generate other file types with this group of SNPs. I tried generating a whitelist, but ran into a problem when I ran populations. I've looked over all the previous conversations about whitelists and am still stumped about what's going wrong.

I got the following error:
Unable to parse whitelist, './b32_mark3_whitelist.tsv' at line 1

when running the following code:
populations -b 32 -P ./batch32 -M /home/crossley/stacks_analysis/popmaps/b32_popmap.txt -W ./b32_mark3_whitelist.tsv -t 36 --structure --genepop --phase --plink --beagle_phased

My whitelist looks like this:
39624    54670
39625    54670
129051    54192
237967    54403
238003    54403
325909    54569
325915    54569
325918    54569
...

Column 1 contains the position ("POS" in the vcf output of a previous Stacks run), and column 2 contains the ID ("ID" in the vcf output). I am a bit confused about which of these numbers is the "locus ID" mentioned in the manual, since catalog ID and locus ID appear to be two different things (at least in the sstacks *matches.tsv output files). If the numbers in the vcf output aren't the right ones to use, how can I identify the SNPs that made it through post-processing in the catalog?

Thanks,

Michael Crossley

Nov 6, 2015, 9:18:59 AM
to Stacks
I am using Stacks version 1.24

Julian Catchen

Nov 6, 2015, 10:53:25 AM
to stacks...@googlegroups.com, mcros...@gmail.com
A whitelist is relative to the assembled loci in the population (each of
those is referred to as a catalog locus, which is identified by its
catalog ID), not to a reference position in a genome, since Stacks is
designed to work both with and without a reference genome.

See the manual for examples:

http://catchenlab.life.illinois.edu/stacks/manual/#wl
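Before re-running populations, it can help to check each line of the whitelist against the format the manual describes: one catalog locus ID per line, optionally followed by a tab and a SNP column. The sketch below is a hypothetical helper, not part of Stacks; treating only a TAB as the separator is an assumption based on the file being described as tab-separated. It checks only the shape of each line and cannot tell whether the numbers are catalog IDs, so a POS/ID file like the one above would pass if it is tab-separated. It does flag header rows and quoted values; R's `write.table`, for instance, writes column names, row names and quotes by default.

```python
import re

# ASSUMED whitelist format (per the Stacks manual's description): a catalog
# locus ID, optionally followed by a TAB and the SNP column within that locus.
WHITELIST_LINE = re.compile(r"^\d+(\t\d+)?$")


def check_whitelist(lines):
    """Return (line_number, text) for each line that does not fit the assumed
    format. Line numbers are 1-based, like the populations error message."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")  # tolerate both Unix and Windows line endings
        if not WHITELIST_LINE.match(text):
            problems.append((number, text))
    return problems
```

For example, passing an open handle on `b32_mark3_whitelist.tsv` to `check_whitelist` returns an empty list when every line fits the assumed format, and otherwise lists the offending lines so a header or space-separated row at line 1 stands out.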

Michael Crossley

Nov 6, 2015, 11:23:22 AM
to Stacks, mcros...@gmail.com, jcat...@illinois.edu
Is there any field in common between the catalog and vcf output that I could use to identify desired SNPs and create a proper whitelist?


Julian Catchen

Nov 6, 2015, 11:40:01 AM
to stacks...@googlegroups.com, mcros...@gmail.com
The batch_X.sumstats.tsv file has all the information you need. You can
grep the SNPs out of sumstats based on your list compiled from the VCF,
then you can easily cut the locus ID/SNP column out of the file to make
the whitelist. See the definition of the format here:

http://catchenlab.life.illinois.edu/stacks/manual/#files
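The grep-and-cut approach can also be sketched as a small join in Python. This assumes, as reported later in this thread, that the VCF ID field holds the catalog locus ID and VCF POS matches the sumstats BP column. The 0-based sumstats column indices (Locus ID = 1, BP = 3, Col = 4) are an assumed Stacks 1.x layout and should be checked against the header of your own file. All function names are hypothetical.

```python
def vcf_snps(vcf_lines):
    """Collect (locus_id, pos) pairs from VCF data lines.

    ASSUMES the VCF ID field is the catalog locus ID and POS matches the
    BP column of batch_X.sumstats.tsv (as reported in this thread)."""
    snps = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta lines and the #CHROM header
        fields = line.rstrip("\r\n").split("\t")
        snps.add((fields[2], fields[1]))  # (ID, POS)
    return snps


def whitelist_from_sumstats(sumstats_lines, wanted,
                            locus_col=1, bp_col=3, snp_col=4):
    """Return sorted (locus_id, snp_column) pairs for SNPs listed in `wanted`.

    Column indices are ASSUMED (0-based) for a Stacks 1.x sumstats file:
    Batch ID, Locus ID, Chr, BP, Col, ...  Adjust them to your header."""
    entries = set()  # a set collapses the one-row-per-population duplicates
    for line in sumstats_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\r\n").split("\t")
        if (fields[locus_col], fields[bp_col]) in wanted:
            entries.add((int(fields[locus_col]), int(fields[snp_col])))
    return sorted(entries)


def format_whitelist(entries):
    # One "locus<TAB>column" pair per line, with no header, row names or quotes.
    return "".join(f"{locus_id}\t{column}\n" for locus_id, column in entries)
```

In use, you would read your filtered VCF with `vcf_snps`, pass the result together with `batch_32.sumstats.tsv` to `whitelist_from_sumstats`, and write the string from `format_whitelist` to the file given to populations with `-W` (file names here are placeholders). If you only want to keep whole loci, writing just the first value of each pair would give a locus-only whitelist.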

Michael Crossley

Nov 11, 2015, 1:34:49 PM
to Stacks, mcros...@gmail.com, jcat...@illinois.edu
I am still unsure exactly which column in the *.vcf file corresponds to which column in the *sumstats.tsv file.
The *.vcf file has the headings Chrom, Pos, ID, while the *sumstats.tsv file has the headings Locus ID, Chromosome, Basepair, Column.
Does ID match Locus ID?
Does Pos match Basepair?


Michael Crossley

Nov 11, 2015, 2:32:07 PM
to Stacks, mcros...@gmail.com, jcat...@illinois.edu
I've also noticed that the *sumstats.tsv file has many fewer SNPs than the *.vcf file:
Number of entries in *sumstats.tsv = 14813
Number of unique Locus IDs in *sumstats.tsv = 930
Number of SNPs in *.vcf = 53,844 (and that's after my own filtering on top of the Stacks filtering)

Why might there be so few entries in the *sumstats.tsv file?
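When comparing these counts, note that a Stacks 1.x sumstats file has one row per SNP per population (the Pop ID column), so its row count is not a SNP count. The sketch below counts data rows, distinct loci, distinct (locus, column) SNPs and populations in a sumstats file, plus the records in a VCF, so that like is compared with like. The 0-based column indices (Locus ID = 1, Col = 4, Pop ID = 5) are an assumed Stacks 1.x layout, and the function names are hypothetical. It only makes the comparison explicit; it does not by itself explain the gap.

```python
def sumstats_counts(sumstats_lines, locus_col=1, snp_col=4, pop_col=5):
    """Count data rows, distinct loci, distinct SNPs and populations.

    Column indices are ASSUMED (0-based) for a Stacks 1.x sumstats file:
    Batch ID, Locus ID, Chr, BP, Col, Pop ID, ..."""
    rows = 0
    loci, snps, pops = set(), set(), set()
    for line in sumstats_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\r\n").split("\t")
        rows += 1
        loci.add(fields[locus_col])
        snps.add((fields[locus_col], fields[snp_col]))  # one SNP = locus + column
        pops.add(fields[pop_col])
    return {"rows": rows, "loci": len(loci),
            "snps": len(snps), "populations": len(pops)}


def vcf_record_count(vcf_lines):
    # Each non-blank line that does not start with "#" is one VCF record.
    return sum(1 for line in vcf_lines
               if line.strip() and not line.startswith("#"))
```

The "snps" value is the figure to set against the VCF record count; "rows" divided by "snps" gives roughly how many populations each SNP appears in.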


Tony Kess

Jul 12, 2016, 4:12:16 PM
to Stacks, mcros...@gmail.com, jcat...@illinois.edu
Hi Michael,

I came across this question while working on a similar problem - ID and POS in VCF match Catalog Locus and BP in the sumstats file. Did you find a way to build a working whitelist with this information?

Michael Crossley

Jul 12, 2016, 4:43:58 PM
to Tony Kess, Stacks, jcat...@illinois.edu
Hi Tony, this was so long ago I honestly don't remember how things turned out. I think I was interested in making a whitelist because I wanted to use Stacks to create certain file formats. But I think I ultimately figured out how to write those file formats with my own code, so no worries.

-Mike
--
Michael Crossley
-a well loved son-
Graduate Research Assistant
UW-Madison, Entomology
1630 Linden Dr. #637
Madison, WI 53706  
(608) 316-5120
mcro...@wisc.edu