Working with haplotypes, not only with biallelic loci

Sab Le Cam

Sep 21, 2015, 8:48:54 AM

to Stacks

Hello,

I have been using Stacks (version 1.32) to analyse ddRADseq data (300 samples from 20 populations).
Following the usual approach, I built a catalogue of loci and genotyped my samples at these loci using the de novo modules (m=10, M=2, n=2, max 2 stacks per locus):
ustacks -> cstacks -> sstacks -> rxstacks (filtering out loci with poor log-likelihood scores) -> cstacks and sstacks again.

Until now I have been using populations (whitelist, -p 16, -r 0.5) to construct a reliable dataset for downstream population genetics analyses, exporting one random SNP per locus (--write_random_snp) in VCF format.

The thing is, populations filters out loci with more than 2 alleles, and I feel I might be missing important, meaningful genetic diversity information by considering only biallelic loci (or at least I'd like to check whether I am).

I tried to export haplotypes using --vcf_haplotypes in populations. It didn't work (error message: populations: unrecognized option '--vcf_haplotypes'), but even if it had, only biallelic haplotypes would have been exported, is that right?

Is there a way to export genotypes of multi-allelic loci from Stacks in VCF or PLINK format?
(I am not using the Stacks web interface, so export_sql is not an option for me.)

Thanks

Cheers

Sabrina

Julian Catchen

Sep 21, 2015, 9:30:57 AM

to stacks...@googlegroups.com, sable...@gmail.com

Hi Sabrina,

You need to upgrade to the latest version of Stacks. There is a bug in
version 1.32 that causes excessive filtering. You don't need to re-run
the pipeline, just re-run populations once you have upgraded.

Regarding VCF haplotypes, you can only have two haplotypes per
individual biologically, but you can have multiple haplotypes at that
locus across the population, each individual still being biallelic. Let
me know if you still have problems with the option after upgrading.

Best,

julian
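To make the distinction above concrete, here is a minimal Python sketch. It is not part of Stacks, and the sample names and haplotype strings are invented for illustration. It assumes one RAD locus carrying two SNPs, so each haplotype is written as a two-letter string. Every individual carries at most two haplotypes, yet the population as a whole can hold more than two.

```python
from collections import Counter

# Hypothetical genotypes at one RAD locus with two SNPs; each individual
# carries exactly two haplotypes (one per chromosome copy).
genotypes = {
    "ind1": ("AC", "AT"),
    "ind2": ("GC", "GC"),
    "ind3": ("AT", "GC"),
}


def haplotype_counts(genotypes):
    """Count how often each haplotype occurs across all individuals."""
    counts = Counter()
    for hap_pair in genotypes.values():
        counts.update(hap_pair)
    return counts


counts = haplotype_counts(genotypes)

# Largest number of distinct haplotypes carried by any single individual.
max_per_individual = max(len(set(pair)) for pair in genotypes.values())

print("distinct haplotypes in the population:", len(counts))
print("max distinct haplotypes per individual:", max_per_individual)
```

In this toy data set the locus is multi-allelic at the population level even though no individual carries more than two haplotypes.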

Sab Le Cam

Sep 21, 2015, 10:25:29 AM

to Stacks, sable...@gmail.com, jcat...@illinois.edu

Hi Julian,

Thanks for responding so fast!

Sorry I wasn't very clear in my first message: by biallelic loci I meant loci that are biallelic at the population level.

I'm not sure what you mean by "excessive filtering".
Is it the reason --vcf_haplotypes is not working, or the reason only biallelic SNPs (at the population level) were exported when running populations with the --write_random_snp option?

With the --write_random_snp option, according to the population.log files, it seemed that loci with more than 2 alleles at the population level were filtered out, but I thought this was a normal feature of populations when dealing with SNPs:

Generating nucleotide-level summary statistics for population '1'
Population '1' contained 4 incompatible loci -- more than two alleles present.

To sum up, does this mean I should re-run populations with the upgraded version of Stacks for both options, --write_random_snp and --vcf_haplotypes?

Best,

Sabrina

Julian Catchen

Sep 22, 2015, 2:30:27 PM

to Sab Le Cam, Stacks

Hi Sabrina,

You should not use the --write_random_snp and --vcf_haplotypes options in the same run. The random SNP option prunes all the other SNPs at a locus, which removes the RAD locus haplotypes that you would otherwise get with the --vcf_haplotypes option.

Yes, for SNPs, the system will only process sites that have two alleles across the population at that site. However, the haplotypes produced by Stacks are created when a RAD locus has more than one SNP present, allowing multiallelic RAD loci. If you turn on the --write_random_snp option then all those excess SNPs are pruned and everything collapses back down to biallelic SNPs.

Best,

julian
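The effect described in this reply can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stacks code: the haplotype strings are made up, and keep_single_snp is a hypothetical helper that only mimics the idea of retaining one SNP per locus. Each SNP site on its own is biallelic, the combined locus has three haplotypes, and dropping all but one SNP collapses the locus back to two alleles.

```python
# Hypothetical population haplotypes at one RAD locus carrying two SNPs.
haplotypes = ["AC", "AT", "GC", "GC", "AT", "GC"]


def alleles_per_site(haplotypes):
    """Return the set of alleles observed at each SNP position."""
    return [set(column) for column in zip(*haplotypes)]


def keep_single_snp(haplotypes, site):
    """Mimic keeping only one SNP per locus by dropping all other sites."""
    return [hap[site] for hap in haplotypes]


sites = alleles_per_site(haplotypes)
locus_alleles = set(haplotypes)
pruned_alleles = set(keep_single_snp(haplotypes, 0))

print("alleles per SNP site:", [sorted(s) for s in sites])
print("haplotypes at the locus:", sorted(locus_alleles))
print("alleles left after keeping one SNP:", sorted(pruned_alleles))
```

This is why the two export modes answer different questions: the haplotype view retains the extra diversity created by linked SNPs, while the single-SNP view discards it by construction.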

Sab Le Cam

Sep 25, 2015, 5:59:14 AM

to Stacks, sable...@gmail.com, jcat...@illinois.edu

Hello Julian,

The --vcf_haplotypes option works fine with Stacks 1.35.

Thanks again

Sabrina
