populations -P $denovo -M $popmap/argus_full_trimmed3 -O $output/argus_assays_run1 -r 0.8 -p 13
cat populations.sumstats.tsv | grep -v "^#" | cut -f 1 | uniq -c | sort | awk '$1 == 13' | sed 's/........//' | sort -n > tags_single_snp
populations -P $denovo -M $popmap/argus_full_trimmed3 -O $output/argus_assays_run1/white_list_loci \
-W $output/argus_assays_run1/tags_single_snp --genepop --fasta-loci --fasta-samples --vcf

Hi Andy,
The -r and -p filters are implemented per nucleotide – the program first filters per locus and then filters individual SNPs within loci.
I can think of a few ways your code could fail. One example: a locus has one SNP found in 6 populations and a second SNP found in 7 different populations. The rows add up to 13, so the locus passes your filter even though neither SNP is present in all 13 populations.
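To make that failure mode concrete, here is a small sketch on made-up data. It assumes the same layout the pipelines in this thread rely on (column 1 = locus ID, column 4 = SNP column); the locus ID 500, the SNP columns 17 and 42, and the population names are all invented for illustration.

```shell
# Toy sumstats-like table (all values are made up): locus 500 has one SNP
# in 6 populations and a second SNP in 7 other populations.
toy=$(mktemp)
{
  printf '# Locus ID\tChr\tBP\tCol\tPop ID\n'
  for p in 1 2 3 4 5 6; do printf '500\tun\t0\t17\tpop%s\n' "$p"; done
  for p in 7 8 9 10 11 12 13; do printf '500\tun\t0\t42\tpop%s\n' "$p"; done
} > "$toy"

# Counting rows per locus only: the 6 + 7 rows sum to 13.
per_locus=$(grep -v "^#" "$toy" | cut -f 1 | sort -n | uniq -c | awk '$1 == 13 {print $2}')

# Counting rows per locus/SNP-column pair: the two SNPs are counted separately.
per_snp=$(grep -v "^#" "$toy" | cut -f 1,4 | sort -n | uniq -c | awk '$1 == 13 {print $2}')

echo "per-locus filter keeps: ${per_locus:-nothing}"
echo "per-SNP filter keeps:   ${per_snp:-nothing}"
rm -f "$toy"
```

The per-locus count lets locus 500 through, while the per-SNP count rejects it.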
You can just be a bit fancier with your UNIX to get a more robust list. Instead of only filtering on the locus ID, combine the locus ID with the SNP column to get one entry per SNP per locus, like this:
cat populations.sumstats.tsv | grep -v "^#" | cut -f 1,4 | sort -n | uniq -c
Which looks like:
2 261 189
2 261 195
2 261 198
9 262 348
7 263 155
7 263 304
7 263 335
7 263 388
7 263 6
2 264 182
2 264 240
2 264 289
2 264 301
And this represents: “the number of populations” “locus” “SNP-column”.
Then, pull out loci/SNP combos where all 13 populations are represented:
cat populations.sumstats.tsv | grep -v "^#" | cut -f 1,4 | sort -n | uniq -c | awk '{ if($1 == 13) print $2}' | sort -n | uniq -c
This will look like:
10 1045
7 1046
4 1047
1 1048
4 1049
6 1050
1 1051
7 1052
9 1053
This now tells you how many SNPs occur per locus, counting only SNPs found in all 13 populations. Then just pull out the loci with a ‘1’ in the first column; these are loci with exactly one SNP that is present in all 13 populations:
cat populations.sumstats.tsv | grep -v "^#" | cut -f 1,4 | sort -n | uniq -c | awk '{ if($1 == 13) print $2}' | sort -n | uniq -c | grep -E " +1 [0-9]+" | awk '{if ($1 == 1) print $2;}'
Afterwards, take the loci from this list and spot check a few (by grepping them back out of the sumstats file) to make sure everything worked.
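A possible way to do that spot check is sketched below. A plain grep for a number also matches other loci and other fields that contain the same digits, so this version compares column 1 exactly. `show_locus` is a hypothetical helper name, and column 5 holding the population ID is an assumption about the sumstats layout.

```shell
# Print every sumstats row for one locus, matching the ID exactly in column 1.
# (show_locus is a hypothetical helper, not part of Stacks.)
show_locus() {
  # $1 = locus ID, $2 = path to a sumstats file
  awk -F '\t' -v id="$1" '!/^#/ && $1 == id' "$2"
}

# Example, using one of the locus IDs from the listing above:
# show_locus 1048 populations.sumstats.tsv | cut -f 1,4,5
```

If the filter worked, the output for a retained locus should show a single SNP column repeated once per population.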
However, I’m not sure why you want to arbitrarily limit the data to loci with only a single SNP? You could just use --write-single-snp or --write-random-snp to get a wider breadth of loci more easily.
Best,
julian
--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
stacks-users...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/stacks-users/940394fb-0926-423d-90c9-ca9efe3610f8n%40googlegroups.com.
Julian
Thanks for getting back to me.
For the population genetics part of my project, I have been using the --write-single-snp filter to generate my dataset.
However, one of the aims of my project is to develop a low-density SNP panel for the target species. The assays will be run on the Fluidigm/Standard Biotools EP1 system, which is based on allele-specific PCR. To help with assay design, the recommendation is that there is no other SNP within 30 bp upstream or downstream of the target SNP. In the past, to make things easier, I have therefore chosen RADtags that contain only a single SNP, ideally one that is polymorphic in all my populations.
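The 30 bp rule could also be checked per SNP rather than by requiring one SNP per RADtag. The following is only a sketch under assumptions: `isolated_snps` is a hypothetical helper, it takes column 1 as the locus ID and column 4 as the SNP column (as in the pipelines earlier in this thread), and it only sees SNPs that are listed in the sumstats file. It says nothing about sequence beyond the ends of the locus, so a SNP close to either end still needs a manual check.

```shell
# isolated_snps FILE [MIN_DIST]
# Prints "locus<TAB>column" for every SNP with no other SNP on the same locus
# within MIN_DIST bases (default 30). Hypothetical helper, not part of Stacks.
isolated_snps() {
  grep -v "^#" "$1" | cut -f 1,4 | sort -u -k1,1n -k2,2n |
  awk -F '\t' -v d="${2:-30}" '
    function flush_prev() {
      # The held SNP is printed only if neither neighbour was too close.
      if (have && ok) print locus "\t" col
    }
    {
      if (have && $1 == locus) {
        near = ($2 - col <= d)   # current SNP within d bases of the previous one?
        if (near) ok = 0
        flush_prev()
        ok = !near               # the same check applies to the current SNP
      } else {
        flush_prev()             # new locus: no left-hand neighbour
        ok = 1
      }
      locus = $1; col = $2; have = 1
    }
    END { flush_prev() }'
}

# Example:
# isolated_snps populations.sumstats.tsv 30 > snps_no_neighbour_30bp.tsv
```

The list this produces can then be intersected with the SNPs found in all 13 populations.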
Having tried your code, it does indeed pull out SNPs that are found in all 13 populations. However, when I have double checked some of these loci, the RADtags also have multiple other SNPs that were polymorphic in <13 populations and are therefore not ideal for assay design.
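One way to exclude those RADtags would be to count every SNP column on a locus before applying the 13-population requirement, and keep a locus only if it has exactly one SNP column in total and that SNP reaches all 13 populations. This is a sketch, not a tested recipe: `single_snp_loci` is a hypothetical helper, and it assumes that every SNP called on a locus appears as rows in the sumstats file with the locus ID in column 1 and the SNP column in column 4.

```shell
# single_snp_loci FILE [NPOPS]
# Prints locus IDs that have exactly one SNP column in the whole file, where
# that SNP has NPOPS rows (default 13). Hypothetical helper, not part of Stacks.
single_snp_loci() {
  grep -v "^#" "$1" | cut -f 1,4 | sort -n | uniq -c |
  awk -v n="${2:-13}" '
    { snps[$2]++ }              # every SNP column on the locus, whatever its count
    $1 == n { full[$2]++ }      # SNP columns seen in all n populations
    END {
      for (l in snps)
        if (snps[l] == 1 && full[l] == 1) print l
    }' | sort -n
}

# Example:
# single_snp_loci populations.sumstats.tsv 13 > tags_single_snp
```

The difference from the earlier pipeline is only the order of operations: SNPs present in fewer than 13 populations still count against the locus instead of being dropped before the per-locus tally.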
I have the additional problem that my coverage (after removing PCR duplicates) isn’t great (average across all samples ~10x), so I suspect that some of the SNPs are actually sequencing errors.
Best Wishes
Andy
From: stacks...@googlegroups.com <stacks...@googlegroups.com> on behalf of Catchen, Julian <jcat...@illinois.edu>
Date: Friday, 15 December 2023 at 22:19
To: stacks...@googlegroups.com <stacks...@googlegroups.com>
Subject: Re: [stacks] Single SNP loci