SNP Whitelists


Rosie Purdy

Feb 1, 2024, 12:30:16 PM
to Stacks
Hiya,

I'm using gstacks and populations for my reference-aligned data (ddRAD-seq). All is well so far, but I want to create a whitelist of all the loci that contain 1-3 SNPs across my 96 samples, and I'm unsure how to go about it.
I've seen the manual's example for generating a list of random loci from the populations.sumstats.tsv file, but I don't know how to adapt it for my analysis (I used --write-single-snp in my populations command, and I don't want random SNPs).

I'm very new to bioinformatics and coding so any (fairly dumbed down) suggestions and example code would be hugely appreciated.

Thanks!
Rosie

Catchen, Julian

Feb 1, 2024, 5:13:35 PM
to stacks...@googlegroups.com

Hi Rosie,

This is the most recent treatment of this topic:

https://groups.google.com/g/stacks-users/c/-nbCH9SBYfA/m/AqHOlSaPAgAJ

Best,
julian

Rosie Purdy

Feb 2, 2024, 9:05:12 AM
to Stacks
Hi Julian,

Thank you for your quick reply! I've read Andrew's thread, and tried to adapt it... would this code be correct in this case?

cat populations.sumstats.tsv | grep -v "^#" | cut -f 1 | uniq -c | sort | awk '$1 >= 1 && $1 <= 3 { print $2 }' | sort -n > whitelist.tsv

Or do I also need to include the 'sed' command which Andrew used too?

Many thanks,
Rosie

Catchen, Julian

Feb 2, 2024, 2:46:02 PM
to stacks...@googlegroups.com

Hi Rosie,

If you have a single population, that code will work; for multiple populations it is more complex. It is easy enough, though, to run populations with all samples as a single population, generate your whitelist, and then re-run populations with a different popmap containing all your populations (using the newly generated whitelist). Even so, you can still end up with different counts of SNPs per locus and per population, depending on how the SNPs are distributed in the wider metapopulation (e.g., a locus may have 3 SNPs, but that locus could be missing from one of your subpopulations). So it all depends on why you want to look only at loci with a specific number of SNPs, and how important that specific number is.

julian
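
For the multi-population case, here is a minimal sketch of one way to count each SNP once per locus, even when several populations report it. It rests on a few assumptions that should be checked against the header line of your own file: that column 1 of populations.sumstats.tsv is the locus ID, that column 4 is the SNP's column within the locus, and that populations was run without --write-single-snp (as far as the manual describes it, that option keeps only the first SNP per locus, so every locus would appear to have one SNP). The function name loci_by_snp_count is hypothetical, not part of Stacks.

```shell
# Hypothetical helper: print the IDs of loci carrying between MIN and MAX SNPs.
# Usage: loci_by_snp_count SUMSTATS_FILE MIN MAX
# Assumption: tab-separated rows, locus ID in column 1, SNP column in column 4.
loci_by_snp_count() {
    grep -v "^#" "$1" |        # drop the comment/header lines
        cut -f 1,4 |           # keep locus ID and SNP column
        sort -u |              # one row per distinct SNP, however many populations list it
        cut -f 1 |             # keep the locus ID only
        sort -n |              # group identical locus IDs together
        uniq -c |              # prefix each locus ID with its SNP count
        awk -v min="$2" -v max="$3" '$1 >= min && $1 <= max { print $2 }' |
        sort -n                # numeric order, locus ID only
}
```

Called as `loci_by_snp_count populations.sumstats.tsv 1 3 > whitelist.tsv`, this would write one locus ID per line, with the count that uniq -c adds already stripped. That file can then be passed to populations with -W when re-running with the full popmap.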

 

