Dear group members,
I am new to bioinformatics and would greatly appreciate your guidance on SNP mining and analyzing my dataset using STACKS.
Research Objective: To study population connectivity in butterflies.
Dataset: I have ddRAD PE data from 640 individuals, with 100 bp reads.
Current Progress and Results:
USTACKS: Average depth of coverage = 142x (range: 5.75x to 428x)
CSTACKS: Final catalog contains 103,954 loci.
GSTACKS:
Populations Program:
I’m concerned about the drastic reduction in loci when applying the -r 0.75 filter, and I would like to understand whether I might be doing something wrong.
Additional context: The effective coverage of my data seems good overall, but I suspect uneven coverage or missing data might be an issue. How can this be fixed or improved?
Any advice on troubleshooting or optimizing my pipeline would be greatly appreciated!
Thank you in advance for your time and expertise.
Praveen
Hi Praveen,
Did you optimize your assembly parameters prior to running ustacks on all your samples? How is your population map set up – how many populations are you defining with how many samples (-r is dependent on the population map you use)?
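To illustrate why -r depends on the population map, here is a simplified Python model. It is an assumption for illustration, not the Stacks implementation: a population "passes" at a locus when at least a fraction r of its individuals have a genotype call there, and the locus is kept if it passes in at least min_pops populations (loosely mirroring -p). The function and argument names are hypothetical.

```python
from collections import defaultdict


def loci_passing_r(genotyped, popmap, r=0.75, min_pops=1):
    """Simplified model of the populations -r/-p filters (hypothetical helper).

    genotyped: dict mapping locus ID -> set of sample names with a genotype call.
    popmap:    dict mapping sample name -> population name.
    Returns the locus IDs kept, in input order.
    """
    # Population sizes come from the population map, so every sample listed
    # there (including very low-coverage ones) counts in the denominator.
    pop_sizes = defaultdict(int)
    for pop in popmap.values():
        pop_sizes[pop] += 1

    kept = []
    for locus, samples in genotyped.items():
        # Count genotyped individuals per population at this locus,
        # ignoring samples that are not in the population map.
        counts = defaultdict(int)
        for sample in samples:
            if sample in popmap:
                counts[popmap[sample]] += 1
        # A population passes if at least a fraction r of its members are genotyped.
        passing = sum(1 for pop, n in pop_sizes.items() if counts[pop] / n >= r)
        if passing >= min_pops:
            kept.append(locus)
    return kept
```

With five samples in one population and one sample (s5) missing almost everywhere, a locus genotyped in three samples fails at r = 0.75 (3/5 = 0.6); removing s5 from the map lets it pass (3/4 = 0.75).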
You will likely want to remove the very low-coverage samples from your analysis. They are certainly affecting the filtering, since they will be missing almost all the data.
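One way to drop those samples is to write a reduced population map (one tab-separated sample/population pair per line) and rerun populations with it. A minimal sketch, assuming you have already collected per-sample coverage into a dictionary; the 10x cutoff and the function name are hypothetical, so choose a threshold from your own depth distribution.

```python
def filter_popmap(popmap_lines, coverage, min_cov=10.0):
    """Split population-map lines into kept lines and dropped sample names.

    popmap_lines: iterable of "sample<TAB>population" lines.
    coverage:     dict mapping sample name -> mean depth of coverage.
    Samples with no coverage entry are treated as 0x and dropped.
    """
    kept, dropped = [], []
    for line in popmap_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sample, pop = line.split("\t")[:2]
        if coverage.get(sample, 0.0) >= min_cov:
            kept.append(f"{sample}\t{pop}")
        else:
            dropped.append(sample)
    return kept, dropped
```

Write the kept lines to a new file and pass that file to populations with -M; the dropped list is worth keeping as a record of which individuals were excluded.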
Best,
Julian