Hi!
I am running Stacks v2.58 in reference-based mode on ddGBS data. I am concerned that suboptimal DNA quality in some of my samples will result in a low number of loci shared between samples. Unfortunately, I cannot get fresh individuals or re-extract the DNA, so I ran a pilot on a subset of samples (n = 12). Everything looks fine so far (number of reads, % reads aligned, coverage). I would like to check the number of missing loci per sample after running 'populations', but I cannot find that table by searching the file manually, and stacks-dist-extract does not find it either:
> stacks-dist-extract populations.log.distribs loci_per_sample
Error: Couldn't find section 'loci_per_sample' in 'populations.log.distribs'.
It does find the other tables (e.g. stacks-dist-extract populations.log.distribs snps_per_loc_postfilters).
I guess that calculating the percentage of all loci found in each sample (table effective_coverages_per_sample from gstacks) and checking how many loci are shared by e.g. >80% of the samples (table samples_per_loc_postfilters from populations) would give similar information, but I thought loci_per_sample would be another interesting way to look at the data, even though I am not sure what a sensible hard threshold for missing loci would be.
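To make the second calculation concrete, here is a minimal Python sketch of what I have in mind. It assumes (I have not verified this against the Stacks documentation) that the samples_per_loc_postfilters table is tab-separated, with the number of samples in the first column and the number of loci in the second, and that comment lines start with '#'. The function name fraction_loci_shared is my own, not part of Stacks.

```python
def fraction_loci_shared(table_text, n_total_samples, min_fraction=0.8):
    """Fraction of loci present in at least `min_fraction` of the samples.

    ASSUMPTION: `table_text` is tab-separated, column 1 = number of samples
    a locus is present in, column 2 = number of such loci. Header and
    comment lines are skipped.
    """
    total_loci = 0
    shared_loci = 0
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # header row or anything that is not a data row
        n_samples, n_loci = int(fields[0]), int(fields[1])
        total_loci += n_loci
        if n_samples / n_total_samples >= min_fraction:
            shared_loci += n_loci
    # Avoid dividing by zero if the table contained no data rows.
    return shared_loci / total_loci if total_loci else 0.0
```

The input would be the text printed by stacks-dist-extract populations.log.distribs samples_per_loc_postfilters, with n_total_samples = 12 for my pilot.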
Do I have to add a flag to 'populations' to get this table?
Alternatively, is there anything else I could look at to see if my samples are good enough before sequencing more individuals?
Thank you!
Best,
Nadege