Hi Silvia,
If the --rm-pcr-duplicates flag was not specified to the gstacks program (or to denovo_map.pl), then no PCR duplicates would be removed.
We are not connected to the Galaxy project, so I don’t know how they set this up.
If the flag had been specified, you would have seen something like this in the gstacks.log file:
Built 114736 loci comprising 100356561 forward reads and 92065401 matching paired-end reads; mean insert length was 340.0 (sd: 98.0).
Removed 8291160 unpaired (forward) reads (8.3%); kept 92065401 read pairs in 110571 loci.
Removed 51232513 read pairs whose insert length had already been seen in the same sample as putative PCR duplicates (55.6%); kept 40832888 read pairs.
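
If it helps, here is a rough Python sketch for checking a log for that last line. It assumes the message is worded exactly as in the excerpt above (other Stacks versions may phrase it differently), and the function name is just something I made up for illustration. The example text is the excerpt itself; for a real run you would read in your own gstacks.log instead.

```python
import re

# Pattern copied from the wording of the log excerpt above. This is an
# assumption: other Stacks versions may word the message differently.
DUP_RE = re.compile(
    r"Removed (\d+) read pairs whose insert length had already been seen "
    r"in the same sample as putative PCR duplicates \(([\d.]+)%\); "
    r"kept (\d+) read pairs"
)

def pcr_duplicate_summary(log_text):
    """Return (removed_pairs, percent_removed, kept_pairs), or None if the
    duplicate-removal message does not appear in the log text."""
    # Collapse line wraps and repeated spaces so a wrapped message still matches.
    flat = " ".join(log_text.split())
    match = DUP_RE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), int(match.group(3))

# The excerpt from above, wrapped across lines on purpose.
example = """Removed 8291160 unpaired (forward) reads (8.3%); kept 92065401 read pairs in 110571 loci.
Removed 51232513 read pairs whose insert length had already been seen in the same sample
as putative PCR duplicates (55.6%); kept 40832888 read
pairs."""

print(pcr_duplicate_summary(example))  # (51232513, 55.6, 40832888)
```

If the function returns None for your log, the duplicate-removal message is absent, which is what I would expect when the flag was not set.
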
You might want to check with the folks running Galaxy EU or run the program directly yourself.
You appear to have high variability in coverage, so you may want to remove any samples below about 5x coverage from the analysis (though it seems you are above that threshold). You could also consider downsampling some of the individuals that have very high coverage.
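
As a sketch of that filtering step, something like the following would flag samples under a coverage cutoff. The sample names and coverage values here are made up purely for illustration, and the function name is hypothetical; you would fill in the per-sample mean coverage from your own run.

```python
def low_coverage_samples(coverage, min_cov=5.0):
    """Return the names of samples whose mean coverage is below min_cov,
    sorted alphabetically."""
    return sorted(name for name, cov in coverage.items() if cov < min_cov)

# Made-up values for illustration only, not taken from any real run.
coverage = {
    "sample_A": 3.2,
    "sample_B": 12.7,
    "sample_C": 48.9,
    "sample_D": 4.9,
}

to_drop = low_coverage_samples(coverage)
print(to_drop)  # ['sample_A', 'sample_D']
```

The 5x default is only the rough cutoff mentioned above, not a hard rule.
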
Regardless, I would proceed to the populations stage to see how many SNPs are shared across your samples with a basic set of filters. Assessing your ‘missing data’ may be more useful than trying to assess coverage (beyond removing samples that obviously failed in the molecular library construction).
Best,
Julian