Problem with output files

JB

Mar 28, 2016, 3:56:36 PM

to AftrRAD

Hi Mike,

I've run through the AftrRAD pipeline with no obvious problems being reported. However, when I look at the output files at the end of FilterSNPs it appears there was an issue. I am attaching the "Master Report" and a copy of the "SNPmatrix" output (you can see that it looks like it started to work and then stopped). The monomorphic file is empty. Do you have any thoughts about what might be going wrong? I don't know if it makes a difference but I did run Genotype.pl twice, assuming that the second time would just overwrite the first, but let me know if this could be a problem (also one sample was removed the second time). One other thought is that there are a ton of loci (in the hundreds of thousands) probably because the dataset includes one outgroup sample--I don't know if this would cause an issue? Thanks, I appreciate any advice you might have!

Jeremy

MasterReport.txt

SNPMatrix_100.76.txt

Mike Sovic

Mar 29, 2016, 1:15:52 PM

to AftrRAD

Hi Jeremy,

You're right that there's no problem re-running Genotypes.pl, and that the second run should just overwrite the first. The number of loci also should be fine. I think the issue is likely with your read depths and the default parameters (assuming you used the defaults) for how AftrRAD calls genotypes and creates these output files.

Specifically, your master report says the average read depths are around 6-7. The default in Genotypes.pl is to call a genotype only if the read counts at the locus being considered sum to at least 10 (MinReads argument). On the surface, this doesn't seem too bad: say you had two alleles at a heterozygous locus with read counts 9 and 4. You'd have 13 total reads, so the heterozygous genotype would be called. However, you are also retaining only loci scored in 100% of your 94 samples (this is apparent from the name of your SNPMatrix file). Therefore, if one individual at the locus had read counts of 5 and 2 (7 total), that sample will not be genotyped and will instead be treated as missing data, and you in turn throw the locus out because it is not scored in 100% of your samples. So, if I'm right that this is the issue, here are possible ways forward…

1.) Reduce the MinReads argument to something lower than 10 (note this applies to the Genotypes.pl script - not the AftrRAD.pl script).

2.) Allow some level of missing data by setting the pctScored argument to something less than 100.

3.) A combination of 1 and 2.

4.) Resequence for additional coverage.

See if any of these work for you (at least 1-3, I know 4 probably isn't an attractive option), and if not, let me know.
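To make the interaction between the two settings concrete, here is a minimal Python sketch of the logic described above. It is not AftrRAD's actual code: the function names and the `min_reads` / `pct_scored` parameters are hypothetical stand-ins for the MinReads and pctScored arguments, and it assumes a genotype is called when the summed read count is at least the threshold.

```python
def is_scored(read_counts, min_reads=10):
    """A sample is scored at a locus if its allele read counts sum to at least min_reads.

    Assumption: the threshold is inclusive (sum >= min_reads).
    """
    return sum(read_counts) >= min_reads


def locus_retained(samples, min_reads=10, pct_scored=100):
    """Keep a locus only if at least pct_scored percent of samples are scored."""
    scored = sum(1 for counts in samples if is_scored(counts, min_reads))
    # Integer comparison avoids floating-point rounding at the boundary.
    return scored * 100 >= pct_scored * len(samples)


# Hypothetical locus: 93 samples with read counts 9 and 4, one sample with 5 and 2.
locus = [(9, 4)] * 93 + [(5, 2)]

default_result = locus_retained(locus)                    # MinReads 10, pctScored 100
relaxed_missing = locus_retained(locus, pct_scored=80)    # option 2
lower_min_reads = locus_retained(locus, min_reads=7)      # option 1

print(default_result, relaxed_missing, lower_min_reads)
```

With the defaults, the single low-coverage sample is enough to drop the whole locus; either lowering the read threshold or allowing some missing data keeps it.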

Mike

JB

Mar 29, 2016, 4:50:30 PM

to AftrRAD

Thanks, Mike. I did rerun it, changing the pctScored argument to 80, and this helped. I have minReads set to 15, as I am worried about having coverage much below this for subsequent population genetic analyses (let me know if you think going lower is not a problem, though). I will continue to toy with these parameters. I am also going to start completely over and see if removing the outgroup sample helps. I am not sure what effect one highly divergent sample in the dataset has on the pipeline. Let me know if you have any thoughts on that. I appreciate your help!

Jeremy
