It appears that I was incorrect about Trimmomatic, as fastp is now used (I am currently reworking data on EARTH).
I have noticed that fastp fails to detect adapters in many, though not all, samples across all libraries. Some of these are low-quality individuals, while others
seem fine.
In at least one PE library, TruSeq adapter Idx 2 is dominant in the "adapter not found" logs.
I need to investigate further to see which indices and libraries are most affected.
A typical "not found" log reads as follows:
```
WARNING: cut_by_quality5 is deprecated, please use cut_front instead.
Detecting adapter sequence for read1...
No adapter detected for read1
Read1 before filtering:
total reads: 559526
total bases: 81131270
Q20 bases: 78550986(96.8196%)
Q30 bases: 74541824(91.878%)
Read1 after filtering:
total reads: 558997
total bases: 81054512
Q20 bases: 78511340(96.8624%)
Q30 bases: 74516448(91.9337%)
Filtering result:
reads passed filter: 558997
reads failed due to low quality: 428
reads failed due to too many N: 101
reads failed due to too short: 0
reads with adapter trimmed: 0
bases trimmed due to adapters: 0
Duplication rate (may be overestimated since this is SE data): 67.1151%
JSON report: LM_LM092-L5-P2-A2.json
HTML report: LM_LM092-L5-P2-A2.html
fastp -i LM_LM092-L5-P2-A2.F.fq.gz -o LM_LM092-L5-P2-A2.R1.fq.gz --cut_by_quality5 20 --cut_by_quality3 20 --cut_window_size 5 --cut_mean_quality 15 -q 15 -u 50 -j LM_LM092-L5-P2-A2.json -h LM_LM092-L5-P2-A2.html
fastp v0.19.7, time used: 20 seconds
```
To be honest, I was not familiar with fastp until today, as my preliminary analyses used Trimmomatic.
The reason I am investigating this is that both Dave Portnoy (my committee member) and I had noticed a quite high number of SNPs per locus that did not make sense for SE data. The original thinking was that this was due to an overabundance of singletons, but I am not so sure.
If anything jumps out at you or if you have an idea as to what might be a good thing to look at, could you let me know?
I realize there could be multiple things going on here, but I am not a bioinformatics guru by any means.
For example, I am not sure whether "cut_by_quality5 is deprecated, please use cut_front" could be a major culprit or not.
I will continue trying to find patterns across libraries/indices.
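For the pattern hunt, a rough sketch of how the logs could be tallied is below. It only relies on the three log lines visible in the example above ("No adapter detected for read1", "reads with adapter trimmed:", and "JSON report:"). Everything else is an assumption on my part: the helper names are made up, the `*.fastp.log` file naming is hypothetical, and `library_of` simply guesses that the second dash-separated field of the sample name (e.g. `L5`) identifies the library, which may not match the real naming scheme.

```python
import re
from collections import Counter
from pathlib import Path

# Patterns taken from the lines seen in the example fastp log.
NO_ADAPTER = re.compile(r"^No adapter detected for read\d", re.MULTILINE)
TRIMMED = re.compile(r"^reads with adapter trimmed:\s*(\d+)", re.MULTILINE)
REPORT = re.compile(r"^JSON report:\s*(\S+)\.json", re.MULTILINE)


def summarize_log(text):
    """Pull the sample name and adapter status out of one fastp log."""
    sample = REPORT.search(text)
    trimmed = TRIMMED.search(text)
    return {
        "sample": sample.group(1) if sample else None,
        "no_adapter": bool(NO_ADAPTER.search(text)),
        "adapter_trimmed_reads": int(trimmed.group(1)) if trimmed else None,
    }


def library_of(sample):
    # ASSUMPTION: the second dash-separated field names the library,
    # e.g. "LM_LM092-L5-P2-A2" -> "L5". Swap in the real rule if different.
    parts = sample.split("-")
    return parts[1] if len(parts) > 1 else "unknown"


def tally(log_texts, key=library_of):
    """Return {group: (logs_without_adapter, total_logs)}."""
    missing, total = Counter(), Counter()
    for text in log_texts:
        info = summarize_log(text)
        if info["sample"] is None:
            continue  # not a recognizable fastp log
        group = key(info["sample"])
        total[group] += 1
        if info["no_adapter"]:
            missing[group] += 1
    return {group: (missing[group], total[group]) for group in total}


if __name__ == "__main__":
    # Hypothetical layout: one saved log per sample in the current directory.
    logs = [p.read_text() for p in Path(".").glob("*.fastp.log")]
    for group, (miss, tot) in sorted(tally(logs).items()):
        print(f"{group}: {miss}/{tot} logs with no adapter detected")
```

Passing a different `key` function to `tally` (for instance one that extracts the index instead of the library) would regroup the same counts without touching the parsing.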
Thanks again,
-Pearce