split_libraries.py -f G3BCOWM02.fna -q G3BCOWM02.qual -o split_libraries/G3BCOWM02 -l 300 -L 600 -m map_file2.txt -b 10 -z truncate_only -t -s 20
Number raw input seqs 166533
Length outside bounds of 300 and 600 16965
Num ambiguous bases exceeds limit of 6 52
Missing Qual Score 0
Mean qual score below minimum of 20 35
Max homopolymer run exceeds limit of 6 1129
Num mismatches in primer exceeds limit of 0: 148324
Number of sequences with identifiable barcode but without identifiable reverse primer: 20
-z truncate_only option enabled; sequences without a discernible reverse primer as well as sequences with a valid barcode not found in the mapping file may still be written.
Sequence length details for all sequences passing quality filters:
Raw len min/max/avg 387.0/622.0/525.4
Wrote len min/max/avg 357.0/592.0/490.5
Barcodes corrected/not 0/0
Uncorrected barcodes will not be written to the output fasta file.
Corrected barcodes will be written with the appropriate barcode category.
Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.
Total valid barcodes that are not in mapping file 0
Sequences associated with valid barcodes that are not in the mapping file will not be written.
Barcodes in mapping file
Num Samples 10
Sample ct min/max/mean: 1 / 6 / 2.80
Sample Sequence Count Barcode
10B.1 6 TAGTATCAGC
6A.2 4 CGTGTCTCTA
5A.2 4 ATATCGCGAG
3A.2 4 AGCACTGTAG
8A.1 3 ACGAGTGCGT
7A.2 2 CTCGCGTGTC
4A.2 2 ATCAGACACG
9B.2 1 TCTCTATGCG
2A.2 1 AGACGCACTC
1A.2 1 ACGCTCGACA
Total number seqs written 28
Aside from the number of primer mismatches being extremely high (148324 of 166533 raw sequences), very few sequences are being assigned to barcodes: only 28 sequences were written across all 10 samples.
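To put the numbers from the log in perspective, here is a small tally of the filter counts. It is only arithmetic on the values printed above; the assumption (not stated in the log) is that each discarded read is counted under exactly one filter.

```python
# Counts copied from the split_libraries.py log above.
raw = 166533
removed = {
    "length outside 300-600": 16965,
    "too many ambiguous bases": 52,
    "missing qual score": 0,
    "mean qual below 20": 35,
    "homopolymer run too long": 1129,
    "primer mismatches": 148324,
}

# Assumption: the filter categories do not overlap.
remaining = raw - sum(removed.values())
primer_fraction = removed["primer mismatches"] / raw

print(remaining)                 # 28
print(f"{primer_fraction:.1%}")  # 89.1%
```

The remainder matches the "Total number seqs written 28" line, so the primer check alone appears to account for roughly 89% of the raw reads being dropped.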
I also ran split_libraries.py on three other files with the same barcodes and primers, and they all worked fine, so I am not sure why it is failing for this particular file. I have attached the first 500 lines of my .fna file as well as my mapping file so you can take a look. Thank you for your help.
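In case it helps with diagnosing this, below is a rough sketch of a check that could be run on the .fna file to see where the primer sits in each read. It is not part of QIIME; the helper names (`read_fasta`, `classify`) are made up for illustration, and the primer in the demo is a placeholder that would need to be replaced with the LinkerPrimerSequence from the mapping file. It assumes 10-nt barcodes at the start of each read, as in the barcode table above.

```python
import re
from collections import Counter

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement, keeping IUPAC degenerate codes consistent."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a (possibly degenerate) primer into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify(seq, primer, barcode_len=10):
    """Report where (if anywhere) the primer occurs in a read."""
    seq = seq.upper()
    fwd = primer_regex(primer)
    rev = primer_regex(reverse_complement(primer))
    if fwd.match(seq, barcode_len):
        return "primer right after barcode"
    if fwd.search(seq):
        return "primer elsewhere in read"
    if rev.search(seq):
        return "reverse complement of primer found"
    return "primer not found"


# Placeholder primer: substitute the one from the mapping file.
PRIMER = "GTGCCAGCMGCCGCGGTAA"

# Tiny in-memory demo; the barcode is the 8A.1 barcode from the log.
demo = [
    ">read1", "ACGAGTGCGT" + "GTGCCAGCAGCCGCGGTAA" + "TTTT",
    ">read2", "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
]
counts = Counter(classify(seq, PRIMER) for _, seq in read_fasta(demo))
print(counts)
```

To run it on the real data, the `demo` list could be replaced with an open file handle, for example `open("G3BCOWM02.fna")`. If most reads fall under "reverse complement of primer found" or "primer elsewhere in read", that would suggest this file differs from the other three in read orientation or layout rather than in quality.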