QIIME not recognising primers or barcodes

Angela Zou

Aug 10, 2017, 3:00:29 PM
to Qiime 1 Forum
Hello, it appears that split_libraries.py is not recognising the barcodes or primers listed in my mapping file. However, using grep I can see that the primers and barcodes are present in the fna file. This is my split_libraries.py command and its output:

split_libraries.py -f G3BCOWM02.fna -q G3BCOWM02.qual -o split_libraries/G3BCOWM02 -l 300 -L 600 -m map_file2.txt -b 10 -z truncate_only -t -s 20


split_library_log.txt output: 

Number raw input seqs 166533

Length outside bounds of 300 and 600 16965
Num ambiguous bases exceeds limit of 6 52
Missing Qual Score 0
Mean qual score below minimum of 20 35
Max homopolymer run exceeds limit of 6 1129
Num mismatches in primer exceeds limit of 0: 148324

Number of sequences with identifiable barcode but without identifiable reverse primer: 20

-z truncate_only option enabled; sequences without a discernible reverse primer as well as sequences with a valid barcode not found in the mapping file may still be written.

Sequence length details for all sequences passing quality filters:
Raw len min/max/avg 387.0/622.0/525.4
Wrote len min/max/avg 357.0/592.0/490.5

Barcodes corrected/not 0/0
Uncorrected barcodes will not be written to the output fasta file.
Corrected barcodes will be written with the appropriate barcode category.
Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.

Total valid barcodes that are not in mapping file 0
Sequences associated with valid barcodes that are not in the mapping file will not be written.

Barcodes in mapping file
Num Samples 10
Sample ct min/max/mean: 1 / 6 / 2.80
Sample Sequence Count Barcode
10B.1 6 TAGTATCAGC
6A.2 4 CGTGTCTCTA
5A.2 4 ATATCGCGAG
3A.2 4 AGCACTGTAG
8A.1 3 ACGAGTGCGT
7A.2 2 CTCGCGTGTC
4A.2 2 ATCAGACACG
9B.2 1 TCTCTATGCG
2A.2 1 AGACGCACTC
1A.2 1 ACGCTCGACA

Total number seqs written 28

Aside from the number of primer mismatches being extremely high, very few sequences are being assigned to barcodes.

Here is what I get when I use grep to look for the primers/barcodes in G3BCOWM02.fna:

grep -c AGAGTTTGAT[A-Z][A-Z]TGGGCTCAG G3BCOWM02.fna (my primer sequence is AGAGTTTGATYMTGGCTCAG) 

This returns:
156465

Or grep -c ACGAGTGCGT -m5 G3BCOWM02.fna
16416
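
One thing worth checking in the grep above: the pattern ends in TGGGCTCAG, while the primer quoted next to it ends in TGGCTCAG. A minimal Python sketch of the same count, done per read and with the IUPAC codes expanded, makes it easy to compare the two spellings. The helper names (`primer_to_regex`, `iter_fasta`, `count_reads_with_primer`) are made up for illustration and are not part of QIIME; the two demo reads are truncated copies of reads from G3BCOWM02.fna.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Turn a degenerate primer into a regex, e.g. Y -> [CT]."""
    return "".join("[" + IUPAC[base] + "]" for base in primer.upper())

def iter_fasta(lines):
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def count_reads_with_primer(lines, primer):
    """Count reads that contain the primer anywhere in the sequence."""
    pattern = re.compile(primer_to_regex(primer))
    return sum(1 for _, seq in iter_fasta(lines) if pattern.search(seq))

# Two reads from the fna file, truncated to their first 50 bases.
demo = [
    ">G3BCOWM02EQVTQ",
    "AGCACTGTAGAGAGTTTGATTATGGGCTCAGGATGAACGCTGGCGGCGTG",
    ">G3BCOWM02DWXE6",
    "TAGTATCAGCAGAGTTTGATTCTGGGCTCAGGATGAACGCTGGCGGCGTG",
]

for primer in ("AGAGTTTGATYMTGGCTCAG", "AGAGTTTGATYMTGGGCTCAG"):
    print(primer, count_reads_with_primer(demo, primer))
# The primer as listed matches 0 of the 2 demo reads;
# the spelling with the extra G matches both.
```

To run it on the real file, pass an open file handle instead of `demo`, e.g. `count_reads_with_primer(open("G3BCOWM02.fna"), primer)`.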


I also used split_libraries on 3 other files with the same barcodes and primers and they all worked fine, so I am not sure why it is not working for this particular file. I have attached the first 500 lines of my fna file as well as my mapping file so you can also take a look. Thank you for your help. 
test.fna
map_file2.txt

Jose Antonio Navas Molina

Aug 11, 2017, 10:12:02 AM
to Qiime 1 Forum
Hi Angela,

I forwarded your question to another dev. We will get back to you ASAP.

Thanks,

TonyWalters

Aug 11, 2017, 10:48:08 AM
to Qiime 1 Forum
Hello Angela,

I looked at a few of your sequences and matched them up to the primer (with a - to fix an indel):

>G3BCOWM02EB08T length=112 xy=1660_0699 region=2 run=R_2011_05_27_09_01_34_
CGAGAGATACAGAGTTTGATCATGGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAAGCGCTTGCCTCTGATCCTTCGGGTGAAGAGGCTTGTGA
          AGAGTTTGATYMTGG-CTCAG
>G3BCOWM02EQVTQ length=112 xy=1829_1292 region=2 run=R_2011_05_27_09_01_34_
AGCACTGTAGAGAGTTTGATTATGGGCTCAGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAGCGAAGCGATTCGGATGAAGTTTTCGGATGGATTTTGGATTGA
          AGAGTTTGATYMTGG-CTCAG
>G3BCOWM02DWXE6 length=114 xy=1488_0416 region=2 run=R_2011_05_27_09_01_34_
TAGTATCAGCAGAGTTTGATTCTGGGCTCAGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAGCGAAGCACTTGGGATTGATTTCTTCGGAATGAAATCTCATTTGA
          AGAGTTTGATYMTGG-CTCAG

I don't know if this is a true indel, or if perhaps the primer should be listed as AGAGTTTGATYMTGGGCTCAG in your mapping file instead? Indels will create a lot of mismatches, because split_libraries.py doesn't do an alignment; it just counts nucleotides that don't match up.
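
To illustrate why a single indel costs so many mismatches, here is a rough Python sketch of a position-by-position comparison. It is not QIIME's actual code, just the idea described above; `count_mismatches` is a made-up helper, and it assumes the primer starts immediately after the 10-base barcode, as in the reads shown.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_mismatches(primer, read_segment):
    """Count positions where the read base is not allowed by the primer code.

    No alignment is attempted: position i of the primer is compared
    only with position i of the read segment.
    """
    return sum(
        1 for p, r in zip(primer, read_segment) if r not in IUPAC.get(p, "")
    )

BARCODE_LEN = 10  # matches the -b 10 option in the original command

# First 50 bases of read G3BCOWM02EQVTQ (barcode AGCACTGTAG).
read = "AGCACTGTAGAGAGTTTGATTATGGGCTCAGGATGAACGCTGGCGGCGTG"
after_barcode = read[BARCODE_LEN:]

for primer in ("AGAGTTTGATYMTGGCTCAG", "AGAGTTTGATYMTGGGCTCAG"):
    segment = after_barcode[:len(primer)]
    print(primer, count_mismatches(primer, segment))
# Primer without the extra G: 5 mismatches (every base after the
# indel position is shifted by one). Primer with the extra G: 0.
```

With the mismatch limit at 0, as in the log, a read like this is rejected under the first spelling and accepted under the second.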

If there are some reads with and some reads without indels, you could do the pooled primer approach, e.g. put:
AGAGTTTGATYMTGGGCTCAG,AGAGTTTGATYMTGGCTCAG
for each of your LinkerPrimerSequence values (don't leave any spaces around the comma).
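
If editing every row by hand is tedious, something like the following Python sketch could fill in the pooled value. It assumes a standard tab-separated QIIME 1 mapping file whose header line starts with #SampleID and contains a LinkerPrimerSequence column; `set_pooled_primers` is a made-up helper, and the Description values in the example are placeholders, not taken from map_file2.txt.

```python
def set_pooled_primers(mapping_text, primers):
    """Return mapping-file text with LinkerPrimerSequence set to the
    comma-separated primers (no spaces around the commas)."""
    pooled = ",".join(p.strip() for p in primers)
    primer_col = None
    out = []
    for line in mapping_text.splitlines():
        fields = line.split("\t")
        if line.startswith("#SampleID"):
            # Header line: locate the primer column.
            primer_col = fields.index("LinkerPrimerSequence")
        elif (line and not line.startswith("#")
              and primer_col is not None and len(fields) > primer_col):
            # Sample row: overwrite the primer value.
            fields[primer_col] = pooled
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"

# Two sample rows using barcodes from the log; Description is a placeholder.
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n"
    "8A.1\tACGAGTGCGT\tAGAGTTTGATYMTGGCTCAG\tplaceholder\n"
    "3A.2\tAGCACTGTAG\tAGAGTTTGATYMTGGCTCAG\tplaceholder\n"
)

updated = set_pooled_primers(
    example, ["AGAGTTTGATYMTGGGCTCAG", "AGAGTTTGATYMTGGCTCAG"]
)
print(updated)
```

Comment lines (anything else starting with #) are passed through unchanged, and the original file is never modified; write `updated` to a new file and point -m at that.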

I hope this helps,
Tony

Angela Zou

Aug 11, 2017, 1:29:39 PM
to Qiime 1 Forum
Hi Tony,

It worked, thank you! 