QIIME not recognising primers or barcodes

Angela Zou

Aug 10, 2017, 3:00:29 PM

to Qiime 1 Forum

Hello, it appears that split_libraries.py is not recognising the barcodes or primers listed in my mapping file, even though grep shows that both are present in the fna file. This is my split_libraries command and its output:

split_libraries.py -f G3BCOWM02.fna -q G3BCOWM02.qual -o split_libraries/G3BCOWM02 -l 300 -L 600 -m map_file2.txt -b 10 -z truncate_only -t -s 20

split_library_log.txt output:

Number raw input seqs 166533

Length outside bounds of 300 and 600 16965

Num ambiguous bases exceeds limit of 6 52

Missing Qual Score 0

Mean qual score below minimum of 20 35

Max homopolymer run exceeds limit of 6 1129

Num mismatches in primer exceeds limit of 0: 148324

Number of sequences with identifiable barcode but without identifiable reverse primer: 20

-z truncate_only option enabled; sequences without a discernible reverse primer as well as sequences with a valid barcode not found in the mapping file may still be written.

Sequence length details for all sequences passing quality filters:

Raw len min/max/avg 387.0/622.0/525.4

Wrote len min/max/avg 357.0/592.0/490.5

Barcodes corrected/not 0/0

Uncorrected barcodes will not be written to the output fasta file.

Corrected barcodes will be written with the appropriate barcode category.

Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.

Total valid barcodes that are not in mapping file 0

Sequences associated with valid barcodes that are not in the mapping file will not be written.

Barcodes in mapping file

Num Samples 10

Sample ct min/max/mean: 1 / 6 / 2.80

Sample Sequence Count Barcode

10B.1 6 TAGTATCAGC

6A.2 4 CGTGTCTCTA

5A.2 4 ATATCGCGAG

3A.2 4 AGCACTGTAG

8A.1 3 ACGAGTGCGT

7A.2 2 CTCGCGTGTC

4A.2 2 ATCAGACACG

9B.2 1 TCTCTATGCG

2A.2 1 AGACGCACTC

1A.2 1 ACGCTCGACA

Total number seqs written 28

Aside from the number of primer mismatches being extremely high (148324 of 166533 reads), very few sequences are being assigned to barcodes (only 28 written in total).

Here is what I get when I use grep to look for the primers and barcodes in G3BCOWM02.fna:

grep -c AGAGTTTGAT[A-Z][A-Z]TGGGCTCAG G3BCOWM02.fna (my primer sequence is AGAGTTTGATYMTGGCTCAG)

This returns:

156465

Or grep -c ACGAGTGCGT -m5 G3BCOWM02.fna

16416

I also used split_libraries on 3 other files with the same barcodes and primers and they all worked fine, so I am not sure why it is not working for this particular file. I have attached the first 500 lines of my fna file as well as my mapping file so you can also take a look. Thank you for your help.

test.fna

map_file2.txt

Jose Antonio Navas Molina

Aug 11, 2017, 10:12:02 AM

to Qiime 1 Forum

Hi Angela,

I forwarded your question to another dev. We will get back to you ASAP.

Thanks,

TonyWalters

Aug 11, 2017, 10:48:08 AM

to Qiime 1 Forum

Hello Angela,

I looked at a few of your sequences and lined them up against the primer (with a - inserted in the primer to account for an indel):

>G3BCOWM02EB08T length=112 xy=1660_0699 region=2 run=R_2011_05_27_09_01_34_

CGAGAGATACAGAGTTTGATCATGGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAAGCGCTTGCCTCTGATCCTTCGGGTGAAGAGGCTTGTGA

          AGAGTTTGATYMTGG-CTCAG

>G3BCOWM02EQVTQ length=112 xy=1829_1292 region=2 run=R_2011_05_27_09_01_34_

AGCACTGTAGAGAGTTTGATTATGGGCTCAGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAGCGAAGCGATTCGGATGAAGTTTTCGGATGGATTTTGGATTGA

          AGAGTTTGATYMTGG-CTCAG

>G3BCOWM02DWXE6 length=114 xy=1488_0416 region=2 run=R_2011_05_27_09_01_34_

TAGTATCAGCAGAGTTTGATTCTGGGCTCAGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAGCGAAGCACTTGGGATTGATTTCTTCGGAATGAAATCTCATTTGA

          AGAGTTTGATYMTGG-CTCAG

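I don't know whether this is a true indel, or whether the primer should instead be listed as AGAGTTTGATYMTGGGCTCAG in your mapping file. An indel creates a lot of mismatches because split_libraries.py doesn't do an alignment; it just compares the primer to the read position by position and counts the nucleotides that don't match, so every base after the indel is shifted out of register.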
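
To make the effect of a single extra base concrete, here is a minimal Python sketch of a position-by-position comparison with IUPAC degenerate codes. It is an illustration only, not the actual split_libraries.py code; the helper name `count_mismatches` is hypothetical, and the read is the start of G3BCOWM02EQVTQ from above with its 10-nt barcode stripped.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, segment):
    """Count positions where the read base is not allowed by the primer code.

    No alignment is attempted: position i of the primer is compared only
    with position i of the read segment.
    """
    return sum(1 for p, b in zip(primer, segment) if b not in IUPAC[p])


# Start of read G3BCOWM02EQVTQ (barcode AGCACTGTAG, sample 3A.2).
read = "AGCACTGTAGAGAGTTTGATTATGGGCTCAGGATGAACGCTGGCGGCGTG"
barcode_len = 10  # matches -b 10 in the split_libraries.py command
after_barcode = read[barcode_len:]

for primer in ("AGAGTTTGATYMTGGCTCAG", "AGAGTTTGATYMTGGGCTCAG"):
    segment = after_barcode[:len(primer)]
    print(primer, count_mismatches(primer, segment))
```

For this read, the 20-nt primer from the mapping file gives 5 mismatches (every position after the extra G is off by one), while the 21-nt version with the extra G gives 0.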

If some reads have the indel and some don't, you could use the pooled primer approach, i.e. put:

AGAGTTTGATYMTGGGCTCAG,AGAGTTTGATYMTGGCTCAG

for each of your LinkerPrimerSequence values (don't leave any spaces around the comma).
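
As a rough sketch of what pooling achieves (an assumption about the general idea, not QIIME's actual implementation), each comma-separated primer can be tried in turn and the read accepted if any of them is within the mismatch limit. The names `best_primer_match` and `count_mismatches` are hypothetical, and the second test read is made up for illustration by deleting the extra G from a real one.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, segment):
    """Position-by-position comparison, no alignment."""
    return sum(1 for p, b in zip(primer, segment) if b not in IUPAC[p])


def best_primer_match(pooled_primers, read_after_barcode, max_mismatches=0):
    """Return (primer, mismatches) for the best pooled primer, or None.

    pooled_primers is a comma-separated string with no spaces, in the same
    form as the LinkerPrimerSequence value suggested above.
    """
    best = None
    for primer in pooled_primers.split(","):
        segment = read_after_barcode[:len(primer)]
        if len(segment) < len(primer):
            continue  # read too short to contain this primer
        n = count_mismatches(primer, segment)
        if n <= max_mismatches and (best is None or n < best[1]):
            best = (primer, n)
    return best


pooled = "AGAGTTTGATYMTGGGCTCAG,AGAGTTTGATYMTGGCTCAG"

# Start of read G3BCOWM02DWXE6 with its 10-nt barcode removed.
with_extra_g = "AGAGTTTGATTCTGGGCTCAGGATGAACGCTGG"
# Made-up read for illustration: the same sequence without the extra G.
without_extra_g = "AGAGTTTGATTCTGGCTCAGGATGAACGCTGG"

print(best_primer_match(pooled, with_extra_g))
print(best_primer_match(pooled, without_extra_g))
```

With the mismatch limit left at 0, the first read matches only the 21-nt primer and the second only the 20-nt primer, so both kinds of read would pass the primer check.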

I hope this helps,

Tony

Angela Zou

Aug 11, 2017, 1:29:39 PM

to Qiime 1 Forum

Hi Tony,

It worked, thank you!
