split_libraries problem with MiSeq, mismatches in primer (macqiime 1.9.1)


Sam

Apr 19, 2016, 8:41:00 AM
to Qiime 1 Forum
Hi all,

split_libraries.py keeps writing empty .fna files and I am struggling to sort out the problem. After running:

split_libraries.py -m Payler_3819B_map_corrected.txt -o split_libraries_Payler3819B_s15 -f Payler_3819B.fna -q Payler_3819B.qual -b 8

in the output split_library_log.txt it shows:

Length outside bounds of 200 and 1000 3
Num ambiguous bases exceeds limit of 6 0
Missing Qual Score 0
Mean qual score below minimum of 25 0
Max homopolymer run exceeds limit of 6 1405
Num mismatches in primer exceeds limit of 0: 272777


My mapping file looks like this:


#SampleID BarcodeSequence LinkerPrimerSequence Description

101-P-MS515F AAAACAAA GTGCCAGCMGCCGCGGTAA 101-P-MS515F

29-MS515F AAAACAAC GTGCCAGCMGCCGCGGTAA 29-MS515F

44-MS515F AAAACAAG GTGCCAGCMGCCGCGGTAA 44-MS515F

MER-MS515F AAAACAAT GTGCCAGCMGCCGCGGTAA MER-MS515F

WER-MS515F AAAACACA GTGCCAGCMGCCGCGGTAA WER-MS515F


Here are some random excerpts from my .fna file before split libraries


>101-P-MS515F::M02696:21:000000000-AFKTF:1:2114:28714:18709

AAAACAAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGAGCAGGCGGTCTCATAAGTCTGATGTGAAAGCCCACGGCTCAACCGAGGAAGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCGCGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG

>29-MS515F::M02696:21:000000000-AFKTF:1:2107:16203:25258

AAAACAACTACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGAAAAGCGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGGCTTTTTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG

>WER-MS515F::M02696:21:000000000-AFKTF:1:2102:20315:17355

AAAACACATACGGAGGGGGCGAGCGTTGTCCGAGGTTACTGGGCGTAAAGGGCGCGTAGACGGGGTGGCAAGTCCGCTGTGAAAGCCCGGCGCTTAACGCCGGAGGGGCGGTGGATACTGCCAGTCTTGAAGGTGCTAGGGACAGATGGAATTACCAGTGTAGCGGTGAAATGCGTAGATATTGGTAGGAACACCAGTGGCGAAGGCGGTCTGTTGGAGCACTCATGACGCTGAGGCGCGAAAGCTGGGGGAGCGAACGGG


And here are some random excerpts from my .qual file before running split libraries


>101-P-MS515F::M02696:21:000000000-AFKTF:1:2114:28714:18709

40 40 40 40 40 40 40 40 37 37 33 33 36 36 38 38 38 33 32 33 37 37 39 37 39 39 39 39 39 39 39 38 34 36 39 38 36 35 35 37 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 34 34 40 34 40 40 40 40 30 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 29 40 40 40 18 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 28 14 40 40 40 40 40 40 40 40 40 37 40 40 40 40 40 40 40 40 31 40 31 40 31 40 40 40 40 40 40 40 27 27 40 27 27 28 40 40 40 40 40 40 40 40 40 40 40 40 40 40 34 28 26 40 15 40 40 40 40 40 25 40 40 40 40 40 27 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 25 31 40 28 30 40 40 40 40 40 40 27 40 14 40 40 37 40 40 40 40 40 40 38 40 40 40 38 40 20 40 38 37 40 40 40 40 40 20 37 15 38 38 38 39 37 37 17

>WER-MS515F::M02696:21:000000000-AFKTF:1:2107:20652:25386

40 40 40 40 40 40 40 40 37 32 35 33 32 37 36 37 38 37 20 33 37 33 38 39 39 39 39 39 39 37 37 38 17 32 36 35 38 18 37 39 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 34 40 38 40 40 40 40 40 38 40 40 40 34 40 40 38 40 40 38 35 32 40 38 40 39 40 40 40 38 40 40 40 40 36 39 39 39 38 37 37 40 40 40 40 40 39 37 38 38 38 38 37 37 37 38

>44-MS515F::M02696:21:000000000-AFKTF:1:1106:23550:16882

40 40 40 40 40 40 40 40 37 37 33 33 33 38 38 37 38 37 33 37 38 38 38 38 38 38 38 38 38 39 32 36 37 39 38 38 38 39 39 39 39 39 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 40 40 40 39 40 40 40 40 40 40 39 40 40 17 40 39 40 38 40 38 38 40 38 40 40 40 40 39 40 40 37 39 40 39 40 40 40 39 40 39 40 40 38 40 38 39 39 39 39 39 39 39 39 39 39 39


Does anyone have a solution for this problem? Any help will be much appreciated.


Cheers,


Sam

TonyWalters

Apr 19, 2016, 10:01:19 AM
to Qiime 1 Forum
Hello Sam,

First, may I ask why you are using split_libraries.py instead of split_libraries_fastq.py (the latter is designed for Illumina data, the former for 454 data)?

Second, I am not seeing a sequence that looks like your primer in the read. Immediately downstream of your barcode are sequences that look roughly like TACGTAGGTGGCAAGCG. split_libraries.py looks for the primer sequence right after the barcode sequence, so it's not finding anything that looks like the primer sequence in this case.
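
To make the check described above concrete, here is a minimal Python sketch that counts mismatches between the primer from the mapping file and the bases immediately after the 8-nt barcode. It is an illustration only, not QIIME's own code; the function name primer_mismatches is hypothetical, and the read is truncated to the first 40 bases of the first excerpt in the original post.

```python
# Illustrative sketch only; this is not how QIIME implements the check.
# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(read, barcode_len, primer):
    """Count mismatches between the primer and the bases right after the barcode."""
    window = read[barcode_len:barcode_len + len(primer)]
    mismatches = sum(
        1 for base, code in zip(window, primer) if base not in IUPAC[code]
    )
    # Primer positions that run past the end of the read count as mismatches.
    return mismatches + (len(primer) - len(window))


primer = "GTGCCAGCMGCCGCGGTAA"  # LinkerPrimerSequence from the mapping file
read_start = "AAAACAAATACGTAGGTGGCAAGCGTTGTCCGGAATTATT"  # first 40 bases of a read

# A non-zero count means the read fails a "0 primer mismatches" filter.
print(primer_mismatches(read_start, 8, primer))
```

A read that really carried the primer after the barcode would give a count of 0 here, which is the case split_libraries.py expects by default.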

It's possible, depending upon the protocol (e.g. if it was like Caporaso et al. 2010), that the primer is not included in the read. If this is the case, and the sequencing company appended the barcode at the beginning of the reads, I would go back to the fastq files and follow this process instead:
1. Run extract_barcodes.py (http://qiime.org/scripts/extract_barcodes.html) with a barcode length of 8 to get the separate barcodes and amplicon read files.
2. Pass these files to split_libraries_fastq.py (http://qiime.org/scripts/split_libraries_fastq.html) as the -i (reads) and -b (barcodes) parameters.
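
As a rough picture of what step 1 produces, the following Python sketch splits one FASTQ record into a barcode record and a trimmed read record. It is a simplified stand-in, not the extract_barcodes.py implementation; the function names are hypothetical, and the example record is the first 20 bases of a read from the original post with its quality scores encoded under an assumed Phred+33 offset.

```python
# Illustrative sketch only; extract_barcodes.py handles many more cases.

def split_barcode(header, seq, qual, bc_len=8):
    """Split one FASTQ record into (barcode record, trimmed read record)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    barcode = (header, seq[:bc_len], qual[:bc_len])
    read = (header, seq[bc_len:], qual[bc_len:])
    return barcode, read


def format_fastq(record):
    """Render a (header, sequence, quality) tuple as a four-line FASTQ record."""
    header, seq, qual = record
    return "@%s\n%s\n+\n%s\n" % (header, seq, qual)


# Example record (truncated; quality string assumes a Phred+33 encoding).
header = "101-P-MS515F::M02696:21:000000000-AFKTF:1:2114:28714:18709"
seq = "AAAACAAATACGTAGGTGGC"
qual = "IIIIIIIIFFBBEEGGGBAB"

barcode_rec, read_rec = split_barcode(header, seq, qual, bc_len=8)
print(format_fastq(barcode_rec))  # would go into the barcodes file
print(format_fastq(read_rec))     # would go into the reads file
```

The two outputs keep the same header so that the barcode and the read can be paired again during demultiplexing.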

Hope this helps,
Tony

Sam

Apr 19, 2016, 11:08:37 AM
to Qiime 1 Forum
Hi Tony,

Thank you very much for the reply.

The sequencing centre (Research and Testing; methods attached, and I am working from the "Fasta read for qiime" stage you will see on the diagram) has only supplied me with .fna files, not fastq. That is why I was trying just the split command, since I didn't have a separate barcode file (I am a bit new to this).

I have tried this:

extract_barcodes.py -f Payler_3819B.fna -l 8

but it returns an error when used on my .fna file:

AssertionError: Non-header line passed as input. Header must start with '@'.


Will this work on this file? 


I did manage to run the file through the demultiplex_fasta.py script, which allowed me to pick OTUs, something I couldn't do before. However, I realise split libraries does some additional quality filtering, so I thought it might be a good idea to run it through that instead.


Any ideas?


Thanks again,


Sam


Data_Analysis_Methodology.pdf

TonyWalters

Apr 19, 2016, 12:44:57 PM
to Qiime 1 Forum
Hello Sam,

How about we try this approach: convert the fasta/qual files to fastq via the convert_fastaqual_fastq.py script (http://qiime.org/scripts/convert_fastaqual_fastq.html), then run extract_barcodes.py (it will only take fastq files), and then split_libraries_fastq.py to demultiplex.
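
For anyone curious what the fasta/qual to fastq conversion amounts to, here is a minimal Python sketch that combines one sequence with its space-separated quality scores into a FASTQ record. It is not the convert_fastaqual_fastq.py code; the function names are hypothetical, the Phred+33 offset is an assumption, and the example uses only the first 12 bases and scores of a read from the original post.

```python
# Illustrative sketch only; assumes Phred+33 quality encoding.

def scores_to_ascii(qual_line, offset=33):
    """Turn space-separated integer scores into a FASTQ quality string."""
    return "".join(chr(int(score) + offset) for score in qual_line.split())


def fastaqual_to_fastq(header, seq, qual_line, offset=33):
    """Combine a FASTA sequence and its .qual scores into one FASTQ record."""
    quality = scores_to_ascii(qual_line, offset)
    if len(quality) != len(seq):
        raise ValueError("number of scores does not match sequence length")
    # FASTQ headers start with '@', which is what extract_barcodes.py checks for.
    return "@%s\n%s\n+\n%s\n" % (header, seq, quality)


header = "101-P-MS515F::M02696:21:000000000-AFKTF:1:2114:28714:18709"
seq = "AAAACAAATACG"                             # first 12 bases of the read
scores = "40 40 40 40 40 40 40 40 37 37 33 33"   # first 12 scores from the .qual file

record = fastaqual_to_fastq(header, seq, scores)
print(record)
```

This also shows why extract_barcodes.py rejected the .fna file: FASTA headers start with '>' and carry no quality line, while FASTQ headers start with '@'.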

You probably don't want to try clustering directly on the current data: it still looks like it has the barcode at the beginning of the reads, and you don't want these in the clustered data.

-Tony

Sam

Apr 19, 2016, 3:50:38 PM
to Qiime 1 Forum
Ah brilliant! Once I got that bit sorted, everything else worked, including split_libraries_fastq.py.

Thanks very much Tony, really appreciate the help

Sam