Hello,
I'm trying to demultiplex some MiSeq PE runs that were generated using custom barcodes similar to the Earth Microbiome Project's 16S protocol. I have already read through Tony Walters' workaround on this topic, since QIIME does not support demultiplexing of dual barcodes, but I'm a bit confused about a few things in his post.
A.
Tony said:
"""
After step 3, you would want to filter the combined barcodes. If the labels haven't been changed by the stitching process, you should be able to use these commands:
egrep '^@' X| tr -d '@' > seqs_to_keep.txt
filter_fasta.py -i Y -o stitched_barcodes.fastq -s seqs_to_keep.txt
where X is the stitched fastq file, and Y is the combined barcodes fastq file.
"""
I don't understand why this step is needed. When I run these commands, the output file named "stitched-barcodes.fastq" is identical to the input file named "combined-barcodes.fastq".
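For reference, here is a minimal Python sketch of what the filtering step quoted above appears to do: collect the read labels from the stitched file, then keep only the barcode records whose labels match. The helper names (`read_fastq`, `filter_barcodes`) and the toy records are invented for illustration, and this is not QIIME's own code. Under these assumptions, if every read pair was stitched successfully, no barcode record is dropped and the output equals the input.

```python
def read_fastq(lines):
    """Yield (label, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict 4-line records (no wrapped sequence lines).
    """
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # The label is the first whitespace-delimited token, minus the leading '@'.
        yield header[1:].split()[0], seq, qual


def filter_barcodes(stitched_lines, barcode_lines):
    """Keep only barcode records whose label also occurs in the stitched reads."""
    keep = {label for label, _seq, _qual in read_fastq(stitched_lines)}
    return [rec for rec in read_fastq(barcode_lines) if rec[0] in keep]


# Toy data: read "r2" failed to stitch, so its barcode should be dropped.
stitched = ["@r1", "ACGT", "+", "IIII",
            "@r3", "TTGA", "+", "@III"]  # a quality line may start with '@'
barcodes = ["@r1", "AAAA", "+", "IIII",
            "@r2", "CCCC", "+", "IIII",
            "@r3", "GGGG", "+", "IIII"]

kept = filter_barcodes(stitched, barcodes)
print([label for label, _seq, _qual in kept])  # ['r1', 'r3']
```

The sketch reads headers by position (every fourth line) rather than by matching '^@', because a quality line can itself begin with '@'; the quality string of read 1101:20470:1640 in the data further down is one such case.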
B.
Most important question - how is QIIME using the combined barcodes fastq to demultiplex the samples? We are using barcodes that are 15 bp in length, and after combining the I1 and I2 fastqs, the length is 30 bp. Is QIIME slicing the barcodes based on user-specified length and then searching for them in R1 and R2 fastqs separately? I'm trying to understand why one would combine I1 and I2 into a single sequence.
Thanks so much!
Taruna
Most important question - how is QIIME using the combined barcodes fastq to demultiplex the samples?
4. Alter your mapping file to have barcodes that are a combined version of the first reads (created in step 1) and the second reads (step 2), so if your first read barcode was ATCCG and the second read barcode was CCGAAT, then your BarcodeSequence in the mapping file would be ATCCGCCGAAT. You want to make sure all of your final barcodes are unique and the mapping file doesn't have any errors when you run check_id_map.py on it.
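As a rough illustration of step 4, the sketch below builds the combined BarcodeSequence values by concatenating each sample's first-read and second-read barcode and checks that the results are unique. The sample IDs and the second sample's barcode are invented for the example; only ATCCG and CCGAAT come from the text above. With 15 bp barcodes on each index read, the same concatenation would give 30 bp combined barcodes.

```python
from collections import Counter


def combine_barcodes(pairs):
    """Map sample ID -> first-read barcode + second-read barcode."""
    return {sample: bc1 + bc2 for sample, (bc1, bc2) in pairs.items()}


def duplicated(combined):
    """Return combined barcodes that are shared by more than one sample."""
    counts = Counter(combined.values())
    return sorted(bc for bc, n in counts.items() if n > 1)


# Hypothetical samples; "S1" uses the barcodes from the example above.
pairs = {
    "S1": ("ATCCG", "CCGAAT"),
    "S2": ("ATCCG", "TTGACA"),
}

combined = combine_barcodes(pairs)
print(combined["S1"])        # ATCCGCCGAAT
print(duplicated(combined))  # []
```

Two samples may share a first-read barcode, as S1 and S2 do here, as long as the concatenated pair is distinct.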
Is QIIME slicing the barcodes based on user-specified length and then searching for them in R1 and R2 fastqs separately?
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: M02034:378:000000000-B8MRY:1:1101:17502:1596. This may be because you passed an incorrect value for phred_offset.
split_libraries_fastq.py -i $COMBINED_FASTQS/fastqjoin.join.fastq -b $COMBINED_FASTQS/fastqjoin.join_barcodes.fastq -m $MAP/16S-bac-QIIME-mapping-MEmicrobiome-combined-RC.txt -o $SPLIT --rev_comp_mapping_barcodes -q 19 -r 5 -p 0.70 --barcode_type 30 --phred_offset 33
python/2.7.12/qiime-env/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
Input file paths
Mapping filepath: /rhome/taruna/shared/taruna/memb/qiime-files/mapping/16S-bac-QIIME-mapping-MEmicrobiome-combined-RC.txt (md5: bde95139910d0d32f9e3f3c3f2a4f3a3)
Sequence read filepath: /rhome/taruna/shared/taruna/memb/data-raw/uc-davis/16S/combined-fastqs/fastqjoin.join.fastq (md5: ddd4f2871939baa9243f0a6837a47ef3)
Barcode read filepath: /rhome/taruna/shared/taruna/memb/data-raw/uc-davis/16S/combined-fastqs/fastqjoin.join_barcodes.fastq (md5: 93428fb08588171c0f4bf2871d11d8bf)
@M02034:378:000000000-B8MRY:1:1101:17502:1596
TCTTACCTATTAGTGGTTGAACAGCATTTGACTCAGATAGTAATCCACGCTCTTTTAAAATGTCAACAAGAGAATCTCTACCATGAACAAAATGTGACTCATATCTAAACCAGTGCTTGACGAACGTGCCAAGCATATTAAGCCACTTCTCCTCATCTAACGCGTCAGTTTTTGACAGAATCGTTAGTTGATGGCGAAAGGTCGCAAAGTAAGAGCTTCTCGAGCTGCGCAAGGATAGGTCGAATTTTCTCATTTTCCGCCAGCAGTCCACTTCGATTTAATTCGTAAACAAGCAGTAGTAATTCCTGCTTTATCAAGATAATTTTTCGACTCATCAGAGATATCCGAAAGTGTTAACTTC
+
A@-A<C@E-CA<EEG<@D:,,;<C,C@,,6,,<66,,CC--;6C<@,6,@F:CEFE,,,,<;,,,,;C,,60-;,CFF66CE8F?.,CFE,@CCF<.<C<1C7<@E<9<C,1/<,BE@C7<:7/@0B=,=@,BC:EBFEAB@705:?BEC?C?DE<?,;:??=CGGFFFGFGFFFFEFGFGEFGAGGGGGCEDGGFFCGGGEFFGGCFDDCFDGGFGFFFGGGFGGGFGFFGGGF>C?>FGGGGGGFFGFGGGEFFEGFEDEGGFGFF9GECFCEFAFCDGGGGGGEE4ECC,<FCFFGFFEF>@E8FC<9<C,,C<CC,@C,+;,,6,FFC?C-,FCC,ECC7@@EC,CF9E<FGCB86-
@M02034:378:000000000-B8MRY:1:1101:10612:1603
TAGTTATATGGCTGTTGGTTTCTATGTGTCTAAATATGCTAACAAATAGCCAGATTTGGTACTTGCTGCTAAAGGCTTAGGAGCCAAAGAATGGAACAACTCACTAAAAATCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTG
+
--8ABAE<EAFGF@A<@B?<FC<;C<C,,CC,,;C,,,,6-,;6,,,,,,,6,,C,<,,,,;,C06CC<E6/.;6,,;,52-C,,6,CCD-C296.C56,,9,9C<22:2,,1..5C8EBFCE:=FF,5;?<>.4C=A,==FFCF,:BACF@AEEGDGCFDFDEBBF,FFFGFG,FFGFGF@E;CFFCFGFFFFFGGGGGGGFEFDFFGGGGGGFGDCDCFGGFFGEGDGDGGGGEGGGGFFCGEGFGGGFD=FFF?FGFFGEFD?GGFCGFGGGGGGFFGCFGCFGGFGGGFF8FFEFCDFFE<,,CC,C7F@+6:CB@<?F=4F@:CCFEAE<C@@BAB@,CECCA-AAA
@M02034:378:000000000-B8MRY:1:1101:15935:1610
TGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGCTACAGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATATGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGC
+
-8BCCCCFFA<-@C9,@E,A<<EF<F<F,:@,C@<CF<,;E<,FC,CE<A6@C688,,8@68CF,8,C:@@::FFC+B8CEEECFEE,CC;6:EB7FF-CF/4CFBFED9FCFEC<9CBCAFF@?8>FFFE?E?F<<B=BC<FE=E5@AFCFCDFFGGBD?D=EDFFGGGFFECFFFFFDDEB8CEFFEFGGGEGFGGGFFGGFE9BFGGGGGGGEFFFEGGGFCGFGGED?FGGGGGGGFF?GGFDGFAGGGGGGGGGGGEEFFGGFFFECCGDGFFGFGGGGGGFGFCEFEGGFCFFGGGGGGFGGGGGGGFFE@CEFFF,FF@CE<EDAC8:D?BFE@@F9CEGDFFEB<FFFCGGCBCA6
@M02034:378:000000000-B8MRY:1:1101:13285:1617
TATGAGGGACATAAAAAGTAAAAATGTTTACAGTAGAGTCAATAGCAAGGCCACGACGCAATGGAGAAAGACGGAGAGCGCCAACGACGTCCATCTCGAAGGAGTCGCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAGACGACCAATCTGACCAGCAAGGAAGCCAAGATGGGAAAGGTCATGCGGCATACGCTCGGCGCCAGTTTGAATATTAGACATAATTTATCCTCAAGTAAGGGGCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTACATACCAAAGACGAGCGCCTTTACGCTTGCCTTTAGTACCTCGCAACGGCTGCGGACGACCG
+
--8A-CEF;DFC9<@9@:AE,,,6F<C,CCC,CF<,,,6;;6<<,C,C,@,;,6@66,8+6BCC--,:,29,8@:@+,C7@CECF?+CE>@B:EF?E.5+8-+,8EFCF7CFC3B<;ABCECCF>;CEAFEGCF,ED9GG@FCGFFCFFGFGFFFFFFDEFCGFAGGGGGGFCGGGFGFGGGFDCCGGGGFGGGGGFGGGGGGFFFGFGGGGGGGGGFFFFF9FGGGGGGGGGFCFGGGFGCCGGGGGGGGGFF<GDFEGGGGGGFFEFGFGGGGGGFFGGGGGFDGGGGEF@EEEDAF@CGEF8DFCFC8;FFGGFDGCFB@6FGFGGGEGF@FGCCB@-
@M02034:378:000000000-B8MRY:1:1101:10067:1621
TACCTTTAGCGTTAAGGTACTGAATCTTTCTAGTCGCTGTAGGCGGAAAACGAACATTCGCAAGTGTAAACATAGTGCCATGCTCAGGAACAAAGAAACGCGGCACAGAATGTTTATAGGTCTGTTGAACACGACCAGAAAACTGGCCTAACGACGTTTGGTCAGTTCCATCAACATCATAGCCAGATGCCCAGAGATTAGAGCGCATGACAAGTAAAGGACGGTTGTCAGCGTCATAAGAGGTTTTACCTCCAAATGAAGAAATAACATCATGGTAACGCTGCATGAAGTAATCACGTTCTTGGTCAGTATGCAAATTAGCATAAGCAGCTTGCAGACCCATAATGTCAATG
+
--8ABCEFFFGBB;@;;A<BF9C6CF<,C,;,<C,++,686C<,+++7+6,@+66+,,6,,8.@,,<<,,,,<666,9C,9<<FE<CA<CC1,9E8,,,,@+:=28=8?<EAFEFDF=E9EFFEEFA,A@<ECC>@GFGFFAAD@D:DE8=C=FEFFGEBFGFGFFFFEEDEGGGGFF@DFGFGEGFFFGGGGFFEC?GGGGGEGFD<CGFGGGGFE8BDEGGGGFCCDFGGFGGGGGGGGGGGGGGGGFAF>;E;GFEFGGGFAFCGCAGGFGGGGGGDFFFGGDGFFGGGGGGEGGFCGCCEE9CFCFF<DFECC,GFEADCEGFAGDE@;<E,EFDEF9FC,CE<CCCA-
@M02034:378:000000000-B8MRY:1:1101:17728:1623
TCCCGCGTTGCGTCTATTATGGAAAACACCAATCTTTTCAAGCAACAGCAGGTTTCCGAGATCATGCGCCAAATGCTTCCTCAAGCTCAAACGGCTGGTCAGTATTTTACTAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGG
+
-A-8@FGGG>@@CFGGCFA@,,,,,CC,,B,6C,C@E9F9-FC<,C,66CFGCFF<;,,++,,6@,68@66+,:6CEC,99CC<9,9CFE/=,8=F/,C?94C,<CEEF9,0,<<,A?,,9:?58,/,3?F9?EB<B@418?EF<5B,7F>9=;D<DA@BC<FDFFEDFDE>EE@FFGFGDD<@FGBEFFFFGGGGGFFFGFFF@F@GGGEBFDEFD?GFCGGGGGFGGCGFC<GGEDGGFDGGGGFGGGGFGGF=GFGGGFEGGEGCGGGGGE@6B?@<FFBFF>5GGGDFA@DGGGCFFGGFFGGGGGGGFGGFDFF<EEFCCDFC@@E<@E@<C;,B@@@DD<,A;@8<<A6,C6EBE,AF,C@FC6@BCA-
@M02034:378:000000000-B8MRY:1:1101:11657:1630
TCGGAAAACATTATTAATGGCGTCGAGTGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCAGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCATCTTGG
+
-8A@-@@C<9EC<<EDFECE@@@+C::,,6<,8@EC6,,-C,6+7++;;6CEC,88+8@@,CFGF,CCF,@6CCE:++A><4C9ECF=<E@FFAFF,BE90?7+656.?8.A8E,E7B>C@<@CE9BB<FDDF@FFFF?><FCFFE=FFBFFFGGFFAGGGGGE@C:DGFFECE,E@GGGGEGGDGGGGFEGGGFEEEGEEEGFGGGGGGGGGGGGGGGGFFGGFEGGGGGFDGFGGGGGGGGGFGGGGGGGGGFGGFDF9GGGGFGGGGGFDCGGEGGGGGGFGGGGGGGGGGFGGGGGEFFGGGGFFF9E9FCGEC67EEGF:CEEC:;DC7B@CDAGGGGGFCF?DEGFCA9CCB
@M02034:378:000000000-B8MRY:1:1101:15928:1636
AGAGCGGTCAGTAGCAATCCAAACTCTGTCACTCGTCAGAAAATCGAAATCATCTTCGGTTAAATCCAAAACGGCAGAAGCCTGAATGAGCTTAATAGAGGCCAAAGCGGTCTGGAAACGTACGGATTGTTCAGTAACTTGACTCATGATTTCTTACCTATTAGTGGTTGAACAGCATCGGACTCAGATAGTAATCCACGCTCTTTTAAAATGTCAACAAGAGAATCTCTACCATGAACAAAATGTGACTCATATCTAAACCAGTCCTTGACGAACGTGCCAAGCATATTAAGCCACTTCTCCTCATCCAACGCGTCAGTTTTTGACAGAGTCGTTAGTTGATGGCGAAAGG
+
-88BCCFFF7F6,C<EFF,?,,C@F,CCC,,6C,CC,,8;C<,EC,,@,,;6CCEE<,,@@:E<E@C<8.<6@@EGE7E/EFFGG<FAEDFGFEEF9<80CB<8,BFDEEGCF><99CBDGGFF7CFECFGFFFGGFFEFFFFFGFFEAGGGEFGGGGFBEBGFFFFEFGFGGGGFGGGGGGFGGGFGGFGGGGGGEBGGGGGFGGGGGGGGGGGGFECBGGGGGDGDFGGGGGGFCGGGGGGGGGGGGGFEGGGGGFFGGGGGEEGCGGGGGGFFF=GGGGGGGGGFGFCFGFGFFFGGFDGGFC,EE@FCFEFA@C:DEBFE;-FFBB+@FAGEGGGGFFCCEFECCCCC
@M02034:378:000000000-B8MRY:1:1101:17550:1639
TGTCGCATTGCATTCATCAAACGCTGAATAGCAAAGCTTCTACGCGATTTCATAGTGTAGGCCTCCAGCAATCTTGAACACTCATCCTTAATACCTTTCTTTTTGGGGTAATTATACTCATCGCGAATATCCTTAAGAGGGCGTTCAGCAGCCAGCTTGCGGCAAAACTGCGTAACCGTCTTCTCGTTCTCTAAAAACCATTTTTCGTCCCCTTCGGGGCGGTGGTCTATAGTGTTATTAATATCAAGTTGGGGGAGCACATTGTAGCATTGTGCCAATTCATCCATTAACTTCTCAGTAACAGATACAAACTCATCACGAACGTCAGAAGCAGCCTTATGGCCGTCAACATACATATCACCATTATCGAACTCAACGCCCTGCG
+
A@8ACFGGGFDGFFFFFF<,,CB+B@+6CC,,<@6C,,,;-6CC@++@@,;@C@EE,,,,,CFGAC,CE8<FE,6C,,C,EE@@,,:,9C<E.CCFFE,C9FEF@FGDC74CF9F<EA<EEDFD7+7@7F,5EE9<?<1FFEGGFCC9?EF,6?@DFG;BFFCB>CB@DECCCEADBCCEFGCFF<GGGGFFFCAECFFGGFGGGG>>FGGFGGFGFGGGGGGDGGFFFGCGGGGGGFGFE@FDGFFGGGGCFDGGGGGGGFFDGGGGGGGGGGGGGGFGFE=ED>GGGEGGDGGFGGFFDGCGGGGGCGGGGFGGGFEFFGGGGGFGGFGGFGGGGFFCGFEFE<@EF@EEFCDCF;GGGGFFCF<EFFAE8C7@87GGCCC@-
@M02034:378:000000000-B8MRY:1:1101:20470:1640
TCAGTGTTTCCTGCGCGTACACGCAAGGTAAACGTGAACAATTTAGCTGCTTTAACCGGACGCTCTACTCCATTAATAATGTTTTTCGTAAATTCAGCGCCTTCCATGATGAGACAGGCCGTTTGAATGTTGACGGGATGAACATAATAAGAAATGACGGCAGCAATAAACTCAACAGGAGCAGGAAAGCGAGGGTATCCTACAAAGTCCAGCGTACCATAAACGCAAGCCTCAACGCAGCGACGAGCACGAGAGCGGTCAGTAGCAATCCAAACTTTGTTACTCGTCAGAAAATCGAAATCATCTTCGGTTAAATCCAAAACGGCAGAAGCCTGAATGAGCTTAATAGAGGCCAAAGCGGTCTGG
+
@AAAAFFFDF9<CCC@CB==,<B++7+,;6,,,8,,,,;-6C@,,,6,,;EE<66;6+++77@E,,,6,9,6<,66C,/?,CCF,,68:=.,<C,9,994B?C<CAEF,?<.,,96/++8>CCE4<5C/CAA=50@7+@+FB<5A87C@@E,F=D9?>68?E7,C>C97=CCC9AFFFFFFFFGEGFCF@D:?GGEFDEFFFDD?DD<>GGGGFFGGGGFGGGEFGGGFFECFCFGGGGGGGFGGGGGFEFGGFGFCGGFGGGGDGDGGGGGGFGFDFFFEEGGGGGGGGGFEFGGFGFFCGFFFEEFFGGFGGF8GE@@@@GFAFGFEEFFEEC@,AA-EA@@,FFGGFGEF@@7GGFD@CB-CA
@M02034:378:000000000-B8MRY:1:1101:17502:1596
TCTTTCCCTACACGATTTTTTTTTTCTCTC
+
CCCCCFFEFEECEF7--,,66++++,6;6,
@M02034:378:000000000-B8MRY:1:1101:10612:1603
TCTTTCCCTACACGATTTTTTTTTTTTTTT
+
CCCCCD@EFDEEEF7--,,6+++666++++
@M02034:378:000000000-B8MRY:1:1101:15935:1610
TCTTTCCCTACACGATTTTTTCTTTTTTTT
+
CCCCCGGGGGGGGGE--,,6+-,,,,,+6+
@M02034:378:000000000-B8MRY:1:1101:13285:1617
TCTTTCCCTACACGATTTTTTCTTTTTTTT
+
CCCCCFEFGGGDGG@--,,6+-6,,,,+6+
@M02034:378:000000000-B8MRY:1:1101:10067:1621
TCTTTCCCTACACGATTCTTTTTTCTTTTT
+
CCCCCF@EFEEEFD+--,,---,6,;,,6,
@M02034:378:000000000-B8MRY:1:1101:17728:1623
TCTTTCCCTACACGATTCTTTCTTTTTTTC
+
@CCCCFDFFDE<FF7--68@--,,,,,+6,
@M02034:378:000000000-B8MRY:1:1101:11657:1630
TCTTTCCCTACACGATTCTTTTTTTTTTTT
+
CCCCCGEDGDE<EF:-6,,86-;++6++67
@M02034:378:000000000-B8MRY:1:1101:15928:1636
TCTTTCCCTACACGATTTTTTTTTTTTTTC
+
CCCCCGGGGGGGGGG--,,6++6+++6+7,
@M02034:378:000000000-B8MRY:1:1101:17550:1639
TCTTTCCCTACACGATTTTTTTTTTTTCTC
+
CCCCCGGGGGGGGFE--,8-+++++++,,,
@M02034:378:000000000-B8MRY:1:1101:20470:1640
TCTTTCCCTACACGATTTTTTTTTCCTTTT
+
9CCCCEFFEED@DF@-66,-6+++,,,,,,
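One quick sanity check on records like the ones above, before running split_libraries_fastq.py, is to confirm that each sequence and quality string have the same length and that the quality characters decode to non-negative scores for the chosen offset. The sketch below uses a hypothetical helper, `check_record`, that is not part of QIIME or scikit-bio, with the first barcode record above as input. It only tests lengths and score ranges and does not reproduce QIIME's parser, so a clean result here does not rule out other causes of the FastqParseError.

```python
def check_record(seq, qual, phred_offset=33):
    """Return a list of problems found in one FASTQ record (empty if none)."""
    problems = []
    if len(seq) != len(qual):
        problems.append("length mismatch: %d bases vs %d quality chars"
                        % (len(seq), len(qual)))
    # Decode each quality character to a Phred score under the given offset.
    scores = [ord(ch) - phred_offset for ch in qual]
    if scores and min(scores) < 0:
        problems.append("negative score %d: offset %d looks too high"
                        % (min(scores), phred_offset))
    return problems


# First barcode record from the barcode file above.
seq = "TCTTTCCCTACACGATTTTTTTTTTCTCTC"
qual = "CCCCCFFEFEECEF7--,,66++++,6;6,"

print(len(seq), len(qual))                       # 30 30
print(check_record(seq, qual, phred_offset=33))  # []
print(check_record(seq, qual, phred_offset=64))  # one problem: negative score
```

For this record the lowest quality character is '+', which decodes to 10 with an offset of 33 and to -21 with an offset of 64, so under this check the record looks consistent with --phred_offset 33 and with a barcode length of 30.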