Understanding demultiplexing dual-barcoded reads


Taruna

Jul 24, 2017, 6:41:38 PM
to qiime...@googlegroups.com

Hello,


I'm trying to demultiplex some MiSeq PE runs that were generated using custom barcodes similar to the Earth Microbiome Project's 16S protocol. I have already read through Tony Walters' workaround on this topic, since QIIME does not support demultiplexing of dual barcodes. But I'm a bit confused about a few things in his post.


A. 


Tony said: 


"""

After step 3, you would want to filter the combined barcodes. If the labels haven't been changed by the stitching process, you should be able to use these commands:

egrep '^@' X| tr -d '@' > seqs_to_keep.txt

filter_fasta.py -i Y -o stitched_barcodes.fastq -s seqs_to_keep.txt

where X is the stitched fastq file, and Y is the combined barcodes fastq file.

"""


I don't understand why this step is needed. When I run these commands, the output file named "stitched-barcodes.fastq" is identical to the input file named "combined-barcodes.fastq".
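
For reference, here is a minimal Python sketch of what the two quoted commands are meant to do: collect the read labels that survived stitching, then keep only the barcode records carrying those labels. It parses FASTQ four lines at a time, because a plain `egrep '^@'` also matches quality lines that happen to begin with `@` (some quality lines posted later in this thread do). The function names are hypothetical and this is not QIIME's own code. Under these assumptions, if stitching dropped no reads, the filtered barcodes would be identical to the input barcodes.

```python
def read_fastq(lines):
    """Yield (label, sequence, quality) tuples from an iterable of FASTQ lines."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # The label is the header up to the first space, without the leading '@'.
        yield header[1:].split(" ")[0], seq, qual


def filter_barcodes(stitched_lines, barcode_lines):
    """Keep barcode records whose label also appears among the stitched reads."""
    keep = {label for label, _seq, _qual in read_fastq(stitched_lines)}
    return [rec for rec in read_fastq(barcode_lines) if rec[0] in keep]


# Toy data: the quality line "@III" starts with '@' but is not a header.
stitched = ["@r1", "ACGT", "+", "@III"]
barcodes = ["@r1", "AA", "+", "II", "@r2", "CC", "+", "II"]
print(filter_barcodes(stitched, barcodes))
```

Here only `r1` is kept, because `r2` has no stitched read.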


B. 


Most important question - how is QIIME using the combined barcodes fastq to demultiplex the samples? We are using barcodes that are 15 bp in length, and after combining the I1 and I2 fastqs, the length is 30 bp. Is QIIME slicing the barcodes based on a user-specified length and then searching for them in the R1 and R2 fastqs separately? I'm trying to understand why one would combine I1 and I2 into a single sequence.


Thanks so much! 

Taruna

Colin Brislawn

Jul 25, 2017, 12:17:37 AM
to Qiime 1 Forum
Hello Taruna,

Thanks for getting in touch. Tony is definitely the best person to answer this question, but let me see if I can help you get started.


Most important question - how is QIIME using the combined barcodes fastq to demultiplex the samples?

Looks like this workaround involves concatenating the two barcodes together into a single barcode. Tony says it best:
4. Alter your mapping file to have barcodes that are a combined version of the first reads (created in step 1) and the second reads (step 2), so if your first read barcode was ATCCG and the second read barcode was CCGAAT, then your BarcodeSequence in the mapping file would be ATCCGCCGAAT. You want to make sure all of your final barcodes are unique and the mapping file doesn't have any errors when you run check_id_map.py on it.

So Tony is suggesting that you take your two barcodes (ATCCG and CCGAAT) and make a QIIME mapping file with a new concatenated barcode (ATCCGCCGAAT).

Is QIIME slicing the barcodes based on user-specified length and then searching for them in R1 and R2 fastqs separately? 
Nope. QIIME 1.9.1 looks for a single barcode. Tony concatenates the two barcodes because a single barcode is all QIIME 1 can find.
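
To make the mapping-file change concrete, here is a small Python sketch, assuming you hold the two barcodes per sample in a dictionary. The sample ID and the function name are hypothetical; only the ATCCG and CCGAAT barcodes come from Tony's example. It joins the pair and refuses duplicate combinations, since the final barcodes need to be unique.

```python
def combine_barcodes(pairs):
    """Map sample ID -> barcode1 + barcode2, rejecting duplicate combinations."""
    combined = {}
    seen = {}
    for sample_id, (bc1, bc2) in pairs.items():
        joined = bc1 + bc2
        if joined in seen:
            raise ValueError("%s and %s share barcode %s"
                             % (seen[joined], sample_id, joined))
        seen[joined] = sample_id
        combined[sample_id] = joined
    return combined


# "SampleA" is a made-up sample ID for illustration.
print(combine_barcodes({"SampleA": ("ATCCG", "CCGAAT")}))
```

The joined values are what would go in the BarcodeSequence column; running check_id_map.py on the edited file is still the real validation step.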

Does that help? I can cc Tony to see if he has other suggestions.

Colin


PS. Another option would be to see if the sequencing center could provide the reads to you in a demultiplexed format. If I had a data set like this, I might let the sequencing core do the demultiplexing, then jump in with a script like add_qiime_labels.py or multiple_split_libraries_fastq.py.

Taruna

Jul 25, 2017, 9:55:54 AM
to Qiime 1 Forum
Hi Colin! Thanks so much! But I'm still confused as to why Tony is suggesting combining the two barcodes into a single sequence. Aren't the two barcodes present in different places within the read?

Yeah I agree with you. Next time, we should just ask the sequencing center to demultiplex for us. It's too much of a hassle to demultiplex dual-indexed reads. 

If you don't mind cc'ing Tony so he can explain the reason behind the concatenation of the barcodes, that'd be great! 

Taruna

Jul 25, 2017, 9:56:57 AM
to Qiime 1 Forum
Oops! Forgot to say thank you once again! Too early in the morning :-/

TonyWalters

Jul 25, 2017, 10:05:17 AM
to Qiime 1 Forum
Hello Taruna,

Usually barcodes are at the very beginning of the reads. There are exceptions for the staggered (or "phased") situations where a few random bases are added to increase overall read heterogeneity, but I don't think that case applies here.

That original thread used some custom scripts; you should be able to use extract_barcodes.py (http://qiime.org/scripts/extract_barcodes.html) instead of the custom scripts from that prior post to read the paired-end barcodes fastq files and create a single barcodes file. The split_libraries_fastq.py script takes a single fastq file for barcodes rather than a paired option, hence the concatenation of the barcodes into a single fastq file.

-Tony

Taruna

Jul 25, 2017, 12:00:47 PM
to Qiime 1 Forum
Thanks Tony! We actually have index files that the sequencing center gave us... so I'm guessing I don't need to use the extract_barcodes.py script, right? Please let me know if you think I should still use this extraction script. Thanks so much!

TonyWalters

Jul 25, 2017, 12:07:21 PM
to Qiime 1 Forum
Is it a single index file? Or paired? If it's paired, you would still want to get a single concatenated read fastq file for split_libraries_fastq.py. This is assuming that your data aren't already demultiplexed (one fastq file, or a pair of fastq files, per sample).

Taruna

Jul 25, 2017, 12:20:26 PM
to Qiime 1 Forum
We have four files - R1, R2, I1, I2 and our data are not demultiplexed. So I'm planning on doing the following according to your recommendations in the original post.

A.  Combine the barcodes using combine_fastq_barcodes.py.
B.  Modify the mapping file so it matches the fastQ from A.
C.  Join the R1 and R2 fastQ files using join_paired_ends.py with -b option like Angus suggested.
D.  Run split_libraries_fastq.py with the combined barcodes fastQ, combined reads fastQ and the modified mapping file.  

Does that sound okay? 

TonyWalters

Jul 25, 2017, 12:29:16 PM
to Qiime 1 Forum
I'd suggest this approach:

A.  Combine the barcodes using QIIME's extract_barcodes.py script, specifying these options: --fastq1 X --fastq2 Y --bc1_len Z --bc2_len A --input_type barcode_paired_end
where X is your I1 barcodes file, Y is your I2 barcodes file, Z is the length of the barcodes in the I1 file, and A is the length of the barcodes in the I2 file.
B.  Modify the mapping file so it matches the combined barcode1 + barcode2 in the fastQ from A.
C.  Join the R1 and R2 fastQ files using join_paired_ends.py with -b option like Angus suggested (this will be the same)
D.  Run split_libraries_fastq.py with the combined barcodes fastQ, combined reads fastQ and the modified mapping file.
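
As a rough illustration of what step A produces, here is a Python sketch that joins the I1 and I2 barcodes read by read. It is not the extract_barcodes.py implementation; it assumes the index records are already parsed into (label, sequence, quality) tuples, that both files list reads in the same order, and that each barcode starts at the first base of its index read. The function name and the toy sequences are hypothetical.

```python
def combine_index_reads(i1_records, i2_records, bc1_len, bc2_len):
    """Yield (label, barcode, quality) with the I1 and I2 barcodes joined per read."""
    for (label1, seq1, qual1), (label2, seq2, qual2) in zip(i1_records, i2_records):
        if label1 != label2:
            raise ValueError("index reads out of order: %s vs %s" % (label1, label2))
        # Take the leading bases of each index read and concatenate them,
        # keeping the quality string aligned with the joined barcode.
        yield (label1,
               seq1[:bc1_len] + seq2[:bc2_len],
               qual1[:bc1_len] + qual2[:bc2_len])


# Toy records: two 15 bp index reads become one 30 bp barcode.
i1 = [("r1", "AAAAACCCCCGGGGG", "I" * 15)]
i2 = [("r1", "TTTTTAAAAACCCCC", "H" * 15)]
for label, barcode, quality in combine_index_reads(i1, i2, 15, 15):
    print(label, barcode, len(barcode))
```

The 30 bp result is the string that the BarcodeSequence column in step B has to match.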

-Tony  

Taruna

Jul 25, 2017, 4:13:40 PM
to qiime...@googlegroups.com
Awesome! Thanks, Tony! So I completed A-C but am having a little bit of an issue with split_libraries_fastq.py with an incorrect value for phred_offset. Please see below.  

skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: M02034:378:000000000-B8MRY:1:1101:17502:1596. This may be because you passed an incorrect value for phred_offset.

I dug around a bit in the forum and found someone who had a similar issue. Per your suggestion in that post, I searched for a few lines before and after this header and nothing seems to be out of order. You also suggested to update fastq-join in one of the posts... so I sent a request to our cluster team.

But in the meantime, I tried to specify the --phred_offset option with 33 but got an error. 


Command:
split_libraries_fastq.py -i $COMBINED_FASTQS/fastqjoin.join.fastq -b $COMBINED_FASTQS/fastqjoin.join_barcodes.fastq -m $MAP/16S-bac-QIIME-mapping-MEmicrobiome-combined-RC.txt -o $SPLIT --rev_comp_mapping_barcodes -q 19 -r 5 -p 0.70 --barcode_type 30 --phred_offset 33


Error:

python/2.7.12/qiime-env/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)


We want to use both reads, so is our best option just to update fastq-join?

Also, the combined barcode file produced by join_paired_ends.py contains barcodes that are not in our mapping file. I'm guessing these are PhiX related. Will the PhiX reads be removed after running split_libraries_fastq.py, or should I remove PhiX reads outside of QIIME?

Thanks for your help! 
Taruna
 

TonyWalters

Jul 26, 2017, 12:40:57 AM
to Qiime 1 Forum
Hello Taruna,

The --phred_offset 33 is the setting you want to use, but it looks like it's not writing any files out for some reason. Did it create a log file? It should tell us why the reads aren't being written (not matching barcodes, short reads after quality truncation, etc.).
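
One quick way to sanity-check the offset is to look at the lowest quality character in the file. The Python sketch below is a heuristic, not what split_libraries_fastq.py does internally, and the function name is hypothetical. Characters below `;` (ASCII 59) cannot occur with an offset of 64, so seeing any of them points to 33. The example string is the first barcode quality line posted later in this thread.

```python
def guess_phred_offset(quality_strings):
    """Return 33 or 64 when the quality characters make it clear, else None."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    if min(codes) < 59:
        # Below ';' is impossible with offset 64 (even for old Solexa scores).
        return 33
    if max(codes) > 74:
        # Above 'J' is higher than offset-33 data typically reaches (heuristic).
        return 64
    return None  # ambiguous range; inspect more reads


print(guess_phred_offset(["CCCCCFFEFEECEF7--,,66++++,6;6,"]))
```

That line contains `+`, `,` and `-`, so the sketch reports 33.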

-Tony

Taruna

Jul 26, 2017, 12:26:30 PM
to Qiime 1 Forum
Morning Tony! Yes, it did create a log file, but it's not very informative. Pasting the contents of that log file below.

Input file paths
Mapping filepath: /rhome/taruna/shared/taruna/memb/qiime-files/mapping/16S-bac-QIIME-mapping-MEmicrobiome-combined-RC.txt (md5: bde95139910d0d32f9e3f3c3f2a4f3a3)
Sequence read filepath: /rhome/taruna/shared/taruna/memb/data-raw/uc-davis/16S/combined-fastqs/fastqjoin.join.fastq (md5: ddd4f2871939baa9243f0a6837a47ef3)
Barcode read filepath: /rhome/taruna/shared/taruna/memb/data-raw/uc-davis/16S/combined-fastqs/fastqjoin.join_barcodes.fastq (md5: 93428fb08588171c0f4bf2871d11d8bf)

TonyWalters

Jul 26, 2017, 1:00:14 PM
to Qiime 1 Forum
Hello Taruna,

So the log only had those filepaths/md5 sums?

Can you get a sample of the first 10 sequences from both the barcodes and the join.fastq file and post those? It would be good to get an idea of what the reads look like, and see what the barcodes look like (in comparison to your mapping file barcodes).
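
A hedged Python sketch of that comparison: tally the barcodes seen in the barcodes fastq and count how many are in the mapping file. The function names are hypothetical, and the `rev_comp_mapping` switch only mimics the idea behind the --rev_comp_mapping_barcodes option used in the command above; it is not QIIME's code, and it does exact matching only.

```python
from collections import Counter


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def barcode_match_report(observed, mapping_barcodes, rev_comp_mapping=False):
    """Return (matched reads, unmatched reads, three most common barcodes)."""
    expected = set(mapping_barcodes)
    if rev_comp_mapping:
        expected = {reverse_complement(bc) for bc in expected}
    counts = Counter(observed)
    matched = sum(n for bc, n in counts.items() if bc in expected)
    return matched, sum(counts.values()) - matched, counts.most_common(3)


# Toy barcodes: the mapping file lists GGTT, the reads carry its reverse complement.
print(barcode_match_report(["AACC", "AACC", "GGTT"], ["GGTT"], rev_comp_mapping=True))
```

If nearly everything lands in the unmatched count, the most common barcodes are the first thing to compare by eye against the mapping file.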

-Tony

Taruna

Jul 27, 2017, 2:54:18 AM
to qiime...@googlegroups.com
Sure thing Tony! Here you go.

fastqjoin.join.fastq - first 10 reads

@M02034:378:000000000-B8MRY:1:1101:17502:1596
TCTTACCTATTAGTGGTTGAACAGCATTTGACTCAGATAGTAATCCACGCTCTTTTAAAATGTCAACAAGAGAATCTCTACCATGAACAAAATGTGACTCATATCTAAACCAGTGCTTGACGAACGTGCCAAGCATATTAAGCCACTTCTCCTCATCTAACGCGTCAGTTTTTGACAGAATCGTTAGTTGATGGCGAAAGGTCGCAAAGTAAGAGCTTCTCGAGCTGCGCAAGGATAGGTCGAATTTTCTCATTTTCCGCCAGCAGTCCACTTCGATTTAATTCGTAAACAAGCAGTAGTAATTCCTGCTTTATCAAGATAATTTTTCGACTCATCAGAGATATCCGAAAGTGTTAACTTC
+
A@-A<C@E-CA<EEG<@D:,,;<C,C@,,6,,<66,,CC--;6C<@,6,@F:CEFE,,,,<;,,,,;C,,60-;,CFF66CE8F?.,CFE,@CCF<.<C<1C7<@E<9<C,1/<,BE@C7<:7/@0B=,=@,BC:EBFEAB@705:?BEC?C?DE<?,;:??=CGGFFFGFGFFFFEFGFGEFGAGGGGGCEDGGFFCGGGEFFGGCFDDCFDGGFGFFFGGGFGGGFGFFGGGF>C?>FGGGGGGFFGFGGGEFFEGFEDEGGFGFF9GECFCEFAFCDGGGGGGEE4ECC,<FCFFGFFEF>@E8FC<9<C,,C<CC,@C,+;,,6,FFC?C-,FCC,ECC7@@EC,CF9E<FGCB86-
@M02034:378:000000000-B8MRY:1:1101:10612:1603
TAGTTATATGGCTGTTGGTTTCTATGTGTCTAAATATGCTAACAAATAGCCAGATTTGGTACTTGCTGCTAAAGGCTTAGGAGCCAAAGAATGGAACAACTCACTAAAAATCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTG
+
--8ABAE<EAFGF@A<@B?<FC<;C<C,,CC,,;C,,,,6-,;6,,,,,,,6,,C,<,,,,;,C06CC<E6/.;6,,;,52-C,,6,CCD-C296.C56,,9,9C<22:2,,1..5C8EBFCE:=FF,5;?<>.4C=A,==FFCF,:BACF@AEEGDGCFDFDEBBF,FFFGFG,FFGFGF@E;CFFCFGFFFFFGGGGGGGFEFDFFGGGGGGFGDCDCFGGFFGEGDGDGGGGEGGGGFFCGEGFGGGFD=FFF?FGFFGEFD?GGFCGFGGGGGGFFGCFGCFGGFGGGFF8FFEFCDFFE<,,CC,C7F@+6:CB@<?F=4F@:CCFEAE<C@@BAB@,CECCA-AAA
@M02034:378:000000000-B8MRY:1:1101:15935:1610
TGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGCTACAGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATATGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGC
+
-8BCCCCFFA<-@C9,@E,A<<EF<F<F,:@,C@<CF<,;E<,FC,CE<A6@C688,,8@68CF,8,C:@@::FFC+B8CEEECFEE,CC;6:EB7FF-CF/4CFBFED9FCFEC<9CBCAFF@?8>FFFE?E?F<<B=BC<FE=E5@AFCFCDFFGGBD?D=EDFFGGGFFECFFFFFDDEB8CEFFEFGGGEGFGGGFFGGFE9BFGGGGGGGEFFFEGGGFCGFGGED?FGGGGGGGFF?GGFDGFAGGGGGGGGGGGEEFFGGFFFECCGDGFFGFGGGGGGFGFCEFEGGFCFFGGGGGGFGGGGGGGFFE@CEFFF,FF@CE<EDAC8:D?BFE@@F9CEGDFFEB<FFFCGGCBCA6
@M02034:378:000000000-B8MRY:1:1101:13285:1617
TATGAGGGACATAAAAAGTAAAAATGTTTACAGTAGAGTCAATAGCAAGGCCACGACGCAATGGAGAAAGACGGAGAGCGCCAACGACGTCCATCTCGAAGGAGTCGCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAGACGACCAATCTGACCAGCAAGGAAGCCAAGATGGGAAAGGTCATGCGGCATACGCTCGGCGCCAGTTTGAATATTAGACATAATTTATCCTCAAGTAAGGGGCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTACATACCAAAGACGAGCGCCTTTACGCTTGCCTTTAGTACCTCGCAACGGCTGCGGACGACCG
+
--8A-CEF;DFC9<@9@:AE,,,6F<C,CCC,CF<,,,6;;6<<,C,C,@,;,6@66,8+6BCC--,:,29,8@:@+,C7@CECF?+CE>@B:EF?E.5+8-+,8EFCF7CFC3B<;ABCECCF>;CEAFEGCF,ED9GG@FCGFFCFFGFGFFFFFFDEFCGFAGGGGGGFCGGGFGFGGGFDCCGGGGFGGGGGFGGGGGGFFFGFGGGGGGGGGFFFFF9FGGGGGGGGGFCFGGGFGCCGGGGGGGGGFF<GDFEGGGGGGFFEFGFGGGGGGFFGGGGGFDGGGGEF@EEEDAF@CGEF8DFCFC8;FFGGFDGCFB@6FGFGGGEGF@FGCCB@-
@M02034:378:000000000-B8MRY:1:1101:10067:1621
TACCTTTAGCGTTAAGGTACTGAATCTTTCTAGTCGCTGTAGGCGGAAAACGAACATTCGCAAGTGTAAACATAGTGCCATGCTCAGGAACAAAGAAACGCGGCACAGAATGTTTATAGGTCTGTTGAACACGACCAGAAAACTGGCCTAACGACGTTTGGTCAGTTCCATCAACATCATAGCCAGATGCCCAGAGATTAGAGCGCATGACAAGTAAAGGACGGTTGTCAGCGTCATAAGAGGTTTTACCTCCAAATGAAGAAATAACATCATGGTAACGCTGCATGAAGTAATCACGTTCTTGGTCAGTATGCAAATTAGCATAAGCAGCTTGCAGACCCATAATGTCAATG
+
--8ABCEFFFGBB;@;;A<BF9C6CF<,C,;,<C,++,686C<,+++7+6,@+66+,,6,,8.@,,<<,,,,<666,9C,9<<FE<CA<CC1,9E8,,,,@+:=28=8?<EAFEFDF=E9EFFEEFA,A@<ECC>@GFGFFAAD@D:DE8=C=FEFFGEBFGFGFFFFEEDEGGGGFF@DFGFGEGFFFGGGGFFEC?GGGGGEGFD<CGFGGGGFE8BDEGGGGFCCDFGGFGGGGGGGGGGGGGGGGFAF>;E;GFEFGGGFAFCGCAGGFGGGGGGDFFFGGDGFFGGGGGGEGGFCGCCEE9CFCFF<DFECC,GFEADCEGFAGDE@;<E,EFDEF9FC,CE<CCCA-
@M02034:378:000000000-B8MRY:1:1101:17728:1623
TCCCGCGTTGCGTCTATTATGGAAAACACCAATCTTTTCAAGCAACAGCAGGTTTCCGAGATCATGCGCCAAATGCTTCCTCAAGCTCAAACGGCTGGTCAGTATTTTACTAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGG
+
-A-8@FGGG>@@CFGGCFA@,,,,,CC,,B,6C,C@E9F9-FC<,C,66CFGCFF<;,,++,,6@,68@66+,:6CEC,99CC<9,9CFE/=,8=F/,C?94C,<CEEF9,0,<<,A?,,9:?58,/,3?F9?EB<B@418?EF<5B,7F>9=;D<DA@BC<FDFFEDFDE>EE@FFGFGDD<@FGBEFFFFGGGGGFFFGFFF@F@GGGEBFDEFD?GFCGGGGGFGGCGFC<GGEDGGFDGGGGFGGGGFGGF=GFGGGFEGGEGCGGGGGE@6B?@<FFBFF>5GGGDFA@DGGGCFFGGFFGGGGGGGFGGFDFF<EEFCCDFC@@E<@E@<C;,B@@@DD<,A;@8<<A6,C6EBE,AF,C@FC6@BCA-
@M02034:378:000000000-B8MRY:1:1101:11657:1630
TCGGAAAACATTATTAATGGCGTCGAGTGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCAGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCATCTTGG
+
-8A@-@@C<9EC<<EDFECE@@@+C::,,6<,8@EC6,,-C,6+7++;;6CEC,88+8@@,CFGF,CCF,@6CCE:++A><4C9ECF=<E@FFAFF,BE90?7+656.?8.A8E,E7B>C@<@CE9BB<FDDF@FFFF?><FCFFE=FFBFFFGGFFAGGGGGE@C:DGFFECE,E@GGGGEGGDGGGGFEGGGFEEEGEEEGFGGGGGGGGGGGGGGGGFFGGFEGGGGGFDGFGGGGGGGGGFGGGGGGGGGFGGFDF9GGGGFGGGGGFDCGGEGGGGGGFGGGGGGGGGGFGGGGGEFFGGGGFFF9E9FCGEC67EEGF:CEEC:;DC7B@CDAGGGGGFCF?DEGFCA9CCB
@M02034:378:000000000-B8MRY:1:1101:15928:1636
AGAGCGGTCAGTAGCAATCCAAACTCTGTCACTCGTCAGAAAATCGAAATCATCTTCGGTTAAATCCAAAACGGCAGAAGCCTGAATGAGCTTAATAGAGGCCAAAGCGGTCTGGAAACGTACGGATTGTTCAGTAACTTGACTCATGATTTCTTACCTATTAGTGGTTGAACAGCATCGGACTCAGATAGTAATCCACGCTCTTTTAAAATGTCAACAAGAGAATCTCTACCATGAACAAAATGTGACTCATATCTAAACCAGTCCTTGACGAACGTGCCAAGCATATTAAGCCACTTCTCCTCATCCAACGCGTCAGTTTTTGACAGAGTCGTTAGTTGATGGCGAAAGG
+
-88BCCFFF7F6,C<EFF,?,,C@F,CCC,,6C,CC,,8;C<,EC,,@,,;6CCEE<,,@@:E<E@C<8.<6@@EGE7E/EFFGG<FAEDFGFEEF9<80CB<8,BFDEEGCF><99CBDGGFF7CFECFGFFFGGFFEFFFFFGFFEAGGGEFGGGGFBEBGFFFFEFGFGGGGFGGGGGGFGGGFGGFGGGGGGEBGGGGGFGGGGGGGGGGGGFECBGGGGGDGDFGGGGGGFCGGGGGGGGGGGGGFEGGGGGFFGGGGGEEGCGGGGGGFFF=GGGGGGGGGFGFCFGFGFFFGGFDGGFC,EE@FCFEFA@C:DEBFE;-FFBB+@FAGEGGGGFFCCEFECCCCC
@M02034:378:000000000-B8MRY:1:1101:17550:1639
TGTCGCATTGCATTCATCAAACGCTGAATAGCAAAGCTTCTACGCGATTTCATAGTGTAGGCCTCCAGCAATCTTGAACACTCATCCTTAATACCTTTCTTTTTGGGGTAATTATACTCATCGCGAATATCCTTAAGAGGGCGTTCAGCAGCCAGCTTGCGGCAAAACTGCGTAACCGTCTTCTCGTTCTCTAAAAACCATTTTTCGTCCCCTTCGGGGCGGTGGTCTATAGTGTTATTAATATCAAGTTGGGGGAGCACATTGTAGCATTGTGCCAATTCATCCATTAACTTCTCAGTAACAGATACAAACTCATCACGAACGTCAGAAGCAGCCTTATGGCCGTCAACATACATATCACCATTATCGAACTCAACGCCCTGCG
+
A@8ACFGGGFDGFFFFFF<,,CB+B@+6CC,,<@6C,,,;-6CC@++@@,;@C@EE,,,,,CFGAC,CE8<FE,6C,,C,EE@@,,:,9C<E.CCFFE,C9FEF@FGDC74CF9F<EA<EEDFD7+7@7F,5EE9<?<1FFEGGFCC9?EF,6?@DFG;BFFCB>CB@DECCCEADBCCEFGCFF<GGGGFFFCAECFFGGFGGGG>>FGGFGGFGFGGGGGGDGGFFFGCGGGGGGFGFE@FDGFFGGGGCFDGGGGGGGFFDGGGGGGGGGGGGGGFGFE=ED>GGGEGGDGGFGGFFDGCGGGGGCGGGGFGGGFEFFGGGGGFGGFGGFGGGGFFCGFEFE<@EF@EEFCDCF;GGGGFFCF<EFFAE8C7@87GGCCC@-
@M02034:378:000000000-B8MRY:1:1101:20470:1640
TCAGTGTTTCCTGCGCGTACACGCAAGGTAAACGTGAACAATTTAGCTGCTTTAACCGGACGCTCTACTCCATTAATAATGTTTTTCGTAAATTCAGCGCCTTCCATGATGAGACAGGCCGTTTGAATGTTGACGGGATGAACATAATAAGAAATGACGGCAGCAATAAACTCAACAGGAGCAGGAAAGCGAGGGTATCCTACAAAGTCCAGCGTACCATAAACGCAAGCCTCAACGCAGCGACGAGCACGAGAGCGGTCAGTAGCAATCCAAACTTTGTTACTCGTCAGAAAATCGAAATCATCTTCGGTTAAATCCAAAACGGCAGAAGCCTGAATGAGCTTAATAGAGGCCAAAGCGGTCTGG
+
@AAAAFFFDF9<CCC@CB==,<B++7+,;6,,,8,,,,;-6C@,,,6,,;EE<66;6+++77@E,,,6,9,6<,66C,/?,CCF,,68:=.,<C,9,994B?C<CAEF,?<.,,96/++8>CCE4<5C/CAA=50@7+@+FB<5A87C@@E,F=D9?>68?E7,C>C97=CCC9AFFFFFFFFGEGFCF@D:?GGEFDEFFFDD?DD<>GGGGFFGGGGFGGGEFGGGFFECFCFGGGGGGGFGGGGGFEFGGFGFCGGFGGGGDGDGGGGGGFGFDFFFEEGGGGGGGGGFEFGGFGFFCGFFFEEFFGGFGGF8GE@@@@GFAFGFEEFFEEC@,AA-EA@@,FFGGFGEF@@7GGFD@CB-CA

fastqjoin.join_barcodes.fastq - first 10 barcodes

@M02034:378:000000000-B8MRY:1:1101:17502:1596
TCTTTCCCTACACGATTTTTTTTTTCTCTC
+
CCCCCFFEFEECEF7--,,66++++,6;6,
@M02034:378:000000000-B8MRY:1:1101:10612:1603
TCTTTCCCTACACGATTTTTTTTTTTTTTT
+
CCCCCD@EFDEEEF7--,,6+++666++++
@M02034:378:000000000-B8MRY:1:1101:15935:1610
TCTTTCCCTACACGATTTTTTCTTTTTTTT
+
CCCCCGGGGGGGGGE--,,6+-,,,,,+6+
@M02034:378:000000000-B8MRY:1:1101:13285:1617
TCTTTCCCTACACGATTTTTTCTTTTTTTT
+
CCCCCFEFGGGDGG@--,,6+-6,,,,+6+
@M02034:378:000000000-B8MRY:1:1101:10067:1621
TCTTTCCCTACACGATTCTTTTTTCTTTTT
+
CCCCCF@EFEEEFD+--,,---,6,;,,6,
@M02034:378:000000000-B8MRY:1:1101:17728:1623
TCTTTCCCTACACGATTCTTTCTTTTTTTC
+
@CCCCFDFFDE<FF7--68@--,,,,,+6,
@M02034:378:000000000-B8MRY:1:1101:11657:1630
TCTTTCCCTACACGATTCTTTTTTTTTTTT
+
CCCCCGEDGDE<EF:-6,,86-;++6++67
@M02034:378:000000000-B8MRY:1:1101:15928:1636
TCTTTCCCTACACGATTTTTTTTTTTTTTC
+
CCCCCGGGGGGGGGG--,,6++6+++6+7,
@M02034:378:000000000-B8MRY:1:1101:17550:1639
TCTTTCCCTACACGATTTTTTTTTTTTCTC
+
CCCCCGGGGGGGGFE--,8-+++++++,,,
@M02034:378:000000000-B8MRY:1:1101:20470:1640
TCTTTCCCTACACGATTTTTTTTTCCTTTT
+
9CCCCEFFEED@DF@-66,-6+++,,,,,,


TonyWalters

Jul 27, 2017, 3:14:58 AM
to Qiime 1 Forum
Hello Taruna,

I BLASTed a few of those reads on NCBI, and everything I'm getting is the PhiX genome. There may be 16S reads elsewhere in the file, and we're just getting a bunch of PhiX reads at the beginning for some reason (what was the % spike-in of PhiX?).

The barcodes, at least for these reads, look odd, but perhaps that's not surprising if they are just part of the PhiX genome rather than being your barcodes.

Does your sequencing center have anything to say about demultiplexing the reads? The Caporaso approach in the EMP is different from the normal sequencing approach (where the barcodes are at the beginning of the read(s)); your sequencing center may have a way to try and demultiplex the reads directly, and give you paired fastq files per sample.

I don't know how much digging this will take, but it would be good to find one of your raw reads (so before we did anything with extract_barcodes.py) that hits 16S data on BLAST, and then look at the full sequence and figure out what the barcodes look like at the end, and whether they match your mapping file or are off by a few bases (e.g. random bases before the barcode).
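
For the "off by a few bases" check, a small Python sketch along these lines could help. It assumes exact matching, only looks near the start of the sequence, and uses a hypothetical function name with toy sequences; it is not part of QIIME.

```python
def find_barcode_offset(read, barcode, max_offset=8):
    """Return the smallest offset in 0..max_offset where barcode occurs, else None."""
    for offset in range(max_offset + 1):
        if read.startswith(barcode, offset):
            return offset
    return None


# Toy read with two extra bases in front of the barcode ATCCG.
print(find_barcode_offset("NNATCCGTTTT", "ATCCG"))
```

An offset of 0 means the barcode sits right at the start; a small positive offset would suggest extra bases in front of it.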