join_paired_ends.py (fastq-join) with barcodes error - Reached end of index-reads file...


Michael Baron

Feb 23, 2016, 8:51:11 AM
to qiime...@googlegroups.com
Hello everybody (I'll become a regular soon!),

I'm running into an error when attempting to join paired end reads from an Illumina MiSeq run.

join_paired_ends.py -f $TMPDIR/bioc3101_2016_read1.fastq -r $TMPDIR/bioc3101_2016_read2.fastq -b $TMPDIR/bioc3101_2016_barcodes.fastq -o $TMPDIR/fastq-join_joined

The command above eventually fails with the following:

Traceback (most recent call last):
  File "/imports/home0/sejj036/qiime191/bin/join_paired_ends.py", line 178, in <module>
    main()
  File "/imports/home0/sejj036/qiime191/bin/join_paired_ends.py", line 174, in main
    write_synced_barcodes_fastq(assembly_fp, index_reads)
  File "/imports/home0/sejj036/qiime191/lib/python2.7/site-packages/qiime/join_paired_ends.py", line 68, in write_synced_barcodes_fastq
    " paired-end ID processed was:\n\'%s\'\n" % (joined_label))
StopIteration:

Reached end of index-reads file before iterating through joined paired-end-reads file! Except for missing paired-end reads that did not survive assembly, your index and paired-end reads files must be in the same order! Also, check that the index-reads and paired-end reads have identical headers. The last joined paired-end ID processed was:
'M01520:130:000000000-ALUEB:1:2106:16170:1516 1:N:0:0'

I'm not sure whether my sequence files are at fault, or something is (yet again) failing in my run on the HPC cluster.

When I checked the heads and tails of my sequence files, everything seemed to add up, e.g. the sequence IDs all match (apart from the read-number field):

[sejj036@login09 761977.1.]$ head bioc3101_2016_read1.fastq
@M01520:130:000000000-ALUEB:1:2106:15023:1418 1:N:0:0
TACAGAGGGTGCTAGCGTTGTTCGGTTTCATTGGGCGTACAGTTCGTGTAGGCGTTTTTTTCTGTCATTTGTTATATCCCTCGTCTCTACCTTGGTTCGTCGCTTTTTACTGGCTTGCTTTATTTCCTTCTTTTTGTGTTGTATTCCCTGTTTATCGTTTTCATGCGTATCTATCTGGATGCTCTCCTTTGGCTTTGGCTTCTTCCTTGTTTGTTTCTGTCTCTGTTTCTCTTTTTCTTGTTTATCAATC
+
AAAAA111AA1A1A1FEEEEFEE0000B12BDD10AAE///222A/A/BF11AA//AF/>E012B22B2@2B0222B@2@B>//?F022>111111/1//<///?/11/<F21/00/?@111>@22212111111/00?0/01??1<111>1=111.0>..10=00.<...=0=000000//0000/<0;0/:/./0///./0009;00;0///;/0;00;0;C00;000;000009/9///-9//////
@M01520:130:000000000-ALUEB:1:2106:14763:1432 1:N:0:0
TACGTAGGGGTCTAGCGTTTCTCTTTTTCACTGTGCGTTTTTTTTTCTTATTCTTTTTTTTTTTTCTTTTTTTTTCTCCTTTTTCTCTTCTCCTTTTCTTCCTTTTTTTCTTTTTTTCTTTTTTTCTTTATTTTTTTTTTTTTTTTCTTTTTTATTTTTTTATTTCTTATCTATTTTTTTTTACTCCTTTTTCTTTTTCTTCTTTCTTTTCCTTTTCTTTCTCTTTTTTTCTTTTTCTTTTTTATCTCTC
+
>1>111>1A11>BG3F1AE01BF3331B02AFDD1ABB0//////A01D22222D111>>E//>E01>B1EE?//0B>11121?0?222>>1111211<<<1?1?11>-0<1111-</==D00;:/0000:0:00---:-----9-////BF>-//9/:---/;;//9/B/B/99///-9;--//////9/9BB//9//9:/9///9/9/////9//9///9/9///:--/////9//9/9/;-9/////
@M01520:130:000000000-ALUEB:1:2106:16060:1443 1:N:0:0
TACGAAGGACCCGAGCGTTATCCGGATTTATTGGGTTTATAGGTTGCGTAGGCGGCGCGGTACGTCGGGTGTGATATCTCGGTGCTTATCTCCGTATCTTCTTTCGCTACTTCCGTGCTTGAGGTCTGGAGTGGTGACTGGTATTTTCTGTGTAGCGGTGTACTGCGTAGCTATCGTATGTATGTCCAGTGTCTTATTCGGGTCTCTGGCCCGTTCCTGTCTCTTTGGCTCTTTTTTCTGTGTAGCATTC


[sejj036@login09 761977.1.]$ head bioc3101_2016_read2.fastq
@M01520:130:000000000-ALUEB:1:2106:15023:1418 3:N:0:0
CCTTTTTTCTCCCCTCTCTTTCTCTTTTCTTCTTCTTTATCTTTCCTTTTTGTCTCCTTCTCCTCTGTTTTTCCTCCCTATATCTCCGCATTTCCCCTCTACTCCTTTTTTTCCACTCCCCTCTTTCTTCCTCTTGCTTGCCTTTTTCTTTCTTCTTTCCTCTTTTTTGCCTTTGTCTTTCTCTCTTGTCTTTTCTTTCCTCCTACCTGCCCTTTACTCCCAGTCTTTTCTACCCTCGTTTTCACCCTCT
+
>1>>>D@1BD3B1A111BBFGF3A333DD13A2AA2AB2D2222AA122A1/A22AABEF1AA10112D1BA0BE10/01B1DAF2@///??D221>0?F112@@11211>>>B2>100?/<?F1122B21B>F1111B1/B11B<B<1>1221@21@2@1111>@1//11111111=>=<111111111<<1100000000/<00.00...;::0000/./0:000:00:000.0.-....000099./
@M01520:130:000000000-ALUEB:1:2106:14763:1432 3:N:0:0
CCTTTTTTCTCCCCTCTCTTTCTCTTCTCTTCTTCTTTTTCTTTCCTTTTTTCCTCCTTCTCCTCTTTTTTTCTTCCTAATATCTACTCTTTTCTCCTCTCCTCTCTCCTTTCCTCTTTCCTCTCCCTTTCTCCTTTCTTCCTTTATCCTTTTCTTTTCTTTTTTTTTTCTCTTTTTTTTCTCCTCTTCCTTTTTTTTTCTTCTTCTCTCCCTTTCCTCTCTTTTTTTCTTTCCTTCTCTTTCTCTCTCT
+
1111>DD>>F@@AA111BBEGDE333ADA3D12AA2AB11011A2112BB1A01DA0BABB1A1A111AAA>>1E21121D2DBF2@B12BBED121B>G0010101101B@@1111112>B1F0000111B21112>B12112B1BB122212111B211>1<<<</-01111111-<@/0000/:;00::000----/=00;00;000///;90//////;///99/////////////;/;///;//
@M01520:130:000000000-ALUEB:1:2106:16060:1443 3:N:0:0
CCCGTTTTCTCCCCTGTCCTTCTTGCCTCTGCGTCAGGAACTGTCCTGTGACCCGCCTTCTCCTCTGGTCTTCCTTTCGTTATCTACGCATTTCACCGCTCCTCCGTTATTTCCATTCTCCTCTCCTTTCCTCCATCTCGGTTGTTTCGATTTCTGTTTCGTATTTTTTCTCCGTTATTTCTCACCCTCCTTACCTCTCCTCCTTCGCACTCTTTCTTCCCTTTATCTCCTTCTTTCTCTTGTTTCCTTC


[sejj036@login09 761977.1.]$ head bioc3101_2016_barcodes.fastq
@M01520:130:000000000-ALUEB:1:2106:15023:1418 2:N:0:0
TACTTGTCTGAT
+
>1111111311@
@M01520:130:000000000-ALUEB:1:2106:14763:1432 2:N:0:0
TTTCACCAGGTT
+
>1111B1111@1
@M01520:130:000000000-ALUEB:1:2106:16060:1443 2:N:0:0
TTTCACTAGGTG



[sejj036@login09 761977.1.]$ tail bioc3101_2016_read1.fastq
+
AAAA1DC?F11B1BGFGGGCCAGEEG?E1B1DDFGHGGGGHH1E/AAEAEGG1EE?/>//>EFHHH2B11>FFFGHFH1>/</?FGDBGH210?>?CCHHH/<//??2FGHF11<?/AF?CFFDD111F<F.EG./...=CC@:C0C00;FGGFBF@GCAEFGFGGGGFGGGGGG90B/ABBFF-BB-9@@FF@@9@FE9@@-9-AEFFFF/;//9/BFBF/BFFBB?--:A---AB--9--;-;?A99A
@M01520:130:000000000-ALUEB:1:1109:18959:28767 1:N:0:0
TACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGCGCGTAGGCGGCTAGGTAAGTCAGTGGTGAAATCTCCCAGCTTAACTCGGAAACTGCCATTGATACTATTTATCTTGAATACTTTGGAGGTAAGCGGAATTTGTCATGTAGCGGTGAAATGCTTAGATATGACATAGAACACCAATTGCGAAGGCAGCTTACTATACAGTTATTGACACTGAGGCACGAAAGCGTGGGGATCAAAC
+
AAAA11BAC11B3B1FFGG1AAFEEG0E1D2DFFGGEGGGHH1B//AAAEEF1EG///00BBFGHF@F22BGGFFHGH21100BFG1BBF21<E/?E<GHGF11>BGFGHH22222@@F@GF1FF<2@F1F1CG<<1?1A@C@<1111=1>FGFDD@CFGF0GDGGGHCHGHGH000000C;<CCGGE.C0C0C-;?EA.;9B009;000;0999;/;BBF/;BBBFFE-9A99-AB-9A@@@F9;-B9B
@M01520:130:000000000-ALUEB:1:1109:19301:28793 1:N:0:0
TTCTTTTTTTTCTCTCTTTTTTCTTATTTACTTTTCTTTCTTTTTTCTTCTTCTTTTTTTTTTTTCTTTTTTTTTTTCTCCTTTCTTTTCTTTTTTTTTTCTTTTTTTTCTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTCTCTTTTTTTTTCTTATTTTTCTTTTTTTTTTCCTTTTTCTTTTTCTTCTTTCTTTTTCTTTTTTTTCTCTTTTTTTCTTTTTCTTTTTTTTCTTTC
+
>11113@BA10B13B1BFG1DAABD2ADA22DBFDAAE12211D1/0AA22D221A11/>>///>01@111>////>01211<>>F111211111//->-<0=000-;-/;:0:0:-/;00009-9--------@--------9;-////;/9///9/://--99//B///;/;/99///9----////;9B////9///:/9///;//////;://9--/;9B9/9/-9//////////9/--9/////


[sejj036@login09 761977.1.]$ tail bioc3101_2016_read2.fastq
+
AA1>AFCAFFFFGGB1AECGGGG000EGHFH1AEEE/A2A11BAFCFF10A///EEEGGHHGGG>0EE>/EGH@GDC0B1GGFHGGGGGFGFHHFHDEGGECGGF0BB01GHHG0A/</FHGHHBG0</A<CH0FGB<<--<=GH0GH000/AC--:;0CCCB9CFB9CF0009.9.9CFGBF?.9/;9CBBF/9/--9;-AAFB;9--;-A-BFBB-A@---;///99BB-----;;@9----;A9-9-
@M01520:130:000000000-ALUEB:1:1109:18959:28767 3:N:0:0
CCTGTTTGCTCCCCTCGCTTTCGTGCCTCAGCGTCAATAACTGTCCAGTAAGCTGCCTTCGCAACTGGTGTTCCTCCTCATATCTAAGCATTTCACCGCTACACCTCACATTCCGCTTTCCTCTTCAGTACTCAAGACAAACAGTCTCAATCGCAGTTCCCCAGTTAAGCTCGTAGCTTTCACCTCTGACTTACCTACCCTCCTACGCACCCTTTCCGCCCACTCATTCCGAACACCGCTTGCGACCTCT
+
AA1>AF@B@DFFGG1AEFAGGGA000FHFGG1BBEEAA2111A2A1DB212111BBFGGHHG///0B0B0FEH22100>BFBFGGG1BFFGGFHFG1EEGEE1>0001/1BFGF0>///BFFHF11212B1BG1BG10B00/B@G1@F2211/<//<?1?100CCD11?F11<<.>.11=<1=<0<000=0D00/00<<C0:CG00;-.;:B99FB0..;@9.://090;/..---;-9--;------:/
@M01520:130:000000000-ALUEB:1:1109:19301:28793 3:N:0:0
CCTTTTTTCTCCCCTCTCTTTCTTTCCTCTTCTTCTTTTTTTTTCCTTTTTTCCTCCTTCTCCTCTTTTTTTCCTCCTTTTCTCTTCTCTTTTCTCTTCTTCTCCTTTCTTTCCTCTCTCCTCTCCCTTCCTCTTTCTTTTCTTTTTCCTCTTCTTTTTCCTTTTTTTTCTTTTTTTTTTCTCTTTTTTCTTTTCTTTCCTCCTTCTTTCTCTTTTCTCCCTTTTTTTTTTTTTTTTTCTTTCTTCCTTT
+
111>>BD>>B31111110ABE3A3331D11112A221B11//AA0112211/01AA0BBB1111011@11>>01B10012B2B2B21@@2BBE21222>B12211BB212B>21>1B0<10<0B0000B00B1@1122111211>1<01>111111<?101111<<---/00000--::?/0099000-/990;000000;0/:00009////9://////-/9//-9----------/////////9//


[sejj036@login09 761977.1.]$ tail bioc3101_2016_barcodes.fastq
+
>111111>1111
@M01520:130:000000000-ALUEB:1:1109:18959:28767 2:N:0:0
TACTGTACTGAT
+
>1>1>33B3333
@M01520:130:000000000-ALUEB:1:1109:19301:28793 2:N:0:0
TTTTTTTCTTTT
+
>1111111B331

The count_seqs.py script is currently running, so I can't quite yet confirm that all three files contain the same number of sequences.

edit: it finished, and all appears to be in order.

count_seqs.py -i bioc3101_2016_barcodes.fastq,bioc3101_2016_read1.fastq,bioc3101_2016_read2.fastq
16807541  : bioc3101_2016_barcodes.fastq (Sequence lengths (mean +/- std): 12.0000 +/- 0.0000)
16807541  : bioc3101_2016_read1.fastq (Sequence lengths (mean +/- std): 250.0000 +/- 0.0000)
16807541  : bioc3101_2016_read2.fastq (Sequence lengths (mean +/- std): 250.0000 +/- 0.0000)
50422623  : Total
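
Heads, tails and counts only show that the files agree at both ends and in size. A rough sketch for checking every record instead, assuming plain four-line FASTQ records and comparing only the part of each header before the first space (the helper names `read_ids` and `first_id_mismatch` are made up for illustration):

```python
from itertools import islice


def read_ids(lines):
    """Yield the read ID of every record: the header text before the first
    space, without the leading '@'. Assumes four lines per FASTQ record."""
    for header in islice(lines, 0, None, 4):
        yield header.rstrip("\n").split(" ", 1)[0][1:]


def first_id_mismatch(*line_sources):
    """Return (record_number, ids) for the first record whose IDs differ
    across the inputs, or None if all compared records agree.

    zip() stops at the shortest input, so record counts still need to be
    checked separately (e.g. with count_seqs.py)."""
    id_streams = [read_ids(source) for source in line_sources]
    for record_number, ids in enumerate(zip(*id_streams), start=1):
        if len(set(ids)) != 1:
            return record_number, ids
    return None
```

It takes any iterables of lines, so open file handles work, e.g. `first_id_mismatch(open("bioc3101_2016_read1.fastq"), open("bioc3101_2016_read2.fastq"), open("bioc3101_2016_barcodes.fastq"))`.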


Anyone got any ideas?

Thanks,
Michael

Michael Baron

Feb 23, 2016, 9:08:09 AM
to qiime...@googlegroups.com
This gets even weirder. When I search for the sequence ID returned by the error:

grep -n M01520:130:000000000-ALUEB:1:2106:16170:1516 bioc3101_2016_barcodes.fastq
629:@M01520:130:000000000-ALUEB:1:2106:16170:1516 2:N:0:0

That ID is only on line 629 of the barcode file, nowhere near the end. Perplexing.

And the next sequence ID is identical across all three sequence files again:

[sejj036@login09 761977.1.]$ sed '633q;d' bioc3101_2016_read1.fastq
@M01520:130:000000000-ALUEB:1:2106:15692:1516 1:N:0:0
[sejj036@login09 761977.1.]$ sed '633q;d' bioc3101_2016_read2.fastq
@M01520:130:000000000-ALUEB:1:2106:15692:1516 3:N:0:0
[sejj036@login09 761977.1.]$ sed '633q;d' bioc3101_2016_barcodes.fastq
@M01520:130:000000000-ALUEB:1:2106:15692:1516 2:N:0:0
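
For reference, line 629 is the header of record (629 - 1) / 4 + 1 = 158, so the read sits near the top of the file. One reading of the error text, offered as a guess rather than a statement about the QIIME source: the barcode-syncing step walks forward through the index reads until a header equals the joined read's header. If whole header lines are compared, `... 1:N:0:0` never equals `... 2:N:0:0`, and the index file is exhausted on the first joined read. A toy model of that behaviour (the function names and the `key` parameter are hypothetical, not QIIME's API):

```python
def sync_index_reads(joined_headers, index_headers, key=lambda h: h):
    """Toy model of syncing index reads to joined reads.

    For each joined header, advance through the index headers until one
    matches under `key`. Raises ValueError if the index headers run out.
    This is an illustration only, not QIIME's actual code.
    """
    index_iter = iter(index_headers)
    matched = []
    for joined in joined_headers:
        for index in index_iter:
            if key(index) == key(joined):
                matched.append(index)
                break
        else:
            # The inner loop ended without a match: index reads exhausted.
            raise ValueError("index reads exhausted at %r" % joined)
    return matched


def id_only(header):
    """Drop everything after the first space (the '1:N:0:0' part)."""
    return header.split(" ", 1)[0]


joined = ["M01520:130:000000000-ALUEB:1:2106:16170:1516 1:N:0:0"]
index = [
    "M01520:130:000000000-ALUEB:1:2106:15023:1418 2:N:0:0",
    "M01520:130:000000000-ALUEB:1:2106:16170:1516 2:N:0:0",
    "M01520:130:000000000-ALUEB:1:2106:15692:1516 2:N:0:0",
]

# Comparing IDs only finds the barcode record for the joined read.
print(sync_index_reads(joined, index, key=id_only))

# Comparing whole headers never matches, so the index reads run out.
try:
    sync_index_reads(joined, index)
except ValueError as err:
    print(err)
```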


Jamie Morton

Feb 23, 2016, 11:31:07 AM
to Qiime 1 Forum
Hi Michael,

It's possible that the ordering is wonky.

Can you run the following commands and post the output?

tail -n 10 $TMPDIR/bioc3101_2016_read1.fastq
tail -n 10 $TMPDIR/bioc3101_2016_read2.fastq

Best,
Jamie

Michael Baron

Feb 23, 2016, 11:54:28 AM
to Qiime 1 Forum
Hi Jamie,

That's what I thought as well, but both the beginning and the end line up. The only difference is the field that specifies which read the sequence comes from: 1:N:0:0 vs 3:N:0:0

tail -n 10 bioc3101_2016_read1.fastq
+
AAAA1DC?F11B1BGFGGGCCAGEEG?E1B1DDFGHGGGGHH1E/AAEAEGG1EE?/>//>EFHHH2B11>FFFGHFH1>/</?FGDBGH210?>?CCHHH/<//??2FGHF11<?/AF?CFFDD111F<F.EG./...=CC@:C0C00;FGGFBF@GCAEFGFGGGGFGGGGGG90B/ABBFF-BB-9@@FF@@9@FE9@@-9-AEFFFF/;//9/BFBF/BFFBB?--:A---AB--9--;-;?A99A
@M01520:130:000000000-ALUEB:1:1109:18959:28767 1:N:0:0
TACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGCGCGTAGGCGGCTAGGTAAGTCAGTGGTGAAATCTCCCAGCTTAACTCGGAAACTGCCATTGATACTATTTATCTTGAATACTTTGGAGGTAAGCGGAATTTGTCATGTAGCGGTGAAATGCTTAGATATGACATAGAACACCAATTGCGAAGGCAGCTTACTATACAGTTATTGACACTGAGGCACGAAAGCGTGGGGATCAAAC
+
AAAA11BAC11B3B1FFGG1AAFEEG0E1D2DFFGGEGGGHH1B//AAAEEF1EG///00BBFGHF@F22BGGFFHGH21100BFG1BBF21<E/?E<GHGF11>BGFGHH22222@@F@GF1FF<2@F1F1CG<<1?1A@C@<1111=1>FGFDD@CFGF0GDGGGHCHGHGH000000C;<CCGGE.C0C0C-;?EA.;9B009;000;0999;/;BBF/;BBBFFE-9A99-AB-9A@@@F9;-B9B
@M01520:130:000000000-ALUEB:1:1109:19301:28793 1:N:0:0
TTCTTTTTTTTCTCTCTTTTTTCTTATTTACTTTTCTTTCTTTTTTCTTCTTCTTTTTTTTTTTTCTTTTTTTTTTTCTCCTTTCTTTTCTTTTTTTTTTCTTTTTTTTCTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTCTCTTTTTTTTTCTTATTTTTCTTTTTTTTTTCCTTTTTCTTTTTCTTCTTTCTTTTTCTTTTTTTTCTCTTTTTTTCTTTTTCTTTTTTTTCTTTC
+
>11113@BA10B13B1BFG1DAABD2ADA22DBFDAAE12211D1/0AA22D221A11/>>///>01@111>////>01211<>>F111211111//->-<0=000-;-/;:0:0:-/;00009-9--------@--------9;-////;/9///9/://--99//B///;/;/99///9----////;9B////9///:/9///;//////;://9--/;9B9/9/-9//////////9/--9/////
[sejj036@login06 761977.1.]$ tail -n 10 bioc3101_2016_read2.fastq
+
AA1>AFCAFFFFGGB1AECGGGG000EGHFH1AEEE/A2A11BAFCFF10A///EEEGGHHGGG>0EE>/EGH@GDC0B1GGFHGGGGGFGFHHFHDEGGECGGF0BB01GHHG0A/</FHGHHBG0</A<CH0FGB<<--<=GH0GH000/AC--:;0CCCB9CFB9CF0009.9.9CFGBF?.9/;9CBBF/9/--9;-AAFB;9--;-A-BFBB-A@---;///99BB-----;;@9----;A9-9-
@M01520:130:000000000-ALUEB:1:1109:18959:28767 3:N:0:0
CCTGTTTGCTCCCCTCGCTTTCGTGCCTCAGCGTCAATAACTGTCCAGTAAGCTGCCTTCGCAACTGGTGTTCCTCCTCATATCTAAGCATTTCACCGCTACACCTCACATTCCGCTTTCCTCTTCAGTACTCAAGACAAACAGTCTCAATCGCAGTTCCCCAGTTAAGCTCGTAGCTTTCACCTCTGACTTACCTACCCTCCTACGCACCCTTTCCGCCCACTCATTCCGAACACCGCTTGCGACCTCT
+
AA1>AF@B@DFFGG1AEFAGGGA000FHFGG1BBEEAA2111A2A1DB212111BBFGGHHG///0B0B0FEH22100>BFBFGGG1BFFGGFHFG1EEGEE1>0001/1BFGF0>///BFFHF11212B1BG1BG10B00/B@G1@F2211/<//<?1?100CCD11?F11<<.>.11=<1=<0<000=0D00/00<<C0:CG00;-.;:B99FB0..;@9.://090;/..---;-9--;------:/
@M01520:130:000000000-ALUEB:1:1109:19301:28793 3:N:0:0
CCTTTTTTCTCCCCTCTCTTTCTTTCCTCTTCTTCTTTTTTTTTCCTTTTTTCCTCCTTCTCCTCTTTTTTTCCTCCTTTTCTCTTCTCTTTTCTCTTCTTCTCCTTTCTTTCCTCTCTCCTCTCCCTTCCTCTTTCTTTTCTTTTTCCTCTTCTTTTTCCTTTTTTTTCTTTTTTTTTTCTCTTTTTTCTTTTCTTTCCTCCTTCTTTCTCTTTTCTCCCTTTTTTTTTTTTTTTTTCTTTCTTCCTTT
+
111>>BD>>B31111110ABE3A3331D11112A221B11//AA0112211/01AA0BBB1111011@11>>01B10012B2B2B21@@2BBE21222>B12211BB212B>21>1B0<10<0B0000B00B1@1122111211>1<01>111111<?101111<<---/00000--::?/0099000-/990;000000;0/:00009////9://////-/9//-9----------/////////9//

Best,
Michael

Michael Baron

Feb 23, 2016, 12:07:16 PM
to Qiime 1 Forum
Some hard googling brought up this notebook: http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/jennomics/9746664/raw/2098a407805d48434fc4627bf275bbbb338b3ff4/gistfile1.txt

For whatever reason, s/he seems to adjust the read identifier. Is that still required? Do I need to replace 2:N:0:0 in the barcode file with 1:N:0:0?



Merci,
Michael 

Michael Baron

Feb 23, 2016, 12:14:51 PM
to qiime...@googlegroups.com
I'm now joining the forward and reverse reads without the barcode file, to see what the joined sequences look like. One of the other threads here on the forum indicated that the barcodes might be the problem.

There were some weird problems with ea-utils on a Mac. Though I'm working on a Red Hat 6 (or 7) cluster, one never knows.

Also, the error message actually mentions the joined sequences file.

edit: the failed run actually produced a joined fastq file, but the barcode file was empty.

[sejj036@login06 fastq-join_joined]$ head -n 12 fastqjoin.join.fastq
@M01520:130:000000000-ALUEB:1:2106:16170:1516 1:N:0:0
TACAGAGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCCTTCTAAGTCGAACGTGAAATCCCTGGGCTCAACCCGGGAACTGCGTCCGATACTGGAAGGCTTGAATCCGGGAGAGGGATGCGGAATTCCAGGTGTAGCGGTGAAATGCGTAGATATCTGGAGGAACACCGGTGGCGAAGGCGGCATCCTGGACCGGTATTGACGCTGAGGAGCGAAAGCCAGGGGAGCAAACGGG
+
BBBABF;AADEDFFGGGGGG;GHGGGFFHHGHHFFHGGG+BFFCHEEEEEG?D9E>@6G?FBEFHHG?FGEHHEEHGH7F$B?EFG2FCF:;$?F<CFGHHDA@>DFC>CHFFG>>>GHHGE<60ECGGHHGGCCCFE<BGFBCFFGHHFFFEGEBHFGHHFEEFFFFGGFHHGBFGGHHGEE@EGEEFFFFFHE:EEEFGB:EBGFFFGFFC1;D;FFEE9FDFGEF$ACGGGGGEG8GGFFFFAFF4A>A>
@M01520:130:000000000-ALUEB:1:2106:16740:1519 1:N:0:0
TACAGGGGGAGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCCTTCTAAGTCGAACGTGAAATCCCTGGGCTCAACCCAGGAACTGCGTCCGATACTGGAAGGCTTGAATCCGGGAGAGGGATGCGGAATTCCAGGTGTAGCGGTGAAATGCGTAGATATCTGGAGGAACACCGGTGGCGAAGGCGGCATCCTGGACCGGTATTGACGCTGAGGCGCGAAAGCCAGGGGAGCAAACGGG
+
AAB?5CABB0A5C9GFGGG?99HGEGGGGGH<HFCFG7C,BFGFEA7EEEE?G>(E7G7HBF;GGGGAFEEGHE;FFGGGHC/CFG/CBC00CGEC<<FDDD7</GDC>AHFHFFC><GFFC<=DC0G?HHHGGCDHFGGFBBCFFHGFFFFGEGCEE;GFFBB>:>DGBBFBBBFFHH9DE?9>CEEABEEAHEEEEEAB15A01;AAA;/A19*A9A4911EEEEA000GF;GGGFFGGFFFFAFF3>>>>
@M01520:130:000000000-ALUEB:1:2106:14606:1523 1:N:0:0
TACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGTGGTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGGAGAGAAGAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGATATCGAGAGGAACACCAGTGGCGAAGGCGAGTCTCTGGACGACTCCTGACACTGAGGCACGAAAGCCGGGGGAGCAAACAGG
+
BBBB3BF@FFBCB4AFFFGFG:HGGGGGHHD;HGDHFGC*BF:GGG/GGGFC-FFE7GG7GG4FGHHHFGHHHHFF<:91:?E<>E2EGF2@EHFDHG?4G>FHFF5F>FGEB/D/,>GHHFFHBG@<FGFGB1F<DGHHFBB<C=00<EFHHG>0FBEGGEFGFBEGGF>EFFCC1B@2FEF>B9CFFFEF/HEFEEFGBB;BFFBFFA92A1F(DFGFD1A;1GFA00EGGEEE91FGGFF:FFFFAAAAA

Michael Baron

Feb 24, 2016, 9:42:22 AM
to Qiime 1 Forum
I managed to solve the issue by just removing all the `X:N:0:0`-type identifiers across all three sequence files. I'm not sure whether this is a bug, a feature, or a mistake by my sequencing people. It would be great if a QIIME developer could clarify.

Here's some example code for read 1 (the barcode and read 2 files need ` 2:N:0:0` and ` 3:N:0:0` respectively):
sed 's/ 1:N:0:0//g' bioc3101_2016_read1.fastq > bioc3101_2016_read1.fixed.fastq
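
The same idea in a form that works for all three files, as a sketch: it assumes four-line FASTQ records and only touches header lines, dropping everything after the first space whatever the read number is. The function names and the `.fixed.fastq` naming are just for illustration.

```python
def strip_header_descriptions(lines):
    """Yield FASTQ lines with the text after the first space removed from
    each header line (line 1 of every four-line record). Sequence, '+'
    and quality lines are passed through unchanged."""
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:
            yield line.rstrip("\n").split(" ", 1)[0] + "\n"
        else:
            yield line


def fix_file(in_path, out_path):
    """Write a copy of in_path with trimmed headers to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_header_descriptions(src))
```

For example, `fix_file("bioc3101_2016_read1.fastq", "bioc3101_2016_read1.fixed.fastq")`, and likewise for the read 2 and barcode files.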

Thanks,
Michael

Michael Baron

Feb 25, 2016, 7:54:12 AM
to qiime...@googlegroups.com
Well, I rejoiced too early. While the joining seemingly went OK, I now get an error when I try to split the library with the updated barcode file:

split_libraries_fastq.py -i $TMPDIR/fastq-join_joined/fastqjoin.join.fastq -b $TMPDIR/fastq-join_joined/fastqjoin.join_barcodes.fastq -o $TMPDIR/slout -m $TMPDIR/map.tsv --barcode_type 12

Traceback (most recent call last):
  File "/imports/home0/sejj036/qiime191/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/imports/home0/sejj036/qiime191/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/imports/home0/sejj036/qiime191/lib/python2.7/site-packages/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/imports/home0/sejj036/qiime191/lib/python2.7/site-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: M01520:130:000000000-ALUEB:1:2106:16170:1516. This may be because you passed an incorrect value for phred_offset.

Turns out the seq id in the error refers to the first sequence in the joined file.
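
Since the error points at `phred_offset`, one quick sanity check is to look at the range of quality characters in the joined file. A sketch, assuming the usual conventions that Phred+33 qualities start at `!` (ASCII 33) and Phred+64 qualities at `@` (ASCII 64); the function names are made up:

```python
def quality_char_range(lines):
    """Return (lowest, highest) quality character over all records,
    assuming four-line FASTQ records (quality is line 4 of each).
    Returns (None, None) if no quality characters were seen."""
    lowest, highest = None, None
    for line_number, line in enumerate(lines):
        if line_number % 4 != 3:
            continue
        qual = line.rstrip("\n")
        if not qual:
            continue
        lowest = min(qual) if lowest is None else min(lowest, min(qual))
        highest = max(qual) if highest is None else max(highest, max(qual))
    return lowest, highest


def consistent_offsets(lowest):
    """List the offsets (33 and/or 64) under which the lowest observed
    quality character does not map to a negative score."""
    return [offset for offset in (33, 64) if ord(lowest) >= offset]


record = ["@read1\n", "ACGTACGT\n", "+\n", "BBBABF;+\n"]
lowest, highest = quality_char_range(record)
print("%s %s %s" % (lowest, highest, consistent_offsets(lowest)))
```

The joined reads shown earlier contain characters such as `$` and `(`, which would be negative scores under offset 64, so they at least look like Phred+33; that alone doesn't explain the failure.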