Can I do split libraries on MiSeq reads without barcode reads?


Zoey

Feb 13, 2013, 2:06:20 PM
to qiime...@googlegroups.com
We sequenced some 16S rRNA gene amplicons on our collaborator's MiSeq and got a raw fastq file that looks like the seqs below.
It seems that the file contains only reads_1 and reads_2, and the index reads are missing.
In addition, the 1:N:0 or 2:N:0 field in the headers is missing the sample number usually seen in MiSeq fastq files.
I have the following questions about splitting libraries on this file:
1. Do the seqs look like they've already been de-multiplexed? If yes, is there a way to reverse it using the current fastq file?
If the answer to Q1 is no, then:
2. Is the index-read info still in this fastq file?
3. If yes, is there a way I can extract the index reads?
4. If no, can I split libraries in QIIME without an index read file in the input?
Thank you for any feedback. 
@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT
TACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTTGTTAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCAAAACTGACTGACTAGAGTATGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACA
+
?????BB?DDDDDEDDEEEEFFHIHECFFHHHIIIFHHHHIIIEHHHHEHHHAEFEHHEHHEGHHHHHHHHHHHFFFHHHHHHFFEFEFFFFFFFEEEFFFFFFFFFEFFFFFFFFEFFEFFEEFFFFFFFEEFEEFDEEEEEFFFFFFECEEEFEDDED?AACEEEEEDAEEEFEFEFEEEEEEEEEFECEE>?>?8A>;???EEFEEEFFCEE?*1::A:A0CCECA*14)48AEEEE>;;8;88:AC#
@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 2:N:0:TCCACAGGAGT
ACGGACTACCCGGGTTTCTAATCCTGTTTGCTCCCCACGCTTTCGCACCTCAGTGTCAGTATCAGTCCAGGTGGTCGCCTTCGCCACTGGTGTTCCTTCCTATATCTACGCATTTCACCGCTACACAGGAAATTCCACCACCCTCTACCATACTCTAGTCAGTCAGTTTTGAATGCAGTTCCCAGGTTGAGCCCGGGGATTTCACATCCAACTTAACAAACCACCAACCCGCGCTTTACGCCCAGCAATTC
+
?????@@BDDDDDDDDEFFFFFCFFHHHHHHGFHHHHHCDDHHDEDEHHHHHHHFEHFHGGGHHHHHHFCFHHHHF=EEEDCDEHEHHFCFHHEFHFHHFFFFFFFFFFFEEDEEDDDDEE<6@EBCEEFFFFECEEEEECEEE8:CEFFAECECEFAEFE?CEEEECAAAEAEEEFFCEFFFFEE?CEEAEFE'.8?88:?*:AE:CE?*1*:?C*?A?EAEE###########################
@MISEQ04:37:000000000-A2G8E:1:1101:14713:1991 1:N:0:TCCACAGGAGT
AACGGAGGGGGCAAGTGTTTCTCGCAATGACTGGGCCTAAAGGGCACGCAGGTGGTTTTCGACAACAGGTATTTCGGTTAAACACTGCAGGCTAACAACAGGTCTGGAATATCTACTAGGAAACTAAGAGTAGTGCTCAGGTCTTTAGAATTGCTAGCGGAGGGGTGGAATCCGGCGAGGCTAGTAGGAATGCTTATGAGTGAAGGCAATTTTCTGGAGCTGACTGACGCTCAGGTGCGCAAGCATGGGGA
+
9?????@@DDDDDDDDFFFFFFIEHHHHHHHIIHHIIHHHIHHHIHHEHHHHAEFEHIIIHHHH=FHHHC=DFHFFHEHHFFFFFFFFFDEEEFFFFFFFFFBEEFE=BEEEFFFFEEEEAECEFFFFFFCCEEFFFF?AECAEFFEEEEFFFFFEEEDD8<>DD)8>AEECEA?D?D?D>C?C??:E1?CEEAE?:CAECEAEFFFE8AEEF:?:A:8?*?*:?CAEEEEADCC*0??DD8<?ECEEEE#
@MISEQ04:37:000000000-A2G8E:1:1101:14713:1991 2:N:0:TCCACAGGAGT
ACGGACTACTGGGGTATCTAATCCTATTTGATCCCCATGCTTGCGCACCTGAGCGTCAGTCAGCTCCAGAAAATTGCCTTCACTCATAAGCATTCCTACTAGCCTCGCCGGATTCCACCCCTCCGCTAGCAATTCTAAAGACCTGAGCACTACTCTTAGTTTCCTAGTAGATATTCCAGACCTGTTGTTAGCCTGCAGTGTTTAACCGAAATACCTGTTGTCGCAAACCACCTGCGTGCCCTTTAGGCCCA
+
AAA?AABBDDDDDD<AFFFGFGHIHFFHHIIHHHHIHIIIIIHHHH@HHIIFHHHHHHHIIHHIIHHGHHFHIIIHHHFCECGHHFHIIHHHHHHHHHHHHHFHDHHHHGGGGGDEEGDEGCGGEEGGGGGGEEGGGEGEEEGCGGCGCEGGGGGGGGGEGGGGEGGEGG?CGGGGGEGGGGGGGGCGEGGCEEGGGGEECEG?C:?828<CCE?EGGGCCCC*.).CC?CEECE8CEC*11CEEE#####
@MISEQ04:37:000000000-A2G8E:1:1101:13997:2108 1:N:0:TCCACAGGAGT
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTAATTAAACCAGTTGTGAAATCCCCGGGGTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGGATTCAGCGGGTAGCCGGGAAAAGCGTAGATATGCCGAGGAAACACGGAGGCGAAGGGAATTCTCTGGAACTGGACTTGCGCTCCTGCACGAAAAGCTGGGGAGGAAACA
+
?????BB?BDDDBBBDDDEEFFHIHHHHHHHIHHHIHHHHIHHHEHECEHECEHH<<<,,,,5,,44+4C,@D,CF,,@FF);@))34AAC################################################################################################################################################################
@MISEQ04:37:000000000-A2G8E:1:1101:13997:2108 2:N:0:TCCACAGGAGT
ACGGACTACAAGGGTTTCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGTCAGTACAGGTCCAGAGGATTGCCTTCGCCATCGGTGTTCCTCCGCATATCTACGCATTTCACTGCTACACGCGGAATTCCATCCCCCTCTACCGTACTCTAGCTATACAGTCACAGATGCAATTCCCAGGTTGAGCCCGGGGATTTCACAACTGTCTTATATAACCGCCTGCGCACGCTTTACGCCCAGCAATTC
+
?????@@BDDDBDD?BEFFFFFFHIIHHHHHIIHHHIC=DDFFGHHFHHIIIHFCCEEHGHIHHH-AEFHDDFFHHHFGGFFHHHHHFHECDEEDHHDFCDEDDFFDFFF@DDED=DEED=,ACFFAEDEDDAEFFFFE?C??8EEEF:8).:AAAEF?CEAECEA?:::CC:?EEEFFE?CCECE*?*:?ADDD84)*1:?EEEECA*00::*::CE:?>'.A?EDD;''')08*AEAD48?######

arp

Feb 13, 2013, 2:21:23 PM
to qiime...@googlegroups.com
Hi Zoey,

The reads do not appear to have been demultiplexed.  I think that whether or not the indexing barcodes are in those sequences is a question for your collaborators or the sequencing center.  If you know some of the barcodes you'd be expecting to see you can run

grep <barcode> <fastq_file>

to see whether the barcode <barcode> is present in the FASTQ file <fastq_file> and, if so, whether it appears in a consistent location.
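For a slightly more structured check than grep, here is a minimal Python sketch (not part of QIIME) that reports whether a barcode shows up in the header lines or inside the reads themselves, and at which offsets. The function names `iter_fastq` and `locate_barcode` are hypothetical, and the sketch assumes plain, uncompressed FASTQ with four lines per record.

```python
from collections import Counter


def iter_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    Assumes four lines per record; a truncated final record yields
    empty strings for the missing lines instead of raising.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def locate_barcode(lines, barcode):
    """Return (header_hits, seq_offsets) for one barcode.

    header_hits: number of records whose header contains the barcode.
    seq_offsets: Counter mapping the first offset of the barcode within
                 a read to the number of reads with that offset.
    """
    header_hits = 0
    seq_offsets = Counter()
    for header, seq, _qual in iter_fastq(lines):
        if barcode in header:
            header_hits += 1
        pos = seq.find(barcode)
        if pos != -1:
            seq_offsets[pos] += 1
    return header_hits, seq_offsets
```

Calling `locate_barcode(open("reads.fastq"), "TCCACAGGAGT")` (file name is a placeholder) returns both counts. Many header hits with few or scattered sequence hits would suggest the barcode is carried in the header rather than in the read.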

Adam



Zoey

Feb 13, 2013, 2:36:22 PM
to qiime...@googlegroups.com
Hi Adam,

Thanks for your prompt reply. I checked a few barcodes, and they are indeed in the reads. The grep command also hit some shorter reads that look like this:
@MISEQ04:37:000000000-A2G8E:1:2114:16702:27621 1:N:0:AGCGTGTCCAA

Is this one of the barcode reads?

Thanks again! 
Zoey

arp

Feb 13, 2013, 2:48:23 PM
to qiime...@googlegroups.com
Yes, it appears that the barcodes are in the headers.  Usually barcodes are Golay 12-bp barcodes, but yours are 11 bp; is this something you were expecting?  It's possible that the MiSeq was not configured properly to get the full 12 bp of the barcodes, but that might be a question for your collaborators.

Please see this post, where someone had a similar question and Tony Walters, another QIIME developer, posted a custom script to parse out the barcodes (thus making your reads amenable to processing with split_libraries_fastq.py).
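The script from that post is not reproduced here. As a rough illustration of the general idea only, the sketch below pulls the barcode out of headers shaped like the ones earlier in this thread (`@... 1:N:0:TCCACAGGAGT`) and writes one barcode FASTQ record per read. The names `HEADER_RE`, `barcode_record`, and `write_barcode_fastq` are hypothetical. It also assumes four lines per record with no blank lines, and, because the header carries no quality scores for the index read, it fills the quality line with a placeholder character.

```python
import re

# Header layout as seen in this thread, e.g.
# "@MISEQ04:37:000000000-A2G8E:1:1101:14157:1957 1:N:0:TCCACAGGAGT"
# groups: read id, read number (1 or 2), filter flag, control bits, barcode
HEADER_RE = re.compile(r"^(@\S+) ([12]):([YN]):(\d+):([ACGTN]+)$")


def barcode_record(header, placeholder_qual="I"):
    """Build a 4-line barcode FASTQ record from one read header.

    ASSUMPTION: the header holds no index-read qualities, so the
    quality line is filled with `placeholder_qual` (not real data).
    """
    header = header.strip()
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("unexpected header format: %r" % header)
    barcode = match.group(5)
    quality = placeholder_qual * len(barcode)
    # Reuse the read's header unchanged so it matches the read file.
    return "\n".join([header, barcode, "+", quality]) + "\n"


def write_barcode_fastq(lines, out, read_number="1"):
    """Write barcode records for reads with the given read number.

    `lines` is an iterable of FASTQ lines; `out` is any object with a
    write() method. Returns the number of records written.
    """
    written = 0
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue  # only header lines (every fourth line) matter here
        match = HEADER_RE.match(line.strip())
        if match is None:
            raise ValueError("unexpected header format: %r" % line)
        if match.group(2) == read_number:
            out.write(barcode_record(line, placeholder_qual="I"))
            written += 1
    return written
```

Keeping the header identical to the read's header is a deliberate choice in this sketch, so that barcode records and read records can be paired by name.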

Adam

Zoey

Feb 13, 2013, 3:14:14 PM
to qiime...@googlegroups.com
Thank you, Adam. I also just found that the barcodes were 11 bp. Our collaborator (JGI) used the same barcodes as your lab, just the primers are staggered. I checked the barcode list they made, and the barcodes were also 11 bp. I haven't figured out why yet. 
Also the script from the previous post came in really handy! I was going to collect all the barcode reads from the fastq files using my very basic python knowledge, and am so happy to see that this has already been resolved before. 
Best,
Zoey

Zoey

Feb 13, 2013, 4:47:36 PM
to qiime...@googlegroups.com
I tried Tony's code for parsing the barcode file, and it worked very well. :) Thank you guys! 

Just as a reminder for anyone who has a fastq file similar to mine: besides separating reads_1 and reads_2 from the raw reads, the barcode read file can contain only 1:N:0 or only 2:N:0 in its headers. In other words, the reads_1 barcode fastq should contain only the barcodes for reads_1, and likewise for reads_2. Otherwise, QIIME will complain about mismatched barcode fastq and read fastq files.
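For the separation step, here is a minimal sketch of one way to split a mixed file into reads_1 and reads_2, keyed on the `1:` or `2:` at the start of the second header field. The names `read_number` and `split_interleaved` are hypothetical, and the sketch assumes four lines per record and headers shaped like the ones at the top of this thread.

```python
def read_number(header):
    """Return '1' or '2' from a header like '@... 1:N:0:BARCODE'."""
    fields = header.strip().split(" ")
    if len(fields) < 2:
        raise ValueError("no read-number field in header: %r" % header)
    return fields[1].split(":")[0]


def split_interleaved(lines, out_r1, out_r2):
    """Route each 4-line FASTQ record to out_r1 or out_r2.

    `lines` is an iterable of newline-terminated FASTQ lines; the two
    outputs are any objects with a writelines() method. Returns a dict
    with the number of records written per read number.
    """
    outputs = {"1": out_r1, "2": out_r2}
    counts = {"1": 0, "2": 0}
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            number = read_number(record[0])
            if number not in outputs:
                raise ValueError("unexpected read number: %r" % number)
            outputs[number].writelines(record)
            counts[number] += 1
            record = []
    return counts
```

Because each output then holds headers for a single read number, a barcode file built from one of them would carry only 1:N:0 or only 2:N:0 headers, which is the condition described above.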

Jai Ram Rideout

Mar 5, 2013, 9:08:18 PM
to qiime...@googlegroups.com
Hi Mike,

Would you mind creating issues for the two things you mentioned above (file format docs and -b option to split_libraries_fastq.py)? Our issue tracker is here:


Thanks,
Jai


On Wed, Feb 27, 2013 at 9:58 AM, Mike <soilbd...@gmail.com> wrote:
I just wanted to say that I also have the same issue and will attempt to use Tony's script as well, with some modification. I am also working with JGI and the output looks like:


@MISEQ03:74:000000000-A2TYR:1:1101:14024:2827#TCCACAGGAGT/1
TACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGCGGCTTTTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATTGCATTCGATACTGGGAAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGC
+
<???<BB?B5?<BBBBCCEEFFHFH?EHFFGHFHHHHHHHHHHHCCEC>EEEGHHACCFHFDHHCCDECDCDBBFFFBFFFFBDEDDDEBDEEEEEEEEE;BEE?ECBEEEC?C;;BEEEEEEEEEEAEEEEEEEEAECCEEEECEEEEE:AEEEE??E8A:CEAE
@MISEQ03:74:000000000-A2TYR:1:1101:8326:5023#TCCACAGGAGT/1
TACAGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTCCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTGCCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGG
+
?????BB?DDDDDDDDEEEEFFIHEHHI/AGHIICFHHHHHFHHHHHIHHHIIIFHIHHIIIIIHHFEHHHHEHHHHHHHHHHHHHHHHHHHHHHFFFFFFEE@EEAEECEFFF=BEFEFFC?CEFEDAEEFFDDDEFFFE8ACEE?CED8>A*??>>
@MISEQ03:74:000000000-A2TYR:1:1101:7246:7419#TCCACAGGAGT/1
TACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTAGTTAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCAAAACTGACTGACTAGAGTATGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGC

So, it seems to me that most sequencing facilities do not generate a barcode.fastq file as standard output. For example, I was told directly by the sequencing facility that generating barcode fastq files is not part of their standard pipeline, but they hope to add this in the future. This makes me think that barcode fastq files are not standard? In which case, I'd like to know why the qiime software expects this file. Is it generated by Illumina software? Do the qiime developers have any suggestions that I can pass along to the sequencing facility in this regard?

Since many of us deal with a variety of sequencing centers, it would be great if the qiime developers could make a web page that outlines all the files and formats the end user should expect to receive from the sequencing facility, to make importing into qiime easier. This way we can just e-mail that link to the facility so there is no ambiguity about what files and formats are needed. I think this would save a lot of time on many ends. I realize this is, to some degree, spelled out in the Illumina tutorial links, but I think a more explicit page to help the sequencing facilities prep the data would be useful. Anyway, just a suggestion. :-)

From what I read of the split_libraries_fastq.py script, reading in the barcode fastq files via the -b flag is optional. So, I assume I can skip the need for the barcode fastq files by simply adding the barcodes to the mapping file? If not, then the wording of the script should be changed to indicate that -b is required.

As always, qiime community, thanks for all your hard work and help!

-M