Illumina Reads with Included Barcodes


SJDebenport

Jul 12, 2013, 8:14:41 AM
to qiime...@googlegroups.com
Hello everyone,

I have a dataset of paired-end reads from an Illumina run, and I would like to work with the forward and reverse reads as separate datasets. I used PandaSeq to stitch together my 16S reads, but my ITS reads do not overlap, so I would like to work with them individually. My fastq file has the barcodes included in the sequence header, and I do not have a separate barcode file. The file format is shown here:

@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
cQRQOXXXXX_T___WTWWTQTVTV_____BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
KPPPQWWWWWQQ________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

The barcode is the #CGATGT portion of the sequence name. Is there any way to split these sequences by barcode using this format and split_libraries_fastq.py?

Thank you,
Spencer

Tony Walters

Jul 12, 2013, 10:30:44 AM
to qiime...@googlegroups.com
Hello Spencer,

There isn't a script in QIIME that does this directly, although there is an unofficial script which I've put here:

If you click on the "download gist" box on the left side, you'll get a compressed file that you can extract, and then run with a command like so:
python parse_bc_reads_labels.py fastq_fp bc_reads.fastq '#' 2
where fastq_fp is the filepath of the fastq file you want to extract the barcodes from. The output is written to the bc_reads.fastq file.

-Tony 
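
For readers who want to see the idea behind this step, here is a minimal Python sketch of pulling the barcode out of each read label and writing it to a barcode-only fastq file. It is not the script from the gist. The function names, the fixed "/" read-number separator, and the placeholder quality character "F" are all assumptions made for illustration, based only on the label format shown in the question.

```python
import io


def barcode_from_label(label, delimiter="#"):
    """Return the text between `delimiter` and the trailing '/1' or '/2'.

    Assumes labels shaped like 'NAME#BARCODE/1', as in the reads above.
    """
    after_delimiter = label.split(delimiter, 1)[1]
    return after_delimiter.split("/", 1)[0]


def write_barcode_fastq(reads, out, delimiter="#", qual_char="F"):
    """Write one barcode-only FASTQ record per 4-line record in `reads`.

    `qual_char` is a placeholder quality score (an assumption), because the
    label carries no quality information for the barcode itself.
    """
    while True:
        header = reads.readline().rstrip("\n")
        if not header:
            break
        # Skip the sequence, '+' and quality lines of the original record.
        for _ in range(3):
            reads.readline()
        label = header[1:]  # drop the leading '@'
        barcode = barcode_from_label(label, delimiter)
        out.write("@{0}\n{1}\n+{0}\n{2}\n".format(
            label, barcode, qual_char * len(barcode)))


# Small in-memory example; real use would pass open file handles instead.
example = io.StringIO(
    "@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1\n"
    "NNNN\n"
    "+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1\n"
    "BBBB\n"
)
result = io.StringIO()
write_barcode_fastq(example, result)
print(result.getvalue())
```

Because the barcode records are written in the same order as the reads, the two files stay in step with each other, which is what a reads file plus barcode file pairing relies on.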

SJDebenport

Jul 15, 2013, 6:10:44 AM
to qiime...@googlegroups.com
Hello Tony,

Thank you for the response! I tried to use this script, but ended up with the following error:

"Traceback (most recent call last):
  File "./parse_bcs_from_fastq_labels.py", line 22, in <module>
    from cogent.parse.fastq import MinimalFastqParser
ImportError: No module named cogent.parse.fastq"

Do you happen to know a way around that?

Thank you!

Spencer

Tony Walters

Jul 15, 2013, 10:08:16 AM
to qiime...@googlegroups.com
Hello Spencer,

This does require PyCogent; sorry I didn't mention that. You can get the latest release here: http://sourceforge.net/projects/pycogent/files/PyCogent/1.5.3/PyCogent-1.5.3.tgz/download

PyCogent is required for QIIME too (although you might have an older version of QIIME with an earlier version of PyCogent).
-Tony

SJDebenport

Jul 15, 2013, 11:49:21 AM
to qiime...@googlegroups.com
Hello Tony,

I downloaded and installed PyCogent according to the README file, and the same error popped up. Right now I am running the parse script from a folder on my desktop. Does it need to be in another location to work properly? I am using MacQIIME 1.7.0 and supposedly have PyCogent 1.5.3, as shown in the output of print_qiime_config.py:

System information
==================
         Platform: darwin
   Python version: 2.7.3 (default, Dec 19 2012, 09:12:08)  [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Python executable: /macqiime/bin/python

Dependency versions
===================
                     PyCogent version: 1.5.3
                        NumPy version: 1.5.1
                   matplotlib version: 1.1.0
                  biom-format version: 1.1.2
                QIIME library version: 1.7.0
                 QIIME script version: 1.7.0
        PyNAST version (if installed): 1.2
RDP Classifier version (if installed): rdp_classifier-2.2.jar
          Java version (if installed): 1.6.0_51

QIIME config values
===================
                     blastmat_dir: None
                         sc_queue: all.q
      topiaryexplorer_project_dir: None
     pynast_template_alignment_fp: /macqiime/greengenes/core_set_aligned.fasta.imputed
                  cluster_jobs_fp: /macqiime/QIIME/bin/start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /macqiime/greengenes/gg_12_10_otus/rep_set/97_otus.fasta
                     torque_queue: friendlyq
   template_alignment_lanemask_fp: /macqiime/greengenes/lanemask_in_1s_and_0s
                    jobs_to_start: 1
                cloud_environment: False
                qiime_scripts_dir: /macqiime/QIIME/bin/
            denoiser_min_per_core: 50
                      working_dir: None
                    python_exe_fp: /macqiime/bin/python
                         temp_dir: /tmp/
                      blastall_fp: blastall
                 seconds_to_sleep: 60
assign_taxonomy_id_to_taxonomy_fp: /macqiime/greengenes/gg_12_10_otus/taxonomy/97_otu_taxonomy.txt

Do you have another idea on why this might not be working?

Thank you,
Spencer

Tony Walters

Jul 15, 2013, 12:12:13 PM
to qiime...@googlegroups.com
Did you start the macqiime environment before running it?

I'm heading to the airport, so I don't think I'll be able to respond to further queries for some time.

SJDebenport

Jul 15, 2013, 12:16:36 PM
to qiime...@googlegroups.com
Oh, of course, I forgot to start up the macqiime environment first! That worked perfectly once I did.

Thank you again!

Spencer
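
An ImportError like the one above usually means the script is running under a Python interpreter that cannot see the PyCogent installation, which is consistent with the fix being to start the macqiime environment. A quick way to check which interpreter is active and whether it can find the module is sketched below. The helper name can_import is made up for this example, and the snippet is written to run under both Python 2 and Python 3.

```python
import sys


def can_import(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


# Shows which Python is actually running the script.
print(sys.executable)
# Reports whether the module from the traceback is visible to it.
print(can_import("cogent.parse.fastq"))
```

If the printed executable is not the one listed under "Python executable" in the print_qiime_config.py output, the script is probably being run outside the QIIME environment.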

Tony Walters

Jul 16, 2013, 8:06:24 AM
to qiime...@googlegroups.com
Hello Spencer,

On your test samples, I had to lower the quality thresholds quite a bit to get data written out. Check your log file to see whether a lot of sequences are listed under "Read too short after quality truncation".

Lower the -p value and raise the -r and -n values, and see if that helps get output.

-Tony
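
To get a rough feel for why reads like the ones quoted below might be truncated, the quality strings can be decoded by hand. The sketch below assumes the older Illumina Phred+64 encoding, which the label format and quality characters suggest but which the thread does not confirm. Under that assumption a "B" decodes to a score of 2. The function names and the min_score cutoff are illustrative only, and this is a simplification, not the filtering logic that split_libraries_fastq.py actually applies.

```python
def phred64_scores(quality_string):
    """Decode a quality string, assuming Phred+64 encoding (an assumption)."""
    return [ord(char) - 64 for char in quality_string]


def high_quality_prefix_length(quality_string, min_score=3):
    """Count leading bases whose decoded score is at least `min_score`."""
    length = 0
    for score in phred64_scores(quality_string):
        if score < min_score:
            break
        length += 1
    return length


# Toy quality string: three high-quality bases followed by three 'B' bases.
toy_quality = "hhhBBB"
print(phred64_scores(toy_quality))
print(high_quality_prefix_length(toy_quality))
```

Running high_quality_prefix_length over the quality line of each read gives a quick estimate of how much of a read survives before the long runs of "B" begin.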


On Tue, Jul 16, 2013 at 5:32 AM, SJDebenport <spencer....@gmail.com> wrote:
After separating the barcodes into an individual file, I am now having some issues with split_libraries_fastq.py not returning any sequences.

My sequence file looks like this:
@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
CATTTGAGCAGATTTGTCGTCACAGGTTGCGCCGCCAAAACGTCCGCTACAGTAACTTTTCCCAGGCTCAATCTCATC
+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
cQRQOXXXXX_T___WTWWTQTVTV_____BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
GCTACGGGAGGCAGCAGCAAGGAATCTTCCACAATGGGCGCAAGCCTGATGGAGCAACGCCGCGTGCGGGAGGACGCC
+MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
KPPPQWWWWWQQ________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1
TCATCGATGAAGAACGCAGCAAAATCCGATACCTGGTGTGAATTGCAGAATCCCGCGAACCATCGAGATTTTGCACGC

While my trimmed barcode file looks like this:
@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
CGATGT
+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
FFFFFF
@MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
CGATGT
+MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
FFFFFF
@MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1
CGATGT

When I run split_libraries_fastq.py -i input_seqs.fq -b bc_reads.fq -m Map.txt -o slout/ --barcode_type 6, the resulting seqs.fna file contains nothing. Any idea why this might be occurring?

Thank you,
Spencer

Spencer Debenport

Jul 16, 2013, 8:08:46 AM
to qiime...@googlegroups.com
Hello Tony,

I actually realized that I left out the --rev_comp_barcode modifier and am trying that out right now. I'll try those suggestions after this!

Thank you,
Spencer
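
For reference, the --rev_comp_barcode option matters when the barcodes in the reads are the reverse complement of those listed in the mapping file. A minimal sketch of what reverse complementing does to the barcode from this thread is shown below. The function name and lookup table are written for illustration and are not taken from QIIME.

```python
# Complement of each base; N is left unchanged.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


print(reverse_complement("CGATGT"))  # prints ACATCG
```

Comparing both orientations of a barcode against the mapping file is a quick way to tell which one the reads actually carry.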

--
------
Spencer J. Debenport
Ph.D Candidate
Department of Plant Pathology
The Ohio State University - OARDC