Illumina Reads with Included Barcodes

SJDebenport

Jul 12, 2013, 8:14:41 AM

to qiime...@googlegroups.com

Hello everyone,

I have a dataset of paired-end reads from an Illumina run, and I would like to work with the forward and reverse reads as separate datasets. I used PandaSeq to stitch together my 16S reads, but my ITS reads do not overlap, so I would like to work with these individually. I have a fastq file with the barcodes included in the sequence headers, but no separate barcode file. The file format is shown here:

@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1

cQRQOXXXXX_T___WTWWTQTVTV_____BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

@MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

+MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1

KPPPQWWWWWQQ________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

@MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

+MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1

BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

The barcode is the CGATGT portion after the # in the sequence name. Is there any way to split these sequences by barcode using this format and split_libraries_fastq.py?

Thank you,

Spencer

Tony Walters

Jul 12, 2013, 10:30:44 AM

to qiime...@googlegroups.com

Hello Spencer,

There isn't a script in QIIME that does this directly, although there is an unofficial script which I've put here:

https://gist.github.com/walterst/5984883

If you click on the "download gist" box on the left side, you'll get a compressed file that you can extract, and then run with a command like so:

python parse_bc_reads_labels.py fastq_fp bc_reads.fastq '#' 2

where fastq_fp is the path to the fastq file you want to extract the barcodes from. The barcode reads will be written to the bc_reads.fastq file.
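For anyone who cannot run the gist, the general idea can be sketched in a few lines of plain Python. This is not the gist's code. It is a minimal sketch that assumes a strict four-line fastq layout with no blank lines, the '#' delimiter and 6-base barcode shown in the question, and a placeholder quality character for the barcode reads. The function names are made up for illustration.

```python
def barcode_from_label(label, delimiter="#", barcode_len=6):
    """Return the barcode that follows `delimiter` in a fastq label."""
    # "...:1026#CGATGT/1" -> tail "CGATGT/1" -> "CGATGT"
    _, found, tail = label.partition(delimiter)
    if not found:
        raise ValueError("delimiter %r not found in label: %s" % (delimiter, label))
    return tail[:barcode_len]


def write_barcode_fastq(in_path, out_path, delimiter="#", barcode_len=6, qual_char="F"):
    """Write one barcode record per read, reusing each read's label."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line_number, line in enumerate(src):
            if line_number % 4 != 0:
                continue  # skip the sequence, '+' and quality lines
            header = line.rstrip("\n")
            if not header:
                continue  # tolerate a trailing empty line
            if not header.startswith("@"):
                raise ValueError("expected a fastq header, got: %s" % header)
            label = header[1:]
            barcode = barcode_from_label(label, delimiter, barcode_len)
            # The quality string is a placeholder (assumption): the labels
            # carry no real quality scores for the barcode.
            dst.write("@%s\n%s\n+%s\n%s\n" % (label, barcode, label, qual_char * len(barcode)))
```

Calling write_barcode_fastq("input_seqs.fq", "bc_reads.fq") would then produce a barcode file whose labels match the reads file one for one.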

-Tony

--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

SJDebenport

Jul 15, 2013, 6:10:44 AM

to qiime...@googlegroups.com

Hello Tony,

Thank you for the response! I tried to use this script, but ended up with the following error:

Traceback (most recent call last):
  File "./parse_bcs_from_fastq_labels.py", line 22, in <module>
    from cogent.parse.fastq import MinimalFastqParser
ImportError: No module named cogent.parse.fastq

Do you happen to know a way around that?

Thank you!

Spencer

Tony Walters

Jul 15, 2013, 10:08:16 AM

to qiime...@googlegroups.com

Hello Spencer,

This does require PyCogent; sorry I didn't mention that. You can get the latest release here: http://sourceforge.net/projects/pycogent/files/PyCogent/1.5.3/PyCogent-1.5.3.tgz/download

PyCogent is required for QIIME too (although you might have an older version of QIIME with an earlier version of PyCogent).

-Tony

SJDebenport

Jul 15, 2013, 11:49:21 AM

to qiime...@googlegroups.com

Hello Tony,

I downloaded and installed PyCogent according to the README file, and the same error popped up. Right now I am running the parse script from a folder on my desktop. Does it need to be in another location to work properly? I am using MacQIIME 1.7.0 and supposedly have PyCogent 1.5.3, as shown in the output of print_qiime_config.py:

System information
==================
Platform: darwin
Python version: 2.7.3 (default, Dec 19 2012, 09:12:08) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Python executable: /macqiime/bin/python

Dependency versions
===================
PyCogent version: 1.5.3
NumPy version: 1.5.1
matplotlib version: 1.1.0
biom-format version: 1.1.2
QIIME library version: 1.7.0
QIIME script version: 1.7.0
PyNAST version (if installed): 1.2
RDP Classifier version (if installed): rdp_classifier-2.2.jar
Java version (if installed): 1.6.0_51

QIIME config values
===================
blastmat_dir: None
sc_queue: all.q
topiaryexplorer_project_dir: None
pynast_template_alignment_fp: /macqiime/greengenes/core_set_aligned.fasta.imputed
cluster_jobs_fp: /macqiime/QIIME/bin/start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /macqiime/greengenes/gg_12_10_otus/rep_set/97_otus.fasta
torque_queue: friendlyq
template_alignment_lanemask_fp: /macqiime/greengenes/lanemask_in_1s_and_0s
jobs_to_start: 1
cloud_environment: False
qiime_scripts_dir: /macqiime/QIIME/bin/
denoiser_min_per_core: 50
working_dir: None
python_exe_fp: /macqiime/bin/python
temp_dir: /tmp/
blastall_fp: blastall
seconds_to_sleep: 60
assign_taxonomy_id_to_taxonomy_fp: /macqiime/greengenes/gg_12_10_otus/taxonomy/97_otu_taxonomy.txt

Do you have any other idea why this might not be working?

Thank you,

Spencer

Tony Walters

Jul 15, 2013, 12:12:13 PM

to qiime...@googlegroups.com

Did you start the macqiime environment before running it?
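A quick way to check this, offered as a suggestion rather than anything from the QIIME docs: run a few lines with the same python command you use for the script, and see which interpreter is active and whether cogent can be imported. The helper name below is made up. The config output earlier in this thread reports /macqiime/bin/python as the MacQIIME interpreter, so a different path here would suggest the environment is not active.

```python
import sys


def can_import(module_name):
    """Return True if this interpreter can import `module_name`."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Compare this path with the "Python executable" line from print_qiime_config.py
    print("Python executable: %s" % sys.executable)
    print("cogent.parse.fastq importable: %s" % can_import("cogent.parse.fastq"))
```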

I'm heading to the airport, so I don't think I'll be able to respond to further queries for some time.

SJDebenport

Jul 15, 2013, 12:16:36 PM

to qiime...@googlegroups.com

Oh, of course, I forgot to start up the macqiime environment first! That worked perfectly once I did.

Thank you again!

Spencer

Message has been deleted

Tony Walters

Jul 16, 2013, 8:06:24 AM

to qiime...@googlegroups.com

Hello Spencer,

On your test samples, I had to lower the quality thresholds quite a bit to get data written out. Check your log file to see whether a lot of sequences are being counted under "Read too short after quality truncation".

Lower the -p value and raise the -r and -n values, and see if that helps get output.
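To see why reads with long low-quality tails can all be dropped, here is a simplified Python model of this kind of filter. It is not QIIME's actual code. It assumes Phred+64 quality encoding (which the lowercase characters and runs of B in the posted reads suggest), and the parameter names and default values are hypothetical stand-ins for a quality cutoff, an allowed run of bad calls, and a minimum retained fraction of the read.

```python
def truncated_length(quality, last_bad_quality=3, max_bad_run=1, phred_offset=64):
    """Bases kept after cutting the read at the first too-long low-quality run."""
    bad_run = 0
    for position, char in enumerate(quality):
        if ord(char) - phred_offset <= last_bad_quality:
            bad_run += 1
            if bad_run > max_bad_run:
                # Truncate at the start of the offending run
                return position - bad_run + 1
        else:
            bad_run = 0
    return len(quality)


def passes_length_filter(quality, min_fraction=0.75, **kwargs):
    """True if enough of the read survives truncation."""
    return truncated_length(quality, **kwargs) >= min_fraction * len(quality)


# Quality string modeled on the first read posted in this thread:
# 30 reasonable scores followed by a tail of B characters (Phred 2 at offset 64).
example = "cQRQOXXXXX_T___WTWWTQTVTV_____" + "B" * 48
print(truncated_length(example))
print(passes_length_filter(example))
print(passes_length_filter(example, min_fraction=0.3))
```

Under these assumptions the example read keeps only its first 30 of 78 bases, which is well below three quarters of its length, so it fails the default check and only passes once the required fraction is relaxed.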

-Tony

On Tue, Jul 16, 2013 at 5:32 AM, SJDebenport <spencer....@gmail.com> wrote:

After separating the barcodes into an individual file, I am now having an issue with split_libraries_fastq.py not returning any sequences.

My sequence file looks like this:

@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
CATTTGAGCAGATTTGTCGTCACAGGTTGCGCCGCCAAAACGTCCGCTACAGTAACTTTTCCCAGGCTCAATCTCATC
+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
cQRQOXXXXX_T___WTWWTQTVTV_____BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
GCTACGGGAGGCAGCAGCAAGGAATCTTCCACAATGGGCGCAAGCCTGATGGAGCAACGCCGCGTGCGGGAGGACGCC
+MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
KPPPQWWWWWQQ________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1
TCATCGATGAAGAACGCAGCAAAATCCGATACCTGGTGTGAATTGCAGAATCCCGCGAACCATCGAGATTTTGCACGC

While my trimmed barcode file looks like this:

@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
CGATGT
+MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
FFFFFF
@MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
CGATGT
+MCIC-SOLEXA_0051_FC:1:1:4065:1039#CGATGT/1
FFFFFF
@MCIC-SOLEXA_0051_FC:1:1:4391:1040#CGATGT/1
CGATGT

When I run split_libraries_fastq.py -i input_seqs.fq -b bc_reads.fq -m Map.txt -o slout/ --barcode_type 6, the resulting seqs.fna file contains nothing. Any idea why this might be occurring?

Thank you,
Spencer

Spencer Debenport

Jul 16, 2013, 8:08:46 AM

to qiime...@googlegroups.com

Hello Tony,

I actually realized that I left out the --rev_comp_barcode option and am trying that out right now. I'll try those suggestions after this!
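For reference, reverse-complementing a barcode just means reversing it and swapping each base for its complement. A minimal Python sketch follows; the function name is illustrative and not part of QIIME.

```python
# Complement of each base; N stays N
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("CGATGT"))  # prints ACATCG
```

So if the mapping file lists ACATCG for a sample while the read labels carry CGATGT, the barcodes would only match once one side is reverse-complemented.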

Thank you,

Spencer


--

------

Spencer J. Debenport

Ph.D Candidate

Department of Plant Pathology

The Ohio State University - OARDC

(330) 202-3555 x2863

spencer....@gmail.com | deben...@osu.edu
