Upstream processing error: overlapping seq ids

Roody_UF

Aug 30, 2016, 4:21:24 PM

to qiime...@googlegroups.com

Hello all,

I have some data from a collaborator that I have been tackling:

1. It is a combination of three 454 runs, with 10 barcodes across 30 samples.

2. I have .fastq files with forward and reverse reads.

-------

System information

==================

Platform: linux2

Python version: 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]

Python executable: /usr/bin/python

QIIME default reference information

===================================

For details on what files are used as QIIME's default references, see here:

https://github.com/biocore/qiime-default-reference/releases/tag/0.1.3

Dependency versions

===================

QIIME library version: 1.9.1

QIIME script version: 1.9.1

qiime-default-reference version: 0.1.3

NumPy version: 1.8.2

SciPy version: 0.13.3

pandas version: 0.17.1

matplotlib version: 1.3.1

biom-format version: 2.1.4

h5py version: 2.5.0 (HDF5 version: 1.8.11)

qcli version: 0.1.0

pyqi version: 0.3.2

scikit-bio version: 0.2.3

PyNAST version: 1.2.2

Emperor version: 0.9.51

burrito version: 0.9.1

burrito-fillings version: 0.1.1

sortmerna version: SortMeRNA version 2.0, 29/11/2014

sumaclust version: SUMACLUST Version 1.0.00

swarm version: Swarm 1.2.19 [May 25 2016 14:36:46]

gdata: Installed.

-----

So I went through and extracted barcodes for each read, then ran split_libraries_fastq.py.

I then concatenated (cat) the 3 resulting files from the 3 runs.

To be more specific, I extracted barcodes for all samples using the command:

extract_barcodes.py -f A1_R1.fastq -o barcodes_A1 -c barcode_single_end --bc1_len 8

Then I split libraries:

split_libraries_fastq.py -m map_run1.txt -i A1_R1.fastq -o split_lib_A1 --barcode_read_fps barcodes.fastq --barcode_type 8

I then concatenated all 30 of the resulting seqs.fna files together.

Now that I am trying to run pick_otus.py, I am getting an error that leads me to believe I made some mistakes in the upstream processing. I also read another thread that seems similar to my situation.

----

bfillings.uclust.UclustParseError: A seq id was provided as a seed, but that seq id already represents a cluster. Are there overlapping seq ids in your reference and input files or repeated seq ids in either? Offending seq id is QiimeExactMatch.SS4_11

---
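For anyone hitting the same message, one way to confirm that the concatenated file really contains repeated ids is to count the labels directly. This is a minimal sketch, not part of QIIME: the function names are hypothetical, the records are made-up examples, and it assumes the usual seqs.fna layout where the id is the first whitespace-delimited token of each header (for example `SS4_11`).

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the id (first whitespace-delimited token) of each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def repeated_ids(lines):
    """Return {seq_id: count} for every id that occurs more than once."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Made-up records standing in for two runs that were each numbered from 0
# and then concatenated. For a real file, pass an open file handle instead.
example = [
    ">SS4_0 read_a orig_bc=AAAAAAAA\n", "ACGTACGT\n",
    ">SS4_1 read_b orig_bc=AAAAAAAA\n", "ACGTTTGT\n",
    ">SS4_0 read_c orig_bc=AAAAAAAA\n", "GGGTACGT\n",
]

print(repeated_ids(example))
```

An empty result would suggest the ids are unique and the problem lies elsewhere; any entry in the result is an id that appears in more than one of the concatenated files.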

So my question is: does split_libraries_fastq.py have a -n option like split_libraries.py? I know I can convert the fastq files to .qual and .fasta files. Do I have to do that? Is my error in the one-by-one processing and concatenation? How can I do it more efficiently and avoid these errors?

Thanks for the help!

-Roo

TonyWalters

Aug 30, 2016, 4:47:22 PM

to Qiime 1 Forum

Hello Roody,

I think this option is the one you're looking for:

--start_seq_id

I would stick to using the fastq files in any case; the quality filtering is a bit different for Illumina (split_libraries_fastq.py) and 454-style (split_libraries.py) data.

-Tony
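To pick a value for --start_seq_id for each run, one approach is to start every run where the previous one stopped. The sketch below is only an illustration under that assumption: the helper names are hypothetical, the per-run counts are invented, and in practice the counts would come from counting the `>` lines in each run's seqs.fna.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def start_ids(record_counts):
    """Return a non-overlapping start id for each run, given sequences per run in order."""
    offsets = []
    next_id = 0
    for n in record_counts:
        offsets.append(next_id)
        next_id += n
    return offsets


# Invented counts for three runs; real values would come from
# count_fasta_records(open(path)) on each run's output.
counts = [1200, 950, 1430]
print(start_ids(counts))
```

Each value would then be passed as --start_seq_id when splitting the corresponding run. Any start value at least as large as the running total works equally well, so rounding up to a convenient number (for example 1000000 per run) avoids having to count at all.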

Roody_UF

Aug 31, 2016, 10:27:11 AM

to Qiime 1 Forum

1. I concatenated all the forward reads from one run (so no repeated barcodes).

2. I extracted barcodes from the forward file.

3. I then ran the join_paired_ends.py script:

join_paired_ends.py -f A1_R1.fastq -r A1_R2.fastq -b A1_barcode/barcodes.fastq -o A1_joined

4. I then tried to run split_libraries_fastq.py:

split_libraries_fastq.py -m map1.txt -i A1_joined/fastqjoin.join -o A1_split_lib --barcode_read_fps A1_barcode/barcodes.fastq --barcode_type 8

Traceback (most recent call last):
  File "/usr/local/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/usr/local/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/usr/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 322, in process_fastq_single_end_read_file
    raise FastqParseError("Headers of barcode and read do not match. Can't continue. "
qiime.split_libraries_fastq.FastqParseError: Headers of barcode and read do not match. Can't continue. Confirm that the barcode fastq and read fastq that you are passing match one another.

Does this happen because I concatenated the files initially? Should I be processing them one at a time, or passing them comma-separated? Should I forget about joining the paired ends?

Thanks!

TonyWalters

Aug 31, 2016, 11:52:44 AM

to Qiime 1 Forum

Hello Roody,

My guess is that your step 4 --barcode_read_fps parameter needs to point to the barcodes fastq that is in the output folder (A1_joined/) from step 3, which should be filtered so that its labels match the joined data.
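A quick way to see why the unfiltered barcode file fails is to walk both files record by record and report the first pair of ids that differ. The following is a rough sketch with hypothetical helper names and made-up records. It assumes strict four-line FASTQ records and compares only the first whitespace-delimited token of each header, which is a simplification and not necessarily the exact rule QIIME applies.

```python
from itertools import zip_longest


def fastq_ids(lines):
    """Yield the id of each record, assuming strict 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            fields = line.strip().lstrip("@").split()
            yield fields[0] if fields else ""


def first_mismatch(read_lines, barcode_lines):
    """Return (record_index, read_id, barcode_id) for the first mismatch, or None."""
    pairs = zip_longest(fastq_ids(read_lines), fastq_ids(barcode_lines))
    for index, (read_id, bc_id) in enumerate(pairs):
        if read_id != bc_id:
            return index, read_id, bc_id
    return None


# Made-up data: read r2 failed to join, so the joined file skips it,
# while the original barcode file still contains all three records.
joined = ["@r1 1:N:0\n", "ACGT\n", "+\n", "IIII\n",
          "@r3 1:N:0\n", "GGCC\n", "+\n", "IIII\n"]
all_barcodes = ["@r1\n", "AAAAAAAA\n", "+\n", "IIIIIIII\n",
                "@r2\n", "CCCCCCCC\n", "+\n", "IIIIIIII\n",
                "@r3\n", "GGGGGGGG\n", "+\n", "IIIIIIII\n"]
filtered_barcodes = all_barcodes[:4] + all_barcodes[8:]

print(first_mismatch(joined, all_barcodes))
print(first_mismatch(joined, filtered_barcodes))
```

With the full barcode file the two streams fall out of step at the first read that did not join, whereas the barcode file filtered alongside the joined reads lines up record for record.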

Roody_UF

Sep 6, 2016, 4:07:02 PM

to Qiime 1 Forum

Excellent! Thank you so much for your help! I just concatenated the 3 runs with their unique IDs and ran pick_otus.py with no errors.
