I have some data from a collaborator that I have been tackling:
1. It is a combination of three 454 runs, with 10 barcodes across 30 samples.
2. I have .fastq files with forward and reverse reads.
My system and QIIME configuration:
Python version: 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
Python executable: /usr/bin/python
QIIME default reference information
For details on what files are used as QIIME's default references, see here:
QIIME library version: 1.9.1
QIIME script version: 1.9.1
qiime-default-reference version: 0.1.3
NumPy version: 1.8.2
SciPy version: 0.13.3
pandas version: 0.17.1
matplotlib version: 1.3.1
biom-format version: 2.1.4
h5py version: 2.5.0 (HDF5 version: 1.8.11)
qcli version: 0.1.0
pyqi version: 0.3.2
scikit-bio version: 0.2.3
PyNAST version: 1.2.2
Emperor version: 0.9.51
burrito version: 0.9.1
burrito-fillings version: 0.1.1
sortmerna version: SortMeRNA version 2.0, 29/11/2014
sumaclust version: SUMACLUST Version 1.0.00
swarm version: Swarm 1.2.19 [May 25 2016 14:36:46]
So I went through and extracted the barcodes for each read, then ran split_libraries_fastq.py.
I then concatenated (cat) the 3 resulting files from the 3 runs.
To be more specific, I extracted barcodes for all samples using the command:
extract_barcodes.py -f A1_R1.fastq -o barcodes_A1 -c barcode_single_end --bc1_len 8
Then I split libraries:
split_libraries_fastq.py -m map_run1.txt -i A1_R1.fastq -o split_lib_A1 --barcode_read_fps barcodes.fastq --barcode_type 8
I then concatenated (cat) all 30 of the resulting seqs.fna files together.
Now that I am trying to run pick_otus.py, I am getting an error that leads me to believe I made a mistake somewhere in the upstream processing. I also read another thread that seems similar to my situation.
bfillings.uclust.UclustParseError: A seq id was provided as a seed, but that seq id already represents a cluster. Are there overlapping seq ids in your reference and input files or repeated seq ids in either? Offending seq id is QiimeExactMatch.SS4_11
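Since the error message mentions repeated seq ids, one quick sanity check is to count the ids in the concatenated FASTA file before running pick_otus.py. The following is only a minimal sketch: the function names and the toy records are hypothetical, and it assumes the seq id is the first whitespace-delimited token of each header line, as in the seqs.fna files written by split_libraries_fastq.py.

```python
from collections import Counter


def read_seq_ids(lines):
    """Yield the seq id (first token after '>') of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def find_duplicate_ids(lines):
    """Return a dict mapping each repeated seq id to how often it occurs."""
    counts = Counter(read_seq_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Toy stand-in for a concatenated seqs.fna (hypothetical records, not real data).
example = [
    ">SS4_0 read_a orig_bc=ACGTACGT new_bc=ACGTACGT bc_diffs=0",
    "ACGTACGTAC",
    ">SS4_1 read_b orig_bc=ACGTACGT new_bc=ACGTACGT bc_diffs=0",
    "GGCCGGCCAA",
    ">SS4_0 read_c orig_bc=ACGTACGT new_bc=ACGTACGT bc_diffs=0",
    "TTAATTAAGG",
]

duplicates = find_duplicate_ids(example)
print(duplicates)
```

To run the same check on a real file, pass an open file handle instead of the list, for example `find_duplicate_ids(open("combined_seqs.fna"))`, where `combined_seqs.fna` is a placeholder for whatever the concatenated file is called. An empty result would suggest the duplicates come from somewhere other than the concatenation step.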
So my questions are: Does split_libraries_fastq.py have a -n option like split_libraries.py? I know I can convert the fastq files to .qual and .fasta files. Do I have to do that? Is my error in the one-by-one processing and concatenation? How can I do this more efficiently and avoid these errors?
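If the concatenation does turn out to be the cause, one possible workaround is to renumber the seq ids while joining the per-run files, so that the numeric suffix is unique across all runs. The sketch below is an illustration under stated assumptions, not a QIIME feature: `renumber_seq_ids` is a hypothetical helper, and it assumes every id has the form SampleID_N with no underscore inside the SampleID. Note that reads sharing a SampleID across runs would still be treated as one sample downstream.

```python
def renumber_seq_ids(runs):
    """Concatenate FASTA records from several runs with unique id suffixes.

    `runs` is an iterable of iterables of FASTA lines (one per run).
    Each header '>SampleID_N rest' is rewritten as '>SampleID_K rest',
    where K counts upward across all runs, so no two ids collide.
    """
    next_index = 0
    for lines in runs:
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith(">") and line[1:].split():
                fields = line[1:].split(None, 1)
                # Assumption: the sample id is everything before the last '_'.
                sample_id = fields[0].rsplit("_", 1)[0]
                rest = " " + fields[1] if len(fields) > 1 else ""
                yield ">%s_%d%s" % (sample_id, next_index, rest)
                next_index += 1
            else:
                yield line


# Toy records standing in for two per-run seqs.fna files (hypothetical data).
run1 = [">SS4_0 read_a", "ACGTACGTAC", ">SS5_1 read_b", "GGCCGGCCAA"]
run2 = [">SS4_0 read_c", "TTAATTAAGG"]

combined = list(renumber_seq_ids([run1, run2]))
print("\n".join(combined))
```

With real data, each element of `runs` could be an open file handle for one run's seqs.fna, and the yielded lines could be written to a new combined file.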
Thanks for the help!