I received some single-end MiSeq 300 data from a collaborator and have been trying, without success, to run it through LotuS, which rejects 100% of the barcodes. The same data runs through split_libraries_fastq.py as expected, but I would prefer to use LotuS. I can also find each barcode from the mapping file with grep.
Could it have something to do with the 300-base kit from Illumina? I have looked at the read headers and so on, but can't identify the source of the problem.
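The grep check can be extended to also test barcode orientation, which is a common reason a demultiplexer matches nothing. Below is a minimal Python sketch, not part of LotuS or sdm: the function names are hypothetical, and it assumes uncompressed four-line FASTQ records, exact (0-mismatch) matching, and that the barcodes are looked up in the index reads. It counts how often each mapping-file barcode appears as written and as its reverse complement.

```python
from collections import Counter
from itertools import islice

# Translation table for complementing nucleotides (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequences(lines):
    """Yield the sequence line (2nd line of every 4) from FASTQ records."""
    for seq in islice(lines, 1, None, 4):
        yield seq.strip()


def tally_barcodes(index_lines, barcodes):
    """Map each barcode to (hits as written, hits as reverse complement)."""
    observed = Counter(fastq_sequences(index_lines))
    return {
        bc: (observed[bc], observed[reverse_complement(bc)])
        for bc in barcodes
    }


# Tiny in-memory demo: one index read matches the barcode as written,
# the other matches its reverse complement.
demo_index = [
    "@read1", "GCACTACCGAAT", "+", "IIIIIIIIIIII",
    "@read2", "ATTCGGTAGTGC", "+", "IIIIIIIIIIII",
]
print(tally_barcodes(demo_index, ["GCACTACCGAAT"]))
```

For real data, an open file handle on the index FASTQ can be passed as `index_lines`, with the barcode column of the mapping file as `barcodes`. If nearly all hits fall in the second count, the mapping file and the index reads are in opposite orientations.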
=========================================================================
LotuS 1.506
=========================================================================
COMMAND
/usr/bin/perl ./lotus.pl -i /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq -barcode /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_I1_001.fastq -m /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/april_mapping_rc.txt -o ./april_test -s sdm_miSeq_300.txt -p miSeq
Checking for updates.. Your LotuS version is up-to-date!
=========================================================================
Reading mapping file
=========================================================================
Running UPARSE de novo sequence clustering..
Running fast LotuS mode..
------------ I/O configuration --------------
Input= /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq
Output= ./april_test
Barcodes= /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_I1_001.fastq
TempDir= ./april_test/tmpFiles/
------------ Configuration LotuS --------------
Sequencing platform=miseq
AmpliconType=SSU
OTU id=0.97
min unique read abundance=2
UCHIME REFDB, ABSKEW=/home/shandley/install_files/lotus_pipeline//DB//rdp_gold.fa, 2
OTU, Chimera prefix=OTU_, CHIMERA_
TaxonomicGroup=bacteria
--------------------------------------------
=========================================================================
Demultiplexing input files
elapsed time: 2 s
=========================================================================
This is sdm (simple demultiplexer) 1.27 beta.
Reading fastq.
/mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq
Setting to Sanger fastq version (q offset = 33).
[#############################################################] 100.00%
sdm 1.27 beta
Input File: /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq
Output File: ./april_test/tmpFiles//demulti.fna
Statistics of high quality reads
Reads processed: 9,585,081
Rejected: 9,585,081
Accepted: 0 (0 were end-trimmed)
Bad Reads recovered with dereplication: 0
Min/Avg/Max stats Pair 1
- Seq Length : 0/0/0
- Quality : 0/0/0
- Median Seq Length : 0, Quality : 0
- Accum. Error -nan
Trimmed due to:
> 25 avg qual in 20 bp windows : 0
> (0.75) acc. errors, trimmed seqs : 0
Rejected due to:
< min Seq length (250) : 0
< avg Quality (25) : 0
< window (50 nt) avg. Quality (25) : 0
> max Seq length (1000) : 0
> (8) homo-nt run : 0
> (0) amb. Bases : 0
> (1.5) binomial est. errors : 0
Specific sequence searches:
-With fwd Primer remaining (<= 0 mismatches) : 0
-Barcode unidentified (max 0 errors) : 9,585,081
SampleID Barcode Instances
0231.172.1.1 GCACTACCGAAT 0
0675.493.1.1 AACCATCGGGTG 0
0677.491.1.1 ATTATACCTCGG 0
0590.427.1.1 ACCCGTATGATG 0
0713.519.2.1 CTGGAGCATGAC 0
0725.527.1.2 CAAGTGAGAGAG 0
0725.527.2.2 ACATGTCACGTG 0
0725.527.2.3 ACTTCAACTGTG 0
0725.527.2.4 CAGTGATCCTAG 0
0723.529.1.1 ATTTGGGTCATC 0
0723.529.1.2 TCCTACTCCGGT 0
0723.529.1.3 AGGTAGTCCTCA 0
0396.267.2.10 CGATCGAGTGTT 0
0262.198.2.9 TATTGCTCCTCC 0
0303.208.2.9 GTCTTCGTCGCT 0
0662.479.1.2 TACTAGGGCTTG 0
0055.051.1.5 AGCTCTCAGAGG 0
0135.111.1.3 ACTGCATCGAGG 0
0158.129.1.4 TGAATAGTCCGC 0
0421.295.1.2 CAATTGTGCACG 0
0421.295.1.3 GAACTTAGGCCG 0
0422.293.1.2 GATGTGAGCGCT 0
0880.641.1.2 GTGCCATAACCA 0
0847.616.1.2 ATCAGAACCTCG 0
0825.610.1.2 TCCACAGGAGTT 0
0760.569.1.2 CTACGACCATTA 0
0755.549.1.2 TCGACGGTGCAA 0
0730.536.1.2 ACGTCTGTAGCA 0
0671.486.1.2 TAGTCAGGCCAT 0
1031.761.1.2 TTGTATGTGCGT 0
0869.631.1.4 ACAGACCACTCA 0
blank8 ACGAGTGCTATC 0
0444.312.2.4 CACTGGTATATC 0
0691.499.1.2 ACACATGTCTAC 0
Evaluating and writing dereplicated reads..
Dereplication:
Accepted 0 unique sequences ( 2 ); average size in this set is -nan.
Uniques with insufficient abundance: 0 not passing derep conditions
I am using LotuS version 1.506, and the script works great for other data sets. I tried some customized sdm files, but it fails even with the default sdm file. The issued command and its full output are shown above.