Problems with sdm on MiSeq 300 kit data

Scott Handley

May 2, 2016, 2:27:04 PM
to LotuS rRNA pipeline
Hi Falk,

I received some single-end MiSeq 300 data from a collaborator and have been trying, unsuccessfully, to get it through LotuS. It rejects 100% of the reads because no barcode is identified. I ran the same data through split_libraries_fastq.py and everything works as expected, but I prefer to use LotuS. I can also find each barcode from the mapping file in the barcode file with grep.

I wonder if it has something to do with the 300-base kit from Illumina? I have looked at the headers, etc., but can't identify the source of the problem.

Here is the first read from the Read_1 file:


@M03475:63:000000000-AMR9K:1:1101:14699:1688 1:N:0:0
TACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGTGTAGGCGGCACTATAAGTCTGATGTGAAAACCTATGGCTTAACCATAGGATTGCATTGGAAACTGTAGAGCTGGAGTATCGGAGAGGCAAGCGGAATTCCTGGTGTAGTGGTGAAATACGTAGATATCAGGAAGAACATCGGTGGCGAAGGCGGCTTGCTGGTCGATAACTGACGCTAAGGCTCGAAAGTGTGGGAAGCGAACAGGATTAGAAACCCTAGTAGTCCGGCTGACTGACTAGCTCTCAGTGTATT
+
>1>1>1B1131@11A11AE1FAB000BEG12D210F0E12221///B0F2111////>F1D222DFGD1DDGF2211B>F11@1?GH11B@@B1111BF01>FG>B111BF1GBD2B>GFECFGEGG?//B@CC/<CA/<///FGGDGDBHEHFFGFFGA1DDF=1<DFCEHBGH00G/CC.<:/CCA.E?.C@-??CE-??A@--B/9/9--;-;9BF/BAAF9-BBFFF;---:B/FB??-/9A@--9@BE?BFFFB//999A;-BF/FFF??-@-A-B/B/B/9/BBFB///://;/

And here is the associated barcode file:

@M03475:63:000000000-AMR9K:1:1101:14699:1688 1:N:0:0
AGCTCTCAGAGG
+
1AAAAFD1@111

And here are the first few lines from the mapping file:

#SampleID BarcodeSequence LinkerPrimerSequence Primer SampleType Description
0231.172.1.1 GCACTACCGAAT CCGGACTACHVGGGTWTCTAAT 197 FRESH z
0675.493.1.1 AACCATCGGGTG CCGGACTACHVGGGTWTCTAAT 227 FRESH z
0677.491.1.1 ATTATACCTCGG CCGGACTACHVGGGTWTCTAAT 232 FRESH z
0590.427.1.1 ACCCGTATGATG CCGGACTACHVGGGTWTCTAAT 271 FRESH z
0713.519.2.1 CTGGAGCATGAC CCGGACTACHVGGGTWTCTAAT 389 FRESH z
0725.527.1.2 CAAGTGAGAGAG CCGGACTACHVGGGTWTCTAAT 396 FRESH z
0725.527.2.2 ACATGTCACGTG CCGGACTACHVGGGTWTCTAAT 398 FRESH z
0725.527.2.3 ACTTCAACTGTG CCGGACTACHVGGGTWTCTAAT 399 FRESH z
0725.527.2.4 CAGTGATCCTAG CCGGACTACHVGGGTWTCTAAT 400 FRESH z
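
The grep check mentioned above can be reproduced in a few lines of Python. This is only a minimal sketch: the function names are hypothetical and not part of LotuS, sdm, or QIIME; it assumes a whitespace-delimited mapping file with the barcode in the second column and a plain 4-line-per-record fastq. It counts exact matches (comparable to the "max 0 errors" setting in the sdm log) for each barcode as listed and as its reverse complement, which is a common first diagnostic when every barcode is rejected.

```python
import itertools
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_mapping_barcodes(path):
    """Return {barcode: sample_id} from a whitespace-delimited mapping file.

    Assumes column 1 is the sample ID and column 2 the barcode; lines
    starting with '#' (the header) are skipped.
    """
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if line.startswith("#") or len(fields) < 2:
                continue
            barcodes[fields[1].upper()] = fields[0]
    return barcodes


def fastq_sequences(path):
    """Yield the sequence line of every 4-line fastq record."""
    with open(path) as handle:
        # Sequence lines are lines 2, 6, 10, ... of the file.
        for line in itertools.islice(handle, 1, None, 4):
            yield line.strip().upper()


def count_barcode_matches(mapping_path, index_fastq_path):
    """Count index reads that exactly match a mapping-file barcode."""
    barcodes = read_mapping_barcodes(mapping_path)
    rc_barcodes = {reverse_complement(bc) for bc in barcodes}
    counts = Counter()
    for seq in fastq_sequences(index_fastq_path):
        counts["total"] += 1
        if seq in barcodes:
            counts["as_listed"] += 1
        if seq in rc_barcodes:
            counts["reverse_complement"] += 1
    return counts
```

If "as_listed" is close to "total", the barcodes and mapping file agree and the rejection is more likely caused by how the demultiplexer is invoked than by the data.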



And here is the LotuS output:

=========================================================================

          LotuS 1.506
=========================================================================
COMMAND
/usr/bin/perl ./lotus.pl -i /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq -barcode /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_I1_001.fastq -m /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/april_mapping_rc.txt -o ./april_test -s sdm_miSeq_300.txt -p miSeq

Checking for updates..  Your LotuS version is up-to-date!
=========================================================================
          Reading mapping file
=========================================================================
Running UPARSE de novo sequence clustering..
Running fast LotuS mode..
------------ I/O configuration --------------
Input=   /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq
Output=  ./april_test
Barcodes= /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_I1_001.fastq
TempDir= ./april_test/tmpFiles/
------------ Configuration LotuS --------------
Sequencing platform=miseq
AmpliconType=SSU
OTU id=0.97
min unique read abundance=2
UCHIME REFDB, ABSKEW=/home/shandley/install_files/lotus_pipeline//DB//rdp_gold.fa, 2
OTU, Chimera prefix=OTU_, CHIMERA_
TaxonomicGroup=bacteria
--------------------------------------------
=========================================================================
          Demultiplexing input files
           elapsed time: 2 s
=========================================================================

This is sdm (simple demultiplexer) 1.27 beta.

Reading fastq.
/mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq

Setting to Sanger fastq version (q offset = 33).

   [#############################################################] 100.00%
sdm 1.27 beta
Input File:  /mnt/data2/shandley/vaginal_mircrobiome/16S/april_2016/Undetermined_S0_L001_R1_001.fastq
Output File: ./april_test/tmpFiles//demulti.fna
Statistics of high quality reads

Reads processed: 9,585,081
Rejected: 9,585,081
Accepted: 0 (0 were end-trimmed)
Bad Reads recovered with dereplication: 0
Min/Avg/Max stats Pair 1
     - Seq Length : 0/0/0
     - Quality :   0/0/0
     - Median Seq Length : 0, Quality : 0
     - Accum. Error -nan
Trimmed due to:
  > 25 avg qual in 20 bp windows :         0
  > (0.75) acc. errors, trimmed seqs :     0
Rejected due to:
  < min Seq length (250)  :                0
  < avg Quality (25)  :                    0
  < window (50 nt) avg. Quality (25)  :    0
  > max Seq length (1000)  :               0
  > (8) homo-nt run  :                     0
  > (0) amb. Bases  :                      0
  > (1.5) binomial est. errors :           0
Specific sequence searches:
  -With fwd Primer remaining (<= 0 mismatches) :           0
  -Barcode unidentified (max 0 errors) :                   9,585,081

SampleID Barcode Instances
0231.172.1.1 GCACTACCGAAT 0
0675.493.1.1 AACCATCGGGTG 0
0677.491.1.1 ATTATACCTCGG 0
0590.427.1.1 ACCCGTATGATG 0
0713.519.2.1 CTGGAGCATGAC 0
0725.527.1.2 CAAGTGAGAGAG 0
0725.527.2.2 ACATGTCACGTG 0
0725.527.2.3 ACTTCAACTGTG 0
0725.527.2.4 CAGTGATCCTAG 0
0723.529.1.1 ATTTGGGTCATC 0
0723.529.1.2 TCCTACTCCGGT 0
0723.529.1.3 AGGTAGTCCTCA 0
0396.267.2.10 CGATCGAGTGTT 0
0262.198.2.9 TATTGCTCCTCC 0
0303.208.2.9 GTCTTCGTCGCT 0
0662.479.1.2 TACTAGGGCTTG 0
0055.051.1.5 AGCTCTCAGAGG 0
0135.111.1.3 ACTGCATCGAGG 0
0158.129.1.4 TGAATAGTCCGC 0
0421.295.1.2 CAATTGTGCACG 0
0421.295.1.3 GAACTTAGGCCG 0
0422.293.1.2 GATGTGAGCGCT 0
0880.641.1.2 GTGCCATAACCA 0
0847.616.1.2 ATCAGAACCTCG 0
0825.610.1.2 TCCACAGGAGTT 0
0760.569.1.2 CTACGACCATTA 0
0755.549.1.2 TCGACGGTGCAA 0
0730.536.1.2 ACGTCTGTAGCA 0
0671.486.1.2 TAGTCAGGCCAT 0
1031.761.1.2 TTGTATGTGCGT 0
0869.631.1.4 ACAGACCACTCA 0
blank8 ACGAGTGCTATC 0
0444.312.2.4 CACTGGTATATC 0
0691.499.1.2 ACACATGTCTAC 0
Evaluating and writing dereplicated reads..


Dereplication:
Accepted 0 unique sequences ( 2 ); average size in this set is -nan.
Uniques with insufficient abundance: 0 not passing derep conditions


I am using LotuS version 1.506, and the script works great for other data sets. I tried some customized sdm files, but it doesn't work even with the default sdm file. Here is the command I issued:

./lotus.pl -i $DIR/Undetermined_S0_L001_R1_001.fastq \
-barcode $DIR/Undetermined_S0_L001_I1_001.fastq \
-m $DIR/april_mapping_rc.txt \
-o ./test \
-p miSeq

Any thoughts? 

Scott


Falk Hildebrand

May 4, 2016, 12:11:03 PM
to LotuS rRNA pipeline
Hey Scott,
The problem is that I never programmed sdm to take only one fastq file in MiSeq mode. I'll try to implement something over the next few days, but I can't promise anything regarding timing, so it is probably best to go ahead with the Python scripts and give LotuS a list of demultiplexed files in the mapping file (see also the automap script).
best,
Falk
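
For readers without QIIME at hand, the demultiplexing step of this workaround can be sketched as follows. It is a rough illustration, not sdm's or QIIME's implementation: `read_fastq` and `demultiplex` are hypothetical helpers, barcodes are matched exactly, and both files are assumed to be uncompressed 4-line-per-record fastq with reads in the same order. The per-sample files it writes would then be listed in the mapping file as suggested above; the exact column layout should be taken from the LotuS documentation.

```python
import os


def read_fastq(path):
    """Yield (header, sequence, quality) for each 4-line fastq record."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file (or a trailing blank line)
                return
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, not needed
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def demultiplex(reads_path, index_path, barcode_to_sample, out_dir):
    """Write one fastq per sample, assigning reads by exact barcode match.

    Returns ({sample_id: read_count}, number_of_unassigned_reads).
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    counts = {}
    unassigned = 0
    try:
        pairs = zip(read_fastq(reads_path), read_fastq(index_path))
        for (r_head, seq, qual), (i_head, barcode, _) in pairs:
            # The read ID (text before the first space) must agree in both files.
            if r_head.split()[0] != i_head.split()[0]:
                raise ValueError(f"read/index out of sync: {r_head} vs {i_head}")
            sample = barcode_to_sample.get(barcode)
            if sample is None:
                unassigned += 1
                continue
            if sample not in handles:
                out_path = os.path.join(out_dir, sample + ".fastq")
                handles[sample] = open(out_path, "w")
            handles[sample].write(f"{r_head}\n{seq}\n+\n{qual}\n")
            counts[sample] = counts.get(sample, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts, unassigned
```

With the files from this thread, `barcode_to_sample` would be built from the SampleID and BarcodeSequence columns of the mapping file, for example `{"AGCTCTCAGAGG": "0055.051.1.5"}`.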

Scott Handley

May 5, 2016, 10:48:58 AM
to LotuS rRNA pipeline
Hi Falk,

As always, thanks for the awesome reply and the willingness to adapt. I was wondering whether it was an issue with single-end reads, given that LotuS integrates FLASH merging, etc. This will hopefully be our only run in single-end mode; it was a collaborator's decision made before we got involved. But I am certain others are doing this as well, so there will likely be other LotuS users with similar requests.

I will work with the automap script in the meantime. Thanks!

Scott
