Has anyone had experience running the pipeline described in Burke et al., "A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq"? It requires the demux command in the PhyloSift package. The scripts can be found on this GitHub repository. There is no documentation for this pipeline, so I'm flying by the seat of my pants, and I can't get the command to complete successfully on the published data from the paper.
I'm running this command:
./phylosift demux ~/Burke_etal/fastq/ --sample-map ~/Burke_etal/sample_map.txt --barcode-table ~/Burke_etal/barcode_table.txt --output ~/Burke_etal/demux_output/demux_output --debug --cluster --barcode-pos 10
A couple of hours later, I get five output files:
fastq.gz (empty)
random_read1.fa
random_clust1.fa
random_read2.fa (empty)
random_seq.fa (empty)
The problem is that the random_read2.fa and random_seq.fa files are empty (as is fastq.gz). I've attached the read1.fa and clust1.fa files.
I'm getting a recurring pair of warnings as the script runs:
Use of uninitialized value in substr at /home/andrew/Burke_etal/longas-master/scripts/PhyloSift/bin/../lib/Phylosift/Command/demux.pm line 566, <$FH> line 3329172.
substr outside of string at /home/andrew/Burke_etal/longas-master/scripts/PhyloSift/bin/../lib/Phylosift/Command/demux.pm line 566, <$FH> line 3329172.
The <$FH> line number increases by four with each warning as the script runs, so I assume it is stepping through each read in the FASTQ files (four lines per record). It is also worth noting that I had to comment out the "use String::Approx 'amatch';" line in demux.pm. If I don't comment out that line, the script errors out immediately. Could that be causing the issue?
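On the line counter: a FASTQ record is four lines, so a counter that advances by four per warning is consistent with one warning per read. To check that outside of PhyloSift, here is a minimal Python sketch. It is not part of the pipeline. The names locate_fastq_line and short_reads and the min_len threshold are made up for illustration. It assumes plain four-line FASTQ records, and it assumes the substr warnings come from reads that are too short for the barcode offset, which I have not confirmed against demux.pm.

```python
import gzip
from typing import Iterator, Tuple

FASTQ_LINES_PER_RECORD = 4  # header, sequence, '+' separator, quality


def locate_fastq_line(line_no: int) -> Tuple[int, str]:
    """Map a 1-based file line number to (1-based record number, field name)."""
    fields = ("header", "sequence", "separator", "quality")
    record = (line_no - 1) // FASTQ_LINES_PER_RECORD + 1
    return record, fields[(line_no - 1) % FASTQ_LINES_PER_RECORD]


def short_reads(path: str, min_len: int) -> Iterator[Tuple[int, str, int]]:
    """Yield (record number, header, read length) for reads shorter than min_len.

    min_len is a hypothetical threshold: pick whatever offset plus barcode
    length you believe the demux step needs. Assumes four lines per record.
    """
    opener = gzip.open if path.endswith(".gz") else open
    header = ""
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            position = i % FASTQ_LINES_PER_RECORD
            if position == 0:
                header = line.rstrip("\n")
            elif position == 1:
                seq = line.strip()
                if len(seq) < min_len:
                    yield i // FASTQ_LINES_PER_RECORD + 1, header, len(seq)
```

For the warning above, locate_fastq_line(3329172) returns (832293, 'quality'), i.e. the counter sits at the last line of read 832293, which fits a reader that consumes one whole record before calling substr. Running short_reads over each input file with a guessed min_len would show whether unusually short reads line up with the records that trigger the warnings.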
I've attached the files I'm using as sample_map and barcode_table.txt in case those are formatted incorrectly or something.
-Andrew (University of Colorado Boulder)