Bear with me; I'm mostly a wet-bench biochemist. Note that I'm using a virtualized version of Ubuntu 14.04 as my platform.
I'm trying to use biopieces to clean up and interpret some RNA-seq data (mainly small RNAs) that I pulled from the NCBI SRA database. An example entry is SRR070293, which I got from http://sra.dnanexus.com/runs/SRR070293. All the results I'm using came from Illumina systems, model unspecified.
These downloads come as .sra files, but the SRA Toolkit provides a way to convert them into FASTQ. I used it to convert the example entry into SRR070293.fastq. An example record looks like this:
@SRR070293.126 HWI-EAS382_30FC7AAXX:7:1:1617:871 length=36
AACTGAGTGGCATAAATCTTTGATCGTATGCCGTCT
+SRR070293.126 HWI-EAS382_30FC7AAXX:7:1:1617:871 length=36
IIIIIIIIIIIIIIIIIIIII=II6IIII3I";I3I
I started stepping through the biopieces HowTo guide on cleaning up NGS data, only to find that the read_fastq biopiece rejects every FASTQ file I feed it, with the following output:
dan@dan-VirtualBox:/media/sf_SRA/FASTQ/S$ read_fastq -i SRA967633.fastq
/home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:537:in `block in options_check_files': File not readable: 'SRA967633.fastq' (ArgumentError)
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:535:in `each'
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:535:in `options_check_files'
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:493:in `block in options_check'
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:488:in `each'
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:488:in `options_check'
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:396:in `options_parse'
from /home/dan/biopieces/code_ruby/lib/maasha/biopieces.rb:75:in `options_parse'
from /home/dan/biopieces/bp_bin/read_fastq:44:in `<main>'
I looked through the Google group and noted that you've previously pointed to problems with the -e setting, so I've tried the settings '33', '64', and illumina1.8, as well as the default. As far as I can tell, the quality scores in the FASTQ data range from ! to I, which should match Phred+33, but every setting I've tried gives exactly the same error.
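In case it helps, this is roughly how the quality range can be checked independently of biopieces. It is only a minimal Python sketch: the function names quality_range and guess_offset are my own (hypothetical, not part of biopieces or the SRA Toolkit), and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import io


def quality_range(handle):
    """Return (lowest, highest) quality characters seen in a FASTQ stream.

    Assumes four-line records, so every 4th line is a quality string.
    """
    lo, hi = None, None
    for i, line in enumerate(handle):
        if i % 4 != 3:          # skip header, sequence and '+' lines
            continue
        qual = line.rstrip("\n")
        if not qual:
            continue
        cmin, cmax = min(qual), max(qual)
        lo = cmin if lo is None or cmin < lo else lo
        hi = cmax if hi is None or cmax > hi else hi
    return lo, hi


def guess_offset(lowest_char):
    """Return 33 if the data cannot be a +64 encoding, else None (ambiguous).

    The +64 encodings never go below ';' (ASCII 59, Solexa+64), so any
    character below that implies a +33 offset.
    """
    return 33 if ord(lowest_char) < 59 else None


if __name__ == "__main__":
    # The example record from SRR070293.fastq shown earlier in this post.
    record = (
        "@SRR070293.126 HWI-EAS382_30FC7AAXX:7:1:1617:871 length=36\n"
        "AACTGAGTGGCATAAATCTTTGATCGTATGCCGTCT\n"
        "+SRR070293.126 HWI-EAS382_30FC7AAXX:7:1:1617:871 length=36\n"
        'IIIIIIIIIIIIIIIIIIIII=II6IIII3I";I3I\n'
    )
    lo, hi = quality_range(io.StringIO(record))
    print(lo, hi, guess_offset(lo))
```

To run it on a whole file, pass an open file handle instead of the StringIO object.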
I also did some troubleshooting. The SRA Toolkit output adds some extra description to the @ and + lines, which I tried removing so that the input looked like the sequencer's native output:
@HWI-EAS382_30FC7AAXX:7:1:1617:871
AACTGAGTGGCATAAATCTTTGATCGTATGCCGTCT
+HWI-EAS382_30FC7AAXX:7:1:1617:871
IIIIIIIIIIIIIIIIIIIII=II6IIII3I";I3I
This returned the same error.
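The header cleanup described above amounts to something like the following. This is a rough Python sketch, not the exact commands I ran: the function name strip_sra_fields is hypothetical, and it assumes four-line records whose @ and + lines look like "<SRA id> <instrument id> length=N".

```python
def strip_sra_fields(lines):
    """Keep only the instrument read ID on the '@' and '+' lines.

    Lines are handled by position within each four-line record, because a
    quality string may itself begin with '@' or '+'.
    """
    cleaned = []
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (0, 2):                 # header line or '+' line
            fields = line[1:].split()
            if len(fields) >= 2:            # "<SRA id> <instrument id> ..."
                line = line[0] + fields[1]
        cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    record = [
        "@SRR070293.126 HWI-EAS382_30FC7AAXX:7:1:1617:871 length=36\n",
        "AACTGAGTGGCATAAATCTTTGATCGTATGCCGTCT\n",
        "+SRR070293.126 HWI-EAS382_30FC7AAXX:7:1:1617:871 length=36\n",
        'IIIIIIIIIIIIIIIIIIIII=II6IIII3I";I3I\n',
    ]
    print("\n".join(strip_sra_fields(record)))
```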
I also converted the FASTQ files directly into FASTA files using the FASTX toolkit and used that output as input for the read_fasta biopiece. This worked exactly as the manual said it would.
So, is this a PEBCAK error, is something wrong with the read_fastq biopiece, or is it something else?
I'll also add that when I run bp_test, the output shows:
Biopieces tested: 86 Tests run: 293 OK: 277 FAIL: 16 WARNING: 0 Time: 78 secs
The failures were in the following biopieces:
plot_distribution
uclust_seq
usearch_seq
blast_seq
blast_seq_pair
I don't think these failures are related to the read_fastq problem, but I'm not sure whether they matter. I can provide the full diagnostic output if needed.
Thanks,
Dan