Hi all,
Just started using STAR (awesome software!) and decided I should join the group so I can learn more about what the app can do. I've written a shell script that walks through our data files, and STAR handles them like a champ. (I'm happy to post the script if anyone is interested.)
Since those were our own data, the files were local. Now, however, I'd like to tap into the vast amount of publicly available data, particularly what's available through NCBI's SRA portal. SRA files are compressed archives that can be pulled from their remote location and converted to FASTQ with the SRA Toolkit's fastq-dump, which can either save the FASTQ locally or stream it to stdout. Has anyone been able to (or is it even possible to) pass an SRA accession as the --readFilesIn argument and use fastq-dump as the --readFilesCommand? I've tried formatting the STAR call several ways, to no avail. Below is one example:
STAR \
--genomeDir ../RNAseq_analysis/STAR_analysis/STAR_ref_genome \
--sjdbGTFfile /Volumes/Wade/RNAseq_analysis/Ref_genome/gencode.v24.annotation.gtf \
--runThreadN 8 \
--outFileNamePrefix ../RNAseq_analysis/STAR_analysis/STAR_aligned/SRR2968938_ \
--readFilesCommand fastq-dump -Z \
--outSAMtype BAM Unsorted \
--outReadsUnmapped Fastx \
--outSAMmode Full \
--quantMode TranscriptomeSAM GeneCounts
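For reference, the form I was aiming for, assuming (perhaps wrongly) that STAR appends the --readFilesIn value as the last argument to the --readFilesCommand, would look something like this (fastq-dump's -Z flag streams the FASTQ to stdout; the accession is just an example):

```shell
# Hypothetical, untested: STAR would internally run
# "fastq-dump -Z SRR2968938" and read the streamed FASTQ.
STAR \
--genomeDir ../RNAseq_analysis/STAR_analysis/STAR_ref_genome \
--readFilesIn SRR2968938 \
--readFilesCommand fastq-dump -Z \
--runThreadN 8 \
--outSAMtype BAM Unsorted
```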
I know I could use fastq-dump to download the FASTQ locally and then run STAR on the local file, but I'd like to set this up to process many files on our cluster, and there's no reason to keep a local copy of the original (very large) data files.
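In case it helps, here is a rough sketch of that fallback, assuming fastq-dump and STAR are on the PATH; the accessions and paths are placeholders, and each FASTQ is deleted right after alignment so nothing large accumulates:

```shell
#!/bin/sh
# Untested sketch: download each accession, align it, then remove
# the intermediate FASTQ. Accessions and paths are placeholders.
set -e
for acc in SRR2968938 SRR2968939; do
    fastq-dump --gzip -O fastq_tmp "$acc"   # writes fastq_tmp/${acc}.fastq.gz
    STAR \
    --genomeDir ../RNAseq_analysis/STAR_analysis/STAR_ref_genome \
    --readFilesIn "fastq_tmp/${acc}.fastq.gz" \
    --readFilesCommand gunzip -c \
    --runThreadN 8 \
    --outFileNamePrefix "../RNAseq_analysis/STAR_analysis/STAR_aligned/${acc}_" \
    --outSAMtype BAM Unsorted
    rm "fastq_tmp/${acc}.fastq.gz"          # don't keep the local copy
done
```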
Any help would be appreciated. Thanks!
Darren