OSX version 2.3.0 with --readFilesCommand option

Shawn Driscoll

Mar 15, 2013, 2:32:40 PM
to rna-...@googlegroups.com
It seems that if my fastq files are gzipped and I use the option --readFilesCommand gunzip, the run unpacks the fastq files as expected but then doesn't align anything. If I run STAR a second time on the unzipped files, without the --readFilesCommand option, everything works great. Is it possible that after unzipping, the code doesn't expect the file names to change (i.e. the .gz is dropped)?


Alexander Dobin

Mar 15, 2013, 2:41:43 PM
to rna-...@googlegroups.com
Hi Shawn,

please use --readFilesCommand zcat for gzipped files.
STAR spawns zcat process(es) on your gzipped files and expects the uncompressed output to go to stdout. By default, gunzip does not write to stdout; instead it replaces each file with a new (uncompressed) file, so it will not work.
Note that STAR does not unzip into a temporary file; it uses FIFOs for on-the-fly decompression.

Instead of zcat you can use any other command or script, provided that it can be run as
$ yourCommand inputFileFrom_readFilesIn > fastqFileForSTAR
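
To make this contract concrete, here is a rough shell sketch of on-the-fly decompression through a FIFO, assuming POSIX mkfifo and a toy one-read fastq file. It only imitates the idea described above; it is not STAR's actual implementation, and the directory and file names are made up.

```shell
#!/bin/sh
# Rough imitation of on-the-fly decompression through a FIFO.
# NOT STAR's source code; the directory and file names are made up.
set -e
tmp=$(mktemp -d)

# A toy one-read gzipped fastq file standing in for a --readFilesIn input.
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/reads.fastq.gz"

# Named pipe: the decompressed data flows through it and is never written to disk as a file.
mkfifo "$tmp/reads.fifo"

# Producer, following the contract:  yourCommand inputFile > fastqFileForSTAR
gunzip -c "$tmp/reads.fastq.gz" > "$tmp/reads.fifo" &

# Consumer: opens the FIFO like an ordinary fastq file (here it just counts 4-line records).
nreads=$(awk 'END { print NR / 4 }' "$tmp/reads.fifo")
wait

echo "fastq records read through the FIFO: $nreads"
rm -rf "$tmp"
```

The consumer never needs to know the input was compressed, which is why any command that writes plain fastq to stdout can be plugged in.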

Cheers
Alex

Shawn Driscoll

Mar 15, 2013, 2:45:22 PM
to rna-...@googlegroups.com
Ah, I see. I'm not familiar with zcat. I did try it, but then I got an error saying the read files didn't exist: it was looking for the read files with a .Z extension appended to their names.

So I guess alternatively I could make a script like this and use it instead of zcat?

#!/bin/bash
# Write the decompressed contents of the file named in $1 to stdout
gunzip -c "$1"
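
Along the same lines, a hypothetical wrapper (fq_cat is an invented name, not part of STAR) could accept both gzipped and plain fastq files by checking for the gzip magic bytes 1f 8b before deciding whether to decompress. This is a sketch assuming POSIX od and tr are available, as they are on both Linux and OSX.

```shell
#!/bin/sh
# Hypothetical --readFilesCommand wrapper (not part of STAR): writes the fastq
# content of the file named in $1 to stdout, decompressing only if it is gzipped.
fq_cat() {
  # Read the first two bytes as hex; gzip data starts with 1f 8b.
  magic=$(od -An -N2 -tx1 "$1" | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    gunzip -c "$1"
  else
    cat "$1"
  fi
}

# When saved as a script and invoked as  fq_cat.sh inputFile,
# handle the single file argument.
if [ "$#" -gt 0 ]; then
  fq_cat "$1"
fi
```

Saved as an executable script, it would be passed as --readFilesCommand /path/to/fq_cat.sh (path hypothetical). Sniffing the file content rather than the .gz extension means misnamed files still work.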

Alexander Dobin

Mar 15, 2013, 3:02:40 PM
to rna-...@googlegroups.com
Thanks Shawn, "gunzip -c" is a nice solution, and apparently more portable than zcat.
You can type it directly on the command line: --readFilesCommand gunzip -c --readFilesIn R1 R2 ...

This looks like another discrepancy between OSX and Linux: on Linux, zcat is equivalent to "gunzip -c", according to the man page.

Shawn Driscoll

Mar 15, 2013, 4:40:27 PM
to rna-...@googlegroups.com
Yeah. I can't understand why they don't both use the same base set of Unix commands. I mostly used Linux before OSX, so I'm occasionally thrown off. It's great that I can put the command plus its option right in the STAR command line.

By the way, I've only just started using STAR on OSX, since I just noticed you've got it compiled for it. I've aligned only 2 samples of 100x2 paired-end reads so far, but I have to say I really like it. It's super fast compared to TopHat, and I'm getting nearly 25% more data aligned.

I was wondering: do you know whether STAR's mapping qualities are appropriate for SNP callers like samtools/bcftools? Someone once told me that those tools rely heavily on the MAPQ field to compute probabilities based on alignment confidence. From what I've heard, the MAPQ values that BWA produces are the most compatible, because I think they actually incorporate penalties for gaps and mismatches. Unfortunately, with BWA I would have to come up with some kind of converter from transcriptome alignments to genome alignments; otherwise, in mouse with 100x2 reads, I'd be excluding nearly 30% of the CDS exons in the genome, which is just silly. So I'd much rather use something like STAR.

Alexander Dobin

Mar 15, 2013, 6:49:57 PM
to rna-...@googlegroups.com
Unfortunately, STAR's mapping qualities are not very meaningful at the moment and will likely throw off most SNP callers.
The quality scores follow TopHat's definitions (in one of its versions):
MAPQ = 255 for uniquely mapping reads
MAPQ = int(-10*log10(1-1/Nmap)) for multi-mapping reads, where Nmap is the number of loci the read maps to.
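
To see what the multi-mapper formula gives in practice, here is a small shell/awk sketch (the function name mapq is hypothetical). It assumes int() truncates toward zero, as awk's int() does, and since awk has no log10() it uses log(x)/log(10). The formula is undefined at Nmap = 1 (log10 of 0), so unique reads are special-cased with 255.

```shell
#!/bin/sh
# mapq NMAP: print MAPQ under the TopHat-style definition quoted above.
mapq() {
  awk -v n="$1" 'BEGIN {
    if (n == 1) { print 255; exit }   # uniquely mapping read
    # int(-10*log10(1-1/Nmap)); awk int() truncates toward zero
    print int(-10 * log(1 - 1/n) / log(10))
  }'
}

for n in 1 2 3 4 5 10; do
  printf '%s\t%s\n' "$n" "$(mapq "$n")"
done
```

Under this definition a read mapping to 2 loci gets MAPQ 3, one mapping to 3 or 4 loci gets 1, and anything mapping to 5 or more loci gets 0, so the scores carry no information about mismatches or gaps.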

I plan to work out a good estimate of the mapping quality, but that will happen no earlier than July-August.