--readFilesCommand not passing decompressed files appropriately

Sean Taylor

Jun 24, 2015, 3:24:10 PM
to rna-...@googlegroups.com
Hello, 

My lab is putting together a new Linux system running CentOS release 7.1.1503. We have installed STAR through Lab7 BioBuilds r2015.04 (biobuilds.org), which includes STAR 2.4.0j.

I was running through some test workflows to make sure everything runs smoothly and hit a snag with STAR. I want to feed in my input files in fastq.gz format, so I included --readFilesCommand zcat. Here is the command I used:

$  STAR --genomeDir $RNA_HOME/refs/hg19/star/22 --readFilesCommand zcat --readFilesIn H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_*.fastq.gz --runThreadN 8 --outFileNamePrefix Normal_cDNA1_lib2/ --outSAMstrandField intronMotif

The resulting SAM file was empty, and the log file indicated that no reads were read. I have tried --readFilesCommand gzip -c and --readFilesCommand gunzip -c with the same result. If I decompress the files first and then pass in the uncompressed fastq files, everything works as expected. I have confirmed that zcat and gzip work as expected on their own.
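One way to confirm that the compressed inputs themselves are fine is to count their reads outside STAR and compare the result with the "Number of input reads" line in Log.final.out. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name count_gz_fastq_reads is hypothetical:

```shell
#!/usr/bin/env bash
# Count the FASTQ records in a gzip-compressed file without involving STAR.
# Assumes every record is exactly four lines (header, sequence, '+', quality).
count_gz_fastq_reads() {
    local gz="$1" lines
    # gzip -dc decompresses to stdout, the same thing GNU zcat does
    lines=$(gzip -dc -- "$gz" | wc -l)
    echo $(( lines / 4 ))
}
```

Usage: count_gz_fastq_reads H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_1.fastq.gz. A non-zero count here alongside zero input reads in STAR's log points at how the stream reaches STAR rather than at the files themselves.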

Any advice or help would be appreciated.
Thanks,
Sean

Alexander Dobin

Jun 25, 2015, 12:38:24 PM
to rna-...@googlegroups.com, sea...@gmail.com
Hi Sean,

Please send me the Log.out file of the failed run.

Cheers
Alex

Sean Taylor

Jun 25, 2015, 1:32:51 PM
to rna-...@googlegroups.com, sea...@gmail.com
Here are the Log.out and Log.final.out files from one of my more recent attempts.

Thanks,
Sean
Log.out
Log.final.out

Alexander Dobin

Jun 25, 2015, 5:06:10 PM
to rna-...@googlegroups.com, sea...@gmail.com
Hi Sean,

The Log.out file does not contain anything suspicious, except that the input read stream appears to be empty.
Could you please send me the output of 'ls -lR' on the directory of the failed STAR run?

Also, could you try running this with process substitution, but without --readFilesCommand:
STAR --genomeDir $RNA_HOME/refs/hg19/star/22 --readFilesIn <(zcat H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_1.fastq.gz) --runThreadN 8 --outFileNamePrefix Normal_cDNA1_lib2/ --outSAMstrandField intronMotif
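Process substitution sidesteps STAR's own pipe handling: bash runs zcat itself and replaces <(zcat file) with a path (typically /dev/fd/63 on Linux) that the program opens like an ordinary file, so nothing has to be created in the output directory. The sketch below uses a hypothetical stand-in function, report_input, in place of STAR to show what the program receives; the file names and $GENOME_DIR are placeholders, and passing one substitution per mate for paired-end data is an assumption that was not tested in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a program like STAR that only opens file paths.
# Prints the path it received and how many 4-line FASTQ records it could read.
report_input() {
    local path="$1" lines
    lines=$(wc -l < "$path")
    echo "$path $(( lines / 4 ))"
}

# Example (bash only; file names are placeholders):
#   report_input <(zcat reads_1.fastq.gz)
#   prints something like: /dev/fd/63 <number of reads>
# The same idea applied to paired-end STAR input (untested assumption):
#   STAR --genomeDir "$GENOME_DIR" \
#        --readFilesIn <(zcat reads_1.fastq.gz) <(zcat reads_2.fastq.gz)
```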

Cheers
Alex

Sean Taylor

Jun 25, 2015, 6:26:46 PM
to rna-...@googlegroups.com, sea...@gmail.com
Here is ls -IR from the Normal_cDNA1_lib2/ directory (I think that's the one you wanted):
$ ls -IR
Aligned.out.sam  Log.final.out  Log.out  Log.progress.out  SJ.out.tab


Also, I ran with process substitution as you suggested, and that seems to have worked. Log files attached.

Thanks!
Sean
Log.out
Log.final.out

Malcolm Cook

Jun 27, 2015, 12:19:54 AM
to rna-...@googlegroups.com
I'm pretty sure the problem is that multiple pathnames passed to --readFilesIn must be separated by COMMAS. Your use of the * wildcard created a set of SPACE-delimited pathnames.
 

Sean Taylor

Jun 29, 2015, 12:52:59 PM
to rna-...@googlegroups.com
Hi Malcolm,

Thanks for the suggestion. From the STAR manual:
--readFilesIn default: Read1 Read2 string(s): paths to files that contain input read1 (and, if needed, read2)

This implies that space-delimited pathnames are what is expected, so I think that should have been fine. It doesn't hurt to verify, though, so I tried:

$ STAR --genomeDir $RNA_HOME/refs/hg19/star/22 --readFilesIn  ../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_1.fastq.gz, ../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outFileNamePrefix Normal_cDNA1_lib2_test/ --outSAMstrandField intronMotif

EXITING: because of fatal INPUT ERROR: number of input files for mate1: 2 is not equal to that for mate2: 1
Make sure that the number of files in --readFilesIn is the same for both mates

Jun 29 08:49:10 ...... FATAL ERROR, exiting

Also, when I pass in just a single file, I get the same result I described previously, so I don't think space delimiting is the issue. I also wanted to verify that the relative path wasn't the issue, so I tried passing in the full pathnames, but that produced the same result.

So far I have only had success when I pass in the uncompressed files (space-delimited) or use process substitution, as suggested in Alex's last post.

Thanks for your help!
Sean

Malcolm Cook

Jun 29, 2015, 12:58:05 PM
to Sean Taylor, rna-...@googlegroups.com

Hi. Try it one more time, this time without the space after the comma: just the comma between the file names.
Hope it works.

--
You received this message because you are subscribed to a topic in the Google Groups "rna-star" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rna-star/t0_dF_UwO1M/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rna-star+u...@googlegroups.com.
Visit this group at http://groups.google.com/group/rna-star.

Sean Taylor

Jun 29, 2015, 1:47:12 PM
to rna-...@googlegroups.com
Sure. 
$ STAR --genomeDir $RNA_HOME/refs/hg19/star/22 --readFilesIn  ../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_1.fastq.gz,../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outFileNamePrefix Normal_cDNA1_lib2_comma/ --outSAMstrandField intronMotif

This time no error was reported, but otherwise the result was the same, i.e., an empty SAM file and a log file reporting that no reads were read.

Mauricio Losilla

Jun 29, 2015, 2:41:43 PM
to rna-...@googlegroups.com

I am not sure, but I think this is the logic behind readFilesIn (pay attention to spaces and commas):

one single-end file:

--readFilesIn sample1SE 

one pair of paired-end files:

--readFilesIn sample1F sample1R

multiple single-end files (3, in this example):

--readFilesIn sample1SE,sample2SE,sample3SE

multiple paired-end files (3, in this example):

--readFilesIn sample1F,sample2F,sample3F sample1R,sample2R,sample3R
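If the file lists come from shell globs, the commas can be generated rather than typed. A minimal sketch with a hypothetical helper, join_by_comma; it relies on bash expanding globs in sorted order, so the mate-1 and mate-2 lists only line up if the file names differ solely in the _1/_2 suffix:

```shell
#!/usr/bin/env bash
# Join all arguments with commas, the separator --readFilesIn uses between
# files of the same mate (a space separates the mate-1 list from mate-2).
join_by_comma() {
    local IFS=,
    # "$*" joins the arguments with the first character of IFS
    printf '%s\n' "$*"
}

# Hypothetical usage with files named <sample>_1.fastq.gz / <sample>_2.fastq.gz:
#   r1=$(join_by_comma ./*_1.fastq.gz)
#   r2=$(join_by_comma ./*_2.fastq.gz)
#   STAR ... --readFilesCommand zcat --readFilesIn "$r1" "$r2"
```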



Regarding zcat, and this is just a long shot, I wonder if STAR is unable to find zcat on your system. Would it be worth specifying the full path to zcat?


Good luck
Mau

Sean Taylor

Jun 29, 2015, 3:14:27 PM
to rna-...@googlegroups.com
Hi Mau,

That logic matches my assumptions as well.

Regarding zcat, I did try passing in the full path to zcat, but that also was unsuccessful. Thanks for your suggestion.

Sean

Malcolm Cook

Jun 29, 2015, 4:26:19 PM
to Sean Taylor, rna-...@googlegroups.com
Hi,

Well, we fixed one problem at least.

Your call otherwise looks good to my eyes.

Good luck....

Alexander Dobin

Jun 30, 2015, 6:17:08 PM
to rna-...@googlegroups.com, malcol...@gmail.com, sea...@gmail.com
Hi Sean,

this problem looks a bit like the one reported here https://groups.google.com/d/msg/rna-star/WSuTgzb8pC8/d4cIvohd4csJ
though that was Ubuntu-specific. What is your default shell? If it's not bash, it could be causing the problems.

What are the contents of the _STARtmp directory inside the run directory?

Cheers
Alex



Sean Taylor

Jun 30, 2015, 7:26:15 PM
to rna-...@googlegroups.com, malcol...@gmail.com, sea...@gmail.com
The default shell is bash:
$ echo $SHELL
/bin/bash

I have run this several times with the various permutations suggested above, each in a different run folder so I could keep track of the log files. Interestingly, not every run folder has a _STARtmp subdirectory. For example, when you first asked for the ls output of one of my failed run directories, there was no _STARtmp directory present. Perhaps that is expected, depending on where it crashes? I don't know. At any rate, I was able to find it for some of the runs, so here are its contents. I have also attached the corresponding log file.

$ ls -lR
.:
total 2048
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo     0 Jun 29 09:07 Aligned.out.sam
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo  1645 Jun 29 09:05 Log.final.out
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo 14949 Jun 29 09:07 Log.out
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo     0 Jun 29 09:07 Log.progress.out
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo     0 Jun 29 09:05 SJ.out.tab
drwxr-xr-x 2 staylo rleQAS_SCRI-Sudo     0 Jun 29 09:07 _STARtmp

./_STARtmp:
total 55296
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo      150 Jun 29 09:07 readFilesIn.info
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo      175 Jun 29 09:07 readsCommand_read1
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo      175 Jun 29 09:07 readsCommand_read2
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo 42168422 Jun 29 09:07 tmp.fifo.read1
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo 10223616 Jun 29 09:07 tmp.fifo.read2
Log.out

Alexander Dobin

Jul 1, 2015, 5:05:46 PM
to rna-...@googlegroups.com, sea...@gmail.com, malcol...@gmail.com
Hi Sean,

The thing that looks suspicious here is the non-zero size of the tmp.fifo.read1/2 files. These fifo files are supposed to have 0 size.
Do they look like text fastq files? If so, it would mean that the "mkfifo" command the code uses to create these files failed, and this may be causing all the problems.
According to the attached Log.out file, you used a re-compiled STAR executable. Could you try the pre-compiled static STAR executable and again send me the Log.out file and the contents of _STARtmp?
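Whether those files are real FIFOs can be tested directly instead of judged by eye: [ -p ] is true only for a named pipe (shown as type p with size 0 in ls -l), while a regular file full of FASTQ text is consistent with mkfifo having failed, as suspected above. A sketch with a hypothetical fifo_status helper; note that _STARtmp may already be deleted once a run finishes, so it has to be checked while it still exists:

```shell
#!/usr/bin/env bash
# Report, for each path, whether it is a named pipe (FIFO), a regular file
# (with its size in bytes), or something else / missing.
fifo_status() {
    local f
    for f in "$@"; do
        if [ -p "$f" ]; then
            printf '%s: fifo\n' "$f"
        elif [ -f "$f" ]; then
            printf '%s: regular file (%s bytes)\n' "$f" "$(wc -c < "$f")"
        else
            printf '%s: missing or other type\n' "$f"
        fi
    done
}

# Hypothetical usage on a run directory from this thread:
#   fifo_status Normal_cDNA1_lib2/_STARtmp/tmp.fifo.read*
```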

Cheers
Alex

Sean Taylor

Jul 2, 2015, 3:08:29 PM
to rna-...@googlegroups.com, malcol...@gmail.com, sea...@gmail.com
Hi Alex,

Yes, these tmp.fifo.read files look just like fastq files in their contents. 

The version of STAR that I am using came as part of a tarball from BioBuilds release 2015-04 (http://biobuilds.org/downloads/). I was also curious whether there was some issue with that build, so I downloaded and installed STAR from GitHub:
 
and used what I think is the static version that you are referring to:

/data/Bioinformatics/SampleData/CBW/tools/STAR/bin/Linux_x86_64_static/STAR --genomeDir /data/Bioinformatics/SampleData/CBW/refs/hg19/star/22 --readFilesCommand zcat --readFilesIn ../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_1.fastq.gz ../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_2.fastq.gz --outFileNamePrefix Normal_cDNA1_lib2_static/

The SAM file is still not populated, and this is one of those cases where no _STARtmp directory was left in the output:
staylo@EWRLNXRD28:/data/Bioinformatics/SampleData/CBW/data/test/Normal_cDNA1_lib2_static$ ls -lR
.:
total 4096
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo   774 Jul  2 10:00 Aligned.out.sam
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo  1645 Jul  2 10:03 Log.final.out
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo 15367 Jul  2 10:00 Log.out
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo   246 Jul  2 10:00 Log.progress.out
-rwxr-xr-x 1 staylo rleQAS_SCRI-Sudo     0 Jul  2 10:03 SJ.out.tab

So it seems to me that the general possibilities were roughly:
- a syntax error (but I think we have eliminated that one)
- an error in the way BioBuilds compiled and distributed STAR (not likely, since I am having the same issues with the version I downloaded from GitHub)
- a bug in STAR (not likely if I am the only one with this problem)
- a system error, incompatibility, setting, or missing dependency on my end (what we are left with, and probable given that this is a new system we are building; I just don't know what exactly is missing)

Any other thoughts?

Thanks,
Sean
Log.out

Alexander Dobin

Jul 6, 2015, 7:06:11 PM
to rna-...@googlegroups.com, sea...@gmail.com, malcol...@gmail.com
Hi Sean,

I think it's some kind of system incompatibility, and I suspect it has something to do with the fifo files.
The STARstatic run did actually complete, but with 0 input reads, according to Log.out - and the _STARtmp was erased after the job finished.
Could you try to run these commands:
$ mkfifo ttt
$ zcat ../H_KH-540077-Normal-cDNA-1-lib2_ds_10pc_1.fastq.gz > ttt &
$ head ttt
This should show the first 10 lines of the fastq file.
Then check the size of the ttt file.
This will test whether the fifo mechanism works at all on your server.
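The same check can be scripted so it runs unattended and cleans up after itself, which is convenient for trying several candidate directories (for example the NFS run directory and a local one). A minimal sketch; fifo_works_in and the probe file name are hypothetical, and it writes a short test string instead of decompressing a FASTQ file:

```shell
#!/usr/bin/env bash
# Returns 0 if a FIFO can be created in DIR and data written into it can be
# read back, mirroring the manual mkfifo / zcat ... > ttt & / head test.
fifo_works_in() {
    local dir="$1" fifo first writer
    fifo="$dir/.fifo_test.$$"
    # Where FIFOs are unsupported this fails (e.g. "Operation not permitted")
    mkfifo -- "$fifo" 2>/dev/null || return 1
    # Background writer, like `zcat file.fastq.gz > ttt &`
    printf 'line1\nline2\n' > "$fifo" &
    writer=$!
    # Reader, like `head ttt`
    first=$(head -n 1 "$fifo")
    wait "$writer" 2>/dev/null
    rm -f -- "$fifo"
    [ "$first" = "line1" ]
}
```

Usage: fifo_works_in . && echo "FIFOs OK here" || echo "no FIFO support here".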

If nothing else works, I think you can use the process substitution solution; it should perform just as well.

Cheers
Alex

Sean Taylor

Jul 7, 2015, 3:21:40 PM
to rna-...@googlegroups.com, sea...@gmail.com, malcol...@gmail.com
OK, I think we are at the heart of the problem now. 

When I tried mkfifo, I got an error:
$ mkfifo ttt
mkfifo: cannot create fifo ‘ttt’: Operation not permitted

It finally occurred to me that I was working within an NFS mount. A little Google searching showed that FIFOs don't work on VFAT partitions, which I guess this is. At any rate, I copied my files off the NFS mount into a local directory and ran it again. This time everything worked as expected.

So I think that is the answer: --readFilesCommand likely won't work if you are writing to an NFS mount. If that is the case, use process substitution instead, or decompress the files as a preprocessing step.
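Rather than guessing what kind of filesystem a directory lives on, GNU coreutils stat can report it: with -f it describes the filesystem containing the path instead of the file itself. A minimal sketch for Linux (fs_type is a hypothetical name, and BSD/macOS stat uses different options):

```shell
#!/usr/bin/env bash
# Print the type of the filesystem containing a path, e.g. "nfs" for an NFS
# mount, "xfs" or "ext2/ext3" for common local Linux filesystems, or
# "msdos" for FAT/VFAT. Requires GNU coreutils stat.
fs_type() {
    stat -f -c %T -- "$1"
}

# Hypothetical usage: compare the run directory with a local scratch area
#   fs_type /data/Bioinformatics/SampleData/CBW/data/test
#   fs_type /tmp
```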

Thanks for your help and patience getting to the heart of this one.

Best,
Sean

Malcolm Cook

Jul 7, 2015, 7:46:13 PM
to Sean Taylor, rna-...@googlegroups.com
FWIW, I run STAR on NFS-mounted gzipped fastq files with no problem. I do make sure the temp directory is on a local file system, if that matters. The options end up looking like this:

    --readFilesCommand 'zcat' \
    --outTmpDir "$(mktemp -d -u)" \

~Malcolm



Alexander Dobin

Jul 8, 2015, 3:35:57 PM
to rna-...@googlegroups.com, malcol...@gmail.com, sea...@gmail.com
Hi Sean, Malcolm,

I think it's the VFAT partition that does not support the fifo files. Do you have a big enough local drive with Linux partitions on your server? If so, you can use --outTmpDir /path/to/temp/dir/ .
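Following that advice, a wrapper can pick a fresh --outTmpDir on a local disk and refuse to continue if that location cannot hold a FIFO. A minimal sketch: star_tmp_dir and the STARtmp.XXXXXX template are hypothetical, the default base of ${TMPDIR:-/tmp} is assumed to be local, and, as with the mktemp -d -u options quoted earlier in the thread, the directory is only named, not created, presumably leaving its creation to STAR:

```shell
#!/usr/bin/env bash
# Print a not-yet-existing directory path under BASE for use as --outTmpDir,
# after checking that a FIFO can be created in BASE.
star_tmp_dir() {
    local base="${1:-${TMPDIR:-/tmp}}" probe
    probe="$base/.star_fifo_probe.$$"
    if ! mkfifo -- "$probe" 2>/dev/null; then
        echo "star_tmp_dir: cannot create a FIFO in $base" >&2
        return 1
    fi
    rm -f -- "$probe"
    # -u (dry run) only generates a unique name; nothing is created here
    mktemp -d -u -p "$base" STARtmp.XXXXXX
}

# Hypothetical usage (/local/scratch and $GENOME_DIR are placeholders):
#   tmpdir=$(star_tmp_dir /local/scratch) || exit 1
#   STAR --genomeDir "$GENOME_DIR" --readFilesCommand zcat \
#        --readFilesIn reads_1.fastq.gz reads_2.fastq.gz --outTmpDir "$tmpdir"
```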

Cheers
Alex