--outReadsUnmapped Fastx files not created for all runs?

Carmen Sandoval

May 28, 2013, 11:23:01 PM

to rna-...@googlegroups.com

Hi all,

I have re-run my data with STAR, this time lowering the minimum-score and minimum-matched-bases thresholds (--outFilterScoreMinOverLread and --outFilterMatchNminOverLread) to 0.4, to increase the number of reads mapped to the fly genome. I'm still not getting more than ~60% of reads mapped, so, following Alex's suggestion in my previous question, I want to map all the initially unmapped reads, again with STAR, to the human genome.

However, I have a strange problem: of 8 different paired-end samples mapped with the --outReadsUnmapped Fastx option, only one produced final Unmapped.out.mate1/Unmapped.out.mate2 files containing the unmapped reads. For the other samples, the Unmapped files are created but empty; the unmapped reads instead remain in the _tmp folder, still split into per-thread files. I don't understand why this happens, since all samples are run with the exact same shell script and are in the same compressed format (fastq.gz).

I could merge the per-thread .fastq files together, but if I'm missing something that would prevent this from happening, please let me know :)
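A minimal sketch of that merge, assuming every per-thread file in the _tmp folder has "mate1" or "mate2" in its name and that the two mates' file names differ only in that token (check the actual names in your _tmp folder; the helper name merge_thread_mates and the output names are hypothetical). The important detail is that the mate1 and mate2 files must be concatenated in the same thread order, otherwise the pairs in the merged files fall out of sync:

```shell
# merge_thread_mates TMPDIR OUTPREFIX (hypothetical helper)
# Concatenates per-thread unmapped-read files into OUTPREFIX.mate1.fq and
# OUTPREFIX.mate2.fq. ASSUMPTION: every per-thread file has "mate1" or
# "mate2" in its name, and the two mates' names differ only in that token.
merge_thread_mates() {
  local tmpdir=$1 outprefix=$2 mate
  for mate in mate1 mate2; do
    # Glob results are sorted, so mate1 and mate2 files are read in the
    # same thread order and read pairs stay in sync.
    local files=( "$tmpdir"/*"$mate"* )
    if [ ! -e "${files[0]}" ]; then
      echo "merge_thread_mates: no $mate files in $tmpdir" >&2
      return 1
    fi
    cat "${files[@]}" > "${outprefix}.${mate}.fq"
  done
}
```

For example, merge_thread_mates sample1/_tmp sample1/Unmapped.merged would write sample1/Unmapped.merged.mate1.fq and sample1/Unmapped.merged.mate2.fq (paths hypothetical).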

Thanks!

Carmen

_______________

STAR command:

runSTARfly.sh

/path/to/STAR_2.3.0e/STAR --genomeDir /path/to/Genomes/Fly/ --readFilesCommand 'zcat -fc' --readFilesIn "$1" "$2" --runThreadN 32 --genomeLoad LoadAndRemove --outFilterMultimapNmax 100 --outFilterMultimapScoreRange 2 --outFilterScoreMin 0 --outFilterScoreMinOverLread 0.4 --outFilterMatchNmin 0 --outFilterMatchNminOverLread 0.4 --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.3 --sjdbScore 2 --outReadsUnmapped Fastx --outSAMstrandField None --outSAMmode Full --outSAMattributes Standard --outSAMunmapped None --outStd SAM | samtools view -b -o "${3}_STAR.bam" -S -
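To see which of the eight runs ended up with empty final files, a check like the following could be run over the run directories. This is a sketch assuming each sample was run from its own working directory, so that each directory holds its own Unmapped.out.mate1 and Unmapped.out.mate2; report_empty_unmapped is a hypothetical helper name. It relies on the shell test -s, which is true only for files that exist and are non-empty:

```shell
# report_empty_unmapped DIR... (hypothetical helper)
# For each run directory, reports whether Unmapped.out.mate1/mate2 exist and
# are non-empty. Returns 1 if any file is missing or empty, else 0.
report_empty_unmapped() {
  local dir mate status=0
  for dir in "$@"; do
    for mate in mate1 mate2; do
      if [ -s "$dir/Unmapped.out.$mate" ]; then
        echo "$dir: Unmapped.out.$mate OK"
      else
        echo "$dir: Unmapped.out.$mate missing or empty"
        status=1
      fi
    done
  done
  return $status
}
```

For example, report_empty_unmapped run1 run2 run3 prints one line per file and exits non-zero if any run is affected (directory names hypothetical).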

Alexander Dobin

May 29, 2013, 12:29:58 PM

to rna-...@googlegroups.com

Hi Carmen,

there were some problems with the Unmapped files in the earlier versions on some systems. Could you please try the latest version:

ftp://ftp2.cshl.edu/gingeraslab/tracks/STARrelease/Alpha/STAR_2.3.1n.tgz

and let me know if it still does not work.

When you say 60%, does this include unique and multi-mappers?

Have you tried to include chrUn from the fly genome? I think the first suspect should always be ribosomal "contamination".

Cheers

Alex

Carmen Sandoval

May 30, 2013, 6:17:24 PM

to rna-...@googlegroups.com

Hi Alex,

Yes, I am getting numbers a bit below 60% mapped, including unique and multi-mappers. My fly genome build does include ChrU and ChrUextra. From the overrepresented sequences in my FastQ analysis, I know that ~30% of my libraries are ribosomal sequences. :(

I will try with the newer version of STAR and let you know how it goes. :)

Carmen

Carmen Sandoval

Jun 6, 2013, 12:00:04 AM

to rna-...@googlegroups.com

Hi again Alex,

I have tried the latest version you suggested, but the final Unmapped.out.mate files still do not get any reads. The reads are still stuck in the per-thread unmapped-read files in the _tmp folder. Perhaps I am using too high a thread count?

Thanks,

Carmen

My command line:

/path.to/Software/STAR_2.3.1n/STAR --runThreadN 32 --genomeDir /path/to/Genomes/Fly/ --genomeLoad LoadAndRemove --readFilesIn /path/to/R1.fq.gz /path/to/R2.fq.gz --readFilesCommand "zcat -fc" --outStd SAM --outReadsUnmapped Fastx --outSAMmode Full --outSAMstrandField None --outSAMattributes Standard --outSAMunmapped None --outFilterType BySJout --outFilterMultimapNmax 100 --outFilterMultimapScoreRange 2 --outFilterScoreMin 0 --outFilterScoreMinOverLread 0.4 --outFilterMatchNmin 0 --outFilterMatchNminOverLread 0.4 --outFilterMismatchNoverLmax 0.3 --sjdbScore 2
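Before merging the per-thread files, it may help to confirm that they actually hold the expected reads. A sketch, assuming standard four-line FASTQ records; count_fastq_reads is a hypothetical helper name, and gzip -dcf is used because it decompresses .gz input and passes plain files through unchanged, so the same call works on .fq and .fq.gz:

```shell
# count_fastq_reads FILE... (hypothetical helper)
# Prints "FILE READS" for each plain or gzipped FASTQ, assuming 4 lines/record.
count_fastq_reads() {
  local f lines
  for f in "$@"; do
    # gzip -dcf decompresses gzipped input and copies plain files as-is.
    lines=$(gzip -dcf "$f" | wc -l)
    echo "$f $(( lines / 4 ))"
  done
}
```

Summed over all threads, the mate1 and mate2 counts should be equal; a mismatch would suggest the per-thread files are incomplete. The totals can also be compared against the mapping statistics in Log.final.out.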

Alexander Dobin

Jun 7, 2013, 9:56:50 AM

to rna-...@googlegroups.com

Hi Carmen,

Could you please send me the Log.out and Log.final.out from the

I will have to run your actual samples to check the problem with the data. Would you mind sharing them with me? Of course, they will be kept completely confidential.

Are you running this on the CSHL cluster or other CSHL machine?

While I am trying to figure this out, you could try to do the following to get the unmapped reads:

Use --outSAMunmapped Within to output unmapped reads into Aligned.out.sam.

Then select the unmapped reads from it and run Picard’s SamToFastq to get FASTQ files of the unmapped reads.
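A sketch of the selection step. Note that with --outStd SAM, as in the commands above, STAR writes alignments to standard output instead of Aligned.out.sam, so with --outSAMunmapped Within the unmapped records would end up in the piped BAM. The helper below is an awk-only stand-in for the SamToFastq step, not Picard itself: it keeps records with both the 0x4 (read unmapped) and 0x8 (mate unmapped) FLAG bits set, writes first mates (0x40) and second mates (0x80) to separate FASTQ files, and reverse-complements any record with 0x10 set, as SamToFastq does by default. The function name is hypothetical, and it assumes uncompressed SAM input:

```shell
# sam_unmapped_pairs_to_fastq SAM OUTPREFIX (hypothetical helper)
# Writes OUTPREFIX.mate1.fq and OUTPREFIX.mate2.fq from SAM records whose
# FLAG has both 0x4 (read unmapped) and 0x8 (mate unmapped) set.
sam_unmapped_pairs_to_fastq() {
  local sam=$1 out=$2
  awk -v m1="${out}.mate1.fq" -v m2="${out}.mate2.fq" '
    function rev(s,    i, r) {
      r = ""
      for (i = length(s); i > 0; i--) r = r substr(s, i, 1)
      return r
    }
    function revcomp(s,    i, p, c, r) {
      r = ""
      for (i = length(s); i > 0; i--) {
        c = substr(s, i, 1)
        p = index("ACGTNacgtn", c)
        r = r (p ? substr("TGCANtgcan", p, 1) : c)
      }
      return r
    }
    BEGIN { FS = "\t"; printf "" > m1; printf "" > m2 }  # create/truncate outputs
    /^@/ { next }                                         # skip SAM header lines
    {
      flag = $2
      if (!(int(flag / 4) % 2 && int(flag / 8) % 2)) next  # need 0x4 and 0x8
      if (int(flag / 64) % 2) dest = m1                    # 0x40: first mate
      else if (int(flag / 128) % 2) dest = m2              # 0x80: second mate
      else next                                            # not paired
      seq = $10; qual = $11
      if (int(flag / 16) % 2) {                            # 0x10: stored reverse-complemented
        seq = revcomp(seq); qual = rev(qual)
      }
      print "@" $1 "\n" seq "\n+\n" qual > dest
    }' "$sam"
}
```

If the reads are in the BAM produced by samtools view in the commands above, a plain SAM copy can be made first with samtools view sample_STAR.bam > sample.sam (file names hypothetical). Picard's SamToFastq, as suggested above, remains the more general option.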