SAM flags not working for TranscriptomeSAM quantification mode

186 views
Skip to first unread message

Claudia Scheckel

unread,
Jun 17, 2016, 11:47:10 AM6/17/16
to rna-star
Dear Alex,

I am currently using STAR (020201 version) to align ribosome profiling data (single read, stranded).
To assess also rRNA contamination I allow up to 50 alignments but would like to report only one alignment per read (randomly chosen from the top scoring alignment), so I use the following command:

/Users/claudiascheckel/src/STAR/bin/MacOSX_x86_64/STAR --alignSJoverhangMin 8 --runThreadN 4 --outFilterMultimapNmax 50 --quantMode GeneCounts TranscriptomeSAM --outMultimapperOrder Random --outSAMmultNmax 1 --outSAMtype BAM SortedByCoordinate --genomeDir /Users/claudiascheckel/src/Index_STAR/mm10_ENSEMBL --readFilesIn CAD5wt_collapsed_rm3l5l.fasta --outFileNamePrefix CAD5wt_STAR_2/


Adding the --outMultimapperOrder Random --outSAMmultNmax 1 flags reduced the number of output reads for the Genome file as it should, but not for the Transcriptome BAM file: both Transcriptome BAM files (+/- --outMultimapperOrder Random --outSAMmultNmax 1) look identical. I was under the impression that STAR first aligns against the Genome and then uses those reads to output the Transcriptome BAM? Is there another flag I can add that will only report one alignment per read for the Transcriptome BAM? I guess alternatively I could transform the Transcriptome BAM to SAM and filter using SAM tools?

And I would have another unrelated question: I'm currently running the alignments on an iMAC with 32GB memory (which I think it pushing it?) and it works for files with ~30Mio reads but not files with ~70Mio reads. I have tried to align without sorting the BAM file and just outputting a SAM file - but the job is still getting killed. It works if I parse the file into 2 files and align those separately. But are there any flags I could add that would allow me to align larger files? 

Sorry in case any of these questions don't make sense - I'm still sort of new to the field and couldn't find the answers online. 

Thank you so much for your help! 

Best, 

Claudia

Alexander Dobin

unread,
Jun 17, 2016, 1:08:05 PM6/17/16
to rna-star
Hi Claudia,

--outMultimapperOrder Random --outSAMmultNmax 1 will select randomly one genomic alignment for each read.
However, if this alignment is concordant with several transcripts, all of those will be output.
You can select one primary alignment per read using samtools:
samtools view -b -F 0x100 Aligned.toTranscriptome.out.bam > OneAlignPerRead.bam
Note that these are not randomly selected.

To reduce RAM usage you can decrease --limitIObufferSize to 100000000 (or even smaller, until it gives an error).
Another parameter to reduce is --limitOutSJcollapsed, to, say, 500000

Cheers
Alex

Claudia Scheckel

unread,
Aug 15, 2016, 11:09:54 AM8/15/16
to rna-star
Hi Alex!

I am so sorry for the late reply - I completely missed your response!

Unfortunately I still haven't solved my problems. Reducing --limitIObufferSize and --limitOutSJcollapsed also didn't work. But I think I will just move all of my analysis to a big server.

Regarding the transcriptomeSAM file: if I allow multiple alignments and then apply --outMultimapperOrder Random --outSAMmultNmax 1, I get a reduced number of genomic alignments, corresponding to one alignment per read, as expected. But: the number of transcriptome alignments remains the same, independently of me reporting all or just one genomic alignment per read, so adding --outMultimapperOrder Random --outSAMmultNmax 1 does not have an affect on the transcriptomeSAM file. 
Is there a way that allows me to directly generate a transcriptomeSAM, only using one alignment per read (the primary one, randomly selected), which is then assigned to every applicable transcript?
If I were to select the primary alignment from the SAM file I assume the first one in terms of coordinates would be reported or is there a way to randomly select the primary alignment from the transcriptomeSAM file? 

Thank you so much!

Claudia

Alexander Dobin

unread,
Aug 16, 2016, 4:38:50 PM8/16/16
to rna-star
Hi Claudia,

--outMultimapperOrder Random --outSAMmultNmax 1 will randomly select one genomic locus for multi-mappers, check this alignment against the transcriptome, and will output alignments to *all* compatible transcripts.
So a unique mapper in the genomic coordinates may become a multimapper in the transcriptomic coordinates.

If you use --outMultimapperOrder Random --outSAMmultNmax 1, do you see some alignments disappear from Aligned.toTranscriptome.out.bam ?

Cheers
Alex

Alexander Dobin

unread,
Aug 16, 2016, 6:18:58 PM8/16/16
to rna-star
Hi Claudia,

I checked these options, and, as you observed, it looks like there is a bug, and the --outMultimapperOrder Random --outSAMmultNmax 1 do not work for the transcriptome output.
I will fix this problem tomorrow and make a new release.

Cheers
Alex

Claudia Scheckel

unread,
Aug 17, 2016, 3:42:00 AM8/17/16
to rna-star
Hi Alex, 

yes, adding --outMultimapperOrder Random --outSAMmultNmax 1 reduces the reads in Aligned.sortedByCoord.out.bam but not in the Aligned.toTranscriptome.out.bam. I think currently the only way to use one genomic alignment per read as input for the transcriptome alignment is to allow unique genomic alignments, so adding --outFilterMultimapNmax 1. 
I've also tried filtering the Aligned.toTranscriptome.out.bam but I have the impression that the quality scores now refer to the Transcriptome alignment and not the Genome alignment, so that doesn't work either. 

Thank you so much for your help and the new release!

Best, 

Claudia

Alexander Dobin

unread,
Aug 19, 2016, 3:15:59 PM8/19/16
to rna-star
Hi Claudia,

I have fixed the problem, please try this release:

Cheers
Alex

Claudia Scheckel

unread,
Aug 24, 2016, 5:40:27 AM8/24/16
to rna-star
Hi Alex, 

the problem is fixed in the new release! Thank you so much!!!

Best, 

Claudia
Reply all
Reply to author
Forward
0 new messages