STAR --chimOutType WithinBAM or default?

javier zhang

May 25, 2017, 1:41:28 PM

to rna-star

Hi Alex,

I am having trouble analyzing RNA-seq fusions with the Integrate tool, which needs the BAM file output by STAR. However, when I ran the 2-pass STAR mapping, I left --chimOutType at its default, so the chimeric alignments were written to Chimeric.out.sam. What should I do? Simply merging the two files may not be correct.

Second, I would like to analyze virus-human fusions. Could I filter the two files (Aligned and Chimeric) separately to extract the reads that align to the human reference and to the virus reference? Is there any problem with that, or do I need to re-run STAR with --chimOutType WithinBAM to get a complete BAM file?

Thanks

Javier

Alexander Dobin

May 31, 2017, 5:34:27 PM

Hi Javier,

If you need the chimeric alignments in the BAM file, it's best to re-run STAR with --chimOutType WithinBAM. If you are doing the manual 2-pass process, you only need to re-run the 2nd pass, which will save about half of the run time.
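A minimal Python sketch of that 2nd-pass re-run, assuming the genome index has already been regenerated with the merged junctions. The paths, thread count and the --chimSegmentMin value of 12 are placeholders, not recommendations. Chimeric detection stays off unless --chimSegmentMin is positive, and WithinBAM writes into the main Aligned.*.bam, so BAM output is requested here.

```python
import shlex


def star_second_pass_cmd(genome_dir, reads, out_prefix, threads=8, chim_segment_min=12):
    """Build (but do not run) a STAR argv list that keeps chimeric alignments in the BAM.

    genome_dir: hypothetical path to the index rebuilt with the 1st-pass junctions.
    reads: one FASTQ path (single-end) or two (paired-end).
    chim_segment_min: placeholder value; any value > 0 switches chimeric detection on.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        # WithinBAM targets the main Aligned.*.bam, so ask for BAM output
        "--outSAMtype", "BAM", "Unsorted",
        "--chimSegmentMin", str(chim_segment_min),
        # chimeric alignments go into the BAM instead of Chimeric.out.sam
        "--chimOutType", "WithinBAM",
    ]


cmd = star_second_pass_cmd("genome_pass2", ["sample_R1.fq", "sample_R2.fq"], "sample_pass2_")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the argument list instead of a shell string keeps file names with spaces intact when the list is passed to subprocess.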

It may be possible to merge Chimeric.out.sam into the BAM file; however, this needs to be done carefully, since the normal BAM will contain non-chimeric alignments for some of the reads that have chimeric alignments in Chimeric.out.sam.
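Before attempting such a merge, one way to see the scale of the problem is to list the reads present in both files. A minimal sketch, assuming both are plain SAM text (a BAM can be converted with `samtools view -h`); these are the reads whose records would collide.

```python
def read_names(sam_lines):
    """Collect QNAMEs (column 1) from SAM text lines, skipping @ header lines."""
    names = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def overlapping_reads(aligned_lines, chimeric_lines):
    """Reads that have records both in the main alignment file and in Chimeric.out.sam."""
    return read_names(aligned_lines) & read_names(chimeric_lines)


# Usage with real files (file names as produced by STAR; Aligned converted to SAM first):
# with open("Aligned.out.sam") as a, open("Chimeric.out.sam") as c:
#     both = overlapping_reads(a, c)
#     print(len(both), "reads appear in both files")
```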

If you mapped to the combined virus/human genome, the human/virus fusions will be present as chimeric alignments in Chimeric.out.sam and, if you used --chimOutType WithinBAM, in the BAM file.
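With a combined genome, a candidate human/virus chimera is a read whose segments land on both a viral contig and a human one. A minimal sketch, assuming each chimeric segment is written as its own SAM record sharing the read name, and that you know which contig names you added for the virus (the name HPV16 used in the check is hypothetical).

```python
from collections import defaultdict


def virus_human_chimeras(sam_lines, viral_contigs):
    """Return read names whose records touch both a viral and a non-viral contig.

    viral_contigs: set of contig names belonging to the virus genome(s) (user-supplied).
    """
    contigs = defaultdict(set)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        qname, rname = fields[0], fields[2]
        if rname != "*":  # skip unmapped records
            contigs[qname].add(rname)
    hits = set()
    for qname, refs in contigs.items():
        viral = refs & viral_contigs
        # at least one viral contig and at least one other (human) contig
        if viral and refs - viral_contigs:
            hits.add(qname)
    return hits
```

The same function can be run over the main alignment file to pick out reads that map only to the virus, by checking for read names whose contig set is entirely viral.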

Cheers

Alex

Javier

Jun 1, 2017, 9:34:16 AM

Hi Alex

Thanks for your reply.

I have another question. I did manual 2-pass mapping of 86 tumor samples and 20 normal samples with the --outFilterType BySJout option. The SJ.out.tab files from all runs were merged with the GTF file to generate new genome indices. At the same time, I tested a few samples separately with --twopassMode Basic. I found that the uniquely mapped rate with manual 2-pass mapping (85% on average) was 10 percentage points lower than with --twopassMode Basic (95% on average).

I want to use HTSeq for the downstream analysis. Should I give up the manual 2-pass mapping? Or should I change some parameters (--outSJfilterOverhangMin, --outSJfilterCountUniqueMin, --outSJfilterCountTotalMin, --outSJfilterDistToOtherSJmin, --outSJfilterIntronMaxVsReadN) to create a high-confidence set of splice junctions in SJ.out.tab, or increase --alignSJoverhangMin to reduce the number of multimapping reads? Otherwise, a lot of reads (about 10%) are discarded when running HTSeq. All the parameters mentioned above were left at their defaults when I ran STAR.

With more samples, I may get even fewer uniquely mapped reads. How should I deal with this?
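The unique-mapping rates being compared come from each run's Log.final.out. A minimal sketch that reads the "Uniquely mapped reads %" line from two such logs and reports the difference in percentage points; it assumes the usual `name |<TAB>value` layout of that file.

```python
def parse_log_final(lines):
    """Parse STAR Log.final.out 'name |<TAB>value' lines into a dict of strings."""
    stats = {}
    for line in lines:
        if "|" not in line:  # section headers such as 'UNIQUE READS:' have no pipe
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Turn a value such as '89.88%' into the float 89.88."""
    return float(stats[key].rstrip("%"))


def unique_rate_drop(basic_lines, manual_lines):
    """Percentage-point drop in uniquely mapped reads from the Basic run to the manual run."""
    key = "Uniquely mapped reads %"
    return percent(parse_log_final(basic_lines), key) - percent(parse_log_final(manual_lines), key)


# Usage (hypothetical per-sample prefixes):
# with open("basic_Log.final.out") as b, open("manual_Log.final.out") as m:
#     print(unique_rate_drop(b, m))
```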

Thanks

Javier

On Thursday, June 1, 2017 at 5:34:27 AM UTC+8, Alexander Dobin wrote:

Alexander Dobin

Jun 2, 2017, 3:54:36 PM

Hi Javier,

the manual 2-pass scheme is mostly useful for looking at the differential splicing between the samples. If you are only interested in counting reads per gene, you can drop it in favor of the basic 2-pass scheme (or even 1-pass with annotations).

If you decide to go with the manual 2-pass scheme, I would first recommend that you manually check a few reads that were unique mappers with --twopassMode Basic but became multimappers with the manual 2-pass method.

You can post these examples for us to discuss. Most likely, you will see that in the latter case multiple alignments go to distinct "annotated" junctions discovered in the 1st pass. Most of these spliced alignments should have short overhangs.
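One way to find such reads, sketched in Python under the assumption that both runs are available as SAM text with the NH attribute (STAR's Standard attribute set includes NH): collect NH per read name and report reads with NH = 1 in the Basic run but NH > 1 in the manual run.

```python
def nh_per_read(sam_lines):
    """Map QNAME -> NH (number of loci); records without an NH tag are skipped."""
    nh = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("NH:i:"):
                nh[fields[0]] = int(tag[5:])
                break
    return nh


def unique_to_multi(basic_lines, manual_lines):
    """Reads unique (NH=1) with --twopassMode Basic but multimapping (NH>1) after manual 2-pass."""
    basic, manual = nh_per_read(basic_lines), nh_per_read(manual_lines)
    return sorted(q for q, n in basic.items() if n == 1 and manual.get(q, 0) > 1)
```

Printing all records of a few of the returned reads from the manual run shows which junctions the extra alignments use and how short their overhangs are.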

If this is indeed the case, it should be possible to reduce the number of junctions by filtering them after the 1st pass. For instance, you can filter by the number of samples the junction was detected in, or by the number of splices per junction in all samples.
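A minimal sketch of that cross-sample filter, assuming the standard nine-column SJ.out.tab layout (column 7 = uniquely mapping reads crossing the junction). A junction is kept when it appears in at least `min_samples` files and its unique-read total across samples reaches `min_unique_total`; leaving either threshold at its default disables it. The threshold values are illustrative only.

```python
from collections import defaultdict


def merge_junctions(samples, min_samples=1, min_unique_total=0):
    """samples: one iterable of SJ.out.tab lines per sample.

    Returns sorted (chr, intron_start, intron_end, strand) tuples that pass both thresholds.
    """
    n_samples = defaultdict(int)
    unique_total = defaultdict(int)
    for sj_lines in samples:
        seen = set()
        for line in sj_lines:
            f = line.rstrip("\n").split("\t")
            if len(f) < 9:
                continue
            key = (f[0], int(f[1]), int(f[2]), f[3])  # chr, start, end, strand code
            unique_total[key] += int(f[6])  # column 7: uniquely mapping reads
            seen.add(key)
        for key in seen:  # count each junction once per sample
            n_samples[key] += 1
    return sorted(k for k in n_samples
                  if n_samples[k] >= min_samples and unique_total[k] >= min_unique_total)


def to_sjdb_lines(junctions):
    """Four-column chr/start/end/strand lines; strand keeps SJ.out.tab's 0/1/2 code."""
    return [f"{c}\t{s}\t{e}\t{strand}" for c, s, e, strand in junctions]
```

Keeping the SJ.out.tab strand code unchanged is a choice made on the assumption that the output is fed to STAR the same way the original SJ.out.tab files were; convert to +/- if your pipeline expects that.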

The --outSJfilterOverhangMin, --outSJfilterCountUniqueMin, --outSJfilterCountTotalMin, --outSJfilterDistToOtherSJmin and --outSJfilterIntronMaxVsReadN filters can be applied to the SJ.out.tab files after the 1st pass with simple scripting. I would not recommend using them directly in the 1st pass, since you would need to re-run the 1st pass multiple times to figure out the correct settings.
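A rough scripted equivalent of the two count filters. It assumes the motif code in column 5 (0 non-canonical, 1/2 GT/AG, 3/4 GC/AG, 5/6 AT/AC) maps onto the four slots of the STAR parameters, that a junction passes if either the unique or the total count threshold is met, and that annotated junctions (column 6 = 1) are exempt. The default tuples are believed to match STAR's defaults; verify them against the manual for your version. The distance and intron-length filters are left out.

```python
COUNT_UNIQUE_MIN = (3, 1, 1, 1)  # non-canonical, GT/AG, GC/AG, AT/AC
COUNT_TOTAL_MIN = (3, 1, 1, 1)


def motif_category(motif):
    """SJ.out.tab column 5 (0..6) -> filter slot 0..3."""
    return (motif + 1) // 2


def passes_count_filter(fields, unique_min=COUNT_UNIQUE_MIN, total_min=COUNT_TOTAL_MIN):
    motif, annotated = int(fields[4]), int(fields[5])
    unique, multi = int(fields[6]), int(fields[7])
    if annotated == 1:  # assumed exempt, as in STAR
        return True
    cat = motif_category(motif)
    # passes if EITHER the unique or the total (unique + multi) threshold is met
    return unique >= unique_min[cat] or unique + multi >= total_min[cat]


def filter_sj(lines, **thresholds):
    """Keep SJ.out.tab lines that pass the count filters; thresholds override the defaults."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and passes_count_filter(fields, **thresholds):
            out.append(line)
    return out
```

Because the filter runs on the existing SJ.out.tab files, different threshold tuples can be tried without re-running the 1st pass.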

Cheers

Alex

Javier

Jun 7, 2017, 8:03:34 AM

Hi Alex

Thanks for your reply.

I asked some of my friends, who told me that 10% multimapping is not a problem and suggested that I use RSEM for the downstream analysis.

I agree that these spliced alignments probably have short overhangs. As I understand it, such short overhangs are not enough on their own to get the junction output into a sample's "own" SJ.out.tab.

Now I am unsure whether I need to care about the 10% multimappers. I am new to RNA-seq and have not seen many alignment results. If you think 10% multimappers is OK, I will ignore this problem.

If you think filtering SJ.out.tab is a better choice, I could write a script to filter SJ.out.tab as you suggested. But I don't know how to choose good parameters, and I think --outSJfilterOverhangMin cannot be applied in a script, because column 9 in SJ.out.tab is the maximum spliced alignment overhang. Maybe setting --alignSJoverhangMin to 6 or 8 is an easier way.

Thanks

Javier

On Saturday, June 3, 2017 at 3:54:36 AM UTC+8, Alexander Dobin wrote:

Alexander Dobin

Jun 7, 2017, 1:17:32 PM

Hi Javier,

Overall, 10% multimappers should not cause any serious trouble for htseq-count-like analyses (i.e. counting reads per gene).

It will very slightly reduce the power to detect DE genes. However, some genes might be affected more than others.

You could make a scatter plot of read counts per gene for the manual 2-pass vs. --twopassMode Basic runs to check for such outliers.
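A minimal sketch of that check, assuming both runs were counted with htseq-count (`gene<TAB>count`, with summary rows starting with `__`). Instead of drawing the plot, it flags genes whose log2 ratio of manual to Basic counts exceeds a cutoff; the cutoff of 1 and the pseudocount of 1 are arbitrary choices. The same count pairs can be handed to any plotting library to draw the scatter plot itself.

```python
import math


def read_counts(lines):
    """Parse htseq-count output, skipping blank lines and the '__' summary rows."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue
        gene, count = line.rstrip("\n").split("\t")
        if not gene.startswith("__"):
            counts[gene] = int(count)
    return counts


def outlier_genes(basic_lines, manual_lines, min_abs_log2=1.0, pseudocount=1.0):
    """Genes far from the diagonal: (gene, basic, manual, log2(manual/basic)) tuples."""
    basic, manual = read_counts(basic_lines), read_counts(manual_lines)
    out = []
    for gene in sorted(basic.keys() & manual.keys()):
        lfc = math.log2((manual[gene] + pseudocount) / (basic[gene] + pseudocount))
        if abs(lfc) >= min_abs_log2:
            out.append((gene, basic[gene], manual[gene], round(lfc, 3)))
    return out
```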

In general, for read-per-gene counting analyses I would recommend using --twopassMode Basic for each sample and not bothering with the manual 2-pass.

The --outSJfilterOverhangMin option defines the minimum allowed "maximum overhang", i.e. column 9 of SJ.out.tab, so you can simply filter SJ.out.tab on that column before the 2nd pass.
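For example, a column-9 filter that mirrors --outSJfilterOverhangMin per motif class. It assumes the same four-slot layout as the STAR parameter (non-canonical, GT/AG, GC/AG, AT/AC, derived from the column 5 motif code) and uses 30 12 12 12, believed to be STAR's default. Keeping annotated junctions (column 6 = 1) regardless of overhang is an assumption about how you want to treat them.

```python
OVERHANG_MIN = (30, 12, 12, 12)  # non-canonical, GT/AG, GC/AG, AT/AC


def motif_category(motif):
    """SJ.out.tab column 5 (0..6) -> filter slot 0..3."""
    return (motif + 1) // 2


def filter_by_overhang(lines, overhang_min=OVERHANG_MIN, keep_annotated=True):
    """Keep SJ.out.tab lines whose maximum overhang (column 9) meets the per-motif minimum."""
    kept = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 9:
            continue
        motif, annotated, max_overhang = int(f[4]), int(f[5]), int(f[8])
        if (keep_annotated and annotated == 1) or max_overhang >= overhang_min[motif_category(motif)]:
            kept.append(line)
    return kept
```

Raising the canonical-motif thresholds above 12 drops junctions supported only by short overhangs, which is the kind of junction suspected of creating the extra multimappers.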