Fail to detect known fusion transcript with STAR chimeric alignment


Ming Su

Mar 25, 2015, 5:08:45 PM
to rna-...@googlegroups.com
Hi, all,

I'm trying STAR-2.4.0j for fusion transcript detection. However, STAR consistently misses some known chimeric junctions. Here is a fusion transcript for which STAR reports no reads.

EML4 exon 5                                                                                                             EML4 exon 6                                        ALK exon 19
GTCGAAAATACCTTCAACACCCAAATTAATACCAAAAGTTACCAAAACTGCAGACAAGCATAAAGATGTCATCATCAACCAAGTGTCACCCACCCCGGAGCCACACCTGCCACTCTCGCTGATCCTCTCTGTGGTGACCT

And here is a read that bowtie aligns to the complementary strand of this fusion transcript when given the manually curated fusion sequence as a reference, yet STAR reports it as unmapped.

@V946R:00252:00995
TCACCACAGAGAGGATCAGCGAGAGTGGCAGGTGTGGCTCCGGGGTGGGTGACACTTGGTTGATGATGACATCTTTATGCTTGTCTGCAGTTTTGGTAACTTTTGGTATTAATTTGGGTGTTGA
+
<>>?ADCFDDCCCACCCCCC@@@CCCCACCCADCCCACCCACCC8CCC>CCCDCCCBDACACCCC???CCCCCCC>CCCCCACCCCCAA::::.:=::;?EFE:C@C::5:4::/::/:CCA>>
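
A quick way to confirm that a read lies on the complementary strand of a curated fusion sequence is to reverse-complement it and search for an exact match. The following is only a minimal sketch: the function names and the toy sequences are hypothetical and are not part of STAR or bowtie, and real reads with mismatches would need an aligner rather than an exact substring search.

```python
# Complement table for unambiguous DNA bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_either_strand(read: str, reference: str):
    """Return (strand, 0-based offset) of an exact match, or None.

    The forward strand is tried first, then the reverse complement.
    """
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        offset = reference.find(query)
        if offset != -1:
            return strand, offset
    return None


# Toy data for illustration only (not the sequences from this thread).
toy_reference = "GTCGAAAATACCTTCAACACC"
toy_read = reverse_complement("AATACCTTCAAC")
print(find_on_either_strand(toy_read, toy_reference))
```

To apply this to the example above, pass the read sequence and the fusion sequence in place of the toy strings.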

The parameters I used were as follows.

--outFilterType BySJout 
--outFilterMismatchNmax 999 
--outFilterMismatchNoverLmax 0.06 
--outFilterMultimapNmax 20 
--outFilterMatchNminOverLread 0.66 
--outFilterIntronMotifs None 
--outSJfilterReads All 
--outSAMstrandField intronMotif 
--alignSJoverhangMin 8 
--alignSJDBoverhangMin 1 
--alignIntronMin 20 
--alignIntronMax 1000000 
--chimSegmentMin 15 
--chimJunctionOverhangMin 15 
--chimScoreMin 0 
--chimScoreDropMax 20 
--chimScoreSeparation 10 
--chimScoreJunctionNonGTAG -1

I suspect the false negative is caused by the micro-exon (EML4 exon 6). In fact, STAR consistently lost more than half of the reads for other fusions involving EML4 exon 6. My question is how to set the STAR parameters so that the fusion transcript above is detected correctly.

Best,
Ming Su

Alexander Dobin

Apr 4, 2015, 11:45:40 PM
to rna-...@googlegroups.com
Hi Ming,

I figured out which filter was preventing the chimeric output for this read. If I map your read with --alignSplicedMateMapLminOverLmate 0.2, it outputs the correct chimera:

1       0       chr2    42490390        3       57M1399N26M57S  *       0       0       ...
1       272     chr2    29448375        3       57M83S  *       0       0       

By default this parameter is 0.66, which prevents spliced alignments with total length of exons < 0.66 of read length. However, this should not apply to chimeric alignments. I will change the behavior of this parameter in the future, but for now please use --alignSplicedMateMapLminOverLmate 0.2 (or smaller) for chimeric detection runs.
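
As a rough illustration of the arithmetic behind this filter, the sketch below computes the fraction of read bases covered by aligned exon segments in the CIGAR string shown above and compares it with the two thresholds. It assumes that the filter compares the summed `M` lengths with the read length; STAR's internal computation may differ in detail, and the helper names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mapped_fraction(cigar: str) -> float:
    """Fraction of read bases that are aligned (M, = or X) in a CIGAR."""
    mapped = 0
    read_len = 0
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned bases
            mapped += n
        if op in "M=XIS":    # operations that consume read bases
            read_len += n
    if read_len == 0:
        raise ValueError(f"CIGAR consumes no read bases: {cigar!r}")
    return mapped / read_len


def passes_filter(cigar: str, min_fraction: float) -> bool:
    """True if the aligned fraction reaches the given threshold."""
    return mapped_fraction(cigar) >= min_fraction


cigar = "57M1399N26M57S"  # spliced segment from the alignment above
print(round(mapped_fraction(cigar), 3))
print(passes_filter(cigar, 0.66), passes_filter(cigar, 0.2))
```

Under this assumption the spliced segment covers 83 of 140 read bases, which falls below 0.66 but above 0.2.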

Cheers
Alex

Ming Su

Apr 7, 2015, 3:49:45 AM
to rna-...@googlegroups.com
Hi, Alex,

Thanks for your work! Now I can detect this chimera by setting --alignSplicedMateMapLminOverLmate to 0.2, and the number of supporting reads reported by STAR is similar to that obtained by mapping to the assembled reference. Meanwhile, however, there are far fewer supporting reads for some chimeric junctions that were previously detected correctly. This is strange, because we are relaxing the filter, yet the number of output reads falls. It will take me some time to figure out which reads are now missed. Perhaps you have some tips on why this happens?

Best,
Ming 

On Sunday, April 5, 2015 at 11:45:40 AM UTC+8, Alexander Dobin wrote:

Alexander Dobin

Apr 7, 2015, 2:35:17 PM
to rna-...@googlegroups.com
Hi Ming,

The only explanation I can think of, without looking at the reads that dropped out, is that relaxing this parameter allowed more chimeras to become multi-mappers.
You can try to compensate for that by reducing --chimScoreSeparation (=10 by default). 
It would be great if you could send me the dropped reads.

Cheers
Alex

Ming Su

Apr 15, 2015, 9:28:04 PM
to rna-...@googlegroups.com
Hi, Alex,

I tried reducing --chimScoreSeparation to 1, but the results were the same as those with --alignSplicedMateMapLminOverLmate set to 0.2. I have sent you some dropped reads as well as some successfully called reads. I hope they will be helpful.

Best,
Ming Su

On Wednesday, April 8, 2015 at 2:35:17 AM UTC+8, Alexander Dobin wrote:

Alexander Dobin

Apr 17, 2015, 5:05:00 PM
to rna-...@googlegroups.com
Hi Ming,

I have run STAR (2.4.1a and 2.4.0j) with --alignSplicedMateMapLminOverLmate 0.2 --chimSegmentMin 15 --chimJunctionOverhangMin 15,
and all of your _dropped reads seem to map to the correct chimeric junctions:

chr2    29446395        +       chr10   32311067        +       2       0       2       E1Z30:00144:02571       29446345        50M43S  32311068     50S43M
chr2    29446395        +       chr10   32311067        +       2       0       2       E1Z30:00209:01956       29446353        42M54S  32311068     42S54M
chr2    29446395        +       chr10   32311067        +       2       0       2       E1Z30:00348:00704       29446369        26M43S  32311068     26S43M
chr2    29446395        +       chr10   32311067        +       2       0       2       E1Z30:00046:01511_dropped       29446353        42M43S       32311068        42S43M
chr2    29446395        +       chr14   104139769       -       2       0       1       E1Z30:01135:02331       29446353        42M41S  104139728    41M42S
chr2    29446395        +       chr14   104139769       -       2       0       1       E1Z30:00159:02374       29446345        50M41S  104139728    41M50S
chr2    29446395        +       chr14   104139769       -       2       0       1       E1Z30:01064:01940       29446345        50M53S  104139716    53M50S
chr2    29446395        +       chr14   104139769       -       2       0       1       E1Z30:00022:00988_dropped       29446345        50M53S       104139716       53M50S
chr2    29446395        +       chr14   104139769       -       2       0       1       E1Z30:00037:01927_dropped       29446345        50M41S       104139728       41M50S
chr4    25665953        +       chr6    117650610       -       1       2       0       E1Z30:00031:00293       25665918        35M49S  117650561    49M35S
chr4    25665953        +       chr6    117650610       -       1       2       0       E1Z30:00017:00590       25665918        35M66S  117650544    66M35S
chr4    25665953        +       chr6    117650610       -       1       2       0       E1Z30:00021:00203_dropped       25665918        35M49S       117650561       49M35S
chr4    25665953        +       chr6    117650610       -       1       2       0       E1Z30:00166:02450_dropped       25665918        35M66S       117650558       14S52M35S

If you see a different behavior in your runs, please send me the Log.out file and the Chimeric.out.* files for these reads.
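
To compare supporting-read counts between runs, junction lines like the ones above can be grouped by their first six columns (donor chromosome, breakpoint and strand, then the same for the acceptor). This is a minimal sketch with a hypothetical function name; it assumes whitespace-separated columns in the layout shown above and skips any comment or header lines.

```python
from collections import Counter


def count_junction_support(lines):
    """Count reads per chimeric junction, keyed by the first six columns."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank, comment, truncated, or header lines.
        if len(fields) < 10 or fields[0].startswith("#"):
            continue
        if not (fields[1].isdigit() and fields[4].isdigit()):
            continue
        key = (fields[0], int(fields[1]), fields[2],
               fields[3], int(fields[4]), fields[5])
        counts[key] += 1
    return counts


# A subset of the junction lines listed above.
example = """\
chr2 29446395 + chr10 32311067 + 2 0 2 E1Z30:00144:02571 29446345 50M43S 32311068 50S43M
chr2 29446395 + chr10 32311067 + 2 0 2 E1Z30:00046:01511_dropped 29446353 42M43S 32311068 42S43M
chr2 29446395 + chr14 104139769 - 2 0 1 E1Z30:01135:02331 29446353 42M41S 104139728 41M42S
chr2 29446395 + chr14 104139769 - 2 0 1 E1Z30:00022:00988_dropped 29446345 50M53S 104139716 53M50S
chr4 25665953 + chr6 117650610 - 1 2 0 E1Z30:00031:00293 25665918 35M49S 117650561 49M35S
""".splitlines()

for junction, n_reads in sorted(count_junction_support(example).items()):
    print(junction, n_reads)
```

Running the same function on the Chimeric.out.junction files from two parameter settings and diffing the counters would show which junctions lose reads.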

Cheers
Alex

Ming Su

Apr 20, 2015, 3:07:16 AM
to rna-...@googlegroups.com
Hi, Alex,

I re-ran my data with exactly the same parameters as yours, and all my dropped reads were correctly mapped. So I think some additional filters I set led to the dropped reads. After some tests, I found that the reads I listed are dropped when I set "--alignSJDBoverhangMin 1". Hence there seems to be a conflict between "--alignSJDBoverhangMin" and "--alignSplicedMateMapLminOverLmate".

Best,
Ming

On Saturday, April 18, 2015 at 5:05:00 AM UTC+8, Alexander Dobin wrote:

Alexander Dobin

Apr 21, 2015, 4:15:14 PM
to rna-...@googlegroups.com
Hi Ming,

With --alignSJDBoverhangMin 1, reads can be spliced across annotated junctions with an overhang as short as 1 nt. Because of this, some of the reads become multi-mappers.
For the "dropped" reads in your example, the main chimeric segment becomes a multi-mapper, and STAR will not output multi-mapping chimeras.
I only recommend using --alignSJDBoverhangMin 1 if you need the highest sensitivity to annotated junctions; at the same time you get an increased false-positive rate and multi-mapping rate.
The default value of --alignSJDBoverhangMin 3 offers a generally acceptable sensitivity/precision trade-off.

Cheers
Alex