splice-altering variant causes STAR to call incorrect junction


Lee-kai Wang

Jan 24, 2021, 1:28:10 PM
to rna-...@googlegroups.com
Hi Alex, thanks so much for this software and for all the attention you give to its continued maintenance.

Background (non-technical, skip if you like): I work in a lab that uses RNA-seq to look for novel splice junctions in undiagnosed disease cases. We often observe a genomic variant that displaces one side of a canonically annotated junction but leaves the other side intact. In this case, one would expect STAR's SJ.out.tab file to list the new junction with one novel coordinate and one known coordinate. Almost all the time (>90%), STAR does this correctly, but on rare occasions it outputs novel coordinates on both the left and right sides, each off by a few bases. This is problematic for us, because it makes it significantly harder to match the causal variant to the novel junction (it's much easier if we can expect one end to be a canonically annotated end).

Below and attached I have tried to isolate a minimal example for debugging purposes. Actually, I'm wondering if perhaps one of my assumptions is wrong, which I'll put up top here as a question: When using an sjdb, should I expect STAR to favor a novel junction that shares one end in common with a known annotated junction over a novel junction that shares zero ends with an annotated junction? If the answer is yes, then proceed... If the answer is no, then could that logic be added?

Details for the minimal example:
  • The attached FASTQ files contain 74 read pairs.
  • There is a canonically annotated splice junction at 4:674377-674880 (hg19).
  • The sample contains a heterozygous variant 4:674373A>T, creating a competing donor site that sometimes results in the novel junction 4:674372-674880.
  • When I align the FASTQ files (I am using STAR 2.6.0c with two-pass alignment, an sjdb, and default parameters for everything else), ideally I would see the following types of spliced reads:
    • A: reads without the variant would show the canonically annotated junction 4:674377-674880,
    • B: some reads with the variant would still splice at the canonically annotated junction 4:674377-674880,
    • C: some reads with the variant would splice at the new junction 4:674372-674880.
  • In the attached alignment, I see read types A & B, but I don't see type C. Instead, STAR aligns the novel junction as D: 4:674370-674878. This is an alternative placement of C: 4:674372-674880 (both ends shifted by 2 bases), but without the benefit that one side of the junction matches a canonically annotated junction. I was surprised by this, since I thought STAR would favor C over D because of the sjdb.
Could you let me know which is the expected behavior, and whether it is something that could potentially be fixed?
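
For anyone reproducing this, the distinction between C and D can be checked from coordinates alone. The following is a minimal sketch, not part of STAR: the function names `classify` and `uniform_shift` are hypothetical, and the only inputs are the junction coordinates quoted in the list above.

```python
# Annotated junctions as (chrom, start, end); coordinates are the ones quoted in this post.
ANNOTATED = {("4", 674377, 674880)}


def classify(chrom, start, end, annotated=ANNOTATED):
    """Say how many ends of a junction coincide with annotated junction ends."""
    if (chrom, start, end) in annotated:
        return "annotated"
    known_starts = {(c, s) for c, s, _ in annotated}
    known_ends = {(c, e) for c, _, e in annotated}
    n_known = ((chrom, start) in known_starts) + ((chrom, end) in known_ends)
    return {
        2: "novel, both ends known",
        1: "novel, one end known",
        0: "novel, no end known",
    }[n_known]


def uniform_shift(junction_a, junction_b):
    """Return the offset if junction_b is junction_a with both ends moved
    by the same amount, otherwise None."""
    d_start = junction_b[0] - junction_a[0]
    d_end = junction_b[1] - junction_a[1]
    return d_start if d_start == d_end else None


if __name__ == "__main__":
    print("A/B:", classify("4", 674377, 674880))
    print("C:  ", classify("4", 674372, 674880))
    print("D:  ", classify("4", 674370, 674878))
    print("C -> D shift:", uniform_shift((674372, 674880), (674370, 674878)))
```

A check like this could be run over the junction columns of SJ.out.tab to flag "no end known" junctions that sit a uniform few bases away from a "one end known" placement.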

Thanks very much!
Lee-kai
MYL5_paired_R1.fastq.gz
MYL5_paired_R2.fastq.gz
MYL5.Aligned.sortedByCoord.out.bam
MYL5.Aligned.sortedByCoord.out.bam.bai

Alexander Dobin

Jan 24, 2021, 4:39:56 PM
to rna-star
Hi Lee-kai,

A very thoughtful question, and you pretty much figured out the answer!
>>>When using an sjdb, should I expect STAR to favor a novel junction that shares one end in common with a known annotated junction over a novel junction that shares zero ends with an annotated junction?
No, only splices with both ends annotated are treated as annotated and favored (i.e. they have a bonus score added to their alignment score).
If only the donor or only the acceptor is annotated, there is no alignment score increase.
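
The rule described above can be illustrated with a toy model. This is only a sketch and not STAR's actual code: `sjdb_bonus` is a hypothetical helper, and the default of 2 is assumed to mirror the `--sjdbScore` parameter (check the manual for your STAR version).

```python
SJDB_SCORE = 2  # assumption: mirrors the --sjdbScore default; verify for your STAR version


def sjdb_bonus(junction, annotated, sjdb_score=SJDB_SCORE):
    """Toy rule: the bonus applies only when the exact (start, end) pair is annotated."""
    return sjdb_score if junction in annotated else 0


annotated = {(674377, 674880)}
candidates = {
    "A/B (annotated)": (674377, 674880),
    "C (one end annotated)": (674372, 674880),
    "D (no end annotated)": (674370, 674878),
}

if __name__ == "__main__":
    for name, junction in candidates.items():
        print(name, "bonus =", sjdb_bonus(junction, annotated))
```

Under this rule C and D receive the same bonus (none), so the annotation gives the aligner no reason to prefer C over D.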

>>>If the answer is no, then could that logic be added?
I agree this would be useful, and your application is a compelling example. 
I will add it to my TODO list, but I cannot promise an ETA, as I am overwhelmed with other projects now.

Cheers
Alex

free....@gmail.com

Feb 2, 2021, 1:35:34 PM
to rna-star
Thanks for the quick response! Looking a little into the parameters, perhaps a new partialSjdbScore parameter with a very low but nonzero default (even 0.01) would resolve this issue for me. Would you prefer that I add this as a feature request on GitHub Issues?

Thanks,
Lee-kai

Alexander Dobin

Feb 2, 2021, 2:37:49 PM
to rna-star
Hi Lee-kai,

the alignment score has to be an integer, so the minimum value has to be 1.
A feature request on GitHub will be helpful.
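
To make the integer constraint concrete, here is a sketch of how the requested behavior might look. It is purely hypothetical: `partial_score` stands in for the proposed partialSjdbScore, which is only a proposal in this thread, and `junction_bonus` is an illustrative helper rather than STAR code.

```python
def junction_bonus(junction, annotated, sjdb_score=2, partial_score=1):
    """Toy rule with a hypothetical partial bonus for junctions sharing one annotated end."""
    # Alignment scores are integers, so the smallest useful partial bonus is 1.
    if isinstance(partial_score, bool) or not isinstance(partial_score, int) or partial_score < 1:
        raise ValueError("partial_score must be an integer >= 1")
    if junction in annotated:
        return sjdb_score
    start, end = junction
    known_starts = {s for s, _ in annotated}
    known_ends = {e for _, e in annotated}
    if start in known_starts or end in known_ends:
        return partial_score
    return 0


annotated = {(674377, 674880)}

if __name__ == "__main__":
    print("C bonus:", junction_bonus((674372, 674880), annotated))
    print("D bonus:", junction_bonus((674370, 674878), annotated))
```

With a partial bonus of 1, placement C would outscore the otherwise equivalent placement D in this toy model, which is the tie-break asked for earlier in the thread.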

Thanks!
Alex