Custom annotation produces strand misassignment

Jesse Niehaus

unread,

Apr 25, 2024, 10:50:47 AMApr 25

to rna-star

Hello Alex,

I have a peculiar scenario that yields some interesting alignment results.
(I'm using STAR on 10x v3 3' data; GeneFull_ExonOverIntron counting; otherwise using your recommended alignment parameters)

I have an unconventional goal where I want intronic reads for a specific part of a gene to be quantified regardless of strand. To do this, I flipped the strand information on one of the gene's many transcripts (and corresponding exons).

Oddly, this appears to cause the gene's entire orientation to flip such that reads aligned to the original strand are no longer assigned to the gene and all reads on the opposite strand are assigned to the gene.

E.g. GeneX is originally on the plus strand with many isoforms.
I flip the strand orientation of one of GeneX's transcripts and corresponding exons.
Now all reads on the minus strand that span the gene's coordinates, not just the antisense transcript coordinates, are assigned to the gene.
Also, zero reads on the plus strand are assigned to the gene.

Here's a couple alignment tracks in IGV (annotation at the bottom) colored by strand.
The red (+) reads are unassigned despite the + strand 'gene' and many 'transcript' annotations.
The blue (-) reads are assigned to the gene regardless of whether there's an antisense transcript there.

Can you think of what could be causing this behavior?

Thank you for your time!
Best,
Jesse

PlusMinusreads1.PNG

MinusReadsAssigned_withNoAnnotation.PNG

Jesse Niehaus

unread,

Apr 26, 2024, 1:10:56 PMApr 26

to rna-star

Small update:

--
You received this message because you are subscribed to the Google Groups "rna-star" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rna-star+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rna-star/a2a5b743-d458-4569-b6b9-95e1828a29acn%40googlegroups.com.

Jesse Niehaus

unread,

Apr 26, 2024, 1:31:15 PMApr 26

to rna-star

Hello,

I have an update after some testing.
The gene-wide orientation flip appears specific to introns.
Exonic reads in the same orientation as the genes are properly assigned. Intronic reads on the same strand are incorrectly unassigned. Antisense intronic are assigned to the gene (regardless of whether an antisense transcript lands in these coordinates).

I think this behavior could be explained by understanding how star would handle a gene with transcript entries on both strands. I can't find information about how star uses gtf information aside from the assignment order of operations described in this post: https://github.com/alexdobin/STAR/issues/2022

Given my goal I could always add 'dummy' genes in the antisense areas I want to quantify and add them to my genes of interest.

I should also mention these conclusions are based on BAM files. I believe they carry on to the count matrices, but I'm not 100%.
I also should've specified some technicals:
I'm using star/2.7.11a

Here is my final defined parameters
##### Final user re-defined parameters-----------------:
runThreadN 32
genomeDir /path/to/ref
readFilesIn /path/to/fastq_R2_001.fastq.gz /path/to/fastq _R1_001.fastq.gz
readFilesCommand zcat
outFileNamePrefix ./testing
outTmpDir ./testingtmp
outSAMtype BAM SortedByCoordinate
outSAMattributes NH HI NM CR CY UR UY GX GN gx gn CB UB
outBAMsortingThreadN 32
outFilterScoreMin 30
clipAdapterType CellRanger4
soloType CB_UMI_Simple
soloCBstart 1
soloUMIstart 17
soloCBlen 16
soloUMIlen 12
soloBarcodeReadLength 0
soloCBwhitelist /home/references/10xV3_Whitelist_february-2018.txt
soloFeatures GeneFull_ExonOverIntron
soloUMIdedup 1MM_CR
soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts
soloCellFilter EmptyDrops_CR
soloUMIfiltering MultiGeneUMI_CR
soloMultiMappers EM
soloCellReadStats Standard

Apologies for bombarding you w/ messages.
Thanks again for your input.
All the best,
Jesse

On Thu, Apr 25, 2024 at 10:50 AM Jesse Niehaus <kyl...@gmail.com> wrote:

Jesse Niehaus

unread,

Apr 29, 2024, 6:57:44 PMApr 29

to rna-star

Update pt 2:
This behavior resolved itself after changing soloFeatures from 'GeneFull_ExonOverIntron' to " GeneFull_Ex50pAS'.
You mentioned this behavior is expected on an open github question:
' Note that if a gene has transcripts that do not overlap each other[I'm assuming this would be the case for a transcript on the opposite strand], GeneFull will not work correctly, while GeneFull_Ex50pAS will work correctly.'