Custom annotation produces strand misassignment

45 views
Skip to first unread message

Jesse Niehaus

unread,
Apr 25, 2024, 10:50:47 AMApr 25
to rna-star
Hello Alex,

I have a peculiar scenario that yields some interesting alignment results.
(I'm using STAR on 10x v3 3' data; GeneFull_ExonOverIntron counting; otherwise using your recommended alignment parameters)

I have an unconventional goal where I want intronic reads for a specific part of a gene to be quantified regardless of strand. To do this, I flipped the strand information on one of the gene's many transcripts (and corresponding exons).

Oddly,  this appears to cause the gene's entire orientation to flip such that reads aligned to the original strand are no longer assigned to the gene and all reads on the opposite strand are assigned to the gene.

E.g. GeneX is originally on the plus strand with many isoforms.
I flip the strand orientation of one of GeneX's transcripts and corresponding exons.
Now all reads on the minus strand that span the gene's coordinates, not just the antisense transcript coordinates, are assigned to the gene. 
Also, zero reads on the plus strand are assigned to the gene.

Here's a couple alignment tracks in IGV (annotation at the bottom) colored by strand.
The red (+) reads are unassigned despite the + strand 'gene' and many 'transcript' annotations.
The blue (-) reads are assigned to the gene regardless of whether there's an antisense transcript there.

Can you think of what could be causing this behavior?

Thank you for your time!
Best,
Jesse

PlusMinusreads1.PNG
MinusReadsAssigned_withNoAnnotation.PNG

Jesse Niehaus

unread,
Apr 26, 2024, 1:10:56 PMApr 26
to rna-star
Small update:




--
You received this message because you are subscribed to the Google Groups "rna-star" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rna-star+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rna-star/a2a5b743-d458-4569-b6b9-95e1828a29acn%40googlegroups.com.

Jesse Niehaus

unread,
Apr 26, 2024, 1:31:15 PMApr 26
to rna-star
Hello,

I have an update after some testing.
The gene-wide orientation flip appears specific to introns.
Exonic reads in the same orientation as the genes are properly assigned. Intronic reads on the same strand are incorrectly unassigned. Antisense intronic are assigned to the gene (regardless of whether an antisense transcript lands in these coordinates).

I think this behavior could be explained by understanding how star would handle a gene with transcript entries on both strands. I can't find information about how star uses gtf information aside from the assignment order of operations described in this post: https://github.com/alexdobin/STAR/issues/2022

Given my goal I could always add 'dummy' genes in the antisense areas I want to quantify and add them to my genes of interest.

I should also mention these conclusions are based on BAM files. I believe they carry on to the count matrices, but I'm not 100%.
I also should've specified some technicals:
I'm using star/2.7.11a

Here is my final defined parameters
##### Final user re-defined parameters-----------------:
runThreadN                        32
genomeDir                         /path/to/ref
readFilesIn                       /path/to/fastq_R2_001.fastq.gz    /path/to/fastq _R1_001.fastq.gz
readFilesCommand                  zcat
outFileNamePrefix                 ./testing
outTmpDir                         ./testingtmp
outSAMtype                        BAM   SortedByCoordinate
outSAMattributes                  NH   HI   NM   CR   CY   UR   UY   GX   GN   gx   gn   CB   UB
outBAMsortingThreadN              32
outFilterScoreMin                 30
clipAdapterType                   CellRanger4
soloType                          CB_UMI_Simple
soloCBstart                       1
soloUMIstart                      17
soloCBlen                         16
soloUMIlen                        12
soloBarcodeReadLength             0
soloCBwhitelist                   /home/references/10xV3_Whitelist_february-2018.txt
soloFeatures                      GeneFull_ExonOverIntron
soloUMIdedup                      1MM_CR
soloCBmatchWLtype                 1MM_multi_Nbase_pseudocounts
soloCellFilter                    EmptyDrops_CR
soloUMIfiltering                  MultiGeneUMI_CR
soloMultiMappers                  EM
soloCellReadStats                 Standard

Apologies for bombarding you w/ messages.
Thanks again for your input.
All the best,
Jesse

On Thu, Apr 25, 2024 at 10:50 AM Jesse Niehaus <kyl...@gmail.com> wrote:

Jesse Niehaus

unread,
Apr 29, 2024, 6:57:44 PMApr 29
to rna-star
Update pt 2:
This behavior resolved itself after changing soloFeatures from 'GeneFull_ExonOverIntron' to " GeneFull_Ex50pAS'.
You mentioned this behavior is expected on an open github question:
' Note that if a gene has transcripts that do not overlap each other[I'm assuming this would be the case for a transcript on the opposite strand], GeneFull will not work correctly, while GeneFull_Ex50pAS will work correctly.'

Sorry to bother you and thanks again for all your input!
Best,
Jesse
Reply all
Reply to author
Forward
0 new messages