missing gene counts

Bernadeta D

May 23, 2020, 1:42:07 PM
to rna-...@googlegroups.com
Hi Alex,

I'm looking at the STARsolo cell-gene matrix files, and it seems that some gene-associated reads I can see in the BAM file are dropped even from the raw matrix. I'm not sure why that would be.

Attached is a screenshot of what I'm looking at: in IGV I can see a read that has cell barcode TTGTAGGTCCGAACGC and is mapped to VTRNA1-3. I then looked up the barcode and the gene in the STARsolo barcode and feature files to get their row numbers (727833 for the cell and 17667 for the gene). In the matrix I can find entries for this cell barcode, but none for VTRNA1-3, even though the read is present in the BAM file and should be associated with it. I would understand this if I were looking at the filtered files, where cells can be missing because they did not pass the filter, but this is the raw cell-gene matrix, so why would these reads be dropped?
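For anyone repeating this check, the manual lookup described above can be sketched in Python. This is only a sketch under stated assumptions: it assumes the CellRanger-style layout in which each matrix.mtx entry is "feature_index barcode_index count" with 1-based indices that follow the line order of the features and barcodes files, and that raw counts are integers. The helper names one_based_index and lookup_count are hypothetical, not part of STAR.

```python
from typing import Iterable, Optional


def one_based_index(lines: Iterable[str], key: str) -> Optional[int]:
    """Return the 1-based line number of the first line that has `key`
    as one of its tab-separated fields, or None if no line matches."""
    for number, line in enumerate(lines, start=1):
        if key in line.rstrip("\n").split("\t"):
            return number
    return None


def lookup_count(mtx_lines: Iterable[str], feature_idx: int, barcode_idx: int) -> int:
    """Return the count stored for (feature_idx, barcode_idx).

    A sparse matrix omits zero entries, so a missing pair is reported as 0.
    Assumes entry lines look like: feature_index barcode_index count
    """
    dims_seen = False
    for line in mtx_lines:
        if line.startswith("%") or not line.strip():
            continue  # Matrix Market comment or blank line
        if not dims_seen:
            dims_seen = True  # first data line holds the matrix dimensions
            continue
        fields = line.split()
        if int(fields[0]) == feature_idx and int(fields[1]) == barcode_idx:
            return int(fields[2])
    return 0
```

With real output one would pass open file handles for the barcode, feature and matrix files. A result of 0 for a pair means the matrix holds no count for that gene in that cell, which is the situation described in this post.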

Thanks for any insights.

B.
Screen Shot 2020-05-23 at 10.39.08 AM.png

Alexander Dobin

May 26, 2020, 6:20:39 PM
to rna-star
Hi Bernadeta,

It looks like only part of this read overlaps the gene; such reads are not counted towards the gene.
The present policy (following CellRanger) is that only reads contained entirely within the exons of a gene are counted.
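To illustrate the containment rule, here is a simplified Python sketch. It is not STAR's actual implementation: it ignores strand and transcript structure, treats the exons as a flat list of half-open [start, end) intervals, and the function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def block_inside_any_exon(block: Interval, exons: List[Interval]) -> bool:
    """True if the aligned block lies completely inside a single exon."""
    start, end = block
    return any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons)


def counted_for_gene(read_blocks: List[Interval], exons: List[Interval]) -> bool:
    """True only if every aligned block of the read is contained in an exon.

    A read with no aligned blocks is never counted.
    """
    return bool(read_blocks) and all(
        block_inside_any_exon(block, exons) for block in read_blocks
    )


# Hypothetical single-exon gene spanning [1000, 1100)
exons = [(1000, 1100)]
inside = counted_for_gene([(1010, 1090)], exons)    # fully inside the exon
overhang = counted_for_gene([(1080, 1170)], exons)  # runs past the exon end
```

In this toy example the second read overlaps the exon by 20 bases but extends beyond it, so under the containment rule it does not contribute to the gene count.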

Cheers
Alex

Bernadeta D

May 26, 2020, 6:43:18 PM
to rna-star
Thanks Alex, this resolved my confusion :) Would using --soloFeatures GeneFull change this behaviour?

Alexander Dobin

May 27, 2020, 12:13:06 PM
to rna-star
Yes, GeneFull will include all reads that overlap (>=1 base) the whole genic locus, i.e. it will include purely exonic reads, exon/intron reads, purely intronic reads, and exon/intergenic reads.
The last category, exon/intergenic reads, is the one that will be counted in your example.
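The GeneFull overlap criterion can be sketched in Python in the same spirit. Again this is a simplified illustration rather than STAR's code: strand is ignored, intervals are half-open [start, end), and the function name and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def overlaps_locus(read_blocks: List[Interval], locus: Interval) -> bool:
    """True if any aligned block shares at least one base with the gene locus."""
    locus_start, locus_end = locus
    return any(start < locus_end and end > locus_start for start, end in read_blocks)


# Hypothetical gene locus spanning [1000, 1100)
locus = (1000, 1100)
exon_intergenic = overlaps_locus([(1080, 1170)], locus)  # overhangs the gene end
intergenic_only = overlaps_locus([(1100, 1150)], locus)  # starts after the gene
```

Here the read that overhangs the end of the gene satisfies the overlap test, while a read lying entirely past the locus does not.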