Thank you for your continued engagement and friendly troubleshooting, which I have had the chance to learn from. This is my first foray into sequencing analysis, and I'm struggling to understand the information in ReadsPerGene.out.tab despite searching the manual and online.
I downloaded the reference sequence (genD_NCBI_tg.fna) and annotation (spHBV_tg24_sp123_08.gtf) from NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000861825.2/) and renamed them for my purposes. I am mapping to the HBV genome (a 3.2 kb circular genome with numbering beginning from the EcoRI cut site). During indexing, I verified that --sjdbOverhang is correct and adjusted --genomeSAindexNbases to the genome size. During mapping, I lowered the --outFilterScoreMinOverLread and --outFilterMatchNminOverLread thresholds, as I will eventually be looking for rare splice variants, and limited --readMapNumber to speed up processing; neither of these should be causing this problem. I have yet to fully understand --outFilterMultimapNmax, but will return to it later.
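For reference, the --genomeSAindexNbases value follows the STAR manual's recommendation for small genomes, min(14, log2(GenomeLength)/2 - 1). Below is a minimal sketch of that arithmetic. The function name sa_index_nbases is my own, I am assuming that rounding down is the right choice, and 3200 is only an approximate genome length.

```shell
# Sketch of the STAR manual's small-genome rule:
#   genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1)
# sa_index_nbases is a hypothetical helper name; rounding down is an assumption.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)   # log2 via natural logs, truncated
        if (v > 14) v = 14                   # cap at the STAR default of 14
        print v
    }'
}

# Approximate HBV genome length (3.2 kb)
sa_index_nbases 3200
```

For a 3.2 kb genome this prints 4, which is the value I passed to --genomeSAindexNbases.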
The issue I'm facing is that I get counts for only three of the four genes (HBVgp1, HBVgp2, HBVgp3, and HBVgp4). The GTF feature being reported in ReadsPerGene.out.tab is actually the gene and not the exon (I am not sure why). I had thought it might be because HBVgp2 has multiple overlapping CDS entries for the gene, leading to them being thrown out at the counting step. However, HBVgp4 also has multiple overlapping CDS entries for the gene, and those are registered by the count. Is there a reason I can't get counts for HBVgp2, considering I get counts for the other genes in the annotation? Is my understanding of ReadsPerGene.out.tab lacking?
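To make the annotation side of this easier to check, here is a rough sketch that tabulates how many rows of each feature type (GTF column 3) every gene_id has. It shows, for example, whether HBVgp2 has a row of the type I passed to --sjdbGTFfeatureExon in the indexing command. The function name count_gtf_features is my own, and I am assuming a standard nine-column, tab-separated GTF with gene_id "..." in the attribute column.

```shell
# Tabulate feature types per gene_id in a GTF file.
# count_gtf_features is a hypothetical helper name; it assumes 9 tab-separated
# columns with an attribute of the form: gene_id "NAME";
count_gtf_features() {
    awk -F'\t' '
        /^#/ { next }                          # skip comment/header lines
        match($9, /gene_id "[^"]+"/) {
            # strip the leading: gene_id "  (9 chars) and the closing quote
            gene = substr($9, RSTART + 9, RLENGTH - 10)
            n[gene "\t" $3]++
        }
        END { for (k in n) print k "\t" n[k] }
    ' "$1" | LC_ALL=C sort
}
```

Running count_gtf_features spHBV_tg24_sp123_08.gtf prints one line per gene and feature type (gene_id, feature, number of rows), so the four genes can be compared side by side.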
I've included the relevant STAR files below and attached the reference sequence and annotations.
Thank you in advance for your help.
STAR \
--runThreadN 20 \
--runMode genomeGenerate \
--genomeDir /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/sp.star/sp.index/nicole.s/tgx/08 \
--genomeFastaFiles /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/sp.star/sp.index/nicole.s/tgx/08/genD_NCBI_tg.fna \
--sjdbGTFfile /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/sp.star/sp.index/nicole.s/tgx/08/spHBV_tg24_sp123_08.gtf \
--sjdbGTFtagExonParentTranscript transcript_id \
--sjdbGTFtagExonParentGene gene_id \
--sjdbGTFfeatureExon gene \
--sjdbOverhang 89 \
--genomeSAindexNbases 4 \
--sjdbFileChrStartEnd /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/sp.star/sp.index/nicole.s/tgx/08/sjdbFile_08.txt
Star_mapping
STAR --runThreadN 48 \
--genomeDir /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/sp.star/sp.index/nicole.s/tgx/08 \
--readFilesIn /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/umi-tool-tg/HB6/HB6_R2_extracted_trimmed.fastq.gz \
--readFilesCommand zcat \
--outFilterMultimapNmax 30 \
--outSAMtype BAM SortedByCoordinate \
--quantMode TranscriptomeSAM GeneCounts \
--limitBAMsortRAM 4041611632 \
--outFilterScoreMinOverLread 0.1 \
--outFilterMatchNminOverLread 0.1 \
--outFileNamePrefix /home/tgrudda1/scr4-cthio1/10X-Visium/clo20_183119/outs/fastq_path/HG23MDRX2/sp.star/mapped/aHb6/nicole.s/tgx/08/08_ \
--readMapNumber 100000
echo "FIN"
08_ReadsPerGene
N_unmapped      28232   28232   28232
N_multimapping  51940   51940   51940
N_noFeature     0       9463    10365
N_ambiguous     11714   5501    6213
HBVgp3          814     489     325
HBVgp4          1665    529     1136
HBVgp1          5635    3846    1789
HBVgp2          0       0       0
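As I read the STAR manual, column 1 of this file is the gene ID, column 2 holds the counts for unstranded data, and columns 3 and 4 hold the stranded counts (the equivalents of htseq-count -s yes and -s reverse, respectively). Here is a small sketch that totals the gene rows in each count column, which is how I have been comparing them. The function name summarize_reads_per_gene is my own, and it assumes the tab-separated layout STAR writes, with the four N_* summary rows at the top.

```shell
# Sum gene-level counts per column of a STAR ReadsPerGene.out.tab file.
# summarize_reads_per_gene is a hypothetical helper name.
# Rows whose first field starts with N_ are summary rows and are skipped.
summarize_reads_per_gene() {
    awk -F'\t' '
        $1 !~ /^N_/ { c2 += $2; c3 += $3; c4 += $4 }
        END { printf "col2=%d col3=%d col4=%d\n", c2, c3, c4 }
    ' "$1"
}
```

Running summarize_reads_per_gene 08_ReadsPerGene.out.tab on the table above gives col2=8114 col3=4864 col4=3250, all of it coming from HBVgp1, HBVgp3, and HBVgp4.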