Hi there,
I built a STAR genome index for mouse using the GTF from GENCODE, and then mapped my reads to that index. My commands are as follows:
STAR --runThreadN 32 --runMode genomeGenerate --genomeDir mouse_v87/STAR_db --genomeFastaFiles mouse_v87/Mus_musculus.GRCm38.dna.primary_assembly.fa --sjdbGTFfile gencode.vM13.primary_assembly.annotation.gtf --sjdbOverhang 100 --outFileNamePrefix mouse_v87/mouse_
STAR --runThreadN 32 --genomeDir mouse_v87/STAR_db --readFilesIn my_reads.fastq --outSAMprimaryFlag AllBestScore --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix my_reads_mouse.star
However, one of the output files, "my_reads_mouse.starReadsPerGene.out.tab" (attached here), lists only 86 genes. I checked the index folder and found that the file "geneInfo.tab" also contains only 86 genes, even though the GENCODE GTF file has 50686 genes.
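For reference, here is a minimal sketch of how the GTF gene count above could be checked. It is not part of STAR: the function names (fasta_names, genes_by_seqname, summarize) are made up for illustration, and it assumes a standard 9-column GTF with gene_id "..." in the attribute column. It counts distinct gene IDs over all feature lines, which may differ slightly from how STAR derives geneInfo.tab. It also reports how many of those genes lie on sequences whose names appear in the FASTA headers, since the STAR manual states that chromosome names in the GTF have to match those in the genome FASTA.

```python
import re
import sys
from collections import defaultdict

# Assumption: the attribute column carries gene_id "..." as in GENCODE/Ensembl GTFs.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def fasta_names(lines):
    """Collect sequence names: the first whitespace-delimited token of each '>' header."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def genes_by_seqname(lines):
    """Map each GTF sequence name (column 1) to the set of gene IDs found on it."""
    genes = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match:
            genes[fields[0]].add(match.group(1))
    return genes


def summarize(fasta_lines, gtf_lines):
    """Return (distinct genes in GTF, genes on sequences also named in the FASTA)."""
    names = fasta_names(fasta_lines)
    genes = genes_by_seqname(gtf_lines)
    total = set().union(*genes.values())
    matched = set()
    for seqname, ids in genes.items():
        if seqname in names:
            matched |= ids
    return len(total), len(matched)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_gtf.py genome.fa annotation.gtf
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gtf:
        total, matched = summarize(fasta, gtf)
    print(f"genes in GTF: {total}")
    print(f"genes on sequences named in FASTA: {matched}")
```

If the second number is much smaller than the first, the FASTA and GTF may be using different sequence names.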
To test my commands, I rebuilt the index with the Ensembl GTF file instead. This time, the "geneInfo.tab" file contains all 49671 genes. Because STAR recommends the GENCODE GTF for human and mouse genomes, I'm wondering how to properly index the genome with the GENCODE GTF. Thank you!
Best,
Janet