Hi all,
I use STAR regularly, but I just encountered a really weird output with the --quantMode GeneCounts option.
For some genes, the count in column 2 is much lower than the sum of the stranded columns 3 + 4.
Here is my output:
| N_unmapped | 598099 | 598099 | 598099 |
| N_multimapping | 382731 | 382731 | 382731 |
| N_noFeature | 112716 | 10608161 | 10694861 |
| N_ambiguous | 1179474 | 44749 | 44951 |
| AT1G01010 | 273 | 133 | 140 |
| AT1G01020 | 249 | 123 | 127 |
| AT1G01030 | 34 | 18 | 17 |
| AT1G01040 | 1174 | 706 | 718 |
...
| AT1G32630 | 26 | 11470 | 11613 |
| AT1G32640 | 0 | 11604 | 11453 |
As you can see, for the vast majority of genes column 2 is roughly column 3 + column 4. I know it's not exactly the sum, but still. However, for some genes, like AT1G32640, columns 3 and 4 are each about 11k while column 2 is 0. Could this be an issue with the GTF, even though I got it from the Arabidopsis website?
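To list every gene showing this pattern, a minimal Python sketch along these lines could be used. It assumes the usual four-column layout of the GeneCounts output (gene ID, unstranded count, strand 1 count, strand 2 count) with whitespace-separated fields. The function name `flag_discrepant_genes` and the thresholds `min_stranded` and `max_ratio` are illustrative choices, not part of STAR.

```python
def flag_discrepant_genes(lines, min_stranded=100, max_ratio=0.1):
    """Return (gene, unstranded, strand1, strand2) tuples for genes whose
    unstranded count is at most max_ratio times the sum of the stranded counts.

    min_stranded and max_ratio are arbitrary illustrative thresholds.
    """
    flagged = []
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and the N_* summary rows.
        if len(fields) != 4 or fields[0].startswith("N_"):
            continue
        gene = fields[0]
        unstranded, strand1, strand2 = (int(x) for x in fields[1:])
        stranded_sum = strand1 + strand2
        if stranded_sum >= min_stranded and unstranded <= max_ratio * stranded_sum:
            flagged.append((gene, unstranded, strand1, strand2))
    return flagged


# A few rows copied from the output above, tab-separated.
example = (
    "N_unmapped\t598099\t598099\t598099\n"
    "AT1G01010\t273\t133\t140\n"
    "AT1G01040\t1174\t706\t718\n"
    "AT1G32630\t26\t11470\t11613\n"
    "AT1G32640\t0\t11604\t11453\n"
)

for row in flag_discrepant_genes(example.splitlines()):
    print(*row, sep="\t")
```

On a real run, the same function can be given an open file handle for the counts file instead of `example.splitlines()`. With these thresholds, only AT1G32630 and AT1G32640 are flagged among the rows shown.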
My command line:
$bin_DIR/STAR-2.6.1c/bin/Linux_x86_64/STAR \
--runThreadN 64 \
--readFilesCommand zcat \
--readFilesIn $Fastq_DIR/trimmed/"$Samples"_1.trim.gz $Fastq_DIR/trimmed/"$Samples"_2.trim.gz \
--genomeDir $Genome_DIR \
--outSAMstrandField intronMotif \
--outFilterType BySJout \
--outFilterIntronMotifs RemoveNoncanonical \
--quantMode TranscriptomeSAM GeneCounts \
--twopassMode Basic \
--outFileNamePrefix $Output_DIR/$Samples
I would be grateful for any tips you may have.
Cheers,
Thomas