Hi,
This is my first time using STAR (version STAR_2.6.0a) to map paired-end RNA-seq reads, so I just want to check that I am doing the right thing. The genome consists of 45K+ contigs. A GFF is available, but I believe it is incomplete, as the organism is not a model organism. Here's the mapping command I used:
STAR --runThreadN 16 --genomeDir ../data \
--readFilesIn ../data/MO01_R1_001.fastq.gz ../data/MO01_R2_001.fastq.gz \
--readFilesCommand zcat \
--outFileNamePrefix ./control
Each of my mate reads is exactly 150 bp. When I looked at the final report, one thing that struck me was "Average input read length | 300", which is exactly twice the length of a single mate (left/right) read. I am dubious about it. Shouldn't there be at least some random overlap between the mates, so that the average is slightly shorter than 300?
I took one mate pair at random and aligned the two mates against each other; the alignment showed a 125 bp overlap (plus/minus strands).
Should I set peOverlapNbasesMin to a non-zero value? I also left peOverlapMMp at its default.
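For reference, here is a minimal Python sketch of how the 300 figure could be checked independently of STAR. It assumes (my reading of the report, not something I have confirmed) that "Average input read length" is simply the sum of the two mate lengths averaged over all pairs, with no overlap taken into account. The function names are made up for illustration and are not part of STAR.

```python
import gzip
from itertools import islice


def read_lengths(lines):
    """Yield sequence lengths from an iterable of FASTQ lines (4 lines per record)."""
    # The sequence is the 2nd line of every 4-line FASTQ record.
    for seq in islice(lines, 1, None, 4):
        yield len(seq.strip())


def mean_pair_length(r1_lines, r2_lines):
    """Mean of len(mate1) + len(mate2) over all pairs; overlap is ignored."""
    total = 0
    pairs = 0
    for len1, len2 in zip(read_lengths(r1_lines), read_lengths(r2_lines)):
        total += len1 + len2
        pairs += 1
    return total / pairs if pairs else 0.0


def mean_pair_length_from_files(r1_path, r2_path):
    """Same calculation, reading two gzipped FASTQ files in parallel."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        return mean_pair_length(r1, r2)
```

Calling `mean_pair_length_from_files("../data/MO01_R1_001.fastq.gz", "../data/MO01_R2_001.fastq.gz")` on untrimmed 150 bp mates would give 300.0 under that assumption, regardless of how much the mates overlap.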
Thanks, Eric.