Average input read length of paired-end reads


Eric Ho

May 25, 2018, 1:15:17 PM
to rna-star
Hi,

This is the first time I have used STAR (version STAR_2.6.0a) to map paired-end RNA-seq reads, so I want to check that I am doing it right. The genome consists of 45K+ contigs. A GFF is available, but I suspect it is incomplete because the organism is not a model organism. Here's the mapping command I used:

STAR --runThreadN 16 --genomeDir ../data \
                    --readFilesIn ../data/MO01_R1_001.fastq.gz ../data/MO01_R2_001.fastq.gz \
                    --readFilesCommand zcat \
                    --outFileNamePrefix ./control

My mates are exactly 150 bp each. When I looked at the final report, one thing that struck me was "Average input read length |       300", which is exactly twice the mate (left/right) read length. I am dubious about it. Shouldn't there be at least some overlap between the mates, so that the average is slightly shorter than 300?

I took a random mate pair and aligned the two mates to each other; they overlapped by roughly 125 bp.

Should I set --peOverlapNbasesMin to a non-zero value? I also left --peOverlapMMp at its default.
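To make the "how much do my mates overlap" check a little more systematic than aligning one pair by hand, here is a minimal shell sketch. It assumes the usual library layout in which R2 is read from the opposite strand, so an overlap shows up as a suffix of R1 that exactly matches a prefix of reverse-complemented R2. The function name `mate_overlap` and its optional minimum-overlap argument are hypothetical (the minimum is only loosely analogous to --peOverlapNbasesMin), and it accepts exact matches only; this is not how STAR itself detects overlaps, since STAR tolerates mismatches in the overlap (controlled by --peOverlapMMp).

```shell
# Hypothetical helper: length of the exact overlap between two mates.
# Usage: mate_overlap R1_SEQ R2_SEQ [MIN_OVERLAP]
# Assumes R2 comes from the opposite strand, so it is reverse-complemented before
# comparing. Prints 0 if no overlap of at least MIN_OVERLAP (default 1) exists.
mate_overlap() {
  awk -v r1="$1" -v r2="$2" -v min="${3:-1}" 'BEGIN {
    min = min + 0
    comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    r1 = toupper(r1); r2 = toupper(r2)
    # Reverse-complement R2 (anything other than A/C/G/T becomes N).
    rc = ""
    for (i = length(r2); i >= 1; i--) {
      b = substr(r2, i, 1)
      rc = rc ((b in comp) ? comp[b] : "N")
    }
    # Try the longest candidate first: suffix of R1 vs prefix of revcomp(R2).
    max = (length(r1) < length(rc)) ? length(r1) : length(rc)
    for (k = max; k >= min; k--) {
      if (substr(r1, length(r1) - k + 1) == substr(rc, 1, k)) { print k; exit }
    }
    print 0
  }'
}
```

A minimum overlap matters because one or two bases will often match purely by chance. Under this model, with 2x150 reads an overlap of about 125 bp would correspond to a fragment of roughly 300 - 125 = 175 bp.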

Thanks, Eric.

Alexander Dobin

May 25, 2018, 4:57:42 PM
to rna-star
Hi Eric,

for PE reads, STAR's read length is the sum of the mates' lengths, so it's 300 for 2x150 reads.
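Following that definition, the reported number can be reproduced from the FASTQ files alone: average length(R1) + length(R2) over all pairs, with no reference to how the mates align to each other. The sketch below assumes gzipped, 4-line FASTQ records (no wrapped sequence lines) with mates in the same order in both files; the function name `avg_pair_length` is hypothetical, and this is an illustration of the arithmetic, not STAR's own code.

```shell
# Hypothetical helper: average of length(R1) + length(R2) over all read pairs.
# Usage: avg_pair_length R1.fastq.gz R2.fastq.gz
# Assumes gzipped, 4-line FASTQ records and mates listed in the same order.
avg_pair_length() {
  awk -v f1="$1" -v f2="$2" 'BEGIN {
    c1 = "gzip -dc \"" f1 "\""
    c2 = "gzip -dc \"" f2 "\""
    # Read both files in lockstep; line 2 of every 4-line record is the sequence.
    while ((c1 | getline l1) > 0 && (c2 | getline l2) > 0) {
      n++
      if (n % 4 == 2) { total += length(l1) + length(l2); pairs++ }
    }
    close(c1); close(c2)
    if (pairs > 0) printf "%g\n", total / pairs
  }'
}
```

For example, `avg_pair_length ../data/MO01_R1_001.fastq.gz ../data/MO01_R2_001.fastq.gz` would be expected to print 300 if every mate is exactly 150 bp, however much the mates overlap.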
--peOverlapNbasesMin is a new option that allows merging of the overlapping mates before mapping.
However, the output alignment is converted back to PE format.
Typically, this does not change most of the alignments, as STAR can deal with overlapping mates even without this option.
It's most important for detecting chimeras in overlapping mates.
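For reference, a run that turns on mate merging could look like the sketch below: the command from the question with two additions. The value 12 for --peOverlapNbasesMin is only an illustrative choice (the thread gives no recommended value), --peOverlapMMp is left at its default, and --chimSegmentMin is added only because chimeric detection is off by default (0) and, per the reply, chimera detection is where merging matters most. The output prefix is hypothetical.

```shell
# Same inputs as the original run; the output prefix is hypothetical.
# --peOverlapNbasesMin 12 : merge mates overlapping by >= 12 bases (illustrative value)
# --chimSegmentMin 12     : turn on chimeric detection (0 = off); illustrative value
STAR --runThreadN 16 --genomeDir ../data \
     --readFilesIn ../data/MO01_R1_001.fastq.gz ../data/MO01_R2_001.fastq.gz \
     --readFilesCommand zcat \
     --peOverlapNbasesMin 12 \
     --chimSegmentMin 12 \
     --outFileNamePrefix ./control_peOverlap_
```

As noted in the reply, merged mates are converted back to PE format in the output, so downstream tools still see ordinary paired-end alignments.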

Cheers
Alex