Confusion about viewing STAR aligned reads in IGV


Emilie Wilkie

Nov 20, 2015, 11:53:23 AM
to rna-star
Hi,

I've aligned my stranded RNA-seq data (generated with the Illumina dUTP protocol) with STAR, and I'm slightly confused by what I'm seeing in IGV. I'm hoping someone can shed some light on this.

I aligned my reads with STAR, using --readFilesIn R1.fastq R2.fastq.
Since I'm interested in the reads corresponding to the negative transcript, I then filtered the aligned reads on SAM flags 147 and 99.

To summarise my current understanding:

R1(+)R2(-) with positive insert size should be denoted as F1R2 - from the negative transcript - should have sam flags R1(99)/R2(147)
R1(-)R2(+) with negative insert size should be denoted as F2R1 - from the positive transcript - should have sam flags R1(83)/R2(163)
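
To make the flag values above easier to check, here is a minimal sketch that decodes the relevant SAM flag bits in plain Python. The bit values come from the SAM specification; the function names are hypothetical, and the strand inference assumes a dUTP library in which read 1 is antisense to the original RNA.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
READ_REVERSE = 0x10
MATE_REVERSE = 0x20
READ1 = 0x40
READ2 = 0x80


def describe_flag(flag):
    """Return (mate, strand the read aligned to, proper-pair bit)."""
    if flag & READ1:
        mate = "R1"
    elif flag & READ2:
        mate = "R2"
    else:
        mate = "?"
    strand = "-" if flag & READ_REVERSE else "+"
    return mate, strand, bool(flag & PROPER_PAIR)


def transcript_strand_dutp(flag):
    """Infer the transcript strand.

    ASSUMPTION: dUTP-style library, so R1 is antisense to the RNA
    and R2 has the same orientation as the RNA.
    """
    mate, strand, _ = describe_flag(flag)
    if mate == "R1":
        return "-" if strand == "+" else "+"
    if mate == "R2":
        return strand
    return "?"


for flag in (99, 147, 83, 163):
    print(flag, describe_flag(flag), transcript_strand_dutp(flag))
```

Under that assumption, 99 and 147 both resolve to the (-) transcript strand and 83 and 163 to the (+) transcript strand, which matches the summary above.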

However when I subsequently look at only those reads with flags 99 or 147 in IGV I notice the following:

The pair orientation is F1R2 for all my read pairs. However, some R1 reads are mapped to the (-) strand and their R2 reads to the (+) strand, with a negative insert size (which makes me believe they should be denoted F2R1). I am also seeing R1 reads mapped to the (+) strand and their R2 reads mapped to the (-) strand, with a positive insert size (which makes me believe they should be denoted F1R2).

I believe I should only be seeing the F2R1 pair orientation from the reads with flags 99 or 147, so I believe there is some inconsistency somewhere. Could this have to do with how IGV interprets the STAR-generated SAM file? Or have I misunderstood the use of STAR, IGV, or the SAM flags? Or perhaps the library is not what I think it is?

Any help would be greatly appreciated.

Some additional info:

- The genome I'm investigating is very small (5000 bp), and the transcripts on the two strands overlap.

- When I count the number of reads for each of the SAM flags mentioned, I get the following:

99 = 1440
147 = 1440
83 = 1113334
163 = 1113334

- I'm expecting to see more reads from the positive than from the negative transcripts

- When I sum the number of reads with flags 99, 147, 83 and 163, the total is about 99% of the uniquely aligned reads.
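
For reference, per-flag counts like the ones above can be tallied directly from SAM text. This is only a sketch: the function name is hypothetical, and the file name in the usage comment assumes STAR's default SAM output name, which may differ in a given run.

```python
from collections import Counter


def count_flags(sam_lines):
    """Count alignments per FLAG value (second tab-separated SAM column)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[int(fields[1])] += 1
    return counts


# Usage sketch (file name is an assumption, adjust to the actual output):
# with open("Aligned.out.sam") as handle:
#     for flag, n in sorted(count_flags(handle).items()):
#         print(flag, n)
```

Because both mates of a pair get their own alignment line, the counts for 99 and 147 (and for 83 and 163) are expected to match, as they do in the numbers above.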

Alexander Dobin

Nov 20, 2015, 4:41:17 PM
to rna-star
Hi Emilie,

By "negative transcripts", do you mean the transcripts on the (-) DNA strand?
Do you define insert size = Read2_end - Read1_start?
If so, then for the Illumina dUTP protocol this is correct:
R1(+)R2(-) with positive insert size should be denoted as F1R2 - from the negative transcript - should have sam flags R1(99)/R2(147) 
R1(-)R2(+) with negative insert size should be denoted as F2R1 - from the positive transcript - should have sam flags R1(83)/R2(163) 

You should not see any 99/147 reads with a negative insert or any 83/163 reads with a positive insert, as these combinations are not "properly paired".

You may see some reads mapping to the opposite strand of the annotated transcript, since the dUTP protocol has a strand-error rate of ~1%.
In principle, there could also be some biologically real "antisense" transcription.
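
As a rough check, the counts reported in the original post can be compared against that ~1% figure. This is a back-of-the-envelope sketch, not a statistical test: it counts each pair once via its R1 flag, and the 0.01 threshold is simply the approximate error rate mentioned above.

```python
# Per-flag read counts as reported in the original post.
counts = {99: 1440, 147: 1440, 83: 1113334, 163: 1113334}

# Count each pair once, using the R1 flag of the pair.
minus_pairs = counts[99]  # pairs assigned to the (-) transcript strand
plus_pairs = counts[83]   # pairs assigned to the (+) transcript strand

minus_fraction = minus_pairs / (minus_pairs + plus_pairs)

# ASSUMPTION: approximate dUTP strand-error rate quoted in this thread.
ASSUMED_ERROR_RATE = 0.01

below_error_rate = minus_fraction < ASSUMED_ERROR_RATE
print(f"(-) strand fraction: {minus_fraction:.4%}")
print("below assumed strand-error rate:", below_error_rate)
```

The (-) strand fraction comes out at roughly 0.13%, below the ~1% error rate, so these counts alone cannot distinguish protocol strand errors from genuine (-) strand transcription.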

Since most of your reads are 83/163, the original RNAs were mostly from the (+) strand. Are the transcripts annotated on the (+) strand?

Cheers
Alex