Counting reads in introns using paired-end, strand-specific data


Varun Gupta

Oct 2, 2017, 12:23:44 PM
to rna-star
Hi Alex,
Hope you are doing well. I am trying to count the reads within the introns of genes in paired-end sequencing data. From the BAM file I inferred that the library type for the data is fr-firststrand.

When I run STAR and get the count file, the 4th column should give me the sense-strand counts.

Questions:
1. How does STAR know the strand specificity of the reads, given that there is no input parameter for library type?
2. What method does STAR use to count reads and get sense and antisense counts? I want to do that for intron coordinates, and I have a 4-column file with chromosome, intron start, intron end, and strand. How can I do it?

Regards
Varun

Alexander Dobin

Oct 4, 2017, 4:31:55 PM
to rna-star
Hi Varun,

STAR compares the strand of the 1st read mapped to the genome with the strand of the annotated feature.
If the strands agree, the read is counted in the 3rd column; if not, in the 4th column.
Depending on the strandedness of the protocol, one of these columns will be counting reads in the sense direction, and the other in the antisense.
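
One way to check which column is the sense column for a given library is to sum columns 3 and 4 over all features and see which total dominates. The sketch below assumes the count file is the 4-column, tab-separated table STAR writes with --quantMode GeneCounts (ReadsPerGene.out.tab), whose first rows are summary rows with names starting with N_. The function name and the toy numbers are made up for illustration.

```python
import io

def strand_totals(handle):
    """Sum the three count columns over all features.

    Returns (unstranded, read1_same_strand, read1_opposite_strand),
    i.e. columns 2, 3 and 4 of the count table.
    """
    totals = [0, 0, 0]
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip blank/short lines and the N_* summary rows.
        # Assumption: no real feature ID starts with "N_".
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        for k in range(3):
            totals[k] += int(fields[k + 1])
    return tuple(totals)

# Toy table with invented numbers, only to show the file layout.
toy = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t40\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "intron1\t20\t1\t19\n"
    "intron2\t30\t2\t28\n"
)
unstranded, col3, col4 = strand_totals(io.StringIO(toy))
sense_column = 3 if col3 > col4 else 4
print(unstranded, col3, col4, sense_column)
```

In this toy table column 4 dominates, which is the pattern expected when the 1st read is antisense to the transcript, as in an fr-firststrand library. If the two totals are close, the library is probably unstranded and column 2 is the one to use.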

To count reads overlapping introns, you would need to create a GTF file with intron coordinates, e.g. (fields are tab-separated up to gene_id)
chr1   myIntrons   feature 3073253   3074322   .   +   .     gene_id "intron1"; transcript_id "intron1"
chr1   myIntrons   feature 3075858   3079636   .   +   .     gene_id "intron2"; transcript_id "intron2"
gene_id and transcript_id have to be unique for each intron.
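
A minimal sketch of producing such a GTF from the 4-column intron file described in the question. It assumes the input is tab-separated (chromosome, start, end, strand) and that the coordinates are already 1-based and inclusive, as GTF expects; if they come from a BED-style file, the start needs +1. The function names and the intronN naming scheme are illustrative choices, not anything STAR requires beyond uniqueness.

```python
import csv
import io

def introns_to_gtf(rows, source="myIntrons", feature="feature"):
    """Yield one GTF line per (chrom, start, end, strand) row."""
    for i, (chrom, start, end, strand) in enumerate(rows, start=1):
        name = f"intron{i}"  # unique gene_id/transcript_id per intron
        attrs = f'gene_id "{name}"; transcript_id "{name}"'
        yield "\t".join(
            [chrom, source, feature, str(int(start)), str(int(end)),
             ".", strand, ".", attrs]
        )

def convert(in_handle, out_handle):
    """Read the 4-column intron table and write GTF lines."""
    reader = csv.reader(in_handle, delimiter="\t")
    # Skip empty lines and comment lines.
    rows = (r for r in reader if r and not r[0].startswith("#"))
    for line in introns_to_gtf(rows):
        out_handle.write(line + "\n")

# The two introns from the example above, as a 4-column table.
example = "chr1\t3073253\t3074322\t+\nchr1\t3075858\t3079636\t+\n"
out = io.StringIO()
convert(io.StringIO(example), out)
print(out.getvalue(), end="")
```

For real files, open the input and output paths and pass the file handles to convert() in place of the StringIO objects.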

Then you can map with 
--sjdbGTFfile introns.gtf --sjdbGTFfeatureExon feature
to get counts per intron.
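
For scripted runs, the mapping command can be assembled as an argument list. This is only a sketch: the index directory and FASTQ names are placeholders, the function name is made up, and --quantMode GeneCounts is added here on the assumption that the per-feature count table is what is wanted (it is not part of the two options quoted above).

```python
import shlex

def star_intron_count_cmd(genome_dir, reads, gtf="introns.gtf", feature="feature"):
    """Build a STAR command line that counts reads per intron."""
    return [
        "STAR",
        "--genomeDir", genome_dir,          # placeholder index directory
        "--readFilesIn", *reads,            # read 1 and read 2 files
        "--sjdbGTFfile", gtf,               # intron GTF built beforehand
        "--sjdbGTFfeatureExon", feature,    # must match column 3 of the GTF
        "--quantMode", "GeneCounts",        # assumption: write the count table
    ]

cmd = star_intron_count_cmd("star_index", ["R1.fastq", "R2.fastq"])
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```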

Note that only 1-base overlap is required for counting, so reads overlapping exon/intron boundaries will be counted.
However, reads mapping to regions of overlapping introns (alternatively spliced) will be counted as ambiguous.
I can probably think of some tricks to avoid these behaviors.

Cheers
Alex

Varun Gupta

Jan 11, 2018, 4:10:27 PM
to rna-star
Hi Alex,
Is there a way or parameter I can use so that a read is counted only if it overlaps at least 25 bp into the intron?

Let me know.

Thanks for all your help.

Regards
Varun

Alexander Dobin

Jan 11, 2018, 5:37:27 PM
to rna-star
Hi Varun,

There is no specific parameter that controls the overlap. However, I think that if you trim your introns by 25 bases on each side, it will simulate the effect you want to achieve.
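
A minimal sketch of that trimming step, assuming introns are held as (chrom, start, end, strand) tuples with 1-based inclusive coordinates. The function name and the choice to drop introns that become empty are illustrative assumptions.

```python
def trim_introns(introns, margin=25):
    """Shrink each intron by `margin` bases on both sides.

    Introns too short to survive trimming are dropped, since they
    would give an invalid interval (start > end) in the GTF.
    """
    trimmed = []
    for chrom, start, end, strand in introns:
        new_start, new_end = start + margin, end - margin
        if new_start <= new_end:
            trimmed.append((chrom, new_start, new_end, strand))
    return trimmed

introns = [
    ("chr1", 3073253, 3074322, "+"),
    ("chr1", 100, 140, "-"),  # 41 bases long, too short to trim by 25
]
print(trim_introns(introns))
```

Because a 1-base overlap with the trimmed interval is enough for counting, a read coming in from the flanking exon has to reach margin + 1 bases into the original intron. For a threshold of exactly "at least 25 bp", margin=24 would be the matching value.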

Cheers
Alex