Counting reads in Introns

1,413 views
Skip to first unread message

rajeev...@gmail.com

unread,
Jul 13, 2015, 3:49:17 PM7/13/15
to bedtools...@googlegroups.com

Hello Everyone,


                I am trying to count reads mapping to introns in my RNA-Seq data.  I have used the bedtools multicov to count the alignments in the indexed BAM file.  Below are the command line that I have used.    However, when I visualize the aligned reads using IGV, I have noticed that the program is only counting reads mapped to the boundary exons and not intronic reads.  I have also tried – f to ensure a minimum of 10% read overlap within an intron but get same resuts.  I am very new to using these tools.  Any ideas on why it is counting boundary exon reads and any suggestions on tools to just count reads within an intron will be very helpful. 


bedtools intersect -abam test.sorted.bam -b intron.bed -split >intron.bam

samtools sort intron.bam intron.sorted

samtools index intron.sorted.bam

bedtools multicov -bams intron.sorted.bam -bed intron.bed


Thanks,


Rajiv

  

arielp...@gmail.com

unread,
Jul 17, 2015, 3:59:39 PM7/17/15
to bedtools...@googlegroups.com, rajeev...@gmail.com
Hi Rajeev,

My post was just before yours, and I was trying to resolve this method specifically so I could count reads contained by introns.  (https://groups.google.com/forum/#!topic/bedtools-discuss/XWgFRjRqGS8)

In my case it is also important to eliminate intron-exon boundary reads, so I am only counting reads contained by strictly-intronic portions of genes.

You could get these regions with a command like this:

bedtools subtract -s -a genes.bed -b exons.bed > strict_introns.bed

But to get coverage from only those reads that are contained by the introns, you'll need to do something like this:

bedtools intersect -u -split -f 1 -a sample.bam -b strict_introns.bed | bedtools coverage -a strict_introns.bed -b - > strict_intron_coverage.txt

Ariel

Varun Gupta

unread,
Mar 13, 2017, 5:48:49 PM3/13/17
to bedtools-discuss, rajeev...@gmail.com, arielp...@gmail.com
Hi Ariel,
I have a question here. It is clear from your question in the previous post that you want to count only those reads within the intron(or the feature) which are contained in it. Any overlap you don't want to count.
Since I have a RNA-Seq data, i want to count reads within introns but also intron/exon boundry. Can you explain me

1. What is -split option doing??

2. If I remove -f 1 option from your answer (bedtools intersect -u -split -f 1 -a sample.bam -b strict_introns.bed ) , i will get my answer. isn't default for -f is 1 so when we omit it, I get what I want but When I USE IT i only get counts within the intron coordinates??

Thanks

Varun

Ariel Paulson

unread,
Mar 13, 2017, 9:23:25 PM3/13/17
to Varun Gupta, bedtools-discuss, rajeev...@gmail.com
Hi Varun,

1. The -split option splits reads which are mapped across splice junctions.  Otherwise, the end coord of the read will simply be start + length, which could make it overlap an intron, when in fact it skips the intron.  More specifically, it separates the read into two (or more) intervals based on the cigar string, where the cigar N entry acts as the separator between intervals.  It may also cause the read to count as 2 reads, I am not sure.  

2. You are correct, if I understand what you are saying.  Using "-f 1" requires that 100% of the read length lies inside the intron interval, else the read is ignored -- that was my approach.  Dropping -f will retain reads which intersect the intron by at least 1bp, which is what you want.  

The default value of -f is not 1 (100%), it is 0.000000001 (0.0000001%), or really ceiling(<read length>*0.000000001), which will always result in 1bp -- as long as your reads are > zero bp and < one billion bp, which is usually true...

Ariel

VG

unread,
Mar 14, 2017, 3:28:17 PM3/14/17
to Ariel Paulson, bedtools-discuss, rajeev...@gmail.com
Thanks Ariel. Also I want to ask you, what downstream steps you take for saying whether the intron is retained or not?Do you use DEXSeq for it? or anything else?What if you don't have replicates and only time points as data and want to see what happens to the intron at different time points?

Thanks
Varun

Ariel Paulson

unread,
Mar 15, 2017, 8:49:56 AM3/15/17
to VG, bedtools-discuss, Rajeev R
I was actually just quantitating various genomic compartments for library construction QC, not doing DE with them.  If you're looking for intron retention, DEXSeq is probably a good starting point.  I haven't used it myself.

If no replicates, you will probably be stuck with using thresholds to define candidates.  For global gene-of-interest extraction, with single-replicate timeseries data, in the past I have selected genes by requiring that each gene: 1) have a maximum RPKM >= X, where X is usually >= 1, and 2) experience a maximum absolute fold-change between any two time points of Y, where Y is usually >= 2.  This eliminates genes with low expression and genes which do not change over time.  You could make an MA plot with row max fold-change and row mean expression and look for interesting points.

Ariel Paulson

unread,
Mar 16, 2017, 11:54:51 PM3/16/17
to VG, bedtools-discuss, Rajeev R
You appear to have dropped the "-bed" from the bedtools intersect call in the second command -- now it will be pushing compressed bam output through your pipe, which bedtools coverage will not understand.

At least I think that is the problem.

On Wed, Mar 15, 2017 at 12:28 PM, VG <gupta5...@gmail.com> wrote:
Hi Ariel,
Thanks for your reply. Also I want to count the reads in my introns to be in strand specific way.
This command gives me this output
bedtools intersect -u -split -f 0.5 -a SRR1029143_Aligned.sortedByCoord.out.bam -b known_yeast_introns.bed -bed | head

chrI    87396    87446    SRR1029143.505365    255    -    87396    87446    0,0,0    1    50,    0,
chrI    87418    87467    SRR1029143.10559369    255    +    87418    87467    0,0,0    1    49,    0,
chrI    87456    87506    SRR1029143.3390263    255    +    87456    87506    0,0,0    1    50,    0,
chrI    87458    87507    SRR1029143.9988740    255    +    87458    87507    0,0,0    1    49,    0,
chrI    87462    87512    SRR1029143.7233972    255    -    87462    87512    0,0,0    1    50,    0,
chrI    87463    87512    SRR1029143.158693    255    +    87463    87512    0,0,0    1    49,    0,
chrI    87468    87518    SRR1029143.7312705    255    +    87468    87518    0,0,0    1    50,    0,
chrI    87472    87521    SRR1029143.4940539    255    +    87472    87521    0,0,0    1    49,    0,


Of course I am using it as bam and not bed and then I use coverage bed  with -s option

bedtools intersect -u -split -f 0.5 -a SRR1029143_Aligned.sortedByCoord.out.bam -b known_yeast_introns.bed  |bedtools coverage -s -a known_yeast_introns.bed -b stdin | head

chrI    87387    87500    +    0    0    113    0.0000000

chrI    139187    139218    +    0    0    31    0.0000000
chrI    142253    142619    +    0    0    366    0.0000000

Nothing for chrI.Should be 6 since the intron feature is on +ve strand. What am I missing here??

Thanks for all the help.

Regards
Varun


VG

unread,
Mar 16, 2017, 11:54:51 PM3/16/17
to Ariel Paulson, bedtools-discuss, Rajeev R
Hi Ariel,
Thanks for your reply. Also I want to count the reads in my introns to be in strand specific way.
This command gives me this output
bedtools intersect -u -split -f 0.5 -a SRR1029143_Aligned.sortedByCoord.out.bam -b known_yeast_introns.bed -bed | head

chrI    87396    87446    SRR1029143.505365    255    -    87396    87446    0,0,0    1    50,    0,
chrI    87418    87467    SRR1029143.10559369    255    +    87418    87467    0,0,0    1    49,    0,
chrI    87456    87506    SRR1029143.3390263    255    +    87456    87506    0,0,0    1    50,    0,
chrI    87458    87507    SRR1029143.9988740    255    +    87458    87507    0,0,0    1    49,    0,
chrI    87462    87512    SRR1029143.7233972    255    -    87462    87512    0,0,0    1    50,    0,
chrI    87463    87512    SRR1029143.158693    255    +    87463    87512    0,0,0    1    49,    0,
chrI    87468    87518    SRR1029143.7312705    255    +    87468    87518    0,0,0    1    50,    0,
chrI    87472    87521    SRR1029143.4940539    255    +    87472    87521    0,0,0    1    49,    0,


Of course I am using it as bam and not bed and then I use coverage bed  with -s option

bedtools intersect -u -split -f 0.5 -a SRR1029143_Aligned.sortedByCoord.out.bam -b known_yeast_introns.bed  |bedtools coverage -s -a known_yeast_introns.bed -b stdin | head

chrI    87387    87500    +    0    0    113    0.0000000

chrI    139187    139218    +    0    0    31    0.0000000
chrI    142253    142619    +    0    0    366    0.0000000

Nothing for chrI.Should be 6 since the intron feature is on +ve strand. What am I missing here??

Thanks for all the help.

Regards
Varun


Dolly

unread,
Jun 6, 2017, 7:11:44 AM6/6/17
to bedtools-discuss, rajeev...@gmail.com
Hello,

I am new to RNA-seq data analyses. I have mapped the reads to my genome of interest. I want to only consider the reads which are in the intronic region and even consider those which are overlapping to the exonic regions.

You have suggested the following commands:
1) bedtools subtract -s -a genes.bed -b exons.bed > strict_introns.bed
2) bedtools intersect -u -split -f 1 -a sample.bam -b strict_introns.bed | bedtools coverage -a strict_introns.bed -b - > strict_intron_coverage.txt

Do I have to use both commands? what files do I have to put for 1st command? Is the first command creating a file with all intron co-ordinates? Can you kindly explain me how will it work? It will be really helpful to me.

Also, I want to view these intronic mapped reads in some Genome viewer eg IGV. How can I do that? Is there anyway to convert bed to bam for loading them in any genome viewer?

Hoping for your kind response at the earliest.

Thanking You,

thind.a...@gmail.com

unread,
Oct 23, 2018, 5:23:31 PM10/23/18
to bedtools-discuss
HI! everyone

I have question related to this topic. So, in my case intron legth (20bp to 40 bp) is always less than read length. So I want to count reads which are matching 100% on intron but it can cross exon-intron coundaries.
I have intron.bed file and sorted bam files. how, I can do that using bedtool. I am looking for suggestion becuase i am new to bedtool usage.

Regards:
Kumar

arielp...@gmail.com

unread,
Oct 23, 2018, 6:16:10 PM10/23/18
to bedtools-discuss
Something like "intersectBed -F 1 -a sorted.bam -b introns.bed > introns.bam" should do, which requires that the introns be overlapped 100% by the reported reads.

amarind...@icar.cnr.it

unread,
Oct 24, 2018, 9:56:12 AM10/24/18
to bedtools-discuss
 I tried -f 1, but didn't get required results, I think it's other way around. 100% overlap of reported reads with intron.

thind.a...@gmail.com

unread,
Oct 24, 2018, 10:53:57 AM10/24/18
to bedtools-discuss
 With -F 1 option, I found many false positive entries. But -F 0.095 and -f 0.09 have better results. But I still didn't understand the logic of finding false positive with -F 1 option.

arielp...@gmail.com

unread,
Oct 24, 2018, 3:31:25 PM10/24/18
to bedtools-discuss
You must use "-F 1" not "-f 1", these mean completely different things.  See intersectBed --help.
Reply all
Reply to author
Forward
0 new messages