FeatureCounts PE stranded RNA-seq

1,540 views
Skip to first unread message

Dulce Vargas Landín

unread,
Jul 9, 2015, 7:16:00 PM7/9/15
to sub...@googlegroups.com
Hi everyone,

I am using featureCounts in my analysis, but I have some troubles in some regions of interest that may overlap with other genomic features at other strand. For this reason, I am using stranded RNA-seq, which will allow me to distinguish reads from my feature of interest. 

I am using the latest version, and my command is the following:


featureCounts -p -s 1 -a BJ_genes.gtf -g gene_id -t exon -T 8 --minReadOverlap 20 -Q 50 --donotsort -o BJ_R1.txt  BJ_R1_name.sorted.bam


And for my regions of interest that are overlapping exons in the other strand I got fragments, even though in the bam file they are inexistent over the strand of my feature. If I change the -s parameter to 2 the number of fragments diminish, but still I got a number. 

For example:

my feature of interest (with -s 1):

BJ_3ATLAS_0919 chr2 113260389 113261389 - 1001 69

gene that has an overlapping exon:

TTL chr2;chr2;chr2;chr2;chr2;chr2;chr2 113239743;113243494;113251720;113258783;113260489;113277859;113286258 113240078;113243572;113251952;113258918;113260758;113278002;113290222 +;+;+;+;+;+;+ 5163 3906


And if I check it with samtools, it makes sense with the bam file, but not with the featureCounts results:


505 Orion@mac:BJ$ samtools view -f 96 BJ_R1.bam "chr2:113260389-113261389" | wc -l

       0

508 Orion@mac:BJ$ samtools view -f 64 BJ_R1.bam "chr2:113260389-113261389" | wc -l

     191


Can you please point me if I am doing something incorrect or if I am missing something? Because I tried different things, like to check the bam file, the gtf coordinates, re-read the manual, but I can see it.


Thank you very much in advance.


Cheers,


Dulce






 

 

Wei Shi

unread,
Jul 14, 2015, 9:59:35 PM7/14/15
to sub...@googlegroups.com
Hi Dulce,

I do not quite understand your question. Are you saying there should be absolutely zero counts for genes that are located on the strand that is not sequenced? Some reads could be mapped to the unsequenced strand, leading to a small number of counts being generated for genes on that strand.

Could you re-run featureCounts on your data with the following commands and show me the counts for the two genes of concern so I can have a better understanding of your question?

featureCounts -p -s 1 -a BJ_genes.gtf -g gene_id -t exon -T 8 -o BJ_R1_s1.txt  BJ_R1_name.sorted.bam
featureCounts -p -s 2 -a BJ_genes.gtf -g gene_id -t exon -T 8 -o BJ_R1_s2.txt  BJ_R1_name.sorted.bam

Regards,
Wei

Dulce Vargas Landín

unread,
Jul 15, 2015, 10:26:53 AM7/15/15
to sub...@googlegroups.com
Hi Wei,

Sorry if I was not clear, I will try to explain it better. I have an exon on the + strand that comes from the TTL gene and I have another genomic feature on the - strand that within its coordinates contains the TTL exon. If I see bam file with IGV, I see no reads coming from the negative strand, even with the bigwig files. I am working with the ENCODE data so to check it I also used their bigwigs, and actually I am running featureCounts with their bam file (ENCFF000EPX). Just to let you know, I found this problem each time that my feature of interest contains an exon or something on the other strand that has a lot of reads, meaning that is not exclusive for this library and cell line. 

Here are the results of the commands you sent me:
If I change the strand (s2) it seems to correct it (but not really because there are no reads in real in that area)  but then it has problems in other genomic features that have no reads on that strand. 

featureCounts -p -s 1 -a BJ_genes.gtf -g gene_id -t exon -T 8 -o BJ_R1_s1.txt  BJ_R1_name.sorted.bam

TTL chr2;chr2;chr2;chr2;chr2;chr2;chr2 113239743;113243494;113251720;113258783;113260489;113277859;113286258 113240078;113243572;113251952;113258918;113260758;113278002;113290222 +;+;+;+;+;+;+ 5163 3909

BJ_0919 chr2 113260389 113261389 - 1001 76

featureCounts -p -s 2 -a BJ_genes.gtf -g gene_id -t exon -T 8 -o BJ_R1_s2.txt  BJ_R1_name.sorted.bam

TTL chr2;chr2;chr2;chr2;chr2;chr2;chr2 113239743;113243494;113251720;113258783;113260489;113277859;113286258 113240078;113243572;113251952;113258918;113260758;113278002;113290222 +;+;+;+;+;+;+ 5163 4159

BJ_3ATLAS_0919 chr2 113260389 113261389 - 1001 6


Another question is why if I ask to not sort the bam file (because is name sorted already) I have small differences in the results:
featureCounts -p -s 1 -a BJ_genes.gtf -g gene_id -t exon -T 8 --donotsort -o BJ_R1_s1.txt  BJ_R1_name.sorted.bam

TTL chr2;chr2;chr2;chr2;chr2;chr2;chr2 113239743;113243494;113251720;113258783;113260489;113277859;113286258 113240078;113243572;113251952;113258918;113260758;113278002;113290222 +;+;+;+;+;+;+ 5163 3946

BJ_0919 chr2 113260389 113261389 - 1001 78

featureCounts -p -s 2 -a BJ_genes.gtf -g gene_id -t exon -T 8 --donotsort -o BJ_R1_s2.txt  BJ_R1_name.sorted.bam

TTL chr2;chr2;chr2;chr2;chr2;chr2;chr2 113239743;113243494;113251720;113258783;113260489;113277859;113286258 113240078;113243572;113251952;113258918;113260758;113278002;113290222 +;+;+;+;+;+;+ 5163 4067

BJ_0919 chr2 113260389 113261389 - 1001 17



Thank you very much!


Cheers,


Dulce







Wei Shi

unread,
Jul 15, 2015, 7:20:19 PM7/15/15
to sub...@googlegroups.com
Hi Dulce,

Thanks for your clarification.

Firstly, the reason why you got slightly different results when you use the '--donotsort' option is because there are likely some multiple-mapping read pairs that were not properly sorted by samtools (ie incorrectly paired), resulting in incorrect read pair counting. You should let featureCounts re-sort the bam files when you count read pairs.

I actually wondered if the dataset is indeed strand specific. The TTL gene got similar counts when you used '-s 1' and '-s 2' (51633909 and 51634159 respectively), which is impossible if the data are strand specific.

Best,
Wei

Dulce Vargas Landín

unread,
Jul 16, 2015, 9:25:58 AM7/16/15
to sub...@googlegroups.com
Hi Wei,

It is stranded, the thing is that the library is FF not FR nor RV, that is why it gives similar results with s1 or s2 but it does not explain why I have fragment counts where there are not reads on that strand.

Thank you very much for your clarification about the sorting.

Best wishes,

Dulce 

Wei Shi

unread,
Jul 16, 2015, 8:00:39 PM7/16/15
to sub...@googlegroups.com
Hi Dulce,

Could you run the following command and save it to a file and then send the file to us? We need to have a close look at all the reads mapping to region "chr2:113260389-113261389" to understand what went wrong.

samtools view BJ_R1.bam "chr2:113260389-113261389"

Also, you got a count of 100176 with -s 1 and a count of 10016 with -s 2 for this region. Are these counts correct? In one of your earlier emails, you showed that there are only 191 read pairs mapping to this region.

Best,

Wei

Dulce Vargas Landín

unread,
Jul 21, 2015, 8:38:39 AM7/21/15
to sub...@googlegroups.com
Hi Wei,

I sent you the file you requested by a private e-mail. Let me know if you did not get it.

Thank you in advance.

Dulce

Wei Shi

unread,
Aug 26, 2015, 12:22:40 AM8/26/15
to Subread
Dear Dulce,

Thanks for sending through the file.

Are you using the latest version of Subread package (1.4.6-p4)? It seems to us that the problem you encountered was caused by a previous bug we have already fixed.

Best,

Wei

Wei Shi

unread,
Aug 27, 2015, 8:39:42 PM8/27/15
to Subread
Hi Dulce,

Please keep your messages to the forum so they can benefit others as well.

It turned out the issue is that featureCounts always takes the orientation of paired end reads as forward-reverse (FR) and it flips the orientation of the second read in the pair when counting the pairs. This caused a problem for your data since your reads have a FF orientation and that is the reason why you got some counts for the gene located on the negative strand (counts from the flipped second read).

We will add a new parameter to featureCounts to let users specify the orientation of paired end reads so that they can be properly counted when their orientation is not FR.

In the meantime, you may try counting reads instead of read pairs (remove -p parameter from your command). This will give you zero counts for the negative strand gene.

Best wishes,
Wei

Wei Shi

unread,
Sep 3, 2015, 9:25:56 PM9/3/15
to Subread
Dear Dulce,

We have added a new parameter '-S' to featureCounts in the latest release 1.4.6-p5, which should allow you to count your read pairs correctly.

Cheers,
Wei

Dulce Vargas Landín

unread,
Sep 7, 2015, 2:38:03 PM9/7/15
to Subread
Dear Wei,

Thank you very much I am going to try the new parameter!

Cheers,

Dulce
Reply all
Reply to author
Forward
0 new messages