featureCount over counting multi-mapping long reads

584 views
Skip to first unread message

Laraib Malik

unread,
Feb 28, 2018, 5:34:58 PM2/28/18
to Subread
Hi,

I came across an issue while using featureCounts to quantify long read datasets and wanted to report that. It does not seem to handle multi-mapping reads correctly, as output by the aligner, minimap2. I've attached a small sample file where one read has 7 possible alignments. I run featureCounts with the "-L", "-M", "-O", and "--fraction" flags. However, the output I get still treats it as 7 separate alignments, which should not be the case and the total count in output is given 6. I was wondering why that would happen? Note that the read is aligned against the mm10 mouse genome. I can email the output file for this as well if needed. This was just a test case, but I've encountered multiple cases where there's multi-mapping and featureCounts overcounts the total number of reads and, in turn, expressed features. Let me know if I should be using different parameters. Would be grateful for your help.

-Laraib

Command I ran: "/subread-1.5.3-Linux-x86_64/bin/featureCounts -t exon -g gene_id -a Mus_musculus.GRCm38.84.gtf -o out.txt longSingle.sam -L -O -M --fraction"

longSingle.sam

Yang LIAO

unread,
Feb 28, 2018, 5:43:09 PM2/28/18
to Subread
Hi Laraib,

The alignment records in your SAM file don't have the "NH" tag. According to SAM tag specification 


, "NH" tag is a standard tag for "number of reported alignments that contains the query in the current record". Other mainstream read aligners (inc. Subread), as we have tested, have this "NH" tag for the multi-mapping reads.

If this tag is missing, featureCounts cannot tell if a read was mapped multiple times in the input SAM/BAM file.

Cheers,
Yang

Laraib Malik

unread,
Feb 28, 2018, 6:04:59 PM2/28/18
to Subread
Thank you for the clarification. 

Best,
Laraib

Laraib Malik

unread,
Mar 1, 2018, 12:49:26 PM3/1/18
to Subread
Hi,

I added the "NH" tag to the file but still get the same result. Do the records seem correct in the attached file? 
longSingle.sam

Laraib Malik

unread,
Mar 27, 2018, 2:58:56 PM3/27/18
to Subread
Hi,

I still face the same issue in the output even with the "NH" flag added. Are there any other flags that need to be added for multi-mapping reads? I've added a sample file in my previous post as well with the same concern. 

Yang LIAO

unread,
Mar 27, 2018, 6:58:46 PM3/27/18
to Subread
Hi Laraib,

Can you provide us the SAM or BAM file where the NH tag was added?

Cheers, Yang

Laraib Malik

unread,
Mar 27, 2018, 7:10:25 PM3/27/18
to Subread
Please find it attached. Let me know if I'm doing something wrong here. Thanks.
longSingle.sam

Yang LIAO

unread,
Mar 27, 2018, 7:42:52 PM3/27/18
to Subread
Thanks!

I tried to convert the SAM to BAM format before counting and the results became correct. It seemed to be a bug in featureCounts' long-read mode when the input format is SAM. 

Sorry for the troubles. The bug will be fixed in the next release. As a quick fix, you may convert the SAM into BAM format and run featureCounts on the BAM files.

Cheers,
Yang
Reply all
Reply to author
Forward
0 new messages