Can featureCounts keep or filter reads of a specific length?


Shun Liu

Jul 7, 2020, 3:04:37 PM
to Subread
Hi,

I would like to keep only 22-32 nt sequences for certain RNAs, or more generally count only reads within a specific length range. Can featureCounts do this? It would be useful in genomic regions where clusters (e.g., phased piRNA/siRNA clusters) have been identified. Each cluster may have a predominant RNA size, and I would like to count only reads of that size. The size can be a range (e.g., 22-23 nt or 26-32 nt) or a single length (e.g., 22 or 32 nt).

Yang LIAO

Jul 7, 2020, 5:31:40 PM
to Subread
You can specify the minimum overlap length between a read and a feature (or meta-feature) with the --minOverlap option. This removes reads that are too short, but it sets no upper limit, so longer reads are still kept.

You may use AWK to pre-process the BAM file if you're using Linux or macOS:

$ samtools view -h my_input.bam | awk ' $0~/^@/ {print;next} length($10)>=22 && length($10)<=32 {print} ' > my_output.sam

Then you can run featureCounts on my_output.sam. "length($10)" is the length of the read sequence (column 10 of a SAM record), so my_output.sam contains only the header lines (starting with '@') and reads between 22 and 32 nt long, inclusive. You can change the conditions applied to "length($10)" to select exact lengths, a range of lengths, or a combination of these.
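For the combinations of lengths and ranges mentioned above, the same AWK test can be wrapped in a small shell function. This is only a sketch: filter_sam_by_length is a hypothetical helper (not part of samtools or Subread), and the length specification format ("22", "22-23,26-32") and the annotation.gtf and counts.txt file names are assumptions made for illustration.

```shell
# filter_sam_by_length: hypothetical helper, not part of samtools or Subread.
# Reads SAM on stdin; prints header lines plus reads whose sequence length
# matches SPEC, a comma-separated list of lengths or inclusive ranges,
# e.g. "22" or "22-23,26-32" (this SPEC format is an assumption).
filter_sam_by_length() {
    awk -F'\t' -v spec="$1" '
        BEGIN {
            # Parse SPEC into parallel arrays of lower and upper bounds.
            n = split(spec, parts, ",")
            for (i = 1; i <= n; i++) {
                m = split(parts[i], b, "-")
                lo[i] = b[1] + 0
                hi[i] = (m > 1 ? b[2] : b[1]) + 0
            }
        }
        /^@/ { print; next }              # always keep header lines
        {
            len = length($10)             # column 10 is the read sequence
            for (i = 1; i <= n; i++)
                if (len >= lo[i] && len <= hi[i]) { print; next }
        }
    '
}

# Usage (left commented so the block runs without samtools installed):
#   samtools view -h my_input.bam | filter_sam_by_length 22-23,26-32 > my_output.sam
#   featureCounts -a annotation.gtf -o counts.txt my_output.sam
```

Note that a record whose sequence is stored as "*" has length 1 under this test, so it is dropped unless 1 is in the specification.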