featureCounts: Successfully assigned reads : 0


Yun-Lin Wang

Apr 17, 2018, 5:09:08 AM
to Subread
I'm new to featureCounts, and sorry for this basic question, but I cannot make sense of the output I am getting.

I am trying to use featureCounts to count features after alignment:

featureCounts -a ../02.Annotation/test_chr.gtf -o test2 C19_test.sam -t miRNA -g Name

But this is what I get:



summary



I searched the web and found that a possible reason for this problem is
"chromosome names don't match my reference genome data", and that -A
is one possible way to solve it; the other is to change the GTF file so that the names match.

So, how do I find out what the chromosome names look like in my SAM file (taken directly from HISAT2)?

And if I have to use -A, what does this "comma delimited file including chromosome alias names" look like (format)?

Clearly there is something I have missed.
Please let me know what I am missing.

Thank You

Wei Shi

Apr 17, 2018, 8:11:53 PM
to Subread
If you suspect that no reads were counted because the chromosome names do not match between your annotation and your SAM file, you can look at a few mapped reads in the SAM file and check whether their chromosome names match those in your annotation. To compare the full lists, compare the chr names in the reference genome (the one your reads were mapped to) with the chr names in your annotation.
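A minimal Python sketch of that comparison, assuming an uncompressed SAM file and a tab-delimited GTF. The function names and the example lines are made up for illustration; reference names are taken from the `@SQ` header lines and from column 3 of each mapped read, and annotation names from column 1 of the GTF.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from @SQ header lines and from mapped reads."""
    names = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header section: only @SQ lines carry reference names (SN: tag).
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        names.add(field[3:])
        elif len(fields) > 2 and fields[2] != "*":
            # Alignment line: column 3 is the reference name, "*" means unmapped.
            names.add(fields[2])
    return names


def gtf_chromosome_names(gtf_lines):
    """Collect the values of the first GTF column, skipping comment lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t")[0])
    return names


def compare_names(sam_lines, gtf_lines):
    """Report which names are shared and which appear in only one file."""
    sam_names = sam_reference_names(sam_lines)
    gtf_names = gtf_chromosome_names(gtf_lines)
    return {
        "shared": sam_names & gtf_names,
        "only_in_sam": sam_names - gtf_names,
        "only_in_gtf": gtf_names - sam_names,
    }


# Made-up example lines, not taken from any real data set.
sam_example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:1\tLN:1000",
    "read1\t0\t1\t100\t60\t50M\t*\t0\t0\t*\t*",
]
gtf_example = [
    'chr1\texample\tmiRNA\t90\t160\t.\t+\t.\tName "example-mir";',
]
report = compare_names(sam_example, gtf_example)
print(report)
```

With real files, open file handles can be passed in place of the lists, for example `compare_names(open("C19_test.sam"), open("../02.Annotation/test_chr.gtf"))`. An empty "shared" set would be consistent with the name mismatch described above.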

The usage message that featureCounts prints to the screen describes the format for the '-A' option:

"  -A <string>         Provide a chromosome name alias file to match chr names in
                      annotation with those in the reads. This should be a two-
                      column comma-delimited text file. Its first column should
                      include chr names in the annotation and its second column
                      should include chr names in the reads. Chr names are case
                      sensitive. No column header should be included in the
                      file."
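As an illustration of that format, here is a small Python sketch that writes such an alias file. The name pairs ("chr1" in the annotation against "1" in the reads, and so on) and the function name are hypothetical; the real pairs have to come from your own annotation and alignment files.

```python
import csv
import io


def write_alias_file(pairs, handle):
    """Write one 'annotation_name,read_name' line per pair, with no header."""
    writer = csv.writer(handle, lineterminator="\n")
    for annotation_name, read_name in pairs:
        writer.writerow([annotation_name, read_name])


# Hypothetical pairs: first the name used in the annotation,
# then the name used in the reads (names are case sensitive).
pairs = [("chr1", "1"), ("chr2", "2"), ("chrX", "X")]

buffer = io.StringIO()
write_alias_file(pairs, buffer)
alias_text = buffer.getvalue()
print(alias_text, end="")
```

To produce a file on disk, the same function can be called with `open("alias.txt", "w", newline="")` (the file name is only an example), and the resulting file is then passed to featureCounts with `-A alias.txt`.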

Also, you are counting your reads against miRNAs. Is this what you intended to do?

Best wishes,

Wei

 

Yun-Lin Wang

Apr 17, 2018, 10:21:39 PM
to Subread
Thank you for the response.
Honestly, I cannot find any chromosome information in my SAM file (sorry, I missed that...).
I guess the reason is that I built the alignment index from the human sequences extracted from a FASTA file downloaded from miRBase:
ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
And the GTF file was converted from their GFF3 file:
ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3

This is what my SAM file looks like:


And the GTF file:



So... in my case, is a GTF-based counting tool not suitable for me?
(Sorry for these basic questions.)

Thank You

Wei Shi

Apr 18, 2018, 12:36:23 AM
to Subread
Your SAM file contains the mapping coordinates of reads within miRNA gene sequences, whereas your GTF annotation contains the coordinates of features in the whole genome. These are two completely different coordinate systems and they cannot work together.

You will have to either map your reads to the genome and then use your current GTF file to get read counts, or obtain a GTF file in which the first column contains miRNA gene names and then count your current alignment files against that GTF file.
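The second option can be sketched in Python as follows. This is only an illustration under stated assumptions: each sequence in the FASTA file used to build the index becomes one feature spanning its full length on the plus strand, and the aligner is assumed to have named each reference by the first word of its FASTA header (worth confirming against the `@SQ` lines of the SAM header). The sequence names, the "sketch" source label and the function names are hypothetical; the feature type `miRNA` and the attribute `Name` mirror the `-t miRNA -g Name` options used earlier in this thread.

```python
def fasta_lengths(fasta_lines):
    """Map each sequence name (first word of the '>' header) to its length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else None
            if name is not None:
                lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_from_fasta(fasta_lines, feature_type="miRNA", attribute="Name"):
    """Build one GTF line per sequence, covering positions 1..length."""
    rows = []
    for name, length in fasta_lengths(fasta_lines).items():
        columns = [
            name,            # column 1: reference name as it appears in the reads
            "sketch",        # column 2: source label (arbitrary)
            feature_type,    # column 3: feature type, matched by -t
            "1",             # column 4: start (1-based)
            str(length),     # column 5: end
            ".",             # column 6: score
            "+",             # column 7: strand (assumed)
            ".",             # column 8: frame
            f'{attribute} "{name}";',  # column 9: attribute, matched by -g
        ]
        rows.append("\t".join(columns))
    return rows


# Made-up sequences, not real miRBase entries.
fasta_example = [
    ">mir-example-1 hypothetical description",
    "ACGUACGUAC",
    "ACGU",
    ">mir-example-2",
    "ACGUA",
]
gtf_rows = gtf_from_fasta(fasta_example)
print("\n".join(gtf_rows))
```

Because every feature covers a whole reference sequence, any read mapped to that sequence overlaps the feature, which is the situation the reply above describes.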

Yun-Lin Wang

Apr 18, 2018, 1:26:57 AM
to Subread
Thank you very much.
I hadn't noticed this beginner's mistake.
I'll try to modify the annotation file and count again.

Thank You

peachan...@gmail.com

Apr 29, 2018, 8:09:06 PM
to Subread
Hi!

I was hoping that someone could help me with a similar problem. After running featureCounts I also get 0% successfully assigned reads. I suspect that my problem is the same as Yun-Lin Wang's, and that the chromosome names in my .gtf do not match those in my reference genome. I am including some information below in case it helps. Is there any way I can use the -A option to fix this?

My reference genome fasta file looks like this:

>Ea.00g000010-v1.0.a1 ID=Ea.00g000010-v1.0.a1|Name=Ea.00g000010|organism=E apt|type=gene|length=750|location=tig00000001_pilon:10186..10935+

My bam files look like this:

FCC1L27ACXX:8:1307:18227:46127# 99 Ea.00g000010-v1.0.a1 11 50 100M = 95 184 TGCAGAAACTCCTTCTCTCCGCCGGCCTTTTAGGAGCCGCCACAGCATTGAACCGAGCTGTGCTGTGCTCCTGGCGGCAACTTCGACATGAGCAAGTGGGAGTT b beeeeefegggiiihiihhhiiifhiiiiiihiihiihhhiihefegfgeeeccccab´bcccbbcbcccaa{abccccccccbcbcbc^_bbaa°° AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Y:100 YT:z:UU NG:i:i

My .gtf looks like this:

tig00000001_pilon GenSAS?5a847ca1545e3-publish CDS 10186 10935 . + 0 transcript_id "Ea.00g000010.m01"; geneID "Ea.00g000010-v1.0.a1";

Thank you so much!

Wei Shi

Apr 30, 2018, 7:24:20 PM
to Subread
Yes, the chromosome names do not match between your bam file and your gtf annotation file. You can use the -A option to link them so that featureCounts can work.

peachan...@gmail.com

May 1, 2018, 1:55:55 PM
to Subread
Thank you so much, Wei Shi!

It seems that the -A option takes a text file in which one column holds the names of all the chromosomes in my gtf file and a second column holds the corresponding names in the bam file.

I know that in my .gtf file the chromosome name format is tig0000000X_pilon (where X is a number), and I have eight of them. However, when I check my bam file (after indexing) for the chromosome names with the command samtools idxstats *.bam, I see that I have as many chromosome names as transcripts and that the names in fact correspond to different gene IDs. So I don't know how to match the chromosome names in my gtf with the chromosome names in the bam file. Do you have any suggestions?

Thank you again for all your help.

(Sorry, I think this message reached you by email first.)

Wei Shi

May 1, 2018, 7:18:18 PM
to Subread
OK, then I think you cannot use this GTF annotation for counting, because the chr names cannot be matched. You will have to obtain or generate an annotation in which the chromosome names are the gene names used in your bam file.
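One way to generate such an annotation, sketched in Python under stated assumptions: the text output of `samtools idxstats` (reference name, sequence length, mapped count and unmapped count, tab-delimited) is turned into a featureCounts SAF table in which every reference sequence is one gene spanning its full length. The function name is hypothetical, the plus strand is an assumption, and the read counts in the example line are made up; only the reference name and the length of 750 come from the FASTA header posted earlier in this thread.

```python
def saf_from_idxstats(idxstats_lines):
    """Turn 'samtools idxstats' text output into SAF annotation lines."""
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]  # SAF header line
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the final "*" line for unplaced reads.
        if len(fields) < 2 or fields[0] == "*":
            continue
        name, length = fields[0], int(fields[1])
        if length <= 0:
            continue
        # Each reference sequence is treated as one gene covering 1..length.
        rows.append(f"{name}\t{name}\t1\t{length}\t+")
    return rows


# Reference name and length from the thread; the two counts are made up.
idxstats_example = [
    "Ea.00g000010-v1.0.a1\t750\t2\t0",
    "*\t0\t0\t0",
]
saf_rows = saf_from_idxstats(idxstats_example)
print("\n".join(saf_rows))
```

The resulting table could be saved to a file and supplied to featureCounts with `-F SAF -a` followed by that file's name, in place of the GTF.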