FeatureCounts SAF file gene ID to Entrez symbol

Helen Falk

Jun 3, 2018, 9:03:41 PM
to Subread
Hello,

I am trying to convert the gene IDs in the featureCounts SAF annotation file to Entrez gene symbols. I tried both HUGO and the DAVID web service, but there are always a few IDs that cannot be converted. Do you have any advice on this? Thank you very much.

Helen

Yang LIAO

Jun 3, 2018, 9:47:46 PM
to Subread
Can you give a few examples of the Entrez IDs that are in Subread's embedded annotation files but are not recognized by DAVID or HUGO? Also, are you using the hg19, hg38, mm9 or mm10 annotation?

Yang LIAO

Jun 5, 2018, 8:39:03 PM
to Subread
Thanks for the examples!

I found that the gene IDs that were in neither HUGO nor DAVID are all obsolete -- some of them were gene models that were later found to be invalid (e.g. 100126476, 105379398, 105371162), and the others were replaced by other gene IDs (e.g. 84849, 117153, 400863). In other words, they have been removed from the current NCBI gene lists. The gene IDs that were in DAVID but not in HUGO seemed normal; I don't know how HUGO selects genes, but it apparently doesn't include all the NCBI genes.

The inbuilt annotations in the Subread package seem a little old, although the analysis results should not differ much if the latest NCBI annotation is used.
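
For anyone who wants to triage unmatched IDs the same way, here is a minimal Python sketch. It assumes a tab-separated history table laid out like NCBI's gene_history file (columns tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date, with "-" in GeneID when there is no replacement); please check the header of the file you actually download. The rows and IDs below are made up for illustration and are not the IDs from this thread, and the function names are hypothetical.

```python
import csv
import io

# Toy stand-in for a gene_history-style table. Column names are an assumption
# about the real file's header; the rows are invented for illustration only.
TOY_HISTORY = (
    "#tax_id\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\tDiscontinue_Date\n"
    "9606\t-\t1001\tLOC1001\t20170101\n"
    "9606\t2002\t1002\tOLDSYM\t20160101\n"
)


def load_history(handle):
    """Map each discontinued gene ID to its replacement ID, or None if it has none."""
    history = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        replacement = row["GeneID"]
        history[row["Discontinued_GeneID"]] = None if replacement == "-" else replacement
    return history


def classify(gene_ids, history):
    """Label each ID as 'withdrawn', 'replaced by <id>', or 'not in history'."""
    labels = {}
    for gid in gene_ids:
        if gid not in history:
            labels[gid] = "not in history"  # not listed as discontinued
        elif history[gid] is None:
            labels[gid] = "withdrawn"       # discontinued with no replacement
        else:
            labels[gid] = "replaced by " + history[gid]
    return labels


history = load_history(io.StringIO(TOY_HISTORY))
result = classify(["1001", "1002", "3003"], history)
for gid, label in result.items():
    print(gid, label)
```

With a real history file, replace io.StringIO(TOY_HISTORY) with an open file handle and pass in the gene IDs that failed to convert.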

Wei Shi

Jun 6, 2018, 7:51:21 PM
to Subread
Hi Helen,

Please post all your correspondence to the forum so that the examples you sent can also be seen by others. Thanks.

We deliberately do not update the inbuilt annotations, so that downstream data analyses remain consistent. Although you will see a few obsolete gene IDs, there is not much difference between the current version of the annotation in Rsubread and later versions. If you do want to use the latest annotation, you can download it from a public database such as NCBI RefSeq and then provide it to featureCounts.
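
As an illustration of supplying an external annotation: featureCounts can read a GTF file directly, so conversion is only needed if you prefer SAF, which is a tab-delimited table with the columns GeneID, Chr, Start, End and Strand. Below is a minimal Python sketch of that conversion. It assumes the gene identifier you want sits in the gene_id attribute of each exon line, which may not hold for every GTF (check where your download stores the Entrez ID). The toy records and the function name gtf_to_saf are hypothetical.

```python
import io
import re

# Toy GTF records, invented for illustration (not real annotation).
TOY_GTF = (
    'chr1\tToy\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n'
    'chr1\tToy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tToy\texon\t500\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr2\tToy\texon\t50\t80\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
)

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gtf_to_saf(gtf_handle, feature="exon"):
    """Return SAF rows (header first) for GTF lines of the given feature type."""
    rows = [("GeneID", "Chr", "Start", "End", "Strand")]
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue  # keep only well-formed lines of the requested feature
        match = GENE_ID.search(fields[8])
        if match is None:
            continue  # no gene_id attribute on this line
        # GTF columns: 0 = chromosome, 3 = start, 4 = end, 6 = strand
        rows.append((match.group(1), fields[0], fields[3], fields[4], fields[6]))
    return rows


saf_rows = gtf_to_saf(io.StringIO(TOY_GTF))
saf_text = "\n".join("\t".join(row) for row in saf_rows)
print(saf_text)
```

Writing saf_text to a file gives a table that could be passed to featureCounts as an SAF annotation.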

Best regards,

Wei

Helen Falk

Jun 7, 2018, 1:07:33 PM
to Subread
Dear Yang and Wei,

Thanks so much for all the help! I am sorry, I did not realize my emails were not posted here. Here is the information relating to this topic:

We are using the hg38 annotation.

1. Using the annotation list downloaded from HUGO, with "Approved Symbol" and "Ensembl ID (supplied by Ensembl)" selected, there are 2219 gene IDs from the SAF file that could not be matched. Please see the file "geneID_noHUGOSymbol.txt".
2. I further processed the 2219 IDs via the DAVID website. There are 330 that still could not be annotated. They are in the file "geneID_noHUGOSymbol_noDAVIDSymbol.txt".

I was wondering whether there is a "standard" annotation file, and Dr. Shi and Dr. Liao addressed my concern in their answers above. I really appreciate your replies.

Helen
geneID_noHUGOSymbol_noDAVIDSymbol.txt
geneID_noHUGOSymbol.txt

Wei Shi

Jun 7, 2018, 7:29:10 PM
to Subread
Thanks Helen.