ENSEMBL Transcript fasta file


Pankaj Agarwal

Dec 15, 2016, 3:29:09 PM
to Sailfish Users Group
Hi,
I am a first-time user of Salmon and did a quantification run for an RNA-seq data set using the Ensembl transcript FASTA file (Homo_sapiens.GRCh38.cdna.all.fa).
A couple of questions:
1. I have never used a transcript FASTA file directly with other tools (I have only used the human reference genome and the GTF annotation file), so I just wanted to verify that this is the correct transcript FASTA file.
2. In the quantification results file I get output as follows:
...
ENST00000414852.1       16      7.89476 0       0
ENST00000390399.3       387     235.968 0       0
ENST00000610439.4       381     230.638 0       0
ENST00000390449.3       343     197.461 0       0
ENST00000632425.1       407     253.985 0       0
...

I am not sure what the period followed by a number means.  I noticed it comes from the "Homo_sapiens.GRCh38.cdna.all.fa" file, so it might be a better question for Ensembl.

Thanks,

- Pankaj


Rob

Dec 20, 2016, 10:36:01 AM
to Sailfish Users Group
Hi Pankaj,

  Yes, this is a perfectly reasonable transcriptome FASTA file to use.  Unless you also want to quantify more exotic (non-coding) transcripts (http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/ncrna/), it should be fine.
Regarding the output, the number after the period denotes the Ensembl version of each transcript.  This is part of Ensembl's naming convention.  You can read more about that on their homepage.
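
If you later need to match these IDs against an annotation that lists them without versions, one option is to split the ID at the period. The following is only a minimal Python sketch of that idea; the function name split_ensembl_id is made up for illustration and is not part of Salmon or any Ensembl tool.

```python
def split_ensembl_id(versioned_id):
    """Split a versioned Ensembl ID into (stable_id, version).

    Hypothetical helper for illustration only. Assumes the version,
    when present, is the integer after the first period.
    """
    stable_id, sep, version = versioned_id.partition(".")
    if not sep:
        # No period in the ID, so there is no version to report.
        return stable_id, None
    return stable_id, int(version)


# IDs taken from the quantification output quoted earlier in the thread.
for name in ["ENST00000414852.1", "ENST00000610439.4"]:
    print(split_ensembl_id(name))
```

Keeping the version as a separate value, rather than discarding it, makes it easier to notice when two files refer to different versions of the same transcript.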

Best,
Rob