Recommended reference sequences


Thomas Sandmann

Oct 28, 2016, 1:31:25 PM
to Sailfish Users Group
Dear Salmon users,

I was wondering whether you have any recommendations for the best source of transcript reference sequences.

As far as I understand, the reference should contain all known transcripts that may give rise to sequenced reads.

1. Using Salmon to map to a transcript reference

Ensembl provides multiple FASTA files, e.g. as shown in this table.

There are separate FASTA files for:
  • cDNAs with "transcript sequences for actual and possible genes, including pseudogenes, NMD and the like."
  • ncRNAs with "sequences corresponding to non-coding RNA genes (ncRNA) both short and long."
The transcript identifiers in the ncRNA FASTA file are not found in the cDNA FASTA file.
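One way to check that the two Ensembl files really have no identifiers in common is to collect the record IDs from each and intersect the sets. This is a minimal sketch, assuming the ID is the first whitespace-delimited token after ">" (as in Ensembl-style headers); the toy records in the demo are hypothetical, not real Ensembl entries.

```python
import io
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect record IDs: the first whitespace-delimited token after '>'.

    Assumption: headers look like '>ID description...'. Pipe-delimited
    headers without spaces would be returned whole.
    """
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip bare '>' lines with no ID
                ids.add(tokens[0])
    return ids


if __name__ == "__main__":
    # Hypothetical toy records standing in for the cDNA and ncRNA files.
    cdna = io.StringIO(">TX0001 cdna\nACGT\n>TX0002 cdna\nGGCC\n")
    ncrna = io.StringIO(">TX0003 ncrna\nTTAA\n")
    shared = fasta_ids(cdna) & fasta_ids(ncrna)
    print(sorted(shared))  # an empty list means the ID sets are disjoint
```

For the real downloads, the same function can be fed an open file handle (for compressed files, `gzip.open(path, "rt")`).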

GENCODE provides an alternative source of reference transcript sequences here (e.g. for the mouse).
Again, there are separate FASTA files for:
  • Nucleotide sequences of all transcripts on the reference chromosomes
  • Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes
So it seems that, to get a complete set of RNA sequences, both reference FASTA files (cDNA and ncRNA) need to be combined?
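If the files do need to be combined, a plain concatenation works as long as no ID appears twice. The sketch below is a hypothetical helper, not part of Salmon: it concatenates plain or gzip-compressed FASTA files and keeps only the first record seen for each ID. It assumes the ID is the first whitespace-delimited token of the header.

```python
import gzip
from pathlib import Path
from typing import Iterable, List, Set, TextIO


def open_fasta(path: Path) -> TextIO:
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def merge_fasta(paths: Iterable[Path], out: TextIO) -> List[str]:
    """Write all records from `paths` to `out`, keeping the first record per ID.

    Returns the IDs of records that were skipped as duplicates.
    Lines before the first header in a file are dropped.
    """
    seen: Set[str] = set()
    skipped: List[str] = []
    for path in paths:
        with open_fasta(path) as handle:
            keep = False  # whether the current record is being written
            for line in handle:
                if line.startswith(">"):
                    tokens = line[1:].split()
                    rec_id = tokens[0] if tokens else ""
                    keep = rec_id not in seen
                    if keep:
                        seen.add(rec_id)
                    else:
                        skipped.append(rec_id)
                if keep:
                    # Ensure every written line ends with a newline, so the
                    # next file's first header starts on its own line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return skipped
```

The merged file could then be passed to `salmon index` through its `-t` option. Reporting skipped IDs rather than silently dropping them makes it easy to confirm that the cDNA and ncRNA sets were in fact disjoint.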

Thanks for any input,

Thomas