We are pleased to announce a new Non-canonical ORFs track collection for the human genome assembly (GRCh38/hg38), bringing together several public databases of open reading frames (ORFs) that fall outside annotated protein-coding genes. While the human genome has roughly 20,000 annotated protein-coding genes, ribosome profiling (Ribo-seq) and proteomics have revealed widespread translation of ORFs in regions long considered non-coding, including 5' and 3' UTRs, long non-coding RNAs, pseudogenes, and alternative reading frames of known genes.
These non-canonical ORFs include upstream ORFs (uORFs) in 5' UTRs, which can regulate translation of the downstream coding sequence; small ORFs (sORFs), generally under 100 codons, many of which produce functional micropeptides; downstream ORFs (dORFs) in 3' UTRs; out-of-frame ORFs that overlap known coding sequence in an alternative frame; and ORFs in transcripts annotated as non-coding RNAs or pseudogenes. The collection gathers datasets from UTRannotator, GENCODE / TransCODE, 5ULTRA, nuORFdb, MetamORF, and OpenProt as individual subtracks.
Every ORF in every subtrack is annotated with the strength of its Kozak sequence, the sequence context around the start codon that governs how efficiently translation initiates. Features are colored by a categorical Kozak label:
Each subtrack offers filters for the start codon, Kozak strength, and a numeric Kozak translational efficiency score, along with dataset-specific filters such as ORF type and evidence category.
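To make the Kozak strength annotation more concrete, here is a minimal Python sketch of one commonly used rule of thumb: the context is called strong when position -3 is a purine (A or G) and position +4 is a G, moderate when only one of those holds, and weak when neither does. This is an illustration under stated assumptions, not the tracks' actual implementation (which comes from the VuTR authors); the function name `kozak_strength` and the labels "strong", "moderate", "weak", and "unknown" are hypothetical and may not match the categories used in the tracks.

```python
def kozak_strength(seq: str, start_index: int) -> str:
    """Classify the Kozak context of a start codon (illustrative rule only).

    seq         -- transcript sequence containing the ORF (assumed input)
    start_index -- 0-based index of the first base of the start codon

    Positions follow the usual convention: the first base of the start
    codon is +1, so +4 is the base right after the codon and -3 is three
    bases upstream of it. Non-ATG start codons are accepted as well.
    """
    seq = seq.upper()
    # Both the -3 and +4 positions must exist within the sequence.
    if start_index < 3 or start_index + 4 > len(seq):
        return "unknown"

    minus3_is_purine = seq[start_index - 3] in "AG"
    plus4_is_g = seq[start_index + 3] == "G"

    if minus3_is_purine and plus4_is_g:
        return "strong"
    if minus3_is_purine or plus4_is_g:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Consensus-like context GCCACC ATG G: purine at -3 and G at +4.
    print(kozak_strength("GCCACCATGGCG", 6))
    # Only the +4 G matches.
    print(kozak_strength("GCCTCCATGGCG", 6))
    # Neither position matches.
    print(kozak_strength("GCCTCCATGTCG", 6))
```

A categorical label like this is what a Kozak strength filter would select on; the numeric translational efficiency score mentioned above is a separate, finer-grained value that this sketch does not attempt to reproduce.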
See the Non-canonical ORFs collection page and the individual subtrack description pages for per-dataset methods, item counts, download URLs, and references.
We would like to thank the data providers who made these resources publicly available: Xiaolei Zhang, Nicola Whiffin, and the UTRannotator team at Imperial College London; Jonathan Mudge, Jorge Ruiz-Orera, John Prensner, Sebastiaan van Heesch, and the GENCODE / TransCODE consortium; Matthieu Chaldebas and the 5ULTRA team; Tamara Ouspenskaia, Travis Law, Karl Clauser, and colleagues at the Broad Institute of MIT and Harvard for nuORFdb; the MetamORF team at the TAGC laboratory, Aix-Marseille University; and Xavier Roucou and the OpenProt team at the Université de Sherbrooke. We also thank Eric Malekos (UCSC) for suggesting nuORFdb, and the VuTR authors (Whiffin lab) for the Kozak-strength implementation. Finally, we would like to thank Max Haeussler and Jairo Navarro for creating and releasing these UCSC Genome Browser tracks.
Jairo Navarro
UCSC Genome Browser
UC Santa Cruz Genomics Institute