GENCODE Genes V43 for human (hg38/hg19) and VM32 for mouse (mm39)

Luis Nassar

February 24, 2023, 18:09:03
To: genome-...@soe.ucsc.edu

We are pleased to announce the release of five new GENCODE Gene tracks corresponding to GENCODE release V43 for human and VM32 for mouse. While all of the tracks are built from the GENCODE release, they fall into two categories. Two of these tracks, GENCODE V43 (hg38) and GENCODE VM32 (mm39), were built with our knownGene pipeline and are now the default gene tracks for those assemblies. The knownGene pipeline builds extensive associations from the annotations, which allows us to show additional metadata for each item and to link to external resources. The track description pages for these tracks contain options for configuring the display, such as showing non-coding genes, splice variants, and pseudogenes. Different tags and labels may also be toggled.

The remaining three tracks are nested within our GENCODE Versions superTrack, one for each of the three assemblies: hg19, hg38, and mm39. For human, the GENCODE V43 annotations were mapped to hg38 and then back-mapped to the hg19 assembly. New GENCODE releases now assign each transcript a rank within its gene. The transcript rank may be used to limit the number of transcripts displayed in a principled manner. More details about transcript ranking can be found on the track description page. For all three assemblies, the gene sets contain the following tracks:

  • Basic - a subset of the Comprehensive set.
  • Comprehensive - all GENCODE coding and non-coding transcript annotations, including polymorphic pseudogenes. This includes both manual and automatic annotations.
  • Pseudogenes - all annotations except polymorphic pseudogenes.

The hg38 and mm39 assemblies also include the following track:

  • PolyA - polyA signals and sites manually annotated on the genome based on transcribed evidence (ESTs and cDNAs) of 3' end of transcripts containing at least 3 A's not matching the genome.
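
As a rough illustration of the rank-based filtering mentioned above, the Python sketch below keeps only the top-ranked transcripts of each gene. The Transcript record, its field names, and the convention that rank 1 is the top-ranked transcript are assumptions made for this example, not the tracks' actual schema; the track description page has the real details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record; field names are illustrative only."""
    transcript_id: str
    gene_id: str
    rank: int  # assumption: 1 is the top-ranked transcript within its gene


def filter_by_rank(transcripts, max_rank):
    """Return the transcripts whose within-gene rank is <= max_rank."""
    return [t for t in transcripts if t.rank <= max_rank]


# Made-up example data, not taken from any GENCODE release.
example = [
    Transcript("tx1", "geneA", 1),
    Transcript("tx2", "geneA", 2),
    Transcript("tx3", "geneA", 3),
    Transcript("tx4", "geneB", 1),
    Transcript("tx5", "geneB", 2),
]

# Keep at most the two top-ranked transcripts of each gene.
top_two = filter_by_rank(example, max_rank=2)
print([t.transcript_id for t in top_two])
```

Because the rank is assigned per gene, a single threshold trims every gene to the same maximum number of transcripts without having to group the records first.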

Below is a summary of the contents found in each release. For more details visit the GENCODE site.

GENCODE V43 Release Stats

Genes                                         Observed  Transcripts                                          Observed
Protein-coding genes                          19,393    Protein-coding transcripts                           89,411
Long non-coding RNA genes                     19,928    - full length protein-coding                         64,004
Small non-coding RNA genes                    7,566     - partial length protein-coding                      25,407
Pseudogenes                                   14,737    Nonsense mediated decay transcripts                  21,354
Immunoglobulin/T-cell receptor gene segments  410       Long non-coding RNA loci transcripts                 58,023
Total number of distinct translations         65,519    Genes that have more than one distinct translation   13,618

GENCODE VM32 Release Stats

Genes                                         Observed  Transcripts                                          Observed
Protein-coding genes                          21,565    Protein-coding transcripts                           58,913
Long non-coding RNA genes                     14,834    - full length protein-coding                         45,219
Small non-coding RNA genes                    6,105     - partial length protein-coding                      13,694
Pseudogenes                                   13,722    Nonsense mediated decay transcripts                  7,211
Immunoglobulin/T-cell receptor gene segments  701       Long non-coding RNA loci transcripts                 26,421
Total number of distinct translations         45,163    Genes that have more than one distinct translation   10,914
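
In both release tables, the full-length and partial-length rows are subdivisions of the protein-coding transcript count. The short Python check below confirms that the two sub-counts add up to the total for each release; the dictionary keys are illustrative names chosen for this sketch, and the numbers are copied from the tables above.

```python
# Counts copied from the release stats tables; key names are illustrative.
release_stats = {
    "GENCODE V43 (human)": {
        "protein_coding": 89_411,
        "full_length": 64_004,
        "partial_length": 25_407,
    },
    "GENCODE VM32 (mouse)": {
        "protein_coding": 58_913,
        "full_length": 45_219,
        "partial_length": 13_694,
    },
}


def subtotals_match(stats):
    """True if full-length plus partial-length equals the protein-coding total."""
    return stats["full_length"] + stats["partial_length"] == stats["protein_coding"]


for release, stats in release_stats.items():
    print(release, subtotals_match(stats))
```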

We would like to thank the GENCODE project for providing these annotations.
