We are proud to announce the release of four new tracks and a new track group on hg38 dedicated to the NIH's Human Pangenome Reference Consortium (HPRC) data.
No single reference genome such as hg19 or hg38 can accurately represent
human genetic diversity. The HPRC's goal is to address this by sequencing
thousands of human genomes at high quality and building new tools for
working with them. The first data release
from this project consists of 47
phased, diploid assemblies, more than 99% accurate at the structural and
base pair levels. We obtained alignments of these new genomes to hg38 from
the HPRC analysis groups and have created new Genome Browser annotation
tracks that visualize the differences between the established hg38
reference and the 94 new pangenome haplotype assemblies (two per diploid genome). The new tracks are grouped
into short and structural variants, with the latter further split by type
(insertion, deletion, inversion, duplication, etc.). We plan to update these
and add other tracks as soon as more HPRC data is released.
In this first HPRC data release, we are adding four new tracks to this new track group. Details on each of the tracks are as follows:
The Short Variants container track shows short nucleotide variants, each a few base pairs long, found by aligning the HPRC genomes to the hg38 reference assembly using the Minigraph-Cactus approach. Short variants have been used in population genetics to investigate population-specific allele frequencies and genetic diversity, and in disease association studies. The track consists of three subtracks:
The Rearrangements container track shows various rearrangements in the HPRC assemblies with respect to hg38. The types include indels, duplications, inversions, and other more complicated rearrangements.
There are five tracks in the Rearrangements composite track:
Many of these features are unique to this dataset, although overlap can be found with other structural variant databases such as DGV. Potential applications of these rearrangements include validating new and existing data and better understanding the prevalence of rearrangements in diverse populations, many of which are underrepresented in current clinical and genomic databases.
The Chain/Net track shows regions of the human genome that are alignable between the HPRC genomes as well as hg38 and T2T-CHM13. A total of 176 maternal and paternal haplotypes were used in this analysis. The configuration page for this track sorts the haplotypes into 14 subpopulations as follows:
The 90-way Multiple Alignment track contains multiple alignments of 90 human genomes generated by the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments. This method builds graphs containing all forms of genetic variation while allowing the use of current mapping and genotyping tools. The configuration page sorts the maternal and paternal haplotypes by the same 14 subpopulations described above.
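For readers who prefer to locate these tracks programmatically, the following is a minimal sketch that queries the Genome Browser's public REST API for the hg38 track list and filters it by keyword. The keyword "hprc", the helper names, and the assumption that subtracks appear as nested objects inside their container track are illustrative guesses, not details confirmed by this announcement; check the actual track names in the Browser before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu"


def build_list_tracks_url(genome="hg38"):
    """Build the URL for the REST API endpoint that lists a genome's tracks."""
    query = urllib.parse.urlencode({"genome": genome})
    return f"{API_BASE}/list/tracks?{query}"


def fetch_tracks(genome="hg38"):
    """Download the track list; the tracks are keyed under the genome name."""
    with urllib.request.urlopen(build_list_tracks_url(genome), timeout=30) as resp:
        payload = json.load(resp)
    return payload.get(genome, {})


def find_tracks(tracks, keyword):
    """Return {track_name: shortLabel} for tracks matching keyword.

    Matches case-insensitively against the track name and its labels, and
    recurses into nested objects on the assumption that container tracks
    hold their subtracks as nested dictionaries.
    """
    keyword = keyword.lower()
    hits = {}
    for name, info in tracks.items():
        if not isinstance(info, dict):
            continue  # plain attributes such as "type" or "group"
        text = " ".join(
            [name, str(info.get("shortLabel", "")), str(info.get("longLabel", ""))]
        ).lower()
        if keyword in text:
            hits[name] = info.get("shortLabel", "")
        hits.update(find_tracks(info, keyword))
    return hits


# Example usage (requires network access; "hprc" is an assumed keyword):
#   for name, label in sorted(find_tracks(fetch_tracks("hg38"), "hprc").items()):
#       print(name, label, sep="\t")
```

The filtering step is kept separate from the download so it can be tried on a saved copy of the track list without contacting the server.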
We are always looking for feedback. If you would like to see other HPRC data, or the data presented differently, please contact us at gen...@soe.ucsc.edu. Likewise, if you find this data useful and see potential improvements, we would be interested in hearing from you.
We would like to thank the Human Pangenome Reference Consortium for taking on this genomics challenge and providing these data. In particular, we would like to thank Benedict Paten, Heng Li, and Glenn Hickey for their help in putting these Browser tracks together.