Hp 3760 Driver

Randell Magtoto

Aug 3, 2024, 4:46:45 PM

to illopicen

Rare oncogenic driver events, particularly affecting the expression or splicing of driver genes, are suspected to substantially contribute to the large heterogeneity of hematologic malignancies. However, their identification remains challenging.

To address this issue, we generated the largest dataset to date of matched whole genome sequencing and total RNA sequencing of hematologic malignancies from 3760 patients spanning 24 disease entities. Taking advantage of our dataset size, we focused on discovering rare regulatory aberrations. Therefore, we called expression and splicing outliers using an extension of the workflow DROP (Detection of RNA Outliers Pipeline) and AbSplice, a variant effect predictor that identifies genetic variants causing aberrant splicing. We next trained a machine learning model integrating these results to prioritize new candidate disease-specific driver genes.

We found a median of seven expression outlier genes, two splicing outlier genes, and two rare splice-affecting variants per sample. Each category showed significant enrichment for already well-characterized driver genes, with odds ratios exceeding three among genes called in more than five samples. On held-out data, our integrative modeling significantly outperformed modeling based solely on genomic data and revealed promising novel candidate driver genes. Remarkably, we found a truncated form of the low-density lipoprotein receptor LRP1B transcript to be aberrantly overexpressed in about half of hairy cell leukemia variant (HCL-V) samples and, to a lesser extent, in closely related B-cell neoplasms. This observation, which was confirmed in an independent cohort, suggests LRP1B as a novel marker for an HCL-V subclass and a yet unreported functional role of LRP1B within these rare entities.

Altogether, our census of expression and splicing outliers for 24 hematologic malignancy entities and the companion computational workflow constitute unique resources to deepen our understanding of rare oncogenic events in hematologic cancers.

Hematologic malignancies are characterized by abnormal blood cells in the bone marrow, peripheral blood, or lymphatic organs. They can occur in various forms, affecting the myeloid or lymphoid cell lineage. In 2020, hematologic malignancies accounted for approximately 2.5% of new cancer cases globally and 3.1% of cancer-associated mortality [1]. While some subtypes, like myeloproliferative neoplasm, are highly uniform in their manifestation and genetic profile, others, like myelodysplastic neoplasm, display a significantly broader spectrum. This heterogeneity hampers correct diagnosis and therapy decisions, which negatively impacts treatment outcomes and survival [2]. Thus, a better understanding of the variety of oncogenic events for each disease entity is of utmost interest to refine diagnostics and facilitate the development of new therapeutic options.

Within the last decade, the identification of driver genes in hematologic malignancies has been dramatically enhanced. To this end, functional screens such as CRISPR [3, 4] and transposon screens [5] on model systems have been applied. Complementary to these efforts, next-generation sequencing analyses of primary clinical samples [6, 7] were employed, which better capture the in vivo biology. This research has provided valuable insights into the underlying genetic landscape of each entity and triggered a revision of the classification systems, which now emphasize genomics-based categorization of various leukemia and lymphoma entities [2, 8,9,10]. However, despite significant progress in understanding recurrent driver mutations in hematologic malignancies, much remains to be learned about the rare events within each disease entity that drive their individual development and progression [11, 12]. Such rare events could arise not only from somatic mutations but also from rare germline variants, as supported by an increasing number of studies unraveling the implication of rare genetic predispositions to cancer [13,14,15,16,17,18].

To address this gap, we conducted a comprehensive analysis of genomes (whole genome sequencing, WGS) and matched transcriptomes (total RNA sequencing, RNA-seq) of tumor tissues from 3760 patients spanning 24 hematologic malignancy entities (Fig. 1). We analyzed these data using RNA-seq-based expression and splicing outlier callers, as well as AbSplice [36], a tool we recently published that predicts rare genetic variants causing aberrant splicing. We demonstrate how these results can be utilized to identify a novel marker for a rare entity and enhance the prediction of hematologic malignancy driver genes beyond the commonly used mutational recurrence. In summary, our study aims to deepen our understanding of the role of rare gene expression and RNA splicing aberrations in the development of hematologic malignancies and to provide novel driver gene candidates.

Fig. 1. Overview of the study. Dataset: Whole genome sequencing and total RNA sequencing of 3760 hematologic malignancies spanning 24 different disease entities. Bioinformatic processing: On genomic data, IntOGen captures recurrent mutational patterns [37], and AbSplice predicts variants causing aberrant splicing [36]. On RNA-seq data, OUTRIDER calls expression outliers of commonly expressed genes [38], NB-act calls overexpression of rarely expressed genes (Methods), and FRASER calls splicing outliers [39]. Census: a unique collection of genomic and transcriptomic aberrations for 24 hematologic malignancy entities. Downstream analysis: driver gene prediction and enrichment analysis per disease entity.

DNA and total RNA from peripheral blood and bone marrow samples were extracted using the MagNA Pure 96 Instrument with the MagNA Pure 96 DNA and Viral NA LV Kit and the MagNA Pure 96 Cellular RNA LV Kit, respectively (Roche LifeScience, Mannheim, Germany). WGS and RNA-seq were performed on the prepared samples (Supplementary Materials and Methods).

Variant calling of single-nucleotide variants, short insertions and deletions, structural variants, copy number variations, and gene fusions was performed on all samples as described previously [43,44,45,46] (Supplementary Materials and Methods). The analysis was based on the GENCODE v33 [47] annotation and the reference genome GRCh37.

The variance component analysis was conducted using a linear model. As independent variables, we used the logarithmized copy ratio and binary indicators for the presence of rare VEP high-impact variants, rare promoter variants, and rare structural variants in each gene-sample combination. As the dependent variable, we used the autoencoder-corrected expression z-score of each gene-sample combination. The linear model was fitted for every gene individually, followed by ANOVA. The variance explained by each independent variable was calculated as its sum of squares divided by the total sum of squares across all independent variables, so that the resulting variance components add up to one. We report the mean value across genes.
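The per-gene decomposition described above can be illustrated with a minimal sketch. It assumes sequential (type I) sums of squares, as reported by a standard ANOVA table, and normalizes over the predictors only; both choices, as well as the function and variable names, are assumptions made for illustration and not taken from the original analysis code.

```python
def _dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def variance_components(y, predictors):
    """Normalized sequential sums of squares for one gene.

    y          : expression z-scores, one value per sample
    predictors : dict mapping a predictor name to its per-sample values;
                 insertion order defines the order of entry into the model
    Returns a dict of fractions that add up to one (all zeros if nothing
    is explained).
    """
    n = len(y)
    basis = [[1.0] * n]  # start with the intercept
    sum_sq = {}
    for name, x in predictors.items():
        # Residualize the predictor against the intercept and all
        # previously entered predictors (Gram-Schmidt step).
        r = [float(v) for v in x]
        for b in basis:
            coef = _dot(r, b) / _dot(b, b)
            r = [ri - coef * bi for ri, bi in zip(r, b)]
        norm2 = _dot(r, r)
        if norm2 < 1e-12:  # constant or collinear predictor
            sum_sq[name] = 0.0
            continue
        # Sum of squares explained by this predictor given the earlier ones.
        sum_sq[name] = _dot(y, r) ** 2 / norm2
        basis.append(r)
    total = sum(sum_sq.values())
    return {k: (v / total if total > 0 else 0.0) for k, v in sum_sq.items()}


def mean_components(per_gene):
    """Average the normalized components over a list of per-gene results."""
    names = list(per_gene[0].keys())
    return {k: sum(g[k] for g in per_gene) / len(per_gene) for k in names}
```

Calling `variance_components` once per gene and passing the list of results to `mean_components` yields one average fraction per predictor. Note that with sequential sums of squares the result depends on the order of the predictors whenever they are correlated.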

The validation HCL-V dataset comprised 42 patients: 14 with HCL-V and 28 with HCL. The HCL-V patients were diagnosed based on morphological characteristics and an immunophenotype consistent with HCL-V; all were CD5 negative, CD11c positive, and CD123 negative. All HCL-V patients were confirmed as BRAF-V600E negative.

For the generation of RNA-sequencing libraries, the NuGEN Trio RNA-Seq System (NuGEN, Redwood City, California) was used. Samples were split equally and processed in independent sequencing steps to allow for the correction of batch effects. Sequencing was performed in paired-end mode with 2 × 100 bp reads. Sequences were aligned to the GRCh38 reference genome with HISAT2 v2.1.0 [62].

We investigated WGS and RNA-seq data from 3760 tumor samples representing 24 different types of leukemia and lymphoma (Table 1). This dataset was collected from routine diagnostics and is the largest collection of hematologic malignancy samples with WGS and matched RNA-seq; it also includes rare disease entities such as hairy cell leukemia variant (HCL-V) and chronic lymphoproliferative disorder of natural killer cells. To restrict our analysis to putative rare germline and somatic variants, we filtered the variants called on WGS using stringent quality filters to discard artifacts and population allele frequency to discard common germline variants (Tables S2, S3, and S4) [48]. We next annotated the genes using the seven features from the software IntOGen: positional recurrence of variants in genome sequence (OncodriveCLUSTL), positional recurrence of variants in protein conformation (HotMAPS), enrichment of variants in functional domains (smRegions), three alternative measures of selection strength inferred from synonymous and nonsynonymous variants (CBaSE, MutPanning, and dNdScv), and OncodriveFML, a method identifying an excess of variants across tumors in both coding and non-coding genomic regions [37, 65,66,67,68,69,70,71]. Moreover, we annotated genetic variants falling into gene bodies, including deep intronic variants, with AbSplice-DNA, a tool predicting variants causing aberrant splicing [36]. On the RNA-seq data, we used OUTRIDER on a total of 12,966 protein-coding genes commonly expressed across the dataset to call high or low expression outliers, and FRASER to call splicing outliers [38, 39]. We also introduced a new method, NB-act, to call rare aberrant activation of genes that are mostly not expressed (Methods). As summarized in Table 2, these methods provide qualitatively complementary evidence for detecting and predicting driver genes. Combining all these results, we established a unique census of genomic and transcriptomic aberrations in 3760 hematologic malignancy samples (Fig. 1).
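The rare-variant filtering step can be pictured with a small sketch. The record fields and the two thresholds below are hypothetical placeholders chosen for illustration; the actual filters are those listed in Tables S2, S3, and S4.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    quality: float        # caller quality score (hypothetical field)
    population_af: float  # allele frequency in a population reference panel


def rare_high_quality(variants, min_quality=30.0, max_population_af=0.001):
    """Keep variants that pass both filters.

    min_quality       : placeholder cutoff to discard likely artifacts
    max_population_af : placeholder cutoff to discard common germline variants
    """
    return [
        v for v in variants
        if v.quality >= min_quality and v.population_af <= max_population_af
    ]


# Example with made-up records: only the first variant passes both filters.
calls = [
    Variant("GENE_A", quality=55.0, population_af=0.0),
    Variant("GENE_B", quality=12.0, population_af=0.0),   # low quality
    Variant("GENE_C", quality=60.0, population_af=0.15),  # common variant
]
kept = rare_high_quality(calls)
```

Applying the quality filter and the allele-frequency filter jointly means a variant has to be both well supported and rare in the population to enter the downstream annotation.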

OUTRIDER filters out genes expressed in fewer than 5% of samples due to statistical modeling limitations, leaving a gap in detecting rare gene activation. To fill this gap, we developed a complementary algorithm, NB-act (Methods). We applied it to the 6017 rarely expressed protein-coding genes filtered out by OUTRIDER. NB-act identified 10,263 activation outliers among 1623 genes (with a median of 0 and a 75% quantile of 2 per sample, Figure S3, Table S7). We observed a notable enrichment for CGC hematologic oncogenes among all activation outliers (Fig. 2C). Here too, restricting to at most three outliers per sample increased the enrichment. Altogether, these analyses provide a unique set of aberrantly expressed genes in hematologic malignancies with strong enrichment for driver genes.
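An enrichment of this kind is typically quantified with an odds ratio on a 2 × 2 table of outlier status against driver status. The sketch below shows one common way to compute it, together with a one-sided Fisher's exact test; the choice of test, the function names, and the gene names in the example are assumptions made for illustration and are not taken from the original analysis.

```python
from math import comb


def contingency(outlier_genes, driver_genes, background_genes):
    """Build the 2x2 table (a, b, c, d) restricted to the background set.

    a: outlier and driver        b: outlier, not driver
    c: driver, not outlier       d: neither
    """
    background = set(background_genes)
    outlier = set(outlier_genes) & background
    driver = set(driver_genes) & background
    a = len(outlier & driver)
    b = len(outlier - driver)
    c = len(driver - outlier)
    d = len(background - outlier - driver)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when an off-diagonal cell is empty."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment.

    Probability of observing at least `a` driver genes among the outlier
    genes under the hypergeometric null with fixed margins.
    """
    n_driver, n_other, n_outlier = a + c, b + d, a + b
    total = comb(n_driver + n_other, n_outlier)
    upper = min(n_driver, n_outlier)
    tail = sum(
        comb(n_driver, k) * comb(n_other, n_outlier - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Toy example with hypothetical gene names.
background = [f"G{i}" for i in range(1, 9)]
drivers = ["G1", "G2", "G3", "G4"]
outliers = ["G1", "G2", "G3", "G5"]
table = contingency(outliers, drivers, background)
```

In this toy example the table is (3, 1, 1, 3), so the odds ratio is 9 and the one-sided p-value is 17/70. Restricting the outlier gene list beforehand, for example to samples with at most three outliers, changes only the input to `contingency`.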