Application of topological data analysis to multi-resolution matching and anomaly detection
Topology is the study of shape. Topological data analysis (TDA) is an emerging set of tools at the interface of algebraic topology, machine learning (ML), and statistics. TDA has proven useful in a diverse range of applications, from social studies to digital
health care to power systems. While geometric methods such as TDA continue to gain popularity in the statistical sciences and ML, from causal inference to deep learning on manifolds, their utility for assessing the spatial characteristics
of Earth science datasets remains untapped. Topological information on the inherent shape of data can provide invaluable insight into latent data structure and organization, and can play a leading role in understanding the spatiotemporal patterns of
observations and climate models.
Here, I studied the latent shape of temperature maps over the contiguous United States in February, June, and July 2021. The cold wave in February 2021 was an extreme weather event that brought record-breaking low temperatures to North America and caused multiple
days of massive blackouts in Texas. From late June through mid-July, an extreme heat wave associated with a strong ridge occurred over western North America. The main objective is to build a robust and reliable methodology that compares spatial patterns from
different sources and detects anomalous spatial patterns during extreme temperature events. Specifically, I assessed two temperature datasets: the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis and the Atmospheric
Infrared Sounder (AIRS). Using cubical complexes, I summarized the shape of each temperature dataset as a persistence diagram (PD) and calculated the Wasserstein distance between the two PDs. My previous work (Ofori-Boateng et al., 2021) shows that the Wasserstein
distance represents differences in spatial patterns and can replace conventional metrics such as bias and root-mean-square deviation (RMSD).
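The PD-and-distance step described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: it assumes a simplified stand-in for a cubical complex (pixels as vertices, 4-neighbour adjacency, 0-dimensional features only), caps the one never-dying component at the grid maximum, and uses a brute-force matching that is only practical for very small diagrams. The function names `sublevel_h0` and `wasserstein` and the two toy grids are hypothetical.

```python
from itertools import permutations


def sublevel_h0(grid):
    """0-dimensional persistence pairs (birth, death) of a 2D grid's
    sublevel-set filtration. Assumption: the component that never dies
    is given death = grid maximum so that every pair is finite."""
    cells = sorted((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    parent, birth, pairs = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for value, r, c in cells:
        parent[(r, c)] = (r, c)
        birth[(r, c)] = value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            if nb not in parent:
                continue  # neighbour has not entered the filtration yet
            a, b = find((r, c)), find(nb)
            if a == b:
                continue
            if birth[a] < birth[b]:
                a, b = b, a  # elder rule: a is the younger component
            if birth[a] < value:
                pairs.append((birth[a], value))  # skip zero-persistence pairs
            parent[a] = b
    root = find(cells[0][1:])
    pairs.append((birth[root], cells[-1][0]))
    return sorted(pairs)


def wasserstein(pd_a, pd_b, q=1):
    """q-Wasserstein distance between two small diagrams with an
    L-infinity ground metric; unmatched points go to the diagonal."""
    n, m = len(pd_a), len(pd_b)
    size = n + m  # each diagram is padded with diagonal slots

    def cost(i, j):
        if i < n and j < m:
            (b1, d1), (b2, d2) = pd_a[i], pd_b[j]
            return max(abs(b1 - b2), abs(d1 - d2)) ** q
        if i < n:
            return ((pd_a[i][1] - pd_a[i][0]) / 2) ** q
        if j < m:
            return ((pd_b[j][1] - pd_b[j][0]) / 2) ** q
        return 0.0  # diagonal matched to diagonal

    best = min(sum(cost(i, p[i]) for i in range(size))
               for p in permutations(range(size)))
    return best ** (1 / q)


# Two toy "temperature maps" on grids of different resolution.
fine = [[1, 5, 2],
        [5, 5, 5],
        [3, 5, 4]]
coarse = [[1, 5],
          [5, 3]]

pd_fine = sublevel_h0(fine)
pd_coarse = sublevel_h0(coarse)
print(pd_fine, pd_coarse, wasserstein(pd_fine, pd_coarse))
```

Because the comparison is made between diagrams rather than between grid cells, the two inputs do not need to share a grid, which is what allows matching across resolutions. For real maps, a dedicated TDA library and a polynomial-time matching solver would be the practical choice.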
To the best of my knowledge, there is no quantitative metric for measuring differences in spatial patterns between Earth science datasets at different spatial resolutions. In my work, PDs summarized temperature maps during the extreme cold and heat waves.
Applying TDA to observational and model datasets has enormous potential, because we can also analyze key spatial structures in three-dimensional data from sounders and compare them with climate models. In addition, the Wasserstein distance can offer game-changing
capabilities for self-organizing maps (SOMs), which are among the most widely used ML tools among atmospheric scientists.
---
Jana Asher (she/her/hers)
Assistant Professor/Director of Statistics Education
Department of Mathematics and Statistics
Service-Learning Associate
Office for Community-Engaged Learning
Slippery Rock University