Summary
Focus: novel ML solutions to help scientists, with an emphasis on the physical and Earth sciences
Explainable AI
Physics-informed learning
Uncertainty quantification (UQ)
Large-scale earthquake detection
Seismic sensors have collected data over decades
Developing new algorithms to discover small earthquakes in this data
P-wave: smaller-amplitude wave that arrives first, at the start of the seismic event
S-wave: larger-amplitude wave that follows the P-wave and carries much of the shaking
Challenges:
Heterogeneous background noise
Low signal-to-noise ratio
Data quality and missing-data issues
Large datasets but few labels
Earthquake: regional shaking caused by slip along a fault (as opposed to shaking from trucks or explosions)
Traditional approach: template matching of waveform patterns from known events
Correlate the same pattern across many sites
Challenge: relies on prior knowledge of event shapes
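Template matching can be sketched as a normalized cross-correlation between a known event waveform and every same-length window of a continuous trace. This is a minimal NumPy illustration that assumes a single-channel trace and a hypothetical detection threshold of 0.8. Real template-matching codes typically stack correlations across many stations and channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalized_cross_correlation(trace, template):
    """Pearson correlation between `template` and every same-length window of `trace`."""
    m = len(template)
    t = (template - template.mean()) / template.std()
    windows = sliding_window_view(trace, m)              # shape (len(trace) - m + 1, m)
    w_mean = windows.mean(axis=1, keepdims=True)
    w_std = windows.std(axis=1, keepdims=True) + 1e-12   # avoid division by zero on flat windows
    return ((windows - w_mean) / w_std) @ t / m

def detect(trace, template, threshold=0.8):
    """Start indices of windows whose correlation with the template exceeds `threshold`."""
    cc = normalized_cross_correlation(trace, template)
    return np.flatnonzero(cc > threshold)
```

On a noisy trace that contains a copy of `template`, `detect` returns the index where that copy starts. It cannot find events whose waveform does not resemble a known template, which is the limitation noted above.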
Need a way to detect earthquakes without prior knowledge: find new waveforms
New approach: FAST (Fingerprint And Similarity Thresholding)
Scalable detection of new events and new signal shapes
Assumes only a limited waveform catalog
Uses domain knowledge: based on waveform similarity matching
Compute similarity metric between waveforms
Cluster time periods that show similar waveforms at similar times in correlated regions
Eliminates repeating signals that are not individual events (e.g., background car traffic)
Spectral representation of data using wavelets
Computationally efficient, few false detections
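The FAST steps above can be sketched roughly as a pipeline:

1. Compute a spectrogram of the trace.
2. Cut it into overlapping spectral images.
3. Apply a 2D Haar wavelet transform to each image.
4. Build a binary fingerprint from the signs of the largest coefficients.
5. Use MinHash with locality-sensitive hashing (LSH) to find similar windows without comparing every pair.

This is a simplified illustration, not the FAST code. The window sizes, `k`, the number of hashes, and the band size are hypothetical. Steps such as coefficient normalization are omitted.

```python
import numpy as np
from collections import defaultdict
from numpy.lib.stride_tricks import sliding_window_view

def spectrogram(x, nfft=64, hop=16):
    """Magnitude STFT with shape (nfft // 2 frequency bins, n_frames)."""
    frames = sliding_window_view(x, nfft)[::hop] * np.hanning(nfft)
    return np.abs(np.fft.rfft(frames, axis=1))[:, : nfft // 2].T

def spectral_images(spec, width=32, lag=8):
    """Overlapping time windows of the spectrogram, one 'image' per window."""
    starts = range(0, spec.shape[1] - width + 1, lag)
    return np.stack([spec[:, s : s + width] for s in starts])

def haar_1d(x):
    """Full Haar decomposition along the last axis (length must be a power of 2)."""
    x = np.array(x, dtype=float)
    n = x.shape[-1]
    while n > 1:
        even, odd = x[..., 0:n:2].copy(), x[..., 1:n:2].copy()
        x[..., : n // 2] = (even + odd) / np.sqrt(2)    # approximation coefficients
        x[..., n // 2 : n] = (even - odd) / np.sqrt(2)  # detail coefficients
        n //= 2
    return x

def fingerprint(image, k=200):
    """Binary fingerprint: signs of the top-k 2D Haar coefficients, 2 bits each."""
    coeffs = haar_1d(haar_1d(image).T).T.ravel()
    top = np.argsort(np.abs(coeffs))[-k:]
    bits = np.zeros(2 * coeffs.size, dtype=bool)
    bits[2 * top] = coeffs[top] > 0      # "10" marks a positive coefficient
    bits[2 * top + 1] = coeffs[top] < 0  # "01" marks a negative coefficient
    return bits

def jaccard(a, b):
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

def minhash(fps, num_hashes=100, seed=0):
    """MinHash signatures; two signatures agree at a position with probability = Jaccard."""
    rng = np.random.default_rng(seed)
    perms = np.stack([rng.permutation(fps.shape[1]) for _ in range(num_hashes)])
    return np.array([[perm[fp].min() for perm in perms] for fp in fps])

def candidate_pairs(sigs, rows_per_band=4):
    """LSH banding: windows that share any identical band become candidate pairs."""
    pairs = set()
    for start in range(0, sigs.shape[1], rows_per_band):
        buckets = defaultdict(list)
        for i, sig in enumerate(sigs):
            buckets[tuple(sig[start : start + rows_per_band])].append(i)
        for members in buckets.values():
            pairs.update((a, b) for a in members for b in members if a < b)
    return pairs
```

Because MinHash signatures agree with probability equal to the Jaccard similarity of the fingerprints, banding means only windows that share at least one identical band are compared in detail. That is where the computational savings come from.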
Key takeaways:
Scientific datasets have limitations and reflect current knowledge (e.g. instrument technology)
Methods in ML literature may not reflect real data/tasks
Opportunities for developing new methods
Scientific knowledge and physical models can improve data analysis: bias the model toward known physics (e.g., look for correlated patterns at different stations)
Interdisciplinary collaboration is critical
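The physics-based bias mentioned above can be illustrated with a simple network coincidence rule. A real earthquake should be picked up at several stations within a short time window, while local noise (e.g., a truck near one sensor) usually is not. This is a generic sketch, not FAST's own network-detection step. The station names, pick times, 10 s window, and three-station minimum are hypothetical.

```python
from itertools import takewhile

def network_detections(station_times, window=10.0, min_stations=3):
    """Declare an event when at least `min_stations` distinct stations detect
    something within `window` seconds of the first detection in a group."""
    picks = sorted((t, sta) for sta, times in station_times.items() for t in times)
    events, i = [], 0
    while i < len(picks):
        t0 = picks[i][0]
        # Picks are sorted, so the group is a contiguous run starting at index i
        group = list(takewhile(lambda p: p[0] <= t0 + window, picks[i:]))
        if len({sta for _, sta in group}) >= min_stations:
            events.append(t0)
            i += len(group)  # consume every pick associated with this event
        else:
            i += 1           # isolated pick at too few stations: skip it
    return events
```

For picks `{"A": [100.0, 500.0], "B": [102.5, 900.0], "C": [104.0], "D": [300.0]}`, only the cluster starting at 100.0 s is seen by three stations, so it is the only network detection.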
Machine learning for data-driven discovery in solid Earth geoscience (2019): https://www.science.org/doi/10.1126/science.aau0323
Lack of large high quality labeled datasets
Limited sharing of research codes and data
Differences in research cultures hinder collaboration (shared language, presentation venues)
Data analysis needs of geoscience
Scientific discovery in the age of artificial intelligence (2023): https://www.nature.com/articles/s41586-023-06221-2
Broader adoption, new uses of ML
Data collection/representation
Generating scientific hypotheses
ML-driven simulation and experimentation
ML toolbox is expanding
Open science and data sharing
SeisBench: https://github.com/seisbench/seisbench
EQTransformer: https://github.com/smousavi05/EQTransformer
Stanford Earthquake Dataset (STEAD): https://github.com/smousavi05/STEAD
INSTANCE: https://essd.copernicus.org/articles/13/5509/2021/
New Conference for ML + Science
Challenges:
Poor data quality, multimodal, multisource
Lack of theory for out-of-distribution settings; need to exploit available constraints from domain knowledge
Trust/Uncertainty Quantification/Model Explainability/Insights
Explainable AI for Scientific Data
Hard to gain insights from analyses of scientific data
Deep learning/data-driven models
May learn non-physical solutions
No uncertainty estimation
Black box - no explanation
SciAI Center: interpretable ML architectures: https://sciaicenter.engineering.cornell.edu/
Instance-based explanations by design
Relate a new test observation to a set of prototypical examples
Prototypical examples are learned, part of neural network architecture
Approach:
Take a predictive neural network
Replace the final fully connected layer with a prototype layer
Encoder of the input data (e.g., an image)
Compare the encoded input to the encoded prototypes
Output a softmax over the similarities (similar to attention)
Prototypes as template waveforms
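A prototype layer of the kind described above can be sketched in NumPy. Encoded inputs are compared with a set of prototype vectors, and negative squared distances are turned into softmax weights. Class probabilities are then the summed weights of each class's prototypes. The random prototypes, labels, and temperature are placeholders: in a trained network the prototypes are learned parameters. The encoder is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class PrototypeLayer:
    """Final layer that scores encoded inputs by similarity to prototype vectors."""

    def __init__(self, prototype_labels, dim, num_classes, temperature=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.labels = np.asarray(prototype_labels)
        # Placeholder values; in a trained model these are learned parameters
        self.prototypes = rng.standard_normal((len(self.labels), dim))
        self.class_map = np.eye(num_classes)[self.labels]  # (num_prototypes, num_classes)
        self.temperature = temperature

    def __call__(self, z):
        """z: (batch, dim) encoded inputs -> (weights, class_probs)."""
        # Squared Euclidean distance from every input to every prototype
        d2 = ((z[:, None, :] - self.prototypes[None, :, :]) ** 2).sum(axis=-1)
        # Closer prototype -> larger weight; rows sum to 1, like attention weights
        weights = softmax(-d2 / self.temperature, axis=1)
        # Class probability = total weight of that class's prototypes
        return weights, weights @ self.class_map
```

The weights double as the explanation. The highest-weighted prototype, which for seismic data plays the role of a template waveform, is the example the model considers most similar to the input.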
Prototype-based Joint Embedding Method (PB&J):
Sample prototypes from the training data
Identifies which training instances resulted in conflicting predictions
Explicit representation of model confidence
Generates an ensemble of predictions to estimate model confidence, with explanations for ambiguous cases
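The idea of sampling prototypes from the training data and reading confidence from an ensemble can be illustrated with a hypothetical sketch. Each ensemble member draws a few training embeddings per class as prototypes and votes with the nearest one. Vote agreement serves as a confidence score, and the training instances behind minority votes are returned as the conflicting cases. This is not the published PB&J algorithm; the function name, sampling scheme, and parameters are assumptions.

```python
import numpy as np

def prototype_ensemble(z, train_emb, train_labels, num_sets=20, per_class=3, seed=0):
    """Hypothetical sketch: repeatedly sample prototypes from training embeddings,
    classify z by its nearest sampled prototype, and report vote agreement."""
    rng = np.random.default_rng(seed)
    classes = np.unique(train_labels)
    votes, nearest = [], []
    for _ in range(num_sets):
        # Sample `per_class` training instances of each class to act as prototypes
        idx = np.concatenate([
            rng.choice(np.flatnonzero(train_labels == c), per_class, replace=False)
            for c in classes
        ])
        d2 = ((train_emb[idx] - z) ** 2).sum(axis=1)
        best = idx[d2.argmin()]  # training instance that "explains" this vote
        votes.append(train_labels[best])
        nearest.append(best)
    votes = np.array(votes)
    counts = np.array([(votes == c).sum() for c in classes])
    pred = classes[counts.argmax()]
    confidence = counts.max() / num_sets  # fraction of ensemble members that agree
    # Training instances whose vote disagrees with the majority: candidates for review
    conflicting = sorted({int(b) for b, v in zip(nearest, votes) if v != pred})
    return pred, confidence, conflicting
```

For an input deep inside one class's cluster every member agrees, so confidence is 1.0 and no conflicting instances are returned. Inputs between clusters produce split votes and a list of training examples that pull in different directions.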
ML and the Cryosphere
A Variational LSTM Emulator of Sea Level Contribution From the Antarctic Ice Sheet (2023)
Question: How much will the Antarctic or Greenland ice sheet contribute to sea-level rise?
Typical approach: climate simulation
Can ML emulate these models (as surrogates)?
Approach: variational LSTM emulator of sea-level contributions from Antarctica
Dataset: models in ISMIP6: https://climate-cryosphere.org/about-ismip6/
Emulators: computationally cheaper approximations of fully detailed models
LSTM-based model outperforms Gaussian Process emulator
Produces an ensemble mean and distribution
Uncertainty via random dropout (kept active at prediction time)
Computational advantage allows a richer set of forcings
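A variational-dropout LSTM emulator can be sketched in NumPy. This assumes that "random dropout" means dropout kept active at prediction time (Monte Carlo dropout), with one mask reused across all time steps of a sequence, as in variational dropout for RNNs. The weights are random and untrained, and the inputs stand in for forcing time series. A real emulator would be trained on the ISMIP6 simulations to predict sea-level contribution. `TinyLSTM` and `mc_dropout_ensemble` are hypothetical names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Single-layer LSTM with untrained random weights (illustration only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_in))      # input weights
        self.U = rng.uniform(-s, s, (4 * n_hidden, n_hidden))  # recurrent weights
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.uniform(-s, s, n_hidden)              # linear readout
        self.n_hidden = n_hidden

    def forward(self, x, p_drop=0.0, rng=None):
        """x: (T, n_in) forcing series -> (T,) output series. With p_drop > 0, one
        dropout mask per sequence is reused at every time step (variational dropout)."""
        H, keep = self.n_hidden, 1.0 - p_drop
        if p_drop > 0:
            mask_x = rng.binomial(1, keep, x.shape[1]) / keep
            mask_h = rng.binomial(1, keep, H) / keep
        else:
            mask_x, mask_h = np.ones(x.shape[1]), np.ones(H)
        h, c, out = np.zeros(H), np.zeros(H), []
        for x_t in x:
            gates = self.W @ (x_t * mask_x) + self.U @ (h * mask_h) + self.b
            i, f, g, o = np.split(gates, 4)  # input, forget, candidate, output gates
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            out.append(self.w_out @ h)
        return np.array(out)

def mc_dropout_ensemble(model, x, n_samples=100, p_drop=0.2, seed=0):
    """Keep dropout on at prediction time; return ensemble mean, std, and all runs."""
    rng = np.random.default_rng(seed)
    runs = np.stack([model.forward(x, p_drop, rng) for _ in range(n_samples)])
    return runs.mean(axis=0), runs.std(axis=0), runs
```

Each dropout mask gives a different emulator realization, so the stack of runs plays the role of the ensemble: its mean is the central projection and its spread is the uncertainty estimate. Hundreds of such passes are cheap compared with one ice-sheet simulation.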
Question: How will a changing climate affect the navigability of the Arctic Ocean?
New potential shipping routes
Most pass through narrow straits that are too small to resolve in climate models
Approach: ML-based downscaling
100 km to 25 km resolution
Super-resolution GAN
ERA5 reanalysis data: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
Physics-constrained: ice volume is conserved across resolutions
Result: outperforms standard (interpolation-based) techniques
Future: apply to CMIP6
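The conservation constraint can be sketched as a post-processing step on a 4x refined grid (100 km to 25 km). Whatever the super-resolution model predicts, each 4 x 4 block is rescaled so that its mean equals the coarse-cell value. This assumes equal-area grid cells and a non-negative quantity such as ice thickness, so that conserving block means conserves total volume. It is one simple way to impose the constraint, not necessarily the one used in this work, and the GAN itself is not shown.

```python
import numpy as np

FACTOR = 4  # 100 km -> 25 km grid spacing

def block_mean(hr, f=FACTOR):
    """Average each f x f block of a high-resolution field back onto the coarse grid."""
    H, W = hr.shape
    return hr.reshape(H // f, f, W // f, f).mean(axis=(1, 3))

def upsample_nearest(coarse, f=FACTOR):
    """Interpolation baseline: copy each coarse value into an f x f block."""
    return np.kron(coarse, np.ones((f, f)))

def enforce_conservation(hr_pred, coarse, f=FACTOR):
    """Rescale each f x f block so its mean matches the coarse cell value.
    Multiplicative, so non-negative predictions stay non-negative."""
    bm = upsample_nearest(block_mean(hr_pred, f), f)
    target = upsample_nearest(coarse, f)
    safe = bm > 0
    # Blocks predicted as all zero fall back to the uniform coarse value
    return np.where(safe, hr_pred * target / np.where(safe, bm, 1.0), target)
```

`upsample_nearest` is the kind of interpolation baseline the GAN is compared against. It conserves block means trivially but cannot add sub-grid structure such as narrow straits, which is what the learned model is meant to supply.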
New England community-driven Coastal Climate Research and Solutions (3CRS): https://www.3crs.org/
Working to downscale models to the New England region