Summary:
Focus: Geospatial foundation models on Earth and Mars
Data required to support decision-making: agriculture, ecosystems, disasters
Common problem format:
Input:
Sparse observational data of surface properties (on-ground measurements, survey data, manual annotations)
Wall-to-wall remote-sensing data
Output: wall-to-wall maps of inferred surface features
Challenge: creating end-to-end pipelines to do this inference for many features of interest to stakeholders
Foundation models make these processing tasks much simpler
Compress multi-modal/multi-sensor observations into a compact latent space
Can create predictive models of surface features given these latent vectors
Very expensive to create, very cheap to use
Need to be as flexible as possible to accommodate diverse use cases
Have developed family of Earth Foundation models
Presto: Pre-trained remote sensing transformer
Up to 5 sensors/data sources: 15 dynamic channels + 5 static variables
Location, elevation, Dynamic World, precipitation, Sentinel-1, Sentinel-2 RGB
Globally diverse data
Flexible to missing data points via random masking
Fairly small model: 0.4M parameters
Easy for individual teams to run routinely
Fast to fine-tune for individual tasks
Challenge: does not incorporate spatial inputs
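A minimal sketch of the random-masking idea used to handle missing data points (the 50% ratio, shapes, and function names are illustrative assumptions, not Presto's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(x, mask_ratio=0.5):
    """Randomly hide timestep/channel entries of an input sequence.

    x: (timesteps, channels) array of sensor values.
    Returns the masked input (zeros at hidden positions) and the boolean
    mask, so a transformer can be trained to reconstruct the hidden values
    and therefore tolerates genuinely missing observations at inference.
    """
    mask = rng.random(x.shape) < mask_ratio  # True = hidden from the model
    x_masked = np.where(mask, 0.0, x)
    return x_masked, mask

x = rng.normal(size=(12, 15))            # 12 monthly steps, 15 dynamic channels
x_masked, mask = random_mask(x)
assert (x_masked[mask] == 0.0).all()     # hidden positions are zeroed
```

Because masking is applied randomly at pre-training time, the model never relies on any single channel always being present.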
Galileo: Global and Local Flexible Earth Observation models
Natively handles different shapes
Up to 9 sensors/data sources
Flexible input shape
Combination of masked reconstruction and contrastive learning
Good performance across all scales, e.g. regional (coastline, forest) and local (tree, cow)
Global loss with variable-exit encoder (the model can use latent features from earlier levels of the encoding stack; 2 exit points)
Local loss with shallow encodings
Smaller and more computationally efficient than other models with comparable performance
Challenges: unstable training, hard to use
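A minimal sketch of combining a masked-reconstruction loss with a contrastive (InfoNCE-style) loss, as described for Galileo above; the weighting, temperature, and function names are illustrative assumptions:

```python
import numpy as np

def reconstruction_loss(pred, target, mask):
    # Mean squared error computed only over the masked positions.
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

def contrastive_loss(z_a, z_b, temperature=0.1):
    # InfoNCE over a batch: matching rows of z_a and z_b (two views of the
    # same location) are positives, all other rows are negatives.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def total_loss(pred, target, mask, z_a, z_b, w=0.5):
    # Weighted sum of the two objectives; w is an assumed hyperparameter.
    return w * reconstruction_loss(pred, target, mask) \
        + (1 - w) * contrastive_loss(z_a, z_b)
```

The reconstruction term drives local detail while the contrastive term drives global consistency, matching the local/global split in the notes.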
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
Modeling innovation
Extensive improvement in testing and analysis
Up to 9 sensors/data sources, including maps and derived products in addition to raw observations
More stable training by replacing a learned projection with a random projection
Contrastive loss focuses on hard negatives so the model can't leak information across modalities
Accessible platform: olmoearth.allenai.org
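A sketch of the stability trick mentioned above: projecting latents through a fixed random matrix instead of a learned one. Dimensions and names here are illustrative assumptions, not OlmoEarth's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

latent_dim, proj_dim = 256, 64

# Drawn once at initialization and never updated by gradient descent,
# so the projection target cannot collapse or drift during training,
# which is the source of instability a learned projection can introduce.
random_proj = rng.normal(scale=1.0 / np.sqrt(latent_dim),
                         size=(latent_dim, proj_dim))

def project(latents):
    """latents: (batch, latent_dim) -> (batch, proj_dim)."""
    return latents @ random_proj
```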
Mars Foundation Models
Far less data: fewer modalities, variable resolution, shorter time period, sparser spatial coverage
Approach: task arithmetic
Independent model for each sensor
Combine the outputs of the models via adding
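A minimal sketch of the per-sensor approach above: one independent model per sensor, with outputs combined by summation. The linear stand-in models and sensor names are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

class SensorModel:
    """Stand-in for one sensor's independently trained model
    (a single linear layer here, purely for illustration)."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.normal(size=(in_dim, out_dim))

    def __call__(self, x):
        return x @ self.W

# One independent model per sensor/data source.
sensors = {"sensor_a": SensorModel(10, 4), "sensor_b": SensorModel(6, 4)}

def combined_prediction(inputs):
    """inputs: dict mapping sensor name -> feature vector.
    Sensors absent from `inputs` simply drop out of the sum, so the
    combination tolerates the patchy coverage typical of Mars data."""
    return sum(sensors[name](x) for name, x in inputs.items())
```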
MOMO: Mars Orbital Model
Novel strategy:
Train sub-models until they have similar loss values
Then take those model checkpoints and combine them into a single model
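The merging step above can be sketched as parameter-wise combination of the sub-model checkpoints; averaging is an assumption here, since the notes only say the checkpoints are combined into a single model:

```python
import numpy as np

def merge_checkpoints(checkpoints):
    """checkpoints: list of dicts mapping parameter name -> array,
    one dict per sub-model trained to a comparable loss value.
    Returns a single checkpoint with parameter-wise averages."""
    merged = {}
    for name in checkpoints[0]:
        merged[name] = np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
    return merged

ckpt_a = {"w": np.ones((2, 2)), "b": np.zeros(2)}
ckpt_b = {"w": 3 * np.ones((2, 2)), "b": np.ones(2)}
merged = merge_checkpoints([ckpt_a, ckpt_b])
# merged["w"] is all 2.0, merged["b"] is all 0.5
```

Waiting until the sub-models reach similar loss values keeps their parameters at comparable scales, which makes a direct parameter-wise combination plausible.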
Observations:
Multi-modal learning is a lot more complex than just adding more modalities
Simple model architectures lead to naive solutions
Pre-training objectives should incorporate data structure and complementarity
Local vs global complementarity
Multi-modal: different views of same location
Multi-modal pretraining depends on the missingness of the data