Summary:
Google’s work on population modeling
2011: research on predicting societal metrics using Google Trends data
E.g. Flu Trends, economic metrics
Challenge: the way people search changes over time
Models need to be retrained
E.g. the Flu Trends model stopped being useful after a few years because it was not retrained
2023: introduced the symptom search dataset
300 symptoms that affect people globally
Important signal for COVID and Flu tracking globally
Continually retrained on current search patterns
Current work: broadening coverage across domains of human behavior
WHO people are: demographics, health, wellbeing
WHAT they do: economic, social, consumption
WHY they do it: beliefs, values
WHERE people are: distribution, migration, forced displacement
HOW: environmental interactions, power dynamics
PDFM: Population Dynamics Foundation Model
Example: Diabetes prevalence super-resolution
Given
county-level diabetes prevalence
spatially fine-grained population embeddings (zip-code level)
Train a model that maps embeddings to county diabetes prevalence
Use it to infer prevalence at zip-code resolution such that the zip-level estimates add up correctly to each county's value
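A minimal sketch of the super-resolution idea, assuming each zip code belongs to exactly one county, every county has at least one zip code, and zip populations are known. The choice of a ridge regression and a multiplicative per-county rescaling is an illustrative assumption, not PDFM's actual method; the function name `super_resolve` is hypothetical.

```python
import numpy as np

def super_resolve(zip_emb, zip_pop, zip_county, county_prev, ridge=1e-3):
    """Hypothetical sketch: estimate zip-level prevalence from county-level labels.

    zip_emb:     (Z, D) embeddings per zip code
    zip_pop:     (Z,) population per zip code
    zip_county:  (Z,) integer county index for each zip code
    county_prev: (C,) observed county prevalence (fraction)
    """
    n_counties = county_prev.shape[0]
    dim = zip_emb.shape[1]

    # Population-weighted mean embedding per county.
    county_pop = np.bincount(zip_county, weights=zip_pop, minlength=n_counties)
    county_emb = np.zeros((n_counties, dim))
    np.add.at(county_emb, zip_county, zip_emb * zip_pop[:, None])
    county_emb /= county_pop[:, None]

    # Ridge regression (with a bias column) from county embedding to prevalence.
    X = np.hstack([county_emb, np.ones((n_counties, 1))])
    w = np.linalg.solve(X.T @ X + ridge * np.eye(dim + 1), X.T @ county_prev)

    # Apply the county-level model to each zip code; keep predictions positive.
    zip_pred = np.hstack([zip_emb, np.ones((zip_emb.shape[0], 1))]) @ w
    zip_pred = np.clip(zip_pred, 1e-6, None)

    # Rescale within each county so the population-weighted zip estimates
    # reproduce the observed county prevalence exactly.
    pred_cases = np.bincount(zip_county, weights=zip_pred * zip_pop, minlength=n_counties)
    scale = county_prev * county_pop / pred_cases
    return zip_pred * scale[zip_county]
```

The final rescaling step is what enforces "adds up correctly to county": the model decides how prevalence is distributed within a county, and the observed county value fixes the total.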
PDFM structure
Inputs: relevant population facts per region
Train a graph neural network
Produces a ~300-dimensional feature vector (embedding) per region
Used for: interpolation, extrapolation, super-resolution, nowcasting, forecasting
Datasets:
Aggregated search trends:
Top 1,000 US national search trends in July 2022
Selected so that each query is searched across many zip codes
Query text is ignored; only the per-region histogram of counts is used
Observation: the most popular queries capture the major dynamics of more niche topics such as health symptoms
Aggregated Maps places:
Top 1,192 point-of-interest categories from Google Maps in each location in 2024
Each category is present in at least 5% of zip codes
Aggregated place busyness: 683 metrics
Weather & Air Quality: 45 statistics in July 2022
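A minimal preprocessing sketch for the count-based sources above, assuming raw counts arrive as a regions × columns matrix. It applies the "present in at least 5% of zip codes" filter and turns each region's counts into a normalized histogram, ignoring column labels (query text). The function name and the exact normalization are assumptions, not the published pipeline.

```python
import numpy as np

def prepare_counts(counts, min_coverage=0.05):
    """Hypothetical preprocessing sketch for one count-based data source.

    counts: (R, K) raw counts for R regions (e.g. zip codes) and K columns
            (queries or place categories); column labels are not needed.
    Returns the kept column indices and per-region normalized histograms.
    """
    counts = np.asarray(counts, dtype=float)
    # Fraction of regions in which each column appears at least once.
    coverage = (counts > 0).mean(axis=0)
    keep = np.flatnonzero(coverage >= min_coverage)
    kept = counts[:, keep]
    # Normalize each region's counts to a histogram that sums to 1;
    # regions with no counts stay all-zero instead of dividing by zero.
    totals = kept.sum(axis=1, keepdims=True)
    hist = np.divide(kept, totals, out=np.zeros_like(kept), where=totals > 0)
    return keep, hist
```

Normalizing per region makes the features describe the *mix* of searches or places rather than raw volume, which otherwise mostly tracks population size.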
Trained an autoencoding graph neural network
Nodes: spatial regions
Edges: based on spatial distance and data correlation
Loss function: reconstruct each node's original data from its own state and the states of its neighbors in the graph
Embedding vector: 313 dimensions
Loss split into sub-losses: Search Trends, Maps & Busyness, Weather/AQI
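A minimal forward-pass sketch of a graph autoencoder with per-source sub-losses, in plain NumPy. The graph construction (k nearest neighbors plus a correlation threshold), the one-layer GCN-style encoder and decoder, and all names and parameters are illustrative assumptions; PDFM's actual architecture is not described here. Weights are random and training (e.g. with an autodiff framework) is omitted.

```python
import numpy as np

def build_adjacency(coords, features, k=3, corr_threshold=0.9):
    """Hypothetical graph: connect each region to its k nearest neighbors
    and to any region whose feature vector is highly correlated with its own."""
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, 1:k + 1]  # index 0 is usually the node itself
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    corr = np.corrcoef(features)                     # region-by-region correlation
    adj[corr >= corr_threshold] = 1.0
    adj = np.maximum(adj, adj.T)                     # make the graph undirected
    np.fill_diagonal(adj, 1.0)                       # add self-loops
    return adj

def normalize(adj):
    """Symmetric normalization D^-1/2 A D^-1/2, as in GCN-style layers."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def autoencoder_loss(a_norm, x, groups, w_enc, w_dec):
    """Encode each node from its neighborhood, decode back to the inputs,
    and report a mean-squared reconstruction loss per data source."""
    h = np.maximum(a_norm @ x @ w_enc, 0.0)  # node embeddings (ReLU)
    x_hat = a_norm @ h @ w_dec               # reconstruction from neighborhood
    losses = {name: float(np.mean((x[:, cols] - x_hat[:, cols]) ** 2))
              for name, cols in groups.items()}
    return h, losses
```

The total training loss would be the (possibly weighted) sum of the per-group losses; keeping them separate lets one data source with many columns avoid dominating the others.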
Forecasting: TimesFM
Transformer-based model trained on many time series
Can effectively predict future trends of univariate time series
Doesn’t incorporate geospatial reasoning
PDFM+TimesFM:
Learned an adapter model on top of TimesFM
Take TimesFM prediction for a given zip code
Then learn a model that adjusts the prediction using the zip code's PDFM embedding
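A minimal sketch of the adapter idea, assuming the TimesFM forecasts have already been computed elsewhere and are passed in as a plain array (no TimesFM API is called here). The linear ridge correction on the residual, and the function names, are hypothetical stand-ins for whatever adapter model was actually used.

```python
import numpy as np

def fit_adapter(base_pred, emb, target, ridge=1e-2):
    """Learn a linear correction to a base forecast from each region's embedding.

    base_pred: (N,) precomputed base forecasts (e.g. TimesFM output per zip code)
    emb:       (N, D) region embeddings (e.g. PDFM)
    target:    (N,) observed values for the forecast horizon
    """
    X = np.column_stack([base_pred, emb, np.ones_like(base_pred)])
    residual = target - base_pred  # what the base forecast got wrong
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ residual)

def apply_adapter(w, base_pred, emb):
    """Adjusted forecast = base forecast + predicted residual."""
    X = np.column_stack([base_pred, emb, np.ones_like(base_pred)])
    return base_pred + X @ w
```

Predicting the residual rather than the target keeps the base forecast as the default: with no useful signal in the embedding, the learned correction shrinks toward zero.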
Evaluation
Benchmarks: health, socioeconomic, environment
Earth Engine geospatial data: nighttime lights, tree coverage
Data commons: aggregates census statistics across the world
Comparison:
Inverse distance weighting (IDW): estimate the value at a point as a distance-weighted average of values at nearby points
SatCLIP: Neural embeddings of satellite data
GeoCLIP: Neural embeddings of geo-tagged personal photography
PDFM-based prediction performs best on social metrics and is at state of the art on environmental metrics
Augmenting PDFM with SatCLIP generally improves performance, but in some cases extrapolation performance drops when both are used
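A minimal sketch of the inverse distance weighting baseline, using the standard formulation in which weights are 1 / distance^p. The power p = 2 is a common default and an assumption here; the benchmark's exact settings are not given in these notes.

```python
import numpy as np

def idw(known_xy, known_values, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: each query value is a weighted average
    of known values with weights 1 / distance**power."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.atleast_2d(np.asarray(query_xy, dtype=float))
    dist = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=-1)
    out = np.empty(query_xy.shape[0])
    for i, d in enumerate(dist):
        exact = d < eps
        if exact.any():
            # Query coincides with a known point: return that value directly.
            out[i] = known_values[exact][0]
        else:
            w = 1.0 / d ** power
            out[i] = np.sum(w * known_values) / np.sum(w)
    return out
```

For example, with known points (0, 0) → 1 and (2, 0) → 3, a query at (1, 0) is equidistant from both and gets their plain average, 2. IDW uses only location, which is why it is a useful floor for methods like PDFM that also use what is known about each region.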
Applications:
Sust Global: Populous: Using AI to predict insurance premiums
CARTO: Cloud-Based Location Intelligence Platform
GroupM: Modeling to help understand media performance
Cooper/Smith: Disease tracking in low-resource environments
UN AI4Good: Housing Prices + Night Time Lights Tutorial
Geospatial reasoning