General Geospatial Inference with a Population Dynamics Foundation Model | 9am PT, Tues June 10, 2025



Grigory Bronevetsky


Jun 9, 2025, 5:41:21 AM

to ta...@modelingtalks.org

Modeling Talks

General Geospatial Inference with a Population Dynamics Foundation Model

Gautam Prasad, Google Research

Tues, June 10, 2025 | 9am PT

Meet | Youtube Stream

Hi all,

The presentation will be via Meet and all questions will be addressed there. If you cannot attend live, the event will be recorded and can be found afterward at

https://sites.google.com/modelingtalks.org/entry/general-geospatial-inference-with-a-population-dynamics-foundation-model

More information on previous and future talks: https://sites.google.com/modelingtalks.org/entry/home

Abstract:
Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new or even related tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.

Bio:

Dr. Gautam Prasad is a Software Engineer in Google Research working on geospatial machine learning including the Population Dynamics Foundation Model and other work related to Factuality in LLMs. His focus is to address health, socioeconomic, environmental, and commercial related problems using novel techniques that leverage unique data sources. Previously, he worked on human related computer vision including emotion recognition, eye tracking, and gesture recognition. Prior to Google he studied brain connectivity patterns in health and disease using MRI and machine learning.

Grigory Bronevetsky


Jun 14, 2025, 9:42:14 PM

to ta...@modelingtalks.org

Video: https://youtube.com/live/hNIVHGMScy0

Slides: https://docs.google.com/presentation/d/1yzO3ztnUNRCA0sBRiA0zPO7DXGN5QdtqnCOfyRUpuWk/edit?usp=sharing

pdf: https://drive.google.com/file/d/1lBBstgeWTtm3rjJ2o5M_jDDXfovYLO6K/view?usp=sharing

Summary:

Google’s work on population modeling

2011: research on predicting societal metrics using Google Trends data

E.g. Flu Trends, economic metrics

Challenge: the way people search changes routinely

Models need to be retrained
E.g., the Flu Trends model stopped being useful after a few years because it was not refreshed

2023: introduced the symptom search dataset

300 symptoms that affect people globally
Important signal for COVID and flu tracking globally
Continually re-trained on current search patterns

Current work: broadening work across human behavior domains

WHO people are: demographics, health, wellbeing
WHAT they do: economic, social, consumption
WHY they do it: beliefs, values
WHERE people are: distribution, migration, forced displacement
HOW: environmental interactions, power dynamics
PDFM: Population Dynamics Foundation Model

Example: Diabetes prevalence super-resolution

Given

county-level diabetes prevalence
spatially fine-grained embeddings of population (zip-code)

Train a model that maps county-level embeddings to county diabetes prevalence
Use it to infer prevalence at the finer (zip-code) resolution such that the estimates add up correctly to the county values
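A minimal sketch of this super-resolution recipe on synthetic data. Everything here is an assumption for illustration: county embeddings are approximated as population-weighted means of zip-code embeddings, the regression is plain ridge, and "adding up correctly" is enforced by rescaling each county's zip-level predictions so their population-weighted mean equals the observed county prevalence. The notes do not say how PDFM enforces this constraint.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def rescale_to_county(zip_pred, zip_pop, zip_county, county_value):
    """Scale each county's zip predictions so their population-weighted mean equals the county value."""
    out = zip_pred.astype(float)  # copy
    for c, target in county_value.items():
        mask = zip_county == c
        out[mask] *= target / np.average(out[mask], weights=zip_pop[mask])
    return out

# Synthetic stand-ins (hypothetical): 30 counties with 4 zip codes each.
rng = np.random.default_rng(0)
n_counties, zips_per_county, dim = 30, 4, 8
zip_county = np.repeat(np.arange(n_counties), zips_per_county)
zip_emb = rng.normal(size=(zip_county.size, dim))  # stand-in for zip-code embeddings
zip_pop = rng.integers(1_000, 50_000, size=zip_county.size).astype(float)
zip_truth = 0.10 + zip_emb @ rng.normal(scale=0.005, size=dim)  # synthetic zip prevalence

def county_mean(values, c):
    m = zip_county == c
    return np.average(values[m], axis=0, weights=zip_pop[m])

county_emb = np.array([county_mean(zip_emb, c) for c in range(n_counties)])
county_value = {c: county_mean(zip_truth, c) for c in range(n_counties)}

# Train at county level, predict at zip level, then enforce the county values.
w = fit_ridge(county_emb, np.array(list(county_value.values())))
zip_pred = np.clip(predict(w, zip_emb), 1e-6, None)  # keep rates positive before rescaling
zip_final = rescale_to_county(zip_pred, zip_pop, zip_county, county_value)
```

Multiplicative rescaling keeps the relative differences between zip codes that the model predicted while guaranteeing consistency with the coarser ground truth.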

PDFM structure

Relevant population facts
Train a graph neural network
Produces a 300-dimensional feature vector
Used for: interpolation, extrapolation, super-resolution, nowcasting, forecasting
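The embeddings are meant to be consumed by simple downstream models. A minimal sketch, using random stand-in embeddings and a synthetic target, of a ridge-regression probe evaluated two ways: interpolation (held-out locations scattered among the training locations) and extrapolation (a contiguous region held out). The notes do not give the exact splits used in the PDFM benchmarks; the easternmost-20% split here is a hypothetical example.

```python
import numpy as np

def linear_probe(X_train, y_train, X_test, alpha=1.0):
    """Ridge regression (intercept handled by centering), then predict on X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_train.shape[1]), Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean

def r2(y_true, y_pred):
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

rng = np.random.default_rng(1)
n, dim = 500, 16
emb = rng.normal(size=(n, dim))                    # stand-in for location embeddings
lon = rng.uniform(-125, -67, size=n)               # synthetic longitudes
y = emb @ rng.normal(size=dim) + 0.1 * rng.normal(size=n)  # synthetic target

# Interpolation: held-out locations are scattered among the training locations.
test_interp = rng.random(n) < 0.2
# Extrapolation: hold out a contiguous region (here, the easternmost 20%).
test_extrap = lon > np.quantile(lon, 0.8)

scores = {}
for name, test in [("interpolation", test_interp), ("extrapolation", test_extrap)]:
    pred = linear_probe(emb[~test], y[~test], emb[test])
    scores[name] = r2(y[test], pred)
print(scores)
```

On real data the extrapolation score is usually the harder one, because nearby training locations are no longer available to lean on.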

Datasets:

Aggregated search trends:

Top 1,000 US national search trends in July 2022
Balanced to ensure these are searched across many zip codes
Ignore query text, focus on histogram of counts
Observation: the most popular queries capture the major dynamics of more niche topics such as health symptoms
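A sketch of the "ignore query text, keep the histogram" idea, assuming the aggregated data arrives as a zip-code by query count matrix (a hypothetical layout). It keeps the most-searched queries nationally and normalizes each zip code's counts into a distribution over them; the balancing step across zip codes is not shown.

```python
import numpy as np

def search_histograms(counts, top_k):
    """counts: (n_zip, n_query) aggregated query counts; query text is never used.

    Keeps the top_k queries by national (summed) volume and returns each zip
    code's counts over those queries normalized to sum to 1.
    """
    national = counts.sum(axis=0)
    keep = np.argsort(national)[::-1][:top_k]  # indices of the most-searched queries
    sub = counts[:, keep].astype(float)
    totals = sub.sum(axis=1, keepdims=True)
    # Zip codes with no counts for the kept queries get an all-zero row.
    hist = np.divide(sub, totals, out=np.zeros_like(sub), where=totals > 0)
    return hist, keep

counts = np.array([[5, 0, 1, 4],
                   [0, 0, 0, 0],
                   [2, 3, 0, 5]])
hist, keep = search_histograms(counts, top_k=2)  # keep -> [3, 0]
```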

Aggregated maps places

Top 1,192 point-of-interest categories from Google Maps in each location in 2024
Represented in >= 5% of zip codes
Aggregated place busyness: 683 metrics
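The 5% coverage rule for Maps categories can be sketched as a column filter over a zip-code by category count matrix. The matrix layout and category names below are hypothetical; only the 5% threshold comes from the notes.

```python
import numpy as np

def filter_by_coverage(counts, names, min_fraction=0.05):
    """Keep columns (e.g. point-of-interest categories) that are present in at
    least min_fraction of rows (zip codes)."""
    coverage = (counts > 0).mean(axis=0)  # fraction of zip codes with at least one such place
    keep = coverage >= min_fraction
    return counts[:, keep], [name for name, k in zip(names, keep) if k]

# 40 hypothetical zip codes, 3 categories.
counts = np.zeros((40, 3), dtype=int)
counts[:20, 0] = 3   # "cafe" in 50% of zip codes
counts[0, 1] = 1     # "ski_resort" in 2.5% of zip codes
counts[:3, 2] = 1    # "pharmacy" in 7.5% of zip codes
kept_counts, kept_names = filter_by_coverage(counts, ["cafe", "ski_resort", "pharmacy"])
```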

Weather & Air Quality: 45 statistics in July 2022

Trained a graph neural network with an autoencoder objective

Nodes: spatial regions
Edges: distance, correlation data
Loss function: reconstruct each node's original data from its own state and the states of its neighbors in the graph

Embedded vector: 313 dims
Loss split into sub-losses: Search Trends, Maps & Busyness, Weather/AQI
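A toy, untrained forward pass showing the shape of this objective: nodes are regions, edges connect each region to its k nearest neighbors by distance, message passing produces node embeddings, and separate linear decoders reconstruct each modality so the loss splits into per-modality sub-losses. Assumptions: the GCN-style normalized-adjacency layer, k=5, the tiny dimensions, and the random weights are all illustrative and not necessarily the PDFM architecture; correlation-based edges are omitted. Training would minimize the summed loss with an autodiff framework.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric adjacency linking each node to its k nearest neighbors (Euclidean)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # no self-neighbors
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros_like(d)
    A[np.arange(len(coords))[:, None], nbrs] = 1.0
    return np.maximum(A, A.T)                # make edges undirected

def normalize(A):
    """GCN-style normalization D^-1/2 (A + I) D^-1/2, so each node also keeps its own state."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode(X, A_norm, W1, W2):
    """Two rounds of neighbor aggregation; returns one embedding per node."""
    H = np.maximum(A_norm @ X @ W1, 0.0)     # ReLU
    return A_norm @ H @ W2

rng = np.random.default_rng(0)
n_nodes, emb_dim = 50, 6
slices = {"search": slice(0, 10), "maps_busyness": slice(10, 18), "weather_aqi": slice(18, 22)}
X = rng.normal(size=(n_nodes, 22))           # concatenated per-region features (synthetic)
coords = rng.uniform(size=(n_nodes, 2))      # stand-in region centroids

A = knn_adjacency(coords, k=5)
Z = encode(X, normalize(A), 0.1 * rng.normal(size=(22, 16)), 0.1 * rng.normal(size=(16, emb_dim)))

# One linear decoder per modality; each reconstruction error is a sub-loss.
decoders = {name: 0.1 * rng.normal(size=(emb_dim, s.stop - s.start)) for name, s in slices.items()}
sub_losses = {name: float(np.mean((Z @ decoders[name] - X[:, s]) ** 2)) for name, s in slices.items()}
total_loss = sum(sub_losses.values())
```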

Forecasting: TimesFM

Transformer-based model trained on many time series
Can effectively predict the future trends of univariate time series

Doesn’t incorporate geospatial reasoning

PDFM+TimesFM:

Learned an adapter model on top of TimesFM
Take TimesFM prediction for a given zip code
Then learn a model that adjusts the prediction
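A minimal sketch of the adapter idea on synthetic data. TimesFM itself is not called here: a per-location historical mean stands in for its univariate forecast (an assumption, to avoid guessing at its API). The adapter is a ridge regression, a hypothetical choice, that predicts the baseline's error from the location's embedding and the baseline value. The synthetic series include a location-dependent shift that the history alone cannot reveal, which is the kind of signal a purely univariate forecaster misses.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-2):
    """Closed-form ridge regression with an unpenalized intercept column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def apply_ridge(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

rng = np.random.default_rng(2)
n_loc, dim, T = 300, 8, 24
emb = rng.normal(size=(n_loc, dim))                     # stand-in location embeddings
level = 5.0 + rng.normal(size=n_loc)
history = level[:, None] + 0.05 * rng.normal(size=(n_loc, T))
shift = 0.2 * (emb @ rng.normal(size=dim))              # location effect not visible in history
target = level + shift + 0.05 * rng.normal(size=n_loc)  # next-period value to forecast

baseline = history.mean(axis=1)                         # placeholder for the TimesFM forecast

# Adapter: learn the baseline's error from [embedding, baseline], then correct the forecast.
features = np.hstack([emb, baseline[:, None]])
train = np.arange(n_loc) < 200
w = fit_ridge(features[train], (target - baseline)[train])
adjusted = baseline + apply_ridge(w, features)

mae_baseline = np.mean(np.abs(target - baseline)[~train])
mae_adjusted = np.mean(np.abs(target - adjusted)[~train])
```

Keeping the foundation-model forecast fixed and learning only a small correction means the adapter needs few training examples and leaves TimesFM untouched.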

Evaluation

Benchmarks: health, socioeconomic, environment
Earth Engine geospatial data: nighttime lights, tree coverage
Data Commons: aggregates census statistics from around the world
Comparison:

Inverse distance weighting: estimate the value at a point by interpolating from nearby points
SatCLIP: Neural embeddings of satellite data
GeoCLIP: Neural embeddings of geo-tagged personal photography
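The inverse distance weighting baseline fits in a few lines. This sketch assumes planar coordinates and the common power of 2; the talk does not give the baseline's parameters, and real latitude/longitude data would call for great-circle distances.

```python
import numpy as np

def idw(known_xy, known_values, query_xy, power=2.0, eps=1e-12):
    """Estimate values at query points as a weighted average of known values,
    with weights 1 / distance**power."""
    d = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, eps) ** power
    out = (w @ known_values) / w.sum(axis=1)
    # A query that coincides with a known point takes that point's value exactly.
    hit_q, hit_k = np.nonzero(d == 0)
    out[hit_q] = known_values[hit_k]
    return out

known_xy = np.array([[0.0, 0.0], [1.0, 0.0]])
known_values = np.array([0.0, 10.0])
query_xy = np.array([[0.5, 0.0], [0.25, 0.0], [0.0, 0.0]])
estimates = idw(known_xy, known_values, query_xy)  # approximately [5.0, 1.0, 0.0]
```

At (0.25, 0) the closer point gets weight 16 and the farther one 16/9, so the estimate is pulled strongly toward 0.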

PDFM-based prediction performs best on social metrics and matches the state of the art on environmental metrics
Augmenting PDFM with SatCLIP generally improves performance but in some cases extrapolation performance drops when using both

Applications:

Sust Global: Populous: Trying to predict insurance premiums using AI
CARTO: Cloud-Based Location Intelligence Platform
GroupM: Model to help understand media performance insights
Cooper/Smith: Disease tracking in low-resource environments
UN AI4Good: Housing Prices + Night Time Lights Tutorial
Geospatial reasoning
