Summary
Focus: Operator learning and how it is useful for scientific computing
Motivation:
Goal is modeling fluids, materials, weather
Operator learning applications:
Speed up expensive simulations
Model unknown dynamics from data
Setting:
Map between separable Banach spaces
Function -> Function
A function represents a mapping from some coordinate space to the state of the world at points in that space
Goal is to approximate the function->function mapping to minimize approximation error
Example:
Semi-discrete heat equation (discretized in time, not space)
Forward Euler loses stability when represented as a function-to-function transform
Backward Euler is stable, and its choice of approximation parameters is independent of the discretization parameters (see the sketch below)
The design of an operator learning architecture involves the same type of choice
Goal: discretization invariance:
decouple cost from discretization
use information at different discretizations
transfer learn across discretizations
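A minimal numerical sketch of the stability contrast mentioned above, assuming a 1D heat equation u_t = u_xx on a periodic grid; the grid size, time step, and noise level are illustrative choices, not values from the talk:

```python
import numpy as np

# 1D heat equation u_t = u_xx on a periodic grid.
# Forward Euler is only stable for dt <~ dx^2 / 2, so its stability is tied to the
# spatial discretization; backward Euler is unconditionally stable, so dt can be
# chosen independently of dx.
n = 128                                       # grid points (illustrative)
dx = 1.0 / n
x = np.linspace(0.0, 1.0, n, endpoint=False)
rng = np.random.default_rng(0)
u0 = np.sin(2 * np.pi * x) + 1e-3 * rng.standard_normal(n)   # noise excites high modes

# Second-difference Laplacian with periodic boundaries
I = np.eye(n)
Lap = (np.roll(I, 1, axis=0) - 2 * I + np.roll(I, -1, axis=0)) / dx**2

dt = 1e-4            # exceeds the forward-Euler stability limit dx^2 / 2 ~ 3e-5
steps = 200

u_fwd, u_bwd = u0.copy(), u0.copy()
A_bwd = np.linalg.inv(I - dt * Lap)           # backward Euler step: (I - dt*Lap)^{-1}
for _ in range(steps):
    u_fwd = u_fwd + dt * (Lap @ u_fwd)        # forward Euler step: (I + dt*Lap) u
    u_bwd = A_bwd @ u_bwd

print("forward Euler  max|u|:", np.max(np.abs(u_fwd)))   # blows up
print("backward Euler max|u|:", np.max(np.abs(u_bwd)))   # decays smoothly
```

The forward-Euler map blows up because its stability constraint couples the time step to the spatial grid, while the backward-Euler map stays bounded for any dt; this decoupling of approximation choices from the discretization is the property operator learning architectures aim for.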
Architectures and Approximation Theory
Reduced order modeling:
Idea:
Encode original function into a low-dim approximation,
Transform this low-dim approximation to another function,
Then decode back into original representation
Goal: make the encode -> transform -> decode composition accurate: it should approximately commute with the original operator
Universal approximation is possible in this framework
PCA-based instantiation (linear encoder/decoder; sketched below):
Encode original function using PCA -> coefficients on the leading eigenfunctions of the input data
Decode via inverse PCA
Low-dim transform: neural net
Can prove that any level of accuracy is achievable given enough PCA eigenfunctions
Challenge: for this linear encoding the number of eigenfunctions needed grows exponentially; need a non-linear approximation
Non-linear instantiation:
Sequence of neural layers
Kernel transforms data into reduced form
Apply linear weights
Push through a non-linear activation (e.g. sigmoid or tanh, as in ordinary neural nets)
The resulting approximation is non-linear and more efficient
Challenge: choice of kernel
Many kernels directly imply a data representation,
E.g. CNNs impose a specific grid
More flexible:
Transforms: Fourier, spherical harmonics, wavelets, Laplace-Beltrami eigenfunctions (a Fourier-layer sketch appears below)
Adaptive meshing / multipole
Allow selective discretization that uses different levels of approximation in different spatial regions
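A minimal sketch of the PCA-based instantiation above: encode functions with PCA, map the coefficients with a small neural net, decode with inverse PCA. The random stand-in data, grid size, latent dimension, and network width are hypothetical choices:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Hypothetical data: N input/output function pairs sampled on an m-point grid.
N, m, k = 1000, 256, 16                      # sample count, grid size, latent dim (assumed)
rng = np.random.default_rng(0)
U = rng.standard_normal((N, m))              # stand-in for input functions a_i(x)
V = rng.standard_normal((N, m))              # stand-in for output functions u_i(x)

# Encode: project each function onto its leading PCA eigenfunctions.
pca_in, pca_out = PCA(n_components=k), PCA(n_components=k)
Z_in = pca_in.fit_transform(U)               # k coefficients per input function
Z_out = pca_out.fit_transform(V)             # k coefficients per output function

# Low-dimensional transform: a small neural net between coefficient spaces.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
net.fit(Z_in, Z_out)

# Predict on a new input function: encode -> transform -> decode.
u_new = rng.standard_normal((1, m))
v_pred = pca_out.inverse_transform(net.predict(pca_in.transform(u_new)))
print(v_pred.shape)                          # (1, m): back on the original grid
```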
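And a minimal sketch of one Fourier-based kernel layer: FFT the input, keep a few low modes, apply learned complex weights, inverse FFT, add a pointwise linear path, then a non-linearity. The channel count, mode count, and grid size are illustrative, not the exact architecture discussed:

```python
import torch

class FourierLayer(torch.nn.Module):
    # One layer: kernel integral via truncated FFT + pointwise linear path + non-linearity.
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes                                   # number of Fourier modes kept
        scale = 1.0 / channels
        self.spectral_weight = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))
        self.pointwise = torch.nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, u):                                    # u: (batch, channels, grid)
        u_hat = torch.fft.rfft(u, dim=-1)                    # to Fourier space
        out_hat = torch.zeros_like(u_hat)
        out_hat[..., :self.modes] = torch.einsum(            # weight only the low modes
            "iox,bix->box", self.spectral_weight, u_hat[..., :self.modes])
        conv = torch.fft.irfft(out_hat, n=u.shape[-1], dim=-1)
        return torch.nn.functional.gelu(conv + self.pointwise(u))

# Usage on a hypothetical batch of 4 functions with 8 channels on a 128-point grid:
layer = FourierLayer(channels=8, modes=12)
u = torch.randn(4, 8, 128)
print(layer(u).shape)                                        # torch.Size([4, 8, 128])
```

Note that the learned weights do not depend on the grid size, so the same layer can be applied to inputs at a different discretization: the discretization-invariance goal above.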
Approximation
Can show that for each architecture there is some bad map that requires exponentially many parameters
So the worst case is bad, but what can be approximated efficiently?
For each approximation method, try to characterize the space of operators the method can approximate efficiently (with polynomially many parameters)
Hard to characterize this space but can show it is non-empty
E.g. Navier-Stokes model of incompressible fluids
Can prove that approximating this requires only polynomially many parameters
Data complexity
Instantiate the framework with an encoder that uses a differentiable function to sample the input function and encode the data, then decode it
E.g. a finite (point) sampler, sketched below
Can prove that in the worst case the number of samples required for an approximation grows exponentially
But can show that if the operator approximation needs only polynomially many parameters, the data-driven approximation needs only polynomially many samples
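A minimal sketch of the finite-sampler encoder/decoder: encode a function by its values at finitely many points, decode by interpolating back onto a fine grid. The sample count, sample locations, and interpolation scheme are assumptions for illustration:

```python
import numpy as np

# Encode a function by point evaluations at s sample locations; decode by (linear)
# interpolation back onto a fine grid. The sample count s is the quantity whose
# growth the data-complexity results bound.
s = 16                                            # number of samples (illustrative)
x_samples = np.linspace(0.0, 1.0, s)
x_fine = np.linspace(0.0, 1.0, 512)

def encode(f):
    return f(x_samples)                           # finite sampler: s point values

def decode(values):
    return np.interp(x_fine, x_samples, values)   # back to a function on a fine grid

f = lambda x: np.sin(2 * np.pi * x) * np.exp(-x)
f_recon = decode(encode(f))
print(np.max(np.abs(f_recon - f(x_fine))))        # reconstruction error from s samples
```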
Applications
3D RANS Simulations
Training: 500 converged simulations, Reynolds number of 5,000,000
Map: inlet velocity to wall shear stress
Used a Geometry-Informed Neural Operator (GINO) to approximate the simulation efficiently
Weather modeling
Used ERA5 Reanalysis from ECMWF
1979-2018, 1-hour intervals
721x1440 equiangular grid
Parameterized using spherical harmonics (sketched below)
Matches the accuracy of the physics-based model at lower cost
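A minimal sketch of parameterizing a field on an equiangular grid with spherical-harmonic coefficients; the grid size and truncation degree are toy values, far below the 721x1440 ERA5 resolution, and the quadrature is the simplest possible choice:

```python
import numpy as np
from scipy.special import sph_harm

# Hypothetical small equiangular grid and truncation degree.
nlat, nlon, L = 32, 64, 6
colat = np.linspace(0.0, np.pi, nlat)              # colatitude
lon = np.linspace(0.0, 2 * np.pi, nlon, endpoint=False)
Lon, Colat = np.meshgrid(lon, colat)

# Toy field that is (the real part of) a single low-degree harmonic: degree 3, order 2.
field = np.real(sph_harm(2, 3, Lon, Colat))

# Project onto spherical harmonics up to degree L with simple quadrature weights.
w = np.sin(Colat) * (np.pi / nlat) * (2 * np.pi / nlon)
coeffs = {(l, m): np.sum(field * np.conj(sph_harm(m, l, Lon, Colat)) * w)
          for l in range(L + 1) for m in range(-l, l + 1)}

# The energy concentrates in the (3, +/-2) coefficients, as expected.
top = sorted(coeffs.items(), key=lambda kv: -abs(kv[1]))[:3]
print([(lm, round(float(abs(c)), 3)) for lm, c in top])
```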