7.1 Modeling With Differential Equations Practice

Darios Uclaray · Aug 5, 2024
We now move into one of the main applications of differential equations, both in this class and in general. Modeling is the process of writing a differential equation to describe a physical situation. Almost all of the differential equations that you will use in your job (for the engineers out there in the audience) are there because somebody, at some time, modeled a situation to come up with the differential equation that you are using.

This section is not intended to completely teach you how to go about modeling all physical situations. A whole course could be devoted to the subject of modeling and still not cover everything! This section is designed to introduce you to the process of modeling and show you what is involved. We will look at three different situations in this section: Mixing Problems, Population Problems, and Falling Objects.


In all of these situations we will be forced to make assumptions that do not accurately depict reality in most cases, but without them the problems would be very difficult and beyond the scope of this discussion (and the course in most cases to be honest).


So, we first need to determine the concentration of the salt in the water exiting the tank. Since we are assuming a uniform concentration of salt in the tank, the concentration at any point in the tank, and hence in the water exiting, is given by the amount of salt in the tank divided by the volume of water in the tank, \(c(t) = Q(t)/V(t)\).
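The balance law behind these mixing problems, rate of change of salt equals rate in minus rate out, can be sketched numerically. The flow rate, inlet concentration, and tank volume below are made-up illustration values (equal in/out rates, so constant volume), not numbers from the text:

```python
# Toy mixing problem: brine enters at `rate` litres/hr with concentration
# `conc_in` kg/L, the well-mixed solution leaves at the same rate, so the
# volume stays constant.  The balance law is
#   dQ/dt = (rate in) - (rate out) = rate * conc_in - rate * Q / V.
# All numbers are made-up illustration values.

def simulate_salt(q0, conc_in, rate, volume, t_end, dt=0.001):
    """Forward Euler integration of the mixing ODE; returns salt Q(t_end)."""
    q, t = q0, 0.0
    while t < t_end:
        q += (rate * conc_in - rate * q / volume) * dt
        t += dt
    return q

# Starting from pure water, the tank concentration Q/V approaches conc_in.
q = simulate_salt(q0=0.0, conc_in=0.5, rate=3.0, volume=100.0, t_end=200.0)
print(q / 100.0)  # ≈ 0.499, closing in on 0.5
```

The exponential approach to the steady state \(Q = c_{\text{in}}V\) matches the analytic solution of this linear ODE.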


As you can surely see, these problems can get quite complicated if you want them to. Take the last example. A more realistic situation would be that once the pollution dropped below some predetermined point the polluted runoff would, in all likelihood, be allowed to flow back in and then the whole process would repeat itself. So, realistically, there should be at least one more IVP in the process.


So, just how does this tripling come into play? Well, we should also note that without knowing \(r\) we will have a difficult time solving the IVP completely. We will use the fact that the population triples in two weeks' time to help us find \(r\). In the absence of outside factors the differential equation would become \(P' = rP\).
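Without outside factors the model is plain exponential growth, \(P(t) = P(0)e^{rt}\), so tripling in two weeks pins down \(r = \ln(3)/2\) per week. A quick sanity check (the starting population of 100 is made up):

```python
import math

# Exponential growth: P(t) = P0 * exp(r * t).  The population triples in
# two weeks, so 3 * P0 = P0 * exp(2 * r)  =>  r = ln(3) / 2 per week.
r = math.log(3) / 2

# Sanity check: after two weeks the population is three times the start.
P0 = 100.0
P_two_weeks = P0 * math.exp(r * 2)
print(r, P_two_weeks)  # r ≈ 0.5493 per week, P(2) = 300.0
```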


Okay, if you think about it we actually have two situations here. The initial phase in which the mass is rising in the air and the second phase when the mass is on its way down. We will need to examine both situations and set up an IVP for each. We will do this simultaneously. Here are the forces that are acting on the object on the way up and on the way down.


In the second IVP, \(t_0\) is the time when the object is at the highest point and is ready to start on the way down. Note that at this time the velocity would be zero. Also note that the initial condition of the first differential equation will have to be negative since the initial velocity is upward.


Okay, we want the velocity of the ball when it hits the ground. Of course we need to know when it hits the ground before we can ask this. In order to find this we will need to find the position function. This is easy enough to do.
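As a sketch of that last step, here is the no-air-resistance version with made-up initial height and velocity (not the numbers from the example): the position function is a quadratic in \(t\), and the impact time is its positive root.

```python
import math

# Toy version of "find when the ball hits the ground": constant gravity,
# no air resistance.  The numbers are illustrative, not from the text.
g = 9.8     # m/s^2, acting downward
s0 = 20.0   # initial height (m)
v0 = 10.0   # initial upward velocity (m/s)

# Position (upward positive): s(t) = s0 + v0*t - 0.5*g*t^2.
# The ball hits the ground at the positive root of s(t) = 0.
t_hit = (v0 + math.sqrt(v0**2 + 2 * g * s0)) / g
v_hit = v0 - g * t_hit  # velocity at impact (negative: moving downward)
print(t_hit, v_hit)  # ≈ 3.28 s, ≈ -22.18 m/s
```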


Now, in this case, when the object is moving upwards the velocity is negative. However, because of the \(v^2\) in the air resistance we do not need to add in a minus sign this time to make sure the air resistance is positive as it should be given that it is a downwards acting force.


Also, the solution process for these will be a little more involved than in the previous example, as neither of the differential equations is linear. They are both separable differential equations, however.


The problem arises when you go to remove the absolute value bars. In order to do the problem they do need to be removed. This is where most of the students made their mistake. Because they had forgotten about the convention and the direction of motion they just dropped the absolute value bars to get.


So, why is this incorrect? Well remember that the convention is that positive is upward. However in this case the object is moving downward and so \(v\) is negative! Upon dropping the absolute value bars the air resistance became a negative force and hence was acting in the downward direction!


Outside of one dimension, there is (to the best of my knowledge) no direct solution to the transport problem. That means that we need to build our own. Thankfully, the glorious Youssef Marzouk and a bunch of his collaborators have spent some quality time mapping out this idea. A really nice survey of their results can be found in this paper.


A really clever idea, which is related to normalising flows, is to ask: what if, instead of looking for a single map \(S(x) = T^{-1}(x)\), we tried to find a sequence of maps \(S(x,t)\) that smoothly move from the identity map to the transport map?


This seems like it would be a harder problem. And it is. You need to make an infinite number of maps. But the saving grace is that as \(t\) changes slightly, the map \(S(\cdot, t)\) is also only going to change slightly. This means that we can parameterise the change relatively simply.
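A minimal 1-D sketch of the idea, using a hypothetical affine target map rather than anything from the paper: interpolating between the identity and the transport map gives a path of maps whose changes in \(t\) are small and easy to follow.

```python
# Toy sketch of the "path of maps" idea in 1-D.  The one-shot transport map
# T(x) = mu + sigma * x pushes N(0, 1) forward to N(mu, sigma^2).  Instead
# of applying T directly, interpolate S(x, t) = (1 - t) * x + t * T(x) from
# the identity map (t = 0) to T (t = 1), and follow the path in small steps.
# This is only an illustration of the idea, not the construction in the paper.

mu, sigma = 2.0, 0.5

def T(x):
    return mu + sigma * x

def flow(x, n_steps=1000):
    # d/dt S(x, t) = T(x) - x for the linear interpolation; Euler steps in t.
    s, dt = x, 1.0 / n_steps
    for _ in range(n_steps):
        s += (T(x) - x) * dt
    return s

print(flow(1.3), T(1.3))  # the path lands on the one-shot map: both ≈ 2.65
```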


The problem is specified with \(n\) observations \(y_i\) taken at data points \((t_1, x_1), \ldots, (t_n, x_n)\), and the aim is to find the \(f\) that best fits the data. The traditional choice is to minimise the mean-square error \[ \hat{\theta} = \arg\min_\theta \sum_{i=1}^n \left(y_i - \mathcal{F}(f)(t_i,x_i)\right)^2.\]
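As a toy instance of this least-squares problem (the forward model here is just \(\mathcal{F}(f)(t,x) = \theta x\), a made-up stand-in for a genuinely expensive solver, and the data are invented):

```python
# Toy least-squares fit: the forward model F(f)(t, x) = theta * x stands in
# for a genuinely expensive model.  With this choice the minimiser has the
# familiar closed form theta = sum(x * y) / sum(x * x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x plus noise (made-up data)

theta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
mse = sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(theta, mse)  # theta = 1.99, small residual error
```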


Now every single one of you will know immediately that this question is both vague and ill-posed. There are many functions \(f\) that will fit the data. This means that we need to enforce some sort of complexity penalty on \(f\). This leads to the method known as Tikhonov regularisation \[ \hat{f} = \arg\min_{f \in B} \sum_{i=1}^n \left(y_i - \mathcal{F}(f)(t_i,x_i)\right)^2 + \lambda\|f\|_B^2, \] where \(B\) is some Banach space and \(\lambda>0\) is some tuning parameter.
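In the finite-dimensional linear special case this reduces to ridge regression, which has a closed form. The sketch below uses that special case with made-up data; the general Banach-space problem is, of course, much richer.

```python
# Ridge regression: the finite-dimensional, linear special case of Tikhonov
# regularisation, where the penalty ||f||_B^2 becomes lambda * theta^2 and
# the minimiser has a closed form.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # made-up data

def ridge(lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

print(ridge(0.0))   # lambda = 0 recovers ordinary least squares: 1.99
print(ridge(10.0))  # a larger penalty shrinks the estimate towards zero
```

The tuning parameter \(\lambda\) trades data fit against complexity: as \(\lambda \to \infty\) the estimate is shrunk all the way to zero.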


As with all Bayesianifications, we just need to turn the above into a likelihood and a prior. Easy. Well, the likelihood part, at least, is easy. If we want to line up with Tikhonov regularisation, we can choose a Gaussian likelihood \[y_i \mid f, x_i, t_i, \sigma \sim N(\mathcal{F}(f)(t_i,x_i), \sigma^2).\]


This is familiar to statisticians: the forward model is essentially working as a non-standard link function in a generalised linear model. There are two big practical differences. The first is that \(\mathcal{F}\) is very non-linear and almost certainly not monotone. The second is that evaluations of \(\mathcal{F}\) are typically very expensive. For instance, you may need to solve a system of differential equations. This means that any computational method is going to need to minimise the number of likelihood evaluations.


Firstly, the vector field \(f\) directly affects how easy the differential equations are to solve. This means that if \(f\) is too complicated, it can take a long time to both train the model and generate samples from the trained model. To get around this you need to put fairly strict penalties and/or structural assumptions on \(f\).


Diffusion models fix these two aspects of normalising flows at the cost of both a more complex mathematical formulation and some inexactness around the base distribution \(q\) when generating new samples.


There are a number of diffusions that are familiar in statistics and machine learning. The most famous one is probably the Langevin diffusion \[dX_t = \frac{\sigma^2}{2}\nabla \log p(X_t)\, dt + \sigma\, dW_t,\] which is asymptotically distributed according to \(p\). This forms the basis of a bunch of MCMC methods as well as some faster, less adjusted methods.
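A minimal sketch of the unadjusted Langevin algorithm for a standard normal target, where \(\nabla \log p(x) = -x\) and we take \(\sigma = 1\). The step size and run length are arbitrary choices; the discretisation introduces the small bias that the adjusted (Metropolis-corrected) variants remove.

```python
import math, random

# Unadjusted Langevin sampler targeting p = N(0, 1), so grad log p(x) = -x
# (taking sigma = 1 in the diffusion above).  Euler–Maruyama with step h:
#   x_{k+1} = x_k + (h / 2) * grad_log_p(x_k) + sqrt(h) * N(0, 1).
random.seed(0)

def langevin_samples(n, h=0.05, burn_in=5_000):
    x, out = 0.0, []
    for k in range(burn_in + n):
        x += 0.5 * h * (-x) + math.sqrt(h) * random.gauss(0.0, 1.0)
        if k >= burn_in:
            out.append(x)
    return out

xs = langevin_samples(50_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # close to 0 and 1, up to Monte Carlo and discretisation error
```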


The stationary distribution of the Ornstein–Uhlenbeck process \[dX_t = -\frac{1}{2}X_t\,dt + \sigma\, dW_t\] is \(X_\infty \sim N(0, \sigma^2 I)\), where \(I\) is the identity matrix. In fact, if we start the diffusion at stationarity by setting \[X_0 \sim N(0, \sigma^2 I),\] then \(X_t\) is a stationary Gaussian process with covariance function \[c(t, t') = \sigma^2 e^{-\frac{|t - t'|}{2}}.\]


More interesting in our context, however, is what happens if we start the diffusion from a fixed point \(x\), which will eventually be a sample from \(p(x)\). In that case, we can solve the linear stochastic differential equation exactly to get \[X_t = xe^{-\frac{t}{2}} + \sigma \int_0^t e^{\frac{1}{2}(s-t)}\,dW_s,\] where the integral on the right-hand side can be interpreted as a white noise integral, and so \[X_t \sim N\left(xe^{-\frac{t}{2}},\ \sigma^2\int_0^t e^{s-t}\,ds\right),\] where the variance is \[\sigma^2\int_0^t e^{s-t}\,ds = \sigma^2 e^{-t}\left(e^t - 1\right) = \sigma^2(1-e^{-t}).\] From these equations, we see that the mean of the diffusion hurtles exponentially fast towards zero and the variance moves at a similar speed towards \(\sigma^2\).
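Because the transition is exactly Gaussian, \(X_t \mid X_0 = x \sim N(xe^{-t/2},\ \sigma^2(1 - e^{-t}))\), we can sample the diffusion at any time in one shot. The values of \(\sigma\), \(x\), and \(t\) below are arbitrary illustration choices:

```python
import math, random

# Exact one-shot sampling of the OU diffusion at time t:
#   X_t | X_0 = x  ~  N(x * exp(-t / 2), sigma^2 * (1 - exp(-t))).
# sigma, x0 and t are arbitrary illustration values.
random.seed(1)
sigma, x0, t = 1.5, 2.0, 0.7

mean_t = x0 * math.exp(-t / 2)           # ≈ 1.409
var_t = sigma**2 * (1 - math.exp(-t))    # ≈ 1.133

samples = [random.gauss(mean_t, math.sqrt(var_t)) for _ in range(200_000)]
emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / len(samples)
print(emp_mean, emp_var)  # empirical moments match the formulas above
```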


More importantly, this means that, given a starting point \(X_0 = x\), we can generate data from any part of the diffusion \(X_t\)! If we want a sequence of observations from the same trajectory, we can generate them sequentially using the fact that an OU process is a Markov process. This means that we are no longer limited to information at just two points along the trajectory.
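A sketch of that sequential generation (the step size, starting point, and \(\sigma\) below are arbitrary): each new observation is drawn from the one-step OU transition conditioned on the previous one, and the Markov property guarantees that two half-steps compose to the exact full-step distribution.

```python
import math, random

# Sequential sampling along one OU trajectory via the Markov property:
# given X_s = x, the observation at time s + dt is distributed as
#   N(x * exp(-dt / 2), sigma^2 * (1 - exp(-dt))).
random.seed(2)
sigma = 1.0

def next_point(x, dt):
    mean = x * math.exp(-dt / 2)
    sd = sigma * math.sqrt(1 - math.exp(-dt))
    return random.gauss(mean, sd)

# One trajectory observed at times 0.5, 1.0, ..., 5.0, starting from X_0 = 3.
x, path = 3.0, []
for _ in range(10):
    x = next_point(x, 0.5)
    path.append(x)

# Consistency check: two half-steps compose to one full step, so X_1 | X_0 = 3
# has mean 3 * exp(-1/2) ≈ 1.820 and variance 1 - exp(-1) ≈ 0.632.
finals = [next_point(next_point(3.0, 0.5), 0.5) for _ in range(100_000)]
m = sum(finals) / len(finals)
v = sum((f - m) ** 2 for f in finals) / len(finals)
print(m, v)
```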


The twist is that the new diffusion process is going to be quite a bit more complex than the original one. The problem is that unless \(X_0\) comes from a Gaussian distribution, this process will be non-Gaussian, and thus somewhat tricky to find the reverse trajectory of.


To see this, consider \(s > t\) and recall that \[p(X_0, X_t, X_s) = p(X_s \mid X_t)\,p(X_t \mid X_0)\,p(X_0)\] and \[p(X_t, X_s) = \int_{\mathbb{R}^d} p(X_s \mid X_t)\, p(X_t \mid X_0)\, p(X_0)\,dX_0.\] The first two terms in that integrand are Gaussian densities and thus their product is a bivariate Gaussian density \[X_t, X_s \mid X_0 \sim N\left(X_0\begin{pmatrix}e^{-\frac{t}{2}}\\ e^{-\frac{s}{2}}\end{pmatrix},\ \sigma^2 \begin{pmatrix} 1 - e^{-t} & e^{-\frac{s-t}{2}} - e^{-\frac{s+t}{2}} \\ e^{-\frac{s-t}{2}} - e^{-\frac{s+t}{2}} & 1 - e^{-s}\end{pmatrix}\right).\] Unfortunately, as \(X_0\) is not Gaussian, the marginal distribution will be non-Gaussian. This means that our reverse-time transition density \[p(X_t \mid X_s) = \frac{\int_{\mathbb{R}^d} p(X_t, X_s \mid X_0)\, p(X_0)\,dX_0}{\int_{\mathbb{R}^d} p(X_s \mid X_0)\, p(X_0)\,dX_0}\] is also going to be very non-linear.
