Numerical optimization is one of the central techniques in Machine Learning. For many problems it is hard to figure out the best solution directly, but it is relatively easy to set up a loss function that measures how good a solution is - and then minimize that function over its parameters to find the solution.
I ended up writing a bunch of numerical optimization routines back when I was first trying to learn JavaScript. Since I had all this code lying around anyway, I thought that it might be fun to provide some interactive visualizations of how these algorithms work.
The cool thing about this post is that the code is all running in the browser, meaning you can interactively set hyper-parameters for each algorithm, change the initial location, and change which function is being minimized to get a better sense of how these algorithms work.
To overcome these problems, the Nelder-Mead method dynamically adjusts the step size based on the loss at the new point. If the new point is better than any previously seen value, it expands the step size to accelerate towards the bottom. Likewise, if the new point is worse, it contracts the step size to converge around the minimum.
Click anywhere in this graph to restart with a new initial location. This method will generate a triangle at that spot and then flip-flop towards the minimum at each iteration, expanding or contracting as necessary according to the settings.
While this method is incredibly simple, it actually works fairly well on low-dimensional functions. It can even minimize non-differentiable functions like \( f\left(x\right) = \left|\left\lfloor x \right\rfloor - 50\right| \), which all of the other methods I'm going to talk about would fail at.
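To make the reflect/expand/contract mechanics concrete, here is a minimal two-dimensional sketch of the idea (a standalone illustration, not the actual code behind these visualizations; the test function and hyper-parameters are made up for the example):

```javascript
// Minimal 2-D Nelder-Mead: maintain a triangle of points, repeatedly replace
// the worst vertex by reflecting it through the centroid of the other two,
// expanding the step when the new point is a new best, contracting when it is worse.
function nelderMead(f, start, maxIter = 200, step = 1.0) {
  let simplex = [
    start,
    [start[0] + step, start[1]],
    [start[0], start[1] + step],
  ].map((p) => ({ p, fx: f(p) }));

  for (let iter = 0; iter < maxIter; iter++) {
    simplex.sort((a, b) => a.fx - b.fx); // best vertex first
    const [best, good, worst] = simplex;
    // centroid of every vertex except the worst
    const c = [(best.p[0] + good.p[0]) / 2, (best.p[1] + good.p[1]) / 2];
    const at = (t) => {
      // point c + t * (c - worst): t=1 reflects, t=2 expands, t=-0.5 contracts
      const p = [c[0] + t * (c[0] - worst.p[0]), c[1] + t * (c[1] - worst.p[1])];
      return { p, fx: f(p) };
    };
    const reflected = at(1);
    if (reflected.fx < best.fx) {
      const expanded = at(2); // new best seen: try accelerating further
      simplex[2] = expanded.fx < reflected.fx ? expanded : reflected;
    } else if (reflected.fx < good.fx) {
      simplex[2] = reflected; // modest improvement: just accept it
    } else {
      const contracted = at(-0.5); // worse than before: pull back towards the centroid
      if (contracted.fx < worst.fx) {
        simplex[2] = contracted;
      } else {
        // nothing helped: shrink the whole triangle towards the best vertex
        simplex = simplex.map(({ p }) => {
          const q = [(p[0] + best.p[0]) / 2, (p[1] + best.p[1]) / 2];
          return { p: q, fx: f(q) };
        });
      }
    }
  }
  simplex.sort((a, b) => a.fx - b.fx);
  return simplex[0].p;
}

// usage: minimize a shifted quadratic with minimum at (3, -1)
const quadratic = ([x, y]) => (x - 3) ** 2 + (y + 1) ** 2;
const result = nelderMead(quadratic, [10, 10]);
```

Note how the only information the method ever uses is comparisons between function values - no gradients - which is exactly why it still works on the non-differentiable function above.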
One possible direction to go is to figure out what the gradient \(\nabla F(X_n) \) is at the current point, and take a step down the gradient towards the minimum. The gradient can be calculated by symbolically differentiating the loss function, or by using automatic differentiation like Torch and TensorFlow do. Using a fixed step size \(\alpha\), this means updating the current point \(X_n\) by:

\[ X_{n+1} = X_n - \alpha \nabla F\left(X_n\right) \]
A line search can modify the learning rate at each iteration so that the loss is always decreasing, which prevents overshooting the minimum - and also so that the gradient is flattening out sufficiently, which prevents taking too many tiny steps. Enabling line search here leads to fewer iterations, with the downside that each iteration might have to sample extra function points.
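The fixed-step update and the line-search variant can be sketched together in a few lines (a standalone illustration, not the code behind these plots; the function, starting point, and hyper-parameters here are invented for the example):

```javascript
// Gradient descent with an optional backtracking (Armijo-style) line search:
// starting from alpha0, halve the step until the loss decreases by enough.
const dot = (a, b) => a.reduce((t, ai, i) => t + ai * b[i], 0);

function descend(f, gradF, x0, { lineSearch = true, alpha0 = 0.5, iters = 100 } = {}) {
  let x = x0.slice();
  for (let n = 0; n < iters; n++) {
    const g = gradF(x);
    let alpha = alpha0;
    if (lineSearch) {
      // sufficient-decrease check: reject steps that overshoot the minimum
      const fx = f(x);
      while (alpha > 1e-10 &&
             f(x.map((xi, i) => xi - alpha * g[i])) > fx - 1e-4 * alpha * dot(g, g)) {
        alpha /= 2;
      }
    }
    x = x.map((xi, i) => xi - alpha * g[i]); // X_{n+1} = X_n - alpha * grad F(X_n)
  }
  return x;
}

// usage: minimize f(x, y) = x^2 + 2 y^2 from (5, 5); with lineSearch: false
// this is the plain fixed-step update described above
const f = ([x, y]) => x * x + 2 * y * y;
const gradF = ([x, y]) => [2 * x, 4 * y];
const best = descend(f, gradF, [5, 5]);
```

A production line search would also check a curvature condition (the "gradient flattening out" part) rather than only sufficient decrease, but the halving loop already shows where the extra function evaluations per iteration come from.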
The Conjugate Gradient method tries to estimate the curvature of the function being minimized by including the previous search direction with the current gradient to come up with a new, better search direction.
While the theory behind this might be a little involved, the math is pretty simple. The initial search direction \( S_0 \) is the same as in gradient descent. Subsequent search directions \( S_n \) combine the current gradient with the previous direction; with the Fletcher-Reeves choice of \( \beta_n \) they are computed by:

\[ S_n = -\nabla F\left(X_n\right) + \beta_n S_{n-1}, \qquad \beta_n = \frac{\left\lVert \nabla F\left(X_n\right) \right\rVert^2}{\left\lVert \nabla F\left(X_{n-1}\right) \right\rVert^2} \]
You can see the progress of this method below. The actual direction taken is in red, with the gradients at each iteration being represented by a yellow arrow. In certain cases the search direction being used is at almost 90 degrees to the gradient, which explains why Gradient Descent had such problems on this function:
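As a sketch of how the pieces fit together (a hypothetical standalone implementation using the Fletcher-Reeves choice of \( \beta_n \) and a simple backtracking line search - not the code driving the visualization above):

```javascript
// Nonlinear conjugate gradient: each new direction mixes the previous search
// direction into the negative gradient, weighted by the Fletcher-Reeves beta.
const dot = (a, b) => a.reduce((t, ai, i) => t + ai * b[i], 0);

// backtracking line search along direction s (sufficient-decrease rule only)
function stepSize(f, x, s, g) {
  let alpha = 1.0;
  const fx = f(x);
  const slope = dot(g, s); // directional derivative; negative for a descent direction
  while (alpha > 1e-10 &&
         f(x.map((xi, i) => xi + alpha * s[i])) > fx + 1e-4 * alpha * slope) {
    alpha /= 2;
  }
  return alpha;
}

function conjugateGradient(f, gradF, x0, iters = 100) {
  let x = x0.slice();
  let g = gradF(x);
  let s = g.map((gi) => -gi); // S_0 is plain steepest descent
  for (let n = 0; n < iters; n++) {
    if (dot(g, g) === 0) break;                    // already at a stationary point
    if (dot(g, s) >= 0) s = g.map((gi) => -gi);    // restart if s is no longer a descent direction
    const alpha = stepSize(f, x, s, g);
    x = x.map((xi, i) => xi + alpha * s[i]);
    const gNew = gradF(x);
    const beta = dot(gNew, gNew) / dot(g, g);      // Fletcher-Reeves: ratio of squared gradient norms
    s = gNew.map((gi, i) => -gi + beta * s[i]);    // blend the previous direction into the new one
    g = gNew;
  }
  return x;
}

// usage: a narrow quadratic valley, the kind of function where plain
// gradient descent zig-zags while conjugate gradient cuts across
const loss = ([x, y]) => x * x + 10 * y * y;
const gradLoss = ([x, y]) => [2 * x, 20 * y];
const cgResult = conjugateGradient(loss, gradLoss, [5, 5]);
```

The restart-to-steepest-descent guard is a common practical safeguard: the \(\beta_n\) term carries memory of old directions, and occasionally that memory points uphill.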
The challenge here is to convert a matrix of distances between some points into coordinates for each point that best approximate the required distances. One way of doing this is to minimize a function like:

\[ loss\left(X\right) = \sum_{i < j} \left( \left\lVert X_i - X_j \right\rVert - D_{ij} \right)^2 \]

where \( D_{ij} \) is the required distance between points \( i \) and \( j \).
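Such a loss - summing the squared mismatch between the target distances and the distances implied by the candidate coordinates - can be written down directly (a sketch with invented example data, assuming 2-D output coordinates):

```javascript
// MDS-style loss: for every pair of points, compare the Euclidean distance
// between the candidate coordinates X[i], X[j] against the target D[i][j].
function mdsLoss(X, D) {
  let loss = 0;
  for (let i = 0; i < X.length; i++) {
    for (let j = i + 1; j < X.length; j++) {
      const dx = X[i][0] - X[j][0];
      const dy = X[i][1] - X[j][1];
      loss += (Math.sqrt(dx * dx + dy * dy) - D[i][j]) ** 2;
    }
  }
  return loss;
}

// usage: three collinear points at 0, 3 and 7 reproduce their own
// distance matrix exactly, so the loss is zero at the true layout
const D = [[0, 3, 7], [3, 0, 4], [7, 4, 0]];
const X = [[0, 0], [3, 0], [7, 0]];
// mdsLoss(X, D) === 0
```

Any of the minimizers above can then be pointed at this function; since it is smooth almost everywhere, the gradient-based methods apply as well as Nelder-Mead.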
Nocedal and Wright have written an excellent book on numerical optimization that was my reference for most of this. While it is a great resource, there are a couple of other techniques not covered there that I quickly wanted to mention.
Sebastian Ruder wrote an excellent overview of gradient descent methods that goes into more depth, especially in the case of stochastic gradient descent on large sparse models like those used to train deep neural networks.
One cool derivative-free optimization method is Bayesian Optimization. Eric Brochu, Mike Vlad Cora and Nando de Freitas wrote a great introduction to Bayesian Optimization. An interesting application of Bayesian Optimization is in hyper-parameter tuning - there is even a company, SigOpt, that is offering Bayesian Optimization as a service for this purpose.
It doesn't look too complicated, does it? Now to the weird thing: I believe that u=Tanh[x/Sqrt[2]] is a solution to Laplacian[u, x] + (1 - u^2)*u == 0 using the boundary condition f[-inf]==-1 && f[inf]==1. Mathematica doesn't seem to be able to deliver any solution, though. Well, I might be wrong with my guessed solution, so I just plugged it into the equation to see how far off I am.
That plot looks so noisy, and the values are so tiny, that I suspect a numerical problem here (and that tanh(x/sqrt(2)) actually is a solution). Is that possible? Why is that? Could that also be the reason why Mathematica is unable to solve for u with the said boundary condition?
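For what it's worth, the residual of the guess can be checked in plain floating point outside Mathematica (a quick standalone sketch; analytically the second derivative of tanh(x/sqrt(2)) is -sech(x/sqrt(2))^2 * tanh(x/sqrt(2)), which cancels the (1 - u^2) u term exactly):

```javascript
// Residual of u'' + (1 - u^2) u for the guess u = tanh(x / sqrt(2)).
// Since u'' = -sech(s)^2 * tanh(s) with s = x / sqrt(2), the residual should
// vanish up to floating-point rounding - so tiny noisy values are expected.
const sech = (t) => 1 / Math.cosh(t);

function residual(x) {
  const s = x / Math.SQRT2;
  const u = Math.tanh(s);
  const uxx = -(sech(s) ** 2) * Math.tanh(s); // exact second derivative of the guess
  return uxx + (1 - u * u) * u;
}

const maxResidual = Math.max(...[-3, -1, 0, 0.5, 2].map((x) => Math.abs(residual(x))));
// maxResidual comes out at rounding-error level, confirming the guess solves the ODE
```

So the "noise" at magnitudes around machine epsilon is exactly what a true solution looks like when the two terms cancel numerically.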
Thanks for the replies. So is there no way Mathematica can solve this equation for the boundary condition and give me tanh(x/sqrt(2))? Do I have to guess all my solutions for specific boundary problems? ;)
Even though I luckily guessed the solution for the 1-dimensional case, I was not so lucky with the 2- and 3-dimensional cases (where u^2 becomes u.u and the boundary condition at x^2+y^2+z^2==inf is {x,y,z}/Sqrt[x^2+y^2+z^2]). Is there a way to solve this problem with Mathematica?
Yes, using spherical coordinates would certainly make sense, as the solution must be spherically symmetric. I just wasn't sure how to do it with Mathematica (I'm still a beginner). And to reformulate the boundary condition: basically, the condition should be that vectors at r=inf have length 1 and point away from the origin, while at the origin (r=0) the vector length is 0. How could I formulate that for Mathematica? Though I'm still struggling with the 1D case (see my comment on your next post).
In 2- and 3-D the equation $$\frac{d^2u(x)}{dx^2}+(1-u(x)^2)\,u(x)=0$$ has no obvious generalization for a vector-valued function $\vec{u}(\vec{x})$, because the Laplace operator $\Delta=\nabla\cdot\nabla$, defined as the divergence of the gradient of a scalar function $u(\vec{x})$, is not directly applicable. If one applies the gradient to a vector-valued function, a matrix-valued function appears ...
Hmm, I don't really understand what you are saying. But this equation is the Ginzburg-Landau equation, which works in any dimension (so u yields a 3-vector for every given x, also a 3-vector). So for 3 dimensions it works on a 3D vector field. The Laplacian of a 3D vector field is again a 3D vector field (and the second term is also a 3D vector field to be added). And I know from numerical experiments that there is a solution to it. So what's the problem here?
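This is indeed well-defined: in Cartesian coordinates the vector Laplacian is just the scalar Laplacian applied to each component, so the equation makes sense componentwise (sketching the 3-D case):

$$ \Delta \vec{u} = \begin{pmatrix} \Delta u_1 \\ \Delta u_2 \\ \Delta u_3 \end{pmatrix}, \qquad \Delta u_i = \frac{\partial^2 u_i}{\partial x^2} + \frac{\partial^2 u_i}{\partial y^2} + \frac{\partial^2 u_i}{\partial z^2}, $$

and with u^2 read as $\vec{u}\cdot\vec{u}$ the equation becomes $\Delta u_i + (1 - \vec{u}\cdot\vec{u})\,u_i = 0$ for each component $i$. (In curvilinear coordinates, such as the spherical coordinates discussed above, the vector Laplacian picks up extra coupling terms and is no longer simply componentwise.)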
Ah okay, but that's somewhat cheating, as I have to know the solution in advance already. ;) If I hadn't had that lucky guess of tanh(x/sqrt(2)) in the first place, there would be no way to derive the solution, I guess?
Mathematical Skills: Proficiency in basic arithmetic operations (addition, subtraction, multiplication, division), as well as more advanced concepts like percentages, ratios, and algebraic expressions.
A numerical word problem typically presents a scenario or a situation that involves numbers and requires the application of mathematical concepts to solve. These problems are written in a narrative form and often relate to real-life situations. The goal is to extract the relevant numerical information from the text and use appropriate mathematical operations to find a solution.
TechWave Solutions is planning to purchase new laptops and printers for its office. Each laptop costs $800, and each printer costs $150. The company decides to buy twice as many laptops as printers. If the total budget for this purchase is $14,000, how many printers can the company buy?
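The arithmetic behind this example can be checked in a few lines (a sketch; the variable names are mine, not part of the problem):

```javascript
// TechWave budget check: buying p printers implies 2p laptops, so each
// printer effectively consumes 150 + 2 * 800 = 1750 dollars of the budget.
const laptopCost = 800;
const printerCost = 150;
const budget = 14000;

const costPerPrinter = printerCost + 2 * laptopCost; // 1750
const printers = Math.floor(budget / costPerPrinter); // 14000 / 1750 = 8 exactly
// printers === 8, i.e. 8 printers and 16 laptops spend the full $14,000
```

This is the general pattern for such problems: translate the narrative constraint ("twice as many laptops as printers") into a single expression in one unknown, then solve.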
They are widely used in educational assessments, standardized tests like the SAT or GMAT, and in professional settings, particularly for roles that require strong quantitative and analytical skills. Numerical word problems are commonly found as part of a cognitive ability test and in numerical aptitude tests.
Preparing for numerical word problems, especially for exams or job assessments, involves a combination of building strong foundational math skills, practicing problem-solving techniques, and familiarizing yourself with common types of problems.
Elevate your test preparation experience with our Test Prep Account, where you unlock the potential to master numerical word problems through an extensive library of 190 carefully crafted questions. In total you will have access to more than 700 numerical aptitude questions.
JobTestPrep has decades of experience instructing and preparing job candidates for their pre-employment aptitude tests, including the numerical reasoning test. On this page, you will find a free numerical reasoning test with solutions, tips, and advice. Use these numerical questions to assess your numerical ability ahead of your exam.
We at JobTestPrep find the world of assessment tests highly diverse and fascinating. If you are looking to deepen your knowledge of aptitude tests, or you want some extra practice before your test, we've got you covered!
This section covers the fundamental numerical reasoning skills and competencies that underlie success on any numerical test: mental arithmetic, basic operations, ratios, percentages, algebra, and more.
Fractions complicate things. In this case, the first step towards a solution is to mentally replace 5/20 with 1/4, which is its simplified version. Now, isolate the question mark by multiplying both sides of the equation by the reciprocal of 1/4, which is simply 4.
This seemingly simple question can slow many people down. There are mental strategies that can help you solve it (like breaking 198 down into two chunks of 90 with a remainder of 18), but these are often time-consuming.