Integrate

Venkatesh

unread,

Jun 23, 1999, 3:00:00 AM6/23/99

to

I have a table of data from which I plot a x-y curve...
Is there any way from the built in functions to estimate the
area under the curve?
I have already used approximation by small intervals and manually
determining the area...
Is there any short & sweet way other than say...
regress the equation(some way? for that?) and
manually inetgrate the x(n) function and plug in values etc etc...??

Regards,

Venkatesh

David J. Braden

unread,

Jun 23, 1999, 3:00:00 AM6/23/99

to

Venkatesh,
The following will give you the integral for a piecewise linear graph of your
data. Doing it through a regression would be *terrible* for most functions.
This is a trapezoidal approximation (thich is exact for the function I described),
but may be the best given the nature of the data.

First, define the ranges (insert-name-define) for the xvals and yvals. While you
are in there, define xnum as =Rows(xvals)-1. Then array-enter this equation:

=SUM((OFFSET(xvals,1,0,Xnum)-OFFSET(xvals,0,0,xnum))*(OFFSET(yvals,1,0,xnum)+OFFSET(yvals,0,0,Xnum)))/2

There you have it.

Dave Braden
TMY Research

David J. Braden

unread,

Jun 23, 1999, 3:00:00 AM6/23/99

to

Ven,
I could see no curve in what you sent me. I don't know what your function is. I
gave you an exact solution for what I understand you to have posted. You wrote
that you had a table without saying you had the underlying functon from which
you could generate it. Caramba!

Options:
(1) Integrate it yourself analytically, if it is feasible and you are good at
this stuff.

(2) Try Mathematica. It's pretty good at these things.

(3) If you are very good at this and *know* there isn't an analytic solution,
it's numerical methods time. Here are ways one-dimensional integrals are
typicaly solved:
(a) Simple brute force: Use what I sent you. Define, say, 500 xvals, get the
corresponding yvals, use my formula (be sure the ranges are correctly defined),
and write down the result. Then repeat this for 750 xvals, write down the
result. Keep adding x-values until the difference between successive integral
evalueations is "small".

(b) More sophisticated algorithms, such as Romberg integration, Gaussian
quadrature and related methods are detailed in Numerical Methods in C; you can
code these up in VBA, perhaps, though they require that you are able to make a
function "visible" to VBA; easy in most languages, but I don't think it can be
done in VBA.

(c) Entirely different approach is to use Monte Carlo simulation to integrate
the function. While it takes a few minutes to run for really tight results,
setup takes about 30 seconds, if you know about MC simulation.

Dave Braden
TMY Research

venky.v...@kvaerner.com wrote:
>
> I appreciate yr answer and the method suggested..
> I will have to try it out..
> But what if you have a curve which is specifically nonlinear the
> Blue bell curve inn the picture you see below
> All the data points are arrived at by some complex mathamatical
> calculations
>
> Thanking you again,
>
>
> Regards,
>
> A.R.Venkatesh

David J. Braden

unread,

Jun 23, 1999, 3:00:00 AM6/23/99

to

Venkatesh,

Here's code for simplistic Monte Carlo integration; it is generic. Better code
would use antithetic variates, but hey, this is free. Here's how to use it:

In one cell of your worksheet, enter =low+(hi-low)*RAND(), where low and high
are the limits of the domain. In another cell, enter your function, or at least
identify some cell where the functional result is. Set the "x" of your function
to the cell that has the RAND formula. Go into Insert - define - name, and name
the cell containing the functional evaluation as "FofX", without quotes.

Enter this code into a VBA module:

Option Explicit

Sub MCintegrate(f As Range, NumIts As Long, ret As Variant)
Dim dbl As Double
Dim lng As Long

If f.Cells.Count > 1 Or NumIts < 2 Then ret = CVErr(xlErrValue): Exit Sub
Application.ScreenUpdating = False
For lng = 1 To NumIts
Application.Calculate
dbl = dbl + f
Next
ret = dbl / NumIts
Application.ScreenUpdating = True
End Sub

Sub test()
Dim ret As Variant, t As Variant, dbl As Double

Const n = 1000000

t = Timer

MCintegrate [FofX], n, ret
dbl = Sqr(1 / n) 'precision is proportional to this
ret = Application.Round(ret, -Int(Int(Log(dbl) / Log(10)) + Log(ret) / Log(10)))

MsgBox "Iterations: " & n & vbCr & _
"Answer: " & ret & vbCr & _
"Precision ~ " & Format(dbl, "0.00E+00") & vbCr & _
"Time: " & Timer - t & " seconds"
End Sub

Set n to as high a number as you would like; you could let this thing run all
night, or on several computers and average the results. do a run for, say,
10000 iterations to get a sense of low long it will take to get 2 significant
digits. The precision os proportional to sqrt(1/n), whereas numerical
integration is typically O(p^(-4)), where p is the number of points. Still, this
is a quick set up.

HTH

sto...@pop.phnx.uswest.net

unread,

Jun 23, 1999, 3:00:00 AM6/23/99

to

Monte Carlo integration? Isn't that kind of like using a chain saw to mow the
lawn? It works, but it sure aint pretty! :-)

Also, if you know something about the function or the shape of the data, can't
you use some kind of weighting function so that it converges faster?

Finally, another option is to use Simpson's Rule. this is similar to the
Trapezoid rule, except that each 3 data points are essentially fit with a
parabola instead of each 2 points being fit with a straight line. It is usually
more efficient than the trapezoid rule.

To use Simpson's rule, you must have 2*n+1 data points (where n is any integer)
and the x values must be equally spaced, the distance between adjacent x data
points being DX. Simpson's rule is then:

Integral=DX/3*(Y_0 + Y_2n + 4*sum(Y_2i+1) + 2*sum(Y_2j+2))

where i ranges from 0 to n-1 and j ranges from 0 to n-2.

To implement this in excel put the first and last Y values (Y_0 and Y_2n) in
column A, put the Y values corresponding to an odd subscript (Y_1, Y_3, Y_5,
...Y_2n-1) in column B, and the Y values corresponding to an even subscript
(Y_2,Y_4,Y_6,...Y_2n-2) in column C. Then the area is given by:

=DX/3*(SUM(columnA)+4*sum(columnB)+2*sum(columnC))

where DX is the spacing between adjacent x data points as mentioned above and
columnA, columnB, and columnC correspond to the appropriate cell ranges in each
column. As you might expect decreasing DX (increasing the number of data points
in a given range) will improve the accuracy of the result.

David J. Braden

unread,

Jun 24, 1999, 3:00:00 AM6/24/99

to sto...@uswest.net

sto...@pop.phnx.uswest.net wrote:
>
> Monte Carlo integration? Isn't that kind of like using a chain saw to mow the
> lawn? It works, but it sure aint pretty! :-)

You aren't kidding, but see below. [more]

>
> Also, if you know something about the function or the shape of the data, can't
> you use some kind of weighting function so that it converges faster?
>

Of course. The point of not using a weighting function (in my area, at least, we
refer to this as importance sampling) was to let this be a simple black box.
What I posted will work for any absolutely continuous function; importance
sampling requires that the user have a *lot* more sophistication in this area
than most people have. Choosing a decent weighting function can't be automated;
it's up to the user.

MC simulation is the only game in town for integrating functions of 4 (maybe
even 3) or more variables, and the code is invariant to the number of variables
one has. In such cases, to use your metaphor, it is a finely tuned lawnmower,
whereas your approach would be the chainsaw, and it would be on the wrong lawn! <bg>

[more]

> Finally, another option is to use Simpson's Rule. this is similar to the
> Trapezoid rule, except that each 3 data points are essentially fit with a
> parabola instead of each 2 points being fit with a straight line. It is usually
> more efficient than the trapezoid rule.

Originally I understood the poster wanted to integrate the function graphed when
graphing a table; I had no idea he knew the function (why is it that that this
sort of stuff comes out in later posts?), given that he was talking about
regressing and integrating the fitted line. Given that, the trapezoidal function
is exact, and Simpson's rule, of course, will be off the mark.

>
> To use Simpson's rule, you must have 2*n+1 data points (where n is any integer)
> and the x values must be equally spaced, the distance between adjacent x data

<snip>

Nice explanation. Messy implementation, though. <g> Seems to me it would be
cleaner to pass the range of the cell containing the function to a VBA routine,
along with an acceptable level of precision, and let the routine do the work
without filling up the worksheet.

One thing that has always bugged me about VB is that I can't pass pointers to
functions. The "aha" I had yesterday was that if I can get the function into a
spreadsheet, I can work off of that instead. That means I can implement a method
with much better properties than Simpson's rule (a relic), which I guess I'll go
ahead and do.

I'll get back to you. Thanks for the input.

Dave Braden
TMY Research

David J. Braden

unread,

Jun 24, 1999, 3:00:00 AM6/24/99

to Venkatesh

Vankatesh,

Here's my final input on this deal, goaded by Stoker. Let's summarize how to
use what I've given you:

To get the integral of the piecewise-linear function that I thought you
initially described, just use the worksheet function I posted; it is exact, and
the quickest way to do what you asked. The x-values need not be equally spaced.

The Monte Carlo simulation I posted was quick to code, and is easy to use, it
just takes a while to run; its advantage over other methods is that it can be
used for high-dimensional integrals with no changes. To Stoker's point, you can
speed convergence with a weighting function, but my point is that you need to
know how to choose that function; it cannot be automated (rather, it hasn't been
done yet; I'll leave that to Stoker <g>). The MC simulation will integrate
functions that you can't do with other methods, and you don't need to change the
code to do so, which is really nice.

What follows is an interactive implementation of the "shrinking" trapezoid
method I suggested. While slower to converge for very smooth functions than
methods relying on orthogonal polynomials, and slightly slower for continuously
differentiable functions than Simpson's method, it is far more robust than the
other methods. Again, I'm trying to give you help that requires the least
knowledge of your function, in which case it is more broadly applicable straight
off the shelf. Most of the code is devoted to error-trapping and reporting.
Change it as you would like, but please keep the original comments, and add more
as needed. It matters to me that the comment re the NG to which this was posted
be maintained.

Dave Braden
TMY Research

Sub IntegrateTrap()
'1999/6/24, David J. Braden
'Posted to microsoft.public.excel.programming
'PLEASE RETAIN ALL COMMENTS

'Usage:
'Set up the function F(X) so that it refers to a cell containing X
'Use Insert Define Name to define the *ranges* Xrng and FofX
'Define the limits as LowX and HighX; they can be values or ranges
'Run the macro; don't expect relative accuracy < 1E-06

Dim dblS As Double, dblOldS As Double, dblPrec As Double, ret As Double
Dim i As Integer
Const intMinIts = 6
Const sTtl = "Integrate"

On Error GoTo BadFunction
If [xrng].Cells.Count + [FofX].Cells.Count <> 2 Then
MsgBox "Names ""Xrng"" or ""FofX"" in your sheet aren't defined as
refences to a single cells", _
vbExclamation, sTtl
Exit Sub
ElseIf Not (IsNumeric([lowx]) And IsNumeric([highx])) Then
MsgBox "LowX or HighX don't evaluate to numbers.", vbExclamation, sTtl
Exit Sub
End If

Application.ScreenUpdating = False

For i = 1 To intMinIts
Trap [xrng], [FofX], [lowx], [highx], i, dblS
Next

Do
dblOldS = dblS
Trap [xrng], [FofX], [lowx], [highx], i, dblS

If dblS = dblOldS Then
MsgBox "F(x) is in cell " & [FofX].Address & vbCr & _
"Low x = " & [lowx] & vbCr & _
"High x =" & [highx] & vbCr & vbCr & _
"Answer: " & dblS, , sTtl
Exit Do
End If

'else
dblPrec = Abs(dblS / dblOldS - 1)
ret = Application.Round(dblS, -Int(Int(Log(dblPrec) / Log(10)) +
Log(dblS) / Log(10)))
If vbNo = MsgBox("F(x) is in cell " & [FofX].Address & vbCr & _
"Low x = " & [lowx] & vbCr & _
"High x =" & [highx] & vbCr & vbCr & _

"Answer: " & ret & vbCr & _

"Iterations: " & i & vbCr & _
"Rel. accuracy: " & Format(dblPrec, "0.0E+00") & vbCr & _
vbCr & "Iterate again?", vbYesNo, sTtl) Then Exit Do
i = i + 1
Loop
GoTo Wrapup

BadFunction:
MsgBox "There's a problem evaluating your function throughout its domain. "
& vbCr & _
"The problematic point is in cell " & [xrng].Address, vbInformation, sTtl

Wrapup:

Application.ScreenUpdating = True
End Sub

Private Sub Trap(rngX As Range, rngF As Range, a As Double, b As Double, n As
Integer, ret As Double)

Static s As Double
Dim it As Integer, j As Integer
Dim dblSum As Double, dblX As Double
Dim dblDel As Double

If n = 1 Then
rngX = a: s = rngF
rngX = b
s = (b - a) * (s + rngF) * 0.5
Else
it = 2 ^ (n - 2)
dblDel = (b - a) / it
dblX = a + 0.5 * dblDel
For j = 1 To it
rngX = dblX
dblSum = dblSum + rngF
dblX = dblX + dblDel
Next
s = 0.5 * (s + (b - a) * dblSum / it)
End If
ret = s
End Sub

sto...@pop.phnx.uswest.net

unread,

Jun 25, 1999, 3:00:00 AM6/25/99

to

David J. Braden wrote:
>
> sto...@pop.phnx.uswest.net wrote:
> >
> > Monte Carlo integration? Isn't that kind of like using a chain saw to mow the
> > lawn? It works, but it sure aint pretty! :-)
>
> You aren't kidding, but see below. [more]
> >
> > Also, if you know something about the function or the shape of the data, can't
> > you use some kind of weighting function so that it converges faster?
> >
> Of course. The point of not using a weighting function (in my area, at least, we
> refer to this as importance sampling) was to let this be a simple black box.
> What I posted will work for any absolutely continuous function; importance
> sampling requires that the user have a *lot* more sophistication in this area
> than most people have. Choosing a decent weighting function can't be automated;
> it's up to the user.

Aah yes! Importance sampling does seem to ring a bell. I can't believe how
much I've forgotten since my chemical engineering statistacal mechanics class,
where we used MC simulations to calculate thermo properties for Lennard-Jones
fluids. I'm sure that's child's play compared to what you're doing.

>
> MC simulation is the only game in town for integrating functions of 4 (maybe
> even 3) or more variables, and the code is invariant to the number of variables
> one has. In such cases, to use your metaphor, it is a finely tuned lawnmower,
> whereas your approach would be the chainsaw, and it would be on the wrong lawn! <bg>
>

I would have to say MC simulation is still the chainsaw, but now your chopping
trees, so the lawnmowers (the trapezoid rule and simpson's rule) are quite
useless.

> [more]
>
> > Finally, another option is to use Simpson's Rule. this is similar to the
> > Trapezoid rule, except that each 3 data points are essentially fit with a
> > parabola instead of each 2 points being fit with a straight line. It is usually
> > more efficient than the trapezoid rule.
>
> Originally I understood the poster wanted to integrate the function graphed when
> graphing a table; I had no idea he knew the function (why is it that that this
> sort of stuff comes out in later posts?), given that he was talking about
> regressing and integrating the fitted line. Given that, the trapezoidal function
> is exact, and Simpson's rule, of course, will be off the mark.

Sure, if your fitting a straight line the trapezoidal function is better
(Simpson's rule will give the same answer, but with more work), but if the
function has some curvature Simpson's rule will do a better job. Although, I'll
admit that for many cases, it may not be worth the extra effort.

>
> >
> > To use Simpson's rule, you must have 2*n+1 data points (where n is any integer)
> > and the x values must be equally spaced, the distance between adjacent x data
>
> <snip>
>
> Nice explanation. Messy implementation, though. <g> Seems to me it would be
> cleaner to pass the range of the cell containing the function to a VBA routine,
> along with an acceptable level of precision, and let the routine do the work
> without filling up the worksheet.
>

Hey, you get what you pay for! <bg> Perhaps I should have used the old
"textbook" routine, "implementation of this method in VBA is left as an
excercise for the reader". :-D

Actually, I'm a newbie to VB and VBA, so I probably wouldn't have done it very
efficiently in VBA either, but if you want I can send you a nifty FORTRAN
version.

Finally, after all this rambling, I'll add something slightly useful. After
digging through an old calculus book, I found the following expressions for the
error bounds for the Trapezoid rule and for Simpson's rule:

E_trap <= M(b-a)^3/(12n^2)

E_simp <= N(b-a)^5/(2880n^4)

where b and a are the upper and lower limits of x, n+1(trapezoid) or 2n+1
(simpson's) is the number of data points, M is the maximum value of |f"(x)| in
range [a,b] and N is the maximum value of |f""(x)| in range [a,b].

If the 2nd and/or 4th derivitaves of the function are known (or can be obtained)
these expressions can be used to determine the value of n that should be used
for a desired level of accuracy. This eliminates the need for doing multiple
iterations with increasing values of n, until the desired accuracy is reached as
suggested by Dave. Back in the days when all these calculations were done by
hand (before my time, fortunately) calculating n from the error bounds was
definitely worthwhile. However, since modern computers can handle the
calculation of hundreds or thousands of data points in seconds, it's probably
not worth the effort, unless you are doing it for fun, or possibly want some
obscene level of accuracy (i.e. E <= 1E-150). Hey, don't laugh for all I know
you could be trying to calculate Pi to the 150th decimal place by integrating
y=4*sqrt(1-x^2) from 0 to 1. OK, maybe you should laugh since for this case M
and N are infinite, so the above expressions are useless and you will have to
figure out how to deal with rounding errors since excel only handles about 15
digits. Well, I'll stop rambling. Good Luck.

David J. Braden

unread,

Jun 25, 1999, 3:00:00 AM6/25/99

to sto...@uswest.net

sto...@pop.phnx.uswest.net wrote:

> Aah yes! Importance sampling does seem to ring a bell. I can't believe how
> much I've forgotten since my chemical engineering statistacal mechanics class,
> where we used MC simulations to calculate thermo properties for Lennard-Jones
> fluids. I'm sure that's child's play compared to what you're doing.

You said it. And nowadays we would tend to use Gibbs sampling, but as you've
forgotten so much...<g>

> Actually, I'm a newbie to VB and VBA, so I probably wouldn't have done it very
> efficiently in VBA either, but if you want I can send you a nifty FORTRAN
> version.

Um, no thanks. I dropped it in 1979.

> If the 2nd and/or 4th derivitaves of the function are known (or can be obtained)
> these expressions can be used to determine the value of n that should be used
> for a desired level of accuracy.

Precisely my point. Funnctions that aren't sufficiently smooth fare worse by
methods that implicitly assume a degree of smoothness. Simpson's or Romberg's
methods will get you *very* quickly to the wrong answer in such cases. The plain
old Trapeziodal algorithm is more robust. Slow, yes. But robust. I wouldn't have
used it in 1976 if I knew my function; but today, in a situation where I might
not, and computers are so darned fast, I think I'll stick with it. Just a thought.

Thanks, Matt, for the input. Seems like we were able to come up with some nice
ideas, and useful code for both the spreadsheet and VBA (though I did most of
the work <g>). Now, I'm anticipating your post of a weighting function for an
unknown integrand <bg>.

Regards,
Dave Braden