fit adjusts only the first parameter

Gabriel Zachmann

Mar 14, 2000, 3:00:00 AM

I am trying to fit some time series data (with gnuplot 3.7.1).
It seems to me that gnuplot only adjusts the first parameter
of the fitting function.
This seems to happen both for
fit f(x) 'foo.dat' using 1:2 via 'par.fit'
as well as
fit f(x) 'foo.dat' using 1:2 via a,b,c,d

Below, you can find my .plt file and the .dat file.
They are called "gewicht.*".
The log file of the fitting is attached at "gewicht.log".

The funny thing is:
when I replace every time datum in "gewicht.dat"
by its line number, say, then fitting seems to work fine!
For your reference, I enclose the appropriate files as q.plt and q.dat.

Does anybody have any idea what's going on?
Am I making a mistake?
(I just upgraded to gnuplot 3.7.1 because I first tried this with 3.7,
but then found out via dejanews that 3.7 had a bug with fitting time
data ...)

Any hints, suggestions, insights will be highly appreciated!
Gab.

Encl:

---- snip --- "gewicht.plt"
#set terminal postscript portrait "Helvetica" 14
#set output "gewicht.eps"
set terminal dumb 132 23

set xdata time
set timefmt "%d.%m.%y"

set ylabel "Gewicht / g"
set data style lines

f(x) = a*x**3 + b*x**2 + c*x + d

#fit f(x) "gewicht.dat" using 1:2 via "gewicht.fit"
#update "gewicht.fit"
fit f(x) "gewicht.dat" using 1:2 via a, b, c, d

plot "gewicht.dat" using 1:2 title "Mirjam", \
f(x) title "fit"
---- snip --- "gewicht.plt"
---- snip --- "gewicht.dat"
3.8.99 2960
6.8.99 2720
11.8.99 2830
16.8.99 3050
22.8.99 3290
29.8.99 3600
5.9.99 3850
12.9.99 4050
19.9.99 4280
26.9.99 4450
3.10.99 4615
10.10.99 4725
18.10.99 4825
24.10.99 4930
27.10.99 5000
31.10.99 5050
4.11.99 5160
8.11.99 5230
15.11.99 5265
18.11.99 5340
21.11.99 5435
24.11.99 5500
29.11.99 5565
6.12.99 5700
20.12.99 5770
26.12.99 5900
3.1.00 5970
5.1.00 6025
10.1.00 6125
11.1.00 6140
16.1.00 6190
19.1.00 6220
22.1.00 6250
28.1.00 6330
3.2.00 6380
4.2.00 6420
14.2.00 6490
23.2.00 6480
24.2.00 6515
2.3.00 6525
4.3.00 6610
6.3.00 6660
10.3.00 6670
---- snip --- "gewicht.dat"
---- snip --- "gewicht.log"
*******************************************************************************
Tue Mar 14 16:07:35 2000

FIT: data read from "gewicht.dat" using 1:2
#datapoints = 43
residuals are weighted equally (unit weight)

function used for fitting: f(x)
fitted parameters initialized with current variable values

Iteration 0
WSSR : 2.30136e+43 delta(WSSR)/WSSR : 0
delta(WSSR) : 0 limit for stopping : 1e-05
lambda : 3.65787e+20

initial set of free parameter values

a = 1
b = 1
c = 1
d = 1

After 4 iterations the fit converged.
final sum of squares of residuals : 1.57111e+28
rel. change during last iteration : -2.349e-07

degrees of freedom (ndf) : 39
rms of residuals (stdfit) = sqrt(WSSR/ndf) : 2.00711e+13
variance of residuals (reduced chisquare) = WSSR/ndf : 4.02849e+26

Final set of parameters Asymptotic Standard Error
======================= ==========================

a = 8.4192e-08 +/- 1.655e-08 (19.65%)
b = 1 +/- 0.1672 (16.72%)
c = 1 +/- 1.014e+06 (1.014e+08%)
d = 1 +/- 3.061e+12 (3.061e+14%)

correlation matrix of the fit parameters:

a b c d
a 1.000
b 0.847 1.000
c -0.515 -0.085 1.000
d -0.193 -0.287 0.218 1.000
---- snip --- "gewicht.log"
---- snip --- "q.plt"
#set terminal postscript portrait "Helvetica" 14
#set output "q.eps"

set terminal dumb 132 23

set ylabel "Gewicht / g"
set data style lines

f(x) = a*x**3 + b*x**2 + c*x + d

#fit f(x) "q.dat" using 1:2 via "q.fit"
#update "q.fit"
fit f(x) "q.dat" using 1:2 via a, b, c, d

plot "q.dat" using 1:2 title "q", \
f(x) title "fit"

---- snip --- "q.plt"
---- snip --- "q.dat"
2 2960
3 2720
4 2830
5 3050
6 3290
7 3600
8 3850
9 4050
10 4280
11 4450
12 4615
13 4725
14 4825
15 4930
16 5000
17 5050
18 5160
19 5230
20 5265
21 5340
22 5435
23 5500
24 5565
25 5700
26 5770
27 5900
28 5970
29 6025
30 6125
31 6140
32 6190
33 6220
34 6250
35 6330
36 6380
37 6420
38 6490
39 6480
40 6515
41 6525
42 6610
43 6660
44 6670
---- snip --- "q.dat"

--
/---------------------------------------------------------------------\
| What if you slept? And what if, in your sleep, you dreamed? |
| And what if, in your dream, you went to heaven and there plucked a |
| strange and beautiful flower? And what if, when you awoke, |
| you had the flower in your hand? Ah, what then? (Coleridge) |
| |
| mailto:za...@igd.fhg.de __@/' mailto:Gabriel....@gmx.net |
| http://www.igd.fhg.de/~zach |
\---------------------------------------------------------------------/

Hans-Bernhard Broeker

Mar 14, 2000, 3:00:00 AM

Gabriel Zachmann <za...@igd.fhg.de> wrote:

> I am trying to fit some time series data (with gnuplot 3.7.1).
> It seems to me that gnuplot only adjusts the first parameter
> of the fitting function.

In your special case: yes. The problem is that you're using 'fit'
badly, by giving it startup values for the parameters that are *way*
off the actual solution, and seeking a solution that has parameters of
vastly different magnitude.

You don't provide any startup values, actually, making 'fit' default
to 1.0 for all of them. Now, gnuplot counts time in seconds since
1/1/2000, so your earliest datapoint has an 'x' of roughly
-5*30*24*60*60 seconds (-5 months), or about -10^7 seconds.

For such arguments, your polynomial starts out with values of
magnitude around 10^21, whereas your data are only of the order of
5000. See the enormous discrepancy?
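
The magnitudes quoted above can be checked with a few lines of Python. This is only a sketch: it assumes, as stated above, that this gnuplot version counts time in seconds from 1 January 2000, and it ignores time zones. The helper name `gnuplot_seconds` is made up for illustration.

```python
from datetime import datetime

# Assumption (taken from the explanation above): time is measured in
# seconds relative to 1 January 2000.
EPOCH = datetime(2000, 1, 1)

def gnuplot_seconds(date_text):
    """Convert a 'd.m.yy' date, as in gewicht.dat, to seconds since EPOCH."""
    return (datetime.strptime(date_text, "%d.%m.%y") - EPOCH).total_seconds()

x_first = gnuplot_seconds("3.8.99")   # earliest data point
x_last = gnuplot_seconds("10.3.00")   # latest data point

# With a = b = c = d = 1 the cubic term dominates everything else.
f_start = x_first**3 + x_first**2 + x_first + 1

print(f"x range: {x_first:.3g} .. {x_last:.3g} seconds")
print(f"f(x_first) with all parameters = 1: {f_start:.3g}")
print("data values: 2720 .. 6670")
```

The starting curve is therefore off by something like 17 to 18 orders of magnitude before the first iteration.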

A solution can be found by slightly modifying the fit and plot
commands, respectively:

plot 'gewicht.dat' u 1:2, f(x/1e7)
fit f(x/1e7) 'gewicht.dat' u 1:2 via a, b, c, d
replot

This solves both above-mentioned problems in one go: startup values of
'1' make at least some sense now, and the final parameters all stay in
the same general magnitude:

a = 331.282 +/- 77.53 (23.4%)
b = -486.669 +/- 89.64 (18.42%)
c = 1317.98 +/- 37.11 (2.816%)
d = 5994.87 +/- 19.59 (0.3268%)

Even with almost perfect startup values, the direct fit would have had
severe problems, due to the large differences of the un-modified
solution parameters (found by guidance of the above result, for the
startup values):

a = 4.04969e-19 +/- 8.222e-20 (20.3%)
b = -4.47735e-12 +/- 9.506e-13 (21.23%)
c = 0.000127959 +/- 3.936e-06 (3.076%)
d = 6011.87 +/- 20.78 (0.3456%)

That's 22 orders of magnitude in the size of the parameters, much more
than 'fit' is able to cope with (about sqrt(DPL_EPS), i.e. 8 orders of
magnitude).
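
As a rough cross-check of these figures: substituting x/1e7 into the cubic means each scaled parameter corresponds to an unscaled one divided by 1e21, 1e14, 1e7 and 1 respectively. The sketch below does that conversion and compares the spread of magnitudes with the square root of double-precision machine epsilon. Reading "DPL_EPS" as the machine epsilon for doubles is an assumption on my part.

```python
import math
import sys

SCALE = 1e7  # the x/1e7 rescaling used in the fit command above

# Parameters reported above for the rescaled fit f(x/1e7).
scaled = {"a": 331.282, "b": -486.669, "c": 1317.98, "d": 5994.87}
powers = {"a": 3, "b": 2, "c": 1, "d": 0}

# f(x/SCALE) = a*(x/SCALE)**3 + ..., so each coefficient of plain x
# is the scaled one divided by SCALE**power.
unscaled = {name: value / SCALE ** powers[name] for name, value in scaled.items()}

magnitudes = [abs(v) for v in unscaled.values()]
spread = math.log10(max(magnitudes) / min(magnitudes))

# Assumption: the limit mentioned above is sqrt of double-precision epsilon.
usable = -math.log10(math.sqrt(sys.float_info.epsilon))

for name, value in unscaled.items():
    print(f"{name} = {value:.4g}")
print(f"spread of parameter magnitudes: about {spread:.0f} orders")
print(f"orders covered by sqrt(epsilon): about {usable:.0f}")
```

The converted values have the same orders of magnitude as the directly fitted ones listed above, though not the same digits, since those came from a separate fit.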

> The funny thing is:
> when I replace every time datum in "gewicht.dat"
> by its line number, say, then fitting seems to work fine!

That's because you effectively divided x by one million, that way,
much like I do in the trick above.
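
To see roughly what factor is involved, one can compare the span of the time axis in seconds with the span of the line numbers in q.dat. A sketch, assuming whole days of 86400 seconds:

```python
from datetime import datetime

first = datetime.strptime("3.8.99", "%d.%m.%y")   # first date in gewicht.dat
last = datetime.strptime("10.3.00", "%d.%m.%y")   # last date in gewicht.dat

span_seconds = (last - first).total_seconds()
span_lines = 44 - 2   # q.dat runs from x = 2 to x = 44

# Average number of seconds represented by one step in line number.
seconds_per_line = span_seconds / span_lines
print(f"{seconds_per_line:.3g} seconds per line number")
```

The effective divisor comes out between 10^5 and 10^6, and the line numbers also lose the large negative offset of the time values.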
--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Gabriel Zachmann

Mar 15, 2000, 3:00:00 AM

On 14 Mar 2000 17:09:52 GMT, Hans-Bernhard Broeker <bro...@acp3bf.physik.rwth-aachen.de> wrote:

" Gabriel Zachmann <za...@igd.fhg.de> wrote:
"
" > I am trying to fit some time series data (with gnuplot 3.7.1).
" > It seems to me that gnuplot only adjusts the first parameter
" > of the fitting function.
"

" In your special case: yes. The problem is that you're using 'fit'
" badly, by giving it startup values for the parameters that are *way*
" off the actual solution, and seeking a solution that has parameters of
" vastly different magnitude.
"
" You don't provide any startup values, actually, making 'fit' default
" to 1.0 for all of them. Now, gnuplot counts time in seconds since
" 1/1/2000, so your earliest datapoint has an 'x' of roughly
" -5*30*24*60*60 seconds (-5 months), or about -10^7 seconds.
"
" For such arguments, your polynomial starts out with values of
" magnitude around 10^21, whereas your data are only of the order of
" 5000. See the enormous discrepancy?

I see. Now that you've explained it, it seems obvious ... ;-)

Cool! I even get the same parameter values as you got below!
[of course, they should be the same, still ... ;-) ]

Thank you so very much!
(Especially for the detailed explanation and the quick response!)

Unfortunately, I've got another question now:
it seems that the temporary range feature of 'fit' doesn't work for me.
When I say

fit ["16.6.99" : *] g(x) "gewicht.dat" using 1:2 via "gewicht.fit"

the parameters found by fit are exactly the same as those without a
temporary range ....
(I defined g(x)=f(x/1e7) .)
Even when I define a complete xrange like

fit ["16.6.99" : "1.6.00"] g(x) "gewicht.dat" using 1:2 via "gewicht.fit"

there is no difference in the parameters found by fit...

Any idea what's going on?

In addition, I think I found a little bug in 'fit'.
When I say

fit g(x) "gewicht.dat" using 1:2 via "gewicht.fit"

and "gewicht.fit" contains parameters (at the end, in this case)
which do *not* occur in the function (g, in this case),
then 'fit' terminates with the error message:

After 1 iterations the fit converged.
final sum of squares of residuals : 313472
rel. change during last iteration : -2.0264e-09
[...]
Singular matrix in Invert_RtR

When I comment out the unused parameter, 'fit' works fine.
From a user's perspective, this seems like a bug,
because IMHO 'fit' should not be confused by harmless "garbage" in the
parameters file.
Maybe it should emit a warning saying that the user specified parameters
which did not occur in the function, and that 'fit' ignored them.
(I found this bug because I just changed f() to a parabola, and kept
the parameters file as left from the last fit with f() being 3rd order -
which I think should be allowed; after all, when I don't know what the
model is, I would like to "experiment" a little, omitting a parameter
sometimes ...)

TIA,
Gab.

Hans-Bernhard Broeker

Mar 15, 2000, 3:00:00 AM

Gabriel Zachmann <za...@igd.fhg.de> wrote:

> Unfortunately, I've got another question now:
> it seems that the temporary range feature of 'fit' doesn't work for me.
> When I say

> fit ["16.6.99" : *] g(x) "gewicht.dat" using 1:2 via "gewicht.fit"

Have you switched datafiles? The one you sent us didn't contain any
datapoints before August 1999, so the above range would not
be limiting anything, effectively.

Using the file you sent yesterday, and ranges like ["1.10.99":*] or
["1.10.99":"1.2.00"] in the fit command, I do see quite noticeable
changes in the fit function's shape and parameters.
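
A quick way to check whether a range actually excludes anything is to count the data points on either side of the limits. The sketch below uses only a handful of the dates from gewicht.dat as posted, and a hypothetical helper `points_in_range`; it does not reproduce how gnuplot itself parses ranges.

```python
from datetime import datetime

FMT = "%d.%m.%y"  # same format as `set timefmt` in gewicht.plt

# A few of the dates from gewicht.dat, including the first and the last.
dates = ["3.8.99", "29.8.99", "3.10.99", "26.12.99", "3.2.00", "10.3.00"]

def points_in_range(dates, lower, upper=None):
    """Return the dates d with lower <= d (and d <= upper, if given)."""
    lo = datetime.strptime(lower, FMT)
    hi = datetime.strptime(upper, FMT) if upper else None
    kept = []
    for text in dates:
        d = datetime.strptime(text, FMT)
        if d >= lo and (hi is None or d <= hi):
            kept.append(text)
    return kept

# Lower limit before the first data point: nothing is excluded.
print(len(points_in_range(dates, "16.6.99")), "of", len(dates))
# Limits inside the data: some points are dropped.
print(len(points_in_range(dates, "1.10.99", "1.2.00")), "of", len(dates))
```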

> In addition, I think I found a little bug in 'fit'.
> When I say

> fit g(x) "gewicht.dat" using 1:2 via "gewicht.fit"

> and "gewicht.fit" contains parameters (at the end, in this case)
> which do *not* occur in the function (g, in this case),
> then 'fit' terminates with the error message:

That's to be expected. Fit tries to find the change of WSSR you get by
modifying that unused parameter, and finds zero change. That means the
matrix will have a whole column of zeroes in it, which causes the
routine to break.
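
The effect can be reproduced outside gnuplot. The sketch below is not gnuplot's code; it builds a finite-difference Jacobian for a parabola with one extra, unused parameter, and shows that the corresponding column, and hence the determinant of J^T J, is zero. All function names are illustrative.

```python
def model(x, p):
    """Parabola that ignores its fourth parameter p[3]."""
    return p[0] * x * x + p[1] * x + p[2]

def jacobian(xs, p, h=1e-6):
    """Forward-difference derivatives of model w.r.t. each parameter."""
    rows = []
    for x in xs:
        base = model(x, p)
        row = []
        for j in range(len(p)):
            q = list(p)
            q[j] += h
            row.append((model(x, q) - base) / h)
        rows.append(row)
    return rows

def normal_matrix(J):
    """Return J^T J as a list of lists."""
    n = len(J[0])
    return [[sum(r[i] * r[j] for r in J) for j in range(n)] for i in range(n)]

def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, det = len(A), 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[pivot][k] == 0.0:
            return 0.0          # no usable pivot: the matrix is singular
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            det = -det
        det *= A[k][k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return det

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
J = jacobian(xs, [1.0, 1.0, 1.0, 1.0])
N = normal_matrix(J)
print("column for the unused parameter:", [row[3] for row in J])
print("det(J^T J):", determinant(N))
```

A matrix with zero determinant cannot be inverted, which matches the "Singular matrix" message.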

> From a user's perspective, this seems like a bug,
> because IMHO 'fit' should not be confused by harmless "garbage" in the
> parameters file.

Nice reasoning, but there's a reason that we don't (yet) test for
this: there's currently no way for fit to know which variables are
used in evaluating a fit function, and which aren't.

> Maybe it should emit a warning saying that the user specified
> parameters which did not occur in the function, and that 'fit'
> ignored them.

'Should'? Probably yes. But OTOH, 'fit' was never meant to be the end
of all worries regarding data fitting, was it? Implementing checks
like this would be time-consuming, with little effect on the average
user. Feel free to implement it yourself (hint: search for that whole
column of zeroes, and flag the variable unused if you find one).
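
The hint in the last sentence could look roughly like the following. This is a hypothetical sketch in Python rather than a patch to gnuplot's C sources; `find_unused`, `parabola` and the dictionary-based parameter passing are made up for illustration.

```python
def find_unused(model, xs, params, h=1e-6):
    """Return names of parameters whose perturbation never changes model.

    model  : callable taking (x, dict of parameter values)
    xs     : sample points at which the model is evaluated
    params : dict mapping parameter name -> current value
    """
    unused = []
    for name in params:
        shifted = dict(params)
        # Relative step, falling back to an absolute one when the value is 0.
        shifted[name] += h * abs(params[name]) if params[name] else h
        # A whole column of zeroes: no data point reacts to this parameter.
        if all(model(x, shifted) == model(x, params) for x in xs):
            unused.append(name)
    return unused

def parabola(x, p):
    """Second-order model; any parameter 'a' in p is simply ignored."""
    return p["b"] * x * x + p["c"] * x + p["d"]

# Parameter set left over from an earlier third-order fit.
leftover = {"a": 331.282, "b": -486.669, "c": 1317.98, "d": 5994.87}
print(find_unused(parabola, [-1.3, -0.5, 0.0, 0.6], leftover))
```

One caveat of this approach: a parameter that only matters outside the sampled x values would be flagged as well, so a warning seems safer than silently dropping it.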