Time Series Prediction Example (Time Delay NN)

Greg Heath

unread,

Dec 13, 2010, 1:19:27 PM12/13/10

to

In the NN Toolbox documentation for the TimeDelay NN
(http://www.mathworks.com/help/pdf_doc/nnet/nnet.pdf),
the following statements are found regarding the laser
data time series prediction example

ftdnn_net = newfftd(y,y,[1:8], H); % H = 5 (Ver 2009b)
ftdnn_net = timedelaynet([1:8], H); % H = 10 (Ver 2010b)
Pi = y(1:8);
p = y(9:end); % length(y) = 600
t = y(9:end);

The last two statements are in error.
If t = p there is no step ahead prediction.

For an s-step ahead predictor using a delay buffer
of d previous inputs,

Pi = y(1:d);
p = y(d+1,end-s); % Input
t = y(d+1+s,end); % target

Given this correction, three questions immediately come
to mind:

Q1: WHY USE d = 8?
Q2: WHY USE H = 5 IN 2009b AND H = 10 IN 2010b?
Q3. WHAT IS THE PREDICTION PERFORMANCE ON
NONDESIGN DATA?

To answer the questions I downloaded the original
N0 = 1000 observations from the Santa Fe Competition
website
(http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe/A.dat)
and normalized them to the interval [-1 1]. Then I extracted
the N = 600 observations used in the MATLAB example.

To answer Q1 consider the nonnegative lag values of the
autocorrelation function of y given by

y = y(1:600);
N = length(y)
acf = xcorr(y);
acfss = acf(N:end)/max(acf); % Single-sided (nonegative lags)
movavgacf = cumsum(acfss(2:21))./(1:20); %Moving avg positive lags

%Find maximum correlation for positive lags (d = 8)

[maxacf1 d] = max(acfss(2:end)) % [0.91551 8]

%Tabulate the positive lag correlations and their
%moving average:

acftab = [(1:10)' acfss(2:11)' movavgacf(1:10)']

N = 600
maxacf1 = 0.91551
d = 8
LAG ACFSS MOVAVG
acftab = 1 0.82013 0.82013
2 0.54863 0.68438
3 0.41022 0.59299
4 0.3866 0.5414
5 0.45524 0.52416
6 0.63382 0.54244
7 0.8636 0.58832
8 ** 0.91551 0.62922
9 0.71236 ** 0.63846
10 0.49943 0.62455

Therefore, d = 8 is a good choice, However, d = 9
might be better.

To answer questions Q2 and Q3, the N = 600
observations used in the example were divided into
a training set (Ntrn = 400) and a holdout test set
(Ntst = 200) in order to obtain an unbiased estimate
of the prediction performance.

Sixty nets were designed: 10 random weight initializtion
trials for each of the 6 values of H in [0 5]. The
training goal was to obtain a degree-of-freedom adjusted
R^2 ( Ratrn^2), of at least 0.99.

When H = 0 and 1, the performance results did not vary
with the random weight initialization.

When H = 0 all trials converged in 3 epochs. For H = 1,
the convergence ranged between 44 and 56 epochs.

The resulting R^2 statistic results are

H = 0 H = 1
Ratrn^2 0.822 0.855
Rtst^2 0.674 0.671

The results for H = 2:5 are tabulated below. The
maximum No. of allowed epochs was 1000

No. of Epochs

Trial H = 2 H = 3 H = 4 H = 5

1 102 114 351 1000
2 250 1000 31 31
3 45 1000 150 30
4 631 27 23 57
5 841 52 88 28
6 34 14 108 123
7 1000 1000 555 251
8 37 1000 1000 12
9 1000 1000 713 58
10 1000 16 1000 19

Ratrn^2 ( x indicates nonconvergence )

Trial H = 2 H = 3 H = 4 H = 5

1 x 0.944 0.994 0.992 x 0.976
2 x 0.872 x 0.945 0.992 0.990
3 0.990 x 0.926 0.992 0.992
4 x 0.899 0.991 0.990 0.991
5 x 0.876 0.990 0.990 0.992
6 0.990 0.993 0.998 0.991
7 x 0.922 x 0.949 x 0.965 0.995
8 0.990 x 0.968 x 0.922 0.990
9 x 0.922 x 0.954 x 0.932 0.991
10 x 0.980 0.993 x 0.954 0.991

Rtst^2

x indicates nonconvergence
** indicates best performance
* indicates good performance

Trial H = 2 H = 3 H = 4 H = 5

1 x 0.778 ** 0.900 0.796 x 0.735
2 x 0.463 x 0.395 0.859 0.734
3 ** 0.935 x 0.612 0.894 * 0.915
4 x 0.770 0.861 0.818 0.835
5 x 0.484 0.826 0.771 0.825
6 * 0.934 0.845 ** 0.946 ** 0.949
7 x 0.660 x 0.557 x 0.653 0.873
8 ** 0.935 x 0.431 x 0.748 0.855
9 x 0.660 x 0.298 x 0.780 * 0.922
10 x 0.010 0.899 x 0.746 0.867

It is somewhat surprising that 3 good test
results (Rtst^2 >= 0.915) are obtained when
H = 2 and 5, but only 1 good reult for H = 4
and a maximum of only 0.90 for H =3.

For those who are interested in a more detailed
investigation of this website's time series data

a. Train with 600 / Test with next 400
b. Train with 1000 / Test with next 10,000.

Also check out the other time series data on the
website.

clear all, close all, clc, k = 0

load laserdata.m
x0 = laserdata';
[I0 N0] = size(x0) % [1 1000]
[minx0 imin] = min(x0) % [ 2 610 ]
[maxx0 imax] = max(x0) % [ 255 603 ]

k=k+1,figure(k) % figure 1
hold on
plot(x0,'k')
plot(x0,'.')
ylabel('Intensity')
title('Laser Data')

x0n = 2*(x0-minx0)/(maxx0-minx0)-1;

k=k+1,figure(k) % figure 2
hold on
plot(x0n,'k')
plot(x0n,'.')
ylabel('Normalized Intensity')
title('Normalized Laser Data')

Ntrn = 400
acf = xcorr(x0n(1:Ntrn));
acfss = acf(Ntrn:end)/max(acf); % Single-sided (nonegative lags)

% Find maximum correlation for positive lags (d = 8)
[maxacf d] = max(acfss(2:end)) %^ [0.90368 7]

movavgacf = cumsum(acfss(2:21))./(1:20); %Moving avg positive lags
[maxmovavg d] = max(movavgacf(2:end)) % [0.92094 8]

% Tabulate the positive lag correlations and their
% moving average:

acftab = [(1:10)' acfss(2:11)' movavgacf(1:10)']

% Plot the autocorrelation results
k=k+1,figure(k) % figure 3
subplot(211)
plot(1:Ntrn-1,acfss(2:end),'.')
axis([0 Ntrn 0 1])
title('Laser Data Autocorrelation Function')
subplot(212)
hold on
plot(1:20,acfss(2:21),'.')
plot(1:20,movavgacf(1:20),'r')
axis([0 21 0 1])
legend('Autocorr','MovingAvg',4)

N = 600
Ntrn = 400
d = 8
y = x0n(1:N);

Pitrn = y(1:d); % inputs that yield no direct output
ptrn = y(d+1:Ntrn-1); % Training input
ttrn = y(d+2:Ntrn); % Target for predicting 1 point ahead
[I Nt] = size(ptrn) % [1 391]
[O Nt] = size(ttrn) % [1 391]
Neq = Nt*O % No. of training equations

% Find reference MSE corresponding to a
% constant model

MSEa00 = mean(var(ttrn')) % 0.1385
RMSEa00 = sqrt(MSEa00) % 0.37216

% Intialize random number generator so that
% results are reproducible

rand('state',0)

% Nw = (I+d+1)*H + (H+1)*O % No of weights <= Neq
Hub = floor((Neq-O)/(I+d+1+O)) % 35 = H upper bound
Hmax = 5 % 2009b Demo choice
Nwmax = (I+d+1)*Hmax + (Hmax+1)*O % 56 = max No. of weights
r = Neq/Nwmax % 6.9821 Equation/Unknowm ratio
Ntrials = 10

Pitrns = con2seq(Pitrn);
ptrns = con2seq(ptrn);
ttrns = con2seq(ttrn);

Pitsts = con2seq(y(Ntrn-d+1:Ntrn));
ptsts = con2seq(y(Ntrn:N-1));
ttst = y(Ntrn+1:N);

for j = 0:Hmax
H = j

for i = 1:Ntrials

if H ==0
net = newfftdgh(minmax(ptrn),[1:d],[O]); %Older Version
Nw = (I+d+1)*O;
else
net = newfftdgh(minmax(ptrn),[1:d],[H O]);
Nw = (I+d+1)*H + (H+1)*O;
end
net.trainParam.goal = 0.01*(Neq-Nw)*MSEa00/Neq;
net.trainParam.epochs = 1000;
net.trainParam.show = inf;

[net tr yptrns etrns Pftrns] = train(net,ptrns,ttrns,Pitrns);

Nepochs(i,j+1) = tr.epoch(end);
MSEtrn = tr.perf(end)
MSEatrn = Neq*MSEtrn/(Neq-Nw)
RMSEatrn = sqrt(MSEatrn)
R2atrn(i,j+1) = 1-MSEatrn/MSEa00;

% Uncomment this code to obtain 60 plots
% k=k+1,figure(k)
% subplot(211)
% plot(tr.epoch,1-Neq*(tr.perf/MSEa00)/(Neq-Nw))
% ylim([0 1])
% xlabel('Epoch')
% ylabel('Adjusted R^2 Statistic')
% title(['Laser Data Training Performance: H = ',...
% num2str(j),', Trial No. ',num2str(i)])
% subplot(212)
% plot(1:Nt,cell2mat(etrns))
% ylim([-1 1])
% xlabel('Observation No.')
% ylabel('Training Error')

yptsts = sim(net,ptsts,Pftrns);
yptst = cell2mat(yptsts);
etst = ttst-yptst;
MSEtst = mse(etst)
RMSEtst = sqrt(MSEtst)
R2tst(i,j+1) = 1-MSEtst/MSEa00;

% k=k+1, figure(k)
% hold on
% plot(ttrn,'.')
% plot(yptrn,'r.')
% drawnow
end
end

for kk= k:-1:1
figure(kk)
end

resulttrn1 = [(1:Ntrials)' Nepochs]

% 1 3 48 102 114 351
1000
% 2 3 49 250 1000
31 31
% 3 3 44 45 1000
150 30
% 4 3 46 631 27
23 57
% 5 3 47 841 52
88 28
% 6 3 51 34 14
108 123
% 7 3 51 1000 1000
555 251
% 8 3 48 37 1000
1000 12
% 9 3 51 1000 1000
713 58
% 10 3 56 1000 16
1000 19

resulttrn2 = [(1:Ntrials)' R2atrn]

% 1 0.82155 0.8548 0.94439 0.99418
0.99153 0.9759
% 2 0.82155 0.8548 0.87227 0.94492
0.99219 0.99045
% 3 0.82155 0.8548 0.99007 0.92551
0.99171 0.99169
% 4 0.82155 0.8548 0.89933 0.99132
0.99026 0.99101
% 5 0.82155 0.8548 0.87647 0.99031
0.99016 0.99207
% 6 0.82155 0.8548 0.99005 0.99297
0.99079 0.99133
% 7 0.82155 0.8548 0.92216 0.94885
0.96529 0.99454
% 8 0.82155 0.8548 0.99006 0.96842
0.92218 0.99049
% 9 0.82155 0.8548 0.92216 0.95419
0.93237 0.9906
% 10 0.82155 0.8548 0.98043 0.99299
0.95443 0.99095

resulttst = [(1:Ntrials)' R2tst]

% 1 0.67354 0.67085 0.77774 0.90006* 0.796
0.73522
% 2 0.67354 0.67085 0.4632 0.39454 0.8589*
0.73365
% 3 0.67354 0.67085 0.9346* 0.61228 0.89415*
0.91544*
% 4 0.67354 0.67085 0.77017 0.86093* 0.81804
0.83511
% 5 0.67354 0.67085 0.4843 0.82633 0.77111
0.82466
% 6 0.67354 0.67085 0.93425* 0.84488 0.94588*
0.94863*
% 7 0.67354 0.67085 0.65968 0.55672 0.65269
0.87336
% 8 0.67354 0.67085 0.9346* 0.43103 0.74806
0.85471
% 9 0.67354 0.67085 0.65967 0.29821 0.78016
0.9224*
% 10 0.67354 0.67085 0.0098806 0.89895* 0.74645
0.86665

Hope this helps.

Greg

Greg Heath

unread,

Dec 13, 2010, 2:16:22 PM12/13/10

to

In the NN Toolbox documentation for the TimeDelay NN
(http://www.mathworks.com/help/pdf_doc/nnet/nnet.pdf),
the following statements are found regarding the laser
data time series prediction example

ftdnn_net = newfftd(y,y,[1:8], H); % H = 5 (Ver 2009b)

ftdnn_net = timedelaynet([1:8], H); % H = 10(Ver 2010b)

Pi = y(1:8);
p = y(9:end); % length(y) = 600
t = y(9:end);

The last two statements are in error.
If t = p there is no step ahead prediction.

For an s-step ahead predictor using a delay buffer
of d previous inputs,

Pi = y(1:d);
p = y(d+1,end-s); % Input
t = y(d+1+s,end); % target

Given this correction, three questions immediately come
to mind:

Q1: WHY USE d = 8?
Q2: WHY USE H = 5 IN 2009b AND H = 10 IN 2010b?
Q3. WHAT IS THE PREDICTION PERFORMANCE ON NONDESIGN DATA?

To answer the questions I downloaded the original
N0 = 1000 observations from the Santa Fe Competition
website
(http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe/A.dat)
and normalized them to the interval [-1 1]. Then I extracted
the N = 600 observations used in the MATLAB example.

To answer question Q1, consider the nonnegative lag values

of the autocorrelation function of y given by

y = y(1:600);
N = length(y)
acf = xcorr(y);
acfss = acf(N:end)/max(acf); % Single-sided (nonegative lags)
movavgacf = cumsum(acfss(2:21))./(1:20); %Moving avg positive lags

% Find maximum correlation for positive lags (d = 8)

[maxacf1 d] = max(acfss(2:end)) % [0.91551 8]

% Tabulate the positive lag correlations and their
% moving average:

acftab = [(1:10)' acfss(2:11)' movavgacf(1:10)']

N = 600

No. of Epochs

Rtst^2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

x0n = 2*(x0-minx0)/(maxx0-minx0)-1;

rand('state',0)

%Trial H = 0 H = 1 H = 2 H = 3 H = 4 H = 5
%

% 1 3 48 102 114 351 1000
% 2 3 49 250 1000 31 31
% 3 3 44 45 1000 150 30
% 4 3 46 631 27 23 57

% 5 3 4 841 52 88 28

% 6 3 51 34 14 108 123
% 7 3 51 1000 1000 555 251
% 8 3 48 37 1000 1000 12
% 9 3 51 1000 1000 713 58
% 10 3 56 1000 16 1000 19

resulttrn2 = [(1:Ntrials)' R2atrn]

%Trial H = 0 H = 1 H = 2 H = 3 H = 4 H = 5
%
% 1 0.8216 0.8548 0.9444 0.9942 0.9915 0.9759
% 2 0.8216 0.8548 0.8723 0.9449 0.9922 0.9905
% 3 0.8216 0.8548 0.9901 0.9255 0.9917 0.9917
% 4 0.8216 0.8548 0.8993 0.9913 0.9903 0.9910
% 5 0.8216 0.8548 0.8765 0.9903 0.9902 0.9921
% 6 0.8216 0.8548 0.9901 0.9930 0.9908 0.9913
% 7 0.8216 0.8548 0.9222 0.9489 0.9653 0.9945
% 8 0.8216 0.8548 0.9901 0.9684 0.9222 0.9905
% 9 0.8216 0.8548 0.9222 0.9542 0.9324 0.9906
% 10 0.8216 0.8548 0.9804 0.9930 0.9544 0.9910

resulttst = [(1:Ntrials)' R2tst]

%Trial H = 0 H = 1 H = 2 H = 3 H = 4 H = 5
%
% 1 0.6735 0.6709 0.7777 0.9000 0.7960 0.7352
% 2 0.6735 0.6709 0.4632 0.3945 0.8589 0.7337
% 3 0.6735 0.6709 0.9346 0.6123 0.8942 0.9154
% 4 0.6735 0.6709 0.7702 0.8609 0.8180 0.8351
% 5 0.6735 0.6709 0.4843 0.8263 0.7711 0.8247
% 6 0.6735 0.6709 0.9343 0.8449 0.9459 0.9486
% 7 0.6735 0.6709 0.6597 0.5567 0.6527 0.8734
% 8 0.6735 0.6709 0.9346 0.4310 0.7481 0.8547
% 9 0.6735 0.6709 0.6597 0.2982 0.7802 0.9224
% 10 0.6735 0.6709 0.0099 0.8990 0.7465 0.8667

Hope this helps.

Greg

Greg Heath

unread,

Dec 16, 2010, 10:53:03 PM12/16/10

to

The previous results were obtained for 0 <= H <=5 at
d = 8, the value of positive delay for which the
autocorrelation function (acf) has a maximum when
all of the data (N = 600) is used.

However, to avoid biasing the test set estimate of
prediction error, it is better to only use parameters
estimated from nontest design data.

In this example validation is not used. First, the acf
of the training (Ntrn = 400) data is tabulated. The
maximum value of the acf for positive delays is 0.9
at d = 7.

Next, a limited search for an optimal value of d in
[1:10] is performed for H = 5. Tabulations of the
test R^2 statistic (R2tst) are obtained from ten
trials each for each candidate value of d.

The resulting summary of test set R^2 statistics is

Delay Max Median Mean Min
1 -0.53 -0.58 -0.57 -0.66
2 0.89 0.85 0.78 0.44
3 0.94 0.89 0.89 0.83
4 0.94 0.90 -13.33 -141.37
5 0.96 0.91 0.88 0.68
6 0.94 0.92 0.90 0.84
7 0.96 0.87 0.86 0.77
8 0.93 0.78 0.77 0.55
9 0.91 0.72 0.67 0.19
10 0.86 0.69 -1.58 -17.01

The negative values indicate that some of the NNs
perform worse than a constant model that always
outputs the average of the target values.

When negative Rtst^2 values are replaced by NaNs,
the summary statistics change to

Delay Max Median Mean Min
1 NaN NaN NaN NaN
2 0.89 0.84 0.78 0.44
3 0.94 0.89 0.89 0.83
4 0.93 0.90 0.90 0.82
5 0.96 0.91 0.88 0.68
6 0.94 0.92 0.90 0.84
7 0.96 0.87 0.86 0.75
8 0.93 0.78 0.77 0.55
9 0.91 0.72 0.67 0.19
10 0.86 0.76 0.70 0.40

Although more confident statistics would result
from choosing Ntrials >> 10, the following
observations are interesting:

1. d = 1 is useless.
2. Although the best result occurred for d = 5,
very good results were obtained throughout
the interval 3 <= d <= 8.
3. Median and mean statistics tend to increase the
confidence for the narrower interval 3 <= d <= 6.
4. Training set max and median statistics in the
following tabulation tend to favor the broader
interval 4 <= d <= 10. This result tends to
emphasize the need to use nontraining data to
confidently estimate both parameter bounds and
prediction performance on unseen data.

Delay Max Median Mean Min
1 0.12 0.06 0.08 0.03
2 0.95 0.93 0.92 0.83
3 0.98 0.96 0.96 0.94
4 0.99 0.99 0.98 0.93
5 0.99 0.99 0.98 0.93
6 0.99 0.99 0.99 0.99
7 0.99 0.99 0.99 0.99
8 0.99 0.99 0.98 0.94
9 0.99 0.99 0.99 0.97
10 0.99 0.99 0.98 0.94

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear all, close all, clc, k = 0

tic

load laserdata.m
x0 = laserdata';
[I0 N0] = size(x0) % [1 1000]

[minx0 imin] = min(x0) % [ 2 610 ]
[maxx0 imax] = max(x0) % [ 255 603 ]

x0n = 2*(x0-minx0)/(maxx0-minx0)-1;

N = 600
y = x0n(1:N);
Ntrn = 400
ytrn = y(1:Ntrn);
delaymax = 10

acf = xcorr(ytrn); % Autocorrelation Function
acfss = acf(Ntrn:end)/max(acf); % Single-sided (delays >=0)
movavgacf = cumsum(acfss(2:21))./(1:20); % Moving average

% Find maximum correlation for positive delays (d > 0)

[maxacf1 d] = max(acfss(2:end)) % [0.904 7]

% Tabulate the positive delay correlations and their
% moving average:

acftab = [(1:delaymax)' acfss(2:delaymax+1)' ...
movavgacf(1:delaymax)']

% maxacf1 = 0.904
% d = 7
% acf1tab =
% DELAY ACFSS MOVAVG
% 1 0.848 0.848
% 2 0.608 0.728
% 3 0.476 0.644
% 4 0.457 0.597
% 5 0.536 0.585
% 6 0.715 0.607
% 7 ** 0.904 0.649
% 8 * 0.903 0.681
% 9 0.706 * 0.684
% 10 0.529 0.668

% Intialize random number generator so that results
% are reproducible

rand('state',0)

for j = 1:delaymax

d = j

Pitrn = y(1:d); % inputs that yield no direct output
ptrn = y(d+1:Ntrn-1);% Training input
ttrn = y(d+2:Ntrn); % Target for predicting 1 point ahead

[I Nt] = size(ptrn) % [1 Ntrn-d-1]
[O Nt] = size(ttrn) % [1 Ntrn-d-1]

Neq = Nt*O % No. of training equations

Pitrns = con2seq(Pitrn);

ptrns = con2seq(ptrn);
ttrns = con2seq(ttrn);

Pitsts = con2seq(y(Ntrn-d+1:Ntrn));
ptsts = con2seq(y(Ntrn:N-1));
ttst = y(Ntrn+1:N);

% Find reference MSE corresponding to a constant model

MSEa00 = mean(var(ttrn'))
RMSEa00 = sqrt(MSEa00)

% Nw = (I+d+1)*H + (H+1)*O % No of weights <= Neq

Hub = floor((Neq-O)/(I+d+1+O)) % upper bound for H
H = 5 % 2009b Demo choice
Nw = (I+d+1)*H + (H+1)*O % No. of weights
r = Neq/Nw % Equation/Unknowm ratio
MSEgoal = 0.01*(Neq-Nw)*MSEa00/Neq % Training Goal

Ntrials = 10

for i = 1:Ntrials

net = newfftdgh(minmax(ptrn),[1:d],[H O]);

net.trainParam.goal = MSEgoal;

net.trainParam.epochs = 1000;
net.trainParam.show = inf;

warning off MATLAB:nearlySingularMatrix

[net tr yptrns etrns Pftrns] = ...
train(net,ptrns,ttrns,Pitrns);

Nepochs(i,j) = tr.epoch(end);
MSEtrn = tr.perf(end)
yptrn = cell2mat(yptrns);
eptrn = cell2mat(etrns);

MSEatrn = Neq*MSEtrn/(Neq-Nw)
RMSEatrn = sqrt(MSEatrn)

R2atrn(i,j) = 1-MSEatrn/MSEa00;

% Uncomment to obtain 100 plots
%
% k=k+1, figure(k)
%
% subplot(311)

% plot(tr.epoch,1-Neq*(tr.perf/MSEa00)/(Neq-Nw))
% ylim([0 1])

% xlabel('Training Epoch')

% ylabel('Adjusted R^2 Statistic')

% title(['H = ',int2str(H),', Input Delay = ',...
% int2str(d),', Trial No. ',int2str(i)])
%
% subplot(312)

% hold on
% plot(ttrn,'.')
% plot(yptrn,'r.')

% legend('Target','Prediction')
% ylim([-1.5 1])
% ylabel('Laser Intensity')
%
% subplot(313)
% hold on
% plot(1:Nt,zeros(1,Nt),'k')
% plot(1:Nt,eptrn)

% ylim([-1 1])
% xlabel('Observation No.')

% ylabel('Prediction Error')

yptsts = sim(net,ptsts,Pftrns);
yptst = cell2mat(yptsts);
etst = ttst-yptst;
MSEtst = mse(etst)
RMSEtst = sqrt(MSEtst)

R2tst(i,j) = 1-MSEtst/MSEa00;

end
end

% for kk= 1:k
% figure(kk)
% pause(1)
% end
%
% for kk= k:-1:1
% figure(kk)
% end

Nepochs = Nepochs
epochstats = [(1:delaymax)' max(Nepochs)' median(Nepochs)'...
mean(Nepochs)' min(Nepochs)']

R2atrn = R2atrn
R2trnstats = [(1:delaymax)' max(R2atrn)' median(R2atrn)'...
mean(R2atrn)' min(R2atrn)']

R2tst = R2tst
R2tststats = [(1:delaymax)' max(R2tst)' median(R2tst)'...
mean(R2tst)' min(R2tst)']

r2tst = R2tst;
r2tst(r2tst < 0) = NaN;
r2tststats = [(1:delaymax)' nanmax(r2tst)' nanmedian(r2tst)'...
nanmean(r2tst)' nanmin(r2tst)']

k=k+1,figure(k)
subplot(211)
hold on
plot(1:delaymax,max(R2atrn),'k.-')
plot(1:delaymax,median(R2atrn),'b.-')
plot(1:delaymax,mean(R2atrn),'r.-')
plot(1:delaymax,min(R2atrn),'g.-')
legend('max','median','mean','min',4)
axis([1 11 0.8 1.05])
% xlabel('Maximum Input Delay')
ylabel('Training R^2 ')
title('Training and Prediction Performance (Laser Data)')
subplot(212)
hold on

plot(1:delaymax,max(R2tst),'k.-')
plot(1:delaymax,nanmedian(r2tst),'b.-')
plot(1:delaymax,nanmean(r2tst),'r.-')
plot(1:delaymax,nanmin(r2tst),'g.-')
legend('max','median','mean','min',4)
axis([1 11 0 1.2])
xlabel('Maximum Input Delay')
ylabel(' Prediction R^2')

time = toc %558.22 ( ~ 10 min )

Hopw thia hwlpa.

Greg

Greg Heath

unread,

Dec 16, 2010, 10:57:48 PM12/16/10

to

On Dec 16, 10:53 pm, Greg Heath <he...@alumni.brown.edu> wrote:
>
> Hopw thia hwlpa.

So much for touch typing.

Greg

Jorge Javier Gutierrez

unread,

Feb 28, 2011, 12:07:58 AM2/28/11

to

Dear Greg,

Is Nw <> size(getX(net),1) ?

From Matlab Help:
getx
Purpose
All network weight and bias values as single vector

Best regards
Jorge

Jorge Javier Gutierrez

unread,

Feb 28, 2011, 12:09:48 AM2/28/11

to

Matlab R2010a

Greg Heath

unread,

Mar 15, 2011, 9:21:40 PM3/15/11

to

On Feb 28, 1:07 am, Jorge Javier Gutierrez

Since it's been more than 2 weeks since your post,
you have probably found out by using trial and
error with a smaller data set.

What did you find?

Sorry for the late response.

Greg

Jorge Javier Gutierrez

unread,

Mar 15, 2011, 10:03:51 PM3/15/11

to

Dear Greg,

by running your code I found that Nw <> size(getX(net),1), so I replace in your code:

Nw = (I+d+1)*H + (H+1)*O %% Nw <> size(getX(net),1)
with
Nw = (d+1)*H + (H+1)*O %% Nw = size(getX(net),1)

and so
Hub = floor((Neq-O)/(I+d+1+O))
with
Hub = floor((Neq-O)/(d+1+O))

is that correct? can you check it?

Best regards
Jorge

Greg Heath

unread,

Mar 15, 2011, 11:03:59 PM3/15/11

to

Zinon Taousianis <taousian...@ucy.ac.cy>
Sent: Mar 14, 2011 06:24:20 PM

>I'm a student from University of Cyprus. I don't
>want to take your time but i saw your code in
>mathwork about newffdt. I didn't understand,
>where is the ahead prediction of your code? i mean,
>how your code predicts the value at 601,602..?

With d = 7, Use 594-600 to predict 601, then
use 595-601 to predict 602, etc.

Zinon Taousianis <taousian...@ucy.ac.cy>
Sent: Mar 15, 2011 03:38:04 PM

> I don't understand exactly where is the prediction.
> i.e. There are 600 observations, where is the values
> of s-step ahead predictions?

I only considered the case s = 1.

Pi = y(1:d);
p = y(d+1,end-s); % Input
t = y(d+1+s,end); % target

You can easily generalize to an s >= 1 dimensional output

Pitrn = y(1:d); % inputs that yield no direct output

ptrn = y(d+1:Ntrn-s); % Training input

% Target for predicting the next s points

ttrn = y(d+2:d+1+s,Ntrn+1-s:Ntrn);

There will be Ntrn-d-s s-dimensional predictions in yptrns
There will be Ntst-d-s s-dimensional predictions in yptsts

Obviously, there will be s separate predictions for most
of these points.

>I would like to ask you something more. Assume you
>have a vector with 500 values [x1,x2.......x500].
>I thought a way to make prediction ahead but i
>don't know if the idea is good and if it worths
>to implement it.

>i thought the following procedure for training :
>inputs x1,x2,x3 and the target to be x4. Then
>inputs x2,x3,x4 and target x5 and so on until to
>reach the final three observations and forecast
>the next one. is that procedure a good choice or
>there is no meaning?

Yes it can be done with newff. I have several
time series posts which do just that. You might
be able to find them searching with

heath time-series newff

Hope this helps.

Greg

Greg Heath

unread,

Mar 16, 2011, 12:22:52 AM3/16/11

to

On Mar 15, 10:03 pm, Jorge Javier Gutierrez

For an I dimensional input series used to
predict an O-dimensional output series
with H hidden nodes, there are

(I+1)*H (input+bias)-to-hidden weights
(H+1)*O (hidden+bias)-to-output weights
therefore, for newff,

Nw = (I+1)*H+(H+1)*O

For newfftd, there will be d additional
delayed inputs. However, those inputs
are I-dimensional.

So now I get

Nw = ((d+1)*I+1)*H+(H+1)*O

clear all, clc
I = 3
H = 4
O = 5
N = 6
p = rand(I,N);
t = rand(O,N);
net = newff(minmax(p),[H O]);
NwL = (H+1)*O % 25
NwI0=length(getx(net))-NwL % 16
NwI = (I+1)*H % 16

for d = 0:3
net = newfftdgh(minmax(p),[0:d],[H O]);
d = d
NwL = (H+1)*O % 25
NwI0 = length(getx(net))-NwL
NwI = ((d+1)*I+1)*H
end

Hope this helps.

Greg

Jorge Javier Gutierrez

unread,

Mar 28, 2011, 1:54:33 AM3/28/11

to Greg Heath

Dear Greg,

When input delay (ID) does begin with 0, NwI0 = NwI. But when ID does not begin with 0 then NwI0 <> NwI.

clear all, clc
I = 3
H = 4
O = 5
N = 6
p = rand(I,N);
t = rand(O,N);

d = 3

ID = 0:d
net = newfftdgh(minmax(p),ID,[H O]);
NwL = (H+1)*O
NwI0 = length(getx(net))-NwL
NwI = ((d+1)*I+1)*H % NwI = NwI0

ID = 1:d
net = newfftdgh(minmax(p),ID,[H O]);
NwL = (H+1)*O
NwI0 = length(getx(net))-NwL
NwI = ((d+1)*I+1)*H % NwI <> NwI0

when
>> NwI = ((I*length(ID))+1)*H
then NwI=NwI0

Best regards
Jorge

Greg Heath

unread,

Mar 28, 2011, 9:00:26 PM3/28/11

to

On Mar 28, 1:54 am, Jorge Javier Gutierrez

Correct! ...Thanks.

Replace d+1 by length(ID)

i = 0
for d = 0:5
d = d
% ID = [0:d]
% ID = [1:d]
% ID = [0:2:d]
ID = [1:2:d]
LID = length(ID)

net = newfftdgh(minmax(p),ID,[H O]);

NwL = (H+1)*O % 25
NwI0 = length(getx(net))-NwL
NwI = (LID*I+1)*H
i=i+1
dNw(i) = NwI-NwI0

Andrzej Miekina

unread,

Jun 17, 2011, 3:51:04 PM6/17/11

to

Hi Greg,

I try to predict vector of three variables. First of all I spent lot of time with a "laser example" and lines
p = y(9:end);
t = y(9:end);
Fortunatelly I found your explanation.
I have some apriori knowledge about my variables: v1<=v2<=v3 for the same time
Is any way to introduce this constrains in dynamic net ?
In input data this relation is always satisfy but in predicted result not always.

Regards -

Andrzej

Greg Heath

unread,

Jun 22, 2011, 2:00:22 AM6/22/11

to

I don't know of anyway to introduce that constraint.

Sorry,

Greg