Multivariate Kernel Density Estimation Code

Holger Junker

Mar 5, 2003, 9:29:25 AM

Hi,
does anybody know of any code that estimates the pdf of a vector of
random variables using Gaussian kernels?
Thank you

Peter Perkins

Mar 5, 2003, 9:57:28 AM

Hi Holger -

> does anybody know of any code that estimates the pdf of a vector of
> random variables using Gaussian kernels?

As far as I know, this is generally thought of as being impractical in
more than a couple of dimensions, because (1) the number of observations
needed to get decent estimates grows very quickly with the dimension,
and (2) there's not an easy way to know what shape the kernels should
take (i.e., what covariance matrix should be used).
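
For reference, both points can be read off the general estimator. This is a sketch in standard textbook notation (the symbols d, H and X_i are not taken from the attached code): with observations X_1, ..., X_n in R^d and a symmetric positive definite bandwidth matrix H, the Gaussian kernel estimate is

```latex
\hat{f}_H(x) = \frac{1}{n} \sum_{i=1}^{n}
  \frac{1}{(2\pi)^{d/2} \, |H|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - X_i)^{\top} H^{-1} (x - X_i) \right)
```

H plays the role of the kernel covariance matrix in point (2). Restricting H to diag(h_1^2, ..., h_d^2) gives a product of univariate normal kernels with one bandwidth per coordinate. For point (1), the best attainable mean integrated squared error for a kernel of this type shrinks at the rate n^(-4/(d+4)), which becomes slow quickly as d grows.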

Nonetheless, attached is a 2D version that might get you started. Hope
it helps.

- Peter Perkins
The MathWorks, Inc.

ksdensity2d.m

Peter Perkins

Mar 6, 2003, 3:29:39 PM

> Nonetheless, attached is a 2D version that might get you started. Hope
> it helps.

Apparently this attachment was not universally readable. Here is the code
in text form:

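function test
% Evaluation grid for each coordinate
gridx1 = 0:.05:1;
gridx2 = 5:.1:10;
% Sample of 30 points: 20 uniform on [0,.5]x[5,7.5], 10 on [.75,1]x[8.75,10]
X = [0+.5*rand(20,1) 5+2.5*rand(20,1);
     .75+.25*rand(10,1) 8.75+1.25*rand(10,1)];
ksdensity2d(X,gridx1,gridx2);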

function f = ksdensity2d(x,gridx1,gridx2,bw)
% KSDENSITY2D Compute kernel density estimate in 2D.
% F = KSDENSITY2D(X,GRIDX1,GRIDX2,BW) computes a nonparametric estimate of
% the probability density function of the sample in the N-by-2 matrix X.
% F is the M1-by-M2 matrix of density values evaluated at the points in the
% grid defined by the vectors GRIDX1 (length M1) and GRIDX2 (length M2). The
% estimate is based on a normal kernel function. BW is an optional 2-element
% vector of bandwidths; by default each bandwidth is a function of the
% spread of the corresponding column of X and of the number of points in X.
[n,p] = size(x);
m1 = length(gridx1);
m2 = length(gridx2);

% Choose bandwidths optimally for Gaussian kernel, using a robust scale
% estimate (median absolute deviation / 0.6745) of each column of the data
if nargin < 4 || isempty(bw)
    sig1 = median(abs(x(:,1)-median(x(:,1)))) / 0.6745;
    if sig1 <= 0, sig1 = max(x(:,1)) - min(x(:,1)); end
    if sig1 > 0
        bw(1) = sig1 * (1/n)^(1/6);
    else
        bw(1) = 1;
    end
    sig2 = median(abs(x(:,2)-median(x(:,2)))) / 0.6745;
    if sig2 <= 0, sig2 = max(x(:,2)) - min(x(:,2)); end
    if sig2 > 0
        bw(2) = sig2 * (1/n)^(1/6);
    else
        bw(2) = 1;
    end
end

% Compute the kernel density estimate
[gridx2,gridx1] = meshgrid(gridx2,gridx1);
x1 = repmat(gridx1, [1,1,n]);
x2 = repmat(gridx2, [1,1,n]);
mu1(1,1,:) = x(:,1); mu1 = repmat(mu1,[m1,m2,1]);
mu2(1,1,:) = x(:,2); mu2 = repmat(mu2,[m1,m2,1]);
f = sum((normpdf(x1,mu1,bw(1)) .* normpdf(x2,mu2,bw(2))), 3) / n;

% Plot the estimate
surf(gridx1,gridx2,f);
hold on;
plot3(x(:,1),x(:,2),zeros(n,1),'bo');
hold off;
view(-37.50,30);
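
The original question asked about a vector of random variables, so here is a minimal sketch of how the same product-of-normals idea might extend to p dimensions. The function name kdeprod and its arguments are hypothetical, not part of any toolbox. It evaluates at a list of points rather than on a grid, writes the Gaussian out explicitly so that normpdf is not needed, and assumes every column of the sample has nonzero spread. The default bandwidth uses the normal-reference factor (4/((p+2)*n))^(1/(p+4)), which reduces to the (1/n)^(1/6) used above when p = 2.

```matlab
function f = kdeprod(x, pts, bw)
% KDEPROD Product-Gaussian kernel density estimate in p dimensions.
%   Hypothetical helper, for illustration only.
%   x   : n-by-p sample
%   pts : m-by-p points at which to evaluate the estimate
%   bw  : 1-by-p bandwidths (optional)
%   Example: f = kdeprod(randn(200,3), [0 0 0; 1 1 1]);
[n, p] = size(x);
m = size(pts, 1);

if nargin < 3 || isempty(bw)
    % Robust scale per column (MAD / 0.6745), assumed to be nonzero
    sig = median(abs(x - repmat(median(x, 1), n, 1)), 1) / 0.6745;
    % Normal-reference factor; equals (1/n)^(1/6) for p = 2
    bw = sig * (4 / ((p + 2) * n))^(1 / (p + 4));
end

f = zeros(m, 1);
for i = 1:n
    % Standardized offsets from every evaluation point to observation i
    z = (pts - repmat(x(i,:), m, 1)) ./ repmat(bw, m, 1);
    f = f + exp(-0.5 * sum(z.^2, 2));
end
% Normalizing constant of the product of p univariate normal kernels
f = f / (n * (2*pi)^(p/2) * prod(bw));
```

The loop over observations keeps memory use at m-by-p instead of building the m1-by-m2-by-n arrays used in the 2D code, at the cost of some speed. The sample-size caveat raised earlier in the thread still applies as p grows.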

Curtis Williams

Jul 24, 2015, 1:40:09 PM

Hi Peter,

I can follow most of the code, but I have a quick question about the calculation of the optimal bandwidth. Where does the value 0.6745 come from? I ask because I noticed that changing the value changes the "resolution" of the output plot.

Thank you,

Curtis
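
On the 0.6745 question: that value is, to four decimals, the 0.75 quantile of the standard normal distribution. For normally distributed data the median absolute deviation is about 0.6745 times the standard deviation, so dividing by 0.6745 turns it into a robust estimate of sigma. Changing the constant rescales that estimate and with it the bandwidth, which would explain the change in apparent "resolution": a smaller bandwidth gives a bumpier surface, a larger one a smoother surface. A small sketch to check this, where the simulated data and variable names are assumptions for illustration only:

```matlab
% 0.75 quantile of the standard normal, written with erfinv so that no
% toolbox is needed; this is the same quantity as norminv(0.75).
c = sqrt(2) * erfinv(0.5);

% For normal data, median(|x - median(x)|) is roughly c * sigma, so the
% ratio below should land near the true sigma.
rng(0);                        % fixed seed, for repeatability only
x = 3 * randn(1e5, 1);         % simulated data with sigma = 3 (assumed)
sigHat = median(abs(x - median(x))) / c;

fprintf('c = %.4f, robust sigma estimate = %.3f\n', c, sigHat);
```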
