
Find duplicate entries in a matrix and replace them with average


prof rumsdiegeige

May 13, 2009, 3:14:01 AM
Hello,

I've got a matrix of integer values with three columns (X,Y,Z) and about 10k rows.

X and Y are in range 0..1000 and Z is in the range of 0..200.

In this matrix, there are some pairs of (X,Y) that occur more than once, but with different Z values. Now I want to remove these duplicates and replace all of them with (X,Y,mean of Z).

I tried this:

for A = 0 : 1000
    for B = 0 : 1000
        c = find(x==A & y==B);
        if (numel(c)>1)
            z(c(1))=mean(z(c));
            x(c(2:end))=[];
            y(c(2:end))=[];
            z(c(2:end))=[];
        end
    end
end

Unfortunately, this loop takes ages to compute...
Any hints on how to improve the speed?

Thanks a lot!
Sabine Lorentz

Yi Cao

May 13, 2009, 4:46:01 AM
"prof rumsdiegeige" <professor_r...@yahoo.com> wrote in message <gudrvp$akr$1...@fred.mathworks.com>...

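If you do not care about the original order, try the code below to see if it works:

% Multiplier is 1001 because X and Y each take 1001 values (0..1000);
% with 1000, the pairs (0,1000) and (1,0) would get the same key.
t = X*1001 + Y;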
[s,idx] = unique(t);
X=X(idx);
Y=Y(idx);
[T,S]=meshgrid(t,s);
F=(T==S);
z=repmat(Z(:)',numel(s),1);
Z=sum(z.*F,2)./sum(F,2);

Otherwise, to keep the original order, replace the second line with:
[s,idx] = unique(t,'first');
[idx,loc] = sort(idx);
s=s(loc);
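
A combined key like t only identifies an (X,Y) pair uniquely if the multiplier is larger than the biggest Y. With the range given in the question (0..1000), that means at least 1001. A minimal sketch to check this, using two made-up pairs from that range (the names k1000 and k1001 are just for illustration):

```matlab
% Two different (X,Y) pairs from the range 0..1000
X = [0; 1];
Y = [1000; 0];

k1000 = X*1000 + Y;   % both pairs give 1000
k1001 = X*1001 + Y;   % gives 1000 and 1001

n1000 = numel(unique(k1000));   % 1: the two pairs collapse into one key
n1001 = numel(unique(k1001));   % 2: the keys stay distinct
```

If the range of Y is not known in advance, max(Y)+1 could be used as the multiplier instead of a hard-coded constant.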

HTH
Yi

Nasser Abbasi

May 13, 2009, 5:03:56 AM

"prof rumsdiegeige" <professor_r...@yahoo.com> wrote in message
news:gudrvp$akr$1...@fred.mathworks.com...

Try this:

m=cell2mat(textscan(sprintf('%d%d ',A(:,1:2)'),'%d'));
[u,idx,j]=unique(m,'rows');
A(idx,3)=arrayfun(@(x) mean(A(m==m(x),3)),idx(:))
A=A(idx,:)

For example, for this A

A =

250 350 100
240 340 110
250 350 120
110 98 9
250 350 120
240 340 999

You'll get

A =

110.0000 98.0000 9.0000
240.0000 340.0000 554.5000
250.0000 350.0000 113.3333
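
One caveat with building the key by printing X and Y back to back: different pairs can produce the same digits. A minimal sketch with two made-up rows, assuming non-negative integers no larger than 9999 (zero-padding Y to a fixed width is only one possible workaround, not necessarily what was intended above):

```matlab
A = [25  0 10
      2 50 20];   % two different (X,Y) pairs

% Plain concatenation: both rows print as the digits 250
s  = sprintf('%d%d ', A(:,1:2)');     % '250 250 '

% Zero-padding Y to four digits keeps the keys distinct
s4 = sprintf('%d%04d ', A(:,1:2)');   % '250000 20050 '
```

The example A shown above does not hit this case, which is why its result is still correct.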

--Nasser


prof rumsdiegeige

May 13, 2009, 6:36:02 AM
Thanks for your solution, but what is "mll2mat" ? My Matlab (R2007b) doesn't know it... is it a particular toolbox?

"Nasser Abbasi" <n...@12000.org> wrote in message <JWvOl.18949$8_3....@flpi147.ffdc.sbc.com>...

> mll2mat(textscan(sprintf('%d%d ',A(:,1:2)'),'%d'));

us

May 13, 2009, 6:53:01 AM
"prof rumsdiegeige" <professor_r...@yahoo.com> wrote in message <gudrvp$akr$1...@fred.mathworks.com>...

> Hello,
>
> I've got a matrix of integer values with three columns (X,Y,Z) and about 10k rows.
>
> X and Y are in range 0..1000 and Z is in the range of 0..200.
>
> In this matrix, there are some pairs of (X,Y) that occur more than once, but with different Z values. Now I want to remove these duplicates and replace all of them with (X,Y,mean of Z)...

you will find this great FEX submission by john d'errico very helpful

http://www.mathworks.com/matlabcentral/fileexchange/8354

us

Nasser Abbasi

May 13, 2009, 7:08:21 AM

> "Nasser Abbasi" <n...@12000.org> wrote in message
> <JWvOl.18949$8_3....@flpi147.ffdc.sbc.com>...
>>

>> Try this:


>>
>> mll2mat(textscan(sprintf('%d%d ',A(:,1:2)'),'%d'));
>> [u,idx,j]=unique(m,'rows');
>> A(idx,3)=arrayfun(@(x) mean(A(m==m(x),3)),idx(:))
>> A=A(idx,:)
>>
>> For example, for this A
>>
>> A =
>>
>> 250 350 100
>> 240 340 110
>> 250 350 120
>> 110 98 9
>> 250 350 120
>> 240 340 999
>>
>> You'll get
>>
>> A =
>>
>> 110.0000 98.0000 9.0000
>> 240.0000 340.0000 554.5000
>> 250.0000 350.0000 113.3333
>>
>> --Nasser
>>

"prof rumsdiegeige" <professor_r...@yahoo.com> wrote in message
news:gue7qi$hld$1...@fred.mathworks.com...


> Thanks for your solution, but what is "mll2mat" ? My Matlab (R2007b)
> doesn't know it... is it a particular toolbox?
>

That is what I was saying in another post. You seem to be reading this from
MatlabCentral? It corrupted what I wrote.

It is "m = c e l l 2 m a t" and not "m l l 2 m a t" (I added spaces
between the letters so that MatlabCentral does not corrupt it again, hopefully).

MatlabCentral seems to convert "m = c e l l 2 m a t" to "m l l 2 m a t".

See my response on the Google newsgroup and you'll see how it is different.

http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/1e60d4721175af9d?hl=en


--Nasser

Siyi

May 13, 2009, 11:11:42 PM

A = [ 250 350 100
240 340 110
250 350 120
110 98 9
250 350 120
240 340 999 ];


[u,id1,id2] = unique(A(:,1:2),'rows');


B = [u,accumarray(id2,A(:,3))./accumarray(id2,1)];
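
For reference, the same grouping can also be written by passing @mean to accumarray directly. This is a sketch on the example data from this thread; the 'stable' variant at the end assumes a MATLAB release that supports that option (R2012a or later, so not the R2007b mentioned earlier), and the names B2, us, ids and Bs are only illustrative:

```matlab
A = [250 350 100
     240 340 110
     250 350 120
     110  98   9
     250 350 120
     240 340 999];

% Group rows by (X,Y); id2 maps each row of A to its group in u
[u, id1, id2] = unique(A(:,1:2), 'rows');

% Average Z within each group
B2 = [u, accumarray(id2, A(:,3), [], @mean)];
% B2 =
%   110.0000   98.0000    9.0000
%   240.0000  340.0000  554.5000
%   250.0000  350.0000  113.3333

% Assumption: R2012a or later. 'stable' keeps the order of first appearance.
[us, ids1, ids2] = unique(A(:,1:2), 'rows', 'stable');
Bs = [us, accumarray(ids2, A(:,3), [], @mean)];
```

On the 10k-row matrix from the question this replaces the double loop with a single pass over the rows.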
