speedup of ismember

Michal Kvasnicka

unread,

Apr 22, 2011, 6:25:08 AM4/22/11

to

Hello,
I am looking for any possibilities, how to speedup following code for location of the integer vector within the matrix (buffer).

Lets start with testing integer matrix (buffer) respresented by:
m = 1000000; % ... number of rows
n = 20; % ... dimension of vector
M = randi([1 n],m,n);

and then the function which is able to find the location of unique row vector x within matrix M may looks like:

testloc = 123456;
x = M(testloc,:);
[tf loc] = ismember(x,M,'rows');

Is possible to find locations of multiple vectors, too:
[tf loc] = ismember([x;y;z],M,'rows');

This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.
Is there any chance to speedup this searching method based on "ismember" function?

Important notes:
1. The matrix M contains only unique vectors (rows).
2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
3. The problem is motivated by optimization of the cpu time consuming black-box function. In this case is very important to avoid repeated evaluation randomly generated input sampling vectors by properly designed buffering.
4. The response of the searching method must be significantly faster then evaluation of the black-box function even in the case of large number od samplicng points in the buffer (matrix).

Michal

per isakson

unread,

Apr 22, 2011, 9:07:05 AM4/22/11

to

"Michal Kvasnicka" wrote in message <iorl24$dal$1...@fred.mathworks.com>...

I have a function, which is six times faster than ismember with your first example (one row). It is bases on a contribution by us, which I once downloaded from the FEX (or copied from a post to this Newsgroup). The function is based on STRFIND (us' trick).

Now, I cannot find it in the FEX, but see: strpat: a pedestrian, exactly matching pattern finder / replacer, by us

BTW: tic, ix = find( all( M == repmat( x, m, 1 ), 2 ), 1 ); toc is more than twice as fast as ismember with your example.

- per

Michal Kvasnicka

unread,

Apr 22, 2011, 10:05:19 AM4/22/11

to

"per isakson" wrote in message <ioruhp$fko$1...@fred.mathworks.com>...

My be I am stupid, but I can not find the way how to effectively use STRPAT as a faster ISMEMBER replacement.

Could you help me a bit more?

Michal

Matt J

unread,

Apr 22, 2011, 10:21:04 AM4/22/11

to

"Michal Kvasnicka" wrote in message <iorl24$dal$1...@fred.mathworks.com>...
>

> This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.

================================

Using BSXFUN in conjunction with FIND, I get nearly a factor of 20 speed-up in the test case below:

m = 1000000; % ... number of rows
n = 20; % ... dimension of vector
M = randi([1 n],m,n);

testloc = 123456;
testloc=testloc:testloc+5;
x = M(testloc,:);

tic;
[~,loc1] = ismember(x,M,'rows');
toc;
%Elapsed time is 4.058689 seconds.

tic;
loc= find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3));
toc;
%Elapsed time is 0.260783 seconds.

Matt J

unread,

Apr 22, 2011, 10:30:20 AM4/22/11

to

"Michal Kvasnicka" wrote in message <iorl24$dal$1...@fred.mathworks.com>...
>

> 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.

================

Incidentally, this is a notoriously bad thing to do. It is better that you pre-allocate M with the largest size it will need to assume. Then, run your search on some truncated portion of the matrix M(1:N,:);

Also, if you are going to be appending vectors to matrices, it is better to do so column-wise instead of row-wise. This makes memory access more efficient.

Michal Kvasnicka

unread,

Apr 22, 2011, 10:59:05 AM4/22/11

to

"Matt J" wrote in message <ios3ds$d56$1...@fred.mathworks.com>...

So, if I understand you well, you propose work with matrix M with n rows and m columns and them use M' in your search method:
loc= find(any(all(bsxfun(@eq,reshape(x,1,n,[]),M'),2),3))
Am I right?

per isakson

unread,

Apr 22, 2011, 11:40:04 AM4/22/11

to

"Michal Kvasnicka" wrote in message <ios1uv$fp6$1...@fred.mathworks.com>...

> "per isakson" wrote in message <ioruhp$fko$1...@fred.mathworks.com>...
> > "Michal Kvasnicka" wrote in message <iorl24$dal$1...@fred.mathworks.com>...

....

>
> Could you help me a bit more?
>
> Michal

Matt's solution is faster than my function, which is based one STRFIND.

However, the idea (which is not at all obvious) is to rearrange M and x so that they fit STRFIND. Use STRFIND as in:
>> row = (1:12);
>> pttrn= (3:5);
>> strfind( row, pttrn )
ans =
3
and next check that the index returned match the beginning of a row.

- per

Matt J

unread,

Apr 22, 2011, 11:52:05 AM4/22/11

to

"Michal Kvasnicka" wrote in message <ios53p$cf5$1...@fred.mathworks.com>...

> "Matt J" wrote in message <ios3ds$d56$1...@fred.mathworks.com>...
> > "Michal Kvasnicka" wrote in message <iorl24$dal$1...@fred.mathworks.com>...
> > >
> > > 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
> > ================
> >
> > Incidentally, this is a notoriously bad thing to do. It is better that you pre-allocate M with the largest size it will need to assume. Then, run your search on some truncated portion of the matrix M(1:N,:);
> >
> > Also, if you are going to be appending vectors to matrices, it is better to do so column-wise instead of row-wise. This makes memory access more efficient.
>
> So, if I understand you well, you propose work with matrix M with n rows and m columns

Yes.

>and them use M' in your search method:
> loc= find(any(all(bsxfun(@eq,reshape(x,1,n,[]),M'),2),3))

No, if you work with M organized columnwise, I would reorganize this accordingly

tic;
loc1= find(any(all(bsxfun(@eq,M,reshape(x,n,1,size(x,2))),1),3));
toc;
%Elapsed time is 0.351774 seconds.

Another possibly faster approach based on Cauchy-Schwartz is as follows. I don't include the computation of Mnorms in the tic...toc, because that is something you could update incrementally, as you are doing with M,

Mnorms=sqrt(sum(M.^2,1));
tic;
[~,loc]=max(bsxfun(@rdivide,x.'*M, Mnorms),[],2);
toc;
%Elapsed time is 0.144179 seconds.