Lets start with testing integer matrix (buffer) respresented by:
m = 1000000; % ... number of rows
n = 20; % ... dimension of vector
M = randi([1 n],m,n);
and then the function which is able to find the location of unique row vector x within matrix M may looks like:
testloc = 123456;
x = M(testloc,:);
[tf loc] = ismember(x,M,'rows');
Is possible to find locations of multiple vectors, too:
[tf loc] = ismember([x;y;z],M,'rows');
This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.
Is there any chance to speedup this searching method based on "ismember" function?
Important notes:
1. The matrix M contains only unique vectors (rows).
2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
3. The problem is motivated by optimization of the cpu time consuming black-box function. In this case is very important to avoid repeated evaluation randomly generated input sampling vectors by properly designed buffering.
4. The response of the searching method must be significantly faster then evaluation of the black-box function even in the case of large number od samplicng points in the buffer (matrix).
Michal
I have a function, which is six times faster than ismember with your first example (one row). It is bases on a contribution by us, which I once downloaded from the FEX (or copied from a post to this Newsgroup). The function is based on STRFIND (us' trick).
Now, I cannot find it in the FEX, but see: strpat: a pedestrian, exactly matching pattern finder / replacer, by us
BTW: tic, ix = find( all( M == repmat( x, m, 1 ), 2 ), 1 ); toc is more than twice as fast as ismember with your example.
- per
Could you help me a bit more?
Michal
Using BSXFUN in conjunction with FIND, I get nearly a factor of 20 speed-up in the test case below:
m = 1000000; % ... number of rows
n = 20; % ... dimension of vector
M = randi([1 n],m,n);
testloc = 123456;
testloc=testloc:testloc+5;
x = M(testloc,:);
tic;
[~,loc1] = ismember(x,M,'rows');
toc;
%Elapsed time is 4.058689 seconds.
tic;
loc= find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3));
toc;
%Elapsed time is 0.260783 seconds.
Incidentally, this is a notoriously bad thing to do. It is better that you pre-allocate M with the largest size it will need to assume. Then, run your search on some truncated portion of the matrix M(1:N,:);
Also, if you are going to be appending vectors to matrices, it is better to do so column-wise instead of row-wise. This makes memory access more efficient.
So, if I understand you well, you propose work with matrix M with n rows and m columns and them use M' in your search method:
loc= find(any(all(bsxfun(@eq,reshape(x,1,n,[]),M'),2),3))
Am I right?
Matt's solution is faster than my function, which is based one STRFIND.
However, the idea (which is not at all obvious) is to rearrange M and x so that they fit STRFIND. Use STRFIND as in:
>> row = (1:12);
>> pttrn= (3:5);
>> strfind( row, pttrn )
ans =
3
and next check that the index returned match the beginning of a row.
- per
Yes.
>and them use M' in your search method:
> loc= find(any(all(bsxfun(@eq,reshape(x,1,n,[]),M'),2),3))
No, if you work with M organized columnwise, I would reorganize this accordingly
tic;
loc1= find(any(all(bsxfun(@eq,M,reshape(x,n,1,size(x,2))),1),3));
toc;
%Elapsed time is 0.351774 seconds.
Another possibly faster approach based on Cauchy-Schwartz is as follows. I don't include the computation of Mnorms in the tic...toc, because that is something you could update incrementally, as you are doing with M,
Mnorms=sqrt(sum(M.^2,1));
tic;
[~,loc]=max(bsxfun(@rdivide,x.'*M, Mnorms),[],2);
toc;
%Elapsed time is 0.144179 seconds.