Feature request - MATCHES function

48 views
Skip to first unread message

John Brown

unread,
Jul 10, 2024, 11:32:10 PM7/10/24
to FastExcelV4
Hi Charles

The MATCHES function is good, but IMHO it is missing a parameter; or maybe there is need for another function that is like MATCHES but with the extra parameter.

In SQL you can do inner/left/right joins, using the equals operator.
But if you want, you can also do the joins using any of:
>
<
>=
<=
<>
(not sure about the last one).

MATCHES function defaults to 'doing a join', or the equivalent of, using the = operator. But it would be good to be able to choose other operators. This would enable more sophisticated joining/matching, e.g. joining rows based on ranges, etc. Maybe even one of those 'all rows to all rows' joins you can do in SQL where you then filter the rows returned by the WHERE statement only (which is discorouged I know, or even deprecated).

I get that MATCHES function has the Match_Type parameter, but this is more for deciding the equivalent of inner/left/right join, whereas I am asking for a parameter to control which of these is used: =, >, <. >=, <=, <>.

Thanks very much.
John

Charles Williams

unread,
Jul 11, 2024, 9:38:10 AM7/11/24
to FastExcelV4

There is also MISSMATCHES for the Anti-Join.

 

The problem with using GT LT etc operators is that they force a linear scan of the data which is very slow unless the data is already sorted  or indexed.
You can do this with the native FILTER function if you don't mind it being slow.

 

Regards

Charles

Lambda Explorer Released

https://www.decisionmodels.com/FastExcelLambdaExplorer.htm

John Brown

unread,
Nov 4, 2024, 4:43:18 PM11/4/24
to FastExcelV4
Thanks Charles,

" the data is already sorted  or indexed" seems similar to what I have read in the Help for certain of the functions, that they are really fast when handling sorted data. I have a couple of questions:

1. I don't understand exactly what I need to do to bring this about. Apart from the obvious that is, which is to click on the AZ|v button (my depiction of the graphic) for the column or columns before running a formula with the function(s) on that column or columns.
But:
Do I need to signal to the function that the data is sorted? - i.e. set a parameter 'this data is sorted'?
For FastExcel, or for that matter a built-in function that runs fastest on sorted data, to know that the data is sorted - does it not have to scan the data to determine that it is sorted? Is that not slow? Does it only achieve a speed-up the next time it runs?
What if I go into the sorted row and change one value, breaking the 'sortedness' of the data?
What if I sort in descending order, Z-A?
" sorted  or indexed" - what is the difference in Excel? This is not Access or SQL Server, where indexes are things that you define/create and then they update automatically. How do I index in Excel? That is one of the things I most hate about Excel - it doesn't seem to have the concept of a primary key, even if lookup/match formula columns seem quite like foreign keys. I used to try to use the Scripting dictionary to 'bolt on' this key functionality but I don't bother any more, as it requires too much work to transfer the data to/from Excel. An Excel table provides uniqueness of columns, but that is not uniqueness of rows, or anything like a multi-column primary key.

2. "The problem with using GT LT etc operators is that they force a linear scan of the data which is very slow unless the data is already sorted  or indexed."
Fair point, but I suggest it could still be worth it, at least for the case of sorted data. After all there are many functions that run fast on sorted data and slow on unsorted data. But we still value them for their richer functionality than if built more simply.

cheers

John

Charles Williams

unread,
Nov 5, 2024, 6:51:35 AM11/5/24
to FastExcelV4
Different functions use different techniques and are optimised for different use cases.

FXL -
MATCHES uses unordered multimaps so does not care if the data is already sorted and does not sort it.
MISSMATCHES uses unordered maps
AMATCH, AVLOOKUP etc linear search with memory if the data is unsorted, binary search if sorted
the FILTER.IFS family use progressive subsetting of hi-lo binary searches on sorted columns and linear searches on unsorted columns or regex matching
COMPARE.LISTS uses sets

Functions that use Binary Search need to know if the data is sorted or not. Functions that create their own indexes using MAP, MULTIMAP etc do not care if the data is sorted.

XL -
XLOOKUP uses something like dynamic indexes which get rebuilt when dirtied, binary search with sorted data, linear search with regex etc.
FILTER does not do any indexing and ignores sorted data
SUMIFS does progressive subsetting but does not care about sorted columns.

Regds
Charles
Reply all
Reply to author
Forward
0 new messages