
dbf.py API question concerning Index.index_search()


Ethan Furman

Aug 15, 2012, 7:26:09 PM
to Python
Indexes have a new method (rebirth of an old one, really):

    .index_search(
        match,
        start=None,
        stop=None,
        nearest=False,
        partial=False )

The defaults are to search the entire index for exact matches and raise
NotFoundError if it can't find anything.

match is the search criteria
start and stop are the range to search in
nearest returns where the match should be instead of raising an error
partial will find partial matches
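
A minimal sketch of the semantics described above, not dbf's actual implementation. It assumes the index keys are a sorted list of tuples, uses the standard library's bisect module for the binary search, and defines its own stand-in NotFoundError. The rule used for partial (compare only the leading elements of each key) is an assumption. It returns a plain int, which leaves the return-value question open.

```python
from bisect import bisect_left


class NotFoundError(Exception):
    """Stand-in for dbf's NotFoundError (hypothetical definition)."""


def index_search(keys, match, start=None, stop=None,
                 nearest=False, partial=False):
    """Binary-search the sorted list `keys` for the tuple `match`.

    Returns the position of the first match.  If nothing matches, returns
    the insertion point when nearest is true, else raises NotFoundError.
    """
    lo = 0 if start is None else start
    hi = len(keys) if stop is None else stop
    pos = bisect_left(keys, match, lo, hi)
    if pos < hi:
        candidate = keys[pos]
        if partial:
            # assumed rule: compare only the leading len(match) elements
            candidate = candidate[:len(match)]
        if candidate == match:
            return pos
    if nearest:
        return pos
    raise NotFoundError(repr(match))


keys = [("adams", "amy"), ("baker", "bob"), ("chase", "tim")]
exact = index_search(keys, ("baker", "bob"))
nearby = index_search(keys, ("brown", "al"), nearest=True)
prefix = index_search(keys, ("chase",), partial=True)
```

Because bisect_left returns the leftmost insertion point, an exact match, when one exists, is always the position returned with nearest=True.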

The question is what should the return value be?

I don't like the usual pattern of -1 meaning not found (as in
'nothere'.find('a')), so I thought a fun and interesting way would be to
subclass long and override the __nonzero__ method to return True/False
based on whether the (partial) match was found. The main problem I see
here is that the special return value reverts to a normal int/long if
anything is done to it (adding, subtracting, etc), and the found status
is lost.

The other option is returning a (number, bool) tuple -- safer, yet more
boring... ;)

Thoughts?

~Ethan~

Tim Chase

Aug 15, 2012, 7:38:44 PM
to Ethan Furman, Python
On 08/15/12 18:26, Ethan Furman wrote:
> .index_search(
> match,
> start=None,
> stop=None,
> nearest=False,
> partial=False )
>
> The defaults are to search the entire index for exact matches and raise
> NotFoundError if it can't find anything.
>
> The question is what should the return value be?
>
> I don't like the usual pattern of -1 meaning not found (as in
> 'nothere'.find('a')), so I thought a fun and interesting way would be to
> subclass long and override the __nonzero__ method to return True/False
> based on whether the (partial) match was found. The main problem I see
> here is that the special return value reverts to a normal int/long if
> anything is done to it (adding, subtracting, etc), and the found status
> is lost.
>
> The other option is returning a (number, bool) tuple -- safer, yet more
> boring... ;)


I'm not quite sure I follow...you start off by saying that it will
"raise NotFoundError" if it can't find anything. So if it finds
something, just return it. Because if it found the item, it gives
it to you; if it didn't find the item, it raised an error. That
sounds like a good (easy to understand) interface, similar to how
string.index() works.

-tkc



Ethan Furman

Aug 15, 2012, 8:21:15 PM
to Python
Indeed, it's even less clear without the part you snipped. ;) Which
wasn't very.

The well-hidden clue was this line:

nearest returns where the match should be instead of raising an error

And my question should have been:

What should the return value be when nearest == True?

My bit of fun was this class:

    class IndexLocation(long):
        """used by Index.index_search -- represents the index where the
        match criteria is if True, or would be if False"""
        def __new__(cls, value, found):
            "value is the number, found is True/False"
            result = long.__new__(cls, value)
            result.found = found
            return result
        def __nonzero__(self):
            return self.found
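
The class above is Python 2 (long, __nonzero__). A rough Python 3 equivalent, offered as a sketch rather than as dbf's code, subclasses int and overrides __bool__. It also shows the drawback raised earlier in the thread: arithmetic hands back a plain int, so the found status is lost.

```python
class IndexLocation(int):
    """Position in an index; truthy only if the match was actually found."""

    def __new__(cls, value, found):
        result = super().__new__(cls, value)
        result.found = found
        return result

    def __bool__(self):
        return self.found


loc = IndexLocation(0, True)      # found at position 0: truthy despite being 0
miss = IndexLocation(3, False)    # not found; the match would belong at 3
shifted = miss + 1                # arithmetic returns a plain int
```

Here bool(loc) is True even though loc == 0, and shifted is an ordinary int with no found attribute.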

~Ethan~

Tim Chase

Aug 15, 2012, 8:28:31 PM
to Ethan Furman, Python
On 08/15/12 19:21, Ethan Furman wrote:
> The well-hidden clue was this line:
>
> nearest returns where the match should be instead of raising an error
>
> And my question should have been:
>
> What should the return value be when nearest == True?

Ah, well that's somewhat clearer. Return the closest and not bother
to let the user know it was inexact. Upon requesting it with
nearest=True, they *knew* that the result might be a nearest match.
Though if they ask for nearest, an exact match *better* be the
nearest if it exists. :-P

I'd say the API-user shouldn't ask for what they don't want.

-tkc





Steven D'Aprano

Aug 15, 2012, 8:52:10 PM
On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:

> Indexes have a new method (rebirth of an old one, really):
>
> .index_search(
> match,
> start=None,
> stop=None,
> nearest=False,
> partial=False )
[...]

Why "index_search" rather than just "search"?


> The question is what should the return value be?
>
> I don't like the usual pattern of -1 meaning not found (as in
> 'nothere'.find('a'))

And you are right not to. The problem with returning -1 as a "not found"
sentinel is that if it is mistakenly used where you would use a "found"
result, your code silently does the wrong thing instead of giving an
exception.

So pick a sentinel value which *cannot* be used as a successful found
result.

Since successful searches return integer offsets (yes?), one possible
sentinel might be None. (That's what re.search and re.match return
instead of a MatchObject.) But first ensure that None is *not* valid
input to any of your methods that take an integer.

For example, if str.find was changed to return None instead of -1 that
would not solve the problem, because None is a valid argument for slices:

    p = mystring.find(":")
    print(mystring[p:-1])  # Oops, no better with None
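
To make the failure mode concrete, here is a short sketch (the sample string is made up) of how the -1 from str.find, or a None in its place, slips silently into a slice, while str.index fails loudly:

```python
mystring = "no colon here"

p = mystring.find(":")         # -1, because there is no ":" in the string
silent = mystring[p:-1]        # slice [-1:-1]: an empty string, no error
with_none = mystring[None:-1]  # None is a valid bound: all but the last char

try:
    mystring.index(":")        # same search, but raises when nothing is found
except ValueError:
    raised = True
else:
    raised = False
```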

You don't have to predict every imaginable failure mode or defend against
utterly incompetent programmers, just against the obvious failure modes.

If None is not suitable as a sentinel, create a constant value that can't
be mistaken for anything else:

    class NotFoundType(object):
        def __repr__(self):
            return "Not Found"
        __str__ = __repr__

    NOTFOUND = NotFoundType()
    del NotFoundType


and then return that.
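
A quick sketch of why such a sentinel is safer than -1 or None: being a bare object, it cannot be used as a slice bound, so misuse raises TypeError right away. The find_offset helper is hypothetical, purely for illustration.

```python
class NotFoundType(object):
    def __repr__(self):
        return "Not Found"
    __str__ = __repr__


NOTFOUND = NotFoundType()
del NotFoundType


def find_offset(text, sub):
    """Hypothetical helper: like str.find, but returns NOTFOUND, not -1."""
    pos = text.find(sub)
    return NOTFOUND if pos == -1 else pos


result = find_offset("hello world", ":")

try:
    "hello world"[result:]     # a sentinel is not a valid slice bound
except TypeError:
    slice_failed = True
else:
    slice_failed = False
```

Callers should test with "result is NOTFOUND"; a plain object like this is truthy, so "if result:" would not distinguish found from not found.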


(By the way, I'm assuming that negative offsets are valid for your
application. If they aren't, then using -1 as sentinel is perfectly safe,
since passing a "not found" -1 as offset to another method will result in
an immediate exception.)


> The other option is returning a (number, bool) tuple -- safer, yet more
> boring... ;)

Boring is good, but it is also a PITA to use, and that's not good. I
never remember whether the signature is (offset, flag) or (flag, offset),
and if you get it wrong, your code will probably fail silently:

    py> flag, offset = (23, False)  # Oops, I got it wrong.
    py> if flag:
    ...     print("hello world"[offset+1:])
    ...
    ello world




--
Steven

Ethan Furman

Aug 15, 2012, 9:22:02 PM
to pytho...@python.org
Steven D'Aprano wrote:
> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>
>> Indexes have a new method (rebirth of an old one, really):
>>
>> .index_search(
>> match,
>> start=None,
>> stop=None,
>> nearest=False,
>> partial=False )
> [...]
>
> Why "index_search" rather than just "search"?

Because "search" already exists and returns a dbf.List of all matching
records.

~Ethan~

MRAB

Aug 15, 2012, 10:20:04 PM
to pytho...@python.org
+1

MRAB

Aug 15, 2012, 10:21:18 PM
to pytho...@python.org
On 16/08/2012 02:22, Ethan Furman wrote:
> Steven D'Aprano wrote:
>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>>
>>> Indexes have a new method (rebirth of an old one, really):
>>>
>>> .index_search(
>>> match,
>>> start=None,
>>> stop=None,
>>> nearest=False,
>>> partial=False )
>> [...]
>>
>> Why "index_search" rather than just "search"?
>
> Because "search" already exists and returns a dbf.List of all matching
> records.
>
Perhaps that should've been called "find_all"!

Hans Mulder

Aug 16, 2012, 6:34:35 AM
I think you should go for the safe boring option, because in many use
cases the caller will need to know whether the number you're returning
is the index of a match or just the nearest non-match. The caller could
redo the match to find out. But you have already done the match, so you
might as well tell them the result.
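
One way to keep the tuple option while sidestepping the which-comes-first problem raised earlier in the thread is a named tuple. This is only a sketch; the SearchResult name and its field names are made up, not part of dbf.

```python
from collections import namedtuple

# Hypothetical result type: position plus whether it is a real match.
SearchResult = namedtuple("SearchResult", ["position", "found"])

result = SearchResult(position=23, found=False)

# Attribute access does not depend on remembering the field order...
description = "match" if result.found else "nearest non-match"

# ...while plain tuple unpacking still works for callers who prefer it.
position, found = result
```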


Hope this helps,

-- HansM

Ethan Furman

Aug 16, 2012, 12:13:27 PM
to pytho...@python.org
MRAB wrote:
> On 16/08/2012 02:22, Ethan Furman wrote:
>> Steven D'Aprano wrote:
>>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>>>
>>>> Indexes have a new method (rebirth of an old one, really):
>>>>
>>>> .index_search(
>>>> match,
>>>> start=None,
>>>> stop=None,
>>>> nearest=False,
>>>> partial=False )
>>> [...]
>>>
>>> Why "index_search" rather than just "search"?
>>
>> Because "search" already exists and returns a dbf.List of all matching
>> records.
>>
> Perhaps that should've been called "find_all"!

An interesting thought.

Currently there are:

.index(data) --> returns index of data in Index, or raises error
.query(string) --> brute force search, returns all matching records
.search(match) --> binary search through table, returns all matching
records

'index' and 'query' are supported by Tables, Lists, and Indexes; search
(and now index_search) are only supported on Indexes.

~Ethan~

MRAB

Aug 16, 2012, 12:43:13 PM
to pytho...@python.org
On 16/08/2012 17:13, Ethan Furman wrote:
> MRAB wrote:
>> On 16/08/2012 02:22, Ethan Furman wrote:
>>> Steven D'Aprano wrote:
>>>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>>>>
>>>>> Indexes have a new method (rebirth of an old one, really):
>>>>>
>>>>> .index_search(
>>>>> match,
>>>>> start=None,
>>>>> stop=None,
>>>>> nearest=False,
>>>>> partial=False )
>>>> [...]
>>>>
>>>> Why "index_search" rather than just "search"?
>>>
>>> Because "search" already exists and returns a dbf.List of all matching
>>> records.
>>>
>> Perhaps that should've been called "find_all"!
>
> An interesting thought.
>
> Currently there are:
>
> .index(data) --> returns index of data in Index, or raises error
> .query(string) --> brute force search, returns all matching records
> .search(match) --> binary search through table, returns all matching
> records
>
> 'index' and 'query' are supported by Tables, Lists, and Indexes; search
> (and now index_search) are only supported on Indexes.
>
What exactly is the difference between .index and .index_search with
the default arguments?

Ethan Furman

Aug 16, 2012, 1:46:30 PM
to pytho...@python.org
.index requires a data structure that can be compared to a record
(another record, a dictionary with the same field/key names, or a
list/tuple with values in the same order as the fields). It returns the
index or raises NotFoundError. It is brute force.

.index_search requires match criteria (a tuple with the desired values
in the same order as the key). It returns the index or raises
NotFoundError (unless nearest is True -- then the value returned is
where the match should be). It is binary search.

So the only similarity is that they both return a number or raise
NotFoundError. What they use for the search and how they perform the
search are both completely different.

~Ethan~