
# Bug in slice type


### Bryan Olson

10 Aug 2005, 10:54:54

The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

It behaves incorrectly when step is negative and the slice
includes the 0 index.

    class BuggerAll:

        def __init__(self, somelist):
            self.sequence = somelist[:]

        def __getitem__(self, key):
            if isinstance(key, slice):
                start, stop, step = key.indices(len(self.sequence))
                # print 'Slice says start, stop, step are:', start, stop, step
                return self.sequence[start : stop : step]

    print range(10) [None : None : -2]
    print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.

--
--Bryan
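
As a quick check of the three alternatives listed above, the following sketch slices a ten-element list with each triple and with the triple that `indices()` reports. It is written for Python 3, where `range(10)` has to be wrapped in `list(...)`, whereas the thread's own code is Python 2. The variable names are illustrative only.

```python
# Python 3 sketch; the thread's own examples are Python 2.
seq = list(range(10))
expected = seq[::-2]

# The triple reported by slice.indices(), followed by the three
# alternatives proposed in the post.
reported = slice(None, None, -2).indices(len(seq))
candidates = [(9, None, -2), (9, -11, -2), (-1, -11, -2)]

# Each candidate is used directly in slice notation.
results = [seq[start:stop:step] for start, stop, step in candidates]

# The reported triple is also used in slice notation, where its -1
# stop value is read as "the last element".
from_reported = seq[reported[0]:reported[1]:reported[2]]

print(reported, from_reported, results)
```

Each candidate reproduces `seq[::-2]`, while the reported triple yields an empty list when fed back into slice notation.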

### Steven Bethard

11 Aug 2005, 15:35:34

Bryan Olson wrote:
>
>     class BuggerAll:
>
>         def __init__(self, somelist):
>             self.sequence = somelist[:]
>
>         def __getitem__(self, key):
>             if isinstance(key, slice):
>                 start, stop, step = key.indices(len(self.sequence))
>                 # print 'Slice says start, stop, step are:', start, stop, step
>                 return self.sequence[start : stop : step]
>
>     print range(10) [None : None : -2]
>     print BuggerAll(range(10))[None : None : -2]
>
> The above prints:
>
> [9, 7, 5, 3, 1]
> []
>
> Un-commenting the print statement in __getitem__ shows:
>
> Slice says start, stop, step are: 9 -1 -2
>
> The slice object seems to think that -1 is a valid exclusive
> bound, but when using it to actually slice, Python interprets
> negative numbers as an offset from the high end of the sequence.
>
> Good start-stop-step values are (9, None, -2), or (9, -11, -2),
> or (-1, -11, -2). The latter two have the advantage of being
> consistent with the documented behavior of returning three
> integers.

I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

BTW, a simpler example of the same phenomenon is:

    py> range(10)[slice(None, None, -2)]
    [9, 7, 5, 3, 1]
    py> slice(None, None, -2).indices(10)
    (9, -1, -2)
    py> range(10)[9:-1:-2]
    []

STeVe

### Bryan Olson

11 Aug 2005, 22:14:20

Steven Bethard wrote:
> I suspect there's a reason that it's done this way, but I agree with you
> that this seems strange. Have you filed a bug report on Sourceforge?

I gather that the slice class is young, so my guess is bug. I
filed the report -- my first Sourceforge bug report.

> BTW, a simpler example of the same phenomenon is:
>
> py> range(10)[slice(None, None, -2)]
> [9, 7, 5, 3, 1]
> py> slice(None, None, -2).indices(10)
> (9, -1, -2)
> py> range(10)[9:-1:-2]
> []

Ah, thanks.

--
--Bryan

### John Machin

11 Aug 2005, 22:53:19

    >>> rt = range(10)
    >>> rt[slice(None, None, -2)]
    [9, 7, 5, 3, 1]
    >>> rt[::-2]
    [9, 7, 5, 3, 1]
    >>> slice(None, None, -2).indices(10)
    (9, -1, -2)
    >>> [rt[x] for x in range(9, -1, -2)]
    [9, 7, 5, 3, 1]
    >>>

Looks good to me. indices has returned a usable (start, stop, step).
Maybe the docs need expanding.

### Bryan Olson

12 Aug 2005, 00:08:42

John Machin wrote:
> Steven Bethard wrote:
[...]

>> BTW, a simpler example of the same phenomenon is:
>>
>> py> range(10)[slice(None, None, -2)]
>> [9, 7, 5, 3, 1]
>> py> slice(None, None, -2).indices(10)
>> (9, -1, -2)
>> py> range(10)[9:-1:-2]
>> []
>>
>
> >>> rt = range(10)
> >>> rt[slice(None, None, -2)]
> [9, 7, 5, 3, 1]
> >>> rt[::-2]
> [9, 7, 5, 3, 1]
> >>> slice(None, None, -2).indices(10)
> (9, -1, -2)
> >>> [rt[x] for x in range(9, -1, -2)]
> [9, 7, 5, 3, 1]
> >>>
>
> Looks good to me. indices has returned a usable (start, stop, step).
> Maybe the docs need expanding.

But not a usable [start: stop: step], which is what 'slice' is

--
--Bryan

### Michael Hudson

12 Aug 2005, 10:51:03

Bryan Olson <fakea...@nowhere.org> writes:

> The Python slice type has one method 'indices', and reportedly:
>
> This method takes a single integer argument /length/ and
> computes information about the extended slice that the slice
> object would describe if applied to a sequence of length
> items. It returns a tuple of three integers; respectively
> these are the /start/ and /stop/ indices and the /step/ or
> stride length of the slice. Missing or out-of-bounds indices
> are handled in a manner consistent with regular slices.
>
> http://docs.python.org/ref/types.html
>
>
> It behaves incorrectly

In some sense; it certainly does what I intended it to do.

> when step is negative and the slice includes the 0 index.
>
>
>     class BuggerAll:
>
>         def __init__(self, somelist):
>             self.sequence = somelist[:]
>
>         def __getitem__(self, key):
>             if isinstance(key, slice):
>                 start, stop, step = key.indices(len(self.sequence))
>                 # print 'Slice says start, stop, step are:', start, stop, step
>                 return self.sequence[start : stop : step]

But if that's what you want to do with the slice object, just write

start, stop, step = key.start, key.stop, key.step

return self.sequence[start : stop : step]

or even

return self.sequence[key]

What the values returned from indices are for is to pass to the
range() function, more or less. They're not intended to be
interpreted in the way things passed to __getitem__ are.

(Well, _actually_ the main motivation for writing .indices() was to
use it in unittests...)

> print range(10) [None : None : -2]
> print BuggerAll(range(10))[None : None : -2]
>
>
> The above prints:
>
> [9, 7, 5, 3, 1]
> []
>
> Un-commenting the print statement in __getitem__ shows:
>
> Slice says start, stop, step are: 9 -1 -2
>
> The slice object seems to think that -1 is a valid exclusive
> bound,

It is, when you're doing arithmetic, which is what the client code of
PySlice_GetIndicesEx() does; indices() is in turn a thin wrapper
around that function.

> but when using it to actually slice, Python interprets negative
> numbers as an offset from the high end of the sequence.
>
> Good start-stop-step values are (9, None, -2), or (9, -11, -2),
> or (-1, -11, -2). The latter two have the advantage of being
> consistent with the documented behavior of returning three
> integers.

I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

Cheers,
mwh

--
(ps: don't feed the lawyers: they just lose their fear of humans)
-- Peter Wood, comp.lang.lisp
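
The point made in this reply, that the values returned by `indices()` are meant to be passed to `range()` rather than back into slice notation, can be sketched as follows. This is a Python 3 variant of the thread's `BuggerAll` class under that assumption; the class name `SliceViaRange` is hypothetical and not from the thread.

```python
class SliceViaRange:
    """Variant of the thread's BuggerAll class (the name is hypothetical)."""

    def __init__(self, somelist):
        self.sequence = list(somelist)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # indices() yields arguments for range(), not for slice notation.
            positions = range(*key.indices(len(self.sequence)))
            return [self.sequence[i] for i in positions]
        # Plain integer subscripts are delegated unchanged.
        return self.sequence[key]


wrapped = SliceViaRange(range(10))
print(wrapped[::-2])
print(wrapped[3:-4])
```

Used this way, the wrapper gives the same results as slicing the underlying list directly, including for a negative step that reaches index 0.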

### bryanjuggler...@yahoo.com

15 Aug 2005, 22:54:42

Michael Hudson wrote:

> Bryan Olson writes:
> In some sense; it certainly does what I intended it to do.

[...]

> I'm not going to change the behaviour. The docs probably aren't
> especially clear, though.

The docs and the behavior contradict:

[...] these are the /start/ and /stop/ indices and the
/step/ or stride length of the slice [emphasis added].

I'm fine with your favored behavior. What do we do next to get
the doc fixed?

--
--Bryan

### Michael Hudson

18 Aug 2005, 03:22:51

bryanjuggler...@yahoo.com writes:

I guess one of us comes up with some less misleading words. It's not
totally obvious to me what to do, seeing as the returned values *are*
indices in a sense, just not the sense in which they are used in
Python. Any ideas?

Cheers,
mwh

--
First of all, email me your AOL password as a security measure. You
may find that won't be able to connect to the 'net for a while. This
is normal. The next thing to do is turn your computer upside down
and shake it to reboot it. -- Darren Tucker, asr

### Steven Bethard

18 Aug 2005, 10:34:45

Michael Hudson wrote:

> bryanjuggler...@yahoo.com writes:
>> I'm fine with your favored behavior. What do we do next to get
>> the doc fixed?
>
> I guess one of us comes up with some less misleading words. It's not
> totally obvious to me what to do, seeing as the returned values *are*
> indices in a sense, just not the sense in which they are used in
> Python. Any ideas?

Maybe you could replace:

"these are the start and stop indices and the step or stride length of
the slice"

with

"these are start, stop and step values suitable for passing to range or
xrange"

I wanted to say something about what happens with a negative stride, to
indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
wasn't able to navigate the Python documentation well enough.

Looking at the Language Reference section on the slice type[1] (section
3.2), I find that "Missing or out-of-bounds indices are handled in a
manner consistent with regular slices." So I looked for the
documentation of "regular slices". My best guess was that this meant
looking at the Language Reference on slicings[2]. But all I could find
in this documentation about the "stride" argument was:

"The conversion of a proper slice is a slice object (see section 3.2)
whose start, stop and step attributes are the values of the expressions
given as lower bound, upper bound and stride, respectively, substituting
None for missing expressions."

This feels circular to me. Can someone help me find where the semantics
of a negative stride index is defined?

Steve

### Steven Bethard

18 Aug 2005, 11:17:20

I wrote:
> I wanted to say something about what happens with a negative stride, to
> indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
> wasn't able to navigate the Python documentation well enough.
>
> Looking at the Language Reference section on the slice type[1] (section
> 3.2), I find that "Missing or out-of-bounds indices are handled in a
> manner consistent with regular slices." So I looked for the
> documentation of "regular slices". My best guess was that this meant
> looking at the Language Reference on slicings[2]. But all I could find
> in this documentation about the "stride" argument was:
>
> "The conversion of a proper slice is a slice object (see section 3.2)
> whose start, stop and step attributes are the values of the expressions
> given as lower bound, upper bound and stride, respectively, substituting
> None for missing expressions."
>
> This feels circular to me. Can someone help me find where the semantics
> of a negative stride index is defined?

Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.

    range(10)[9:-1:-2] == []

But the values of n that satisfy

    0 <= n < (-1 - 9)/-2 = -10/-2 = 5

are 0, 1, 2, 3, 4, corresponding to the x values of 9, 7, 5, 3, 1. But

    [range(10)[x] for x in [9, 7, 5, 3, 1]] == [9, 7, 5, 3, 1]

Does this mean that there's a bug in the list object?

STeVe
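
One way to reconcile the quoted formula with the empty result is to assume that a negative `i` or `j` is first replaced by `len(s) + i` or `len(s) + j`, and only then inserted into the formula. The Python 3 sketch below applies the formula both ways. The function names are hypothetical, the division is treated as exact, and out-of-range clamping is deliberately left out.

```python
from fractions import Fraction


def formula_indices(i, j, k):
    """Indexes x = i + n*k for all integers n with 0 <= n < (j - i)/k."""
    bound = Fraction(j - i, k)  # exact division, no rounding
    found = []
    n = 0
    while n < bound:
        found.append(i + n * k)
        n += 1
    return found


def normalize(index, length):
    """Assumed first step: a negative index counts from the end."""
    return index + length if index < 0 else index


s = list(range(10))

# Reading j = -1 literally, as in the calculation above.
literal = formula_indices(9, -1, -2)

# Replacing j = -1 by len(s) - 1 = 9 before applying the formula.
normalized = formula_indices(9, normalize(-1, len(s)), -2)

print(literal, normalized, s[9:-1:-2])
```

Under that assumption the formula and the list behavior agree: with `j` normalized to 9 there is no `n` satisfying `0 <= n < 0`, so the slice is empty.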

### Bryan Olson

20 Aug 2005, 15:22:12

Steven Bethard wrote:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.

PPEP (Proposed Python Enhancement Proposal): New-Style Indexing

Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and None work the same as in old-style slicing.

Within the square-brackets, the '\$' symbol stands for the length
of the sequence. One can index from the high end by subtracting
the index from '\$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; \$ - 4]

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '\$' describes.
(Perhaps '\$\$' should be the length of the next containing
bracket pair, and '\$\$\$' the next-out and...?)

So far, I don't think the proposal breaks anything; let's keep
it that way. The next bit is tricky...

Obviously '\$' should also work in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[\$ - 2]

So really seq[-2] should be out-of-bounds. Alas, that would
break way too much code. For now, simple indexing with a
negative subscript (and no '\$') should continue to index from
the high end, as a deprecated feature. The presence of '\$'
always indicates new-style slicing, so a programmer who needs a
negative index to trigger a range error can write:

seq[(\$ - \$) + index]

An Alternative Variant:

Suppose instead of using semicolons as the PPEP proposes, we use
commas, as in:

sequence[start, stop, step]

Commas are already in use to form tuples, and we let them do
just that. A slice is a subscript that is a tuple (or perhaps we
should allow any sequence). We could just as well write:

index_tuple = (start, stop, step)
sequence[index_tuple]

This variant *reduces* the number and complexity of rules that
define Python semantics. There is no special interpretation of
the comma, and no need for a distinct slice type.

The '\$' character works as in the PPEP above. It is undefined
outside square brackets, but that makes no real difference; the
programmer can use len(sequence).

This variant might break some tricky code.

--
--Bryan
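
The semicolon-and-dollar syntax proposed above cannot be written in current Python, but its semantics can be approximated with ordinary objects. The Python 3 sketch below is only an illustration of the idea, not an implementation of the proposal: `FromEnd`, `END` and `StrictSeq` are hypothetical names, `END` plays the role of the dollar sign, and expressions such as `END - END` are not supported.

```python
class FromEnd:
    """Hypothetical stand-in for the proposal's length symbol."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, amount):
        return FromEnd(self.offset - amount)

    def __add__(self, amount):
        return FromEnd(self.offset + amount)


END = FromEnd()


class StrictSeq:
    """List wrapper in which plain negative subscripts never wrap around."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def _resolve(self, value):
        # Turn END-relative values into ordinary integers.
        if isinstance(value, FromEnd):
            return len(self._items) + value.offset
        return value

    def __getitem__(self, key):
        n = len(self._items)
        if isinstance(key, slice):
            step = 1 if key.step is None else key.step
            if step == 0:
                raise ValueError("slice step cannot be zero")
            start, stop = self._resolve(key.start), self._resolve(key.stop)
            if step > 0:
                start = 0 if start is None else max(start, 0)
                stop = n if stop is None else min(stop, n)
            else:
                start = n - 1 if start is None else min(start, n - 1)
                stop = -1 if stop is None else max(stop, -1)
            # Bounds are taken literally: -1 means "just below index 0".
            return [self._items[i] for i in range(start, stop, step)]
        index = self._resolve(key)
        if not 0 <= index < n:
            raise IndexError("index out of range")
        return self._items[index]


seq = StrictSeq(range(10))
print(seq[3:END - 4], seq[END - 2], seq[9:-1:-2])
```

With this wrapper `seq[3:END - 4]` selects the same elements as the ordinary `seq[3:-4]`, a literal `-1` stop works as an exclusive lower bound, and `seq[-2]` raises `IndexError`.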

### Steven Bethard

20 Aug 2005, 17:33:22

Bryan Olson wrote:
> Steven Bethard wrote:
> > Well, I couldn't find where the general semantics of a negative stride
> > index are defined, but for sequences at least[1]:
> >
> > "The slice of s from i to j with step k is defined as the sequence of
> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
> >
> > This seems to contradict list behavior though. [...]
>
> The conclusion is inescapable: Python's handling of negative
> subscripts is a wart.

I'm not sure I'd go that far. Note that my confusion above was the
order of combination of points (3) and (5) on the page quoted above[1].
I think the problem is not the subscript handling so much as the
documentation thereof. [...] documentation patch based on that message [3].

> Suppose instead of using semicolons as the PPEP proposes, we use
> commas, as in:
>
> sequence[start, stop, step]

This definitely won't work. This is already valid syntax, and is used
heavily by the numarray/numeric folks.

STeVe
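
The objection above rests on how Python already treats commas inside square brackets. A small Python 3 probe makes this visible by returning whatever object `__getitem__` receives; the class name `ShowKey` is hypothetical.

```python
class ShowKey:
    """Returns whatever object Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

comma_key = probe[1, 2, 3]    # comma-separated subscripts arrive as one tuple
colon_key = probe[1:2:3]      # colon-separated subscripts arrive as a slice
mixed_key = probe[1:2, ::3]   # both forms can be mixed in one subscript

print(comma_key, colon_key, mixed_key)
```

Because `obj[start, stop, step]` already delivers a three-element tuple, giving it slice semantics would change the meaning of existing multi-dimensional indexing code.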

### Kay Schluehr

21 Aug 2005, 04:20:53

Steven Bethard wrote:

> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though.
> range(10)[9:-1:-2] == []

No, both are correct. But we don't have to interpret the second slice
argument m as the limit j of the above definition. For positive values
of m the identity m == j holds. For negative values of m we have
j = max(0, i+m). This is consistent with the convenient negative indexing:

>>> range(9)[-1] == range(9)[8]

If we remember how -1 is interpreted as an index not as some limit the
behaviour makes perfect sense.

Kay

### Kay Schluehr

21 Aug 2005, 04:29:50

Bryan Olson wrote:
> Steven Bethard wrote:
> > Well, I couldn't find where the general semantics of a negative stride
> > index are defined, but for sequences at least[1]:
> >
> > "The slice of s from i to j with step k is defined as the sequence of
> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
> >
> > This seems to contradict list behavior though. [...]
>
> The conclusion is inescapable: Python's handling of negative
> subscripts is a wart. Indexing from the high end is too useful
> to give up, but it should be specified by the slicing/indexing
> operation, not by the value of the index expression.

It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
is very useful IMO. If you want to slice to the bottom, take 0 as
bottom value. The docs have to be extended in this respect.

Kay

### Paul Rubin

21 Aug 2005, 05:41:54

Bryan Olson <fakea...@nowhere.org> writes:
> seq[3 : -4]
>
> we write:
>
> seq[3 ; \$ - 4]

+1

> When square-brackets appear within other square-brackets, the
> inner-most bracket-pair determines which sequence '\$' describes.
> (Perhaps '\$\$' should be the length of the next containing
> bracket pair, and '\$\$\$' the next-out and...?)

Not sure. \$1, \$2, etc. might be better, or \$<tag> like in regexps, etc.

> So really seq[-2] should be out-of-bounds. Alas, that would
> break way too much code. For now, simple indexing with a
> negative subscript (and no '\$') should continue to index from
> the high end, as a deprecated feature. The presence of '\$'
> always indicates new-style slicing, so a programmer who needs a
> negative index to trigger a range error can write:
>
> seq[(\$ - \$) + index]

+1

> Commas are already in use to form tuples, and we let them do
> just that. A slice is a subscript that is a tuple (or perhaps we
> should allow any sequence). We could just as well write:
>
> index_tuple = (start, stop, step)
> sequence[index_tuple]

Hmm, tuples are hashable and are already valid indices to mapping
objects like dictionaries. Having slices means an object can
implement both the mapping and sequence interfaces. Whether that's
worth caring about, I don't know.

### Bryan Olson

24 Aug 2005, 09:28:16

Paul Rubin wrote:

> Bryan Olson writes:
>
>> seq[3 : -4]
>>
>>we write:
>>
>> seq[3 ; \$ - 4]
>
>
> +1

I think you're wrong about the "+1". I defined '\$' to stand for
the length of the sequence (not the address of the last
element).

>>When square-brackets appear within other square-brackets, the
>>inner-most bracket-pair determines which sequence '\$' describes.
>>(Perhaps '\$\$' should be the length of the next containing
>>bracket pair, and '\$\$\$' the next-out and...?)
>
> Not sure. \$1, \$2, etc. might be better, or \$<tag> like in regexps, etc.

Sounds reasonable.

[...]

> Hmm, tuples are hashable and are already valid indices to mapping
> objects like dictionaries. Having slices means an object can
> implement both the mapping and sequence interfaces. Whether that's
> worth caring about, I don't know.

Yeah, I thought that alternative might break people's code, and
it turns out it does.

--
--Bryan

### Bryan Olson

24 Aug 2005, 09:42:11

Kay Schluehr wrote:
> Bryan Olson wrote:
>
>>Steven Bethard wrote:
>> > Well, I couldn't find where the general semantics of a negative stride
>> > index are defined, but for sequences at least[1]:
>> >
>> > "The slice of s from i to j with step k is defined as the sequence of
>> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
>> >
>> > This seems to contradict list behavior though. [...]
>>
>>The conclusion is inescapable: Python's handling of negative
>>subscripts is a wart. Indexing from the high end is too useful
>>to give up, but it should be specified by the slicing/indexing
>>operation, not by the value of the index expression.
>
>
> It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
> is very useful IMO.

No question index-from-the-far-end is useful, but I think
special-casing some otherwise-out-of-bounds indexes is a
mistake.

Are there any cases in popular Python code where my proposal
would not allow as elegant a solution?

> If you want to slice to the bottom, take 0 as
> bottom value. The docs have to be extended in this respect.

I'm not sure what you mean. Slicing with a negative step and a
stop value of zero will not reach the bottom (unless the
sequence is empty). In general, Python uses inclusive beginning
bounds and exclusive ending bounds. (The rule is frequently
stated incorrectly as "inclusive lower bounds and exclusive
upper bounds," which fails to consider negative increments.)

--
--Bryan
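
The remark about inclusive beginning bounds and exclusive ending bounds can be checked with a short Python 3 example. The variable names are illustrative only.

```python
s = list(range(10))

down_to_one = s[9:0:-1]     # stop=0 is exclusive, so element 0 is left out
down_to_zero = s[9::-1]     # omitting stop lets the slice reach element 0
via_none = s[9:None:-1]     # an explicit None behaves like an omitted stop

print(down_to_one)
print(down_to_zero)
```

With a negative step, the only stop values that include element 0 are an omitted bound, `None`, or a negative number at or below `-len(s) - 1`.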

### Bryan Olson

24 Aug 2005, 11:03:02

Kay Schluehr wrote:
> Steven Bethard wrote:
>>"The slice of s from i to j with step k is defined as the sequence of
>>items with index x = i + n*k such that 0 <= n < (j-i)/k."
>>
>>This seems to contradict list behavior though.
>> range(10)[9:-1:-2] == []
>
>
> No, both is correct. But we don't have to interpret the second slice
> argument m as the limit j of the above definition.

Even if "we don't have to," it sure reads like we should.

> For positive values
> of m the identity
> m==j holds. For negative values of m we have j = max(0,i+m).

First, the definition from the doc is still ambiguous: Is the
division in

0 <= n < (j-i)/k

real division, or is it Python integer (floor) division? It
matters.

Second, the rule Kay Schluehr states is wrong for either type
of division. Look at:

range(5)[4 : -6 : -2]

Since Python is so programmer-friendly, I wrote some code to
make the "look at" task easy:

    # Python 2 code: print is a statement and '/' on two ints is
    # integer division.

    slice_definition = """
    The slice of s from i to j with step k is defined as the sequence of
    items with index x = i + n*k such that 0 <= n < (j-i)/k.
    """

    Kay_Schluehr_rule = """
    For positive values of m the identity m==j holds. For negative values
    of m we have j = max(0,i+m).
    """

    def m_to_j(i, m):
        """ Compute slice_definition's 'j' according to Kay_Schluehr_rule
            when the slice of sequence is specified as,
                sequence[i : m : k].
        """
        if m > 0:
            j = m
        else:
            j = max(0, i + m)
        return j

    def extract_slice(sequence, i, m, k, div_type='i'):
        """ Apply the slice definition with Kay Schluehr's rule to find
            what the slice should be. Pass div_type of 'i' to use integer
            division, or 'f' for float (~real) division, in the
            slice_definition expression,
                (j-i)/k.
        """
        j = m_to_j(i, m)
        result = []
        n = 0
        if div_type == 'i':
            end_bound = (j - i) / k
        else:
            assert div_type == 'f', "div_type must be 'i' or 'f'."
            end_bound = float(j - i) / k
        while n < end_bound:
            result.append(sequence[i + n * k])
            n += 1
        return result

    def show(sequence, i, m, k):
        """ Print what happens, both actually and according to stated rules.
        """
        print "Checking: %s[%d : %d : %d]" % (sequence, i, m, k)
        print "actual                   :", sequence[i : m : k]
        print "Kay's rule, int division :", extract_slice(sequence, i, m, k)
        print "Kay's rule, real division:", extract_slice(sequence, i, m, k, 'f')
        print

    show(range(5), 4, -6, -2)

--
--Bryan
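
For comparison with the rules tested above, here is a pure-Python sketch of the clamping that the built-in `slice.indices()` appears to perform, written for Python 3. The function name `py_indices` is hypothetical, and the logic is a reconstruction checked against the built-in rather than a copy of the C code behind `PySlice_GetIndicesEx()`.

```python
def py_indices(start, stop, step, length):
    """Pure-Python sketch of the clamping done by slice.indices(length)."""
    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")
    # Smallest and largest values a bound may take for this direction.
    lower, upper = (0, length) if step > 0 else (-1, length - 1)

    def clamp(value, default):
        if value is None:
            return default
        if value < 0:
            # Negative bounds count from the end, then are clamped below.
            return max(value + length, lower)
        return min(value, upper)

    start = clamp(start, lower if step > 0 else upper)
    stop = clamp(stop, upper if step > 0 else lower)
    return start, stop, step


# The case examined in the post: range(5)[4 : -6 : -2].
triple = py_indices(4, -6, -2, 5)
picked = [range(5)[x] for x in range(*triple)]
print(triple, picked)
```

For the example in the post, the stop value `-6` becomes `-1` after adding the length, so the resulting `range()` arguments cover indexes 4, 2 and 0.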

### Bryan Olson

24 Aug 2005, 11:21:22

Steven Bethard wrote:
> Bryan Olson wrote:
>
>> Steven Bethard wrote:
>> > Well, I couldn't find where the general semantics of a negative stride
>> > index are defined, but for sequences at least[1]:
>> >
>> > "The slice of s from i to j with step k is defined as the sequence of
>> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
>> >
>> > This seems to contradict list behavior though. [...]
>>
>> The conclusion is inescapable: Python's handling of negative
>> subscripts is a wart.
>
>
> I'm not sure I'd go that far. Note that my confusion above was the
> order of combination of points (3) and (5) on the page quoted above[1].
> I think the problem is not the subscript handling so much as the
> documentation thereof.

Any bug can be pseudo-fixed by changing the documentation to
conform to the behavior. Here, the doc clearly went wrong by
expecting Python's behavior to follow from a few consistent
rules. The special-case handling of negative indexes looks
handy, but raises more difficulties than people realized.

I believe my PPEP avoids the proliferation of special cases. The
one additional issue I've discovered is that user-defined types
that are to support __getitem__ and/or __setitem__ *must* also
implement __len__. Sensible sequence types already do, so I
don't think it's much of an issue.

> This is already valid syntax, and is used
> heavily by the numarray/numeric folks.

Yeah, I thought that variant might break some code. I didn't
know it would be that much. Forget that variant.

--
--Bryan

### Robert Kern

24 Aug 2005, 15:08:52, to pytho...@python.org

Bryan Olson wrote:
> Paul Rubin wrote:
> > Bryan Olson writes:
> >
> >> seq[3 : -4]
> >>
> >>we write:
> >>
> >> seq[3 ; \$ - 4]
> >
> > +1
>
> I think you're wrong about the "+1". I defined '\$' to stand for
> the length of the sequence (not the address of the last
> element).

By "+1" he means, "I like it." He's not correcting you.

--
Robert Kern
rk...@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

### Bryan Olson

24 Aug 2005, 22:50:22

Robert Kern wrote:

> By "+1" he means, "I like it." He's not correcting you.

Ah, O.K. Thanks.

--
--Bryan

### Bryan Olson

24 Aug 2005, 23:33:10

The doc for the find() method of string objects, which is
essentially the same as the string.find() function, states:

    find(sub[, start[, end]])
        Return the lowest index in the string where substring sub
        is found, such that sub is contained in the range [start,
        end). Optional arguments start and end are interpreted as
        in slice notation. Return -1 if sub is not found.

Consider:

print 'Hello'.find('o')

or:

import string
print string.find('Hello', 'o')

The substring 'o' is found in 'Hello' at the index -1, and at
the index 4, and it is not found at any other index. Both the
locations found are in the range [start, end), and obviously -1
is less than 4, so according to the documentation, find() should
return -1.

What either of the above actually prints is:

4

which shows yet another bug resulting from Python's handling of
negative indexes. This one is clearly a documentation error, but
the real fix is to cure the wart so that Python's behavior is
consistent enough that we'll be able to describe it correctly.

--
--Bryan
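
The behavior under discussion can be reproduced with a few lines of Python 3. The variable names are illustrative only.

```python
text = "Hello"

hit = text.find("o")    # position of a substring that is present
miss = text.find("z")   # find() signals "not found" with -1

# -1 is also a legal subscript, so the result of a missed search can be
# used as an index without any error being raised.
char_at_miss = text[miss]

# index() raises ValueError instead of returning a sentinel.
try:
    text.index("z")
    index_raised = False
except ValueError:
    index_raised = True

print(hit, miss, char_at_miss, index_raised)
```

The successful search reports the non-negative position, while the unsuccessful one returns a value that still selects a character when used as a subscript.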

### Steve Holden

25 Aug 2005, 00:05:18, to pytho...@python.org

Do you just go round looking for trouble?

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug? And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

### Casey Hawthorne

25 Aug 2005, 00:57:35

> contained in the range [start, end)

Does range(start, end) generate negative integers in Python if
start >= 0 and end >= start?
--
Regards,
Casey

### en.kar...@ospaz.ru

25 Aug 2005, 02:45:59, to pytho...@python.org

On Thu, 25 Aug 2005 00:05:18 -0400
Steve Holden wrote:

> What on earth makes you call this a bug? And what are you proposing that
> find() should return if the substring isn't found at all? please don't
> suggest it should raise an exception, as index() exists to provide that
> functionality.

Returning -1 looks like a C-ism to me. It would be better to return None
when none is found.

    index = "Hello".find("z")
    if index is not None:
        pass  # ...

Now it's too late for it, I know.

--
jk

### Paul Rubin

25 Aug 2005, 14:22:16

Steve Holden <st...@holdenweb.com> writes:
> As far as position reporting goes, it seems pretty clear that find()
> will always report positive index values. In a five-character string
> then -1 and 4 are effectively equivalent.
>
> What on earth makes you call this a bug? And what are you proposing
> that find() should return if the substring isn't found at all? please
> don't suggest it should raise an exception, as index() exists to
> provide that functionality.

Bryan is making the case that Python's use of negative subscripts to
measure from the end of sequences is bogus, and that it should be done
some other way instead. I've certainly had bugs in my own programs
related to that "feature".

### Bryan Olson

25 Aug 2005, 18:30:23

> Do you just go round looking for trouble?

In the course of programming, yes, absolutely.

> As far as position reporting goes, it seems pretty clear that find()
> will always report positive index values. In a five-character string
> then -1 and 4 are effectively equivalent.
>
> What on earth makes you call this a bug?

What you just said, versus what the doc says.

> And what are you proposing that
> find() should return if the substring isn't found at all? please don't
> suggest it should raise an exception, as index() exists to provide that
> functionality.

There are a number of good options. A legal index is not one of
them.

--
--Bryan

### Antoon Pardon

26 Aug 2005, 04:22:33

On 2005-08-25, Bryan Olson <fakea...@nowhere.org> wrote:

IMO, with find a number of "features" of Python come together
to create an awkward situation.

1) 0 is a false value, but indexes start at 0 so you can't
return 0 to indicate nothing was found.

2) -1 is returned, which is both a true value and a legal
index.

It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.

--
Antoon Pardon

### Bryan Olson

26 Aug 2005, 05:37:31

Antoon Pardon wrote:
> Bryan Olson wrote:
>

>>>And what are you proposing that
>>>find() should return if the substring isn't found at all? please don't
>>>suggest it should raise an exception, as index() exists to provide that
>>>functionality.
>>
>>There are a number of good options. A legal index is not one of
>>them.
>
> IMO, with find a number of "features" of Python come together
> to create an awkward situation.
>
> 1) 0 is a false value, but indexes start at 0 so you can't
> return 0 to indicate nothing was found.
>
> 2) -1 is returned, which is both a true value and a legal
> index.
>
> It probably is too late now, but I always felt, find should
> have returned None when the substring isn't found.

None is certainly a reasonable candidate. The one-past-the-end
value, len(sequence), would be fine, and follows the preferred
idiom of C/C++. I don't see any elegant way to arrange for
successful finds always to return a true value and unsuccessful
calls to return a false value.

The really broken part is that unsuccessful searches return a
legal index.

My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.

--
--Bryan
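
The alternative raised earlier in the thread, returning None for an unsuccessful search, can be sketched as a thin wrapper around `str.find()`. This is Python 3, the helper name `find_or_none` is hypothetical, and it only illustrates why a sentinel that is not a legal index changes the failure mode.

```python
def find_or_none(text, sub):
    """Hypothetical wrapper: like str.find(), but returns None on a miss."""
    position = text.find(sub)
    return None if position == -1 else position


text = "Hello"
missing = find_or_none(text, "z")

# Using the sentinel as a subscript now fails loudly instead of
# silently selecting the last character.
try:
    text[missing]
    failed = False
except TypeError:
    failed = True

print(find_or_none(text, "o"), missing, failed)
```

Code that forgets to test for the not-found case gets a `TypeError` with this wrapper, whereas the built-in's `-1` would quietly index the final character.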

### Rick Wotnaz

26 Aug 2005, 07:20:33

Bryan Olson <fakea...@nowhere.org> wrote in
news:3ErPe.853\$sV7...@newssvr21.news.prodigy.com:

Practically speaking, what difference would it make? Supposing find
returned None for not-found. How would you use it in your code that
would make it superior to what happens now? In either case you
would have to test for the not-found state before relying on the
index returned, wouldn't you? Or do you have a use that would
eliminate that step?

--
rzed

### Steve Holden

26.08.2005, 12:32:17
to pytho...@python.org
We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

> My suggestion doesn't change what find() returns, and doesn't
> break code. Negative one is a reasonable choice to represent an
> unsuccessful search -- provided it is not a legal index. Instead
> of changing what find() returns, we should heal the
> special-case-when-index-is-negative-in-a-certain-range wart.
>
>

What I don't understand is why you want it to return something that
isn't a legal index. Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

### Bryan Olson

26.08.2005, 14:46:27
Steve Holden wrote:
> Bryan Olson wrote:
>> Antoon Pardon wrote:

>> > It probably is too late now, but I always felt, find should
>> > have returned None when the substring isn't found.
>>
>> None is certainly a reasonable candidate.

[...]

>> The really broken part is that unsuccessful searches return a
>> legal index.
>>
> We might agree, before further discussion, that this isn't the most
> elegant part of Python's design, and it's down to history that this tiny
> little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

>> My suggestion doesn't change what find() returns, and doesn't
>> break code. Negative one is a reasonable choice to represent an
>> unsuccessful search -- provided it is not a legal index. Instead
>> of changing what find() returns, we should heal the
>> special-case-when-index-is-negative-in-a-certain-range wart.
>>
>>
> What I don't understand is why you want it to return something that
> isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
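
The same situation in current Python 3 syntax, as a small sketch contrasting find() with index() on a failed search (the variable names are only for illustration):

```python
s = "buggy"

# find() signals failure with -1, which is also a legal index (the last item).
pos = s.find("w")
silently_wrong = s[pos]   # no error; picks the last character of s

# index() raises instead, so the mistake surfaces at the search itself.
try:
    s.index("w")
    raised = False
except ValueError:
    raised = True
```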

> Before using the result you always have to perform
> a test to discriminate between the found and not found cases. So I don't
> really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.
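
A minimal sketch of that behaviour in current Python 3 syntax, where slice.indices() still reports -1 as the stop value for this case. The variable names are illustrative only:

```python
seq = list(range(10))
key = slice(None, None, -2)

# indices() reports -1 as the exclusive stop for a full reverse slice.
start, stop, step = key.indices(len(seq))

direct = seq[key]                    # slicing with the slice object itself
via_indices = seq[start:stop:step]   # here -1 is reinterpreted as "last item"

# range() takes -1 literally, so the reported triple works there.
via_range = [seq[i] for i in range(start, stop, step)]
```

Feeding the reported triple to range() gives the expected items, while feeding it back into slice notation gives an empty list.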

--
--Bryan

### Reinhold Birkenfeld

26.08.2005, 14:57:11
Bryan Olson wrote:
> Steve Holden wrote:
> > Bryan Olson wrote:
> >> Antoon Pardon wrote:
>
> >> > It probably is too late now, but I always felt, find should
> >> > have returned None when the substring isn't found.
> >>
> >> None is certainly a reasonable candidate.
> [...]
> >> The really broken part is that unsuccessful searches return a
> >> legal index.
> >>
> > We might agree, before further discussion, that this isn't the most
> > elegant part of Python's design, and it's down to history that this tiny
> > little wart remains.
>
> I don't think my proposal breaks historic Python code, and I
> don't think it has the same kind of unfortunate subtle
> consequences as the current indexing scheme. You may think the
> wart is tiny, but the duct-tape* is available so let's cure it.
>

Well, nobody is stopping you from posting this on python-dev and being
screamed at by Guido...

just-kidding-ly
Reinhold

### Terry Reedy

26.08.2005, 15:28:29
to pytho...@python.org

"Bryan Olson" <fakea...@nowhere.org> wrote in message
news:7sJPe.573$MN5...@newssvr25.news.prodigy.net...

> The double-meaning of -1, as both an exclusive stopping bound
> and an alias for the highest valid index, is just plain whacked.

I agree in this sense: the use of any int as an error return is an
unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
redundant with the Pythonic exception-raising str.index and I think it
should be removed in Py3.

Therefore, I think changing it now is untimely and changing the language
because of it backwards.

Terry J. Reedy

### Paul Rubin

26.08.2005, 15:35:51
"Terry Reedy" <tjr...@udel.edu> writes:
> I agree in this sense: the use of any int as an error return is an
> unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
> redundant with the Pythonic exception-raising str.index and I think it
> should be removed in Py3.

I like having it available so you don't have to clutter your code with
try/except if the substring isn't there. But it should not return a
valid integer index.

### Terry Reedy

26.08.2005, 17:02:43
to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7xmzn41...@ruckus.brouhaha.com...

> "Terry Reedy" <tjr...@udel.edu> writes:
>>Str.find is
>> redundant with the Pythonic exception-raising str.index
>> and I think it should be removed in Py3.
>
> I like having it available so you don't have to clutter your code with
> try/except if the substring isn't there. But it should not return a
> valid integer index.

The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that raises
an exception on invalid input. Should more or even all be duplicated? Why
just this one?

Terry J. Reedy

### Torsten Bronger

26.08.2005, 17:22:13
Hi there!

"Terry Reedy" <tjr...@udel.edu> writes:

Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be preferred
in my opinion.
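
To make the comparison concrete, here is a sketch of the same task written both ways in current Python 3 syntax. The helper names `after_marker_if` and `after_marker_try` are invented for this example:

```python
def after_marker_if(text, marker):
    """Hypothetical helper: text following marker, or None, via find() and if."""
    pos = text.find(marker)
    if pos == -1:
        return None
    return text[pos + len(marker):]


def after_marker_try(text, marker):
    """Same result, via index() and try/except."""
    try:
        pos = text.index(marker)
    except ValueError:
        return None
    return text[pos + len(marker):]
```

Both versions need an explicit branch for the not-found case; they differ only in whether that branch is an "if" or an "except".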

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus ICQ 264-296-646

### Paul Rubin

26.08.2005, 17:31:50
"Terry Reedy" <tjr...@udel.edu> writes:
> The try/except pattern is a pretty basic part of Python's design. One
> could say the same about clutter for *every* function or method that raises
> an exception on invalid input. Should more or even all be duplicated? Why
> just this one?

Someone must have thought str.find was worth having, or else it
wouldn't be in the library.

### Raymond Hettinger

26.08.2005, 18:39:10
Bryan Olson wrote:
> The conclusion is inescapable: Python's handling of negative
> subscripts is a wart. Indexing from the high end is too useful
> to give up, but it should be specified by the slicing/indexing
> operation, not by the value of the index expression.
>
>
> PPEP (Proposed Python Enhancement Proposal): New-Style Indexing
>
>
> sequence[start : stop : step]
>
> new-style slicing uses the syntax:
>
> sequence[start ; stop ; step]

<klingon>
Bah!
</klingon>

The pythonic way to handle negative slicing is to use reversed(). The
principle is that the mind more easily handles this in two steps:
specifying the range in a forward direction, and then reversing it.

IOW, it is easier to identify the included elements and see the
direction of:

reversed(xrange(1, 20, 2))

than it is for:

xrange(19, -1, -2)

See PEP 322 for discussion and examples:
http://www.python.org/peps/pep-0322.html
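
A quick check of the equivalence in current Python 3 syntax, where range() plays the role of xrange(). The last two lines apply the same two-step idea to the list slice from the start of the thread; the variable names are illustrative only:

```python
# Forward range, then reversed: odd numbers from 19 down to 1.
forward_then_reversed = list(reversed(range(1, 20, 2)))

# The same sequence written directly with a negative step.
negative_step = list(range(19, -1, -2))

# Same idea for a list: take the forward slice, then reverse it.
seq = list(range(10))
two_step = list(reversed(seq[1::2]))
one_step = seq[::-2]
```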

Raymond

### Terry Reedy

26.08.2005, 21:20:02
to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7xslwwj...@ruckus.brouhaha.com...

Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

Terry J. Reedy

### Steve Holden

26.08.2005, 22:16:23
to pytho...@python.org
Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Which is what I've been trying to say all along.

### Steve Holden

26.08.2005, 22:13:30
to pytho...@python.org
If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

### Robert Kern

26.08.2005, 23:29:14
to pytho...@python.org
Steve Holden wrote:

> Of course. But once you (sensibly) decide to use an "if" then there
> really isn't much difference between -1, None, () and sys.maxint as
> a sentinel value, is there?

Sure there is. -1 is a valid index; None is not. -1 as a sentinel is
specific to str.find(); None is used all over Python as a sentinel.

If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I have already called "Not It."

### Paul Rubin

27.08.2005, 00:05:00
Steve Holden <st...@holdenweb.com> writes:
> Of course. But once you (sensibly) decide to use an "if" then there
> really isn't much difference between -1, None, () and sys.maxint as
> a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.
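
A small sketch of how each candidate sentinel behaves when it is mistakenly used as a subscript, in current Python 3 syntax. The helper `try_subscript` is made up for this example, and sys.maxint is left out because Python 3 no longer has it:

```python
def try_subscript(seq, idx):
    """Hypothetical helper: return seq[idx], or the name of the exception raised."""
    try:
        return seq[idx]
    except (TypeError, IndexError) as exc:
        return type(exc).__name__


s = "buggy"
results = {
    "-1": try_subscript(s, -1),          # valid subscript: the last character
    "None": try_subscript(s, None),      # invalid subscript type
    "()": try_subscript(s, ()),          # invalid subscript type
    "len(s)": try_subscript(s, len(s)),  # integer, but out of range
}
```

Only -1 goes through silently; the other values fail at the point of use.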

### Paul Rubin

27.08.2005, 00:08:05
Steve Holden <st...@holdenweb.com> writes:
> If you want an exception from your code when 'w' isn't in the string
> you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.

### Terry Reedy

27.08.2005, 03:59:08
to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7xslww1...@ruckus.brouhaha.com...

I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicate that
there is no valid return value.

Terry J. Reedy

### Bryan Olson

27.08.2005, 04:08:47
Steve Holden wrote:
> Bryan Olson wrote:
>> [...] I see no good reason for the following
>> to happily print 'y'.
>>
>> s = 'buggy'
>> print s[s.find('w')]
>>
>> > Before using the result you always have to perform
>> > a test to discriminate between the found and not found cases. So I don't
>> > really see why this wart has put such a bug up your ass.
>>
>> The bug that got me was what a slice object reports as the
>> 'stop' bound when the step is negative and the slice includes
>> index 0. Took me hours to figure out why my code was failing.
>>
>> The double-meaning of -1, as both an exclusive stopping bound
>> and an alias for the highest valid index, is just plain whacked.
>> Unfortunately, as negative indexes are currently handled, there
>> is no it-just-works value that slice could return.
>>
>>
> If you want an exception from your code when 'w' isn't in the string you
> should consider using index() rather than find.

That misses the point. The code is a hypothetical example of
what a novice or imperfect Pythoner might have to deal with.
The exception isn't really wanted; it's just vastly superior to
silently returning a nonsensical value.

> Otherwise, whatever find() returns you will have to have an "if" in
> there to handle the not-found case.
>
> This just sounds like whining to me. If you want to catch errors, use a
> function that will raise an exception rather than relying on the
> invalidity of the result.

I suppose if you ignore the real problems and the proposed
solution, it might sound a lot like whining.

--
--Bryan

### Steve Holden

27.08.2005, 11:15:46