Bug in slice type

Bryan Olson

10.08.2005, 10:54:54

The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

http://docs.python.org/ref/types.html

It behaves incorrectly when step is negative and the slice
includes the 0 index.

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]

print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.
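
The suggested values can be checked with a short sketch. It assumes Python 3 (so range() is wrapped in list()), and the helper name resliceable is hypothetical. It moves any negative value returned by indices() below -length, so that slice syntax reads it as "before index 0" and not as an offset from the high end:

```python
def resliceable(s, length):
    """Turn s.indices(length) into a triple that is safe to slice with.

    Hypothetical helper. indices() only returns a negative start or stop
    (always -1) when the step is negative, meaning "just before index 0".
    Subtracting length keeps that meaning under slice syntax.
    """
    start, stop, step = s.indices(length)
    if start < 0:
        start -= length
    if stop < 0:
        stop -= length
    return start, stop, step


seq = list(range(10))
s = slice(None, None, -2)

raw = s.indices(len(seq))                # (9, -1, -2)
fixed = resliceable(s, len(seq))         # (9, -11, -2)

print(seq[s])                            # [9, 7, 5, 3, 1]
print(seq[raw[0]:raw[1]:raw[2]])         # []
print(seq[fixed[0]:fixed[1]:fixed[2]])   # [9, 7, 5, 3, 1]
```

The adjusted triple is the (9, -11, -2) form listed above, so it stays a tuple of three integers.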

--
--Bryan

Steven Bethard

11.08.2005, 15:35:34

Bryan Olson wrote:
>
> class BuggerAll:
>
>     def __init__(self, somelist):
>         self.sequence = somelist[:]
>
>     def __getitem__(self, key):
>         if isinstance(key, slice):
>             start, stop, step = key.indices(len(self.sequence))
>             # print 'Slice says start, stop, step are:', start, stop, step
>             return self.sequence[start : stop : step]
>
>
> print range(10) [None : None : -2]
> print BuggerAll(range(10))[None : None : -2]
>
> The above prints:
>
> [9, 7, 5, 3, 1]
> []
>
> Un-commenting the print statement in __getitem__ shows:
>
> Slice says start, stop, step are: 9 -1 -2
>
> The slice object seems to think that -1 is a valid exclusive
> bound, but when using it to actually slice, Python interprets
> negative numbers as an offset from the high end of the sequence.
>
> Good start-stop-step values are (9, None, -2), or (9, -11, -2),
> or (-1, -11, -2). The latter two have the advantage of being
> consistent with the documented behavior of returning three
> integers.

I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

STeVe

Bryan Olson

11.08.2005, 22:14:20

Steven Bethard wrote:
> I suspect there's a reason that it's done this way, but I agree with you
> that this seems strange. Have you filed a bug report on Sourceforge?

I gather that the slice class is young, so my guess is bug. I
filed the report -- my first Sourceforge bug report.

> BTW, a simpler example of the same phenomenon is:
>
> py> range(10)[slice(None, None, -2)]
> [9, 7, 5, 3, 1]
> py> slice(None, None, -2).indices(10)
> (9, -1, -2)
> py> range(10)[9:-1:-2]
> []

Ah, thanks.

--
--Bryan

John Machin

11.08.2005, 22:53:19

>>> rt = range(10)
>>> rt[slice(None, None, -2)]
[9, 7, 5, 3, 1]
>>> rt[::-2]
[9, 7, 5, 3, 1]
>>> slice(None, None, -2).indices(10)
(9, -1, -2)
>>> [rt[x] for x in range(9, -1, -2)]
[9, 7, 5, 3, 1]
>>>

Looks good to me. indices has returned a usable (start, stop, step).
Maybe the docs need expanding.

Bryan Olson

12.08.2005, 00:08:42

John Machin wrote:
> Steven Bethard wrote:
[...]

>> BTW, a simpler example of the same phenomenon is:
>>
>> py> range(10)[slice(None, None, -2)]
>> [9, 7, 5, 3, 1]
>> py> slice(None, None, -2).indices(10)
>> (9, -1, -2)
>> py> range(10)[9:-1:-2]
>> []
>>
>
> >>> rt = range(10)
> >>> rt[slice(None, None, -2)]
> [9, 7, 5, 3, 1]
> >>> rt[::-2]
> [9, 7, 5, 3, 1]
> >>> slice(None, None, -2).indices(10)
> (9, -1, -2)
> >>> [rt[x] for x in range(9, -1, -2)]
> [9, 7, 5, 3, 1]
> >>>
>
> Looks good to me. indices has returned a usable (start, stop, step).
> Maybe the docs need expanding.

But not a usable [start: stop: step], which is what 'slice' is
all about.

--
--Bryan

Michael Hudson

12.08.2005, 10:51:03

Bryan Olson <fakea...@nowhere.org> writes:

> The Python slice type has one method 'indices', and reportedly:
>
> This method takes a single integer argument /length/ and
> computes information about the extended slice that the slice
> object would describe if applied to a sequence of length
> items. It returns a tuple of three integers; respectively
> these are the /start/ and /stop/ indices and the /step/ or
> stride length of the slice. Missing or out-of-bounds indices
> are handled in a manner consistent with regular slices.
>
> http://docs.python.org/ref/types.html
>
>
> It behaves incorrectly

In some sense; it certainly does what I intended it to do.

> when step is negative and the slice includes the 0 index.
>
>
> class BuggerAll:
>
>     def __init__(self, somelist):
>         self.sequence = somelist[:]
>
>     def __getitem__(self, key):
>         if isinstance(key, slice):
>             start, stop, step = key.indices(len(self.sequence))
>             # print 'Slice says start, stop, step are:', start, stop, step
>             return self.sequence[start : stop : step]

But if that's what you want to do with the slice object, just write

    start, stop, step = key.start, key.stop, key.step
    return self.sequence[start : stop : step]

or even

    return self.sequence[key]

What the values returned from indices are for is to pass to the
range() function, more or less. They're not intended to be
interpreted in the way things passed to __getitem__ are.
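
As a sketch of that intended use, assuming Python 3: the class below is a hypothetical stand-in for BuggerAll (the name Sliceable is invented here). It feeds the triple from indices() to range() and picks the items one by one, never re-slicing with it:

```python
class Sliceable:
    """Hypothetical wrapper class, not the BuggerAll class from the thread."""

    def __init__(self, items):
        self.sequence = list(items)

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Treat the triple as range() arguments, not as slice bounds.
            picks = range(*key.indices(len(self)))
            return [self.sequence[i] for i in picks]
        return self.sequence[key]


wrapped = Sliceable(range(10))
print(wrapped[::-2])     # [9, 7, 5, 3, 1]
print(wrapped[2:8:3])    # [2, 5]
```

Used this way, the -1 stop value is harmless, because range(9, -1, -2) counts down to index 1 and stops before -1.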

(Well, _actually_ the main motivation for writing .indices() was to
use it in unittests...)

> print range(10) [None : None : -2]
> print BuggerAll(range(10))[None : None : -2]
>
>
> The above prints:
>
> [9, 7, 5, 3, 1]
> []
>
> Un-commenting the print statement in __getitem__ shows:
>
> Slice says start, stop, step are: 9 -1 -2
>
> The slice object seems to think that -1 is a valid exclusive
> bound,

It is, when you're doing arithmetic, which is what the client code of
PySlice_GetIndicesEx() does; indices() in turn is a thin wrapper around
that function.

> but when using it to actually slice, Python interprets negative
> numbers as an offset from the high end of the sequence.
>
> Good start-stop-step values are (9, None, -2), or (9, -11, -2),
> or (-1, -11, -2). The latter two have the advantage of being
> consistent with the documented behavior of returning three
> integers.

I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

Cheers,
mwh

--
(ps: don't feed the lawyers: they just lose their fear of humans)
-- Peter Wood, comp.lang.lisp

bryanjuggler...@yahoo.com

15.08.2005, 22:54:42

Michael Hudson wrote:

> Bryan Olson writes:
> In some sense; it certainly does what I intended it to do.

[...]

> I'm not going to change the behaviour. The docs probably aren't
> especially clear, though.

The docs and the behavior contradict:

[...] these are the /start/ and /stop/ indices and the
/step/ or stride length of the slice [emphasis added].

I'm fine with your favored behavior. What do we do next to get
the doc fixed?

--
--Bryan

Michael Hudson

18.08.2005, 03:22:51

bryanjuggler...@yahoo.com writes:

I guess one of us comes up with some less misleading words. It's not
totally obvious to me what to do, seeing as the returned values *are*
indices in a sense, just not the sense in which they are used in
Python. Any ideas?

Cheers,
mwh

--
First of all, email me your AOL password as a security measure. You
may find that won't be able to connect to the 'net for a while. This
is normal. The next thing to do is turn your computer upside down
and shake it to reboot it. -- Darren Tucker, asr

Steven Bethard

18.08.2005, 10:34:45

Michael Hudson wrote:

> bryanjuggler...@yahoo.com writes:
>> I'm fine with your favored behavior. What do we do next to get
>> the doc fixed?
>
> I guess one of us comes up with some less misleading words. It's not
> totally obvious to me what to do, seeing as the returned values *are*
> indices in a sense, just not the sense in which they are used in
> Python. Any ideas?

Maybe you could replace:

"these are the start and stop indices and the step or stride length of
the slice"

with

"these are start, stop and step values suitable for passing to range or
xrange"

I wanted to say something about what happens with a negative stride, to
indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
wasn't able to navigate the Python documentation well enough.

Looking at the Language Reference section on the slice type[1] (section
3.2), I find that "Missing or out-of-bounds indices are handled in a
manner consistent with regular slices." So I looked for the
documentation of "regular slices". My best guess was that this meant
looking at the Language Reference on slicings[2]. But all I could find
in this documentation about the "stride" argument was:

"The conversion of a proper slice is a slice object (see section 3.2)
whose start, stop and step attributes are the values of the expressions
given as lower bound, upper bound and stride, respectively, substituting
None for missing expressions."

This feels circular to me. Can someone help me find where the semantics
of a negative stride index is defined?

Steve

[1] http://docs.python.org/ref/types.html
[2] http://docs.python.org/ref/slicings.html

Steven Bethard

18.08.2005, 11:17:20

I wrote:
> I wanted to say something about what happens with a negative stride, to
> indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
> wasn't able to navigate the Python documentation well enough.
>
> Looking at the Language Reference section on the slice type[1] (section
> 3.2), I find that "Missing or out-of-bounds indices are handled in a
> manner consistent with regular slices." So I looked for the
> documentation of "regular slices". My best guess was that this meant
> looking at the Language Reference on slicings[2]. But all I could find
> in this documentation about the "stride" argument was:
>
> "The conversion of a proper slice is a slice object (see section 3.2)
> whose start, stop and step attributes are the values of the expressions
> given as lower bound, upper bound and stride, respectively, substituting
> None for missing expressions."
>
> This feels circular to me. Can someone help me find where the semantics
> of a negative stride index is defined?

Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.
range(10)[9:-1:-2] == []
But the values of n that satisfy
0 <= n < (-1 - 9)/-2 = -10/-2 = 5
are 0, 1, 2, 3, 4, corresponding to the x values of 9, 7, 5, 3, 1. But
[range(10)[x] for x in [9, 7, 5, 3, 1]] == [9, 7, 5, 3, 1]
Does this mean that there's a bug in the list object?
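
The two readings of the quoted definition can be compared with a small sketch. It assumes Python 3 and real division in (j-i)/k; the function name formula_slice and its wrap_negative flag are hypothetical. The flag controls whether a negative i or j first has len(s) added to it, which is how the notes on that documentation page treat negative indices:

```python
def formula_slice(seq, i, j, k, wrap_negative):
    """Items seq[i + n*k] for 0 <= n < (j - i) / k, using real division.

    Hypothetical helper. With wrap_negative=True, a negative i or j first
    has len(seq) added to it. Clamping of out-of-range bounds is
    deliberately left out of this sketch.
    """
    if wrap_negative:
        if i < 0:
            i += len(seq)
        if j < 0:
            j += len(seq)
    result = []
    n = 0
    while n < (j - i) / k:
        result.append(seq[i + n * k])
        n += 1
    return result


seq = list(range(10))
print(formula_slice(seq, 9, -1, -2, wrap_negative=False))  # [9, 7, 5, 3, 1]
print(formula_slice(seq, 9, -1, -2, wrap_negative=True))   # []
print(seq[9:-1:-2])                                        # []
```

Under these assumptions the list behaves as the formula says once j = -1 has been replaced by len(s) - 1 = 9, which leaves no valid n.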

STeVe

[1] http://docs.python.org/lib/typesseq.html

Bryan Olson

20.08.2005, 15:22:12

Steven Bethard wrote:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>

> This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.

PPEP (Proposed Python Enhancement Proposal): New-Style Indexing

Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and None work the same as in old-style slicing.

Within the square-brackets, the '$' symbol stands for the length
of the sequence. One can index from the high end by subtracting
the index from '$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; $ - 4]
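
The proposed semantics can be approximated in current Python with a plain function. This is only a sketch of one reading of the proposal, assuming Python 3: the name new_style_slice is hypothetical, the caller writes len(seq) where the proposal writes '$', and negative bounds are simply clamped like any other out-of-range bound:

```python
def new_style_slice(seq, start=None, stop=None, step=None):
    """Slice without from-the-high-end interpretation of negative bounds.

    Hypothetical emulation of seq[start ; stop ; step]; returns a list.
    """
    step = 1 if step is None else step
    if step == 0:
        raise ValueError("slice step cannot be zero")
    n = len(seq)
    if step > 0:
        lo = 0 if start is None else min(max(start, 0), n)
        hi = n if stop is None else min(max(stop, 0), n)
    else:
        # Counting down: -1 is the exclusive bound just before index 0.
        lo = n - 1 if start is None else min(max(start, -1), n - 1)
        hi = -1 if stop is None else min(max(stop, -1), n - 1)
    return [seq[x] for x in range(lo, hi, step)]


seq = list(range(10))
print(new_style_slice(seq, 3, len(seq) - 4))   # [3, 4, 5]
print(seq[3:-4])                               # [3, 4, 5]
print(new_style_slice(seq, 3, -4))             # []
print(new_style_slice(seq, 9, -1, -2))         # [9, 7, 5, 3, 1]
```

With these rules the triple (9, -1, -2) selects the same items as range(9, -1, -2), which is the case that started the thread.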

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '$' describes.
(Perhaps '$$' should be the length of the next containing
bracket pair, and '$$$' the next-out and...?)

So far, I don't think the proposal breaks anything; let's keep
it that way. The next bit is tricky...

Obviously '$' should also work in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[$ - 2]

So really seq[-2] should be out-of-bounds. Alas, that would
break way too much code. For now, simple indexing with a
negative subscript (and no '$') should continue to index from
the high end, as a deprecated feature. The presence of '$'
always indicates new-style slicing, so a programmer who needs a
negative index to trigger a range error can write:

seq[($ - $) + index]
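
The effect described for simple indexing can be sketched as follows, assuming Python 3. The helper name strict_item is hypothetical, and len(seq) stands in for '$':

```python
def strict_item(seq, index):
    """Hypothetical helper: like seq[index], but a negative index is an error."""
    if index < 0:
        raise IndexError("negative index %d is out of range" % index)
    return seq[index]


seq = list("abcde")
print(strict_item(seq, len(seq) - 2))    # d
try:
    strict_item(seq, -2)
except IndexError as exc:
    print("IndexError:", exc)
```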

An Alternative Variant:

Suppose instead of using semicolons as the PPEP proposes, we use
commas, as in:

sequence[start, stop, step]

Commas are already in use to form tuples, and we let them do
just that. A slice is a subscript that is a tuple (or perhaps we
should allow any sequence). We could just as well write:

index_tuple = (start, stop, step)
sequence[index_tuple]

This variant *reduces* the number and complexity of rules that
define Python semantics. There is no special interpretation of
the comma, and no need for a distinct slice type.

The '$' character works as in the PPEP above. It is undefined
outside square brackets, but that makes no real difference; the
programmer can use len(sequence).

This variant might break some tricky code.

--
--Bryan

Steven Bethard

20.08.2005, 17:33:22

Bryan Olson wrote:
> Steven Bethard wrote:
> > Well, I couldn't find where the general semantics of a negative stride
> > index are defined, but for sequences at least[1]:
> >
> > "The slice of s from i to j with step k is defined as the sequence of
> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
> >
> > This seems to contradict list behavior though. [...]
>
> The conclusion is inescapable: Python's handling of negative
> subscripts is a wart.

I'm not sure I'd go that far. Note that my confusion above was the
order of combination of points (3) and (5) on the page quoted above[1].
I think the problem is not the subscript handling so much as the
documentation thereof. I posted a message about this [2], and a
documentation patch based on that message [3].

[1] http://docs.python.org/lib/typesseq.html
[2] http://mail.python.org/pipermail/python-list/2005-August/295260.html
[3] http://www.python.org/sf/1265100

> Suppose instead of using semicolons as the PPEP proposes, we use
> commas, as in:
>
> sequence[start, stop, step]

This definitely won't work. This is already valid syntax, and is used
heavily by the numarray/numeric folks.
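
A quick way to see what a comma-separated subscript already means, assuming Python 3 (the class name ShowKey is hypothetical; it simply echoes the subscript it receives):

```python
class ShowKey:
    """Hypothetical class that returns whatever subscript it is given."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
print(probe[1, 2, 3])     # (1, 2, 3)
print(probe[1:2:3])       # slice(1, 2, 3)
print(probe[1:2, ::3])    # (slice(1, 2, None), slice(None, None, 3))

# Tuples are also ordinary dictionary keys.
grid = {(0, 1): "a"}
print(grid[0, 1])         # a
```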

STeVe

Kay Schluehr

21.08.2005, 04:20:53

Steven Bethard wrote:

> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though.
> range(10)[9:-1:-2] == []

No, both are correct. But we don't have to interpret the second slice
argument m as the limit j of the above definition. For positive values
of m the identity m==j holds. For negative values of m we have
j = max(0,i+m). This is
consistent with the convenient negative indexing:

>>> range(9)[-1] == range(9)[8]

If we remember that -1 is interpreted as an index, not as some limit,
the behaviour makes perfect sense.

Kay

Kay Schluehr

21.08.2005, 04:29:50

Bryan Olson wrote:
> Steven Bethard wrote:
> > Well, I couldn't find where the general semantics of a negative stride
> > index are defined, but for sequences at least[1]:
> >
> > "The slice of s from i to j with step k is defined as the sequence of
> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
> >
> > This seems to contradict list behavior though. [...]
>
> The conclusion is inescapable: Python's handling of negative
> subscripts is a wart. Indexing from the high end is too useful
> to give up, but it should be specified by the slicing/indexing
> operation, not by the value of the index expression.

It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
is very useful IMO. If you want to slice to the bottom, take 0 as
bottom value. The docs have to be extended in this respect.

Kay

Paul Rubin

21.08.2005, 05:41:54

Bryan Olson <fakea...@nowhere.org> writes:
> seq[3 : -4]
>
> we write:
>
> seq[3 ; $ - 4]

+1

> When square-brackets appear within other square-brackets, the
> inner-most bracket-pair determines which sequence '$' describes.
> (Perhaps '$$' should be the length of the next containing
> bracket pair, and '$$$' the next-out and...?)

Not sure. $1, $2, etc. might be better, or $<tag> like in regexps, etc.

> So really seq[-2] should be out-of-bounds. Alas, that would
> break way too much code. For now, simple indexing with a
> negative subscript (and no '$') should continue to index from
> the high end, as a deprecated feature. The presence of '$'
> always indicates new-style slicing, so a programmer who needs a
> negative index to trigger a range error can write:
>
> seq[($ - $) + index]

+1

> Commas are already in use to form tuples, and we let them do
> just that. A slice is a subscript that is a tuple (or perhaps we
> should allow any sequence). We could just as well write:
>
> index_tuple = (start, stop, step)
> sequence[index_tuple]

Hmm, tuples are hashable and are already valid indices to mapping
objects like dictionaries. Having slices means an object can
implement both the mapping and sequence interfaces. Whether that's
worth caring about, I don't know.

Bryan Olson

24.08.2005, 09:28:16

Paul Rubin wrote:

> Bryan Olson writes:
>
>> seq[3 : -4]
>>
>>we write:
>>
>> seq[3 ; $ - 4]
>
>
> +1

I think you're wrong about the "+1". I defined '$' to stand for
the length of the sequence (not the address of the last
element).

>>When square-brackets appear within other square-brackets, the
>>inner-most bracket-pair determines which sequence '$' describes.
>>(Perhaps '$$' should be the length of the next containing
>>bracket pair, and '$$$' the next-out and...?)
>
> Not sure. $1, $2, etc. might be better, or $<tag> like in regexps, etc.

Sounds reasonable.

[...]

> Hmm, tuples are hashable and are already valid indices to mapping
> objects like dictionaries. Having slices means an object can
> implement both the mapping and sequence interfaces. Whether that's
> worth caring about, I don't know.

Yeah, I thought that alternative might break people's code, and
it turns out it does.

--
--Bryan

Bryan Olson

24.08.2005, 09:42:11

Kay Schluehr wrote:
> Bryan Olson wrote:
>
>>Steven Bethard wrote:
>> > Well, I couldn't find where the general semantics of a negative stride
>> > index are defined, but for sequences at least[1]:
>> >
>> > "The slice of s from i to j with step k is defined as the sequence of
>> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
>> >
>> > This seems to contradict list behavior though. [...]
>>
>>The conclusion is inescapable: Python's handling of negative
>>subscripts is a wart. Indexing from the high end is too useful
>>to give up, but it should be specified by the slicing/indexing
>>operation, not by the value of the index expression.
>
>
> It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
> is very useful IMO.

No question index-from-the-far-end is useful, but I think
special-casing some otherwise-out-of-bounds indexes is a
mistake.

Are there any cases in popular Python code where my proposal
would not allow as elegant a solution?

> If you want to slice to the bottom, take 0 as
> bottom value. The docs have to be extended in this respect.

I'm not sure what you mean. Slicing with a negative step and a
stop value of zero will not reach the bottom (unless the
sequence is empty). In general, Python uses inclusive beginning
bounds and exclusive ending bounds. (The rule is frequently
stated incorrectly as "inclusive lower bounds and exclusive
upper bounds," which fails to consider negative increments.)
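
A few Python 3 expressions illustrate the point about beginning and ending bounds (the variable name seq is just for illustration):

```python
seq = list(range(10))

# The beginning bound is included and the ending bound excluded, in either direction.
print(seq[2:5])        # [2, 3, 4]
print(seq[5:2:-1])     # [5, 4, 3]

# With a negative step, a stop of 0 leaves out the bottom element.
print(seq[9:0:-1])     # [9, 8, 7, 6, 5, 4, 3, 2, 1]

# Omitting stop reaches index 0; a stop of -1 means index 9 and selects nothing.
print(seq[9::-1])      # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(seq[9:-1:-1])    # []
```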

--
--Bryan

Bryan Olson

24.08.2005, 11:03:02

Kay Schluehr wrote:
> Steven Bethard wrote:
>>"The slice of s from i to j with step k is defined as the sequence of
>>items with index x = i + n*k such that 0 <= n < (j-i)/k."
>>
>>This seems to contradict list behavior though.
>> range(10)[9:-1:-2] == []
>
>
> No, both are correct. But we don't have to interpret the second slice
> argument m as the limit j of the above definition.

Even if "we don't have to," it sure reads like we should.

> For positive values
> of m the identity
> m==j holds. For negative values of m we have j = max(0,i+m).

First, the definition from the doc is still ambiguous: Is the
division in

0 <= n < (j-i)/k

real division, or is it Python integer (floor) division? It
matters.

Second, the rule Kay Schluehr states is wrong for either type
of division. Look at:

range(5)[4 : -6 : -2]

Since Python is so programmer-friendly, I wrote some code to
make the "look at" task easy:

slice_definition = """

The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k.
"""

Kay_Schluehr_rule = """

For positive values of m the identity m==j holds. For negative values
of m we have j = max(0,i+m).
"""

def m_to_j(i, m):
    """ Compute slice_definition's 'j' according to Kay_Schluehr_rule
        when the slice of sequence is specified as,
            sequence[i : m : k].
    """
    if m > 0:
        j = m
    else:
        j = max(0, i + m)
    return j

def extract_slice(sequence, i, m, k, div_type='i'):
    """ Apply the slice definition with Kay Schluehr's rule to find
        what the slice should be. Pass div_type of 'i' to use integer
        division, or 'f' for float (~real) division, in the
        slice_definition expression,
            (j-i)/k.
    """
    j = m_to_j(i, m)
    result = []
    n = 0
    if div_type == 'i':
        end_bound = (j - i) / k
    else:
        assert div_type == 'f', "div_type must be 'i' or 'f'."
        end_bound = float(j - i) / k
    while n < end_bound:
        result.append(sequence[i + n * k])
        n += 1
    return result

def show(sequence, i, m, k):
    """ Print what happens, both actually and according to stated rules.
    """
    print "Checking: %s[%d : %d : %d]" % (sequence, i, m, k)
    print "actual                   :", sequence[i : m : k]
    print "Kay's rule, int division :", extract_slice(sequence, i, m, k)
    print "Kay's rule, real division:", extract_slice(sequence, i, m, k, 'f')
    print

show(range(5), 4, -6, -2)
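
For anyone without a Python 2 interpreter at hand, here is a condensed Python 3 sketch of the same check. The function name kay_rule_slice is hypothetical; '//' plays the role of Python 2's integer '/', and range() is wrapped in list():

```python
def kay_rule_slice(sequence, i, m, k, real_division=False):
    """Slice per the quoted definition, with j derived by Kay Schluehr's rule."""
    j = m if m > 0 else max(0, i + m)
    # Python 3 spells Python 2's integer '/' as '//'.
    end_bound = (j - i) / k if real_division else (j - i) // k
    result = []
    n = 0
    while n < end_bound:
        result.append(sequence[i + n * k])
        n += 1
    return result


seq = list(range(5))
print(seq[4:-6:-2])                                          # [4, 2, 0]
print(kay_rule_slice(seq, 4, -6, -2))                        # [4, 2]
print(kay_rule_slice(seq, 4, -6, -2, real_division=True))    # [4, 2]
```

Either kind of division gives [4, 2] under the rule, while the actual slice also contains the element at index 0.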

--
--Bryan

Bryan Olson

24.08.2005, 11:21:22

Steven Bethard wrote:
> Bryan Olson wrote:
>
>> Steven Bethard wrote:
>> > Well, I couldn't find where the general semantics of a negative stride
>> > index are defined, but for sequences at least[1]:
>> >
>> > "The slice of s from i to j with step k is defined as the sequence of
>> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
>> >
>> > This seems to contradict list behavior though. [...]
>>
>> The conclusion is inescapable: Python's handling of negative
>> subscripts is a wart.
>
>
> I'm not sure I'd go that far. Note that my confusion above was the
> order of combination of points (3) and (5) on the page quoted above[1].
> I think the problem is not the subscript handling so much as the
> documentation thereof.

Any bug can be pseudo-fixed by changing the documentation to
conform to the behavior. Here, the doc clearly went wrong by
expecting Python's behavior to follow from a few consistent
rules. The special-case handling of negative indexes looks
handy, but raises more difficulties than people realized.

I believe my PPEP avoids the proliferation of special cases. The
one additional issue I've discovered is that user-defined types
that are to support __getitem__ and/or __setitem__ *must* also
implement __len__. Sensible sequence types already do, so I
don't think it's much of an issue.

> This is already valid syntax, and is used
> heavily by the numarray/numeric folks.

Yeah, I thought that variant might break some code. I didn't
know it would be that much. Forget that variant.

--
--Bryan

Robert Kern

24.08.2005, 15:08:52

to pytho...@python.org

Bryan Olson wrote:
> Paul Rubin wrote:
> > Bryan Olson writes:
> >
> >> seq[3 : -4]
> >>
> >>we write:
> >>
> >> seq[3 ; $ - 4]
> >
> > +1
>
> I think you're wrong about the "+1". I defined '$' to stand for
> the length of the sequence (not the address of the last
> element).

By "+1" he means, "I like it." He's not correcting you.

--
Robert Kern
rk...@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Bryan Olson

24.08.2005, 22:50:22

Robert Kern wrote:

> By "+1" he means, "I like it." He's not correcting you.

Ah, O.K. Thanks.

--
--Bryan

Bryan Olson

24.08.2005, 23:33:10

The doc for the find() method of string objects, which is
essentially the same as the string.find() function, states:

find(sub[, start[, end]])
Return the lowest index in the string where substring sub
is found, such that sub is contained in the range [start,
end). Optional arguments start and end are interpreted as
in slice notation. Return -1 if sub is not found.

Consider:

print 'Hello'.find('o')

or:

import string
print string.find('Hello', 'o')

The substring 'o' is found in 'Hello' at the index -1, and at
the index 4, and it is not found at any other index. Both the
locations found are in the range [start, end), and obviously -1
is less than 4, so according to the documentation, find() should
return -1.

What either of the above actually prints is:

4

which shows yet another bug resulting from Python's handling of
negative indexes. This one is clearly a documentation error, but
the real fix is to cure the wart so that Python's behavior is
consistent enough that we'll be able to describe it correctly.
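
The observations above can be reproduced with a few lines, written here for Python 3 (print as a function; the variable name text is just for illustration):

```python
text = 'Hello'

# Both -1 and 4 are valid indexes of the final 'o'.
print(text[-1], text[4])     # o o

# find() reports the non-negative position, even when start is negative.
print(text.find('o'))        # 4
print(text.find('o', -1))    # 4

# -1 is reserved for "not found".
print(text.find('z'))        # -1
```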

--
--Bryan

Steve Holden

25.08.2005, 00:05:18

to pytho...@python.org

Do you just go round looking for trouble?

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug? And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Casey Hawthorne

25.08.2005, 00:57:35

>contained in the range [start, end)

Does range(start, end) generate negative integers in Python if
start >= 0 and end >= start?
--
Regards,
Casey

en.kar...@ospaz.ru

25.08.2005, 02:45:59

to pytho...@python.org

On Thu, 25 Aug 2005 00:05:18 -0400
Steve Holden wrote:

> What on earth makes you call this a bug? And what are you proposing that
> find() should return if the substring isn't found at all? please don't
> suggest it should raise an exception, as index() exists to provide that
> functionality.

Returning -1 looks like a C-ism to me. It would be better to return None
when nothing is found.

index = "Hello".find("z")
if index is not None:
    # ...

Now it's too late for it, I know.

--
jk

Paul Rubin

25.08.2005, 14:22:16

Steve Holden <st...@holdenweb.com> writes:
> As far as position reporting goes, it seems pretty clear that find()
> will always report positive index values. In a five-character string
> then -1 and 4 are effectively equivalent.
>
> What on earth makes you call this a bug? And what are you proposing
> that find() should return if the substring isn't found at all? please
> don't suggest it should raise an exception, as index() exists to
> provide that functionality.

Bryan is making the case that Python's use of negative subscripts to
measure from the end of sequences is bogus, and that it should be done
some other way instead. I've certainly had bugs in my own programs
related to that "feature".

Bryan Olson

25.08.2005, 18:30:23

Steve Holden asked:

> Do you just go round looking for trouble?

In the course of programming, yes, absolutely.

> As far as position reporting goes, it seems pretty clear that find()
> will always report positive index values. In a five-character string
> then -1 and 4 are effectively equivalent.
>
> What on earth makes you call this a bug?

What you just said, versus what the doc says.

> And what are you proposing that
> find() should return if the substring isn't found at all? please don't
> suggest it should raise an exception, as index() exists to provide that
> functionality.

There are a number of good options. A legal index is not one of
them.

--
--Bryan

Antoon Pardon

26.08.2005, 04:22:33

On 2005-08-25, Bryan Olson wrote <fakea...@nowhere.org>:

IMO, with find a number of "features" of Python come together
that create an awkward situation.

1) 0 is a false value, but indexes start at 0 so you can't
return 0 to indicate nothing was found.

2) -1 is returned, which is both a true value and a legal
index.

It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.

--
Antoon Pardon

Bryan Olson

26.08.2005, 05:37:31

Antoon Pardon wrote:
> Bryan Olson wrote:
>
>>Steve Holden asked:

>>>And what are you proposing that
>>>find() should return if the substring isn't found at all? please don't
>>>suggest it should raise an exception, as index() exists to provide that
>>>functionality.
>>
>>There are a number of good options. A legal index is not one of
>>them.
>
> IMO, with find a number of "features" of Python come together
> that create an awkward situation.
>
> 1) 0 is a false value, but indexes start at 0 so you can't
> return 0 to indicate nothing was found.
>
> 2) -1 is returned, which is both a true value and a legal
> index.
>
> It probably is too late now, but I always felt, find should
> have returned None when the substring isn't found.

None is certainly a reasonable candidate. The one-past-the-end
value, len(sequence), would be fine, and follows the preferred
idiom of C/C++. I don't see any elegant way to arrange for
successful finds always to return a true value and unsuccessful
calls to return a false value.

The really broken part is that unsuccessful searches return a
legal index.
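
A sketch of the hazard and of the two alternatives mentioned above, assuming Python 3. The wrapper names find_or_none and find_or_len are hypothetical and exist only for this illustration:

```python
def find_or_none(text, sub):
    """Hypothetical wrapper: None instead of -1 when sub is absent."""
    pos = text.find(sub)
    return None if pos == -1 else pos


def find_or_len(text, sub):
    """Hypothetical wrapper: the one-past-the-end value when sub is absent."""
    pos = text.find(sub)
    return len(text) if pos == -1 else pos


text = 'Hello'

# The -1 "not found" value silently indexes the last character.
print(text[text.find('z')])       # o

# Neither alternative is a legal index, so forgetting the test fails loudly.
print(find_or_none(text, 'z'))    # None
print(find_or_len(text, 'z'))     # 5
```

Indexing with None raises TypeError and indexing with len(text) raises IndexError, so an unchecked result cannot pass for a real position.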

My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.

--
--Bryan

Rick Wotnaz

26.08.2005, 07:20:33

Bryan Olson <fakea...@nowhere.org> wrote in
news:3ErPe.853$sV7...@newssvr21.news.prodigy.com:

Practically speaking, what difference would it make? Supposing find
returned None for not-found. How would you use it in your code that
would make it superior to what happens now? In either case you
would have to test for the not-found state before relying on the
index returned, wouldn't you? Or do you have a use that would
eliminate that step?

--
rzed

Steve Holden

26.08.2005, 12:32:17

to pytho...@python.org

We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

> My suggestion doesn't change what find() returns, and doesn't
> break code. Negative one is a reasonable choice to represent an
> unsuccessful search -- provided it is not a legal index. Instead
> of changing what find() returns, we should heal the
> special-case-when-index-is-negative-in-a-certain-range wart.
>
>

What I don't understand is why you want it to return something that
isn't a legal index. Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

Bryan Olson

26.08.2005, 14:46:27

Steve Holden wrote:
> Bryan Olson wrote:
>> Antoon Pardon wrote:

>> > It probably is too late now, but I always felt, find should
>> > have returned None when the substring isn't found.
>>
>> None is certainly a reasonable candidate.

[...]

>> The really broken part is that unsuccessful searches return a
>> legal index.
>>
> We might agree, before further discussion, that this isn't the most
> elegant part of Python's design, and it's down to history that this tiny
> little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+%22duct+tape%22

>> My suggestion doesn't change what find() returns, and doesn't
>> break code. Negative one is a reasonable choice to represent an
>> unsuccessful search -- provided it is not a legal index. Instead
>> of changing what find() returns, we should heal the
>> special-case-when-index-is-negative-in-a-certain-range wart.
>>
>>
> What I don't understand is why you want it to return something that
> isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
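
For readers on Python 3, where print is a function, the same pitfall can be sketched as follows. This is only an illustration; it uses nothing beyond the built-in str.find and str.index, and the variable names are made up for the example.

```python
s = 'buggy'

pos = s.find('w')   # 'w' is absent, so find() returns -1
last = s[pos]       # -1 is a legal index, so this silently yields the last character

try:
    s.index('w')    # index() raises instead of returning a sentinel
    raised = False
except ValueError:
    raised = True

print(pos, last, raised)
```

The not-found result and the last character become indistinguishable as soon as the return value is used as a subscript without a test.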

> Before using the result you always have to perform
> a test to discriminate between the found and not found cases. So I don't
> really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.
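
A short sketch of that slice behavior, written for Python 3 and using only the built-in slice.indices and range; the variable names are illustrative. It also shows one possible workaround, which is an assumption of this sketch rather than anything proposed in the thread: feed the three numbers to range(), which takes -1 literally.

```python
seq = list(range(10))
sl = slice(None, None, -2)

start, stop, step = sl.indices(len(seq))

direct = seq[sl]                 # slicing with the slice object itself
reused = seq[start:stop:step]    # stop == -1 is re-read as "last element"
via_range = [seq[i] for i in range(start, stop, step)]  # range() takes -1 literally

print((start, stop, step), direct, reused, via_range)
```

Here `reused` comes out empty because the stop value of -1 is reinterpreted as an offset from the high end, while `direct` and `via_range` both produce the expected elements.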

--
--Bryan

Reinhold Birkenfeld

26.08.2005, 14:57:11

Bryan Olson wrote:
> Steve Holden wrote:
> > Bryan Olson wrote:
> >> Antoon Pardon wrote:
>
> >> > It probably is too late now, but I always felt, find should
> >> > have returned None when the substring isn't found.
> >>
> >> None is certainly a reasonable candidate.
> [...]
> >> The really broken part is that unsuccessful searches return a
> >> legal index.
> >>
> > We might agree, before further discussion, that this isn't the most
> > elegant part of Python's design, and it's down to history that this tiny
> > little wart remains.
>
> I don't think my proposal breaks historic Python code, and I
> don't think it has the same kind of unfortunate subtle
> consequences as the current indexing scheme. You may think the
> wart is tiny, but the duct-tape* is available so let's cure it.
>
> [*] http://www.google.com/search?as_q=warts+%22duct+tape%22

Well, nobody stops you from posting this on python-dev and being screamed
at by Guido...

just-kidding-ly
Reinhold

Terry Reedy

26.08.2005, 15:28:29

to pytho...@python.org

"Bryan Olson" <fakea...@nowhere.org> wrote in message
news:7sJPe.573$MN5...@newssvr25.news.prodigy.net...

> The double-meaning of -1, as both an exclusive stopping bound
> and an alias for the highest valid index, is just plain whacked.

I agree in this sense: the use of any int as an error return is an
unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
redundant with the Pythonic exception-raising str.index and I think it
should be removed in Py3.

Therefore, I think changing it now is untimely and changing the language
because of it backwards.

Terry J. Reedy

Paul Rubin

26.08.2005, 15:35:51

"Terry Reedy" <tjr...@udel.edu> writes:
> I agree in this sense: the use of any int as an error return is an
> unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
> redundant with the Pythonic exception-raising str.index and I think it
> should be removed in Py3.

I like having it available so you don't have to clutter your code with
try/except if the substring isn't there. But it should not return a
valid integer index.

Terry Reedy

26.08.2005, 17:02:43

to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7xmzn41...@ruckus.brouhaha.com...

> "Terry Reedy" <tjr...@udel.edu> writes:
>>Str.find is
>> redundant with the Pythonic exception-raising str.index
>> and I think it should be removed in Py3.
>
> I like having it available so you don't have to clutter your code with
> try/except if the substring isn't there. But it should not return a
> valid integer index.

The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that raises
an exception on invalid input. Should more or even all be duplicated? Why
just this one?

Terry J. Reedy

Torsten Bronger

26.08.2005, 17:22:13

Hello!

"Terry Reedy" <tjr...@udel.edu> writes:

Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be preferred
in my opinion.

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus ICQ 264-296-646

Paul Rubin

26.08.2005, 17:31:50

"Terry Reedy" <tjr...@udel.edu> writes:
> The try/except pattern is a pretty basic part of Python's design. One
> could say the same about clutter for *every* function or method that raises
> an exception on invalid input. Should more or even all be duplicated? Why
> just this one?

Someone must have thought str.find was worth having, or else it
wouldn't be in the library.

Raymond Hettinger

26.08.2005, 18:39:10

Bryan Olson wrote:
> The conclusion is inescapable: Python's handling of negative
> subscripts is a wart. Indexing from the high end is too useful
> to give up, but it should be specified by the slicing/indexing
> operation, not by the value of the index expression.
>
>

> PPEP (Proposed Python Enhancement Proposal): New-Style Indexing
>
> Instead of:
>
> sequence[start : stop : step]
>
> new-style slicing uses the syntax:
>
> sequence[start ; stop ; step]

The pythonic way to handle negative slicing is to use reversed(). The
principle is that the mind more easily handles this in two steps,
specifying the range in a forward direction, and then reversing it.

IOW, it is easier to identify the included elements and see the
direction of:

reversed(xrange(1, 20, 2))

than it is for:

xrange(19, -1, -2)
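
The two spellings can be compared directly. The sketch below assumes Python 3, where range plays the role of xrange; the variable names are illustrative.

```python
forward_then_reversed = list(reversed(range(1, 20, 2)))
counted_down = list(range(19, -1, -2))

print(forward_then_reversed)
print(forward_then_reversed == counted_down)
```

Both produce the odd numbers from 19 down to 1; the first form states the included elements in ascending order and reverses them as a separate step.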

See PEP 322 for discussion and examples:
http://www.python.org/peps/pep-0322.html

Raymond

Terry Reedy

26.08.2005, 21:20:02

to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message

news:7xslwwj...@ruckus.brouhaha.com...

Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

Terry J. Reedy

Steve Holden

26.08.2005, 22:16:23

to pytho...@python.org

Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Which is what I've been trying to say all along.

Steve Holden

26.08.2005, 22:13:30

to pytho...@python.org

If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

Robert Kern

26.08.2005, 23:29:14

to pytho...@python.org

Steve Holden wrote:

> Of course. But once you (sensibly) decide to use an "if" then there
> really isn't much difference between -1, None, () and sys.maxint as
> a sentinel value, is there?

Sure there is. -1 is a valid index; None is not. -1 as a sentinel is
specific to str.find(); None is used all over Python as a sentinel.

If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I have already called "Not It."

Paul Rubin

27.08.2005, 00:05:00

Steve Holden <st...@holdenweb.com> writes:
> Of course. But once you (sensibly) decide to use an "if" then there
> really isn't much difference between -1, None, () and sys.maxint as
> a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

Paul Rubin

27.08.2005, 00:08:05

Steve Holden <st...@holdenweb.com> writes:
> If you want an exception from your code when 'w' isn't in the string
> you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.

Terry Reedy

27.08.2005, 03:59:08

to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message

news:7xslww1...@ruckus.brouhaha.com...

I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicate that
there is no valid return value.

Terry J. Reedy

Bryan Olson

27.08.2005, 04:08:47

Steve Holden wrote:
> Bryan Olson wrote:

>> [...] I see no good reason for the following
>> to happily print 'y'.
>>
>> s = 'buggy'
>> print s[s.find('w')]
>>
>> > Before using the result you always have to perform
>> > a test to discriminate between the found and not found cases. So I
>> > don't really see why this wart has put such a bug up your ass.
>>
>> The bug that got me was what a slice object reports as the
>> 'stop' bound when the step is negative and the slice includes
>> index 0. Took me hours to figure out why my code was failing.
>>
>> The double-meaning of -1, as both an exclusive stopping bound
>> and an alias for the highest valid index, is just plain whacked.
>> Unfortunately, as negative indexes are currently handled, there
>> is no it-just-works value that slice could return.
>>
>>
> If you want an exception from your code when 'w' isn't in the string you
> should consider using index() rather than find.

That misses the point. The code is a hypothetical example of
what novice or imperfect Pythoners might have to deal with.
The exception isn't really wanted; it's just vastly superior to
silently returning a nonsensical value.

> Otherwise, whatever find() returns you will have to have an "if" in
> there to handle the not-found case.
>
> This just sounds like whining to me. If you want to catch errors, use a
> function that will raise an exception rather than relying on the
> invalidity of the result.

I suppose if you ignore the real problems and the proposed
solution, it might sound a lot like whining.

--
--Bryan

Steve Holden

27.08.2005, 11:15:46

to pytho...@python.org

You did read the sentence you were replying to, didn't you?

Steve Holden

27.08.2005, 11:14:46

to pytho...@python.org

Terry Reedy wrote:
> "Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message

While I agree that it would have been more sensible to choose None in
find()'s original design, there's really no reason to go breaking
existing code just to fix it.

Guido has already agreed that find() can change (or even disappear) in
Python 3.0, so please let's just leave things as they are for now.

A corrected find() that returns None on failure is a five-liner.
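
One way such a wrapper might look is sketched below for Python 3. The name find_or_none is hypothetical, not an existing library function; it simply wraps the real str.find and forwards the optional start/end arguments.

```python
def find_or_none(haystack, needle, *args):
    """Like str.find, but return None instead of -1 when needle is absent.

    find_or_none is a hypothetical helper; *args carries the optional
    start and end positions that str.find accepts.
    """
    pos = haystack.find(needle, *args)
    if pos == -1:
        return None
    return pos


found = find_or_none('buggy', 'g')
missing = find_or_none('buggy', 'w')
print(found, missing)
```

Because None is not a legal subscript, using the result of a failed search as an index raises TypeError instead of quietly selecting the last character.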

Paul Rubin

27.08.2005, 15:19:02

Steve Holden <st...@holdenweb.com> writes:
> A corrected find() that returns None on failure is a five-liner.

If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.

sk...@pobox.com

27.08.2005, 15:57:25

to Paul Rubin, pytho...@python.org

Paul> Steve Holden <st...@holdenweb.com> writes:
>> A corrected find() that returns None on failure is a five-liner.

Paul> If I wanted to write five lines instead of one everywhere in a
Paul> Python program, I'd use Java.

+1 for QOTW.

Skip

Steve Holden

27.08.2005, 16:31:26

to pytho...@python.org

We are arguing about trivialities here. Let's stop before it gets
interesting :-)

Bryan Olson

28.08.2005, 16:19:07

Steve Holden wrote:

> Paul Rubin wrote:
> We are arguing about trivialities here. Let's stop before it gets
> interesting :-)

Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-contradictory
interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.

--
--Bryan

bearoph...@lycos.com

28.08.2005, 16:46:04

I agree with Bryan Olson.
I think it's a kind of bug, and it has to be fixed, like a few other
things.

But I understand that this change can cause small problems for
already-written code...

Bye,
bearophile

Steve Holden

28.08.2005, 16:58:30

to pytho...@python.org

Sure. I wrote two days ago:

> We might agree, before further discussion, that this isn't the most
> elegant part of Python's design, and it's down to history that this tiny
> little wart remains.

While I agree it's a trap for the unwary I still don't regard it as a
major wart. But I'm all in favor of discussions to make 3.0 a better
language.

Magnus Lycka

29.08.2005, 03:04:45

Robert Kern wrote:
> If I may digress for a bit, my advisor is currently working on a project
> that is processing seafloor depth datasets starting from a few decades
> ago. A lot of this data was originally to be processed using FORTRAN
> software, so in the idiom of much FORTRAN software from those days, 9999
> is often used to mark missing data. Unfortunately, 9999 is a perfectly
> valid datum in most of the unit systems used by the various datasets.
>
> Now he has to find a grad student to trawl through the datasets and
> clean up the really invalid 9999's (as well as other such fun tasks like
> deciding if a dataset that says it's using feet is actually using meters).

I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just as with a mistyped year
in a date (common in the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.

Antoon Pardon

29.08.2005, 04:32:03

On 2005-08-27, Terry Reedy wrote <tjr...@udel.edu>:

I think a properly implemented find is better than an index.

If we only have index, then asking for permission is no longer a
possibility. If we have a find that returns None, we can either
ask permission before we index or be forgiven by the exception
that is raised.
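
Both styles can be sketched in Python 3 as below. The helper find_or_none is hypothetical and stands in for a find that returns None on failure; the '<missing>' marker is likewise just a placeholder for this illustration.

```python
def find_or_none(haystack, needle):
    # Hypothetical helper: None, not -1, signals "not found".
    pos = haystack.find(needle)
    return None if pos == -1 else pos


s = 'buggy'

# Asking permission: test the result before indexing.
pos = find_or_none(s, 'w')
asked = s[pos] if pos is not None else '<missing>'

# Asking forgiveness: index first. None is not a legal subscript,
# so a failed search raises TypeError at the point of use.
try:
    forgiven = s[find_or_none(s, 'w')]
except TypeError:
    forgiven = '<missing>'

print(asked, forgiven)
```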

--
Antoon Pardon

Antoon Pardon

29.08.2005, 04:47:52

On 2005-08-27, Steve Holden wrote <st...@holdenweb.com>:

>>
>>
> If you want an exception from your code when 'w' isn't in the string you
> should consider using index() rather than find.

Sometimes it is convenient to have the exception thrown at a later
time.

> Otherwise, whatever find() returns you will have to have an "if" in
> there to handle the not-found case.

And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

> This just sounds like whining to me. If you want to catch errors, use a
> function that will raise an exception rather than relying on the
> invalidity of the result.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication of an invalid index. Then these
data meet.
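
A minimal Python 3 sketch of that scenario, with made-up strings and a plain list standing in for the collecting module:

```python
pairs = []

# One part of the program wants the last letter and stores -1 on purpose.
pairs.append(('spam', -1))

# Another part stores the result of find(); 'z' is absent, so this is also -1.
pairs.append(('eggs', 'eggs'.find('z')))

# A consumer that sees only the pairs cannot tell the two cases apart.
letters = [text[index] for text, index in pairs]
print(pairs, letters)
```

The second entry was meant to say "nothing found", yet it indexes just as happily as the first.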

--
Antoon Pardon

Robert Kern

29.08.2005, 05:08:58

to pytho...@python.org

Yes! In fact, that was the context of the discussion when my advisor
told me about this project. Another student had written an interactive
GUI for exploring bathymetry maps. My advisor: "That kind of thing would
be really great for this new project, etc. etc."

Steven Bethard

29.08.2005, 13:03:22

Antoon Pardon wrote:
> I think a properly implemented find is better than an index.

See the current thread in python-dev[1], which proposes a new method,
str.partition(). I believe that Raymond Hettinger has shown that almost
all uses of str.find() can be more clearly represented with his
proposed function.
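
As a rough illustration of the difference, the sketch below uses str.partition as it exists in current Python 3; the sample strings and variable names are invented for the example.

```python
line = 'key=value'

# find(): the caller must remember to test for -1 before slicing.
pos = line.find('=')
if pos != -1:
    key_f, value_f = line[:pos], line[pos + 1:]

# partition(): always returns a 3-tuple; the middle item is '' when
# the separator is absent, so there is no index to misuse.
key_p, sep, value_p = line.partition('=')
head, missing_sep, tail = 'no separator here'.partition('=')

print(key_p, value_p, repr(missing_sep))
```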

STeVe

[1]http://mail.python.org/pipermail/python-dev/2005-August/055781.html

Steve Holden

29.08.2005, 17:28:22

to pytho...@python.org

Antoon Pardon wrote:
> On 2005-08-27, Steve Holden wrote <st...@holdenweb.com>:
>
>>>
>>If you want an exception from your code when 'w' isn't in the string you
>>should consider using index() rather than find.
>
>
> Sometimes it is convenient to have the exception thrown at a later
> time.
>
>
>>Otherwise, whatever find() returns you will have to have an "if" in
>>there to handle the not-found case.
>
>
> And maybe the more convenient place for this "if" is in a whole different
> part of your program, a part where using -1 as an invalid index isn't
> at all obvious.
>
>
>>This just sounds like whining to me. If you want to catch errors, use a
>>function that will raise an exception rather than relying on the
>>invalidity of the result.
>
>
> You always seem to look at such things in a very narrow scope. You never
> seem to consider that various parts of a program have to work together.
>

Or perhaps it's just that I try not to mix parts inappropriately.

> So what happens if you have a module that is collecting string-index
> pairs, collected from various other parts? In one part you
> want to select the last letter, so you pythonically choose -1 as
> index. In another part you get a result of find and are happy
> with -1 as an indication of an invalid index. Then these
> data meet.
>

That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

I suppose I can't deny that people do things like that, myself included,
but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

Terry Reedy

30.08.2005, 01:22:31

to pytho...@python.org

"Steve Holden" <st...@holdenweb.com> wrote in message
news:devuln$bro$1...@sea.gmane.org...

> Antoon Pardon wrote:
>> So what happens if you have a module that is collecting string-index
>> pairs, collected from various other parts? In one part you
>> want to select the last letter, so you pythonically choose -1 as
>> index. In another part you get a result of find and are happy
>> with -1 as an indication of an invalid index. Then these
>> data meet.
>>

> That's when debugging has to start. Mixing data of such types is
> somewhat inadvisable, don't you agree?
>
> I suppose I can't deny that people do things like that, myself included,
> but mixing data sets where -1 is variously an error flag and a valid
> index is only going to lead to trouble when the combined data is used.

The fact that the -1 return *has* led to bugs in actual code is the
primary reason Guido has currently decided that find and rfind should go.
A careful review of current usages in the standard library revealed at
least a couple bugs even there.

Terry J. Reedy

Paul Rubin

30.08.2005, 01:54:04

"Terry Reedy" <tjr...@udel.edu> writes:
> The fact that the -1 return *has* led to bugs in actual code is the
> primary reason Guido has currently decided that find and rfind should go.
> A careful review of current usages in the standard library revealed at
> least a couple bugs even there.

Really it's x[-1]'s behavior that should go, not find/rfind.

Will socket.connect_ex also go? How about dict.get? Being able to
return some reasonable value for "failure" is a good thing, if failure
is expected. Exceptions are for unexpected, i.e., exceptional failures.

Antoon Pardon

30.08.2005, 03:12:05

On 2005-08-29, Steve Holden wrote <st...@holdenweb.com>:

> Antoon Pardon wrote:
>> On 2005-08-27, Steve Holden wrote <st...@holdenweb.com>:
>>
>>>>
>>>If you want an exception from your code when 'w' isn't in the string you
>>>should consider using index() rather than find.
>>
>>
>> Sometimes it is convenient to have the exception thrown at a later
>> time.
>>
>>
>>>Otherwise, whatever find() returns you will have to have an "if" in
>>>there to handle the not-found case.
>>
>>
>> And maybe the more convenient place for this "if" is in a whole different
>> part of your program, a part where using -1 as an invalid index isn't
>> at all obvious.
>>
>>
>>>This just sounds like whining to me. If you want to catch errors, use a
>>>function that will raise an exception rather than relying on the
>>>invalidity of the result.
>>
>>
>> You always seem to look at such things in a very narrow scope. You never
>> seem to consider that various parts of a program have to work together.
>>
> Or perhaps it's just that I try not to mix parts inappropriately.

I didn't know it was inappropriate to mix certain parts. Can you
give a list of modules in the standard library I shouldn't mix?

>> So what happens if you have a module that is collecting string-index
>> pairs, collected from various other parts? In one part you
>> want to select the last letter, so you pythonically choose -1 as
>> index. In another part you get a result of find and are happy
>> with -1 as an indication of an invalid index. Then these
>> data meet.
>>
> That's when debugging has to start. Mixing data of such types is
> somewhat inadvisable, don't you agree?

The type of both data is the same, it is a string-index pair in
both cases. The problem is that a module from the standard lib
uses a certain value to indicate an illegal index, that has
a very legal value in python in general.

> I suppose I can't deny that people do things like that, myself included,

It is not about what people do. If this was about someone implementing
find himself and using -1 as an illegal index, I would certainly agree
that it was inadvisable to do so. Yet when this is what Python with
its library offers the programmer, you seem reluctant to find fault with
it.

> but mixing data sets where -1 is variously an error flag and a valid
> index is only going to lead to trouble when the combined data is used.

Yet this is what Python does: it uses -1 variously as an error flag and
a valid index, and when people complain about that, you say it sounds like
whining.

--
Antoon Pardon

Bryan Olson

30.08.2005, 04:05:58

Steve Holden wrote:
> I'm all in favor of discussions to make 3.0 a better
> language.

This one should definitely be two-phase. First, the
non-code-breaking change that replaces-and-deprecates the warty handling
of negative indexes, and later the removal of the old style. For
the former, there's no need to wait for an X.0 release; for the
latter, 3.0 may be too early.

The draft PEP went to the PEP editors a couple days ago. Haven't
heard back yet.

--
--Bryan

Antoon Pardon

30.08.2005, 04:07:45

On 2005-08-29, Steven Bethard wrote <steven....@gmail.com>:

> Antoon Pardon wrote:
>> I think a properly implemented find is better than an index.
>
> See the current thread in python-dev[1], which proposes a new method,
> str.partition(). I believe that Raymond Hettinger has shown that almost
> all uses of str.find() can be more clearly represented with his
> proposed function.

Do we really need this? As far as I understand, most of this
functionality is already provided by str.split and str.rsplit.

I think adding an optional third parameter 'full=False' to these
methods would be all that is useful here. If full was set
to True, split and rsplit would enforce that a list with
maxsplit + 1 elements was returned, filling up the list with
None's if necessary.

head, sep, tail = str.partition(sep)

would then almost be equivalent to

head, tail = str.split(sep, 1, True)

Code like the following:

head, found, tail = result.partition(' ')
if not found:
    break
result = head + tail

Could be replaced by:

head, tail = result.split(' ', 1, full = True)
if tail is None:
    break
result = head + tail
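
Since split has no 'full' parameter, the proposed behavior can only be approximated with a wrapper. The Python 3 sketch below uses a hypothetical function split_full to stand in for text.split(sep, maxsplit, full=True); it is an illustration of the idea, not an existing API.

```python
def split_full(text, sep, maxsplit):
    """Hypothetical stand-in for the proposed text.split(sep, maxsplit, full=True)."""
    parts = text.split(sep, maxsplit)
    # Pad with None so the caller can always unpack maxsplit + 1 values.
    parts.extend([None] * (maxsplit + 1 - len(parts)))
    return parts


head, tail = split_full('one two three', ' ', 1)
only, nothing = split_full('single', ' ', 1)
print(head, tail, only, nothing)
```

With the padding in place, a tail of None plays the role that an empty 'found' plays in the partition version.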

I also think that code like this:

while tail:
    head, _, tail = tail.partition('.')
    mname = "%s.%s" % (m.__name__, head)
    m = self.import_it(head, mname, m)
    ...

Would probably better be written as follows:

for head in tail.split('.'):
    mname = "%s.%s" % (m.__name__, head)
    m = self.import_it(head, mname, m)
    ...

Unless I'm missing something.

--
Antoon Pardon

[1]http://mail.python.org/pipermail/python-dev/2005-August/055781.html

Terry Reedy

30.08.2005, 04:14:42

to pytho...@python.org

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message

news:7xy86k3...@ruckus.brouhaha.com...

> Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name. But even
if -1 were not a legal subscript, I would still consider it a design error
for Python to mistype a non-numeric singleton indicator as an int. Such
mistyping is only necessary in a language like C that requires all return
values to be of the same type, even when the 'return value' is not really a
return value but an error signal. Python does not have that typing
restriction and should not act as if it does by copying C.

> Will socket.connect_ex also go?

Not familiar with it.

> How about dict.get?

A default value is not necessarily an error indicator. One can regard a
dict that is 'getted' as an infinite dict matching all keys with the
default except for a finite subset of keys, as recorded in the dict.

If the default is to be regarded a 'Nothing to return' indicator, then that
indicator *must not* be in the dict. A recommended idiom is to then create
a new, custom subset of object which *cannot* be a value in the dict.
Return values can then safely be compared with that indicator by using the
'is' operator.
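
A small Python 3 sketch of that idiom follows. The names _MISSING and lookup are illustrative only; the point is that a fresh object() cannot already be stored in the dict, so an identity test is unambiguous even when None or -1 are legitimate values.

```python
_MISSING = object()   # a fresh object that cannot already be a value in the dict


def lookup(d, key):
    # Illustrative helper: distinguish "key absent" from any stored value.
    value = d.get(key, _MISSING)
    if value is _MISSING:
        return 'absent'
    return 'present: %r' % (value,)


d = {'a': None, 'b': -1}
print(lookup(d, 'a'), lookup(d, 'b'), lookup(d, 'c'))
```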

In either case, .get is significantly different from .find.

Terry J. Reedy

Paul Rubin

30.08.2005, 04:34:32

"Terry Reedy" <tjr...@udel.edu> writes:
> > Really it's x[-1]'s behavior that should go, not find/rfind.
>
> I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
> useful, especially when 'x' is an expression instead of a name.

There are other abbreviations possible, for example the one in the
proposed PEP at the beginning of this thread.

> But even
> if -1 were not a legal subscript, I would still consider it a design error
> for Python to mistype a non-numeric singleton indicator as an int.

OK, .find should return None if the string is not found.

Bryan Olson

30.08.2005, 04:53:27

Terry Reedy wrote:

> "Paul Rubin" wrote:
>
>>Really it's x[-1]'s behavior that should go, not find/rfind.
>
> I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is
> extremely useful, especially when 'x' is an expression instead of a name.

Hear us out; your disagreement might not be so complete as you
think. From-the-far-end indexing is too useful a feature to
trash. If you look back several posts, you'll see that the
suggestion here is that the index expression should explicitly
call for it, rather than treat negative integers as a special
case.

I wrote up and sent off my proposal, and once the PEP-Editors
respond, I'll be pitching it on the python-dev list. Below is
the version I sent (not yet a listed PEP).

--
--Bryan

PEP: -1
Title: Improved from-the-end indexing and slicing
Version: $Revision: 1.00 $
Last-Modified: $Date: 2005/08/26 00:00:00 $
Author: Bryan G. Olson <bryan...@acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 26 Aug 2005
Post-History:

Abstract

To index or slice a sequence from the far end, we propose
using a symbol, '$', to stand for the length, instead of
Python's current special-case interpretation of negative
subscripts. Where Python currently uses:

sequence[-i]

We propose:

sequence[$ - i]

Python's treatment of negative indexes as offsets from the
high end of a sequence causes minor obvious problems and
major subtle ones. This PEP proposes a consistent meaning
for indexes, yet still supports from-the-far-end
indexing. Use of new syntax avoids breaking existing code.

Specification

We propose a new style of slicing and indexing for Python
sequences. Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and 'None' work the same as in old-style slicing.

Within the square-brackets, the '$' symbol stands for the
length of the sequence. One can index from the high end by
subtracting the index from '$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; $ - 4]

When square-brackets appear within other square-brackets,
the inner-most bracket-pair determines which sequence '$'
describes. The length of the next-outer sequence is denoted
by '$1', and the next-outer after that by '$2', and so on. The
symbol '$0' behaves identically to '$'. Resolution of $x is
syntactic; a callable object invoked within square brackets
cannot use the symbol to examine the context of the call.

The '$' notation also works in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[$ - 2]

If we did not care about backward compatibility, new-style
slicing would define seq[-2] to be out-of-bounds. Of course
we do care about backward compatibility, and rejecting
negative indexes would break way too much code. For now,
simple indexing with a negative subscript (and no '$') must
continue to index from the high end, as a deprecated
feature. The presence of '$' always indicates new-style
indexing, so a programmer who needs a negative index to
trigger a range error can write:

seq[($ - $) + index]
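
The '$' and ';' syntax is not valid Python, but the intended semantics can be approximated with ordinary functions. What follows is only a minimal sketch in Python 3 syntax: the helper names `strict_index` and `new_style_slice` are hypothetical and not part of the proposal, `len(seq)` is written out wherever the text above would use '$', and the slice helper returns a list for any sequence type, which is a simplification.

```python
def strict_index(seq, i):
    """Index with no from-the-end reinterpretation (hypothetical helper)."""
    if not 0 <= i < len(seq):
        raise IndexError("index %d out of range" % i)
    return seq[i]


def new_style_slice(seq, start=None, stop=None, step=None):
    """Slice in which negative bounds are plain positions (hypothetical helper)."""
    n = len(seq)
    step = 1 if step is None else step
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if step > 0:
        lo, hi = 0, n            # valid bounds when walking forward
        start = lo if start is None else start
        stop = hi if stop is None else stop
    else:
        lo, hi = -1, n - 1       # here -1 really is an exclusive lower bound
        start = hi if start is None else start
        stop = lo if stop is None else stop
    start = max(lo, min(hi, start))   # clamp instead of wrapping around
    stop = max(lo, min(hi, stop))
    return [seq[i] for i in range(start, stop, step)]


seq = list(range(10))
n = len(seq)                                # plays the role of '$'
middle = new_style_slice(seq, 3, n - 4)     # seq[3 ; $ - 4]
second_last = strict_index(seq, n - 2)      # seq[$ - 2]
downward = new_style_slice(seq, 9, -1, -2)  # -1 is an ordinary exclusive bound
```

Under these assumptions a negative index passed to `strict_index` raises IndexError instead of silently selecting an item from the far end.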

Motivation

From-the-far-end indexing is such a useful feature that we
cannot reasonably propose its removal; nevertheless Python's
current method, which is to treat a range of negative
indexes as special cases, is warty. The wart bites novice or
imperfect Pythoners by not raising an exception when they
need to know about a bug. For example, the following code
prints 'y' with no sign of error:

s = 'buggy'
print s[s.find('w')]
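
Why this prints 'y' can be traced step by step. A small sketch in Python 3 syntax (the variable names `pos`, `silent` and `raised` are only for illustration):

```python
s = 'buggy'
pos = s.find('w')        # -1: 'w' does not occur in s
silent = s[pos]          # -1 is reinterpreted as "last item", so no error

try:
    s.index('w')         # index() reports the same failure with an exception
    raised = False
except ValueError:
    raised = True
```

Here `pos` is -1 and `silent` is 'y', while the `str.index` variant ends up in the `except` branch.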

The wart becomes an even bigger problem with more
sophisticated use of Python sequences. What is the 'stop'
value for a slice when the step is negative and the slice
includes the zero index? An instance of Python's slice type
will report that the stop value is -1, but if we use this
stop value to slice, it gets misinterpreted as the last
index in the sequence. Here's an example:

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]

print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python
interprets the negative number as an offset from the high
end of the sequence.

Steven Bethard offered the simpler example:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

The double-meaning of -1, as both an exclusive stopping
bound and an alias for the highest valid index, is just
plain whacked. So what should the slice object return? With
Python's current indexing/slicing, there is no value that
just works. 'None' will work as a stop value in a slice, but
index arithmetic will fail. The value 0 - (len(sequence) +
1) will work as a stop value, and slice arithmetic and
range() will happily use it, but the result is not what the
programmer probably intended.
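
One workaround, offered here as a sketch and not as anything the thread settled on, is to avoid feeding the triple back into slice syntax at all and to walk it with range(), which treats -1 as an ordinary exclusive bound. The class name `SliceSafe` is hypothetical, and the code uses Python 3 syntax:

```python
class SliceSafe:
    """Wrapper that applies slice.indices() through range(), not re-slicing."""

    def __init__(self, items):
        self.sequence = list(items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # range() does no from-the-end reinterpretation of negative bounds
            return [self.sequence[i] for i in range(start, stop, step)]
        return self.sequence[key]


triple = slice(None, None, -2).indices(10)
reslice = list(range(10))[9:-1:-2]      # the failing approach from the text
walked = SliceSafe(range(10))[::-2]     # the range()-based approach
```

With this approach `walked` matches what the built-in list returns for the same slice, while `reslice` is empty.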

The problem is subtle. A Python sequence starts at index
zero. There is some appeal to giving negative indexes a
useful interpretation, on the theory that they were invalid
as subscripts and thus useless otherwise. That theory is
wrong, because negative indexes were already useful, even
though not legal subscripts, and the reinterpretation often
breaks their existing use. Specifically, negative indexes are
useful in index arithmetic, and as exclusive stopping
bounds.

The problem is fixable. We propose that negative indexes not
be treated as a special case. To index from the far end of a
sequence, we use a syntax that explicitly calls for far-end
indexing.

Rationale

New-style slicing/indexing is designed to fix the problems
described above, yet live happily in Python along-side the
old style. The new syntax leaves the meaning of existing
code unchanged, and is even more Pythonic than current
Python.

Semicolons look a lot like colons, so the new semicolon
syntax follows the rule that things that are similar should
look similar. The semicolon syntax is currently illegal, so
its addition will not break existing code. Python is
historically tied to C, and the semicolon syntax is
evocative of the similar start-stop-step expressions of C's
'for' loop. JPython is tied to Java, which uses a similar
'for' loop syntax.

The '$' character currently has no place in a Python index,
so its new interpretation will not break existing code. We
chose it over other unused symbols because the usage roughly
corresponds to its meaning in the Python library's regular
expression module.

We expect use of the $0, $1, $2 ... syntax to be rare;
nevertheless, it has a Pythonic consistency. Thanks to Paul
Rubin for advocating it over the inferior multiple-$ syntax
that this author initially proposed.

Backwards Compatibility

To avoid breaking code, we use new syntax that is currently
illegal. The new syntax more-or-less looks like current
Python, which may help Python programmers adjust.

User-defined classes that implement the sequence protocol
are likely to work, unchanged, with new-style slicing.
'Likely' is not certain; we've found one subtle issue (and
there may be others):

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Copyright:

This document has been placed in the public domain.

Paul Rubin

30.08.2005, 05:10:24

Bryan Olson <fakea...@nowhere.org> writes:
> Specifically, to support new-style slicing, a class that
> accepts index or slice arguments to any of:
>
> __getitem__
> __setitem__
> __delitem__
> __getslice__
> __setslice__
> __delslice__
>
> must also consistently implement:
>
> __len__
>
> Sane programmers already follow this rule.

It should be ok to use new-style slicing without implementing __len__
as long as you don't use $ in any slices. Using $ in a slice without
__len__ would throw a runtime error. I expect using negative
subscripts in old-style slices on objects with no __len__ also throws
an error.

Not every sequence needs __len__; for example, infinite sequences, or
sequences that implement slicing and subscripts by doing lazy
evaluation of iterators:

digits_of_pi = memoize(generate_pi_digits()) # 3,1,4,1,5,9,2,...
print digits_of_pi[5] # computes 6 digits and prints '9'
print digits_of_pi[$-5] # raises exception

Antoon Pardon

30.08.2005, 06:07:06

Op 2005-08-30, Terry Reedy schreef <tjr...@udel.edu>:

>
> "Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
> news:7xy86k3...@ruckus.brouhaha.com...
>
>> Really it's x[-1]'s behavior that should go, not find/rfind.
>
> I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
> useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it useful to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

--
Antoon Pardon

Robert Kern

30.08.2005, 06:58:20

to pytho...@python.org

Bryan Olson wrote:

> Currently, user-defined classes can implement Python
> subscripting and slicing without implementing Python's len()
> function. In our proposal, the '$' symbol stands for the
> sequence's length, so classes must be able to report their
> length in order for $ to work within their slices and
> indexes.
>
> Specifically, to support new-style slicing, a class that
> accepts index or slice arguments to any of:
>
> __getitem__
> __setitem__
> __delitem__
> __getslice__
> __setslice__
> __delslice__
>
> must also consistently implement:
>
> __len__
>
> Sane programmers already follow this rule.

Incorrect. Some sane programmers have multiple dimensions they need to
index.

from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.

Antoon Pardon

30.08.2005, 07:18:00

Op 2005-08-30, Robert Kern schreef <rk...@ucsd.edu>:

> Bryan Olson wrote:
>
>> Currently, user-defined classes can implement Python
>> subscripting and slicing without implementing Python's len()
>> function. In our proposal, the '$' symbol stands for the
>> sequence's length, so classes must be able to report their
>> length in order for $ to work within their slices and
>> indexes.
>>
>> Specifically, to support new-style slicing, a class that
>> accepts index or slice arguments to any of:
>>
>> __getitem__
>> __setitem__
>> __delitem__
>> __getslice__
>> __setslice__
>> __delslice__
>>
>> must also consistently implement:
>>
>> __len__
>>
>> Sane programmers already follow this rule.
>
> Incorrect. Some sane programmers have multiple dimensions they need to
> index.

I don't see how that contradicts Bryan's statement.

> from Numeric import *
> A = array([[0, 1], [2, 3], [4, 5]])
> A[$-1, $-1]
>
> The result of len(A) has nothing to do with the second $.

But that is irrelevant to whether or not sane
programmers follow Bryan's stated rule. That the second
$ has nothing to do with len(A) doesn't contradict that
__len__ has to be implemented, nor that sane programmers
already do so.

--
Antoon Pardon

Bryan Olson

30.08.2005, 07:56:24

I think you have a good observation there, but I'll stand by my
correctness.

My initial post considered re-interpreting tuple arguments, but
I abandoned that alternative after Steven Bethard pointed out
how much code it would break. Modules/classes would remain free
to interpret tuple arguments in any way they wish. I don't think
my proposal breaks any sane existing code.

Going forward, I would advocate that user classes which
implement their own kind of subscripting adopt the '$' syntax,
and interpret it as consistently as possible. For example, they
could respond to __len__() by returning a type that supports the
"Emulating numeric types" methods from the Python Language
Reference 3.3.7, and also allows the class's methods to tell
that it stands for the length of the dimension in question.
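
How a per-dimension '$' might be resolved is left open above. One possibility, sketched purely as an assumption (it is not the `__len__`-based mechanism just described), is a placeholder object that records an offset and is resolved by the container against whichever axis it appears in. The names `FromEnd`, `END` and `Grid` are hypothetical, and the code uses Python 3 syntax:

```python
class FromEnd:
    """Hypothetical stand-in for '$': the length of one axis plus an offset."""

    def __init__(self, offset=0):
        self.offset = offset

    def __add__(self, k):
        return FromEnd(self.offset + k)

    def __sub__(self, k):
        return FromEnd(self.offset - k)

    def resolve(self, length):
        return length + self.offset


END = FromEnd()


class Grid:
    """Tiny 2-D container that resolves END separately for each axis."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, key):
        r, c = key
        if isinstance(r, FromEnd):
            r = r.resolve(len(self.rows))        # number of rows
        if isinstance(c, FromEnd):
            c = c.resolve(len(self.rows[r]))     # length of the chosen row
        return self.rows[r][c]


A = Grid([[0, 1], [2, 3], [4, 5]])
corner = A[END - 1, END - 1]    # stands in for A[$-1, $-1]
```

In this sketch the container, not len(), decides what each placeholder means, which is one way to address the multi-dimensional case.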

--
--Bryan

Robert Kern

30.08.2005, 07:54:38

to pytho...@python.org

Except that the *consistent* implementation is supposed to support the
interpretation of $. It clearly can't for multiple dimensions.

Robert Kern

30.08.2005, 08:24:55

to pytho...@python.org

Bryan Olson wrote:
> Robert Kern wrote:

> > from Numeric import *
> > A = array([[0, 1], [2, 3], [4, 5]])
> > A[$-1, $-1]
> >
> > The result of len(A) has nothing to do with the second $.
>
> I think you have a good observation there, but I'll stand by my
> correctness.

len() cannot be used to determine the value of $ in the context of
multiple dimensions.

> My initial post considered re-interpreting tuple arguments, but
> I abandoned that alternative after Steven Bethard pointed out
> how much code it would break. Modules/classes would remain free
> to interpret tuple arguments in any way they wish. I don't think
> my proposal breaks any sane existing code.

What it does do is provide a second way to do indexing from the end that
can't be extended to multiple dimensions.

> Going forward, I would advocate that user classes which
> implement their own kind of subscripting adopt the '$' syntax,
> and interpret it as consistently as possible.

How? You haven't proposed how an object gets the information that
$-syntax is being used. You've proposed a syntax and some semantics; you
also need to flesh out the pragmatics.

> For example, they
> could respond to __len__() by returning a type that supports the
> "Emulating numeric types" methods from the Python Language
> Reference 3.3.7, and also allows the class's methods to tell
> that it stands for the length of the dimension in question.

I have serious doubts about __len__() returning anything but a bona-fide
integer. We shouldn't need to use incredible hacks like that to support
a core language feature.

phil hunt

30.08.2005, 12:33:48

On Tue, 30 Aug 2005 08:53:27 GMT, Bryan Olson <fakea...@nowhere.org> wrote:
> Specifically, to support new-style slicing, a class that
> accepts index or slice arguments to any of:
>
> __getitem__
> __setitem__
> __delitem__
> __getslice__
> __setslice__
> __delslice__
>
> must also consistently implement:
>
> __len__
>
> Sane programmers already follow this rule.

Wouldn't it be more sensible to have an abstract IndexedCollection
superclass, which implements all the slicing stuff, then when someone
writes their own collection class they just have to implement
__len__ and __getitem__ and slicing works automatically?
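
A rough sketch of what such a base class could look like, in Python 3 syntax. The names `IndexedCollection`, `get_item` and `Squares` are hypothetical; the per-item hook is called `get_item` rather than `__getitem__` so that the base class can own `__getitem__` and do the slice handling itself.

```python
class IndexedCollection:
    """Hypothetical base class: subclasses supply __len__ and get_item."""

    def __len__(self):
        raise NotImplementedError

    def get_item(self, i):
        raise NotImplementedError

    def __getitem__(self, key):
        n = len(self)
        if isinstance(key, slice):
            # range() walks the normalized triple without re-slicing
            return [self.get_item(i) for i in range(*key.indices(n))]
        if key < 0:
            key += n                 # keep today's from-the-end behaviour
        if not 0 <= key < n:
            raise IndexError("index out of range")
        return self.get_item(key)


class Squares(IndexedCollection):
    """Example subclass: the squares of 0 .. n-1, computed on demand."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def get_item(self, i):
        return i * i


sq = Squares(6)
every_other_backwards = sq[::-2]
last = sq[-1]
```

The subclass writes no slicing code at all; every slice form, including negative steps, is handled once in the base class.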

--
Email: zen19725 at zen dot co dot uk

Steve Holden

30.08.2005, 13:30:36

to pytho...@python.org

Since you are clearly feeling pedantic enough to beat this one to death
with a 2 x 4 please let me substitute "usages" for "types".

In the case of a find() result -1 *isn't* a string index, it's a failure
flag. Which is precisely why it should be filtered out of any set of
indexes. Once it's been inserted it can no longer be distinguished as a
failure indication.

>
>>I suppose I can't deny that people do things like that, myself included,
>
>
> It is not about what people do. If this was about someone implementing
> find himself and using -1 as an illegal index, I would certainly agree
> that it was inadvisable to do so. Yet when this is what python with
> its library offers the programmer, you seem reluctant to find fault with
> it.
>

I've already admitted that the choice of -1 as a return value wasn't
smart. However you appear to be saying that it's sensible to mix return
values from find() with general-case index values. I'm saying that you
should do so only with caution. The fact that the naive user will often
not have the wisdom to apply such caution is what makes a change desirable.

>
>>but mixing data sets where -1 is variously an error flag and a valid
>>index is only going to lead to trouble when the combined data is used.
>
>
> Yet this is what python does. Using -1 variously as an error flag and
> a valid index and when people complain about that, you say it sounds like
> whining.
>

What I am trying to say is that this doesn't make sense: if you want to
combine find() results with general-case indexes (i.e. both positive and
negative index values) it behooves you to strip out the -1's before you
do so. Any other behaviour is asking for trouble.

Bengt Richter

30.08.2005, 15:03:01

On Tue, 30 Aug 2005 08:53:27 GMT, Bryan Olson <fakea...@nowhere.org> wrote:

[...]

>Specification
>
> We propose a new style of slicing and indexing for Python
> sequences. Instead of:
>
> sequence[start : stop : step]
>
> new-style slicing uses the syntax:
>
> sequence[start ; stop ; step]
>

I don't mind the semantics, but I don't like the semicolons ;-)

What about if when brackets trail as if attributes, it means
your-style slicing written with colons instead of semicolons?

sequence.[start : stop : step]

I think that would just be a tweak on the trailer syntax.
I just really dislike the semicolons ;-)

Regards,
Bengt Richter

Paul Rubin

30.08.2005, 15:04:33

bo...@oz.net (Bengt Richter) writes:
> What about if when brackets trail as if attributes, it means
> your-style slicing written with colons instead of semicolons?
>
> sequence.[start : stop : step]

This is nice. It gets rid of the whole $1,$2,etc syntax as well.

Bengt Richter

30.08.2005, 15:17:41

(OTTOMH ;-)
Perhaps the slice triple could be extended with a flag indicating
which of the other elements should have $ added to it, and $ would
take meaning from the subarray being indexed, not the whole. E.g.,

arr.[1:$-1, $-5:$-2]

would call arr.__getitem__((slice(1,-1,None,STOP), slice(-5,-2,None,START|STOP)))

(Hypothesizing bitmask constants START and STOP)

Regards,
Bengt Richter

Bengt Richter

30.08.2005, 15:23:14

Give it a handy property? E.g.,

table.as_python_list[-1]

Regards,
Bengt Richter

Antoon Pardon

31.08.2005, 03:13:12

Op 2005-08-30, Steve Holden schreef <st...@holdenweb.com>:

But it's not my usage but python's usage.

> In the case of a find() result -1 *isn't* a string index, it's a failure
> flag. Which is precisely why it should be filtered out of any set of
> indexes. Once it's been inserted it can no longer be distinguished as a
> failure indication.

Which is precisely why it was such a bad choice in the first place.

If I need to write code like this:

var = str.find('.')
if var == -1:
    var = None

each time I want to store an index for later use, then surely '-1'
shouldn't have been used here.
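
The three lines above can be folded into a helper so the conversion is written only once. This is just a sketch in Python 3 syntax; the name `find_or_none` is hypothetical and nothing like it exists in the standard library:

```python
def find_or_none(text, sub):
    """Like str.find, but reports 'not found' as None instead of -1."""
    pos = text.find(sub)
    return None if pos == -1 else pos


hit = find_or_none('a.b', '.')
miss = find_or_none('ab', '.')

try:
    'ab'[miss]               # None is not a valid subscript
    failed_loudly = False
except TypeError:
    failed_loudly = True
```

Because None is not accepted as an index, a stored "not found" result fails with TypeError when it is later used as a subscript, rather than selecting the last character.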

>>>I suppose I can't deny that people do things like that, myself included,
>>
>>
>> It is not about what people do. If this was about someone implementing
>> find himself and using -1 as an illegal index, I would certainly agree
>> that it was inadvisable to do so. Yet when this is what python with
>> its library offers the programmer, you seem reluctant to find fault with
>> it.

> I've already admitted that the choice of -1 as a return value wasn't
> smart. However you appear to be saying that it's sensible to mix return
> values from find() with general-case index values.

I'm saying it should be possible without a problem. It is poor design
to return a legal value as an indication for an error flag.

> I'm saying that you
> should do so only with caution. The fact that the naive user will often
> not have the wisdom to apply such caution is what makes a change desirable.

I don't think it is naive, if you expect that no legal value will be
returned as an error flag.

>>>but mixing data sets where -1 is variously an error flag and a valid
>>>index is only going to lead to trouble when the combined data is used.
>>
>>
>> Yet this is what python does. Using -1 variously as an error flag and
>> a valid index and when people complain about that, you say it sounds like
>> whining.
>>
> What I am trying to say is that this doesn't make sense: if you want to
> combine find() results with general-case indexes (i.e. both positive and
> negative index values) it behooves you to strip out the -1's before you
> do so. Any other behaviour is asking for trouble.

I would say that choosing this particular return value as an error flag
was asking for trouble. My impression is that you are putting more
blame on the programmer who fails to take corrective action, instead
of on the design of find, which makes that corrective action needed
in the first place.

--
Antoon Pardon

Antoon Pardon

31.08.2005, 03:26:48

Op 2005-08-30, Bengt Richter schreef <bo...@oz.net>:

You're missing the point, I probably didn't make it clear.

It is not about the possibility of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.

--
Antoon Pardon

Bengt Richter

31.08.2005, 03:54:30

How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)

Regards,
Bengt Richter

Antoon Pardon

31.08.2005, 04:01:16

Op 2005-08-31, Bengt Richter schreef <bo...@oz.net>:

But the question was not about having a consistent interpretation for
-1, but about an easy way to get the last value.

But I like your idea. I just think there should be two different ways
to index. Maybe use braces in one case.

seq{i} would be pure indexing, that throws exceptions if you
are out of bound

seq[i] would then be seq{i%len(seq)}
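
The cyclic half of this idea can be tried out today with a plain function. A minimal sketch in Python 3 syntax, with the hypothetical name `cyclic_get` standing in for the proposed meaning of seq[i]:

```python
def cyclic_get(seq, i):
    """seq[i % len(seq)]: every integer index is valid on a non-empty sequence."""
    return seq[i % len(seq)]


letters = 'abcde'
last = cyclic_get(letters, -1)      # -1 % 5 == 4
wrapped = cyclic_get(letters, 7)    # 7 % 5 == 2
```

An empty sequence is the one case that still fails, since the modulo by zero raises ZeroDivisionError.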

--
Antoon Pardon

Bryan Olson

31.08.2005, 05:55:59

Paul Rubin wrote:
> Not every sequence needs __len__; for example, infinite sequences, or
> sequences that implement slicing and subscripts by doing lazy
> evaluation of iterators:
>
> digits_of_pi = memoize(generate_pi_digits()) # 3,1,4,1,5,9,2,...
> print digits_of_pi[5] # computes 6 digits and prints '9'
> print digits_of_pi[$-5] # raises exception

Good point. I like the memoize thing, so here is one:

class memoize (object):
    """ Build a sequence from an iterable, evaluating as needed.
    """

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.known = []

    def extend_(self, stop):
        # Pull items from the iterator until 'stop' of them are cached.
        while len(self.known) < stop:
            self.known.append(self.it.next())

    def __getitem__(self, key):
        if isinstance(key, (int, long)):
            self.extend_(key + 1)
            return self.known[key]
        elif isinstance(key, slice):
            # Assumes explicit, non-negative start and stop and a
            # positive step; None is not handled.
            start, stop, step = key.start, key.stop, key.step
            stop = start + 1 + (stop - start - 1) // step * step
            self.extend_(stop)
            return self.known[start : stop : step]
        else:
            raise TypeError("Bad subscript type")

--
--Bryan

Kay Schluehr

31.08.2005, 10:13:09

Bengt Richter wrote:

> How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
> That would give a consistent interpretation of seq[-1] and no errors
> for any value ;-)

Cool, indexing becomes cyclic by default ;)

But maybe it's better to define it explicitly:

seq[!i] = seq[i%len(seq)]

Well, I don't like the latter definition very much because it
introduces special syntax for __getitem__. A better solution may be the
introduction of new syntax and arithmetics for positive and negative
infinite values. Sequencing has to be adapted to handle them.

The semantics follow from taking limits of divergent sequences:

!0 = lim n
     n->infinity

That enables consistent arithmetic:

!0+k = lim (n+k)  ->  !0
       n->infinity

!0/k = lim (n/k)  ->   !0 for k>0,
       n->infinity    -!0 for k<0,
                      ZeroDivisionError for k==0

etc.

In Python notation:

>>> !0
!0
>>> !0+1
!0
>>> !0>n # if n is int
True
>>> !0/!0
Traceback (...)
...
UndefinedValue
>>> !0 - !0
Traceback (...)
...
UndefinedValue
>>> -!0
-!0
>>> range(9)[4:!0] == range(9)[4:]
True
>>> range(9)[4:-!0:-1] == range(5)
True

Life can be simpler with unbound limits.

Kay

Kay Schluehr

31.08.2005, 10:13:26

Bengt Richter wrote:

> How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
> That would give a consistent interpretation of seq[-1] and no errors
> for any value ;-)

Cool, indexing becomes cyclic by default ;)

Ron Adam

31.08.2005, 10:16:28

Antoon Pardon wrote:

The problem with negative indexes is that positive indexes are zero
based, but negative indexes are 1 based. Which leads to a non
symmetrical situation.

Note that you can insert an item before the first item using slices. But
not after the last item without using len(list) or some value larger
than len(list).

>>> a = list('abcde')
>>> a[len(a):len(a)] = ['end']
>>> a
['a', 'b', 'c', 'd', 'e', 'end']

>>> a[-1:-1] = ['last']
>>> a
['a', 'b', 'c', 'd', 'e', 'last', 'end'] # Second to last.

>>> a[100:100] = ['final']
>>> a
['a', 'b', 'c', 'd', 'e', 'last', 'end', 'final']

Cheers,
Ron

Bengt Richter

31.08.2005, 11:13:50

Interesting, but wouldn't that last line be
>>> range(9)[4:-!0:-1] == range(5)[::-1]

>Life can be simpler with unbound limits.

Hm, is "!0" a di-graph symbol for infinity?
What if we get full unicode on our screens? Should
it be rendered with unichr(0x221e) ? And how should
symbols be keyed in? Is there a standard mnemonic
way of using an ascii keyboard, something like typing
Japanese hiragana in some word processing programs?

I'm not sure about '!' since it already has some semantic
ties to negation and factorial and execution (not to mention
exclamation ;-) If !0 means infinity, what does !2 mean?

Just rambling ... ;-)

Regards,
Bengt Richter

Kay Schluehr

31.08.2005, 14:25:51

Bengt Richter wrote:

> >>>> range(9)[4:-!0:-1] == range(5)
> >True
> Interesting, but wouldn't that last line be
> >>> range(9)[4:-!0:-1] == range(5)[::-1]

Oops. Yes of course.

> >Life can be simpler with unbound limits.
> Hm, is "!0" a di-graph symbol for infinity?
> What if we get full unicode on our screens? Should
> it be rendered with unichr(0x221e) ? And how should
> symbols be keyed in? Is there a standard mnemonic
> way of using an ascii keyboard, something like typing
> Japanese hiragana in some word processing programs?

You can ask questions ;-)

> I'm not sure about '!' since it already has some semantic
> ties to negation and factorial and execution (not to mention
> exclamation ;-) If !0 means infinity, what does !2 mean?
>
> Just rambling ... ;-)

I'm not sure either. Probably Inf as a keyword is a much better choice.
The only std-library module I found that used Inf was Decimal where Inf
has the same meaning. Inf is quick to write ( just one more character
than !0 ) and easy to parse for human readers. Rewriting the above
statements/expressions leads to:

>>> Inf
Inf
>>> Inf+1
Inf
>>> Inf>n # if n is int
True
>>> Inf/Inf
Traceback (...)
...
UndefinedValue
>>> Inf - Inf
Traceback (...)
...
UndefinedValue
>>> -Inf
-Inf
>>> range(9)[4:Inf] == range(9)[4:]
True
>>> range(9)[4:-Inf:-1] == range(5)[::-1]
True

IMO it's still concise.
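
Some of this can be approximated with the floating-point infinity that Python already has. The following is a sketch only, in Python 3 syntax: `Inf` is bound to `math.inf` as a stand-in for the proposed keyword, the helper name `inf_slice` is hypothetical, negative bounds are treated as plain positions and do not wrap, and the result is always a list.

```python
import math

Inf = math.inf   # stand-in for the proposed keyword


def inf_slice(seq, start, stop, step=1):
    """Slice with possibly infinite bounds; negative bounds do not wrap."""
    if step == 0:
        raise ValueError("slice step cannot be zero")
    n = len(seq)
    lo, hi = (0, n) if step > 0 else (-1, n - 1)

    def clamp(bound):
        # +Inf clamps to hi and -Inf clamps to lo, both of which are ints
        return max(lo, min(hi, bound))

    return [seq[i] for i in range(clamp(start), clamp(stop), step)]


digits = list(range(9))
to_the_end = inf_slice(digits, 4, Inf)          # range(9)[4:Inf]
to_the_start = inf_slice(digits, 4, -Inf, -1)   # range(9)[4:-Inf:-1]
```

One difference from the session above: with floats, `Inf - Inf` and `Inf / Inf` evaluate to nan instead of raising an exception.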

Kay

Stefan Rank

01.09.2005, 07:29:00

to pytho...@python.org

> [snipped a lot from others about indexing, slicing problems,
> and the inadequacy of -1 as Not Found indicator]

on 31.08.2005 16:16 Ron Adam said the following:

> The problem with negative indexes is that positive indexes are zero
> based, but negative indexes are 1 based. Which leads to a non
> symmetrical situation.

Hear, hear.

This is, for me, the root of the problem.

But changing the whole of Python to the (more natural and consistent)
one-based indexing style, for indexing from left and right, is...
difficult.

Fredrik Lundh

01.09.2005, 08:37:05

to pytho...@python.org

Ron Adam wrote:

> The problem with negative indexes is that positive indexes are zero
> based, but negative indexes are 1 based. Which leads to a non
> symmetrical situation.

indices point to the "gap" between items, not to the items themselves.

positive indices start from the left end, negative indices from the right end.

straight indexing returns the item just to the right of the given gap (this is
what gives you the perceived asymmetry), slices return all items between
the given gaps.

</F>

Terry Reedy

01.09.2005, 11:36:42

to pytho...@python.org

"Fredrik Lundh" <fre...@pythonware.com> wrote in message
news:df6slb$4n8$1...@sea.gmane.org...
> [slice] indices point to the "gap" between items, not to the items
> themselves.
>
> positive indices start from the left end, negative indices from the
> right end.
>
> straight indexing returns the item just to the right of the given gap
> (this is what gives you the perceived asymmetry), slices return all
> items between the given gaps.

Well said. In some languages, straight indexing returns the item to the
left instead. The difference between items and gaps is seen in old
terminals and older screens versus modern gui screens. Then, a cursor sat
on top of or under a character space. Now, a cursor sits between chars.

Terry J. Reedy

Terry Reedy

01.09.2005, 12:13:43

to pytho...@python.org

"Stefan Rank" <stefa...@ofai.at> wrote in message
news:4316E5FC...@ofai.at...

> on 31.08.2005 16:16 Ron Adam said the following:

>> The problem with negative indexes is that positive indexes are zero
>> based, but negative indexes are 1 based. Which leads to a non
>> symmetrical situation.
>

> Hear, hear.
>
> This is, for me, the root of the problem.

The root of the problem is the misunderstanding of slice indexes and the
symmetry-breaking desire to denote an interval of length 1 by 1 number
instead of 2. Someday, I may look at the tutorial to see if I can suggest
improvements. In the meanwhile, see Fredrik's reply and my supplement
thereto and the additional explanation below.

> But changing the whole of Python to the (more natural and consistent)
> one-based indexing style, for indexing from left and right, is...
> difficult.

Consider a mathematical axis

|_|_|_|_|...
0 1 2 3 4

The numbers represent points separating unit intervals and representing the
total count of intervals from the left. Count 'up' to the right is
standard practice. Intervals of length n are denoted by 2 numbers, a:b,
where b-a = n.

Now consider the 'tick marks' to be gui cursor positions. Characters go in
the spaces *between* the cursor positions. (Fixed versus variable space
representations are irrelevant here.) More generally, one can put 'items'
or 'item indicators' in the spaces to form general sequences rather than
just char strings.

It seems convenient to indicate a single char or item with a single number
instead of two. We could use the average coordinate, n.5. But that is a
nuisance, and the one number representation is about convenience, so we
round down or up, depending on the language. Each choice has pluses and
minuses; Python rounds down.

The axis above and Python iterables are potentially unbounded. But actual
strings and sequences are finite and have a right end also. Python then
gives the option of counting 'down' from that right end and makes the count
negative, as is standard. (But it does not make the string/sequence
circular).

One can devise slightly different sequence models, but the above is the one
used by Python. It is consistent and not buggy once understood. I hope
this clears some of the confusion seen in this thread.
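
The gap model can be checked directly against the interpreter. A short sketch in Python 3 syntax; the variable names are only for illustration:

```python
a = 'abcdefg'
n = len(a)

# Gap k sits just left of item k; gap n is the one after the last item.
# A negative gap number -k names the same gap as n - k.
same_gap = all(a[-k:] == a[n - k:] for k in range(1, n + 1))

# One-number indexing returns the item just right of the gap, i.e. the
# unit interval i:i+1. (For strings a one-item slice equals the item.)
rounds_down = all(a[i:i + 1] == a[i] for i in range(n))

# -0 is just 0, so the final gap has no negative spelling; it is gap n.
whole = a[-0:]
tail = a[n:]
```

Both checks hold for this string, `whole` is the entire string, and `tail` is empty.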

Terry J. Reedy

Ron Adam

01.09.2005, 20:23:14

Fredrik Lundh wrote:
> Ron Adam wrote:
>
>
>>The problem with negative indexes is that positive indexes are zero
>>based, but negative indexes are 1 based. Which leads to a non
>>symmetrical situation.
>
>
> indices point to the "gap" between items, not to the items themselves.

So how do I express a -0? Which should point to the gap after the last
item.

> straight indexing returns the item just to the right of the given gap (this is
> what gives you the perceived asymmetry), slices return all items between
> the given gaps.

If this were symmetrical, then positive indexes would return the value
to the right and negative indexes would return the value to the left.

Have you looked at negative steps? They also are not symmetrical.

All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end
print a[3:4] # d
print a[-4:-3] # d
print a[-4:4] # d
print a[3:-3] # d
print a[3:2:-1] # d These are symmetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

This is why it confuses so many people. It's a shame too, because slice
objects could be so much more useful for indirectly accessing list
ranges. But I think this needs to be fixed first.

Cheers,
Ron

Terry Reedy

01.09.2005, 22:33:17

to pytho...@python.org

"Ron Adam" <r...@ronadam.com> wrote in message
news:SXMRe.172$xl6...@tornado.tampabay.rr.com...

> Fredrik Lundh wrote:
>> Ron Adam wrote:
>>>The problem with negative indexes is that positive indexes are zero
>>>based, but negative indexes are 1 based. Which leads to a non
>>>symmetrical situation.
>>
>> indices point to the "gap" between items, not to the items themselves.
>
> So how do I express a -0?

You just did ;-) but I probably do not know what you mean.

> Which should point to the gap after the last item.

The slice index of the gap after the last item is len(seq).

>> straight indexing returns the item just to the right of the given gap
>> (this is what gives you the perceived asymmetry), slices return all
>> items between the given gaps.
>
> If this were symmetrical, then positive indexes would return the value
> to the right and negative indexes would return the value to the left.

As I posted before (but perhaps it arrived after you sent this), one number
indexing rounds down, introducing a slight asymmetry.

> Have you looked at negative steps? They also are not symmetrical.

???

> All of the following get the center 'd' from the string.
>
> a = 'abcdefg'
> print a[3] # d 4 gaps from beginning
> print a[-4] # d 5 gaps from end

It is 3 and 4 gaps *from* the left and right end to the left side of the
'd'. You can also see the asymmetry as coming from rounding 3.5 and -3.5
down to 3 and down to -4.
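
The rounding view can be checked directly. This is only a sketch in Python 3; the values 3.5 and -3.5 are the "centre" coordinates of the 'd' taken from the paragraph above, and the variable names are invented for the example:

```python
import math

a = "abcdefg"

# 'd' sits between gaps 3 and 4, so its centre is 3.5 counted from the left,
from_left = math.floor(3.5)      # rounds down to 3
# and between gaps -4 and -3, so its centre is -3.5 counted from the right.
from_right = math.floor(-3.5)    # rounds down to -4

print(a[from_left], a[from_right])  # d d
```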

> print a[3:4] # d
> print a[-4:-3] # d

These are symmetric, as we claimed.

> print a[-4:4] # d

Here you count down past and up past the d.

> print a[3:-3] # d

Here you count up to and down to the d. The count is one more when you
cross the d than when you do not. You do different actions, you get
different counts. I would not recommend mixing up and down counting to a
beginner, and not down and up counting to anyone who did not absolutely
have to.

> print a[3:2:-1] # d These are symmetric?!
> print a[-4:-5:-1] # d
> print a[3:-5:-1] # d
> print a[-4:2:-1] # d

The pattern seems to be: left-gap-index : farther-to-left-index : -1 is
somehow equivalent to left:right, but I never paid much attention to
strides and don't know the full rule.

Stride slices are really a different subject from two-gap slicing. They
were introduced in the early years of Python specifically and only for
Numerical Python. The rules were those needed specifically for Numerical
Python arrays. They were made valid for general sequence use only a few
years ago. I would say that they are only for careful mid-level to expert
use by those who actually need them for their code.

Terry J. Reedy

Paul Rubin

01.09.2005, 23:02:58

Ron Adam <r...@ronadam.com> writes:
> All of the following get the center 'd' from the string.
>
> a = 'abcdefg'
> print a[3] # d 4 gaps from beginning
> print a[-4] # d 5 gaps from end
> print a[3:4] # d
> print a[-4:-3] # d
> print a[-4:4] # d
> print a[3:-3] # d
> print a[3:2:-1] # d These are symmetric?!
> print a[-4:-5:-1] # d
> print a[3:-5:-1] # d
> print a[-4:2:-1] # d

+1 QOTW

Fredrik Lundh

02.09.2005, 03:59:09

to pytho...@python.org

Ron Adam wrote:

>> indices point to the "gap" between items, not to the items themselves.
>
> So how do I express a -0? Which should point to the gap after the last
> item.

that item doesn't exist when you're doing plain indexing, so being able
to express -0 would be pointless.

when you're doing slicing, you express it by leaving the value out, or by
using len(seq) or (in recent versions) None.
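
a small sketch of the spellings just mentioned (Python 3 syntax, variable names invented for the example). it also shows why -0 cannot work: it is the same integer as 0, so it names the leftmost gap.

```python
a = "abcdefg"

# Three ways to name the gap after the last item in a slice.
tail_omitted = a[5:]
tail_len = a[5:len(a)]
tail_none = a[5:None]

# -0 == 0, so this slice runs from gap 5 "up to" gap 0 and is empty.
tail_minus_zero = a[5:-0]

# Plain indexing has no item to the right of the last gap.
try:
    a[len(a)]
except IndexError:
    past_end_is_error = True
else:
    past_end_is_error = False

print(tail_omitted, tail_len, tail_none, repr(tail_minus_zero))  # fg fg fg ''
```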

>> straight indexing returns the item just to the right of the given gap (this is
>> what gives you the perceived asymmetry), slices return all items between
>> the given gaps.
>
> If this were symmetrical, then positive indexes would return the value
> to the right and negative indexes would return the value to the left.

the gap addressing is symmetrical, but indexing always picks the item to
the right.

> Have you looked at negative steps? They also are not symmetrical.

> print a[3:2:-1] # d These are symmetric?!

the gap addressing works as before, but to understand exactly what characters
you'll get, you have to realize that the slice is really a gap index generator. when
you use step=1, you can view slice as a "cut here and cut there, and return what's
in between". for other step sizes, you have to think in gap indexes (for which the
plain indexing rules apply).

and if you know range(), you already know how the indexes are generated for
various step sizes.

from the range documentation:

... returns a list of plain integers [start, start + step, start + 2 * step, ...].
If step is positive, the last element is the largest start + i * step less than
stop; if step is negative, the last element is the smallest start + i * step
greater than stop.

or, in sequence terms (see http://docs.python.org/lib/typesseq.html )

(3) If i or j is negative, the index is relative to the end of the string: len(s) + i
or len(s) + j is substituted.

...

(5) The slice of s from i to j with step k is defined as the sequence of items
with index x = i + n*k for n in the range(0,(j-i)/k). In other words, the
indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached
(but never including j).

so in this case, you get

>>> 3 + 0*-1
3
>>> 3 + 1*-1
2 # which is your stop condition

so a[3:2:-1] is the same as a[3].
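
the "index generator" view can be sketched like this (Python 3, where range is lazy and has to be wrapped in list(); slice_indexes is a made-up helper name, not anything from the library):

```python
def slice_indexes(start, stop, step, length):
    """Hypothetical helper: the item indexes seq[start:stop:step] visits."""
    return list(range(*slice(start, stop, step).indices(length)))

a = "abcdefg"

idx = slice_indexes(3, 2, -1, len(a))      # only index 3 is generated
picked = "".join(a[i] for i in idx)        # plain indexing on each index

# With a negative step and no bounds, indices() reports a stop of -1.
# That value is a bound for range(), which handles it as expected.
evens_down = slice_indexes(None, None, -2, 10)

print(idx, picked, evens_down)  # [3] d [9, 7, 5, 3, 1]
```

the normalized stop of -1 in the last call is the value discussed at the start of this thread: it works as a range() bound, but not when fed back into a slice.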

> print a[-4:-5:-1] # d

same as a[-4]

> print a[3:-5:-1] # d

now you're mixing addressing modes, which is a great way to confuse
yourself. if you normalize the gap indexes (rule 3 above), you'll get
a[3:2:-1] which is the same as your earlier example. you can use the
"indices" method to let Python do this for you:

>>> slice(3,-5,-1).indices(len(a))
(3, 2, -1)
>>> range(*slice(3,-5,-1).indices(len(a)))
[3]

> print a[-4:2:-1] # d

same problem here; indices will tell you what that really means:

>>> slice(-4,2,-1).indices(len(a))
(3, 2, -1)
>>> range(*slice(-4,2,-1).indices(len(a)))
[3]

same example again, in other words. and same result.
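
to see all four negative-step spellings at once, something like the following should do (Python 3 sketch; the list name spellings is just for illustration):

```python
a = "abcdefg"

# The four negative-step spellings from the quoted post.
spellings = [(3, 2, -1), (-4, -5, -1), (3, -5, -1), (-4, 2, -1)]

# Normalize each one against len(a), as rule (3) describes.
normalized = [slice(*s).indices(len(a)) for s in spellings]

# Apply each slice to the string.
results = [a[slice(*s)] for s in spellings]

print(normalized)  # [(3, 2, -1), (3, 2, -1), (3, 2, -1), (3, 2, -1)]
print(results)     # ['d', 'd', 'd', 'd']
```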

> This is why it confuses so many people. It's a shame too, because slice
> objects could be so much more useful for indirectly accessing list
> ranges. But I think this needs to be fixed first.

as with everything else in Python, if you use the wrong mental model, things
may look "asymmetrical" or "confusing" or "inconsistent". if you look at
how things really work, it's usually extremely simple and more often than
not internally consistent (since the designers have the "big picture", and
know what they've tried to be consistent with; when slice steps were
added, existing slicing rules and range() were the obvious references).

it's of course pretty common that people who didn't read the documentation
very carefully and therefore adopted the wrong model will insist that Python
uses a buggy implementation of their model, rather than a perfectly consistent
implementation of the actual model. slices with non-standard step sizes are
obviously one such thing, immutable/mutable objects and the exact behaviour
of for-else, while-else, and try-else are others. as usual, being able to reset
your brain is the only thing that helps.

</F>