Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Negative array indicies and slice()

166 views
Skip to first unread message

andrew...@gmail.com

unread,
Oct 28, 2012, 11:12:10 PM10/28/12
to
The slice operator does not give any way (I can find!) to take slices from negative to positive indexes, although the range is not empty, nor the expected indexes out of range that I am supplying.

Many programs that I write would require introducing variables and logical statements to correct the problem which is very lengthy and error prone unless there is a simple work around.

I *hate* replicating code every time I need to do this!

I also don't understand why slice() is not equivalent to an iterator, but can replace an integer in __getitem__() whereas xrange() can't.


Here's an example for Linux shell, otherwise remove /bin/env...
{{{#!/bin/env python
a=[1,2,3,4,5,6,7,8,9,10]
print a[-4:3] # I am interested in getting [7,8,9,10,1,2] but I get [].
}}}

Ian Kelly

unread,
Oct 28, 2012, 11:42:50 PM10/28/12
to Python
For a sequence of length 10, "a[-4:3]" is equivalent to "a[6:3]",
which is an empty slice since index 6 is after index 3.

If you want it to wrap around, then take two slices and concatenate
them with "a[-4:] + a[:3]".

MRAB

unread,
Oct 28, 2012, 11:45:58 PM10/28/12
to pytho...@python.org
If the stride is positive (if omitted it defaults to 1), the slice is
from the start index to one before the end index, and a negative index
counts from the end.

a[-4:3] is equivalent to a[len(a)-4:3], which is an empty list if
len(a)-4 >= 3.

It doesn't wrap around.

Message has been deleted
Message has been deleted

Andrew

unread,
Oct 29, 2012, 12:09:56 AM10/29/12
to Python
On Sunday, October 28, 2012 8:43:30 PM UTC-7, Ian wrote:
Hi Ian,
Well, no it really isn't equivalent; although Python implements it as equivalent.

Consider a programmer who writes:
xrange(-4,3)

They clearly *want* [-4,-3,-2,-1,0,1,2]

That is the "idea" of a range; So, for what reason would anyone want -4 to +3 to be 6:3??? Can you show me some code where this is desirable??

I do agree that the data held in -4 is equivalent to the data in 6, but the index is not the same.

So: Why does python choose to convert them to positive indexes, and have slice operate differently than xrange -- for the slice() object can't possibly know the size of the array when it is passed in to __getitem__; They are totally separate classes.

I realize I can concat. two slice ranges, BUT, the ranges do not always span from negative to positive.

eg: a line in my program reads:
a[x-5:x]

if x is 7, then this is a positive index to a positive index.
So, there is no logic to using two slices concatd !

I use this arbitrary range code *often* so I need a general purpose solution.
I looked up slice() but the help is of no use, I don't even know how I might overload it to embed some logic to concatenate ranges of data; nor even if it is possible.
Message has been deleted

Ian Kelly

unread,
Oct 29, 2012, 12:25:28 AM10/29/12
to Python
On Sun, Oct 28, 2012 at 10:00 PM, <andrew...@gmail.com> wrote:
> Hi Ian,
> Well, no it really isn't equivalent.
> Consider a programmer who writes:
> xrange(-4,3) *wants* [-4,-3,-2,-1,0,1,2]
>
> That is the "idea" of a range; for what reason would anyone *EVER* want -4 to +3 to be 6:3???

That is what ranges do, but your question was about slices, not ranges.

> So: Why does python choose to convert them to positive indexes, and have slice operate differently than xrange -- for the slice() object can't possibly know the size of the array when it is passed in to __getitem__; They are totally separate classes.

Ranges can contain negative integers. However, sequences do not have
negative indices. Therefore, negative indices in slices are used to
count from the end instead of from the start. As stated in the
language docs, "If either bound is negative, the sequence’s length is
added to it." Therefore, "a[-4:3]" does not wrap around the end of
the sequence because "a[6:3]" does not wrap around the end of the
sequence.

> I realize I can concat. two slice ranges, BUT, the ranges do not always span from negative to positive.

def wrapping_slice(seq, start, stop):
start, stop, _ = slice(start, stop).indices(len(seq))
if start <= stop:
return seq[start:stop]
else:
return seq[start:] + seq[:stop]

You'll have to decide for yourself whether you want it to return an
empty list or the entire list if start == stop.

alex23

unread,
Oct 29, 2012, 12:44:56 AM10/29/12
to
On Oct 29, 2:09 pm, Andrew <andrewr3m...@gmail.com> wrote:
> I use this arbitrary range code *often* so I need a general purpose solution.
> I looked up slice() but the help is of no use, I don't even know how I might
> overload it to embed some logic to concatenate ranges of data; nor even if
> it is possible.

Slices are passed in if provided to __getitem__/__setitem__/
__delitem__, so you'd need to override it at the list level:

class RangedSlicer(list):
def __getitem__(self, item):
# map item.start, .stop and .step to your own semantics

Then wrap your lists with your RangedSlicer class as needed.

Paul Rubin

unread,
Oct 29, 2012, 1:14:03 AM10/29/12
to
Andrew <andrew...@gmail.com> writes:
> So: Why does python choose to convert them to positive indexes, and
> have slice operate differently than xrange

There was a thread a few years back, I think started by Bryan Olson,
that made the case that slice indexing is a Python wart for further
reasons than the above, and suggesting a notation like x[$-5] to denote
what we now call x[-5] (i.e. $ is the length of the string). So your
example x[$-4:3] would clearly be the same thing as x[6:3] and not give
any suggestion that it might wrap around.

Andrew

unread,
Oct 29, 2012, 3:54:29 AM10/29/12
to Python
On Sunday, October 28, 2012 9:26:01 PM UTC-7, Ian wrote:
> On Sun, Oct 28, 2012 at 10:00 PM, Andrew wrote:
>
> > Hi Ian,
>
> > Well, no it really isn't equivalent.
>
> > Consider a programmer who writes:
>
> > xrange(-4,3) *wants* [-4,-3,-2,-1,0,1,2]
>
> >
>
> > That is the "idea" of a range; for what reason would anyone *EVER* want -4 to +3 to be 6:3???
>
>
>
> That is what ranges do, but your question was about slices, not ranges.

Actually, I said in the OP:

"I also don't understand why slice() is not equivalent to an iterator, but can replace an integer in __getitem__() whereas xrange() can't."

=========================

Thank you for the code snippet; I don't think it likely that existing programs depend on nor use a negative index and a positive index expecting to take a small chunk in the center... hence, I would return the whole array; Or if someone said [-len(listX) : len(listX)+1 ] I would return the whole array twice.
That's the maximum that is possible.
If someone could show me a normal/reasonable script which *would* expect the other behavior, I'd like to know; compatibility is important.

=========================

My intended inferences about the iterator vs. slice question was perhaps not obvious to you; Notice: an iterator is not *allowed* in __getitem__().

The slice class when passed to __getitem__() was created to merely pass two numbers and a stride to __getitem__; As far as I know slice() itself does *nothing* in the actual processing of the elements. So, it's *redundant* functionality, and far worse, it's restrictive.

The philosophy of Python is to have exactly one way to do something when possible; so, why create a stand alone class that does nothing an existing class could already do, and do it better ?

A simple list of three values would be just as efficient as slice()!
xrange is more flexible, and can be just as efficient.

So, Have I misunderstood the operation of slice()? I think I might have... but I don't know.

In 'C', where Python is written, circularly linked lists -- and arrays are both very efficient ways of accessing data. Arrays can, in fact, have negative indexes -- perhaps contrary to what you thought. One merely defines a variable to act as the base pointer to the array and initialize it to the *end* of the array. Nor is the size of the data elements an issue, since in Python all classes are accessed by pointers which are of uniform size. I routinely do this in C.

Consider, also, that xrange() does not actually create a list -- but merely an iterator generating integers which is exactly what __getitem__ works on.
So, xrange() does not need to incur a memory or noticeable time penalty.

From micro-python, it's clear that their implementation of xrange() is at the 'C' level; which is extremely fast.
Message has been deleted

andrew...@gmail.com

unread,
Oct 29, 2012, 4:08:28 AM10/29/12
to
On Sunday, October 28, 2012 10:14:03 PM UTC-7, Paul Rubin wrote:
I'm getting very frustrated with the editor provided for this group... It keeps posting prematurely, and putting my email in even when I tell it not to each time; and there is no way to edit a post... but deleting is ok...

I think Olson makes a good point. The len() operator is so ubiquitous that it would be very useful to have a shorthand like that.

I'll have to look for his thread.

I'm thinking that I might just patch my version of Python 3.x, in C, to allow iterators to be passed to __getitem__; I haven't ever seen someone wanting to use mixed sign indexes to extract a small chunk of an array in the middle; so I don't think my patch will break existing code.

The snippets of code given by other posters in the thread might also be used to make a compatibility wrapper; I'll have to study it closer; so that distributed code would still work on unpatched python, albeit much slower.

Chris Rebert

unread,
Oct 29, 2012, 4:18:32 AM10/29/12
to Andrew, Python
On Mon, Oct 29, 2012 at 12:54 AM, Andrew <andrew...@gmail.com> wrote:
> On Sunday, October 28, 2012 9:26:01 PM UTC-7, Ian wrote:
>> On Sun, Oct 28, 2012 at 10:00 PM, Andrew wrote:
<snip>
> The slice class when passed to __getitem__() was created to merely pass two numbers and a stride to __getitem__; As far as I know slice() itself does *nothing* in the actual processing of the elements. So, it's *redundant* functionality, and far worse, it's restrictive.
>
> The philosophy of Python is to have exactly one way to do something when possible; so, why create a stand alone class that does nothing an existing class could already do, and do it better ?
>
> A simple list of three values would be just as efficient as slice()!
> xrange is more flexible, and can be just as efficient.
>
> So, Have I misunderstood the operation of slice()? I think I might have... but I don't know.

`slice` is intentionally lenient about the types of the start, stop, and step:
>>> class Foo:
... def __getitem__(self, slice_):
... print(slice_)
... return 42
...
>>> Foo()["a":"b":"c"]
slice('a', 'b', 'c')
42
>>>
Thus, the thing being sliced is free to interpret the parts of the
slice however it wishes; hence, slice() is unable to contain the
"processing" you speak of.
By contrast, xrange() limits itself to integers.
To support the more general case, the slice syntax thus produces a
`slice` rather than an `xrange`.
Doubtlessly, there are also historical issues involved. As implied by
the ugliness of its name, `xrange` was added to the language
relatively later.

Cheers,
Chris
Message has been deleted

Chris Rebert

unread,
Oct 29, 2012, 4:26:59 AM10/29/12
to andrew...@gmail.com, pytho...@python.org
On Mon, Oct 29, 2012 at 1:08 AM, <andrew...@gmail.com> wrote:
> On Sunday, October 28, 2012 10:14:03 PM UTC-7, Paul Rubin wrote:
>> Andrew writes:
<snip>
> I'm getting very frustrated with the editor provided for this group... It keeps posting prematurely, and putting my email in even when I tell it not to each time; and there is no way to edit a post... but deleting is ok...

This is a Usenet newsgroup[1], not a web forum. There are noteworthy
differences between the two.
FWICT, you happen to be accessing us via Google Groups, which is
widely acknowledged to suck. We are not hosted *by* Google Groups;
they just happen to carry our posts.
Personally, I'd suggest using our mailing list mirror instead:
http://mail.python.org/mailman/listinfo/python-list
Or use some other, better newsgroup provider that also carries us.

[1]: http://en.wikipedia.org/wiki/Usenet

Regards,
Chris

Chris Rebert

unread,
Oct 29, 2012, 4:37:47 AM10/29/12
to andrew...@gmail.com, pytho...@python.org
On Mon, Oct 29, 2012 at 1:24 AM, <andrew...@gmail.com> wrote:
> On Sunday, October 28, 2012 9:44:56 PM UTC-7, alex23 wrote:
>> On Oct 29, 2:09 pm, Andrew <andrewr3m...@gmail.com> wrote:
<snip>
>> class RangedSlicer(list):
<snip>
>> Then wrap your lists with your RangedSlicer class as needed.
>
> Hmmm...
>
> I began a test in an interactive shell:
>>>> class RangedSlicer(list):
> ... def __getitem__(self,item):
> ... print item
> …

This just defines a class; it doesn't modify in-place the normal
behavior of plain lists. You have to actually *use* the class.

>>>> a=[1,2,3,4,5]

You never wrapped `a` in a RangedSlicer or otherwise made use of RangedSlicer!
You wanted:
a = RangedSlicer([1,2,3,4,5])

>>>> a.__getitem__( slice(1,5) )
> [2, 3, 4, 5]
>
> Very odd... I would have expected [1,2,3,4]

"[2, 3, 4, 5]" is the return value from `a.__getitem__( slice(1,5) )`
(or, equivalently, from `[1,2,3,4,5][1:5]`). It is not the result of
"print item"; that line of code is never executed since you never used
the RangedSlicer class at all.

Regards,
Chris

andrew...@gmail.com

unread,
Oct 29, 2012, 4:59:06 AM10/29/12
to andrew...@gmail.com, pytho...@python.org
On Monday, October 29, 2012 1:38:04 AM UTC-7, Chris Rebert wrote:
> On Mon, Oct 29, 2012 at 1:24 AM,
>
> > On Sunday, October 28, 2012 9:44:56 PM UTC-7, alex23 wrote:
>
> >> On Oct 29, 2:09 pm, Andrew < wrote:
>
> You never wrapped `a` in a RangedSlicer or otherwise made use of RangedSlicer!
>
> You wanted:
>
> a = RangedSlicer([1,2,3,4,5])
>
>
>
> >>>> a.__getitem__( slice(1,5) )
>
> > [2, 3, 4, 5]
>
> >
>
> > Very odd... I would have expected [1,2,3,4]
>
>
>
> "[2, 3, 4, 5]" is the return value from `a.__getitem__( slice(1,5) )`
>
> (or, equivalently, from `[1,2,3,4,5][1:5]`). It is not the result of
>
> "print item"; that line of code is never executed since you never used
>
> the RangedSlicer class at all.
>
>
>
> Regards,
>
> Chris

My apology --- I deleted that post; yet it didn't delete... I saw my mistake seconds after posting.

***** gmail.

Note: I subscribed to the python-list, and am able to recieve e-mails, but I don't see how to write a post for this particular thread nor subscribe to this particular thread...

A brief suggestion, or link to a howto would be *much* appreciated.

Message has been deleted

Mark Lawrence

unread,
Oct 29, 2012, 5:36:01 AM10/29/12
to pytho...@python.org
On 29/10/2012 08:59, andrew...@gmail.com wrote:
>
> Note: I subscribed to the python-list, and am able to recieve e-mails, but I don't see how to write a post for this particular thread nor subscribe to this particular thread...
>
> A brief suggestion, or link to a howto would be *much* appreciated.
>

Get yourself a decent email client. I read all the Python lists that
I'm interested in using Thunderbird on Windows via gmane.

--
Cheers.

Mark Lawrence.

Andrew Robinson

unread,
Oct 28, 2012, 10:31:57 PM10/28/12
to pytho...@python.org
Ok, hopefully this is better. I love my own e-mail editor...

I can see that the slice() function can pass in arbitrary arguments.
I'm not sure for lists, which is what the range is applied to, why an
argument like "a" would be part of a slice.
I *really* don't see what the advantage of a slice class is over a mere
list in the order of start, stop, step eg: [ 1,4,9 ]

In a dictionary, where "a" could be a key -- I wasn't aware that there
was a defined order that the idea of slice could apply to.

When I look at the documentation,
http://www.python.org/doc//current/c-api/slice

The only thing that slice has which is special, is that the the length
of the sequence can be given -- and the start and stop index are either
trimmed or an error (exception???) is thrown.

Where is the information on the more general case of slice()? :-\

I am thinking, can one use the 'super' type of access, to override --
within the list object itself -- the __getitem__ method, and after
pre-processing -- call the shadowed method with the modified
parameters? That would allow me to use the normal a[-4:6] notation,
without having to write a wrapper class that must be explicitly called.

I'm thinking something like,

PyListObject.__getitem__= lambda self, slice: ....

--Andrew.



Mark Lawrence

unread,
Oct 29, 2012, 6:10:45 AM10/29/12
to pytho...@python.org
I suggest that you go back and read the tutorial about slicing. I say
this because we've started with negative array indicies and slice() (but
Python arrays haven't been mentioned :), then moved onto (x)range and
now lists, dictionaries and the C API for slices.

An alternative is to tell us precisely what you're trying to achieve.
The odds are that there's a simple answer waiting in the wings for a
simple question.

--
Cheers.

Mark Lawrence.

Steven D'Aprano

unread,
Oct 29, 2012, 6:34:37 AM10/29/12
to
On Mon, 29 Oct 2012 01:59:06 -0700, andrewr3mail wrote:

> Note: I subscribed to the python-list, and am able to recieve e-mails,
> but I don't see how to write a post for this particular thread nor
> subscribe to this particular thread...

The beauty of email is that you don't have to subscribe to a thread. Once
you subscribe to the mailing list, email is delivered into your inbox. To
reply to it, just reply to it. To ignore it, throw it in the trash.

Gmail should have a button or three that say "Reply to email" or similar.
You want the button that says "Reply to All" or "Reply to List". Make
sure that the reply includes "pytho...@python.org" as a recipient.

Delete bits of the quoted email (the lines that start with > characters)
that are no longer relevant to the conversation. Type your reply. Double
check that the reply is going to python-list. Then hit Send.

(P.S. when you signed up for pytho...@python.org, if you selected the
option to receive a single daily digest instead of individual emails,
you're going to have a bad time. Do yourself a favour -- and the rest of
us -- and change back to individual emails.)



--
Steven

Steven D'Aprano

unread,
Oct 29, 2012, 7:19:38 AM10/29/12
to
On Mon, 29 Oct 2012 00:54:29 -0700, Andrew wrote:

> Actually, I said in the OP:
>
> "I also don't understand why slice() is not equivalent to an iterator,
> but can replace an integer in __getitem__() whereas xrange() can't."

Slices and iterators have different purposes and therefore have not been
made interchangeable. Yes, there are certain similarities between a slice
and xrange, but there are also significant differences.


> Thank you for the code snippet; I don't think it likely that existing
> programs depend on nor use a negative index and a positive index
> expecting to take a small chunk in the center...

On the contrary. That is the most straightforward and useful idea of
slicing, to grab a contiguous slice of items.

Why would you want to grab a slice from the end of the list, and a slice
from the start of the list, and swap them around? Apart from simulating
card shuffles and cuts, who does that?


> hence, I would return
> the whole array; Or if someone said [-len(listX) : len(listX)+1 ] I
> would return the whole array twice.

Well, that's one possible interpretation, but it is not the one that
Python uses. When you create your own language, you can choose whatever
interpretation seems most useful to you too.



> That's the maximum that is possible.
> If someone could show me a normal/reasonable script which *would* expect
> the other behavior, I'd like to know; compatibility is important.

I'm not entirely sure I understand what you are asking here.


> My intended inferences about the iterator vs. slice question was perhaps
> not obvious to you; Notice: an iterator is not *allowed* in
> __getitem__().

Actually, you can write __getitem__ for your own classes to accept
anything you like.

py> class Test:
... def __getitem__(self, index):
... return index
...
py> t = Test()
py> t["Hello world"]
'Hello world'
py> t[{'x': None}]
{'x': None}


> The slice class when passed to __getitem__() was created to merely pass
> two numbers and a stride to __getitem__; As far as I know slice()
> itself does *nothing* in the actual processing of the elements. So,
> it's *redundant* functionality, and far worse, it's restrictive.

You say that as if it were a bad thing.


> The philosophy of Python is to have exactly one way to do something when
> possible; so, why create a stand alone class that does nothing an
> existing class could already do, and do it better ?

What existing class is that? It certainly isn't xrange.

Because xrange represents a concrete sequence of numbers, all three of
start, end and stride must be concrete, known, integers:

py> xrange(4, None, 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: an integer is required

Whereas slices can trivially include blanks that get filled in only when
actually used:

py> "hello world"[aslice]
'owrd'
py> "NOBODY expects the Spanish Inquisition!"[aslice]
'D xet h pns nusto!'


So, no, xrange is no substitute for slices. Not even close.


> A simple list of three values would be just as efficient as slice()!

On the contrary, a simple list of three values not only could not do
everything a slice does, but it's over twice the size!

py> sys.getsizeof([1, 2, 3])
44
py> sys.getsizeof(slice(1, 2, 3))
20


> xrange is more flexible, and can be just as efficient.

Less flexible, less efficient.


[snip]
> In 'C', where Python is written,

That's a popular misapprehension. Python is written in Java, or Lisp, or
Haskell, or CLR (dot Net), or RPython, or Ocaml, or Parrot. Each of those
languages have, or had, at least one Python implementation. Oh, there's
also a version written in C, or so I have heard.


--
Steven

Chris Angelico

unread,
Oct 29, 2012, 7:32:05 AM10/29/12
to pytho...@python.org
On Mon, Oct 29, 2012 at 10:19 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
>> In 'C', where Python is written,
>
> That's a popular misapprehension. Python is written in Java, or Lisp, or
> Haskell, or CLR (dot Net), or RPython, or Ocaml, or Parrot. Each of those
> languages have, or had, at least one Python implementation. Oh, there's
> also a version written in C, or so I have heard.

And that's not including the human-brain implementation, perhaps the
most important of all. Although the current port of Python to my brain
isn't quite a complete implementation, lacking a few bits that I
should probably get to at some point, but even so, it's as useful to
me as firing up IDLE.

I wonder if what the OP is looking for is not slicing, but something
more akin to map. Start with a large object and an iterator that
produces keys, and create an iterator/list of their corresponding
values. Something like:

a=[1,2,3,4,5,6,7,8,9,10]
b=[a[i] for i in xrange(-4,3)]

It's not strictly a slice operation, but it's a similar sort of thing,
and it can do the wraparound quite happily.

ChrisA

Andrew Robinson

unread,
Oct 29, 2012, 12:52:02 AM10/29/12
to pytho...@python.org
On 10/29/2012 04:32 AM, Chris Angelico wrote:
> I wonder if what the OP is looking for is not slicing, but something
> more akin to map. Start with a large object and an iterator that
> produces keys, and create an iterator/list of their corresponding
> values. Something like: a=[1,2,3,4,5,6,7,8,9,10] b=[a[i] for i in
> xrange(-4,3)] It's not strictly a slice operation, but it's a similar
> sort of thing, and it can do the wraparound quite happily. ChrisA

A list comprehension ?
That does do what I am interested in, *very* much so. Quite a gem, Chris!

:-\
I am curious as to how quickly it constructs the result compared to a
slice operation.

Eg:
a[1:5]
vs.
[ a[i] for i in xrange[1:5] ]

But, unless it were grossly slower -- so that if/then logic and slices
were generally faster -- I will use it.
Thanks.

--Andrew.

Chris Angelico

unread,
Oct 29, 2012, 8:40:53 AM10/29/12
to pytho...@python.org
On Mon, Oct 29, 2012 at 3:52 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> I am curious as to how quickly it constructs the result compared to a slice
> operation.
>
> Eg:
> a[1:5]
> vs.
> [ a[i] for i in xrange[1:5] ]

For the most part, don't concern yourself with performance. Go with
functionality and readability. In the trivial case shown here, the
slice is WAY clearer, so it should definitely be the one used; in
other cases, the slice might simply be insufficient, so you go with
whatever achieves your goal. Performance will usually be "good
enough", even if there's a marginally faster way.

ChrisA

Andrew Robinson

unread,
Oct 29, 2012, 2:01:45 AM10/29/12
to pytho...@python.org
On 10/29/2012 04:19 AM, Steven D'Aprano wrote:
> On Mon, 29 Oct 2012 00:54:29 -0700, Andrew wrote:
>
> Slices and iterators have different purposes and therefore have not been
> made interchangeable. Yes, there are certain similarities between a slice
> and xrange, but there are also significant differences.
Aha, now were getting to the actual subject.

> [snip]
>> In 'C', where Python is written,
> That's a popular misapprehension. Python is written in Java, or Lisp, or
> Haskell, or CLR (dot Net), or RPython, or Ocaml, or Parrot. Each of those
> languages have, or had, at least one Python implementation. Oh, there's
> also a version written in C, or so I have heard.
:-P
I didn't say it was only written in "C", but in "C" where it is
implemented.
I will be porting Python 3.xx to a super low power embedded processor
(MSP430), both space and speed are at a premium.
Running Python on top of Java would be a *SERIOUS* mistake. .NET won't
even run on this system. etc.

>
>
>> Thank you for the code snippet; I don't think it likely that existing
>> programs depend on nor use a negative index and a positive index
>> expecting to take a small chunk in the center...
> On the contrary. That is the most straightforward and useful idea of
> slicing, to grab a contiguous slice of items.
Show me an example where someone would write a slice with a negative and
a positive index (both in the same slice);
and have that slice grab a contiguous slice in the *middle* of the list
with orientation of lower index to greater index.
I have asked before; It's not that I don't think it possible -- it's
that I can't imagine a common situation.

> Why would you want to grab a slice from the end of the list, and a slice
> from the start of the list, and swap them around? Apart from simulating
> card shuffles and cuts, who does that?
Advanced statistics programmers using lookup tables that are
symmetrical. Try Physicists too -- but they're notably weird.

>> My intended inferences about the iterator vs. slice question was perhaps
>> not obvious to you; Notice: an iterator is not *allowed* in
>> __getitem__().
> Actually, you can write __getitem__ for your own classes to accept
> anything you like.
Yes, I realize that.
But, why can't I just overload the existing __getitem__ for lists and
not bother writing an entire class?
Everything in Python is supposed to be an object and one of the big
supposed "selling" points is the ability to overload "any" object's methods.
The lists aren't special -- they're just a bunch of constant decimal
numbers, typically given as a large tuple.

>
> py> class Test:
> ... def __getitem__(self, index):
> ... return index
> ...
Better:
>>> class Test:
... def __getitem__( self, *index ):
... return index

No extra curlies required...

> You say that as if it were a bad thing.

hmmm... and you as if sarcastic? :-)
It is a bad thing to have any strictly un-necessary and non-code saving
objects where memory is restricted.

> What existing class is that? It certainly isn't xrange.
>
> Because xrange represents a concrete sequence of numbers, all three of
> start, end and stride must be concrete, known, integers:
>

Let's up the ante. I'll admit xrange() won't do "later" fill in the
blank -- BUT --
xrange() is a subclass of an *existing* class called iterator.
Iterators are very general. They can even be made random.

>> The philosophy of Python is to have exactly one way to do something when
>> possible; so, why create a stand alone class that does nothing an
>> existing class could already do, and do it better ?
> py> xrange(4, None, 2)
> Traceback (most recent call last):
> File "<stdin>", line 1, in<module>
> TypeError: an integer is required

Hmmm..
Let's try your example exactly as shown...

"hello world"[aslice]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'aslice' is not defined

WOW. Cool.
Where did the blanks *actually* get filled in? Or HOW WILL they in your
next post?
>
> On the contrary, a simple list of three values not only could not do
> everything a slice does, but it's over twice the size!
Yeah, There is a definite issue there. But the case isn't decided by
that number alone.
A slice is storing three integers -- and an integer is size is 12.
So, slices don't use integers. If the type that *IS* used happens to be
a real Python type, we may merely typecast integers to that type --
insert them in a tuple and by definition, they must be the same size.

Looking at some of the online programming notes -- a slice apparently
doesn't use an integer storage variable that is capable of arbitrary
expansion. =-O -- and hence, won't work for very large sized lists.
That actually explains some crashes I have noted in the past when
working with 20 million element lists that I wanted a slice of. I had
*plenty* of ram on that system.
Besides: The program code to implement slice() is undoubtedly larger
than 12 bytes of savings!
How many slices() are typically found in memory simultaneously?

You're on the way to convincing me -- but you need to show off this fill
in the blank issue.
That's a difference of kind -- and not of size.
An iterator, could be an object with methods... So what common
application uses this "fill" in the blank stuff?




Roy Smith

unread,
Oct 29, 2012, 9:52:50 AM10/29/12
to
In article <mailman.3009.1351516...@python.org>,
Andrew Robinson <and...@r3dsolutions.com> wrote:

> Show me an example where someone would write a slice with a negative and
> a positive index (both in the same slice);
> and have that slice grab a contiguous slice in the *middle* of the list
> with orientation of lower index to greater index.

It's possible in bioinformatics. Many organisms have circular
chromosomes. It's a single DNA molecule spliced into a loop. There's
an "origin", but it's more a convenience thing for people to assign some
particular base-pair to be location 0. From the organism's point of
view, the origin isn't anything special (but there *is* a fixed
orientation).

It's entirely plausible for somebody to want to extract the sub-sequence
from 100 bp (base-pairs) before the origin to 100 bp after the origin.
If you were storing the sequence in Python string (or list), the most
convenient way to express this would be seq[-100:100]. Likewise, if you
wanted the *other* fragment, you would write seq[100:-100].

There is a minor confounding factor here in that biologists number
sequences starting with 1, not 0. At least that was the way when I was
doing this stuff mumble years ago. I don't know what the current
convention is.

Chris Angelico

unread,
Oct 29, 2012, 9:53:13 AM10/29/12
to pytho...@python.org
On Mon, Oct 29, 2012 at 5:01 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> Looking at some of the online programming notes -- a slice apparently
> doesn't use an integer storage variable that is capable of arbitrary
> expansion. =-O -- and hence, won't work for very large sized lists. That
> actually explains some crashes I have noted in the past when working with 20
> million element lists that I wanted a slice of. I had *plenty* of ram on
> that system.

Can you provide links to these notes? I'm looking at
cpython/Include/sliceobject.h that has this comment:

/*

A slice object containing start, stop, and step data members (the
names are from range). After much talk with Guido, it was decided to
let these be any arbitrary python type. Py_None stands for omitted values.
*/

Also, the code for slice objects in CPython works with Py_ssize_t (a
signed quantity of the same length as size_t), which will allow at
least 2**31 for an index. I would guess that your crashes were nothing
to do with 20 million elements and slices.

ChrisA

Ian Kelly

unread,
Oct 29, 2012, 1:09:01 PM10/29/12
to Python
On Oct 29, 2012 7:10 AM, "Andrew Robinson" <and...@r3dsolutions.com> wrote:
> I will be porting Python 3.xx to a super low power embedded processor (MSP430), both space and speed are at a premium.
> Running Python on top of Java would be a *SERIOUS* mistake. .NET won't even run on this system. etc.

If that's the case, then running Python at all is probably a mistake.
You know the interpreter alone has an overhead of nearly 6 MB?

> Yes, I realize that.
> But, why can't I just overload the existing __getitem__ for lists and not bother writing an entire class?

You can just overload that one method in a subclass of list. Being
able to monkey-patch __getitem__ for the list class itself would not
be advisable, as it would affect all list slicing anywhere in your
program and possibly lead to some unexpected behaviors.

> Hmmm..
> Let's try your example exactly as shown...
>
> "hello world"[aslice]
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'aslice' is not defined
>
> WOW. Cool.
> Where did the blanks *actually* get filled in? Or HOW WILL they in your next post?

It appears that Steven omitted the definition of aslice by mistake.
It looks like it should have been:

aslice = slice(4, None, 2)

> Looking at some of the online programming notes -- a slice apparently doesn't use an integer storage variable that is capable of arbitrary expansion. =-O -- and hence, won't work for very large sized lists. That actually explains some crashes I have noted in the past when working with 20 million element lists that I wanted a slice of. I had *plenty* of ram on that system.

20 million is nothing. On a 32-bit system, sys.maxsize == 2 ** 31 -
1. If the error you were seeing was MemoryError, then more likely you
were running into dynamic allocation issues due to fragmentation of
virtual memory.

Ian Kelly

unread,
Oct 29, 2012, 1:27:47 PM10/29/12
to Python
On Mon, Oct 29, 2012 at 1:54 AM, Andrew <andrew...@gmail.com> wrote:
> My intended inferences about the iterator vs. slice question was perhaps not obvious to you; Notice: an iterator is not *allowed* in __getitem__().

Yes, I misconstrued your question. I thought you wanted to change the
behavior of slicing to wrap around the end when start > stop instead
of returning an empty sequence. What you actually want is a new
sequence built from indexes supplied by an iterable. Chris has
already given you a list comprehension solution to solve that. You
could also use map for this:

new_seq = list(map(old_seq.__getitem__, iterable))

Since you seem to be concerned about performance, I'm not sure in this
case whether the map or the list comprehension will be faster. I'll
leave you to test that on your intended hardware.

> In 'C', where Python is written, circularly linked lists -- and arrays are both very efficient ways of accessing data. Arrays can, in fact, have negative indexes -- perhaps contrary to what you thought. One merely defines a variable to act as the base pointer to the array and initialize it to the *end* of the array. Nor is the size of the data elements an issue, since in Python all classes are accessed by pointers which are of uniform size. I routinely do this in C.

I'm aware of what is possible in C with pointer arithmetic. This is
Python, though, and Python by design has neither pointers nor pointer
arithmetic. In any case, initializing the pointer to the end of the
array would still not do what you want, since the positive indices
would then extend past the end of the array.

Steven D'Aprano

unread,
Oct 29, 2012, 6:02:10 PM10/29/12
to
Slicing is about an order of magnitude faster:


[steve@ando ~]$ python2.7 -m timeit -s "x = range(100, 1000, 2)" "x
[20:40]"
1000000 loops, best of 3: 0.342 usec per loop
[steve@ando ~]$ python2.7 -m timeit -s "x = range(100, 1000, 2)" "[x[i]
for i in xrange(20, 40)]"
100000 loops, best of 3: 3.43 usec per loop


--
Steven

Steven D'Aprano

unread,
Oct 29, 2012, 6:14:41 PM10/29/12
to
On Mon, 29 Oct 2012 11:19:38 +0000, Steven D'Aprano wrote:

> Because xrange represents a concrete sequence of numbers, all three of
> start, end and stride must be concrete, known, integers:
>
> py> xrange(4, None, 2)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: an integer is required
>
> Whereas slices can trivially include blanks that get filled in only when
> actually used:
>
> py> "hello world"[aslice]
> 'owrd'
> py> "NOBODY expects the Spanish Inquisition!"[aslice]
> 'D xet h pns nusto!'

/me facepalms/


Argggh, I forgot to copy-and-paste the critical line defining aslice:

aslice = slice(4, None, 2)


Sorry about that.



--
Steven

Andrew Robinson

unread,
Oct 29, 2012, 11:20:26 AM10/29/12
to pytho...@python.org
On 10/29/2012 06:52 AM, Roy Smith wrote:
> Show me an example where someone would write a slice with a negative and
> a positive index (both in the same slice);
> and have that slice grab a contiguous slice in the *middle* of the list
> with orientation of lower index to greater index.
> It's possible in bioinformatics. ...
> eq[100:-100].
I decided to go to bed... I was starting to write very badly worded
responses. :)

Thanks, Roy, what you have just shown is another example that agrees
with what I am trying to do.
FYI: I was asking for a reason why Python's present implementation is
desirable...

I wonder, for example:

Given an arbitrary list:
a=[1,2,3,4,5,6,7,8,9,10,11,12]

Why would someone *want* to do:
a[-7,10]
Instead of saying
a[5:10] or a[-7:-2] ?

eg:
What algorithm would naturally *desire* the default behavior of slicing
when using *mixed* negative and positive indexes?
In the case of a bacterial circular DNA/RNA ring, asking for codons[
-10: 10 ] would logically desire codon[-10:] + codon[:10] not an empty
list, right?

I think your example is a very reasonable thing the scientific community
would want to do with Python.
:)

Andrew Robinson

unread,
Oct 29, 2012, 11:42:39 AM10/29/12
to pytho...@python.org
On 10/29/2012 10:09 AM, Ian Kelly wrote:
> On Oct 29, 2012 7:10 AM, "Andrew Robinson"<and...@r3dsolutions.com> wrote:
>> I will be porting Python 3.xx to a super low power embedded processor (MSP430), both space and speed are at a premium.
>> Running Python on top of Java would be a *SERIOUS* mistake. .NET won't even run on this system. etc.
> If that's the case, then running Python at all is probably a mistake.
> You know the interpreter alone has an overhead of nearly 6 MB?
There's already a version of the python interpreter which fits in under
100K:
http://code.google.com/p/python-on-a-chip/
It's not the 3.x series, though; and I don't want to redo this once 2.7
really does become obsolete.
>> Yes, I realize that.
>> But, why can't I just overload the existing __getitem__ for lists and not bother writing an entire class?
> You can just overload that one method in a subclass of list. Being
> able to monkey-patch __getitem__ for the list class itself would not
> be advisable, as it would affect all list slicing anywhere in your
> program and possibly lead to some unexpected behaviors.
That's what I am curious about.
What unexpected behaviors would a "monkey patch" typically cause?
If no one really uses negative and positive indexes in the same slice
operation, because there is no reason to do so...
It will only break the occasional esoteric application.

>
> 20 million is nothing. On a 32-bit system, sys.maxsize == 2 ** 31 -
> 1. If the error you were seeing was MemoryError, then more likely you
> were running into dynamic allocation issues due to fragmentation of
> virtual memory.
>
>

No, there was no error at all. Pthon just crashed & exited; not even an
exception that I can recall. It was if it exited normally!

The list was generated in a single pass by many .append() 's, and then
copied once -- the original was left in place; and then I attempted to
slice it.

I am able to routinely to 5 million length lists, copy, slice, cut,
append, and delete from them without this ever happening.
If fragmentation were the issue, I'd think the shorter lists would cause
the problem after many manipulations...

It may not be a bug in python itself, though, of course. There are
libraries it uses which might have a bug.

Chris Angelico

unread,
Oct 29, 2012, 6:55:23 PM10/29/12
to pytho...@python.org
On Tue, Oct 30, 2012 at 2:42 AM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> No, there was no error at all. Pthon just crashed & exited; not even an
> exception that I can recall. It was if it exited normally!

Can you create a reproducible test case? There's usually a cause to
these sorts of things.

ChrisA

Ian Kelly

unread,
Oct 29, 2012, 7:01:29 PM10/29/12
to Python
On Mon, Oct 29, 2012 at 9:20 AM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> FYI: I was asking for a reason why Python's present implementation is
> desirable...
>
> I wonder, for example:
>
> Given an arbitrary list:
> a=[1,2,3,4,5,6,7,8,9,10,11,12]
>
> Why would someone *want* to do:
> a[-7,10]
> Instead of saying
> a[5:10] or a[-7:-2] ?

A quick search of local code turns up examples like this:

if name.startswith('{') and name.endswith('}'):
name = name[1:-1]

If slices worked like ranges, then the result of that would be empty,
which is obviously not desirable.

I don't know of a reason why one might need to use a negative start
with a positive stop, though.

Ian Kelly

unread,
Oct 29, 2012, 7:07:53 PM10/29/12
to Python
On Mon, Oct 29, 2012 at 9:42 AM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> The list was generated in a single pass by many .append() 's, and then
> copied once -- the original was left in place; and then I attempted to slice
> it.

Note that if the list was generated by .appends, then it was copied
more than once. Python reserves a specific amount of space for the
list. When it grows past that, the list must be reallocated and
copied. It grows the list exponentially in order to keep the
amortized time complexity of append at O(1), but the point is that a
list of 20 million items is going to be resized and copied several
times before it is complete.

Roy Smith

unread,
Oct 29, 2012, 7:24:10 PM10/29/12
to
In article <mailman.3056.1351552...@python.org>,
I think you're missing the point of "amortized constant time". Yes, the
first item appended to the list will be copied lg(20,000,000) ~= 25
times, because the list will be resized that many times(*). But, on
average (I'm not sure if "average" is strictly the correct word here),
each item will be copied only once.

Still, it always stuck me as odd that there's no preallocate() method.
There are times when you really do know how many items you're going to
add to the list, and doing a single allocation would be a win. And it
doesn't cost anything to provide it. I suppose, however, if you're
adding enough items that preallocating would really matter, then maybe
you want an array instead.

(*) I don't know the exact implementation; I'm assuming each resize is a
factor of 2.

Ian Kelly

unread,
Oct 29, 2012, 7:43:03 PM10/29/12
to Python
On Mon, Oct 29, 2012 at 5:24 PM, Roy Smith <r...@panix.com> wrote:
> I think you're missing the point of "amortized constant time". Yes, the
> first item appended to the list will be copied lg(20,000,000) ~= 25
> times, because the list will be resized that many times(*). But, on
> average (I'm not sure if "average" is strictly the correct word here),
> each item will be copied only once.
>
> Still, it always stuck me as odd that there's no preallocate() method.
> There are times when you really do know how many items you're going to
> add to the list, and doing a single allocation would be a win. And it
> doesn't cost anything to provide it. I suppose, however, if you're
> adding enough items that preallocating would really matter, then maybe
> you want an array instead.
>
> (*) I don't know the exact implementation; I'm assuming each resize is a
> factor of 2.

The growth factor is approximately 1.125. "Approximately" because
there is also a small constant term. The average number of copies per
item converges on 8.

Steven D'Aprano

unread,
Oct 29, 2012, 8:02:25 PM10/29/12
to
On Mon, 29 Oct 2012 08:42:39 -0700, Andrew Robinson wrote:

>>> But, why can't I just overload the existing __getitem__ for lists and
>>> not bother writing an entire class?

You say that as if writing "an entire class" was a big complicated
effort. It isn't. It is trivially simple, a single line:

class MyList(list):
...


plus the __getitem__ definition, which you would have to write anyway. It
is a trivial amount of extra effort.


>> You can just overload that one method in a subclass of list. Being
>> able to monkey-patch __getitem__ for the list class itself would not be
>> advisable, as it would affect all list slicing anywhere in your program
>> and possibly lead to some unexpected behaviors.
>
> That's what I am curious about.
> What unexpected behaviors would a "monkey patch" typically cause?

What part of "unexpected" is unclear?

Monkey-patching is poor programming technique because it leads to
*unexpected* and *impossible to predict* interactions between *distant*
parts of the code. It leads to impossible to predict differences between
the source code on disk and the actual running code. It leads to
impossible to predict differences between documented behaviour and actual
behaviour.

Let me see if I can illustrate a flavour of the sort of things that can
happen if monkey-patching built-ins were allowed.

You create a list and print it:

# simulated output
py> x = [5, 2, 4, 1]
py> print(x)
[1, 2, 4, 5]

What? How did that happen? That's not the list you provided. The order
has been lost.

So you dig deep into your code, and you don't find anything. And you read
the Python documentation for lists, and don't find anything. And you
google the Internet, and don't find anything. And you ask for help, and
everybody says you're crazy because when they duplicate your code they
get the expected behaviour. And you report a bug in Python, and it gets
closed as "cannot replicate".

Finally you search deep into the libraries used in your code, and *five
days later* discover that your code uses library A which uses library B
which uses library C which uses library D which installs a harmless
monkey-patch to print, but only if library E is installed, and you just
happen to have E installed even though your code never uses it, AND that
monkey-patch clashes with a harmless monkey-patch to list.__getitem__
installed by library F. And even though each monkey-patch alone is
harmless, the combination breaks your code's output.



Python allows, but does not encourage, monkey-patching of code written in
pure Python, because it sometimes can be useful. It flat out prohibits
monkey-patching of builtins, because it is just too dangerous.

Ruby allows monkey-patching of everything. And the result was predictable:

http://devblog.avdi.org/2008/02/23/why-monkeypatching-is-destroying-ruby/


--
Steven

Oscar Benjamin

unread,
Oct 29, 2012, 8:04:22 PM10/29/12
to Ian Kelly, Python
On 29 October 2012 23:01, Ian Kelly <ian.g...@gmail.com> wrote:
> On Mon, Oct 29, 2012 at 9:20 AM, Andrew Robinson
> <and...@r3dsolutions.com> wrote:
>> FYI: I was asking for a reason why Python's present implementation is
>> desirable...
>>
>> I wonder, for example:
>>
>> Given an arbitrary list:
>> a=[1,2,3,4,5,6,7,8,9,10,11,12]
>>
>> Why would someone *want* to do:
>> a[-7,10]
>> Instead of saying
>> a[5:10] or a[-7:-2] ?
>
> A quick search of local code turns up examples like this:
>
> if name.startswith('{') and name.endswith('}'):
> name = name[1:-1]
>
> If slices worked like ranges, then the result of that would be empty,
> which is obviously not desirable.
>
> I don't know of a reason why one might need to use a negative start
> with a positive stop, though.

It's useful when getting a reversed slice:

>>> a = [1,2,3,4,5,6,7,8,9,10]
>>> a[-3:3:-1]
[8, 7, 6, 5]


Oscar

Ian Kelly

unread,
Oct 29, 2012, 8:05:27 PM10/29/12
to Python
On Mon, Oct 29, 2012 at 5:43 PM, Ian Kelly <ian.g...@gmail.com> wrote:
> The growth factor is approximately 1.125. "Approximately" because
> there is also a small constant term. The average number of copies per
> item converges on 8.

Of course, that is the *maximum* number of copies. The actual number
could be much less if realloc() performs well.

Roy Smith

unread,
Oct 29, 2012, 8:17:17 PM10/29/12
to
In article <mailman.3057.1351554...@python.org>,
Wow, that's surprising. It also makes it that much more surprising that
there's no way to pre-allocate.

Andrew Robinson

unread,
Oct 29, 2012, 2:00:08 PM10/29/12
to pytho...@python.org
On 10/29/2012 06:53 AM, Chris Angelico wrote:
> Can you provide links to these notes? I'm looking at
> cpython/Include/sliceobject.h that has this comment:
>
> /*
>
> A slice object containing start, stop, and step data members (the
> names are from range). After much talk with Guido, it was decided to
> let these be any arbitrary python type. Py_None stands for omitted values.
> */
>
> Also, the code for slice objects in CPython works with Py_ssize_t (a
> signed quantity of the same length as size_t), which will allow at
> least 2**31 for an index. I would guess that your crashes were nothing
> to do with 20 million elements and slices.
>
> ChrisA
Let's look at the source code rather than the web notes -- the source
must be the true answer anyhow.

I downloaded the source code for python 3.3.0, as the tbz;
In the directory "Python-3.3.0/Python", look at Python-ast.c, line 2089
& ff.

Clearly a slice is malloced for a slice_ty type.
It has four elements: kind, lower, upper, and step.

So, tracing it back to the struct definition...

"Include/Python-ast.h" has "typedef struct _slice *slice_ty;"

And, here's the answer!:

enum _slice_kind {Slice_kind=1, ExtSlice_kind=2, Index_kind=3};
struct _slice {
enum _slice_kind kind;
union {
struct {
expr_ty lower;
expr_ty upper;
expr_ty step;
} Slice;

struct {
asdl_seq *dims;
} ExtSlice;

struct {
expr_ty value;
} Index;

} v;
};


So, slice() does indeed have arbitrary python types included in it;
contrary to what I read elsewhere.
expr_ty is a pointer to an arbitrary expression, so the actual structure
is 4 pointers, at 32 bits each = 16 bytes.
The size of the structure itself, given in an earlier post, is 20 bytes
-- which means one more pointer is involved, perhaps the one pointing to
the slice structure itself.

Hmm...!

An empty tuple gives sys.getsizeof( () ) = 24.

But, I would expect a tuple to be merely a list of object pointers;
hence I would expect 4 bytes for len(), and then a head pointer 4 bytes,
and then a pointer for each object.
3 objects gives 12 bytes, + 8 = 16 bytes.

Then we need one more pointer so Python knows where the struct is...
So a Tuple of 3 objects ought to fit nicely into 20 bytes; the same size
as slice() --

but it's 24, even when empty...
And 36 when initialized...
What are the extra 16 bytes for?

All I see is:
typedef struct { object** whatever } PyTupleObject;







Chris Kaynor

unread,
Oct 29, 2012, 9:49:19 PM10/29/12
to pytho...@python.org
Every Python object requires two pieces of data, both of which are
pointer-sized (one is a pointer, one is an int the size of a pointer).
These are: a pointer to the object's type, and the object's reference
count.

A tuple actually does not need a head pointer: the head pointer is
merely an offset from the tuple's pointer. It merely has a ref count,
type, an item count, and pointers to its contents.

A slice has the same type pointer and reference count, then three
pointers to the start, stop, and step objects. This means a slice
object should be the same size as a two-item tuple: the tuple needs a
count, while that is fixed at 3 for a slice (though some items may be
unset).

NOTE: The above is taken from reading the source code for Python 2.6.
For some odd reason, I am getting that an empty tuple consists of 6
pointer-sized objects (48 bytes on x64), rather than the expected 3
pointer-sized (24 bytes on x64). Slices are showing up as the expected
5 pointer-sized (40 bytes on x64), and tuples grow at the expected 1
pointer (8 bytes on x64) per item. I imagine I am missing something,
but cannot figure out what that would be.

Andrew Robinson

unread,
Oct 29, 2012, 3:34:24 PM10/29/12
to pytho...@python.org
On 10/29/2012 05:02 PM, Steven D'Aprano wrote:
> On Mon, 29 Oct 2012 08:42:39 -0700, Andrew Robinson wrote:
>
>>>> But, why can't I just overload the existing __getitem__ for lists and
>>>> not bother writing an entire class?
> You say that as if writing "an entire class" was a big complicated
> effort. It isn't. It is trivially simple, a single line:
>
> class MyList(list):
> ...
No, I don't think it big and complicated. I do think it has timing
implications which are undesirable because of how *much* slices are used.
In an embedded target -- I have to optimize; and I will have to reject
certain parts of Python to make it fit and run fast enough to be useful.

>>> You can just overload that one method in a subclass of list. Being
>>> able to monkey-patch __getitem__ for the list class itself would not be
>>> advisable, as it would affect all list slicing anywhere in your program
>>> and possibly lead to some unexpected behaviors.
>> That's what I am curious about.
>> What unexpected behaviors would a "monkey patch" typically cause?
> What part of "unexpected" is unclear?
>
Ahh -- The I don't know approach! It's only unexpected if one is a bad
programmer...!
> Let me see if I can illustrate a flavour of the sort of things that can
> happen if monkey-patching built-ins were allowed.
>
> You create a list and print it:
>
> # simulated output
> py> x = [5, 2, 4, 1]
> py> print(x)
> [1, 2, 4, 5]
<snip>

Finally you search deep into the libraries used in your code, and *five
days later* discover that your code uses library A which uses library B
which uses library C which uses library D which installs a harmless
monkey-patch to print, but only if library E is installed, and you just
happen to have E installed even though your code never uses it, AND that
monkey-patch clashes with a harmless monkey-patch to list.__getitem__
installed by library F. And even though each monkey-patch alone is
harmless, the combination breaks your code's output.

Right, which means that people developing the libraries made
contradictory assumptions.

> Python allows, but does not encourage, monkey-patching of code written in
> pure Python, because it sometimes can be useful. It flat out prohibits
> monkey-patching of builtins, because it is just too dangerous.
>
> Ruby allows monkey-patching of everything. And the result was predictable:
>
> http://devblog.avdi.org/2008/02/23/why-monkeypatching-is-destroying-ruby/
>

I read that post carefully; and the author purposely notes that he is
exaggerating.
BUT Your point is still well taken.

What you are talking about is namespace preservation; and I am thinking
about it. I can preserve it -- but only if I disallow true Python
primitives in my own interpreter; I can't provide two sets in the memory
footprint I am using.

From my perspective, the version of Python that I compile will not be
supported by the normal python help; The predecessor which first forged
this path, Pymite, has the same problems -- however, the benefits
ought-weigh the disadvantages; and the experiment yielded useful
information on what is redundant in Python (eg: range is not supported)
and when that redundancy is important for some reason.

If someone had a clear explanation of the disadvantages of allowing an
iterator, or a tuple -- in place of a slice() -- I would have no qualms
dropping the subject. However, I am not finding that yet. I am finding
very small optimization issues...

The size of an object is at least 8 bytes. Hence, three numbers is
going to be at least 24 bytes; and that's 24 bytes in *excess* of the
size of slice() or tuple () which are merely containers. So -- There
*ARE* savings in memory when using slice(), but it isn't really 2x
memory -- its more like 20% -- once the actual objects are considered.

The actual *need* for a slice() object still hasn't been demonsrated. I
am thinking that the implementation of __getitem__() is very poor
probably because of legacy issues.

A tuple can also hold None, so ( 1, None, 2 ) is still a valid Tuple.
Alternately: An iterator, like xrange(), could be made which takes None
as a parameter, or a special value like 'inf'.
Since these two values would never be passed to xrange by already
developed code, allowing them would not break working code.

I am only aware of one possible reason that slice() was once thought to
be necessary; and that is because accessing the element of a tuple would
recursively call __getitem__ on the tuple. But, even that is easily
dismissed once the fixed integer indexes are considered.

Your thoughts? Do you have any show stopper insights?








Andrew Robinson

unread,
Oct 29, 2012, 6:39:42 PM10/29/12
to pytho...@python.org
On 10/29/2012 06:49 PM, Chris Kaynor wrote:
> Every Python object requires two pieces of data, both of which are
> pointer-sized (one is a pointer, one is an int the size of a pointer).
> These are: a pointer to the object's type, and the object's reference
> count. A tuple actually does not need a head pointer: the head pointer
> is merely an offset from the tuple's pointer. It merely has a ref
> count, type, an item count, and pointers to its contents. A slice has
> the same type pointer and reference count, then three pointers to the
> start, stop, and step objects. This means a slice object should be the
> same size as a two-item tuple: the tuple needs a count, while that is
> fixed at 3 for a slice (though some items may be unset). NOTE: The
> above is taken from reading the source code for Python 2.6. For some
> odd reason, I am getting that an empty tuple consists of 6
> pointer-sized objects (48 bytes on x64), rather than the expected 3
> pointer-sized (24 bytes on x64). Slices are showing up as the expected
> 5 pointer-sized (40 bytes on x64), and tuples grow at the expected 1
> pointer (8 bytes on x64) per item. I imagine I am missing something,
> but cannot figure out what that would be.
>> All I see is:
>> typedef struct { object** whatever } PyTupleObject;
>>
It's fairly straight forward in 3.2.0. I debugged the code with GDB and
watched.
Perhaps it is the same in 2.6 ?

In addition to those items you mention, of which the reference count is
not even *inside* the struct -- there is additional debugging
information not mentioned. Built in objects contain a "line number", a
"column number", and a "context" pointer. These each require a full
word of storage.

Also, built in types appear to have a "kind" field which indicates the
object "type" but is not a pointer. That suggests two "object" type
indicators, a generic pointer (probably pointing to "builtin"? somewhere
outside the struct) and a specific one (an enum) inside the "C" struct.

Inside the tuple struct, I count 4 undocumented words of information.
Over all, there is a length, the list of pointers, a "kind", "line",
"col" and "context"; making 6 pieces in total.

Although your comment says the head pointer is not required; I found in
3.3.0 that it is a true head pointer; The Tuple() function on line 2069
of Python-ast.c, (3.3 version) -- is passed in a pointer called *elts.
That pointer is copied into the Tuple struct.

How ironic, slices don't have debugging info, that's the main reason
they are smaller.
When I do slice(3,0,2), suprisingly "Slice()" is NOT called.
But when I do a[1:2:3] it *IS* called.




Michael Torrie

unread,
Oct 30, 2012, 1:53:28 AM10/30/12
to pytho...@python.org
On 10/29/2012 01:34 PM, Andrew Robinson wrote:
> No, I don't think it big and complicated. I do think it has timing
> implications which are undesirable because of how *much* slices are used.
> In an embedded target -- I have to optimize; and I will have to reject
> certain parts of Python to make it fit and run fast enough to be useful.

Since you can't port the full Python system to your embedded machine
anyway, why not just port a subset of python and modify it to suit your
needs right there in the C code. It would be a fork, yes, but any port
to this target will be a fork anyway. No matter how you cut it, it
won't be easy at all, and won't be easy to maintain. You'll basically
be writing your own implementation of Python (that's what
python-on-a-chip is, and that's why it's closer to Python 2.x than
Python 3). That's totally fine, though. I get the impression you think
you will be able to port cPython as is to your target. Without a libc,
an MMU on the CPU, and a kernel, it's not going to just compile and run.

Anyway, the only solution, given your constraints, is to implement your
own python interpreter to handle a subset of Python, and modify it to
suit your tastes. What you want with slicing behavior changes has no
place in the normal cPython implementation, for a lot of reasons. The
main one is that it is already possible to implement what you are
talking about in your own python class, which is a fine solution for a
normal computer with memory and CPU power available.

Ian Kelly

unread,
Oct 30, 2012, 1:55:00 AM10/30/12
to Python
On Mon, Oct 29, 2012 at 12:00 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> I downloaded the source code for python 3.3.0, as the tbz;
> In the directory "Python-3.3.0/Python", look at Python-ast.c, line 2089 &
> ff.

Python-ast.c is part of the compiler code. That's not the struct used
to represent the object at runtime, but the struct used to represent
the AST node while compiling.

For the runtime definition of slices, look at sliceobject.h and
sliceobject.c. Slices are represented as:

typedef struct {
PyObject_HEAD
PyObject *start, *stop, *step; /* not NULL */
} PySliceObject;

"PyObject_HEAD" is a macro that incorporates the object type pointer
and the reference count. Hence, 5 words.

Andrew Robinson

unread,
Oct 29, 2012, 7:33:17 PM10/29/12
to pytho...@python.org
Hi Ian,

There are several interesting/thoughtful things you have written.
I like the way you consider a problem before knee jerk answering.

The copying you mention (or realloc) doesn't re-copy the objects on the
list.
It merely re-copies the pointer list to those objects. So lets see what
it would do...

I have seen doubling as the supposed re-alloc method, but I'll assume
1.25 --
so, 1.25**x = 20million, is 76 copies (max).

The final memory copy would leave about a 30MB hole.
And my version of Python operates initially with a 7MB virtual footprint.

Sooo.... If the garbage collection didn't operate at all, the copying
would waste around:

>>> z,w = 30e6,0
>>> while (z>1): w,z = w+z, z/1.25
...
>>> print(w)
149999995.8589521

eg: 150MB cummulative.
The doubles would amount to 320Megs max.

Not enough to fill virtual memory up; nor even cause a swap on a 2GB
memory machine.
It can hold everything in memory at once.

So, I don't think Python's memory management is the heart of the problem,
although memory wise-- it does require copying around 50% of the data.

As an implementation issue, though, the large linear array may cause
wasteful caching/swapping loops, esp, on smaller machines.

On 10/29/2012 10:27 AM, Ian Kelly wrote:
> Yes, I misconstrued your question. I thought you wanted to change the
> behavior of slicing to wrap around the end when start> stop instead
> of returning an empty sequence. ... Chris has
> already given ... You
> could also use map for this:
>
> new_seq = list(map(old_seq.__getitem__, iterable))
MMM... interesting.

I am not against changing the behavior, but I do want solutions like you
are offering.
As I am going to implement a python interpreter, in C, being able to do
things differently could significantly reduce the interpreter's size.

However, I want to break existing scripts very seldom...

> I'm aware of what is possible in C with pointer arithmetic. This is
> Python, though, and Python by design has neither pointers nor pointer
> arithmetic. In any case, initializing the pointer to the end of the
> array would still not do what you want, since the positive indices
> would then extend past the end of the array.

Yes, *and* if you have done assembly language programming -- you know
that testing for sign is a trivial operation. It doesn't even require a
subtraction. Hence, at the most basic machine level -- changing the
base pointer *once* during a slice operation is going to be far more
efficient than performing multiple subtractions from the end of an
array, as the Python API defines.
I'll leave out further gory details... but it is a Python interpreter
built in "C" issue.

Ian Kelly

unread,
Oct 30, 2012, 2:51:19 AM10/30/12
to Python
On Mon, Oct 29, 2012 at 4:39 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> In addition to those items you mention, of which the reference count is not
> even *inside* the struct -- there is additional debugging information not
> mentioned. Built in objects contain a "line number", a "column number", and
> a "context" pointer. These each require a full word of storage.
>
> Also, built in types appear to have a "kind" field which indicates the
> object "type" but is not a pointer. That suggests two "object" type
> indicators, a generic pointer (probably pointing to "builtin"? somewhere
> outside the struct) and a specific one (an enum) inside the "C" struct.
>
> Inside the tuple struct, I count 4 undocumented words of information.
> Over all, there is a length, the list of pointers, a "kind", "line", "col"
> and "context"; making 6 pieces in total.
>
> Although your comment says the head pointer is not required; I found in
> 3.3.0 that it is a true head pointer; The Tuple() function on line 2069 of
> Python-ast.c, (3.3 version) -- is passed in a pointer called *elts. That
> pointer is copied into the Tuple struct.

As above, you're looking at the compiler code, which is why you're
finding things like "line" and "column". The tuple struct is defined
in tupleobject.h and stores tuple elements in a tail array.

> How ironic, slices don't have debugging info, that's the main reason they
> are smaller.
> When I do slice(3,0,2), suprisingly "Slice()" is NOT called.
> But when I do a[1:2:3] it *IS* called.

Because compiling the latter involves parsing slicing syntax, and
compiling the former does not. :-)

Andrew Robinson

unread,
Oct 29, 2012, 7:54:41 PM10/29/12
to pytho...@python.org
On 10/29/2012 04:01 PM, Ian Kelly wrote:
> On Mon, Oct 29, 2012 at 9:20 AM, Andrew Robinson
> <and...@r3dsolutions.com> wrote:
>> FYI: I was asking for a reason why Python's present implementation is
>> desirable...
>>
>> I wonder, for example:
>>
>> Given an arbitrary list:
>> a=[1,2,3,4,5,6,7,8,9,10,11,12]
>>
>> Why would someone *want* to do:
>> a[-7,10]
>> Instead of saying
>> a[5:10] or a[-7:-2] ?
> A quick search of local code turns up examples like this:
>
> if name.startswith('{') and name.endswith('}'):
> name = name[1:-1]
Which is done to avoid explicitly calling the len() operator.
> If slices worked like ranges, then the result of that would be empty,
> which is obviously not desirable.
Yes, and that's an excellent point -- but note what I am showing in the
example. It is that example, which I am specifying. There are only two
cases where I think the default behavior of Python gives undesirable
results:

The step is positive, and the pair of indexes goes from negative to
positive.
Likewise, If the pair went from positive to negative, and the step was
negative.

In all other combinations, the default behavior of python ought to
remain intact.
I apologize for not making this crystal clear -- I thought you would
focus on the specific example I gave.

> I don't know of a reason why one might need to use a negative start
> with a positive stop, though.
I've already given several examples; and another poster did too -- eg:
Gene sequences for bacteria. It's not uncommon to need this. If I do
some digging, I can also show some common graphics operations that
benefit greatly from this ability -- NOTE: in another thread I just
showed someone how to operate on RGBA values... Slicing becomes THE
major operation done when converting, or blitting, graphics data. etc.

Another example -- Jpeg, for example, uses discrete cosines -- which are
a naturally cyclic data type. They repeat with a fixed period. I know
there are "C" libraries already made for Jpeg -- but that doesn't mean
many other applications with no "C" library aren't plagued by this problem.

I don't know how to make this point more clear. There really *ARE*
applications that uses cyclic lists of data; or which can avoid extra
logic to fix problems encountered from linear arrays which *end* at a
particular point.

sometimes it is desirable for a truncation to occur, sometimes it's
NOT. The sign convention test I outlined, I believe, clearly detects
when a cyclic data set is desired. If there are normal examples where my
tests fail -- that's what's important to me.


Andrew Robinson

unread,
Oct 29, 2012, 8:04:43 PM10/29/12
to pytho...@python.org
On 10/29/2012 10:53 PM, Michael Torrie wrote:
> On 10/29/2012 01:34 PM, Andrew Robinson wrote:
>> No, I don't think it big and complicated. I do think it has timing
>> implications which are undesirable because of how *much* slices are used.
>> In an embedded target -- I have to optimize; and I will have to reject
>> certain parts of Python to make it fit and run fast enough to be useful.
> Since you can't port the full Python system to your embedded machine
> anyway, why not just port a subset of python and modify it to suit your
> needs right there in the C code. It would be a fork, yes,
You're exactly right; That's what I *know* I am faced with.

> Without a libc, an MMU on the CPU, and a kernel, it's not going to just compile and run.
I have libc. The MMU is a problem; but the compiler implements the
standard "C" math library; floats, though, instead of doubles. That's
the only problem -- there.
> What you want with slicing behavior changes has no
> place in the normal cPython implementation, for a lot of reasons. The
> main one is that it is already possible to implement what you are
> talking about in your own python class, which is a fine solution for a
> normal computer with memory and CPU power available.
If the tests I outlined in the previous post inaccurately describe a
major performance improvement and at least a modest code size reduction;
or will *often* introduce bugs -- I *AGREE* with you.

Otherwise, I don't. I don't think wasting extra CPU power is a good
thing -- Extra CPU power can always be used by something else....

I won't belabor the point further. I'd love to see a counter example to
the specific criteria I just provided to IAN -- it would end my quest;
and be a good reference to point others to.


Andrew Robinson

unread,
Oct 29, 2012, 8:17:07 PM10/29/12
to pytho...@python.org
On 10/29/2012 11:51 PM, Ian Kelly wrote:
> On Mon, Oct 29, 2012 at 4:39 PM, Andrew Robinson
>
> As above, you're looking at the compiler code, which is why you're
> finding things like "line" and "column". The tuple struct is defined
> in tupleobject.h and stores tuple elements in a tail array.
>

If you re-check my post to chris, I listed the struct you mention.
The C code is what is actually run (by GDB breakpoint test) when a tuple
is instantiated.
If the tuple were stripped of the extra data -- then it ought to be as
small as slice().
But it's not as small -- so either the sys.getsizeof() is lying -- or
the struct you mention is not complete.

Which?

--Andrew.

Ian Kelly

unread,
Oct 30, 2012, 3:21:46 AM10/30/12
to Python
On Mon, Oct 29, 2012 at 7:49 PM, Chris Kaynor <cka...@zindagigames.com> wrote:
> NOTE: The above is taken from reading the source code for Python 2.6.
> For some odd reason, I am getting that an empty tuple consists of 6
> pointer-sized objects (48 bytes on x64), rather than the expected 3
> pointer-sized (24 bytes on x64). Slices are showing up as the expected
> 5 pointer-sized (40 bytes on x64), and tuples grow at the expected 1
> pointer (8 bytes on x64) per item. I imagine I am missing something,
> but cannot figure out what that would be.

I'm likewise seeing 4 extra words in tuples in 32-bit Python 3.3.
What I've found is that for tuples and other collection objects, the
garbage collector tacks on an extra header before the object in
memory. That header looks like this:

typedef union _gc_head {
struct {
union _gc_head *gc_next;
union _gc_head *gc_prev;
Py_ssize_t gc_refs;
} gc;
long double dummy; /* force worst-case alignment */
} PyGC_Head;

gc_next and gc_prev implement a doubly-linked list that the garbage
collector uses to explicitly track this object. gc_refs is used for
counting references during a garbage collection and stores the GC
state of the object otherwise.

I'm not entirely certain why collection objects get this special
treatment, but there you have it.

Ian Kelly

unread,
Oct 30, 2012, 3:32:44 AM10/30/12
to Python
On Mon, Oct 29, 2012 at 6:17 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> If you re-check my post to chris, I listed the struct you mention.
> The C code is what is actually run (by GDB breakpoint test) when a tuple is
> instantiated.

When you were running GDB, were you debugging the interactive
interpreter or a precompiled script? The interactive interpreter does
a compilation step for every line entered.

> If the tuple were stripped of the extra data -- then it ought to be as small
> as slice().
> But it's not as small -- so either the sys.getsizeof() is lying -- or the
> struct you mention is not complete.

As just explained, the extra 16 bytes are added by the garbage collector.

Ian Kelly

unread,
Oct 30, 2012, 4:15:23 AM10/30/12
to Python
On Mon, Oct 29, 2012 at 5:54 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
>> I don't know of a reason why one might need to use a negative start
>> with a positive stop, though.
>
> I've already given several examples; and another poster did too

I meant that I don't know of a reason to do that given the existing
semantics, which is what you were asking for. :-)

I understand and agree that there are potential applications for
allowing slices to wrap around from negative to positive. What I'm
not convinced of is that these applications need or should be handled
by the slicing operation -- which is more commonly understood as
producing subsequences -- especially since they already can be handled
relatively easily by iteration. I think that more users would find
the behavior surprising than useful.

Steven D'Aprano

unread,
Oct 30, 2012, 4:17:01 AM10/30/12
to
By the way Andrew, the timestamps on your emails appear to be off, or
possibly the time zone. Your posts are allegedly arriving before the
posts you reply to, at least according to my news client.


On Mon, 29 Oct 2012 12:34:24 -0700, Andrew Robinson wrote:

> On 10/29/2012 05:02 PM, Steven D'Aprano wrote:
>> On Mon, 29 Oct 2012 08:42:39 -0700, Andrew Robinson wrote:
>>
>>>>> But, why can't I just overload the existing __getitem__ for lists
>>>>> and not bother writing an entire class?
>> You say that as if writing "an entire class" was a big complicated
>> effort. It isn't. It is trivially simple, a single line:
>>
>> class MyList(list):
>> ...
> No, I don't think it big and complicated. I do think it has timing
> implications which are undesirable because of how *much* slices are
> used. In an embedded target -- I have to optimize; and I will have to
> reject certain parts of Python to make it fit and run fast enough to be
> useful.

Then I look forward to seeing your profiling results that show that the
overhead of subclassing list is the bottleneck in your application.

Until then, you are making the classic blunder of the premature optimizer:

"More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason — including
blind stupidity." — W.A. Wulf


I am not impressed by performance arguments when you have (apparently)
neither identified the bottlenecks in your code, nor even measured the
performance. You are essentially *guessing* where the bottlenecks are,
and *hoping* that some suggested change will be an optimization rather
than a pessimization.

Of course I may be wrong, and you have profiled your code and determined
that the overhead of inheritance is a problem. If so, that's a different
ball game. But your posts so far suggest to me that you're trying to
predict performance optimizations rather than measure them.


>>>> You can just overload that one method in a subclass of list. Being
>>>> able to monkey-patch __getitem__ for the list class itself would not
>>>> be advisable, as it would affect all list slicing anywhere in your
>>>> program and possibly lead to some unexpected behaviors.
>>> That's what I am curious about.
>>> What unexpected behaviors would a "monkey patch" typically cause?
>> What part of "unexpected" is unclear?
>>
> Ahh -- The I don't know approach! It's only unexpected if one is a bad
> programmer...!

No, it is unexpected because once you start relying on monkey-patching,
and the libraries you install are monkey-patching, you have a
combinational explosion of interactions. Any piece of code, anywhere,
could monkey-patch any other piece of code -- it is a form of action-at-a-
distance coding, like GOTOs and global variables. Use it with caution, in
moderation.


>> Let me see if I can illustrate a flavour of the sort of things that can
>> happen if monkey-patching built-ins were allowed.
[...]
> Right, which means that people developing the libraries made
> contradictory assumptions.

Not necessarily. Not only can monkey-patches conflict, but they can
combine in bad ways. It isn't just that Fred assumes X and Barney assumes
not-X, but also that Fred assumes X and Barney assumes Y and *nobody*
imagined that there was some interaction between X and Y.



[...]
>> Ruby allows monkey-patching of everything. And the result was
>> predictable:
>>
>> http://devblog.avdi.org/2008/02/23/why-monkeypatching-is-destroying-
ruby/
>>
>>
> I read that post carefully; and the author purposely notes that he is
> exaggerating.

Not carefully enough. He notes that he was using a provocative title and
that he doesn't actually think that Ruby is being destroyed. But the
actual harm he describes is real, e.g. bugs that take months to track
down.


> What you are talking about is namespace preservation;

I haven't mentioned namespaces. Nothing I have said has anything to do
with namespaces. I remember Apple monkey-patching routines in ROM back in
the mid 1980s, long before there was anything corresponding to namespaces
in Apple's programming model.


> and I am thinking
> about it. I can preserve it -- but only if I disallow true Python
> primitives in my own interpreter; I can't provide two sets in the memory
> footprint I am using.

If you want to write a language that is not Python, go right ahead.


> If someone had a clear explanation of the disadvantages of allowing an
> iterator, or a tuple -- in place of a slice() -- I would have no qualms
> dropping the subject. However, I am not finding that yet. I am finding
> very small optimization issues...

Python has certain public interfaces. If your language does not support
those public interfaces, then it might be an awesome language, but it is
not Python.

Python slices have certain features:

* they can be used repeatedly;

* they have public attributes start, step and stop;

* The stop attribute can be None, and the slice will default to the
length of the thing being sliced, which is not known until call-time.

* Slices have a public indices method.

These are known characteristics of slices, and there is Python code that
relies on it. If your language does not support these, then it is not
Python.

Iterators cannot replace slices, because once you have looked up an
iterator value, that value is gone, never to be seen again.

Tuples cannot replace slices, because tuples do not have start, step and
stop attributes; most tuples have no need of them, and if you care about
memory footprint, the last thing you want is to give every tuple three
unused or meaningless named attributes. Besides, what values should they
take in an empty tuple?

xrange cannot replaces slices, because there is no way to create an xrange
object with an empty or missing end; besides, xrange objects have no
public start/stop/step attributes.

And none of them have a public indices method.


> The actual *need* for a slice() object still hasn't been demonsrated.

Hell, the actual *need* for slicing hasn't been demonstrated! Or Python!
Since C doesn't have slicing, nobody needs it! Am I right?

Needed or not, that is what Python has, so if your language doesn't have
it, it isn't Python.



> I
> am thinking that the implementation of __getitem__() is very poor
> probably because of legacy issues.

You haven't demonstrated that it is very poor.


> A tuple can also hold None, so ( 1, None, 2 ) is still a valid Tuple.

Sure. And a tuple can also be () or (1, "x", [], None, None, None, None).
Now your list.__getitem__ code has to deal with those instead.


> Alternately: An iterator, like xrange(), could be made which takes None
> as a parameter, or a special value like 'inf'.

So you want to get rid of slices because you want to save every byte of
memory you can, and your way to do that is to introduce a new, *heavier*
type that does more than slices?

Oh man, that's funny.


--
Steven

Ian Kelly

unread,
Oct 30, 2012, 4:46:46 AM10/30/12
to Python
On Tue, Oct 30, 2012 at 1:21 AM, Ian Kelly <ian.g...@gmail.com> wrote:
> I'm not entirely certain why collection objects get this special
> treatment, but there you have it.

Thinking about it some more, this makes sense. The GC header is there
to support garbage collection for the object. Atomic types like ints
do not need this header because they do not reference other objects
and so cannot be involved in reference cycles. For those types,
reference counting is sufficient. For types like collections that do
reference other objects, garbage collection is needed.

Expanding on this, I suspect it is actually a bug that slice objects
are not tracked by the garbage collector. The following code appears
to result in a memory leak:

import gc
gc.disable()
while True:
for i in range(100):
l = []
s = slice(l)
l.append(s)
del s, l
_ = gc.collect()

Try running that and watch your Python memory usage climb and climb.
For contrast, replace the slice with a list and observe that memory
usage does *not* climb. On each iteration, the code constructs a
reference cycle between a slice and a list. It seems that because
slices are not tracked by the garbage collector, it is unable to break
these cycles.

Ian Kelly

unread,
Oct 30, 2012, 2:02:18 PM10/30/12
to Python
On Tue, Oct 30, 2012 at 10:14 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> File a bug report?

Looks like it's already been wontfixed back in 2006:

http://bugs.python.org/issue1501180

Andrew Robinson

unread,
Oct 30, 2012, 10:21:02 AM10/30/12
to pytho...@python.org
Thanks, IAN, you've answered the first of my questions and have been a
great help.
(And yes, I was debugging interactive mode... I took a nap after writing
that post, as I realized I had reached my 1 really bad post for the day... )

I at least I finally know why Python chooses to implement slice() as a
separate object from tuple; even if I don't like the implications.

I think there are three main consequences of the present implementation
of slice():

1) The interpreter code size is made larger with no substantial
improvement in functionality, which increases debugging effort.
2) No protection against perverted and surprising (are you surprised?! I
am) memory operation exists.
3) There is memory savings associated with not having garbage collection
overhead.

D'Apriano mentioned the named values, start, stop, step in a slice()
which are an API and legacy issue; These three names must also be
stored in the interpreter someplace. Since slice is defined at the "C"
level as a struct, have you already found these names in the source code
(hard-coded), or are they part of a .py file associated with the
interface to the "C" code?

Ethan Furman

unread,
Oct 30, 2012, 5:21:33 PM10/30/12
to pytho...@python.org
Andrew Robinson wrote:
> I can see that the slice() function can pass in arbitrary arguments.
> I'm not sure for lists, which is what the range is applied to, why an
> argument like "a" would be part of a slice.

Well, in my dbf.py Record class you can use the names of fields as the slice arguments, instead of
having to remember the offsets. record['full_name':'zip4'] returns a tuple (or a list, I don't
remember) of about 13 fields -- this is especially useful as that block of fields might not be in
the same place in each table.

~Ethan~

Mark Lawrence

unread,
Oct 30, 2012, 5:33:32 PM10/30/12
to pytho...@python.org
On 30/10/2012 18:02, Ian Kelly wrote:
> On Tue, Oct 30, 2012 at 10:14 AM, Ethan Furman <et...@stoneleaf.us> wrote:
>> File a bug report?
>
> Looks like it's already been wontfixed back in 2006:
>
> http://bugs.python.org/issue1501180
>

Absolutely bloody typical, turned down because of an idiot. Who the
hell is Tim Peters anyway? :)

--
Cheers.

Mark Lawrence.

Ian Kelly

unread,
Oct 30, 2012, 5:47:09 PM10/30/12
to Python
I don't really disagree with him, anyway. It is a rather obscure bug
-- is it worth increasing the memory footprint of slice objects by 80%
in order to fix it?

Ian Kelly

unread,
Oct 30, 2012, 5:55:41 PM10/30/12
to Python
On Tue, Oct 30, 2012 at 8:21 AM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> D'Apriano mentioned the named values, start, stop, step in a slice() which
> are an API and legacy issue; These three names must also be stored in the
> interpreter someplace. Since slice is defined at the "C" level as a struct,
> have you already found these names in the source code (hard-coded), or are
> they part of a .py file associated with the interface to the "C" code?

You mean the mapping of Python attribute names to C struct members?
That's in sliceobject.c:

static PyMemberDef slice_members[] = {
{"start", T_OBJECT, offsetof(PySliceObject, start), READONLY},
{"stop", T_OBJECT, offsetof(PySliceObject, stop), READONLY},
{"step", T_OBJECT, offsetof(PySliceObject, step), READONLY},
{0}
};

Chris Angelico

unread,
Oct 30, 2012, 6:00:56 PM10/30/12
to pytho...@python.org
Bug report: If I take this gun, aim it at my foot, and pull the
trigger, sometimes a hole appears in my foot.

This is hardly normal use of slice objects. And the penalty isn't a
serious one unless you're creating cycles repeatedly.

ChrisA

Ian Kelly

unread,
Oct 30, 2012, 6:02:06 PM10/30/12
to Python
Note that the slice API also includes the slice.indices method.

They also implement rich comparisons, but this appears to be done by
copying the data to tuples and comparing the tuples, which is actually
a bit ironic considering this discussion. :-)

Andrew Robinson

unread,
Oct 30, 2012, 11:47:25 AM10/30/12
to pytho...@python.org
On 10/30/2012 01:17 AM, Steven D'Aprano wrote:
> By the way Andrew, the timestamps on your emails appear to be off, or
> possibly the time zone. Your posts are allegedly arriving before the
> posts you reply to, at least according to my news client.
:D -- yes, I know about that problem. Every time I reboot it shows up
again...
It's a distribution issue, my hardware clock is in local time -- but
when the clock is read by different scripts in my distribution, some
refuse to accept that the system clock is not UTC.
I'll be upgrading in a few weeks -- so I'm just limping along until
then. My apology.

> Then I look forward to seeing your profiling results that show that the
> overhead of subclassing list is the bottleneck in your application.
>
> Until then, you are making the classic blunder of the premature optimizer:
>
> "More computing sins are committed in the name of efficiency (without
> necessarily achieving it) than for any other single reason — including
> blind stupidity." — W.A. Wulf

I'm sure that's true. Optimization, though, is a very general word.

On a highway in my neighborhood -- the government keeps trying to put
more safety restrictions on it, because it statistically registers as
the "highest accident rate road" in the *entire* region.

Naturally, the government assumes that people in my neighborhood are
worse drivers than usual and need to be policed more -- but the truth
is, that highway is the *ONLY* access road in the region for dozens of
miles in any direction for a densely populated area, so if there is
going to be an accident it will happen there; the extra safety
precautions are not necessary when the accident rate is looked at from a
per-capita perspective of those driving the highway.

I haven't made *the* blunder of the premature optimizer because I
haven't implemented anything yet. Premature optimizers don't bother to
hold public conversation and take correction.
OTOH: people who don't ever optimize out of fear, pay an increasing
bloat price with time.

> I am not impressed by performance arguments when you have (apparently)
> neither identified the bottlenecks in your code, nor even measured the
> performance.
Someone else already did a benchmark between a discrete loop and a slice
operation.
The difference in speed was an order of magnitude different.
I bench-marked a map operation, which was *much* better -- but also
still very slow in comparison.

Let's not confound an issue here -- I am going to implement the python
interpreter; and am not bound by optimization considerations of the
present python interpreter -- There are things I can do which as a
python programmer -- you can't. I have no choice but to re-implement
and optimize the interpreter -- the question is merely how to go about it.

> You are essentially *guessing* where the bottlenecks are,
> and *hoping* that some suggested change will be an optimization rather
> than a pessimization.
>
> Of course I may be wrong, and you have profiled your code and determined
> that the overhead of inheritance is a problem. If so, that's a different
> ball game. But your posts so far suggest to me that you're trying to
> predict performance optimizations rather than measure them.
Not really; Inheritance itself and it's timing aren't my main concern.
Even if the time was *0* that wouldn't change my mind.

There are man hours in debugging time caused by not being able to wrap
around in a slice. (I am not ignoring the contrary man hours of an API
change's bugs).

Human psychology is important; and it's a double edged sword.

I would refer you to a book written by Steve Maguire, Writing Solid
Code; Chapter 5; Candy machine interfaces.

He uses the "C" function "realloc()" as an excellent example of a bad
API; but still comments on one need that it *does* fulfill -- "I've
found it better to have one function that both shrinks and expands
blocks so that I don't have to write *ifs* constructs every time I need
to resize memory. True, I give up some extra argument checking, but
this is offset by the *ifs* that I no longer need to write (*and
possibly mess up*).

* Extra steps that a programmer must take to achieve a task are places
where bugs get introduced.

* API's which must be debugged to see what particular operation it is
performing rather than knowing what that operation is from looking at
the un-compiled code are places where bugs get introduced.

These two points are not friendly with each other -- they are in fact,
generally in conflict.
>> Right, which means that people developing the libraries made
>> contradictory assumptions.
> Not necessarily. Not only can monkey-patches conflict, but they can
> combine in bad ways. It isn't just that Fred assumes X and Barney assumes
> not-X, but also that Fred assumes X and Barney assumes Y and *nobody*
> imagined that there was some interaction between X and Y.
They *STILL* made contradictory assumptions; each of them assumed the
interaction mechanism would not be applied in a certain way -- and then
used it based on that assumption.
> Not carefully enough. He notes that he was using a provocative title and
> that he doesn't actually think that Ruby is being destroyed. But the
> actual harm he describes is real, e.g. bugs that take months to track
> down.
Yes. I read that *carefully*.
BTW: I'm not planning on monkey patching -- I did say your comments
were well taken.

>> What you are talking about is namespace preservation;
> I haven't mentioned namespaces. Nothing I have said has anything to do
> with namespaces.
keywords are a namespace. xrange is a keyword, etc.
You don't want pre-defined API method name to have its operation
altered; you want the original names preserved. Overloading pollutes
namespaces, etc.

> If you want to write a language that is not Python, go right ahead.

That's not the only possibility. It will still be called Python and for
good reason.

>
>> If someone had a clear explanation of the disadvantages of allowing an
>> iterator, or a tuple -- in place of a slice() -- I would have no qualms
>> dropping the subject. However, I am not finding that yet. I am finding
>> very small optimization issues...
> Python has certain public interfaces. If your language does not support
> those public interfaces, then it might be an awesome language, but it is
> not Python.
I didn't say I wouldn't support it. How do you mean this?

> Iterators cannot replace slices, because once you have looked up an
> iterator value, that value is gone, never to be seen again.
Yes they can replace slices, but not always.
For example; Lets say an iterator were allowed to replace a slice;

then a[ 1:3 ] would still be a slice, but a[ xrange(1,3) ] would not.
The second piece of code does not deny that slices exist, it just allows
an iterator to be passed to __getitems__.

> Tuples cannot replace slices, because tuples do not have start, step and
> stop attributes;
I refer you to my comments to IAN.

>
> And none of them have a public indices method.
Hmm.. a new point. Do you have a link?
>
>> The actual *need* for a slice() object still hasn't been demonsrated.
> Hell, the actual *need* for slicing hasn't been demonstrated! Or Python!
> Since C doesn't have slicing, nobody needs it! Am I right?
I asked about the need for slices() not the others, in the OP.
Thanks for your comments on the names, etc. That's helpful -- and I
hope I haven't driven your blood pressure up too much unintentionally.

> Needed or not, that is what Python has, so if your language doesn't have
> it, it isn't Python.
Fine. But if mine still has these, but only as an *optional* wrapper
function -- it still *RUNS* all python progams.

> Sure. And a tuple can also be () or (1, "x", [], None, None, None, None).
> Now your list.__getitem__ code has to deal with those instead.
>
And the present __getitem__ can be passed a perverted slice() as IAN
demonstrated and the bug-list examined as "goofy". These strange things
simply cause an exception.

> So you want to get rid of slices because you want to save every byte
> of memory you can, and your way to do that is to introduce a new,
> *heavier* type that does more than slices? Oh man, that's funny.

I'm not Introducing a new type !
Besides; tuples will likely be lighter on memory in my target and I
don't recall saying a tuple does more than slices. I did say that
*iterators* were more *flexible* -- is that what you are thinking of?

Mark Lawrence

unread,
Oct 30, 2012, 7:30:59 PM10/30/12
to pytho...@python.org
On 30/10/2012 21:47, Ian Kelly wrote:
> On Tue, Oct 30, 2012 at 3:33 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
>> On 30/10/2012 18:02, Ian Kelly wrote:
>>>
>>> On Tue, Oct 30, 2012 at 10:14 AM, Ethan Furman <et...@stoneleaf.us> wrote:
>>>>
>>>> File a bug report?
>>>
>>>
>>> Looks like it's already been wontfixed back in 2006:
>>>
>>> http://bugs.python.org/issue1501180
>>>
>>
>> Absolutely bloody typical, turned down because of an idiot. Who the hell is
>> Tim Peters anyway? :)
>
> I don't really disagree with him, anyway. It is a rather obscure bug
> -- is it worth increasing the memory footprint of slice objects by 80%
> in order to fix it?
>

Thinking about it I entirely agree. An 80% increase in memory foorprint
where the slice objects are being used with Python 3.3.0 Unicode would
have disastrous consequences given the dire state of said Unicode, which
is why some regular contributors here are giving up with Python and
using Go.

Oh gosh look at the time, I'm just going for a walk so I can talk with
the Pixies at the bottom of my garden before they go night nights.

--
Cheers.

Mark Lawrence.

Mark Lawrence

unread,
Oct 30, 2012, 7:48:41 PM10/30/12
to pytho...@python.org
On 30/10/2012 15:47, Andrew Robinson wrote:
>
> I would refer you to a book written by Steve Maguire, Writing Solid
> Code; Chapter 5; Candy machine interfaces.
>

The book that took a right hammering here
http://accu.org/index.php?module=bookreviews&func=search&rid=467 ?

--
Cheers.

Mark Lawrence.

Message has been deleted

Michael Torrie

unread,
Oct 31, 2012, 1:29:39 AM10/31/12
to pytho...@python.org
On 10/30/2012 09:47 AM, Andrew Robinson wrote:
> Let's not confound an issue here -- I am going to implement the python
> interpreter; and am not bound by optimization considerations of the
> present python interpreter -- There are things I can do which as a
> python programmer -- you can't. I have no choice but to re-implement
> and optimize the interpreter -- the question is merely how to go about it.

As this is the case, why this long discussion? If you are arguing for a
change in Python to make it compatible with what this fork you are going
to create will do, this has already been fairly thoroughly addressed
earl on, and reasons why the semantics will not change anytime soon have
been given.

Ian Kelly

unread,
Oct 31, 2012, 3:43:56 AM10/31/12
to Python
On Tue, Oct 30, 2012 at 4:25 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> Ian,
>
>> Looks like it's already been wontfixed back in 2006:
>
>> http://bugs.python.org/issue1501180
>
> Absolutely bloody typical, turned down because of an idiot. Who the hell is
> Tim Peters anyway?
>
>> I don't really disagree with him, anyway. It is a rather obscure bug
>> -- is it worth increasing the memory footprint of slice objects by 80%
>> in order to fix it?
>
> :D
>
> In either event, a *bug* does exist (at *least* 20% of the time.) Tim
> Peters could have opened the *appropriate* bug complaint if he rejected the
> inappropriate one.

Where are you getting that 20% figure from? Reference cycles
involving slice objects would be extremely rare, certainly far less
than 20%.

> The API ought to have either 1) included the garbage collection, or 2)
> raised an exception anytime dangerous/leaky data was supplied to slice().

How would you propose detecting the latter? At the time data is
supplied to slice() it cannot refer to the slice, as the slice does
not exist yet. The cycle has to be created after.

> If it is worth getting rid of the 4 words of extra memory required for the
> GC -- on account of slice() refusing to support data with sub-objects; then
> I'd also point out that a very large percentage of the time, tuples also
> contain data (typically integers or floats,) which do not further
> sub-reference objects. Hence, it would be worth it there too.

I disagree. The proportion of the time that a tuple contains other
collection objects is *much* greater. This happens regularly. OTOH,
if I had to hazard a guess at the frequency with which non-atomic
objects are used in slices, it would be a fraction of a fraction of a
fraction of a percent.

> I came across some unexpected behavior in Python 3.2 when experimenting with
> ranges and replacement....
>
> Consider, xrange is missing, BUT:

More accurately, range is gone, and xrange has been renamed range.

>>>> a=range(1,5,2)
>>>> a[1]
> 3
>>>> a[2]
> 5
>>>> a[1:2]
> range(3, 5, 2)
>
> Now, I wondered if it would still print the array or not; eg: if this was a
> __str__ issue vs. __repr__.
>
>>>> print( a[1:2] ) # Boy, I have to get used to the print's parenthesis
> range(3, 5, 2)
>
> So, the answer is *NOPE*.

I'm not sure why you would expect it to print a list here, without an
explicit conversion. The result of calling range in Python 3 is a
range object, not a list.

Steven D'Aprano

unread,
Oct 31, 2012, 6:07:36 AM10/31/12
to
I see your smiley, but for the benefit of those who actually don't know
who Tim Peters, a.k.a. the Timbot, is, he is one of the gurus of Python
history. He invented Python's astonishingly excellent sort routine,
Timsort, and popularised the famous adverbial phrase signoffs you will
see in a lot of older posts.

Basically, he is in the pantheon of early Python demigods.


stop-me-before-i-start-gushing-over-the-timbot-ly y'rs,



--
Steven

Mark Lawrence

unread,
Oct 31, 2012, 12:01:41 PM10/31/12
to pytho...@python.org
4 / 10, must try harder, the omission of the Zen of Python is considered
a very serious matter :)

--
Cheers.

Mark Lawrence.

Robert Kern

unread,
Nov 1, 2012, 7:13:48 AM11/1/12
to pytho...@python.org
On 10/31/12 8:16 PM, Andrew Robinson wrote:
> On 10/31/2012 02:20 PM, Ian Kelly wrote:
>> On Wed, Oct 31, 2012 at 7:42 AM, Andrew Robinson wrote:
>>
>> Then; I'd note: The non-goofy purpose of slice is to hold three data
>> values; They are either numbers or None. These *normally* encountered
>> values can't create a memory loop.
>> So, FOR AS LONG, as the object representing slice does not contain an
>> explicit GC pair; I move that we mandate (yes, in the current python
>> implementation, even as a *fix*) that its named members may not be
>> assigned any objects other than None or numbers....
>>
>> eg: Lists would be forbidden....
>>
>> Since functions, and subclasses, can be test evaluated by int(
>> the_thing_to_try ) and *[] can too,
>> generality need not be lost for generating nothing or numbers.
>>
>>
>> PEP 357 requires that anything implementing the __index__ special method be
>> allowed for slicing sequences (and also that __index__ be used for the
>> conversion). For the most part, that includes ints and numpy integer types,
>> but other code could be doing esoteric things with it.
>
> I missed something... (but then that's why we're still talking about it...)
>
> Reading the PEP, it notes that *only* integers (or longs) are permitted in slice
> syntax.
> (Overlooking None, of course... which is strange...)
>
> The PEP gives the only exceptions as objects with method "__index__".
>
> Automatically, then, an empty list is forbidden (in slice syntax).
> However, What you did, was circumvent the PEP by passing an empty list directly
> to slice(), and avoiding running it through slice syntax processing.

Why do you think it is forbidden by the syntax?

[~]
|1> class A(object):
..> def __getitem__(self, key):
..> return key
..>

[~]
|2> a = A()

[~]
|3> a[[]:]
slice([], None, None)


The PEP is a little unclear and refers to a state of the Python interpreter that
no longer exists. At the time, I think __getslice__() was still not deprecated,
and it did require ints (or after the PEP, __index__able objects).
__getslice__() is now deprecated in favor of __getitem__() where you can
interpret slice objects with arbitrary objects to your heart's content.
Arbitrary objects *are* definitely allowed by the slice syntax (how could the
syntax know what is an int and what is another kind of object?). Most objects
that interpret slices, especially the builtin sequence types, do require
__index__able objects (or None).

> So, what's the psychology behind allowing slice() to hold objects which are not
> converted to ints/longs in the first place?

In numpy, we (ab)use this freedom for some convenient notation in special
objects. We have a couple of grid-making convenience objects:

[~]
|5> numpy.mgrid[1.5:2.5:0.1]
array([ 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4])


This syntax uses the start:stop:step notation to make a float range. If we use
an imaginary integer in the "step" slot, mgrid will interpret it as the number
of items requested instead of the step.

[~]
|6> numpy.mgrid[1.5:2.5:11j]
array([ 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Ethan Furman

unread,
Nov 1, 2012, 10:12:15 AM11/1/12
to pytho...@python.org
Andrew Robinson wrote:
> On 10/31/2012 02:20 PM, Ian Kelly wrote:
>> On Wed, Oct 31, 2012 at 7:42 AM, Andrew Robinson wrote:
>>> Then; I'd note: The non-goofy purpose of slice is to hold three
>>> data values; They are either numbers or None. These *normally*
>>> encountered values can't create a memory loop.
>>> So, FOR AS LONG, as the object representing slice does not contain
>>> an explicit GC pair; I move that we mandate (yes, in the current
>>> python implementation, even as a *fix*) that its named members may
>>> not be assigned any objects other than None or numbers....
>>>
>>> eg: Lists would be forbidden....
>>>
>>> Since functions, and subclasses, can be test evaluated by int(
>>> the_thing_to_try ) and *[] can too,
>>> generality need not be lost for generating nothing or numbers.
>>
>>
>> PEP 357 requires that anything implementing the __index__ special
>> method be allowed for slicing sequences (and also that __index__ be
>> used for the conversion). For the most part, that includes ints and
>> numpy integer types, but other code could be doing esoteric things
>> with it.
>
> I missed something... (but then that's why we're still talking about it...)
>
> Reading the PEP, it notes that *only* integers (or longs) are permitted
> in slice syntax.

Keep in mind that PEPs represent Python /at that time/ -- as Python
moves forward, PEPs are not updated (this has gotten me a couple times).


>> The change would be backward-incompatible in any case, since there is
>> certainly code out there that uses non-numeric slices -- one example
>> has already been given in this thread.
>
> Hmmm.....
>
> Now, I'm thinking -- The purpose of index(), specifically, is to notify
> when something which is not an integer may be used as an index; You've
> helpfully noted that index() also *converts* those objects into numbers.
>
> Ethan Fullman mentioned that he used the names of fields, "instead of
> having to remember the _offsets_"; Which means that his values _do
> convert_ to offset numbers

Furman, actually. :)

And my values do *not* convert to indices (at least, not automatically).
My __getitem__ code looks like:

elif isinstance(item, slice):
sequence = []
if isinstance(item.start, (str, unicode)) \
or isinstance(item.stop, (str, unicode)):
field_names = dbf.field_names(self)
start, stop, step = item.start, item.stop, item.step
if start not in field_names or stop not in field_names:
raise MissingFieldError(
"Either %r or %r (or both) are not valid field names"
% (start, stop))
if step is not None and not isinstance(step, (int, long)):
raise DbfError(
"step value must be an int or long, not %r"
% type(step))
start = field_names.index(start)
stop = field_names.index(stop) + 1
item = slice(start, stop, step)
for index in self._meta.fields[item]:
sequence.append(self[index])
return sequence

In other words, the slice contains the strings, and my code calculates
the offsets -- Python doesn't do it for me.


> His example was actually given in slice syntax notation [::].
> Hence, his objects must have an index() method, correct?.

Nope.

~Ethan~

Chris Angelico

unread,
Nov 1, 2012, 10:40:22 AM11/1/12
to pytho...@python.org
On Fri, Nov 2, 2012 at 1:12 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> In other words, the slice contains the strings, and my code calculates
> the offsets -- Python doesn't do it for me.

That's correct, but you're still translating those strings into
numeric indices. You can slice a database record based on column names
(though personally I would recommend against it - creates too much
dependence on column order, which I prefer to treat as
non-significant), but you can't, for instance, slice a dictionary by
keys:

foo = {"asdf":123,"qwer":234,"zxcv":345,"1234":456}
foo["qwer":"1234"] # What should this return?

I suppose conceptually you could slice any iterable by discarding till
you match the start, then yielding till you match the stop, then
returning (it'd function like itertools.islice but using non-numeric
indices - somehow). But it still depends on there being a dependable
order.

(Incidentally, isinstance(X, (str, unicode)) can become isinstance(X,
basestring) - they both inherit from that.)

ChrisA

Ethan Furman

unread,
Nov 1, 2012, 12:15:14 PM11/1/12
to pytho...@python.org
Chris Angelico wrote:
> On Fri, Nov 2, 2012 at 1:12 AM, Ethan Furman <et...@stoneleaf.us> wrote:
>> In other words, the slice contains the strings, and my code calculates
>> the offsets -- Python doesn't do it for me.
>
> That's correct, but you're still translating those strings into
> numeric indices.

True, but the point is that the /slice/ contains a data type that is
neither a number, nor directly translatable into a number (that is, no
__index__ method), and my code would cease to function should that
change to slices be made.

~Ethan~

Chris Angelico

unread,
Nov 1, 2012, 2:53:54 PM11/1/12
to pytho...@python.org
On Thu, Nov 1, 2012 at 10:32 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> presently slice() allows memory leaks through GC loops.

Forgive me if I've missed something here, but isn't it only possible
to make a refloop by decidedly abnormal behaviour? Stuff like:

a=[]; a.append(slice(a))

Seriously, who does this? First you have to have a reference to a
container in place of an index, and then you have to retain the slice
object inside that same container as well. Neither operation is normal
use of a slice. Where is the problem?

ChrisA

Ian Kelly

unread,
Nov 1, 2012, 3:07:32 PM11/1/12
to Python
On Thu, Nov 1, 2012 at 5:32 AM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> Hmmmm.... was that PEP the active state of Python, when Tim rejected the bug report?

Yes. The PEP was accepted and committed in March 2006 for release in
Python 2.5. The bug report is from June 2006 has a version
classification of Python 2.5, although 2.5 was not actually released
until September 2006.

> Pep 357 merely added cruft with index(), but really solved nothing. Everything index() does could be implemented in __getitem__ and usually is.

No. There is a significant difference between implementing this on
the container versus implementing it on the indexes. Ethan
implemented his string-based slicing on the container, because the
behavior he wanted was specific to the container type, not the index
type. Custom index types like numpy integers on the other hand
implement __index__ on the index type, because they apply to all
sequences, not specific containers. This must be separate from
standard int conversion, because standard int conversion is too
general for indexing.

> slice is also a full blown object, which implements a trivial method to dump the contents of itself to a tuple.

slice.indices() does not trivially dump its contents as given. It
takes a sequence length and adjusts its indices to that length. The C
implementation of this is around 60 lines of code.

> Don't bother to fix the bug; allow Python to crash with a subtle bug that often take weeks to track down by the very small minority doing strange things (Equivalent to the "monkey patch" syndrome of D'Aprano; BTW: The longer the bug is left unfixed, the more people will invent "uses" for it )

It's been 6 years already. AFAIK nobody has invented any uses that
are actually at risk of invoking the GC bug.

Ethan Furman

unread,
Nov 1, 2012, 4:55:46 PM11/1/12
to Python
Ian Kelly wrote:
> On Thu, Nov 1, 2012 at 5:32 AM, Andrew Robinson wrote:
>> Don't bother to fix the bug; allow Python to crash with a subtle bug that often take weeks to track down by the very small minority doing strange things (Equivalent to the "monkey patch" syndrome of D'Aprano; BTW: The longer the bug is left unfixed, the more people will invent "uses" for it )
>
> It's been 6 years already. AFAIK nobody has invented any uses that
> are actually at risk of invoking the GC bug.

The bug is not that slice allows non-numbers, but that slice objects aren't tracked by gc; I'm not
seeing an issue with not fixing the bug.

~Ethan~

Andrew Robinson

unread,
Nov 1, 2012, 6:25:51 PM11/1/12
to pytho...@python.org
On 11/01/2012 12:07 PM, Ian Kelly wrote:
> On Thu, Nov 1, 2012 at 5:32 AM, Andrew Robinson
> <and...@r3dsolutions.com> wrote:
>> Hmmmm.... was that PEP the active state of Python, when Tim rejected the bug report?
> Yes. The PEP was accepted and committed in March 2006 for release in
> Python 2.5. The bug report is from June 2006 has a version
> classification of Python 2.5, although 2.5 was not actually released
> until September 2006.
That explain's Peter's remark. Thank you. He looks *much* smarter now.

>
>> Pep 357 merely added cruft with index(), but really solved nothing. Everything index() does could be implemented in __getitem__ and usually is.
> No. There is a significant difference between implementing this on
> the container versus implementing it on the indexes. Ethan
> implemented his string-based slicing on the container, because the
> behavior he wanted was specific to the container type, not the index
> type. Custom index types like numpy integers on the other hand
> implement __index__ on the index type, because they apply to all
> sequences, not specific containers.

Hmmm...
D'Aprano didn't like the monkey patch;and sub-classing was his fix-all.

Part of my summary is based on that conversation with him,and you
touched on one of the unfinished points; I responded to him that I
thought __getitem__ was under-developed. The object slice() has no
knowledge of the size of the sequence; nor can it get that size on it's
own, but must passively wait for it to be given to it.

The bottom line is: __getitem__ must always *PASS* len( seq ) to
slice() each *time* the slice() object is-used. Since this is the case,
it would have been better to have list, itself, have a default member
which takes the raw slice indicies and does the conversion itself. The
size would not need to be duplicated or passed -- memory savings, &
speed savings...

I'm just clay pidgeoning an idea out here....
Let's apply D'Aprano 's logic to numpy; Numpy could just have subclassed
*list*; so let's ignore pure python as a reason to do anything on the
behalf on Numpy:

Then, lets' consider all thrid party classes; These are where
subclassing becomes a pain -- BUT: I think those could all have been
injected.

>>> class ThirdParty( list ): # Pretend this is someone else's...
... def __init__(self): return
... def __getitem__(self,aSlice): return aSlice
...

We know it will default work like this:
>>> a=ThirdParty()
>>> a[1:2]
slice(1, 2, None)

# So, here's an injection...
>>> ThirdParty.superOnlyOfNumpy__getitem__ = MyClass.__getitem__
>>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
self.superOnlyOfNumpy__getitem__(aSlice ).step )
>>> a[5:6]
(1, 3, None)

Numpy could have exported a (workable) function that would modify other
list functions to affect ONLY numpy data types (eg: a filter). This
allows user's creating their own classes to inject them with Numpy's
filter only when they desire;

Recall Tim Peter's "explicit is better than implicit" Zen?

Most importantly normal programs not using Numpy wouldn't have had to
carry around an extra API check for index() *every* single time the
heavily used [::] happened. Memory & speed both.

It's also a monkey patch, in that index() allows *conflicting*
assumptions in violation of the unexpected monkey patch interaction worry.

eg: Numpy *CAN* release an index() function on their floats -- at which
point a basic no touch class (list itself) will now accept float as an
index in direct contradiction of PEP 357's comment on floats... see?

My point isn't that this particular implementation I have shown is the
best (or even really safe, I'd have to think about that for a while).
Go ahead and shoot it down...

My point is that, the methods found in slice(), and index() now have
moved all the code regarding a sequence *out* of the object which has
information on that sequence. It smacks of legacy.

The Python parser takes values from many other syntactical constructions
and passes them directly to their respective objects -- but in the case
of list(), we have a complicated relationship; and not for any reason
that can't be handled in a simpler way.

Don't consider the present API legacy for a moment, I'm asking
hypothetical design questions:

How many users actually keep slice() around from every instance of [::]
they use?
If it is rare, why create the slice() object in the first place and
constantly be allocating and de-allocating memory, twice over? (once for
the original, and once for the repetitive method which computes dynamic
values?) Would a single mutable have less overhead, since it is
destroyed anyway?

Ian Kelly

unread,
Nov 1, 2012, 8:32:36 PM11/1/12
to Python
On Thu, Nov 1, 2012 at 4:25 PM, Andrew Robinson
<and...@r3dsolutions.com> wrote:
> The bottom line is: __getitem__ must always *PASS* len( seq ) to slice()
> each *time* the slice() object is-used. Since this is the case, it would
> have been better to have list, itself, have a default member which takes the
> raw slice indicies and does the conversion itself. The size would not need
> to be duplicated or passed -- memory savings, & speed savings...

And then tuple would need to duplicate the same code. As would deque.
And str. And numpy.array, and anything else that can be sliced,
including custom sequence classes.

> Let's apply D'Aprano 's logic to numpy; Numpy could just have subclassed
> *list*;

Numpy arrays are very different internally from lists. They are
basically fancy wrappers of C arrays, whereas lists are a higher-level
abstraction. They allow for multiple dimensions, which lists do not.
Slices of numpy arrays produce views, whereas slices of lists produce
brand new lists. And they certainly do not obey the Liskov
Substitution Principle with respect to lists.

>>>> class ThirdParty( list ): # Pretend this is someone else's...
> ... def __init__(self): return
> ... def __getitem__(self,aSlice): return aSlice
> ...
>
> We know it will default work like this:
>>>> a=ThirdParty()
>>>> a[1:2]
> slice(1, 2, None)
>
> # So, here's an injection...
>>>> ThirdParty.superOnlyOfNumpy__getitem__ = MyClass.__getitem__
>>>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
>>>> self.superOnlyOfNumpy__getitem__(aSlice ).step )
>>>> a[5:6]
> (1, 3, None)

I'm not understanding what this is meant to demonstrate. Is "MyClass"
a find-replace error of "ThirdParty"? Why do you have __getitem__
returning slice objects instead of items or subsequences? What does
this example have to do with numpy?

> Numpy could have exported a (workable) function that would modify other list
> functions to affect ONLY numpy data types (eg: a filter). This allows
> user's creating their own classes to inject them with Numpy's filter only
> when they desire;
>
> Recall Tim Peter's "explicit is better than implicit" Zen?

We could also require the user to explicitly declare when they're
performing arithmetic on variables that might not be floats. Then we
can turn off run-time type checking unless the user explicitly
requests it, all in the name of micro-optimization and explicitness.

Seriously, whether x is usable as a sequence index is a property of x,
not a property of the sequence. Users shouldn't need to pick and
choose *which* particular sequence index types their custom sequences
are willing to accept. They should even be able to accept sequence
index types that haven't been written yet.

> Most importantly normal programs not using Numpy wouldn't have had to carry
> around an extra API check for index() *every* single time the heavily used
> [::] happened. Memory & speed both.

The O(1) __index__ check is probably rather inconsequential compared
to the O(n) cost of actually performing the slicing.

> It's also a monkey patch, in that index() allows *conflicting* assumptions
> in violation of the unexpected monkey patch interaction worry.
>
> eg: Numpy *CAN* release an index() function on their floats -- at which
> point a basic no touch class (list itself) will now accept float as an index
> in direct contradiction of PEP 357's comment on floats... see?

Such a change would only affect numpy floats, not all floats, so it
would not be a monkey-patch. In any case, that would be incorrect
usage of __index__. We're all consenting adults here; we don't need
supervision to protect us from buggy third-party code.

88888 Dihedral

unread,
Nov 1, 2012, 9:08:33 PM11/1/12
to
andrew...@gmail.com於 2012年10月29日星期一UTC+8上午11時12分11秒寫道:
> The slice operator does not give any way (I can find!) to take slices from negative to positive indexes, although the range is not empty, nor the expected indexes out of range that I am supplying.
>
>
>
> Many programs that I write would require introducing variables and logical statements to correct the problem which is very lengthy and error prone unless there is a simple work around.
>
>
>
> I *hate* replicating code every time I need to do this!
>
>
>
> I also don't understand why slice() is not equivalent to an iterator, but can replace an integer in __getitem__() whereas xrange() can't.
>
>
>
>
>
> Here's an example for Linux shell, otherwise remove /bin/env...
>
> {{{#!/bin/env python
>
> a=[1,2,3,4,5,6,7,8,9,10]
>
> print a[-4:3] # I am interested in getting [7,8,9,10,1,2] but I get [].
>
> }}}
I'll suggest to use the reverse method
to get what you want.

Of course, the reverse method is not efficient for
a list of a huge number of objects in python.

Steven D'Aprano

unread,
Nov 1, 2012, 10:14:55 PM11/1/12
to
On Thu, 01 Nov 2012 15:25:51 -0700, Andrew Robinson wrote:

> On 11/01/2012 12:07 PM, Ian Kelly wrote:

>>> Pep 357 merely added cruft with index(), but really solved nothing.
>>> Everything index() does could be implemented in __getitem__ and
>>> usually is.
>>
>> No. There is a significant difference between implementing this on the
>> container versus implementing it on the indexes. Ethan implemented his
>> string-based slicing on the container, because the behavior he wanted
>> was specific to the container type, not the index type. Custom index
>> types like numpy integers on the other hand implement __index__ on the
>> index type, because they apply to all sequences, not specific
>> containers.
>
> Hmmm...
> D'Aprano didn't like the monkey patch;and sub-classing was his fix-all.

I pointed out that monkey-patching is a bad idea, even if it worked. But
it doesn't work -- you simply cannot monkey-patch built-ins in Python.
Regardless of whether "I like" the m-p or not, *you can't use it* because
you patch built-in list methods.

The best you could do is subclass list, then shadow the built-in name
"list" with your subclass. But that gives all sorts of problems too, in
some ways even worse than monkey-patching.

You started this thread with a question about slicing. You believe that
one particular use-case for slicing, which involves interpreting lists as
circular rather than linear, is the use-case that built-in list slicing
should have supported.

Fine, you're entitled to your option. But that boat has sailed about 20
years ago. Python didn't make that choice, and it won't change now. If
you write up a PEP, you could aim to have the built-in behaviour changed
for Python 4 in perhaps another 10-15 years or so. But for the time
being, that's not what lists, tuples, strings, etc. do. If you want that
behaviour, if you want a circular list, then you have to implement it
yourself, and the easiest way to do so is with a subclass.

That's not a "fix-all". I certainly don't believe that subclassing is the
*only* way to fix this, nor that it will fix "all" things. But it might
fix *some* things, such as you wanting a data type that is like a
circular list rather than a linear list.

If you prefer to create a circular-list class from scratch, re-
implementing all the list-like behaviour, instead of inheriting from an
existing class, then by all means go right ahead. If you have a good
reason to spend days or weeks writing, testing, debugging and fine-tuning
your new class, instead of about 15 minutes with a subclass, then I'm
certainly not going to tell you not to.


> Part of my summary is based on that conversation with him,and you
> touched on one of the unfinished points; I responded to him that I
> thought __getitem__ was under-developed. The object slice() has no
> knowledge of the size of the sequence; nor can it get that size on it's
> own, but must passively wait for it to be given to it.

That's because the slice object is independent of the sequence. As I
demonstrated, you can pass a slice object to multiple sequences. This is
a feature, not a bug.


> The bottom line is: __getitem__ must always *PASS* len( seq ) to
> slice() each *time* the slice() object is-used.

The bottom line is: even if you are right, so what?

The slice object doesn't know what the length of the sequence is. What
makes you think that __getitem__ passes the length to slice()? Why would
it need to recreate a slice object that already exists?

It is the *sequence*, not the slice object, that is responsible for
extracting the appropriate items when __getitem__ is called. __getitem__
gets a slice object as argument, it doesn't create one. It no more
creates the slice object than mylist[5] creates the int 5.


> Since this is the case,

But it isn't.


> it would have been better to have list, itself, have a default member
> which takes the raw slice indicies and does the conversion itself. The
> size would not need to be duplicated or passed -- memory savings, &
> speed savings...

We have already demonstrated that slice objects are smaller than (x)range
objects and three-item tuples. In Python 3.3:

py> sys.getsizeof(range(1, 10, 2)) # xrange remained in Python 3
24
py> sys.getsizeof((1, 10, 2))
36
py> sys.getsizeof(slice(1, 10, 2))
20


It might help you to be taken seriously if you base your reasoning on
Python as it actually is, rather than counter-factual assumptions.



> I'm just clay pidgeoning an idea out here.... Let's apply D'Aprano 's
> logic to numpy; Numpy could just have subclassed *list*;

Sure they could have, if numpy arrays were intended to be a small
variation on Python lists. But they weren't, so they didn't.


> so let's ignore
> pure python as a reason to do anything on the behalf on Numpy:
>
> Then, lets' consider all thrid party classes; These are where
> subclassing becomes a pain -- BUT: I think those could all have been
> injected.
>
> >>> class ThirdParty( list ): # Pretend this is someone else's...
> ... def __init__(self): return
> ... def __getitem__(self,aSlice): return aSlice
> ...

Strange and bizarre semantics for slicing, but okay.


> We know it will default work like this:
> >>> a=ThirdParty()
> >>> a[1:2]
> slice(1, 2, None)
>
> # So, here's an injection...
> >>> ThirdParty.superOnlyOfNumpy__getitem__ = MyClass.__getitem__
> >>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
> self.superOnlyOfNumpy__getitem__(aSlice ).step )
> >>> a[5:6]
> (1, 3, None)
>
> Numpy could have exported a (workable) function that would modify other
> list functions to affect ONLY numpy data types (eg: a filter). This
> allows user's creating their own classes to inject them with Numpy's
> filter only when they desire;

Sure, the numpy people could have done this, if they were smoking crack.

Have you actually programmed before? Judging from the techniques you seem
to prefer for everyday use (monkey-patching other classes) and techniques
you seem to hate (subclassing), I'm getting the impression you've read
about bleeding edge programming hacks but never actually written code.
Sort of like somebody who has never driven a car, but fantasises about
doing the sort of extreme stunt driving that kills people in real life
and occasionally even stunt drivers. And now you are *insisting* that
everyone should drive like that, *all the time*, because stopping at
traffic lights is so inefficient.

Of course, I could be wrong. Maybe you've been programming for years and
know exactly what you are doing. But if so, you are coming across as
exactly the kind of cowboy coder that I pray to all the gods I never have
deal with in real life.


[...]
> Don't consider the present API legacy for a moment, I'm asking
> hypothetical design questions:
>
> How many users actually keep slice() around from every instance of [::]
> they use?

Does it matter? It is supported behaviour, so even *one* user is enough.


> If it is rare, why create the slice() object in the first place and
> constantly be allocating and de-allocating memory, twice over? (once for
> the original, and once for the repetitive method which computes dynamic
> values?)

Huh? As opposed to what? Creating an xrange() object, and constantly
allocating and de-allocating memory? Or a tuple? Same again. Some sort of
object has to be created.

And I have no idea what you are talking about "twice over". What
"repetitive method which computers dynamic values"?

In any case, I return to my comment earlier in this thread: if you have
profiled your application and have hard evidence that creating slice
objects is a bottleneck, then we can talk about optimizing the slice
objects. Until then, you are wasting your time and ours by prematurely
optimizing the wrong parts of your code.


> Would a single mutable have less overhead, since it is
> destroyed anyway?

What? This question makes no sense. Why do you think that mutable objects
have "less overhead" than immutable ones?


--
Steven

Andrew Robinson

unread,
Nov 2, 2012, 4:57:11 AM11/2/12
to pytho...@python.org
Hi Ian,

I apologize for trying your patience with the badly written code
example. All objects were meant to be ThirdParty(), the demo was only
to show how a slice() filter could have been applied for the reasons
PEP357 made index() to exist.
eg: because numpy items passed to __getitems__ via slice syntax [::]
were illegal values.
PEP 357 is the one who specifically mentioned Numpy types -- which is
the only reason I used the name in the example; I could have just as
well used a string.

I am fully aware of what numpy does -- I have used it; modified the
fortran interfaces underneath, etc.

The index() method, however, affects *all* list objects in Python, not
just Numpy's -- correct?

I'll write a working piece of code tomorrow to demonstrate the filter
very clearly rather than a skeleton, and test it before posting.

Robert Kern

unread,
Nov 2, 2012, 6:48:50 AM11/2/12
to pytho...@python.org
On 11/2/12 8:57 AM, Andrew Robinson wrote:
> Hi Ian,
>
> I apologize for trying your patience with the badly written code example. All
> objects were meant to be ThirdParty(), the demo was only to show how a slice()
> filter could have been applied for the reasons PEP357 made index() to exist.
> eg: because numpy items passed to __getitems__ via slice syntax [::] were
> illegal values.
> PEP 357 is the one who specifically mentioned Numpy types -- which is the only
> reason I used the name in the example; I could have just as well used a string.
>
> I am fully aware of what numpy does -- I have used it; modified the fortran
> interfaces underneath, etc.
>
> The index() method, however, affects *all* list objects in Python, not just
> Numpy's -- correct?

Please forget that PEP 357 mentions slices at all. The motivation for the
__index__() method (not index()) goes far beyond slices. I'm not really sure why
they are given such a prominent place in the PEP. Let me try to lay out the
motivation more clearly.

numpy has objects that represent integers but cannot be subclasses of the Python
int or long objects because their internal representations are different. These
are the width-specific types: uint8, int16, int64, etc. Before __index__() was
introduced, all indexing operations in the builtin Python sequence types
strictly checked for int or long objects and rejected other objects. We wanted
to provide a generic method that third party types could implement to say, "Yes,
I really am an integer, here is my value in a canonical representation you can
understand." We could not use __int__() for this purpose because it has
additional semantics, namely conversion from not-integers to integers. This is
why floats are mentioned; they do not generally represent integers but they do
define an __int__() method for their conversion to ints via the floor()
function. Generally, they should be rejected as indices. With the __index__()
method, we have a solution: int16 and the rest get __index__() methods and float
doesn't.

This is used where an integer index or offset is needed, not just in slices.
List indices, file.seek(), mmap.mmap(), etc. The change to use PyIndex_Check()
instead of PyInt_Check() was not very difficult or extensive. Even if you were
to change the slicing API for your other reasons, __index__() would still be needed.
0 new messages