Wishlist item: itertools.flatten


Ville Vainio

Mar 11, 2005, 6:32:45 AM
For quick-and-dirty stuff, it's often convenient to flatten a sequence
(which perl does, surprise surprise, by default):

[1,2,[3,"hello",[[4]]]] ->

[1, 2, 3, 'hello', 4]

One such implementation is at

http://aspn.activestate.com/ASPN/Mail/Message/python-tutor/2302348

but something like this would be handy in itertools as well.

It seems trivial, but I managed to screw up several times when trying
to produce my own implementation (infinite recursion).

--
Ville Vainio http://tinyurl.com/2prnb

Christos TZOTZIOY

Mar 11, 2005, 8:36:10 AM
On 11 Mar 2005 13:32:45 +0200, rumours say that Ville Vainio
<vi...@spammers.com> might have written:

See Python Library Reference, "5.16.3 Recipes". Now that all() and any() (also
presented as recipes there) are being considered for inclusion, perhaps
flatten gets a chance too.

This is just a personal opinion, but I detest restraints on library (itertools
module in this case) expansion when talking about such useful *building blocks*.
What happened to "batteries included"?
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...

Ville Vainio

Mar 11, 2005, 5:44:39 PM
>>>>> "Christos" == TZOTZIOY <Christos> writes:

>> For quick-and-dirty stuff, it's often convenient to flatten a sequence
>> (which perl does, surprise surprise, by default):
>>
>> [1,2,[3,"hello",[[4]]]] ->
>>
>> [1, 2, 3, 'hello', 4]

Christos> See Python Library Reference, "5.16.3 Recipes". Now
Christos> that all and any (also

The recipe is:

def flatten(listOfLists):
    return list(chain(*listOfLists))

That one is trivial, because it only flattens one level. The
flattening I'm talking about involves flattening to arbitrary depth to
get a single sequence of "atoms". The itertools implementation might
also be able to avoid recursion by using a stack.

Christos> This is just a personal opinion, but I detest restraints
Christos> on library (itertools module in this case) expansion
Christos> when talking about such useful *building blocks*.

Yeah - esp. in the case of flattening. If it was deemed useful enough
to be the default behavior in perl (which is admittedly braindamaged),
it should surely warrant being included as a single function in the
stdlib.

Christos TZOTZIOY

Mar 11, 2005, 5:55:38 PM
On 12 Mar 2005 00:44:39 +0200, rumours say that Ville Vainio
<vi...@spammers.com> might have written:

> Christos> This is just a personal opinion, but I detest restraints


> Christos> on library (itertools module in this case) expansion
> Christos> when talking about such useful *building blocks*.
>
>Yeah - esp. in the case of flattening. If it was deemed useful enough
>to be the default behavior in perl (which is admittedly braindamaged),
>it should surely warrant being included as a single function in the
>stdlib.

Or a window function, which I have needed on enough *separate* occasions
to add it to one of my personal "stdlib" modules (hinting it could be part
of itertools):

window('hello', 2) => 'he', 'el', 'll', 'lo'
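A minimal modern-Python sketch of such a window function (the name and signature are the poster's own invention, not anything in itertools):

```python
from itertools import islice

def window(iterable, n):
    """Yield successive overlapping tuples of n consecutive items."""
    it = iter(iterable)
    win = tuple(islice(it, n))
    if len(win) == n:
        yield win
    for elem in it:
        # Slide the window by dropping the oldest item and appending the new one.
        win = win[1:] + (elem,)
        yield win

# The poster's example, joining each tuple of characters back into a string:
print([''.join(w) for w in window('hello', 2)])
```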

Michael Spencer

Mar 11, 2005, 6:04:41 PM
to pytho...@python.org
Ville Vainio wrote:
>>>>>>"Christos" == TZOTZIOY <Christos> writes:
>
>
> >> For quick-and-dirty stuff, it's often convenient to flatten a sequence
> >> (which perl does, surprise surprise, by default):
> >>
> >> [1,2,[3,"hello",[[4]]]] ->
> >>
> >> [1, 2, 3, 'hello', 4]
>
> Christos> See Python Library Reference, "5.16.3 Recipes". Now
> Christos> that all and any (also
>
> The recipe is:
>
> def flatten(listOfLists):
>     return list(chain(*listOfLists))
>
> That one is trivial, because it only flattens one level. The
> flattening I'm talking about involves flattening to arbitrary depth to
> get a single sequence of "atoms". The itertools implementation might
> also be able to avoid recursion by using a stack.

Here's a non-recursive implementation. There are lots around. One issue is
specifying iterable types which should be atomic (notably strings). This uses a
simple hardwired test for that.

def flatten(iterable):
    stack = []
    iterator = iter(iterable)
    while 1:
        try:
            item = iterator.next()
            if hasattr(item, "__iter__"):   # Avoids iterating over strings
                stack.append(iterator)
                iterator = iter(item)
            else:
                yield item
        except StopIteration:
            if stack:
                iterator = stack.pop()
            else:
                raise StopIteration

>>> list(flatten([[[1,2,[3,[4,5]],[6,7,[8,[9],(10,11,12)]]]]]))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> list(flatten(["abc","def",["ghi", "jkl"]]))
['abc', 'def', 'ghi', 'jkl']

Michael

Jack Diederich

Mar 11, 2005, 7:29:28 PM
to pytho...@python.org
On Sat, Mar 12, 2005 at 12:55:38AM +0200, Christos TZOTZIOY Georgiou wrote:
> On 12 Mar 2005 00:44:39 +0200, rumours say that Ville Vainio
> <vi...@spammers.com> might have written:
>
> > Christos> This is just a personal opinion, but I detest restraints
> > Christos> on library (itertools module in this case) expansion
> > Christos> when talking about such useful *building blocks*.
> >
> >Yeah - esp. in the case of flattening. If it was deemed useful enough
> >to be the default behavior in perl (which is admittedly braindamaged),
> >it should surely warrant being included as a single function in the
> >stdlib.
>
> Or a window function, which I have needed enough times in *separate* occasions
> to add it in one of my personal "stdlib" modules (hinting it could be part of
> itertools):
>
> window('hello', 2) => 'he', 'el', 'll', 'lo'

This was considered for 2.4, and I put in a patch if you want a C-level
implementation[1]. The patch wasn't much faster than doing it in Python (about
two times, IIRC), the Python version is trivial, and almost no one wanted it.
So Hettinger and I agreed it should be left out.

-Jack

[1] http://sourceforge.net/tracker/?group_id=5470&atid=305470&func=detail&aid=756253

Ville Vainio

Mar 12, 2005, 10:31:05 AM
>>>>> "Michael" == Michael Spencer <ma...@telcopartners.com> writes:

Michael> Here's a non-recursive implementation.

Thanks.

Michael> There are lots around.

Yet another fact that suggest the inclusion in stdlib would make sense
;-).

gene...@gmail.com

Mar 12, 2005, 2:02:33 PM
window / cons / fencepost / slice functions: +1

(with a flag to say whether you want to truncate or pad incomplete tuples at
the end of the input sequence.)

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303279
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303060
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/347689

Probably more recipes in there (and not CPAN-ish yet), but multiple
submissions bespeak a certain need, I think.

Raymond Hettinger

Mar 13, 2005, 1:52:40 AM
[Ville Vainio]

> >For quick-and-dirty stuff, it's often convenient to flatten a sequence
> >(which perl does, surprise surprise, by default):
. . .

> >but something like this would be handy in itertools as well.
> >
> >It seems trivial, but I managed to screw up several times when trying
> >to produce my own implementation (infinite recursion).

[Christos TZOTZIOY Georgiou]


> See Python Library Reference, "5.16.3 Recipes". Now that all() and any() (also
> presented as recipes there) are being considered for inclusion, perhaps
> flatten gets a chance too.
>
> This is just a personal opinion, but I detest restraints on library (itertools
> module in this case) expansion when talking about such useful *building
> blocks*. What happened to "batteries included"?

FWIW, requests for additions to the itertools module have not fallen on deaf
ears. There are no arbitrary restraints on building out this module. Each
request has gotten careful thought and a couple of them were accepted in Py2.4
(itertools.tee and itertools.groupby). If you would like to know the reasoning
behind any particular acceptance, rejection, or deferral, then just ask.

itertools.window() with n=2 got rejected. Almost all proposed uses had better
solutions (such as an accumulator variable or fibonacci sequence style logic:
a, b = b, a+b). Writing it in C afforded only small speed advantage over a
solution using izip() and tee().
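The izip()/tee() solution Raymond refers to is the classic pairwise recipe; in Python 3 spelling (where izip() is simply the built-in zip()) it looks like:

```python
from itertools import tee

def pairwise(iterable):
    # Duplicate the iterator, advance one copy by a single step, then zip:
    # pairwise('hello') -> ('h','e'), ('e','l'), ('l','l'), ('l','o')
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

print(list(pairwise('hello')))
```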

itertools.window() with n>2 was also rejected. While use cases arise in markov
chains, statistics (moving averages, etc), and cryptanalysis (trigraph
analysis), there were almost always better solutions. window() spent most of
its time creating new tuples and shifting each of the common elements by one
position. A solution using collections.deque is generally superior because the
popleft() and append() operations do not entail moving all the common elements.
It was instructive to examine a use case with a large n, sliding window
compression -- the Right Solution (tm) does *not* entail continuously shifting
all of the data elements. IOW, providing a window() function in itertools is a
mistake because it leads people away from better solutions.
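A sketch of the deque-based approach described above (a hypothetical helper, not a stdlib function; note that for very large n the Right Solution operates on the deque itself instead of materializing a tuple per step, as the tuple copy is exactly the shifting being avoided):

```python
from collections import deque
from itertools import islice

def window(iterable, n):
    it = iter(iterable)
    win = deque(islice(it, n), maxlen=n)
    if len(win) == n:
        yield tuple(win)
    for elem in it:
        # With maxlen=n, append() also discards the leftmost element (O(1)),
        # so the common elements are never shifted.
        win.append(elem)
        yield tuple(win)
```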

The jury is still out on flatten(). The principal implementation problem is not
recursing into iterable objects that the caller wants to treat as atomic. While
that can be done, there doesn't seem to be a clear winner among the solutions
that have arisen. Also, the solutions to that problem make the resulting
function more difficult to learn, remember, review, etc. The nature of
flattening is such that a C implementation doesn't offer any special advantage
over the various competing pure python versions. And, there is also the issue
of use cases. It appears to be much more fun to toy around with developing
flatten() recipes than it is to work on applications that require it. That is
not to say that it doesn't come-up or that it isn't helpful in Mathematica;
however, it is somewhat rare and not always the right solution even when it
could be used.

itertools.queue() was rejected because it didn't fit naturally into
applications -- you had to completely twist your logic around just to
accommodate it. Besides, it is already simple to iterate over a list while
appending items to it as needed.

itertools.roundrobin() was too specialized (being the engine behind a
multi-input task server) and that specialty often demanded more complex
capabilities than offered by roundrobin. For many situations, collections.deque
offers a better solution.

itertools.accumulate() returned the successive steps of a reduce() operation.
It too had precedents in APL and Mathematica. However, it loses its appeal with
functions more complex than operator.add or operator.mul. The effort to write a
more complex function is better spent writing a simple generator.
Alternatively, many potential applications were better served by in-lining the
function. The other issue was usability. accumulate() suffered from the same
complexities as reduce() -- it took too much thought to read, write, and review
(quick which variable comes first, the cumulative value or the new data element;
does it fold left or fold right; is there a default initial value; yada yada).

itertools.multigen() transforms single shot generators into something
re-iterable. The offered use cases were not compelling. The potential for
misuse is high and the logic behind the idea doesn't appear to be
self-consistent.

itertools.remove_value() is like a lisp/scheme multirember. Yawn. Just use a
genexp: (elem for elem in iterable if elem != value).

The jury is still out on itertools.eq() which compares any two iterables for
equality. Sometimes you want to compare [1,2,3] to (1,2,3) and consider only
the contents of the container rather than the type of the container. It is
somewhat handy and obvious; however, someone is bound to misuse it and apply it
to arbitrarily ordered containers such as sets and dictionaries; apply it to
infinite iterators; or just inadvertently exhaust an iterator prior to actually
needing its contents.

itertools.consume() would work like map() with no arguments and no return
value (equivalent to list(it) where you throw away the result list). This is
more clearly coded in pure Python and it runs only slightly faster. Also, it is
only useful with functions that have side effects, and that is at odds with the
ideas of functional programming where itertools have their roots.
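The pure-Python version alluded to here is short; one well-known formulation feeds the iterator into a zero-length deque, which discards every item at C speed:

```python
from collections import deque

def consume(iterator):
    # Run an iterator to exhaustion purely for its side effects.
    deque(iterator, maxlen=0)

seen = []
consume(seen.append(x) for x in range(3))
print(seen)
```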

itertools.ilines() would iterate over a buffer and yield upon hitting a
universal newline -- essentially this is an in-memory version of what the file
iterator does with text files. This kind of operation is more appropriately
added to StringIO.

In addition to the above, people routinely request that all sorts of random
ideas be put in itertools. "That shouldn't be a builtin; stick it in
itertools." We likely need some other module for reduction functions like
any(), all(), no(), quantify(), take(), etc. In general, the itertools module
would be ill served by becoming a dumping ground.

Another thought is that it takes some time and skill to learn to use the tools
and how to combine them. The time and skill seems to rise exponentially with
the number of tools in the module. So, it would be a mistake to add a few
obscure, rarely used tools because that would impact the usability of the
existing toolset.

'nuff said, mister whatever happened to batteries included ;-)

Raymond Hettinger


P.S. It's not an accident that the recipes in the itertools docs are in a form
that is directly cut and pastable into a working application.


bearoph...@lycos.com

Mar 13, 2005, 8:15:51 AM
Thank you for your very good and interesting answer, Raymond. In the
itertools library there are functions that are not really simple to
use/remember, but a flatten() and a partition() can probably be easy to
remember (your window() function is probably a sliding window, so it's not a
partition(), I presume).


>Also, the solutions to that problem make the resulting function more
difficult to learn, remember, review, etc.<

I agree, the basic flatten function can be easy (in Mathematica there
isn't the difference between tuples and lists, and you cannot flatten a
string), but it can become less simple as a useful implementation for
Python. This was my suggestion for a possible flatten():

flatten(sequence, level=-1, tuples=True, strings=False, safe=False)
- tuples=True then it flattens tuples too.
- strings=True then it flattens strings with len(s)>1 too.
- safe=True then it checks (with something like an iterative isrecursive)
  for recursive references inside the sequence.
- level allows to specify the mapping level:
  level=0 no flattening.
  level=1 the flattening is applied to the first level only.
  level=2 the flattening is applied to the first and second level only.
  level=m where m>=actual depth. This is the same as level=-1.
  And like in the indexing of lists:
  level=-1 or None (default) means the flattening is applied up to the leaves.
  level=-2 flattens up to the pre-leaves.
  Etc.
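A rough sketch of just the `level` part of this proposal (the tuples/strings/safe options are omitted, and any negative level is treated as "all the way down" -- the level=-2 "pre-leaves" variant would first need to measure the structure's depth):

```python
def flatten(seq, level=-1):
    for item in seq:
        if isinstance(item, (list, tuple)) and level != 0:
            # Each recursion step uses up one level; a negative level
            # never reaches 0, so it flattens down to the leaves.
            for sub in flatten(item, level - 1):
                yield sub
        else:
            yield item
```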


>The nature of flattening is such that a C implementation doesn't offer
any special advantage over the various competing pure python versions.<

Even a well tuned (non-recursive) standard Python version can be okay;
it keeps people from designing lots of slower/wrong functions by
themselves.


>And, there is also the issue of use cases. It appears to be much more
fun to toy around with developing flatten() recipes than it is to work
on applications that require it.<

It's not easy to define "require" because usually there are many ways
to solve every problem.

There are two situations that I've found can make use of the
flatten/partition, but you can probably find better ways to do the same
thing (and I can appreciate suggestions):

1)
The function f returns two values, but you need a flat list as result:
def f(x): return x, x**2
r = flatten( f(i) for i in range(10) )
print r

Alternative:
def f(x): return x, x**2
r = []
for i in range(10): r.extend( f(i) )
print r
(You can also use two append())


2)
A file with columns of numbers separated by a space/tab:
n n n n
n n n n
n n n n
...

ll = open("namefile").read().splitlines()
r = [map(float, l.split()) for l in ll]

Alternative:
ll = open("namefile").read().split()
r = partition(map(float, ll), 4)

This second version can be a little faster, but it has the disadvantage
that the column number 4 is hard coded.
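A sketch of the partition() the example assumes (hypothetical, not a stdlib function): it groups a flat sequence into rows of n items, keeping any incomplete trailing row rather than truncating or padding it:

```python
def partition(iterable, n):
    items = list(iterable)
    # Slice out consecutive rows of n items each.
    return [items[i:i + n] for i in range(0, len(items), n)]

# e.g. twelve numbers become three rows of four:
print(partition(range(12), 4))
```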


>We likely need some other module for reduction functions like any(),
all(), no(), quantify(), take(), etc.<

Okay.


Your itertools.consume() looks like the Mathematica Scan[] function.

Bye,
Bearophile

[Remove HUGS if you want to mail me directly]

Terry Reedy

Mar 13, 2005, 3:44:42 PM
to pytho...@python.org

"Raymond Hettinger" <vze4...@verizon.net> wrote in message
news:YqRYd.1381$qN3.1209@trndny01...

> FWIW, requests for additions to the itertools module have not fallen on
> deaf
> ears. There are no arbitrary restraints on building out this module.
> Each
> request has gotten careful thought and a couple of them were accepted in
> Py2.4
> (itertools.tee and itertools.groupby). If you would like to know the
> reasoning
> behind any particular acceptance, rejection, or deferral, then just ask.

[nice rundown snipped]

Is this list of reasonings permanently posted somewhere, like on the wiki?
It's too good to just disappear.

TJR

Raymond Hettinger

Mar 14, 2005, 12:19:33 AM
[bearophile]

> This was my suggestion for a possible flatten():
>
> flatten(sequence, level=-1, tuples=True, strings=False, safe=False)
> - tuples=True then it flattens tuples too.
> - strings=True then it flattens strings with len(s)>1 too.
> - safe=True then it checks (with something like an iterative isrecursive)
>   for recursive references inside the sequence.
> - level allows to specify the mapping level:
>   level=0 no flattening.
>   level=1 the flattening is applied to the first level only.
>   level=2 the flattening is applied to the first and second level only.
>   level=m where m>=actual depth. This is the same as level=-1.
>   And like in the indexing of lists:
>   level=-1 or None (default) means the flattening is applied up to the leaves.
>   level=-2 flattens up to the pre-leaves.
>   Etc.

My suggestion is to stop smoking Crack and check into rehab ;-)

Each one of the options listed is a reason that flatten() shouldn't be an
itertool. It fails tests of obviousness, learnability, complexity of
implementation, and simplicity of API. The options also suggest that the
abstraction is not as basic or universal as we would hope.

> >And, there is also the issue of use cases. It appears to be much more
> fun to toy around with developing flatten() recipes than it is to work
> on applications that require it.<
>
> It's not easy to define "require" because usually there are many ways
> to solve every problem.

Perhaps "require" was the wrong word. The issue is that there appear to be
very few real situations where flatten() would be the tool of choice.


> There are two situations that I've found can make use of the
> flatten/partition, but you can probably find better ways to do the same
> thing (and I can appreciate suggestions):
>
> 1)
> The function f returns two values, but you need a flat list as result:
> def f(x): return x, x**2
> r = flatten( f(i) for i in range(10) )
> print r

This is not a good way to store the function's results. It unnecessarily
throws away structure. Unless you are creating some type of serialization
function, it is likely the wrong thing to do. Also, the example appears to be
contrived and not representative of real code. If there is actually a need to
treat multiple return values as undifferentiated, then your alternate solution
with extend() is the most appropriate one.


> 2)
> A file with columns of numbers separated by a space/tab:

. . .


> ll = open("namefile").read().splitlines()
> r = [map(float, l.split()) for l in ll]

If you need that to be flattened one level, it would have been better to do all
the splits at once:

# split on tabs, spaces, and newlines
r = map(float, open('namefile').read().split())

Generalizing the two results, it may be fair to say that the desire to flatten
is a code smell indicating that structure is being unnecessarily destroyed or
that earlier processing introduced unwanted structure. Let the data guide the
programming.

Raymond Hettinger


Ville Vainio

Mar 14, 2005, 3:26:15 AM
>>>>> "Raymond" == Raymond Hettinger <vze4...@verizon.net> writes:

Raymond> Each one of the options listed is a reason that flatten()
Raymond> shouldn't be an itertool. It fails tests of obviousness,
Raymond> learnability, complexity of implementation, and
Raymond> simplicity of API. The options also suggest that the
Raymond> abstraction is not as basic or universal as we would
Raymond> hope.

A simpler API:

def flatten(sequence, atomic_test=lambda o: isinstance(o, basestring)):
    """ don't recurse into iterables if atomic_test -> True """

I believe speaking of the "levels" of flattening is contorted here.

Raymond> Perhaps "require" was the wrong word. The issue is that
Raymond> appear to be very few real situations where flatten()
Raymond> would be the tool of choice.

Suppose that I get a very complex data structure involving lists of
tuples of tuples [....] of strings. I just want to quickly search the
sequence for valid file names, without going through elaborate
unpacking. Then I just do

files = (f for f in flatten(monster_data_struct) if os.path.isfile(str(f)))

Yep, this is a real use case (ipython + some of my own data munging
tools).

Raymond> Generalizing the two results, it may be fair to say that
Raymond> the desire to flatten is a code smell indicating that
Raymond> structure is being unnecessarily destroyed or that
Raymond> earlier processing introduced unwanted structure. Let
Raymond> the data guide the programming.

You are looking at the problem from a specific mindset, that of writing
good clean Pythonic code. flatten is for situations when you needed an
implementation 20 seconds ago (where someone might have recommended
Perl in the past, and which is a perfectly valid niche for Python as
well).

It's not a matter of life & death for me, obviously (it's in my own
stdlib). I still can't see how its existence would make rest of
itertools magically harder to learn. When I come up with a problem
where I imagine itertools might come in handy, I check the docs to see
whether there is anything appropriate for the problem. I don't
memorize all the functions, just the fact that such functions
exist.

Also, the following itertool functions are not very useful anymore,
with the advent of genexps:

ifilter(pred, seq) --> elements of seq where pred(elem) is True
ifilterfalse(pred, seq) --> elements of seq where pred(elem) is False
imap(fun, p, q, ...) --> fun(p0, q0), fun(p1, q1), ...
starmap(fun, seq) --> fun(*seq[0]), fun(*seq[1]), ...

I don't believe a genuinely useful 'flatten' would increase the
cognitive load any more than these.
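For reference, the genexp equivalents of the four functions listed above (written here as list comprehensions so the results are concrete; the parenthesized genexp forms are the lazy equivalents):

```python
seq = [1, 2, 3, 4]
tups = [(1, 2), (3, 4)]

evens = [x for x in seq if x % 2 == 0]   # ifilter(is_even, seq)
odds  = [x for x in seq if x % 2 != 0]   # ifilterfalse(is_even, seq)
sqrs  = [x * x for x in seq]             # imap(square, seq)
sums  = [a + b for a, b in tups]         # starmap(add, tups)

print(evens, odds, sqrs, sums)
```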

Leif K-Brooks

Mar 14, 2005, 6:50:35 AM
Michael Spencer wrote:
> if hasattr(item,"__iter__"): # Avoids iterating over strings

That's probably the cleanest way to avoid strings, but it's
unfortunately not a good idea IMHO. Many objects (IndexedCatalog's
Result objects are what I'm concerned about, but there are undoubtedly
others) still rely entirely on the old-style sequence iteration protocol
even though they're proper containers, so checking for __iter__ wouldn't
work on them. Explicitly doing isinstance(item, basestring) is probably
the best option until older objects move to the new iteration protocol.

Steven Bethard

Mar 14, 2005, 10:25:45 AM
Ville Vainio wrote:
> A simpler API:
>
> def flatten(sequence, atomic_test = lambda o: isinstance(o,basestring)):
> """ don't recurse into iterables if atomic_test -> True """

Yes, this is also the API I would have suggested. Simple, but flexible
enough to handle the odd cases with the occasional user-defined iterable
non-containers.

STeVe

Robert Brewer

Mar 14, 2005, 12:53:18 PM
to pytho...@python.org

If there are two or three common atomic_test's they could also be placed
into itertools globals (with better names, of course ;):

def is_string(item):
    return isinstance(item, basestring)

def flatten(seq, atomic_test=is_string):
    ...


Perhaps atomic_test could allow a tuple or list of tests and combine
them...?


Robert Brewer
MIS
Amor Ministries
fuma...@amor.org

Steven Bethard

Mar 14, 2005, 1:12:47 PM

Did you have other atomic tests in mind? All the use cases I've ever
seen have always been with strings. (And I've never created any class
whose items are instances of the class.)

I don't see much benefit in complicating the API of flatten unless
there's a really good use case for multiple atomic tests (one that can't
just as easily be solved by writing your own function that does the
appropriate complex atomicity test). I also have the feeling that any
complicated atomicity test is more than a simple and-ing of several tests...

STeVe

Ville Vainio

Mar 14, 2005, 1:24:31 PM
>>>>> "Steven" == Steven Bethard <steven....@gmail.com> writes:

    Steven> complex atomicity test). I also have the feeling that any
    Steven> complicated atomicity test is more than a simple and-ing
    Steven> of several tests...

I also have the feeling that if the atomicity criterion was any more
complex in the API, the proposal would be shot down immediately on the
grounds of not being fundamental enough as a concept.

Ville Vainio http://tinyurl.com/2prnb

Michael Spencer

Mar 14, 2005, 1:02:23 PM
to pytho...@python.org
Sure, the preceding part of my post:

> One issue is specifying iterable types which should be atomic (notably strings). This uses a simple hardwired test for that.

...makes it clear that 'hasattr(item,"__iter__")' is not claiming to be a
general atomicity test.

Using something like:
if not isinstance(item, basestring):
    try:
        iterator = iter(item)
        etc...

...defines a rule that works for your case, but it also means that any
non-basestring class that implements __getitem__ gets flattened. That may be
what you want...or not.

I think flatten is a special-case function, and it is therefore most practical
to define its atomicity rules on a case-by-case basis.

Michael


Raymond Hettinger

Mar 14, 2005, 1:52:21 PM
> Steven> complex atomicity test). I also have the feeling that any
> Steven> complicated atomicity test is more than a simple and-ing
> Steven> of several tests...

"Ville Vainio"


> I also have the feeling that if the atomicity criterion was any more
> complex in the API, the proposal would be shot down immediately on the
> grounds of not being fundamental enough as concept.

Would this meet your needs?

def flatten(iterable, atomic_iterable_types=(basestring,)):
    iterstack = [iter(iterable)]
    while iterstack:
        for elem in iterstack[-1]:
            if not isinstance(elem, atomic_iterable_types):
                try:
                    it = iter(elem)
                except TypeError:
                    pass
                else:
                    iterstack.append(it)
                    break
            yield elem
        else:
            iterstack.pop()    # only remove iterator when it is exhausted
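For what it's worth, the same generator carries over to Python 3 almost unchanged; only basestring is gone, so a sketch there names str (and bytes) explicitly as the atomic types:

```python
def flatten(iterable, atomic_iterable_types=(str, bytes)):
    # Python 3 spelling of the generator above: strings are still iterable,
    # so they must be listed as atomic to avoid flattening them to characters.
    iterstack = [iter(iterable)]
    while iterstack:
        for elem in iterstack[-1]:
            if not isinstance(elem, atomic_iterable_types):
                try:
                    it = iter(elem)
                except TypeError:
                    pass   # non-iterable: fall through and yield it
                else:
                    iterstack.append(it)
                    break
            yield elem
        else:
            iterstack.pop()    # only remove an iterator once it is exhausted

print(list(flatten([1, 2, [3, "hello", [[4]]]])))
```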


Raymond Hettinger


Ville Vainio

Mar 14, 2005, 2:30:17 PM
>>>>> "Raymond" == Raymond Hettinger <vze4...@verizon.net> writes:

Steven> complex atomicity test). I also have the feeling that any
Steven> complicated atomictiy test is more than a simple and-ing
Steven> of several tests...

Raymond> "Ville Vainio"

>> I also have the feeling that if the atomicity criterion was any
>> more complex in the API, the proposal would be shot down
>> immediately on the grounds of not being fundamental enough as
>> concept.

Raymond> Would this meet your needs?

Raymond> def flatten(iterable, atomic_iterable_types=(basestring,)):

Yes, completely.

bearoph...@lycos.com

Mar 14, 2005, 2:46:30 PM
Thank you for your answers, Raymond Hettinger.

>The options also suggest that the abstraction is not as basic or
universal as we would hope.<

I don't understand, but this is normal.


> ll = open("namefile").read().split()
> r = partition(map(float, ll), 4)

>If you need that to be flattened one level, it would have been better
to do all the splits at once:<

Uhm, my purpose was the opposite, as you can see from the alternative
version. It was an example of using partition(), that is kind of
opposite of flatten().


>Generalizing the two results, it may be fair to say that the desire to
flatten is a code smell indicating that structure is being
unnecessarily destroyed or that earlier processing introduced unwanted
structure.<

Probably for my programming style flatten() is useful, but this is a
very subjective thing (and I've already coded my own flatten, which I use
not very frequently, without the level parameter inspired by a similar
Mathematica one). I've also implemented little other things from the
Delphi and Logo languages, for my own use in Python.
For me it's not easy to improve/extend Python; maybe I have to stop
trying, and just use this language as you people are giving it to us...

Thank you, a bearish hug,
Bearophile

Christos TZOTZIOY

Mar 16, 2005, 3:22:45 AM
On Sun, 13 Mar 2005 06:52:40 GMT, rumours say that "Raymond Hettinger"
<vze4...@verizon.net> might have written:

[snip of lots of stuff]

>itertools.window() with n=2 got rejected. Almost all proposed uses had better
>solutions (such as an accumulator variable or fibonacci sequence style logic:
>a, b = b, a+b). Writing it in C afforded only small speed advantage over a
>solution using izip() and tee().

Speed or C implementation was not my objection. I don't care if C itertools
gets renamed to _itertools and some Python itertools imports _itertools first
thing a la heapq, and then the most probably useful doc recipes get defined in
Python; in fact, I would like that. Do you think that window is rarely used?
You didn't say that in your post, you just argued against implementing it in C.
I agree. Don't implement it in C. Implement it as in the recipe. Just make it
readily available. I volunteer for a patch if we agree on it[1].

For example, if speed is our main concern, why is there PEP 309 for partial
function application as a class? The tools are already there, and
new.instancemethod is our friend. In the case of binding only the first
argument, which is the most common, calling the resulting object is 2% faster
than calling the function itself (at least on my laptop). Of course, PEP 309 is
not about speed; it's about availability of /correct/ and /useful/
functionality. Kind of like recipes in the itertools documentation.
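(PEP 309 was in fact accepted, landing as functools.partial in Python 2.5. A small sketch of the bind-the-first-argument case Christos describes; the log function here is made up for illustration:)

```python
from functools import partial

def log(level, message):
    return "[%s] %s" % (level, message)

# Pre-bind the first argument -- the most common case mentioned above.
warn = partial(log, "WARN")
print(warn("disk almost full"))
```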

>'nuff said, mister whatever happened to batteries included ;-)

Cool extensive summarising reply --however, how relevant is it? None of the
itertools.(queue|roundrobin|accumulate|multi_gen|remove_value|eq|consume|ilines)
that you reference exists as a recipe in the docs.

Did I make you believe I cared about the fate of any function judged unworthy
even for the documentation? Because, if it is my fault, let me correct that: I
am talking about the recipes in the documentation (those found worthy enough to
be included in the docs, but not found worthy enough to be included in the lib),
and specifically for window and flatten: why aren't they considered as
batteries? About flatten, you say things are still warm. About window, you say
it's not really efficient, use izip and tee, or use a deque for n>2. Ok. I
never spoke about speed. I spoke about tool availability.

IIRC you were cooled down in python-dev when you wanted to add more functions to
itertools; it was kind of a surprise for me now to see /you/ of all people
defending this choice. FWIW, I like speed (who doesn't, if nothing else is
affected?); I have an unpolished CPython pcode decompiler / compiler that offers
a decorator for function-level optimising and a function that modifies *.py[co]
files, replacing patterns like the one you yourself lately worried about [2]. I
leave that module as-is, partly because I am waiting for the AST branch to be
finished and partly because now and then you implement some part of it in the
peephole optimiser. So, no objections about speed; just note that I didn't
mention it in my previous post.

AFAIAC, in my own stdlib-supporting modules what I need is readily available;
possibly I'm just whining that putting recipes in the docs as an 'extended
toolset' instead of in a module is a joking compromise...

>P.S. It's not an accident that the recipes in the itertools docs are in a form
>that is directly cut and pastable into a working application.

[1] Like I said, I volunteer to copy-paste from the docs into a Python module
for the stdlib. Thanks for the eye-opening hint!-)

[2] STORE_<sth> x / LOAD_<sth> x into DUP / STORE_<sth> x; local binding of
multiple LOAD_<sth> x / LOAD_ATTR / LOAD_ATTR (where x is an imported module)
into a LOAD_FAST or a LOAD_CONST, based on whether it's the modifier or the
decorator; LOAD_CONST* / BUILD_LIST into LOAD_CONST a_tuple / UNPACK_SEQUENCE /
BUILD_TUPLE; "a,b,c=d,e,f" reordering and dropping of the tuple
building/unpacking (a case already improved in C); etc.

Raymond Hettinger

Mar 17, 2005, 1:09:44 AM
> itertools.window() with n=2 got rejected. Almost all proposed uses had better
> solutions (such as an accumulator variable or fibonacci sequence style logic:
> a, b = b, a+b). Writing it in C afforded only small speed advantage over a
> solution using izip() and tee().
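
The deque-based approach Raymond alludes to for n > 2 can be sketched like this
(a recipe-style sketch, not the rejected patch itself; izip() and tee() were the
Python 2 spellings):

```python
from collections import deque
from itertools import islice

def window(iterable, n=2):
    """Yield successive overlapping n-tuples from iterable."""
    it = iter(iterable)
    win = deque(islice(it, n), maxlen=n)  # prime the first window
    if len(win) == n:
        yield tuple(win)
    for item in it:
        win.append(item)  # maxlen drops the oldest element automatically
        yield tuple(win)

print(list(window([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```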

[Christos TZOTZIOY Georgiou]


> Speed or C implementation was not my objection.

. . .


> Just make it
> readily available. I volunteer for a patch if we agree on it[1].

No thanks. A patch was developed, discussed, and rejected long ago. The main
reason was that existing approaches were often better (at least in the context
of the use cases that we reviewed). Performance was mentioned because that is
sometimes used to justify a new tool. In this case, that justification was not
strong.


> None of the
> itertools.(queue|roundrobin|accumulate|multi_gen|remove_value|eq|consume|ilines)
> that you reference exists as a recipe in the docs.

FWIW, the recipe for roundrobin is at:
http://docs.python.org/lib/deque-recipes.html
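
A roundrobin() in the spirit of that deque recipe might look like the following
(a sketch, not the published recipe verbatim):

```python
from collections import deque

def roundrobin(*iterables):
    """Interleave items: roundrobin('ABC', 'D', 'EF') -> A D E B F C."""
    iterators = deque(iter(it) for it in iterables)
    while iterators:
        it = iterators.popleft()
        try:
            item = next(it)
        except StopIteration:
            continue  # an exhausted iterator is simply dropped
        yield item
        iterators.append(it)  # rotate the live iterator to the back

print(list(roundrobin("ABC", "D", "EF")))  # ['A', 'D', 'E', 'B', 'F', 'C']
```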

The only other one worth further discussion is iterequals(). I held off
publishing that one because the opportunities for error were too great (see the
previous post for details).

> Did I make you believe I cared about the fate of any function judged unworthy
> even for the documentation?

No. My note was mainly for the benefit of those who had an interest in what
type of ideas had been discussed and the reasoning behind their
inclusion/exclusion. It needed to be documented somewhere and the newsgroup
discussion on a couple of proposals provided an opportunity to put those notes
on record.

> I'm just whining that putting recipes in the docs as an 'extended
> toolset' instead of in a module is a joking compromise...

Not really. The recipes have several uses and none of them are compromises.

Firstly, they serve as a proving ground which helps inform further development.
As a recipe, they can be readily altered, improved, or removed without breaking
anything. However, once I include them as part of the module, the API gets
frozen, as do any accidents of implementation. Once released, the process of
making repairs or alterations becomes slow and painful. Over time, some of the
recipes will disappear, some will evolve, and some will work their way into a
module (witness any() and all() as recent examples).
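
For context, the any() and all() builtins mentioned here began life as recipes
behaviourally equivalent to these definitions (underscored names used only to
avoid shadowing the builtins):

```python
def any_(iterable):
    """Recipe-equivalent of the any() builtin: True if some element is truthy."""
    for element in iterable:
        if element:
            return True
    return False

def all_(iterable):
    """Recipe-equivalent of the all() builtin: True if every element is truthy."""
    for element in iterable:
        if not element:
            return False
    return True

print(any_([0, "", 3]), all_([1, 2, 0]))  # True False
```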

By way of comparison, consider the evolution of set()/frozenset(), which went
through stages as recipes, as a PEP, then as Python module, and finally as C
coded built-ins. That multi-stage process was deliberate and resulted in the
final version being excellent. Similarly, many new decorators are going to
start their lives as wiki entries or recipes. Ultimately, some will make it
into the standard library. It would be a mistake to make that transition too
quickly.

The other purpose of the itertool recipes is to serve as a teaching tool showing
how to combine the tools and how to integrate them with other Python code. IMO,
most of the recipes are more useful in this capacity than as immediate solutions
to particular problems.

Raymond Hettinger


Christos TZOTZIOY

Mar 17, 2005, 5:23:34 AM
On Thu, 17 Mar 2005 06:09:44 GMT, rumours say that "Raymond Hettinger"
<vze4...@verizon.net> might have written:

[snip]

>> Did I make you believe I cared about the fate of any function judged unworthy
>> even for the documentation?
>
>No. My note was mainly for the benefit of those who had an interest in what
>type of ideas had been discussed and the reasoning behind their
>inclusion/exclusion. It needed to be documented somewhere and the newsgroup
>discussion on a couple of proposals provided an opportunity to put those notes
>on record.

In that case, I thank you too (like Terry) for the trouble writing down those
notes.

>> I'm just whining that putting recipes in the docs as an 'extended
>> toolset' instead of in a module is a joking compromise...
>
>Not really. The recipes have several uses and none of them are compromises.

Of course they aren't compromises. Who said they were? The clause "is a
joking compromise", to which your "Not really" replies, has "putting" as its
subject, not "uses of recipes". It's possible that my lack of fluency in
English confused you.

[snip]

>By way of comparison, consider the evolution of set()/frozenset(), which went
>through stages as recipes, as a PEP, then as Python module, and finally as C
>coded built-ins. That multi-stage process was deliberate and resulted in the
>final version being excellent. Similarly, many new decorators are going to
>start their lives as wiki entries or recipes. Ultimately, some will make it
>into the standard library. It would be a mistake to make that transition too
>quickly.

That (long cycle/excellent results) is surely true. And sets are *very* useful
as the majority would agree. Thanks about sets, too.

>The other purpose of the itertool recipes is to serve as a teaching tool showing
>how to combine the tools and how to integrate them with other Python code. IMO,
>most of the recipes are more useful in this capacity than as immediate solutions
>to particular problems.

Well, I have to respect your opinion and so I drop the subject... but with my
dying breath, re:

> to serve as a teaching tool showing
> how to combine the tools and how to integrate them with other Python code.

, I cry "that's why we hint at people to /read/ the /source/ of the standard
library..." :)

Cheers.
