Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

flatten a list of list

5 views
Skip to first unread message

Terry

unread,
Aug 16, 2009, 5:47:42 AM8/16/09
to
Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
new_list.extend(l)

or,

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

br, Terry

Chris Rebert

unread,
Aug 16, 2009, 5:55:48 AM8/16/09
to Terry, pytho...@python.org

#only marginally better:
from operator import add
new_list = reduce(add, list_of_list)

#from the itertools recipes:
from itertools import chain
def flatten(listOfLists):
return list(chain.from_iterable(listOfLists))

Cheers,
Chris
--
http://blog.rebertia.com

Michael Fötsch

unread,
Aug 16, 2009, 6:03:53 AM8/16/09
to pytho...@python.org
Terry wrote:
> Is there a simple way (the pythonic way) to flatten a list of list?

This is probably the shortest it can get:

sum(list_of_lists, [])


Kind Regards,
M.F.

Steven D'Aprano

unread,
Aug 16, 2009, 6:49:59 AM8/16/09
to
On Sun, 16 Aug 2009 05:55:48 -0400, Chris Rebert wrote:

> On Sun, Aug 16, 2009 at 5:47 AM, Terry<terry....@gmail.com> wrote:
>> Hi,
>>
>> Is there a simple way (the pythonic way) to flatten a list of list?
>> rather than my current solution:
>>
>> new_list=[]
>> for l in list_of_list:
>>    new_list.extend(l)
>>
>> or,
>>
>> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
>
> #only marginally better:
> from operator import add
> new_list = reduce(add, list_of_list)

Surely that's going to be O(N**2)?

I'd predict that would perform even worse than O(N**2) string
concatenation, on account that a string of size N has to allocate space
for N bytes, but a list of size N allocates space for N pointers (each 4
bytes, or 8 depending on the system), rounded up to the nearest power of
two. Also CPython can optimize string concatenation to O(N) under some
circumstances, but not lists.


>>> from timeit import Timer
>>> setup = """\\
... L = [ ([None]*5000) for x in xrange(%d) ]
... from operator import add
... """
>>> Timer("reduce(add, L)", setup % 4).repeat(number=1000)
[0.53808808326721191, 0.51404905319213867, 0.51157188415527344]
>>> Timer("reduce(add, L)", setup % 8).repeat(number=1000)
[2.1178171634674072, 3.8830602169036865, 4.72245192527771]
>>> Timer("reduce(add, L)", setup % 16).repeat(number=1000)
[17.190337896347046, 21.572744131088257, 21.584265947341919]


Looks like O(N**2) behaviour to me.

--
Steven

Chris Rebert

unread,
Aug 16, 2009, 6:59:52 AM8/16/09
to Steven D'Aprano, pytho...@python.org
On Sun, Aug 16, 2009 at 6:49 AM, Steven
D'Aprano<st...@remove-this-cybersource.com.au> wrote:
> On Sun, 16 Aug 2009 05:55:48 -0400, Chris Rebert wrote:
>> On Sun, Aug 16, 2009 at 5:47 AM, Terry<terry....@gmail.com> wrote:
>>> Hi,
>>>
>>> Is there a simple way (the pythonic way) to flatten a list of list?
>>> rather than my current solution:
>>>
>>> new_list=[]
>>> for l in list_of_list:
>>>    new_list.extend(l)
>>>
>>> or,
>>>
>>> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
>>
>> #only marginally better:
>> from operator import add
>> new_list = reduce(add, list_of_list)
>
> Surely that's going to be O(N**2)?

The OP asked for "simple", not "best", "most proper", or "fastest". My
comment was intended to mean that the code was marginally *simpler*,
not faster.

Steven D'Aprano

unread,
Aug 16, 2009, 7:08:09 AM8/16/09
to
On Sun, 16 Aug 2009 12:03:53 +0200, Michael Fötsch wrote:

> Terry wrote:
>> Is there a simple way (the pythonic way) to flatten a list of list?
>
> This is probably the shortest it can get:
>
> sum(list_of_lists, [])


That's also O(N**2).


>>> from timeit import Timer
>>> setup = "L = [ ([None]*5000) for x in xrange(%d) ]"
>>> Timer("sum(L, [])", setup % 4).repeat(number=1000)
[0.6070549488067627, 0.54354715347290039, 0.54686999320983887]
>>> Timer("sum(L, [])", setup % 8).repeat(number=1000)
[2.1285719871520996, 3.6722278594970703, 4.0785009860992432]
>>> Timer("sum(L, [])", setup % 16).repeat(number=1000)
[18.370341062545776, 20.40509295463562, 21.871708869934082]

--
Steven

Terry

unread,
Aug 16, 2009, 7:22:00 AM8/16/09
to
On Aug 16, 6:59 pm, Chris Rebert <c...@rebertia.com> wrote:
> On Sun, Aug 16, 2009 at 6:49 AM, Steven
>
>
>
>
>
> D'Aprano<st...@remove-this-cybersource.com.au> wrote:
> > On Sun, 16 Aug 2009 05:55:48 -0400, Chris Rebert wrote:
> >> On Sun, Aug 16, 2009 at 5:47 AM, Terry<terry.yin...@gmail.com> wrote:
> >>> Hi,
>
> >>> Is there a simple way (the pythonic way) to flatten a list of list?
> >>> rather than my current solution:
>
> >>> new_list=[]
> >>> for l in list_of_list:
> >>>    new_list.extend(l)
>
> >>> or,
>
> >>> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
>
> >> #only marginally better:
> >> from operator import add
> >> new_list = reduce(add, list_of_list)
>
> > Surely that's going to be O(N**2)?
>
> The OP asked for "simple", not "best", "most proper", or "fastest". My
> comment was intended to mean that the code was marginally *simpler*,
> not faster.
>
> Cheers,
> Chris
> --http://blog.rebertia.com

Well, if possible, I'd like not only to know a simple solution, but
also the 'best', the 'most proper' and the 'fastest':-)

If they are not the same.

br, Terry

Steven D'Aprano

unread,
Aug 16, 2009, 7:25:53 AM8/16/09
to
On Sun, 16 Aug 2009 02:47:42 -0700, Terry wrote:

> Hi,
>
> Is there a simple way (the pythonic way) to flatten a list of list?
> rather than my current solution:
>
> new_list=[]
> for l in list_of_list:
> new_list.extend(l)

I don't think that scales terribly well. In my testing, it performs about
as badly as the O(N**2) behaviours others have suggested (using sum or
reduce with add) -- perhaps a bit worse for small N, but not quite as
badly for large N.


> new_list=reduce(lambda x,y:x.extend(y), list_of_list)

That doesn't even work.

>>> list_of_list = [ [1,2,3], [2, 4, 8] ]
>>> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
>>> new_list is None
True

Chris' suggestion using itertools seems pretty good:

>>> from timeit import Timer
>>> setup = """\\
... L = [ [None]*5000 for _ in xrange(%d) ]
... from itertools import chain
... """
>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)
[0.61839914321899414, 0.61799716949462891, 0.62065696716308594]
>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)
[1.2618398666381836, 1.3385050296783447, 3.9113419055938721]
>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
[3.1349358558654785, 4.8554730415344238, 5.4319999217987061]


--
Steven

Steven D'Aprano

unread,
Aug 16, 2009, 7:31:57 AM8/16/09
to
On Sun, 16 Aug 2009 06:59:52 -0400, Chris Rebert wrote:

>> Surely that's going to be O(N**2)?
>
> The OP asked for "simple", not "best", "most proper", or "fastest". My
> comment was intended to mean that the code was marginally *simpler*, not
> faster.

Fair enough, but he also asked for Pythonic, and while some people might
argue that "terrible performance" is Pythonic, I hope you wouldn't be one
of them! :)

If it soothes your ruffled sense of honour *wink*, I think your solution
with itertools.chain is probably the best so far.


--
Steven

Chris Rebert

unread,
Aug 16, 2009, 7:43:00 AM8/16/09
to Steven D'Aprano, pytho...@python.org
On Sun, Aug 16, 2009 at 7:31 AM, Steven
D'Aprano<st...@remove-this-cybersource.com.au> wrote:
> On Sun, 16 Aug 2009 06:59:52 -0400, Chris Rebert wrote:
>>> Surely that's going to be O(N**2)?
>>
>> The OP asked for "simple", not "best", "most proper", or "fastest". My
>> comment was intended to mean that the code was marginally *simpler*, not
>> faster.
>
> Fair enough, but he also asked for Pythonic, and while some people might
> argue that "terrible performance" is Pythonic, I hope you wouldn't be one
> of them! :)

Indeed not. :) I expected it would be worse performance-wise than the
OP's original due to all the intermediate lists that get produced; it
shouldn't be used in optimized production code.

> If it soothes your ruffled sense of honour *wink*, I think your solution
> with itertools.chain is probably the best so far.

Except it's not really my solution, it's whoever put it in the
itertools docs's. :(
But I'm glad to been able to help by pointing the recipe out. :-)

Bearophile

unread,
Aug 16, 2009, 8:24:36 AM8/16/09
to
Chris Rebert:

> The OP asked for "simple", not "best", "most proper", or "fastest". My
> comment was intended to mean that the code was marginally *simpler*,
> not faster.

Yep, the OP has asked for simple code. But often this is not the right
way to solve this situation. A better way is to create (or copy) a
flatten that's efficient and well tested & debugged, put it into some
library of utilities, and then use import&use it when it's needed.

Bye,
bearophile

Scott David Daniels

unread,
Aug 16, 2009, 12:40:57 PM8/16/09
to
Steven D'Aprano wrote:
> On Sun, 16 Aug 2009 02:47:42 -0700, Terry wrote:
>> Is there a simple way (the pythonic way) to flatten a list of list?
> Chris' suggestion using itertools seems pretty good:
>
>>>> from timeit import Timer
>>>> setup = """\\
> ... L = [ [None]*5000 for _ in xrange(%d) ]
> ... from itertools import chain
> ... """
>>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)
> [0.61839914321899414, 0.61799716949462891, 0.62065696716308594]
>>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)
> [1.2618398666381836, 1.3385050296783447, 3.9113419055938721]
>>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
> [3.1349358558654785, 4.8554730415344238, 5.4319999217987061]

OK, it definitely helps to get a size estimate before building:

>>> setup = """\\


L = [ [None]*5000 for _ in xrange(%d) ]

import itertools

class Holder(object):
def __init__(self, list_of_lists):
self._list = list_of_lists
def __iter__(self):
return itertools.chain.from_iterable(self._list)
def __len__(self):
return sum(len(x) for x in self._list)
"""

>>> timeit.Timer("list(Holder(L))", setup % 4).repeat(number=1000)
[0.59912279353940789, 0.59505886921382967, 0.59474989139681611]
>>> timeit.Timer("list(Holder(L))", setup % 8).repeat(number=1000)
[1.1898235669617208, 1.194797383466323, 1.1945367358141823]
>>> timeit.Timer("list(Holder(L))", setup % 16).repeat(number=1000)
[2.4244464031043123, 2.4261885239604482, 2.4050011942858589]

vs straight chain.from_iterable (on my machine):

[0.7828263089303249, 0.79326171343005925, 0.80967664884783019]
[1.499510971366476, 1.5263249938190455, 1.5599706107899181]
[3.4427520816193109, 3.632409426337702, 3.5290488036887382]

--Scott David Daniels
Scott....@Acm.Org

Francesco Bochicchio

unread,
Aug 16, 2009, 1:12:29 PM8/16/09
to
On Aug 16, 1:25 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

...

> Chris' suggestion using itertools seems pretty good:
>
> >>> from timeit import Timer
> >>> setup = """\\
>
> ... L = [ [None]*5000 for _ in xrange(%d) ]
> ... from itertools import chain
> ... """>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)
>
> [0.61839914321899414, 0.61799716949462891, 0.62065696716308594]>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)
>
> [1.2618398666381836, 1.3385050296783447, 3.9113419055938721]>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
>
> [3.1349358558654785, 4.8554730415344238, 5.4319999217987061]
>
> --
> Steven

I had a peek at itertools ( which is a C module btw) and realized that
chain solves the problem by creating a chain object,
which is a sort of generator. Both creating the chain object and
converting the chain object to a list seem to be O(N), so
the whole is O(N) too ...

Then I tried this pure python version:

# ----- CODE


from timeit import Timer
setup = """\\

L = [ [None]*5000 for _ in range(%d) ]
def pychain( list_of_list ):
for l in list_of_list:
for elem in l:
yield elem
"""

print( Timer("list(pychain(L))", setup % 4).repeat(number=1000))
print( Timer("list(pychain(L))", setup % 8).repeat(number=1000))
print( Timer("list(pychain(L))", setup % 16).repeat(number=1000))
# ----- END CODE


and got times that seem to confirm it :

[2.818755865097046, 2.7880589962005615, 2.79232120513916]
[5.588631868362427, 5.588244915008545, 5.587780952453613]
[11.620548009872437, 11.39465618133545, 11.40834903717041]

For reference, here are the times of the itertools.chain solution on
my laptop:

[0.6518809795379639, 0.6491332054138184, 0.6483590602874756]
[1.3188841342926025, 1.3173959255218506, 1.3207998275756836]
[2.7200729846954346, 2.5402050018310547, 2.543621063232422]

All this with Python 3.1 compiled from source on Xubuntu 8.10.

Ciao
-----
FB

Alan G Isaac

unread,
Aug 16, 2009, 1:37:04 PM8/16/09
to
On 8/16/2009 5:47 AM Terry apparently wrote:
> Is there a simple way (the pythonic way) to flatten a list of list?
> rather than my current solution:

> new_list=[]
> for l in list_of_list:
> new_list.extend(l)


new_list = list(xi for lst in list_of_list for xi in lst)

hth,
Alan Isaac

Paul Rubin

unread,
Aug 16, 2009, 1:40:41 PM8/16/09
to
Terry <terry....@gmail.com> writes:
> Is there a simple way (the pythonic way) to flatten a list of list?
> rather than my current solution:
>
> new_list=[]
> for l in list_of_list:
> new_list.extend(l)

from itertools import chain
new_list = list(chain(list_of_list))

Tim Cook

unread,
Aug 17, 2009, 1:30:44 PM8/17/09
to

Well, This is not simple but it is comprhensive in that it has to do
several things. I am using it to decompose deeply nested lists from
Pyparsing output that may have strings in a variety of languages.
Performance wise I do not know how it stacks up against the other
examples but it works for me. :-)

def flatten(x):
"""flatten(sequence) -> list

Returns a single, flat list which contains all elements retrieved
from the sequence and all recursively contained sub-sequences
(iterables). All strings are converted to unicode.

"""
result = []
for el in x:
#if isinstance(el, (list, tuple)):
if hasattr(el, "__iter__") and not isinstance(el, basestring):
result.extend(flatten(el))
else:
result.append(el)


# all strings must be unicode
rtnlist=[]
for x in result:
if isinstance(x,str):
# replace any brackets so Python doesn't think it's a list
and we still have a seperator.
x=x.replace('[','_')
x=x.replace(']','_')
try:
x=unicode(x, "utf8") # need more decode types here
except UnicodeDecodeError:
x=unicode(x, "latin1")
except UnicodeDecodeError:
x=unicode(x,"iso-8859-1")
except UnicodeDecodeError:
x=unicode(x,"eucJP")

rtnlist.append(x)

return rtnlist

0 new messages