Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:
new_list=[]
for l in list_of_list:
new_list.extend(l)
or,
new_list=reduce(lambda x,y:x.extend(y), list_of_list)
br, Terry
#only marginally better:
from operator import add
new_list = reduce(add, list_of_list)
#from the itertools recipes:
from itertools import chain
def flatten(listOfLists):
return list(chain.from_iterable(listOfLists))
Cheers,
Chris
--
http://blog.rebertia.com
This is probably the shortest it can get:
sum(list_of_lists, [])
Kind Regards,
M.F.
> On Sun, Aug 16, 2009 at 5:47 AM, Terry<terry....@gmail.com> wrote:
>> Hi,
>>
>> Is there a simple way (the pythonic way) to flatten a list of list?
>> rather than my current solution:
>>
>> new_list=[]
>> for l in list_of_list:
>> new_list.extend(l)
>>
>> or,
>>
>> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
>
> #only marginally better:
> from operator import add
> new_list = reduce(add, list_of_list)
Surely that's going to be O(N**2)?
I'd predict that would perform even worse than O(N**2) string
concatenation, on account that a string of size N has to allocate space
for N bytes, but a list of size N allocates space for N pointers (each 4
bytes, or 8 depending on the system), rounded up to the nearest power of
two. Also CPython can optimize string concatenation to O(N) under some
circumstances, but not lists.
>>> from timeit import Timer
>>> setup = """\\
... L = [ ([None]*5000) for x in xrange(%d) ]
... from operator import add
... """
>>> Timer("reduce(add, L)", setup % 4).repeat(number=1000)
[0.53808808326721191, 0.51404905319213867, 0.51157188415527344]
>>> Timer("reduce(add, L)", setup % 8).repeat(number=1000)
[2.1178171634674072, 3.8830602169036865, 4.72245192527771]
>>> Timer("reduce(add, L)", setup % 16).repeat(number=1000)
[17.190337896347046, 21.572744131088257, 21.584265947341919]
Looks like O(N**2) behaviour to me.
--
Steven
The OP asked for "simple", not "best", "most proper", or "fastest". My
comment was intended to mean that the code was marginally *simpler*,
not faster.
> Terry wrote:
>> Is there a simple way (the pythonic way) to flatten a list of list?
>
> This is probably the shortest it can get:
>
> sum(list_of_lists, [])
That's also O(N**2).
>>> from timeit import Timer
>>> setup = "L = [ ([None]*5000) for x in xrange(%d) ]"
>>> Timer("sum(L, [])", setup % 4).repeat(number=1000)
[0.6070549488067627, 0.54354715347290039, 0.54686999320983887]
>>> Timer("sum(L, [])", setup % 8).repeat(number=1000)
[2.1285719871520996, 3.6722278594970703, 4.0785009860992432]
>>> Timer("sum(L, [])", setup % 16).repeat(number=1000)
[18.370341062545776, 20.40509295463562, 21.871708869934082]
--
Steven
Well, if possible, I'd like not only to know a simple solution, but
also the 'best', the 'most proper' and the 'fastest':-)
If they are not the same.
br, Terry
> Hi,
>
> Is there a simple way (the pythonic way) to flatten a list of list?
> rather than my current solution:
>
> new_list=[]
> for l in list_of_list:
> new_list.extend(l)
I don't think that scales terribly well. In my testing, it performs about
as badly as the O(N**2) behaviours others have suggested (using sum or
reduce with add) -- perhaps a bit worse for small N, but not quite as
badly for large N.
> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
That doesn't even work.
>>> list_of_list = [ [1,2,3], [2, 4, 8] ]
>>> new_list=reduce(lambda x,y:x.extend(y), list_of_list)
>>> new_list is None
True
Chris' suggestion using itertools seems pretty good:
>>> from timeit import Timer
>>> setup = """\\
... L = [ [None]*5000 for _ in xrange(%d) ]
... from itertools import chain
... """
>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)
[0.61839914321899414, 0.61799716949462891, 0.62065696716308594]
>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)
[1.2618398666381836, 1.3385050296783447, 3.9113419055938721]
>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
[3.1349358558654785, 4.8554730415344238, 5.4319999217987061]
--
Steven
>> Surely that's going to be O(N**2)?
>
> The OP asked for "simple", not "best", "most proper", or "fastest". My
> comment was intended to mean that the code was marginally *simpler*, not
> faster.
Fair enough, but he also asked for Pythonic, and while some people might
argue that "terrible performance" is Pythonic, I hope you wouldn't be one
of them! :)
If it soothes your ruffled sense of honour *wink*, I think your solution
with itertools.chain is probably the best so far.
--
Steven
Indeed not. :) I expected it would be worse performance-wise than the
OP's original due to all the intermediate lists that get produced; it
shouldn't be used in optimized production code.
> If it soothes your ruffled sense of honour *wink*, I think your solution
> with itertools.chain is probably the best so far.
Except it's not really my solution, it's whoever put it in the
itertools docs's. :(
But I'm glad to been able to help by pointing the recipe out. :-)
Yep, the OP has asked for simple code. But often this is not the right
way to solve this situation. A better way is to create (or copy) a
flatten that's efficient and well tested & debugged, put it into some
library of utilities, and then use import&use it when it's needed.
Bye,
bearophile
OK, it definitely helps to get a size estimate before building:
>>> setup = """\\
L = [ [None]*5000 for _ in xrange(%d) ]
import itertools
class Holder(object):
def __init__(self, list_of_lists):
self._list = list_of_lists
def __iter__(self):
return itertools.chain.from_iterable(self._list)
def __len__(self):
return sum(len(x) for x in self._list)
"""
>>> timeit.Timer("list(Holder(L))", setup % 4).repeat(number=1000)
[0.59912279353940789, 0.59505886921382967, 0.59474989139681611]
>>> timeit.Timer("list(Holder(L))", setup % 8).repeat(number=1000)
[1.1898235669617208, 1.194797383466323, 1.1945367358141823]
>>> timeit.Timer("list(Holder(L))", setup % 16).repeat(number=1000)
[2.4244464031043123, 2.4261885239604482, 2.4050011942858589]
vs straight chain.from_iterable (on my machine):
[0.7828263089303249, 0.79326171343005925, 0.80967664884783019]
[1.499510971366476, 1.5263249938190455, 1.5599706107899181]
[3.4427520816193109, 3.632409426337702, 3.5290488036887382]
--Scott David Daniels
Scott....@Acm.Org
...
> Chris' suggestion using itertools seems pretty good:
>
> >>> from timeit import Timer
> >>> setup = """\\
>
> ... L = [ [None]*5000 for _ in xrange(%d) ]
> ... from itertools import chain
> ... """>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)
>
> [0.61839914321899414, 0.61799716949462891, 0.62065696716308594]>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)
>
> [1.2618398666381836, 1.3385050296783447, 3.9113419055938721]>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
>
> [3.1349358558654785, 4.8554730415344238, 5.4319999217987061]
>
> --
> Steven
I had a peek at itertools ( which is a C module btw) and realized that
chain solves the problem by creating a chain object,
which is a sort of generator. Both creating the chain object and
converting the chain object to a list seem to be O(N), so
the whole is O(N) too ...
Then I tried this pure python version:
# ----- CODE
from timeit import Timer
setup = """\\
L = [ [None]*5000 for _ in range(%d) ]
def pychain( list_of_list ):
for l in list_of_list:
for elem in l:
yield elem
"""
print( Timer("list(pychain(L))", setup % 4).repeat(number=1000))
print( Timer("list(pychain(L))", setup % 8).repeat(number=1000))
print( Timer("list(pychain(L))", setup % 16).repeat(number=1000))
# ----- END CODE
and got times that seem to confirm it :
[2.818755865097046, 2.7880589962005615, 2.79232120513916]
[5.588631868362427, 5.588244915008545, 5.587780952453613]
[11.620548009872437, 11.39465618133545, 11.40834903717041]
For reference, here are the times of the itertools.chain solution on
my laptop:
[0.6518809795379639, 0.6491332054138184, 0.6483590602874756]
[1.3188841342926025, 1.3173959255218506, 1.3207998275756836]
[2.7200729846954346, 2.5402050018310547, 2.543621063232422]
All this with Python 3.1 compiled from source on Xubuntu 8.10.
Ciao
-----
FB
> new_list=[]
> for l in list_of_list:
> new_list.extend(l)
new_list = list(xi for lst in list_of_list for xi in lst)
hth,
Alan Isaac
from itertools import chain
new_list = list(chain(list_of_list))
Well, This is not simple but it is comprhensive in that it has to do
several things. I am using it to decompose deeply nested lists from
Pyparsing output that may have strings in a variety of languages.
Performance wise I do not know how it stacks up against the other
examples but it works for me. :-)
def flatten(x):
"""flatten(sequence) -> list
Returns a single, flat list which contains all elements retrieved
from the sequence and all recursively contained sub-sequences
(iterables). All strings are converted to unicode.
"""
result = []
for el in x:
#if isinstance(el, (list, tuple)):
if hasattr(el, "__iter__") and not isinstance(el, basestring):
result.extend(flatten(el))
else:
result.append(el)
# all strings must be unicode
rtnlist=[]
for x in result:
if isinstance(x,str):
# replace any brackets so Python doesn't think it's a list
and we still have a seperator.
x=x.replace('[','_')
x=x.replace(']','_')
try:
x=unicode(x, "utf8") # need more decode types here
except UnicodeDecodeError:
x=unicode(x, "latin1")
except UnicodeDecodeError:
x=unicode(x,"iso-8859-1")
except UnicodeDecodeError:
x=unicode(x,"eucJP")
rtnlist.append(x)
return rtnlist