Splitting lists

Ferenczi Viktor

unread,

Feb 26, 2003, 10:10:17 PM2/26/03

to

Are there any fast, simple and elegant method to split a list by a filter
function?

My current solutions with their shortcomings:

# Common definitions:
lst=range(10)
def fn(x): return x<5

# Solution 1:
tl=[e for e in lst if fn(e)]
fl=[e for e in lst if not fn(e)]
# fn(e) must be evaluated twice, list must be traversed two times

# Solution 2:
tl=[]
fl=[]
for e in lst:
if fn(e): tl.append(e)
else: fl.append(e)
# Relatively slow

# Solution 3:
cl=[(e,fn(e)) for e in lst]
tl=[e[0] for e in cl if e[1]]
fl=[e[0] for e in cl if not e[1]]
# Eats up memory, list must be traversed three times

Is there any internal function to do the above?

I need something like: tl,fl=lst.split(fn)

Thanks: Complex

Ben Wolfson

unread,

Feb 26, 2003, 11:38:57 PM2/26/03

to

On Thu, 27 Feb 2003 04:10:17 +0100, Ferenczi Viktor wrote:

> Are there any fast, simple and elegant method to split a list by a filter
> function?
>
> My current solutions with their shortcomings:

> # Solution 2:

> tl=[]
> fl=[]
> for e in lst:
> if fn(e): tl.append(e)
> else: fl.append(e)
> # Relatively slow

I don't know why you call this one relatively slow; on my system, it's
faster than the other two--which stands to reason, since the list is only
traversed once and the function is only called n times.

--
I certainly seem to be enjoying myself in the same way, smacking my
lips, sighing and moaning, dripping [REDACTED] on my shirt and
smearing it into my moustache ... But ... If I snuck a lick of your
cone, I wouldn't smack my lips. -- Ted Cohen

William Park

unread,

Feb 27, 2003, 12:44:11 AM2/27/03

to

Read the documentation, ie. filter(). Also, your 2nd method isn't that
bad. Try it, you'd be surprised.

--
William Park, Open Geometry Consulting, <openge...@yahoo.ca>
Linux solution for data management and processing.

Steven Taschuk

unread,

Feb 27, 2003, 2:01:51 AM2/27/03

to

Quoth Ferenczi Viktor:

> Are there any fast, simple and elegant method to split a list by a filter
> function?

There's nothing built-in afaik; the closest thing is filter.

> My current solutions with their shortcomings:

[...]

> # Solution 2:
> tl=[]
> fl=[]
> for e in lst:
> if fn(e): tl.append(e)
> else: fl.append(e)
> # Relatively slow

Slow relative to what? It's the fastest of your three solutions:

Solution Small List Big List
1 0.27 0.21
2 0.21 0.18
3 0.33 0.26

(In the first column, each solution was run 1000 times on a
10-element list; in the second, 10 times on a 1000-element list.)

Luckily, solution 2 is also the only one which spins the list just
once, not to mention the simplest and clearest, imho.

An improvement is possible by moving one of the name lookups out
of the loop (code below):

1lookup 0.17 0.13

And, for hotspots with specific conditions, inlining the condition
is worthwhile:

inline 0.14 0.10

Code:

# 1lookup
addt = tl.append
addf = fl.append
for e in lst:
if fn(e):
addt(e)
else:
addf(e)

# inline
for e in lst:
if e < 5:
tl.append(e)
else:
fl.append(e)

--
Steven Taschuk stas...@telusplanet.net
Every public frenzy produces legislation purporting to address it.
(Kinsley's Law)

Alex Martelli

unread,

Feb 27, 2003, 5:38:54 AM2/27/03

to

Ben Wolfson wrote:

> On Thu, 27 Feb 2003 04:10:17 +0100, Ferenczi Viktor wrote:
>
>> Are there any fast, simple and elegant method to split a list by a filter
>> function?
>>
>> My current solutions with their shortcomings:
>
>> # Solution 2:
>> tl=[]
>> fl=[]
>> for e in lst:
>> if fn(e): tl.append(e)
>> else: fl.append(e)
>> # Relatively slow
>
> I don't know why you call this one relatively slow; on my system, it's
> faster than the other two--which stands to reason, since the list is only
> traversed once and the function is only called n times.

Yes, and I think it's the most Pythonic approach. However, it
would not impress people -- it's too clear and understandable,
so everybody can see at once what it does, and nobody will be
oohing and aahing about how "kewl" Python is.

More impressive...:

appenders = tl.append, fl.append
for e in lst: appenders[not fn(e)](e)

now isn't THAT more like it...?

Alex

Duncan Booth

unread,

Feb 27, 2003, 6:05:32 AM2/27/03

to

Alex Martelli <al...@aleax.it> wrote in
news:2Vl7a.329137$AA2.12...@news2.tin.it:

> Yes, and I think it's the most Pythonic approach. However, it
> would not impress people -- it's too clear and understandable,
> so everybody can see at once what it does, and nobody will be
> oohing and aahing about how "kewl" Python is.
>
> More impressive...:
>
> appenders = tl.append, fl.append
> for e in lst: appenders[not fn(e)](e)
>
> now isn't THAT more like it...?
>

Kewl, but still maintaing a semblance of readability might be:

>>> test = range(10)
>>> def odd(x): return x%2 != 0

>>> fl = []
>>> tl = [ x for x in test if odd(x) or fl.append(x) ]
>>> tl
[1, 3, 5, 7, 9]
>>> fl
[0, 2, 4, 6, 8]
>>>

--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Scott Ransom

unread,

Feb 27, 2003, 10:18:03 AM2/27/03

to

"Ferenczi Viktor" <c...@cx.hu> wrote in message news:<mailman.104632129...@python.org>...

> Are there any fast, simple and elegant method to split a list by a filter
> function?

While not precisely what you are asking, if your lists are large
and consist of numbers, you may want to use Numeric.

> # Common definitions:
> lst=range(10)
> def fn(x): return x<5

import Numeric
arr = arange(10)
# or, converting from a list: arr = Numeric.asarray(lst)
tl = Numeric.compress(fn(x), arr)
# or
tl = Numeric.compress(arr<5, arr)
fl = Numeric.compress(arr>5, arr)

You can convert the resulting arrays back to lists, if required,
by using:

tl = tl.tolist()
fl = fl.tolist()

Scott

Anton Vredegoor

unread,

Feb 27, 2003, 10:23:22 AM2/27/03

to

On Thu, 27 Feb 2003 04:10:17 +0100, "Ferenczi Viktor" <c...@cx.hu>
wrote:

>I need something like: tl,fl=lst.split(fn)

It's possible to subclass builtin types.
See some example code below.

Regards,
Anton.

class splittable(list):

def split(self,fn):
tl = splittable()
fl = splittable()
for x in self:
if fn(x):
tl.append(x)
else:
fl.append(x)
return tl,fl

def test():
lst=splittable(range(10))

def fn(x):
return x<5

tl,fl = lst.split(fn)
print tl,fl

if __name__=='__main__':
test()

A. Lloyd Flanagan

unread,

Feb 27, 2003, 11:26:48 AM2/27/03

to

"Ferenczi Viktor" <c...@cx.hu> wrote in message news:<mailman.104632129...@python.org>...

> Are there any fast, simple and elegant method to split a list by a filter
> function?
>

Here's another interesting approach. I have no idea of the
performance implications, but since Set is hashed it might help:

from sets import Set
def fn(e):
return e < 5

x = Set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = Set(filter(fn, x))
z = x - y
print y
print z

y = Set([1, 2, 3, 4])
z = Set([5, 6, 7, 8, 9, 10])

Of course this won't work on older python versions, or if your list
contains mutable objects.

Létezõ

unread,

Feb 27, 2003, 11:05:01 AM2/27/03

to

Dear List Splitters!

Thanks to Steven for the performance analysis and Alex & Duncan for the
their more impressive solutions. It would be interesting to insert the new
"impressive" solutions into the performace test loop. Which is the fastest?

Proposed solutions in reverse time order:

# Duncan modified by Viktor: (one name lookup moved out of the loop)
fl=[]
app=fl.append
tl = [ x for x in test if fn(x) or app(x) ]

# Duncan: (my current favourite)
fl=[]
tl = [ x for x in test if fn(x) or fl.append(x) ]

# Alex: (it's a good idea to move the lookups out)

appenders = tl.append, fl.append
for e in lst: appenders[not fn(e)](e)

# Viktor 1
tl=[e for e in lst if fn(e)]
fl=[e for e in lst if not fn(e)]

# Viktor 2

tl=[]
fl=[]
for e in lst:
if fn(e): tl.append(e)
else: fl.append(e)

# Viktor 3

cl=[(e,fn(e)) for e in lst]
tl=[e[0] for e in cl if e[1]]
fl=[e[0] for e in cl if not e[1]]

I would be useful to implement a sequence method to do this job at maximum
performance. It would be a "readibility improvement" such as enumerate() in
Python 2.3.

Best wishes: Viktor

Alex Martelli

unread,

Feb 27, 2003, 1:07:35 PM2/27/03

to

A. Lloyd Flanagan wrote:

> "Ferenczi Viktor" <c...@cx.hu> wrote in message
> news:<mailman.104632129...@python.org>...
>> Are there any fast, simple and elegant method to split a list by a filter
>> function?
>>
>
> Here's another interesting approach. I have no idea of the
> performance implications, but since Set is hashed it might help:

I doubt it will have optimal performance, because internally it's
still doing two loops on the input list x, but it can still be of
interest IF you don't care about the order of things in your
list (Set, like dict, doesn't care about items' order).

> from sets import Set
> def fn(e):
> return e < 5
>
> x = Set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
> y = Set(filter(fn, x))
> z = x - y
> print y
> print z
>
> y = Set([1, 2, 3, 4])
> z = Set([5, 6, 7, 8, 9, 10])
>
> Of course this won't work on older python versions, or if your list
> contains mutable objects.

For Python versions < 2.3, you can just use dict directly, although
less elegantly -- e.g., to get list results (but not in order...):

y = dict( [(item,1) for item in x if fn(item)] )
z = [item for item in x if item not in y]
y = y.keys()

Alex

Steven Taschuk

unread,

Feb 27, 2003, 3:14:38 PM2/27/03

to

Quoth Létező:

> Thanks to Steven for the performance analysis and Alex & Duncan for the
> their more impressive solutions. It would be interesting to insert the new
> "impressive" solutions into the performace test loop. Which is the fastest?

Short and long list trials:

viktor1 9.56 9.17
viktor2 8.60 8.30
viktor2lookup 6.00 5.72
viktor3 12.08 14.19
duncan 8.70 8.49
duncanlookup 6.04 5.76
alex 6.47 6.18

(The difference between viktor2lookup and duncanlookup is within
experimental error. Other differences are significant.)

Code follows, so you can see how to do this yourself.

import time

def viktor1(lst, condition):
tl = [e for e in lst if condition(e)]
fl = [e for e in lst if not condition(e)]
return tl, fl

def viktor2(lst, condition):
tl = []
fl = []
for e in lst:
if condition(e):
tl.append(e)
else:
fl.append(e)
return tl, fl

def viktor2lookup(lst, condition):
tl=[]
fl=[]
addt = tl.append
addf = fl.append
for e in lst:
if condition(e):
addt(e)
else:
addf(e)
return tl, fl

def viktor3(lst, condition):
cl = [(e, condition(e)) for e in lst]
tl = [e[0] for e in cl if e[1]]
fl = [e[0] for e in cl if not e[1]]
return tl, fl

def duncan(lst, condition):
fl = []
tl = [x for x in lst if condition(x) or fl.append(x)]
return tl, fl

def duncanlookup(lst, condition):
fl = []
addf = fl.append
tl = [x for x in lst if condition(x) or addf(x)]
return tl, fl

def alex(lst, condition):
tl = []
fl = []

appenders = tl.append, fl.append
for e in lst:

appenders[not condition(e)](e)
return tl, fl

def timetrial(splitfunc, condition, listsize, iterations):
lst = range(listsize)
start = time.clock()
for i in range(iterations):
splitfunc(lst, condition)
end = time.clock()
return end - start

if __name__ == '__main__':
for f in (viktor1, viktor2, viktor2lookup,
viktor3, duncan, duncanlookup, alex):
print '%-15s %6.2f %6.2f' % (f.__name__,
timetrial(f, lambda x: x < 5, 100, 5000),
timetrial(f, lambda x: x < 5, 5000, 100))

--
Steven Taschuk w_w
stas...@telusplanet.net ,-= U
1 1

Alex Martelli

unread,

Feb 27, 2003, 3:39:29 PM2/27/03

to

Steven Taschuk wrote:

> Quoth Létezõ:

>> Thanks to Steven for the performance analysis and Alex & Duncan for the
>> their more impressive solutions. It would be interesting to insert the
>> new "impressive" solutions into the performace test loop. Which is the
>> fastest?
>
> Short and long list trials:
>
> viktor1 9.56 9.17
> viktor2 8.60 8.30
> viktor2lookup 6.00 5.72
> viktor3 12.08 14.19
> duncan 8.70 8.49
> duncanlookup 6.04 5.76
> alex 6.47 6.18

Interestingly, my numbers are a bit different:

[alex@lancelot logs]$ python2.3 -O ti.py
viktor1 1.06 1.06
viktor2 0.78 0.77
viktor2lookup 0.66 0.62
viktor3 1.33 1.58
duncan 0.81 0.80
duncanlookup 0.70 0.67
alex 0.71 0.70
[alex@lancelot logs]$

This isn't a particularly hi-end machine -- it's a 1.2 GHz Athlon
PC with DDR 2100 RAM and Linux Mandrake 9.0, reasonably cheap even
when I bought it almost two years ago. This may perhaps indicate
that the best way to speed algorithms up is to purchase halfway
decent hardware no more than a couple of years old, and run a free
operating system &c on it...?-)

Anyway, on my machine "viktor2lookup" -- the simplest approach,
except that the append bound methods are looked up into local
variables just once before the loop -- is repeatably fastest
on this particular test.

Alex

Terry Reedy

unread,

Feb 27, 2003, 5:16:21 PM2/27/03

to

"Létező" <let...@fw.hu> wrote in message
news:mailman.1046363056...@python.org...

> Dear List Splitters!
>
> Thanks to Steven for the performance analysis and Alex & Duncan for
the
> their more impressive solutions. It would be interesting to insert
the new
> "impressive" solutions into the performace test loop. Which is the
fastest?
>
> Proposed solutions in reverse time order:

Here is another 'symmetrical' version

tl,fl = [],[]
ta,fa = tl.append, fl.append
for i in range(-5,5): ((i >=0) or fa(i)) and ta(i)

fl,tl
#([-5, -4, -3, -2, -1], [0, 1, 2, 3, 4])

but unless it surprised me by being faster, I also prefer

> # Duncan: (my current favourite)
> fl=[]
> tl = [ x for x in test if fn(x) or fl.append(x) ]

with fl.append factored out

Terry J. Reedy

Terry Reedy

unread,

Feb 27, 2003, 5:31:54 PM2/27/03

to

"Létező" <let...@fw.hu> wrote in message
news:mailman.1046363056...@python.org...

> # Duncan: (my current favourite)
> fl=[]
> tl = [ x for x in test if fn(x) or fl.append(x) ]

For those not liking list comps, the filter() equivelent is
fl=[]
fa=fl.append
tl = filter(lambda x: fn(x) or fa(x), test)

Terry J. Reedy

Létezõ

unread,

Feb 27, 2003, 5:51:03 PM2/27/03

to

Hi!

Thanks to Steven and Alex for the code and results. I've run Steven's code
on my machine: P4 Northwoord 2G (1.6G overclocked with a water cooler), 512M
RDRAM, ABIT TH7-II motherboard, WinXP Pro Eng, ActiveState Python 2.2.1:

Viktor:

viktor1 1.21 1.17
viktor2 0.86 0.84
viktor2lookup 0.78 0.74
viktor3 1.56 1.99
duncan 0.87 0.83
duncanlookup 0.76 0.71
alex 0.77 0.75

This is not intended to compare AMD Athlon and Intel P4, because background
processes can eat up time and the operating system are differ (WinXP /
Linux).

Viktor

----------------------------------------------------------------------------
---

Steven:

> > viktor1 9.56 9.17
> > viktor2 8.60 8.30
> > viktor2lookup 6.00 5.72
> > viktor3 12.08 14.19
> > duncan 8.70 8.49
> > duncanlookup 6.04 5.76
> > alex 6.47 6.18

----------------------------------------------------------------------------
---

Alex:

> viktor1 1.06 1.06
> viktor2 0.78 0.77
> viktor2lookup 0.66 0.62
> viktor3 1.33 1.58
> duncan 0.81 0.80
> duncanlookup 0.70 0.67
> alex 0.71 0.70

----------------------------------------------------------------------------
---

Steven Taschuk

unread,

Feb 27, 2003, 8:25:02 PM2/27/03

to

Quoth Alex Martelli:
[...]

> Interestingly, my numbers are a bit different:
>
> [alex@lancelot logs]$ python2.3 -O ti.py
> viktor1 1.06 1.06

[...]

> This isn't a particularly hi-end machine -- it's a 1.2 GHz Athlon
> PC with DDR 2100 RAM and Linux Mandrake 9.0, reasonably cheap even

> when I bought it almost two years ago. [...]

Ah, my machine's age is showing. (How embarrassing!) I'll see
your 1.2 GHz Athlon bought two years ago and raise you a 233 MHz
Pentium II bought three and a half years ago.

Of course, the more informative statistic is the ratios:

Steven Alex Viktor
viktor1 1.00 0.96 1.00 1.00 1.00 0.97
viktor2 0.90 0.87 0.74 0.73 0.71 0.69
viktor2lookup 0.63 0.60 0.62 0.58 0.64 0.61
viktor3 1.26 1.48 1.25 1.49 1.29 1.64
duncan 0.91 0.89 0.76 0.75 0.72 0.69
duncanlookup 0.63 0.60 0.66 0.63 0.63 0.59
alex 0.68 0.65 0.67 0.66 0.64 0.62

All agree on the broad comparisons:
1. viktor2lookup, duncanlookup, alex
2. viktor2, duncan
3. viktor1
4. viktor3

I imagine that more detailed comparisons are unsound: even
ignoring the hardware differences, we're all using different
versions of Python, Viktor isn't running Linux, and you and I are
probably using different versions of kernel, libraries, etc..

> Anyway, on my machine "viktor2lookup" -- the simplest approach,
> except that the append bound methods are looked up into local
> variables just once before the loop -- is repeatably fastest
> on this particular test.

As here.

--
Steven Taschuk stas...@telusplanet.net
"Telekinesis would be worth patenting." -- James Gleick

Brandon Beck

unread,

Feb 27, 2003, 11:06:14 PM2/27/03

to

Alex Martelli <al...@aleax.it> wrote in message news:<2Vl7a.329137$AA2.12...@news2.tin.it>...

> appenders = tl.append, fl.append
> for e in lst: appenders[not fn(e)](e)
>
> now isn't THAT more like it...?

When I read this, I was immediately reminded of one of Guido's essays
on optimization (http://www.python.org/doc/essays/list2str.html). One
of the takeaways I had from reading his optimization anecdote was that
if possible you should attempt to use built-in functions with implied
loops when possible. So I immediately thought this should be faster:

func = lambda e, a=[t1.append, f1.append]: a[not fn(e)](e)
map(func, lst)
return t1, f1

After rereading Guido's essay, I see that he says that you should only
favor a built-in construct with an implied loop in situations where
the function argument is also a python built-in, and not anything
containing python code. Now I don't claim to have any knowledge of
the inner workings of the python interpreter, but I was wondering why
this is the case? I understand that a lambda needs to be interpreted
where as C code doesn't, but what is it about that in combination with
map that makes it slower then your for loop version?

Brandon

Steve Holden

unread,

Feb 28, 2003, 7:35:46 AM2/28/03

to

"Brandon Beck" <bbe...@excite.com> wrote in message
news:3ac67cce.03022...@posting.google.com...

If map is calling a builtin then you get one C function calling another. If
map is calling a Python function then it has to set up the Python calling
environment in full generality, negating at least some of the time saved by
using the implicit looping.

regards
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/
Register for PyCon now! http://www.python.org/pycon/reg.html

ATOM

unread,

Mar 7, 2003, 10:06:30 AM3/7/03

to

"Ferenczi Viktor" <c...@cx.hu> wrote in message news:<mailman.104632129...@python.org>...

Hi Viktor,

I do not use filter, it is rather a kind of
a ? b : c known from C/C++,
but what about that:

=============================================================
lst = range(10)
tl = []
fl = []

f = lambda e: (e<5 and tl.append(e)) or (e>=5 and fl.append(e))
def mySplit(list):
for item in list:
f(item)
return tl, fl

mySplit(lst)
([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
=============================================================

Only in haskell-like-language we can write it more elegant, i suppose... :-)
Best greetings!
ATOM
(arthur...@web.de)