Code golf: splitting the list

18 views
Skip to first unread message

Ivan Tarasov

unread,
Aug 24, 2010, 5:03:08 AM8/24/10
to baha...@googlegroups.com
Okay, here's a simple problem:

Write a function, which splits a list into a list of lists, where the splits are between those elements, for which a relation function (given as an argument) returns True.

Function type: splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]

Test 1 (correctness): splitBetween (\a b -> b-a > 10) [1,3,4,5,16,27,29,34,59,60]
Expected result: [[1,3,4,5],[16],[27,29,34],[59,60]]

Test 2 (lazyness): take 5 $ splitWhen2 (\a b -> b-a > 10) $ cycle [1,5,16,27,29,34,60]
Expected result: [[1,5],[16],[27,29,34],[60,1,5],[16]]

I needed that function in one of my projects, and I spent surprising amount of time today trying to write it (though I was trying hard to make it not too ugly). I'm still not quite satisfied with my solution:

splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]
splitBetween p as = map (map fst) $ spanAll (not.snd) $ zip as (False : zipWith p as (tail as))

spanAll :: (a -> Bool) -> [a] -> [[a]]
spanAll _ [] = []
spanAll _ [a] = [[a]]
spanAll pred (l:ls@(_:_)) = let (as, bs) = span pred ls
                              in (l:as) : spanAll pred bs

Better ideas? :-)

Cheers,
Ivan

Aaron Culich

unread,
Aug 24, 2010, 9:59:50 AM8/24/10
to baha...@googlegroups.com
Here's a version that does splitWhen with keepDelimsR instead of dropDelims. A bit simpler, but I'm sure there must be even more concise versions? (btw: i assume your splitWhen2 for Test 2 was a typo and you meant splitBetween?)

import Data.List.Split

splitWhenKeepR :: (a -> Bool) -> [a] -> [[a]]
splitWhenKeepR = split . keepDelimsR . whenElt

splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]
splitBetween p xs = map (map fst) $ splitWhenKeepR (\x -> snd x) $ 
                    zip xs $ zipWith p xs (tail xs) ++ [False]

-- Test 1 (correctness): splitBetween (\a b -> b-a > 10) [1,3,4,5,16,27,29,34,59,60]
-- Expected result: [[1,3,4,5],[16],[27,29,34],[59,60]]
-- Test 2 (lazyness): take 5 $ splitBetween (\a b -> b-a > 10) $ cycle [1,5,16,27,29,34,60]
-- Expected result: [[1,5],[16],[27,29,34],[60,1,5],[16]]


--
You received this message because you are subscribed to the Google Groups "Bay Area Haskell Users Group" group.
To post to this group, send email to baha...@googlegroups.com.
To unsubscribe from this group, send email to bahaskell+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/bahaskell?hl=en.

Ivan Tarasov

unread,
Aug 24, 2010, 2:08:16 PM8/24/10
to baha...@googlegroups.com
On Tue, Aug 24, 2010 at 6:59 AM, Aaron Culich <acu...@gmail.com> wrote:
(btw: i assume your splitWhen2 for Test 2 was a typo and you meant splitBetween?)

Yep, sorry, splitWhen2 was an original ugly name which I decided to change later.

Ivan

Evan Laforge

unread,
Aug 28, 2010, 11:14:33 PM8/28/10
to baha...@googlegroups.com
On Tue, Aug 24, 2010 at 2:03 AM, Ivan Tarasov <ivan.t...@gmail.com> wrote:
> Okay, here's a simple problem:
>
> Write a function, which splits a list into a list of lists, where the splits
> are between those elements, for which a relation function (given as an
> argument) returns True.
>
> Function type: splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]

Here's what I came up with without first looking at your implementation:

splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]

splitBetween f [] = []
splitBetween f xs@(x:_) = (x : map fst pre) : splitBetween f (map fst post)
where (pre, post) = break (\(x, y) -> f y x) (zip (drop 1 xs) xs)

I don't know if it's particularly better, but it seems to pass the
tests, and I can read it easier than the others. But of course I
would, since I wrote it :)

Evan Laforge

unread,
Aug 28, 2010, 11:39:05 PM8/28/10
to baha...@googlegroups.com
As long as we're code golfing, here's a question for y'all who have
been learning more about monad transformers than I have:

I have this function:

justm :: (Monad m) => m (Maybe a) -> (a -> m (Maybe b)) -> m (Maybe b)
justm op1 op2 = maybe (return Nothing) op2 =<< op1

It's used when I have a bunch of monadic operations that return
Maybes. However, what I really want is to layer on a MaybeT:

something :: (Monad m) => MyMonadT m (Maybe X)
something = Maybe.runMaybeT $ do
x <- op1
y <- op2 x
etc

Unfortunately without explicit lifting for 'op1' and 'op2' I can't do it:

Couldn't match expected type `MyMonadT m X'
against inferred type `Maybe.MaybeT m1 X'

Sticking lifts on everything is just about as much work as sticking
justms on everything. Now, I'm guessing I have to make some busywork
instances to do the automatic lifting, either on MyMonadT to
understand MaybeT or on MaybeT to understand MyMonadT. However, the
last time I wanted to do that I had to write a MyMonad class and then
everyone had to say '(MyMonad m) => m X' instead of '(Monad m) =>
MyMonadT m X'. But for some reason I can never keep all that
typeclass magic clear in my mind, so it's always lots of thrashing
around to get it to work, and if it involves changing a zillion
signatures... forget it.

This all makes me think it isn't worth the hassle and I should stick
with 'justm'. It's less elegant, but seems simpler and more
localized. This makes me wonder two things:

1 - Is the automatic lifting simpler than I think, and there's some
easy way to do it?

2 - Is there a way to generalize the 'justm' style of "monad
transformer"? Though I have a feeling like if I keep chasing that
tiger I'll just wind up with 'lift'.

3 - Why are monad transformers so hard? :P

Evan Laforge

unread,
Aug 28, 2010, 11:54:53 PM8/28/10
to baha...@googlegroups.com
I should mention also that even with explicit lifting, MaybeT is still
not Maybe, so everything that's not wrapped in 'lift' must be wrapped
in 'maybe (Maybe.MaybeT (return Nothing)) return' to promote Maybe to
MaybeT.

So, 'justm', even though it feels wrong from the theory point of view,
is looking more right from the practical point of view.

If someone were designing a language that had transformers from the
beginning, how would this look different? You can't very well insist
that every Maybe function return '(Monad m) => MaybeT m' unless
there's syntax to actually make that reasonable as a default.

Aaron Culich

unread,
Aug 29, 2010, 12:02:43 AM8/29/10
to baha...@googlegroups.com
Evan- The (zip (drop 1 xs) xs) in your implementation immediately caught my eye as a potential performance problem since each call scans the length of xs - 1 each time through. Sure enough, if I call it this way:

length $ take 5000 $ splitBetween (\a b -> b-a > 10) $ cycle [1,5,16,27,29,34,60]

then the runtime on my machine for your implementation is around 25-30 seconds, and just doubling it to 10000 it takes about 2.5 mins.

 Ivan's original implementation and my modifications are about the same runtime... it can handle 1 million easily at 4-5 seconds, 10 million takes around 45 secs; which is what you'd expect for this kind of operation.

Evan Laforge

unread,
Aug 29, 2010, 12:20:28 AM8/29/10
to baha...@googlegroups.com
On Sat, Aug 28, 2010 at 9:02 PM, Aaron Culich <acu...@gmail.com> wrote:
> Evan- The (zip (drop 1 xs) xs) in your implementation immediately caught my
> eye as a potential performance problem since each call scans the length of
> xs - 1 each time through. Sure enough, if I call it this way:

I'm not sure what you mean by "scans the length", since zip is lazy
and doesn't need to know the length of its arguments. However, what
mine does do is zip together the next element, then strip it off, then
zip it back on on the next call. So unless the compiler is being
clever, at the end of a long list, pulling an element off of the zip
is going to look like:

(x2, fst (x2, fst (x2, fst (x2, x1)))) ...

So I'm suggesting that this is the source of my version's
inefficiency. Unless I'm missing what you mean by "scans" up there?

Eric Paniagua

unread,
Aug 30, 2010, 4:06:29 AM8/30/10
to Bay Area Haskell Users Group
I'm new, so "Hi, everyone!" For some context, I'm relatively new to
Haskell, but not to programming. I've also toyed with FP for a bit
but haven't done anything too deep or serious yet.

This was a pretty fun problem. I tried it out before looking at the
various spoilers in this thread. I two goals in mind: 1) do it in one
pass, and 2) keep it simple.

Here's the first solution I wrote down:

splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]
splitBetween p l
| null l = []
| otherwise = pre : (splitBetween p suf) where [pre, suf] =
splitBetweenFirst p l

splitBetweenFirst :: (a -> a -> Bool) -> [a] -> [[a]]
splitBetweenFirst p (x:y:xs)
| p x y = [x : pre, suf]
| otherwise = [[x], y:xs]
where [pre, suf] = splitBetweenFirst p (y:xs)
splitBetweenFirst _ L = [L, []]

It seemed more verbose/complicated than necessary, so I tried to clean
it up and eventually got it down to this:

splitBetween :: (a -> a -> Bool) -> [a] -> [[a]]

splitBetween p (x:y:xs)
| p x y = [x] : z
| otherwise = (x : (head z)) : (tail z)
where z = splitBetween p (y:xs)

splitBetween _ [x] = [[x]]
splitBetween _ [] = []

I like that form, but I wish those pesky special cases at the bottom
could be done away with. Of course, they can be absorbed into the
first equation with guards, but that's not very pretty or even
shorter. It's either a few lines longer, or a bunch of heads and
tails get introduced and muck things up. Anyone have an idea on this?

I profiled my solution so I could compare it to the benchmarks above.
I profiled against Ivan's solution on my machine. Looks like we both
have linear run times. To my surprise, mine takes almost exactly half
the time: (0.48, 5.04, 50.58) vs (1.08, 10.42, 105.12) in seconds for
1M, 10M, 100M respectively. On closer inspection, it looks to me like
Ivan's solution actually runs over the list twice: once in SpanAll,
and once in map (map fst). I'm not really used to dissecting these
things yet, so do you agree with that analysis?


PS: Aaron - I tried running your solution on my machine, but I wasn't
able to track down the Data.List.Split module, or at least the
functions you're using. I found a wiki page on it (http://
www.haskell.org/haskellwiki/Data.List.Split), but it has no
implementations for the functions you use. Could you tell me where to
find it?

Aaron Culich

unread,
Aug 30, 2010, 1:51:30 PM8/30/10
to baha...@googlegroups.com
Hi Eric- you can get Data.List.Split from here:


by running:

cabal install split


Evan Laforge

unread,
Sep 10, 2010, 3:00:58 AM9/10/10
to baha...@googlegroups.com
> So I'm suggesting that this is the source of my version's
> inefficiency.  Unless I'm missing what you mean by "scans" up there?

Coincidentally, I just needed this very function to detect consecutive
runs, so I went back to fix my version. Removing the extra zips and
maps gives:

split_between :: (a -> a -> Bool) -> [a] -> [[a]]
split_between _ [] = []
split_between f xs = pre : split_between f post
where (pre, post) = break_between f xs

break_between :: (a -> a -> Bool) -> [a] -> ([a], [a])
break_between f (x1 : xs@(x2:_))
| f x1 x2 = ([x1], xs)
| otherwise = let (pre, post) = break_between f xs in (x1 : pre, post)
break_between _ xs = (xs, [])

This one seems to split up a 5000 element list in a more acceptable time.

Interestingly, I tend to think in breaks while Ivan seems to think in spans.

Eric:

Our solutions are similar, except yours returns a pair as [x, y] and
uses head and tail. I always get suspicious when I see partial
functions like that in use. Returning a real pair will let the
typechecker prove that it's safe, but it means you need a helper
function. It also gets rid of your pesky special cases, but mostly,
if you never use head and tail you'll never get the dreaded
"Prelude.head: empty list".

Ivan Tarasov

unread,
Aug 19, 2011, 5:36:12 PM8/19/11
to Ilya Klyuchnikov, baha...@googlegroups.com
Unfortunately, that is incorrect. Assume the following example:

splitBetween (\a b -> b-a > 10) [1,3,4,5,12,27,29,34,59,60]

With your implementation it would return [[1,3,4,5],[12],[27,29,34],[59,60]], however the distance between 5 and 12 is less or equal to 10.

The reason for that is that the predicate may be non-transitive, as is the case, whereas the groupBy approach would have worked correctly if it was transitive, due to its implementation (groupBy gets the longest contiguous sublist which is "equivalent" to the first element of the sublist).

Ivan

On Wed, Aug 17, 2011 at 4:47 PM, Ilya Klyuchnikov <ilya.kly...@gmail.com> wrote:
import Data.List

splitBetween :: (a -> a-> Bool) -> [a] -> [[a]]
splitBetween p = groupBy (\x y -> not (p x y))

That's it!

Michael Katelman

unread,
Oct 2, 2011, 8:14:20 PM10/2/11
to Bay Area Haskell Users Group
I know this is an old-ish thread, but I just signed up for the list
today and wanted to contribute something! Here's a not-very-thoroughly-
tested potential solution:

predSplit p [] = []
predSplit p xs = foldr f [[]] xs
where f x ([]:ws) = [x]:ws
f x ( w:ws) =
if p x (head w)
then [x]:w:ws
else (x:w):ws

-Mike

Ivan Tarasov

unread,
Oct 25, 2011, 6:43:47 PM10/25/11
to baha...@googlegroups.com
Michael,

I have two objections:
  1. Your implementation does not pass the laziness test (see my original post).
  2. Your implementation is not particularly elegant. A small improvement could be done by using pattern guard instead of if/then/else, however it's still not too nice. I guess there's just no nice solution, which is a bit surprising.
Cheers,
Ivan

Reply all
Reply to author
Forward
0 new messages