[Haskell-cafe] Improving *> and >> for Data.Sequence

35 views
Skip to first unread message

David Feuer

unread,
Nov 19, 2014, 1:22:34 AM11/19/14
to haskel...@haskell.org, R.Pat...@city.ac.uk
I'd like to define (*>) and (>>) for Data.Seq.Seq in a "clever" way, like replicate, but I'm a bit stuck. It kind of looks like this is the purpose behind the applicativeTree function, which bills itself as a generalization of replicateA, but something seems to have gotten stuck and the only time I see applicativeTree actually used is to define replicateA. With all the fancy nesting, I'm a bit lost as to how to go about this, and having only one example doesn't really help. Can someone help give me a clue?

Thanks,
David

Ross Paterson

unread,
Nov 19, 2014, 11:46:17 AM11/19/14
to Haskell Cafe
I don't think applicativeTree will do the job -- it assumes the argument
is a 2-3 tree (not a finger tree). The best I can think of is

xs *> ys = replicateSeq (Seq.length xs) ys

-- Concatenate n copies of xs
replicateSeq :: Int -> Seq a -> Seq a
replicateSeq n xs
| n == 0 = empty
| even n = nxs
| otherwise = xs >< nxs
where nxs = replicateSeq (n `div` 2) (xs >< xs)

I think it's O(log m*(log m + log n)), where m and n are the lengths of
the two sequences, which is certainly an improvement on O(mn).

Another way of doing replicateSeq would be

replicateSeq :: Int -> Seq a -> Seq a
replicateSeq n xs
| n == 0 = empty
| even n = half >< half
| otherwise = xs >< half >< half
where half = replicateSeq (n `div` 2) xs

I'm not sure which would give the most sharing.
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

David Feuer

unread,
Nov 19, 2014, 1:05:21 PM11/19/14
to Haskell Cafe, R.Pat...@city.ac.uk
Many thanks. I don't *think* it's ever been as bad as O(mn) (I'm pretty sure it's no worse than O(m log m log n) and it may well be better), but it's certainly not great for time and it's definitely not great for space. I believe both of your versions are essentially based on fast exponentiation, which was going to be my fall-back position barring something more magically good taking advantage of the tree structure somehow. I know there are some fancy versions of fast exponentiation to minimize multiplications, but any version thereof would be better than the current approach.

Ross Paterson

unread,
Nov 19, 2014, 1:50:20 PM11/19/14
to Haskell Cafe
On Wed, Nov 19, 2014 at 01:05:13PM -0500, David Feuer wrote:
> Many thanks. I don't *think* it's ever been as bad as O(mn) (I'm pretty
> sure it's no worse than O(m log m log n) and it may well be better),

Oh right, there are m appends, each O(log n).

David Feuer

unread,
Nov 19, 2014, 1:53:08 PM11/19/14
to Haskell Cafe
I meant O(m log (mn)) = O(m (log m + log n)), because there are m appends, building up from O(n) to O(mn), but it really doesn't matter because we can easily do better.

Ross Paterson

unread,
Nov 19, 2014, 2:02:26 PM11/19/14
to haskel...@haskell.org
On Wed, Nov 19, 2014 at 01:53:02PM -0500, David Feuer wrote:
> I meant O(m log (mn)) = O(m (log m + log n)), because there are m appends,
> building up from O(n) to O(mn), but it really doesn't matter because we can
> easily do better.

Indeed it's moot, but appending a tree of size n to one of size mn
costs O(log n).

David Feuer

unread,
Nov 19, 2014, 2:58:54 PM11/19/14
to haskel...@haskell.org
On Wed, Nov 19, 2014 at 2:02 PM, Ross Paterson <R.Pat...@city.ac.uk> wrote:
On Wed, Nov 19, 2014 at 01:53:02PM -0500, David Feuer wrote:
> I meant O(m log (mn)) = O(m (log m + log n)), because there are m appends,
> building up from O(n) to O(mn), but it really doesn't matter because we can
> easily do better.

Indeed it's moot, but appending a tree of size n to one of size mn
costs O(log n).

Right, I forgot that. I got to looking at <*> just now, and it suggests the following question: is there a particularly efficient way to build a Seq when its ultimate size is known in advance, avoiding the usual incremental rebuilding?

Ross Paterson

unread,
Nov 19, 2014, 7:38:07 PM11/19/14
to haskel...@haskell.org
On Wed, Nov 19, 2014 at 02:58:46PM -0500, David Feuer wrote:
> I got to looking at <*> just now, and it suggests the
> following question: is there a particularly efficient way to build a Seq when
> its ultimate size is known in advance, avoiding the usual incremental
> rebuilding?

The following avoids the rebuilding, but I haven't tweaked or timed it:

fromList' :: [a] -> Seq a
fromList' xs = Seq $ mkTree (Data.List.length xs) 1 $ map Elem xs

mkTree :: Int -> Int -> [a] -> FingerTree a
mkTree n size xs
| n == 0 = Empty
| n == 1 = let [x1] = xs in Single x1
| n < 6 = let (l, r) = Data.List.splitAt (n `div` 2) xs in
Deep totalSize (mkDigit l) Empty (mkDigit r)
| otherwise = let
size' = 3*size
n' = (n-4) `div` 3
digits = n - n'*3
(l, rest) = Data.List.splitAt (digits `div` 2) xs
(nodes, r) = getNodes n' size' rest
in Deep totalSize (mkDigit l) (mkTree n' size' nodes) (mkDigit r)
where
totalSize = n*size

mkDigit :: [a] -> Digit a
mkDigit [x1] = One x1
mkDigit [x1, x2] = Two x1 x2
mkDigit [x1, x2, x3] = Three x1 x2 x3
mkDigit [x1, x2, x3, x4] = Four x1 x2 x3 x4

getNodes :: Int -> Int -> [a] -> ([Node a], [a])
getNodes n _ xs
| n == 0 = ([], xs)
getNodes n size (x1:x2:x3:xs) = (Node3 size x1 x2 x3:ns, ys)
where (ns, ys) = getNodes (n-1) size xs

David Feuer

unread,
Nov 19, 2014, 8:26:58 PM11/19/14
to haskel...@haskell.org

On Nov 19, 2014 7:38 PM, "Ross Paterson" <R.Pat...@city.ac.uk> wrote:
>
> On Wed, Nov 19, 2014 at 02:58:46PM -0500, David Feuer wrote:
> > I got to looking at <*> just now, and it suggests the
> > following question: is there a particularly efficient way to build a Seq when
> > its ultimate size is known in advance, avoiding the usual incremental
> > rebuilding?
>
> The following avoids the rebuilding, but I haven't tweaked or timed it:

I don't know how well this will work for fromList, but it looks like it will almost certainly be good for <*> and *>. I'll try it out.

Ross Paterson

unread,
Nov 20, 2014, 11:00:29 AM11/20/14
to haskel...@haskell.org
On Thu, Nov 20, 2014 at 12:37:49AM +0000, Ross Paterson wrote:
> On Wed, Nov 19, 2014 at 02:58:46PM -0500, David Feuer wrote:
> > I got to looking at <*> just now, and it suggests the
> > following question: is there a particularly efficient way to build a Seq when
> > its ultimate size is known in advance, avoiding the usual incremental
> > rebuilding?
>
> The following avoids the rebuilding, but I haven't tweaked or timed it:
>
> [...]

Actually this is pretty much what the existing fromList2 does.

David Feuer

unread,
Nov 21, 2014, 12:17:40 PM11/21/14
to haskel...@haskell.org
On Thu, Nov 20, 2014 at 11:00 AM, Ross Paterson <R.Pat...@city.ac.uk> wrote:
On Thu, Nov 20, 2014 at 12:37:49AM +0000, Ross Paterson wrote:
> On Wed, Nov 19, 2014 at 02:58:46PM -0500, David Feuer wrote:
> > I got to looking at <*> just now, and it suggests the
> > following question: is there a particularly efficient way to build a Seq when
> > its ultimate size is known in advance, avoiding the usual incremental
> > rebuilding?
>
> The following avoids the rebuilding, but I haven't tweaked or timed it:
>
> [...]

Actually this is pretty much what the existing fromList2 does.

I think the technique fromList2 uses is probably sub-optimal for <*>, because it steps through things in order. The ends of fs <*> xs don't depend on the middle of f. It should be better, I think, to delay actually touching that middle until it's actually demanded.

David Feuer

unread,
Nov 21, 2014, 2:00:23 PM11/21/14
to haskel...@haskell.org
To be precise, I *think* using the fromList approach for <*> makes us create O(n) thunks in order to extract the last element of the result. If we build the result inward, I *think* we can avoid this, getting the last element of the result in O(1) time and space. But my understanding of this data structure remains primitive.

Ross Paterson

unread,
Nov 22, 2014, 5:49:35 AM11/22/14
to haskel...@haskell.org
On Fri, Nov 21, 2014 at 02:00:16PM -0500, David Feuer wrote:
> To be precise, I *think* using the fromList approach for <*> makes us create O
> (n) thunks in order to extract the last element of the result. If we build the
> result inward, I *think* we can avoid this, getting the last element of the
> result in O(1) time and space. But my understanding of this data structure
> remains primitive.

This modification of the previous should do that.

mult :: Seq (a -> b) -> Seq a -> Seq b
mult sfs sxs = fromTwoLists (length sfs * length sxs) ys rev_ys
where
fs = toList sfs
rev_fs = toRevList sfs
xs = toList sxs
rev_xs = toRevList sxs
ys = [f x | f <- fs, x <- xs]
rev_ys = [f x | f <- rev_fs, x <- rev_xs]

-- toRevList xs = toList (reverse xs)
toRevList :: Seq a -> [a]
toRevList = foldl (flip (:)) []

-- Build a tree lazy in the middle, from a list and its reverse.
--
-- fromTwoLists (length xs) xs (reverse xs) = fromList xs
--
-- Getting the kth element from either end involves forcing the lists
-- to length k.
fromTwoLists :: Int -> [a] -> [a] -> Seq a
fromTwoLists len_xs xs rev_xs =
Seq $ mkTree2 len_xs 1 (map Elem xs) (map Elem rev_xs)

-- Construct a fingertree from the first n elements of xs.
-- The arguments must satisfy n <= length xs && rev_xs = reverse xs.
-- Each element of xs has the same size, provided as an argument.
mkTree2 :: Int -> Int -> [a] -> [a] -> FingerTree a
mkTree2 n size xs rev_xs
| n == 0 = Empty
| n == 1 = let [x1] = xs in Single x1
| n < 6 = let
nl = n `div` 2
l = Data.List.take nl xs
r = Data.List.take (n - nl) rev_xs
in Deep totalSize (mkDigit l) Empty (mkRevDigit r)
| otherwise = let
size' = 3*size
n' = (n-4) `div` 3
digits = n - n'*3
nl = digits `div` 2
(l, xs') = Data.List.splitAt nl xs
(r, rev_xs') = Data.List.splitAt (digits - nl) rev_xs
nodes = mkNodes size' xs'
rev_nodes = mkRevNodes size' rev_xs'
in Deep totalSize (mkDigit l) (mkTree2 n' size' nodes rev_nodes) (mkRevDigit r)
where
totalSize = n*size

mkDigit :: [a] -> Digit a
mkDigit [x1] = One x1
mkDigit [x1, x2] = Two x1 x2
mkDigit [x1, x2, x3] = Three x1 x2 x3
mkDigit [x1, x2, x3, x4] = Four x1 x2 x3 x4

-- length xs <= 4 => mkRevDigit xs = mkDigit (reverse xs)
mkRevDigit :: [a] -> Digit a
mkRevDigit [x1] = One x1
mkRevDigit [x2, x1] = Two x1 x2
mkRevDigit [x3, x2, x1] = Three x1 x2 x3
mkRevDigit [x4, x3, x2, x1] = Four x1 x2 x3 x4

mkNodes :: Int -> [a] -> [Node a]
mkNodes size (x1:x2:x3:xs) = Node3 size x1 x2 x3:mkNodes size xs

-- length xs `mod` 3 == 0 =>
-- mkRevNodes size xs = reverse (mkNodes size (reverse xs))
mkRevNodes :: Int -> [a] -> [Node a]
mkRevNodes size (x3:x2:x1:xs) = Node3 size x1 x2 x3:mkRevNodes size xs

Ross Paterson

unread,
Nov 22, 2014, 8:16:35 AM11/22/14
to haskel...@haskell.org
On Sat, Nov 22, 2014 at 10:49:18AM +0000, Ross Paterson wrote:
> -- Construct a fingertree from the first n elements of xs.
> -- The arguments must satisfy n <= length xs && rev_xs = reverse xs.
> -- Each element of xs has the same size, provided as an argument.

Sorry, that should read:

-- The arguments must satisfy n <= length xs && take n rev_xs = reverse (take n xs).

David Feuer

unread,
Nov 22, 2014, 12:57:25 PM11/22/14
to haskel...@haskell.org

The ideal goal, which has taken me forever to identify and which may well be unattainable, is to get O(log(min{i,mn-i})) access to each element of the result, while maintaining O(mn) time to force it entirely. Each of these is possible separately, of course. To get them both, if it's possible, we need to give up on the list-like approach and start splitting Seqs instead of lists. As we descend, we want to pass a single thunk to each element of each Digit to give it just enough to do its thing. Representing the splits efficiently and/or memoizing them could be a bit of a challenge.

David Feuer

unread,
Nov 22, 2014, 10:10:53 PM11/22/14
to haskel...@haskell.org
OK, so I've thought about this some more. I think the essential *concept* I want is close to this, but it won't quite work this way:

fs <*> xs = equalJoin $ fmap (<$> xs) fs

equalJoin :: Int -> Seq (Seq a) -> Seq a
equalJoin n s
  | length s <= 2*n = simpleJoin s
  | otherwise       = simpleJoin pref ><
                      equalJoin (2*n) mid ><
                      simpleJoin suff
  where (pref, s')  = splitAt n s
        (mid, suff) = splitAt (length s - 2*n) s'

simpleJoin :: Seq (Seq a) -> Seq a
simpleJoin s
  | null s = empty
  | length s == 1 = index s 0
  | otherwise = simpleJoin front >< simpleJoin back
  where
    (front,back) = splitAt (length s `quot` 2) s

I think the reason this doesn't work is that >< is too strict. I believe the only potential way around this is to dig into the FingerTree representation and build the thing top-down. I still don't understand how (if at all) this can be done.

David Feuer

unread,
Nov 23, 2014, 12:08:05 AM11/23/14
to haskel...@haskell.org
OK, sorry for the flood of posts, but I think I've found a way to make that work. Specifically, I think I can write a three-Seq append that takes the total size and uses it to be as lazy as possible in the second of the three Seqs. I'm still working out the details, but I think it will work. It does the (possibly avoidable) rebuilding, but I *think* it's at least asymptotically optimal. Of course, if Ross Paterson can find something more efficient, that'd be even better.
Reply all
Reply to author
Forward
0 new messages