[Haskell-cafe] Sliding Window functional data structure

38 views
Skip to first unread message

Richard O'Keefe

unread,
Aug 31, 2012, 12:45:27 AM8/31/12
to Haskell Cafe
Consider the following interface

type Ord k => Sliding_Window k v

entries :: Sliding_Window k v -> [(k,v)]
The cost is expected to be linear in the length of
the result. The pairs are listed in increasing
order of k.

add :: Ord k => k -> v -> Sliding_Window k v -> Sliding_Window k v
precondition: all (< k) [k' | (k',_) <- entries q]
The cost should be at most O((log . length . entries) q).
post: entries (add k v q) = entries q ++ [(k,v)]

since :: Ord k => k -> Sliding_Window k v -> [(k,v)]
answers [(k',v) | (k',v) <- entries q, k' > k]
The cost should be at most O((log . length . entries) q
+ length result)

purge :: Ord k => k -> Sliding_Window k v -> Sliding_Window k v
answers q' such that entries q' = [(k',v) | (k',v) <- entries q,
k' > k]
The cost should be at most O((log . length . entries) q
+ length [k' | (k',v) <- entries q,
k' <= k])

Ignoring costs, this can obviously be done trivially using
a list of pairs. Paying some attention to costs, it can also
be done using some sort of balanced search tree. The data
structure is close to a priority queue, but subtly different.

I believe I can see how to do this in an imperative language
using a Dijkstra-style array of pairs: add is amortised O(1)
using the well known doubling strategy, thanks to the strictly
increasing key requirement; since is a binary search followed
by a segment copy; purge is a binary search followed by nilling
out the now unwanted elements. By adapting the back-to-back
pair of lists implementation of queues, we can obviously do
add in O(1) and purge in O(1+#deleted items) time in a pure
functional language.

Thing is, there's a baffling array of data structures to examine
(AVL, RB, 2-3, 2-3-4, 1-2-brother, finger ... trees) and I wondered
if anyone had any idea what would be particularly good for this
rather asymmetric problem.



_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Ross Paterson

unread,
Aug 31, 2012, 4:03:52 AM8/31/12
to haskel...@haskell.org
On Fri, Aug 31, 2012 at 05:45:27AM +0100, Richard O'Keefe wrote:
> Consider the following interface
>
> type Ord k => Sliding_Window k v
>
> entries :: Sliding_Window k v -> [(k,v)]
> The cost is expected to be linear in the length of
> the result. The pairs are listed in increasing
> order of k.
>
> add :: Ord k => k -> v -> Sliding_Window k v -> Sliding_Window k v
> precondition: all (< k) [k' | (k',_) <- entries q]
> The cost should be at most O((log . length . entries) q).
> post: entries (add k v q) = entries q ++ [(k,v)]
>
> since :: Ord k => k -> Sliding_Window k v -> [(k,v)]
> answers [(k',v) | (k',v) <- entries q, k' > k]
> The cost should be at most O((log . length . entries) q
> + length result)
>
> purge :: Ord k => k -> Sliding_Window k v -> Sliding_Window k v
> answers q' such that entries q' = [(k',v) | (k',v) <- entries q,
> k' > k]
> The cost should be at most O((log . length . entries) q
> + length [k' | (k',v) <- entries q,
> k' <= k])

Any search tree implementation will do add and purge in O(log n) time.
A finger tree will do add in O(1) and purge in O(log(min(r, n-r))) time,
where r in the length of the result.

{-# LANGUAGE MultiParamTypeClasses #-}

module SlidingWindow where

import Data.FingerTree
import Data.Foldable
import Data.Monoid

data Entry k v = Entry k v

data Max k = Bot | Lift k
deriving (Eq, Ord)

instance Ord k => Monoid (Max k) where
mempty = Bot
mappend = max

instance Ord k => Measured (Max k) (Entry k v) where
measure (Entry k _) = Lift k

newtype SlidingWindow k v = SW (FingerTree (Max k) (Entry k v))

entries :: SlidingWindow k v -> [(k,v)]
entries (SW t) = [(k, v) | Entry k v <- toList t]

emptySW :: Ord k => SlidingWindow k v
emptySW = SW empty

add :: Ord k => k -> v -> SlidingWindow k v -> SlidingWindow k v
add k v (SW t) = SW (t |> Entry k v)

since :: Ord k => k -> SlidingWindow k v -> [(k,v)]
since k = entries . purge k

purge :: Ord k => k -> SlidingWindow k v -> SlidingWindow k v
purge k (SW t) = SW (dropUntil (> Lift k) t)

o...@cs.otago.ac.nz

unread,
Aug 31, 2012, 8:24:40 AM8/31/12
to haskel...@haskell.org

> Any search tree implementation will do add and purge in O(log n) time.

Add's obvious, but could you explain to me about purge?
All of the explanations of search trees I'm familiar with,
if they bother to explain deletion at all, talk about how
to repair the balance of a tree after deleting *one* element.
It's not at all obvious to me how to quickly rebalance an
AVL tree after purging it, for example.

> A finger tree will do add in O(1) and purge in O(log(min(r, n-r))) time,

Thanks for that and the sample code.

Okasaki, Christopher D CIV USA USMA

unread,
Aug 31, 2012, 9:08:31 AM8/31/12
to haskel...@haskell.org

One possibility would be a random-access list (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.5156), which is a kind of list of trees.

 

Then 'add' is trivially O(1) and 'since' is trivially O(length result).

 

The only tricky operation is purge, which naively would be O(N), where N is the length of entries.  But that can be reduced using a kind of lazy deletion.  Pair the random-access list with a single k, meaning "everything <= k is considered deleted, even if it is still physically present in the data structure".  Purge would update the k and scan the list of trees, removing any tree with a root <= k.  There are only a logarithmic number of trees, so this scan would be O(log N).  In fact, it could be made O(log #remaining entries) by noting that, when one such tree is found, the list can just be chopped off there because all later trees should also be removed.

 

The downside of lazy deletion is that you might use a bit more space than you need, because a portion of the last tree in the list might be logically deleted but still present.  I would expect this effect to be minor in practice.

 

Taking this approach, there is one situation that would require clarification, and possibly special handling: If you call purge with a k that is larger than the most recent add, thereby purging everything, are you later permitted to add a k' that is smaller than the purged k?  This might be disallowed by extending the precondition of add.  But if it is allowed, then should the new k' be affected by the previous purge, or not?  It's easy to implement either way; it just depends on how you want to specify the desired behavior.

Reply all
Reply to author
Forward
0 new messages