Moving Window Function


Sean Devlin

Jun 23, 2009, 2:36:25 AM
to Clojure
Hey all,
Does anyone know of a moving window function? I'm curious if there
are any tools like this for digital signals processing, 30-day moving
averages, etc.

Sean

Christophe Grand

Jun 23, 2009, 3:53:42 AM
to clo...@googlegroups.com
If I understand what you are looking for, I think partition can be useful:
(map avg (partition 30 1 daily-data))

Christophe
--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)
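
[Editor's note: for readers following along at the REPL, here is a minimal sketch of this idiom with a window of 3; avg is a hypothetical helper, not part of clojure.core.]

```clojure
;; avg is a hypothetical helper, not part of clojure.core
(defn avg [xs]
  (/ (reduce + xs) (count xs)))

;; (partition 3 1 coll) yields overlapping windows of size 3,
;; stepping forward by 1 element each time
(map avg (partition 3 1 [1 2 3 4 5]))
;; => (2 3 4)
```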

Sean Devlin

Jun 23, 2009, 8:53:14 AM
to Clojure
Yes, this does exactly what I want. I should have thought of it in
the first place. Thank you very much.

On Jun 23, 3:53 am, Christophe Grand <christo...@cgrand.net> wrote:
> If understand what you are looking for, I think partition can be useful:
> (map avg (partition 30 1 daily-data))
>
> Christophe
>

Kyle Schaffrick

Jun 23, 2009, 10:20:52 AM
to clo...@googlegroups.com

If you need weighted moving averages, this is a tad ugly but seems to
work...

(defn weighted-moving-average*
  "Generate a lazy sequence consisting of weighted moving averages
  over the pre-partitioned input sequence. The weights parameter
  should be a sequence of weights the same length as the partition
  size."
  [weights input]
  (lazy-seq
    (when-let [input (seq input)]
      (let [weight-sum (apply + weights)
            weighted-input-sum (apply + (map * weights (first input)))]
        (cons (/ weighted-input-sum weight-sum)
              (weighted-moving-average* weights (rest input)))))))

(defn weighted-moving-average
  "Generate a lazy sequence consisting of weighted moving averages
  over the input sequence. The weighting is given by weight-fn, having
  a domain 0..1 which will be scaled across win-size elements to
  calculate the weighting."
  [weight-fn win-size input]
  (let [weight-samples (reverse (for [t (range 1 (inc win-size))]
                                  (/ t win-size)))
        weights (map weight-fn weight-samples)]
    (weighted-moving-average*
      weights (partition win-size 1 input))))

-Kyle

Kyle Schaffrick

Jun 23, 2009, 10:33:03 AM
to clo...@googlegroups.com
On Tue, 23 Jun 2009 10:20:52 -0400
Kyle Schaffrick <ky...@raidi.us> wrote:
>
> On Mon, 22 Jun 2009 23:36:25 -0700 (PDT), Sean Devlin
> <francoi...@gmail.com> wrote:
> > Hey all,
> > Does anyone know of a moving window function? I'm curious if there
> > are any tools like this for digital signals processing, 30-day
> > moving averages, etc.
>
> If you need weighted moving averages, this is a tad ugly but seems to
> work ..
>

Okay, it is still a tad ugly, but hopefully now it won't be mangled by
the blasted mail client :)

Christophe Grand

Jun 23, 2009, 12:23:01 PM
to clo...@googlegroups.com
I think you could simplify your code by using map twice. What about this (untested)?


(defn weighted-moving-average
  "Generate a lazy sequence consisting of weighted moving averages
  over the input sequence. The weighting is given by weight-fn, having
  a domain 0..1 which will be scaled across win-size elements to
  calculate the weighting."
  [weight-fn win-size input]
  (let [weights (reverse (for [t (range 1 (inc win-size))]
                           (weight-fn (/ t win-size))))
        weight-sum (reduce + weights)
        normalized-weights (map #(/ % weight-sum) weights)
        avg #(reduce + (map * normalized-weights %))]
    (map avg (partition win-size 1 input))))

btw, why the 'reverse in weights?
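
[Editor's note: a quick sanity check of this version, untested in the thread and reproduced here so the snippet stands alone: with a constant weight function it should collapse to a plain moving average.]

```clojure
;; Christophe's version, reproduced so this snippet stands alone
(defn weighted-moving-average
  [weight-fn win-size input]
  (let [weights (reverse (for [t (range 1 (inc win-size))]
                           (weight-fn (/ t win-size))))
        weight-sum (reduce + weights)
        normalized-weights (map #(/ % weight-sum) weights)
        avg #(reduce + (map * normalized-weights %))]
    (map avg (partition win-size 1 input))))

;; A constant weight function reduces to an ordinary moving average
(weighted-moving-average (constantly 1) 2 [1 2 3 4])
;; => (3/2 5/2 7/2)
```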

Kyle Schaffrick

Jun 23, 2009, 1:37:03 PM
to clo...@googlegroups.com
On Tue, 23 Jun 2009 18:23:01 +0200
Christophe Grand <chris...@cgrand.net> wrote:

> I think you could simplify your code by using map twice. What about:
> (untested)
>
> (defn weighted-moving-average
> "Generate a lazy sequence consisting of weighted moving averages
> over the input sequence. The weighting is given by weight-fn,
> having a domain 0..1 which will be scaled across win-size elements to
> calculate the weighting."
> [weight-fn win-size input]
> (let [weights (reverse (for [t (range 1 (inc win-size))]
> (weight-fn (/ t win-size))))
> weight-sum (reduce + weights)
> normalized-weights (map #(/ % weight-sum) weights)
> avg #(reduce + (map * normalized-weights %))]
> (map avg (partition win-size 1 input))))

Hmm, yes that is certainly simpler.

This actually reminds me of something that I've "felt" for a while but
haven't been able to articulate: A lot of functional programming
literature takes great pains to explain how higher-order functions,
lambda expressions, closures, etc work, and so I feel like I have a
fairly solid grasp of the mechanics of many functional concepts. I could
probably even give you a passable explanation for how to implement
lexical scoping.

But, I still sometimes find it challenging to spot patterns that can be
replaced and simplified with HOFs and other functional idioms, and I
haven't come across any books/guides that help hone one's intuition for
doing this. This is a good example because, while I have no trouble at
all understanding your implementation, I doubt I would have ever thought
of it myself!

I wonder if anyone has a guide or tips or rules of thumb to help with
this, or if it just takes experience...

As an aside, I also notice you prefer 'reduce to 'apply when using
arithmetic functions, yet I've seen both in the wild. I'm just guessing
you prefer to make it explicit that you're doing a reduction, but I
wonder if one is better than the other?

>
> btw, why the 'reverse in weights?
>

I found it makes it easier to think about the weighting function if
(weight-fn 0) evaluates to the weight of the most recent sample (the
latest in the sequence), but the 'for comprehension generates them with
(weight-fn 0) as the first element in the sequence.

I suppose it would be more efficient to have 'range generate the series
in reverse to begin with, wouldn't it? :)
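
[Editor's note: a sketch of that last idea, using range's step argument to generate the samples already reversed; weight-samples is a hypothetical helper name, not from the thread.]

```clojure
;; Counting down with range's negative step avoids the 'reverse call
(defn weight-samples [win-size]
  (for [t (range win-size 0 -1)]
    (/ t win-size)))

(weight-samples 4)
;; => (1 3/4 1/2 1/4)
```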

-Kyle

Daniel Lyons

Jun 23, 2009, 2:02:47 PM
to clo...@googlegroups.com

On Jun 23, 2009, at 11:37 AM, Kyle Schaffrick wrote:

> As an aside, I also notice you prefer 'reduce to 'apply when using
> arithmetic functions, yet I've seen both in the wild. I'm just
> guessing
> you prefer to make it explicit that you're doing a reduction, but I
> wonder if one is better than another?


That's an interesting question. I just benchmarked 'em and found this:

user> (time (reduce + (range 100000000)))
"Elapsed time: 4724.192 msecs"
4999999950000000

user> (time (apply + (range 100000000)))
"Elapsed time: 3018.139 msecs"
4999999950000000

I would have expected that 'apply would eventually blow up, because in
Common Lisp there is a maximum number of arguments. My second
intuition was that if '+/2 is inlined then the reduce should be
faster, taking advantage of the inlined code. Both my intuitions were
wrong and it looks like 'apply is notably faster.


Daniel Lyons

Christophe Grand

Jun 23, 2009, 2:41:10 PM
to clo...@googlegroups.com
On Tue, Jun 23, 2009 at 8:02 PM, Daniel Lyons <fus...@storytotell.org> wrote:


> On Jun 23, 2009, at 11:37 AM, Kyle Schaffrick wrote:
>
> > As an aside, I also notice you prefer 'reduce to 'apply when using
> > arithmetic functions, yet I've seen both in the wild. I'm just
> > guessing
> > you prefer to make it explicit that you're doing a reduction, but I
> > wonder if one is better than another?
>
> That's an interesting question. I just benchmarked 'em and found this:
>
> user> (time (reduce + (range 100000000)))
> "Elapsed time: 4724.192 msecs"
> 4999999950000000
>
> user> (time (apply + (range 100000000)))
> "Elapsed time: 3018.139 msecs"
> 4999999950000000


Hmm, no. apply should be slower since the var-arg body for + uses reduce: http://github.com/richhickey/clojure/blob/b03e19aa341fea01c1279a74f4184f6538d0f72e/src/clj/clojure/core.clj#L556

user=> (dotimes [_ 10] (time (apply + (range 100000000))))
"Elapsed time: 2607.452135 msecs"
"Elapsed time: 2622.394746 msecs"
"Elapsed time: 2643.195364 msecs"
"Elapsed time: 2641.422793 msecs"
"Elapsed time: 2642.319066 msecs"
"Elapsed time: 2566.04743 msecs"
"Elapsed time: 2566.192631 msecs"
"Elapsed time: 2570.2414 msecs"
"Elapsed time: 2566.682218 msecs"
"Elapsed time: 2575.225692 msecs"
nil
user=> (dotimes [_ 10] (time (reduce + (range 100000000))))
"Elapsed time: 1935.942767 msecs"
"Elapsed time: 1964.796084 msecs"
"Elapsed time: 2321.384841 msecs"
"Elapsed time: 1946.584133 msecs"
"Elapsed time: 1987.454202 msecs"
"Elapsed time: 1954.452801 msecs"
"Elapsed time: 1945.051815 msecs"
"Elapsed time: 1962.773062 msecs"
"Elapsed time: 1974.039368 msecs"
"Elapsed time: 1966.895303 msecs"
nil

Phew! Experiment confirms theory ;-)

Kyle: On 'apply vs 'reduce, reduce is the rule. It conveys more meaning, both to the (human) reader and to the compiler/environment: it can be optimized (see Rich's work on chunked seqs) and is already optimized for Java arrays, lists, ranges and vectors.

One notable exception is 'str, where 'apply is preferred because Java string concatenation isn't efficient.
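
[Editor's note: a sketch illustrating the 'str exception. Both forms return the same string; they differ in how much intermediate garbage they create.]

```clojure
;; 'apply reaches str's var-arg body, which builds the result in a
;; single pass; 'reduce concatenates pairwise, creating an
;; intermediate String at each step ("a", "ab", then "abc")
(apply str ["a" "b" "c"])   ;; => "abc"
(reduce str ["a" "b" "c"])  ;; => "abc"
```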

 
> I would have expected that 'apply would eventually blow up, because in
> Common Lisp there is a maximum number of arguments.


In Clojure apply is lazy but don't trust it to be strictly lazy:
user=> (apply (fn [& _]) (map #(doto % println) (range 10)))
0
1
nil


 
> My second
> intuition was that if '+/2 is inlined then the reduce should be
> faster, taking advantage of the inlined code.

Inlining is really interesting with primitive args.

(dotimes [_ 10] (time (reduce #(+ %1 %2) (range 100000000))))
"Elapsed time: 2388.735091 msecs"
"Elapsed time: 2462.450186 msecs"
"Elapsed time: 2427.287407 msecs"
"Elapsed time: 2435.715922 msecs"
"Elapsed time: 2383.38923 msecs"
"Elapsed time: 2404.126851 msecs"
"Elapsed time: 2390.338506 msecs"
"Elapsed time: 2399.320026 msecs"
"Elapsed time: 2375.095648 msecs"
"Elapsed time: 2459.116243 msecs"
nil

It's worse than plain + because it adds an indirection (the anonymous closure).

user=> (dotimes [_ 10] (time (reduce #(+ (long %1) (long %2)) (range 100000000))))
"Elapsed time: 2679.988029 msecs"
"Elapsed time: 1876.67846 msecs"
"Elapsed time: 1862.06131 msecs"
"Elapsed time: 1903.309569 msecs"
"Elapsed time: 1943.72106 msecs"
"Elapsed time: 1912.806865 msecs"
"Elapsed time: 1860.759747 msecs"
"Elapsed time: 1871.212752 msecs"
"Elapsed time: 1894.235929 msecs"
"Elapsed time: 1863.166198 msecs"
nil

Seems marginally faster than plain +, but it's not a good example because the call to 'reduce requires the args and the return value to be boxed. Not worth bothering.

hth,

Christophe

Christophe Grand

Jun 23, 2009, 3:03:53 PM
to clo...@googlegroups.com
Hi Kyle!


On Tue, Jun 23, 2009 at 7:37 PM, Kyle Schaffrick <ky...@raidi.us> wrote:
> But, I still sometimes find it challenging to spot patterns that can be
> replaced and simplified with HOFs and other functional idioms, [...]
>
> I wonder if anyone has a guide or tips or rules of thumb to help with
> this, or if it just takes experience...

A simple piece of advice: forbid yourself from using lazy-seq and cons. It's akin to tying your writing hand behind your back :-)
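
[Editor's note: to make this advice concrete, a small illustrative pair; the squares example is hypothetical, not from the thread. The first version hand-rolls the recursion in the style of the earlier lazy-seq code; the second is what remains once the map pattern is spotted.]

```clojure
;; Hand-rolled with lazy-seq/cons, in the style of the original code
(defn squares* [xs]
  (lazy-seq
    (when-let [xs (seq xs)]
      (cons (* (first xs) (first xs))
            (squares* (rest xs))))))

;; The same lazy sequence, once the map pattern is spotted
(defn squares [xs]
  (map #(* % %) xs))

(= (squares* [1 2 3]) (squares [1 2 3]))
;; => true
```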

Daniel Lyons

Jun 23, 2009, 3:14:58 PM
to clo...@googlegroups.com
Wish it were so, because I like your theory better than my reality: :)

user>  (dotimes [_ 10] (time (apply + (range 100000000))))
"Elapsed time: 3306.787 msecs"
"Elapsed time: 2920.778 msecs"
"Elapsed time: 2976.063 msecs"
"Elapsed time: 2889.74 msecs"
"Elapsed time: 3005.505 msecs"
"Elapsed time: 3075.499 msecs"
"Elapsed time: 2898.838 msecs"
"Elapsed time: 2900.19 msecs"
"Elapsed time: 3003.037 msecs"
"Elapsed time: 2954.846 msecs"
nil

user>  (dotimes [_ 10] (time (reduce + (range 100000000))))
"Elapsed time: 3742.196 msecs"
"Elapsed time: 3711.182 msecs"
"Elapsed time: 3646.465 msecs"
"Elapsed time: 3652.413 msecs"
"Elapsed time: 3650.788 msecs"
"Elapsed time: 3648.032 msecs"
"Elapsed time: 3641.9 msecs"
"Elapsed time: 3735.167 msecs"
"Elapsed time: 3682.609 msecs"
"Elapsed time: 3647.555 msecs"

I agree that the apply version should be slower, but it doesn't seem to be the case on my computer. (I'm surprised at the variability though.) I'm running a Clojure version from a couple weeks ago on Apple's Java 1.6 if that lends a clue as to the discrepancy.

At this point I'd ask if anybody has a stylistic reason to prefer one over the other, since the benchmarks seem less reliable than usual.

— 
Daniel Lyons

Christophe Grand

Jun 23, 2009, 4:38:54 PM
to clo...@googlegroups.com
On Tue, Jun 23, 2009 at 9:14 PM, Daniel Lyons <fus...@storytotell.org> wrote:

> Wish it were so, because I like your theory better than my reality: :)


Drat! Stupid reality!

Now I have to bend it to make my theory valid.

range is wily, it bit me before. It's a specialized seq.

Can we try with a regular lazy-seq?

user=> (dotimes [_ 10] (time (apply + (take 1000000 (iterate inc 0)))))
"Elapsed time: 1033.9173 msecs"
"Elapsed time: 1123.099025 msecs"
"Elapsed time: 1393.07377 msecs"
"Elapsed time: 1091.260202 msecs"
"Elapsed time: 1031.521464 msecs"
"Elapsed time: 993.97605 msecs"
"Elapsed time: 964.909367 msecs"
"Elapsed time: 1004.735937 msecs"
"Elapsed time: 1113.822776 msecs"
"Elapsed time: 994.489104 msecs"
nil
user=> (dotimes [_ 10] (time (reduce + (take 1000000 (iterate inc 0)))))
"Elapsed time: 486.111713 msecs"
"Elapsed time: 295.508019 msecs"
"Elapsed time: 290.850443 msecs"
"Elapsed time: 291.644888 msecs"
"Elapsed time: 290.163834 msecs"
"Elapsed time: 298.254597 msecs"
"Elapsed time: 288.870373 msecs"
"Elapsed time: 314.193563 msecs"
"Elapsed time: 295.104755 msecs"
"Elapsed time: 288.079002 msecs"
nil


Does this hack make your reality valid too? ;-)

Daniel Lyons

Jun 23, 2009, 5:59:23 PM
to clo...@googlegroups.com
It does:

user> (dotimes [_ 10] (time (apply + (take 100000 (iterate inc 0)))))
"Elapsed time: 131.61 msecs"
"Elapsed time: 139.876 msecs"
"Elapsed time: 180.685 msecs"
"Elapsed time: 221.148 msecs"
"Elapsed time: 313.682 msecs"
"Elapsed time: 188.298 msecs"
"Elapsed time: 180.442 msecs"
"Elapsed time: 380.812 msecs"
"Elapsed time: 241.149 msecs"
"Elapsed time: 199.781 msecs"
nil
user>  (dotimes [_ 10] (time (reduce + (take 100000 (iterate inc 0)))))
"Elapsed time: 79.235 msecs"
"Elapsed time: 50.107 msecs"
"Elapsed time: 41.763 msecs"
"Elapsed time: 47.521 msecs"
"Elapsed time: 42.862 msecs"
"Elapsed time: 40.48 msecs"
"Elapsed time: 42.939 msecs"
"Elapsed time: 45.12 msecs"
"Elapsed time: 40.321 msecs"
"Elapsed time: 42.979 msecs"

But if I try it with your original number (10x bigger than mine) I run out of Java heap space. Which didn't happen before either, because the lists weren't materialized.

So range is cheating here? How's that?

— 
Daniel Lyons

Christophe Grand

Jun 24, 2009, 2:20:17 AM
to clo...@googlegroups.com
On Tue, Jun 23, 2009 at 11:59 PM, Daniel Lyons <fus...@storytotell.org> wrote:

> But if I try it with your original number (10x bigger than mine) I run out of Java heap space. Which didn't happen before either, because the lists weren't materialized.
>
> So range is cheating here? How's that?


Range, like most specialized seqs, isn't implemented as a (lazy) linked list. It is simply a reference to the parent collection and an index. There's no reference to the next element; a new next is built each time you call 'next.

user=> (let [s (seq (range 10))] (identical? (next s) (next s)))
false
user=> (let [s (seq [1 2 3])] (identical? (next s) (next s)))
false
user=> (let [s (seq {1 2 3 4})] (identical? (next s) (next s)))
false
user=> (let [s (seq "abc")] (identical? (next s) (next s)))
false
user=> (let [s (seq '(1 2 3 4))] (identical? (next s) (next s)))
true
user=> (let [s (seq (map identity (range 10)))] (identical? (next s) (next s)))
true
user=> (let [s (seq (take 10 (iterate inc 0)))] (identical? (next s) (next s)))
true

Hence keeping a reference to the head doesn't retain the next seqs in memory.
(apply + ...) must hold a reference to the head (don't know where) and thus crashes when the seq isn't a specialized seq.

 

Rich Hickey

Jun 24, 2009, 8:20:19 AM
to clo...@googlegroups.com

apply retains the head because it calls through Java (IFn.applyTo)
where it is not possible to clear locals before tail calling, so the
head is on the stack.

I'm afraid everyone's gotten spoiled :), but apply does have the
general limitation that any args consumed must fit in memory.

Rich
