What's the efficient functional way to computing the average of a sequence of numbers?

536 views
Skip to first unread message

simon.T

unread,
Mar 29, 2012, 12:18:54 AM3/29/12
to clo...@googlegroups.com
The obvious way is like the following, which traverse the sequence 2 times.
Wondering what will be the efficient way...

(defn avg [coll]
  (/ (reduce + coll) (count coll)))

Linus Ericsson

unread,
Mar 29, 2012, 6:09:10 AM3/29/12
to clo...@googlegroups.com
or to increase a counter while reducing it, a function like inc+ returning {:sum sum :count count} and then take the sum/counter, which is the mean.

The problem is possible to state as a clean map-reduce problem with only one traversing of the data. It's also possible to remove items form the mean operation (ie, the problem is associative).

/Linus

2012/3/29 simon.T <simon...@gmail.com>

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Rick Beerendonk

unread,
Mar 29, 2012, 6:04:38 AM3/29/12
to Clojure
> The obvious way is like the following, which traverse the sequence 2 times.
> Wondering what will be the efficient way...

(defn avg [coll]
(loop [c coll tot 0 cnt 0]
(if (empty? c)
(/ tot cnt)
(recur (rest c) (+ tot (first c)) (inc cnt)))))

This will loop only once.
It happens to be faster, but the difference is not explained by (count
coll) elimination only. Maybe reduce is a factor as well.

Rob Nagle

unread,
Mar 29, 2012, 6:28:48 AM3/29/12
to Clojure
You can reduce in one pass with a function that tracks both the sum
and the count.

(defn avg [coll]
(apply / (reduce (fn [[sum n] x] [(+ sum x) (inc n)]) [0 0] coll)))

This reduce function is somewhat unusual in that its arguments have
different forms. As a result, this one does require the initial-value
argument be used. It's set to [0 0] indicating the sum and count both
start at 0. The function then "consumes" the numbers in coll one at a
time, producing the running sum and count each time. Then we just
apply / to divide the sum by the count.

David Cabana

unread,
Mar 29, 2012, 1:18:37 PM3/29/12
to clo...@googlegroups.com
On Thu, Mar 29, 2012 at 12:18 AM, simon.T <simon...@gmail.com> wrote:
> The obvious way is like the following, which traverse the sequence 2 times.
> ...

The obvious way does not necessarily traverse the sequence twice. If
a sequence S satisfies the 'counted?' predicate, (count S) takes
constant time. In particular
user=> (counted? [:a :b :c])
true

user=> (counted? '(:a :b :c))
true

user=> (counted? {:a 1 :b 2 :c 3})
true

user=> (counted? #{:a :b :c})
true

The examples are stolen from:
http://clojuredocs.org/clojure_core/clojure.core/counted_q

So it is very likely that (/ (reduce + coll) (count coll)) will not
traverse 'coll' twice, and the natural way is the preferred way.
<Insert standard warning about premature optimization./>

Alan Malloy

unread,
Mar 29, 2012, 4:43:38 PM3/29/12
to Clojure
On Mar 29, 10:18 am, David Cabana <drcab...@gmail.com> wrote:
"Very likely" strikes me as a huge overstatement here. Most sequences
that you want to average won't be source-code literals, they'll be
lazy sequences, and those aren't counted.

David Cabana

unread,
Mar 29, 2012, 6:22:58 PM3/29/12
to clo...@googlegroups.com
> "Very likely" strikes me as a huge overstatement here. Most sequences
> that you want to average won't be source-code literals, they'll be
> lazy sequences, and those aren't counted

Point taken about lazy sequences. But the above was not intended to
suggest the sequence needs to be source code literal to satisfy
'counted?', rather that vectors, lists, maps, and sets do so. That
covers a fair bit of ground.

Benjamin Peter

unread,
Mar 30, 2012, 2:31:24 AM3/30/12
to Clojure
Hi,

On Mar 29, 10:43 pm, Alan Malloy <a...@malloys.org> wrote:
> "Very likely" strikes me as a huge overstatement here. Most sequences
> that you want to average won't be source-code literals, they'll be
> lazy sequences, and those aren't counted.

I think this topic is interesting. My guess would be, that the
sequence would have been traversed completely by the reduce call and
therefore clojure could know it's size and provide a constant time
count.

Could this be implemented? Is it?


regards,

Benjamin

Larry Travis

unread,
Mar 30, 2012, 2:01:52 PM3/30/12
to clo...@googlegroups.com, Larry Travis
I too think this is interesting because because it serves to illustrate some important general aspects of Clojure with a very simple problem.

I wrote four Clojure programs contrasting different ways of solving the problem, and then timed the application of each ten times to a million-item sequence mill-float-numbs of floating-point random numbers.  Here are the interesting results:

(defn average1
  [seq1]
  (/ (reduce + seq1) (count seq1)))

(defn average2
  [seq1]
  (loop [remaining (rest seq1)
            cnt 1
            accum (first seq1)]
    (if (empty? remaining)
      (/ accum cnt)
      (recur (rest remaining)
                (inc cnt)
                (+ (first remaining) accum)))))

(defn average3
  [seq1]
  (letfn [(count-add
             [ [cnt accum] numb]
             [(inc cnt) (+ accum numb)] ) ]
    (let [result-couple (reduce count-add [0 0] seq1)]
      (/ (result-couple 1) (result-couple 0)))))

(defn average4
  [seq1]
    (let [result-couple (reduce
                                  (fn [ [cnt accum] numb]
                                    [(inc cnt) (+ accum numb)] )
                                  [0 0]
                                  seq1)]
      (/ (result-couple 1) (result-couple 0))))

user=> (time (dotimes [i 10] (average1 mill-float-numbs)))
"Elapsed time: 526.674 msecs"

user=> (time (dotimes [i 10] (average2 mill-float-numbs)))
"Elapsed time: 646.608 msecs"

user=> (time (dotimes [i 10] (average3 mill-float-numbs)))
"Elapsed time: 405.484 msecs"

user=> (time (dotimes [i 10] (average4 mill-float-numbs)))
"Elapsed time: 394.31 msecs"

  --Larry

Stephen Compall

unread,
Mar 30, 2012, 2:34:00 PM3/30/12
to Benjamin Peter, Clojure
On Thu, 2012-03-29 at 23:31 -0700, Benjamin Peter wrote:
> the sequence would have been traversed completely by the reduce call
> and therefore clojure could know it's size and provide a constant time
> count.
>
> Could this be implemented?

Yes. You could probably implement it yourself, as a wrapper sequence
type, though this wouldn't make all other sequence types automatically
countable.

> Is it?

It won't be, in general, because you would have to "hold onto the head"
until you got to the end, or go through a costly half-broken weak
reference dance.

It is worth considering that even for some uncounted sequences, the cost
of a second traversal for "count" may be less than the bookkeeping cost
of keeping a count as you traverse once.

--
Stephen Compall
^aCollection allSatisfy: [:each|aCondition]: less is better

simon.T

unread,
Mar 30, 2012, 9:36:33 PM3/30/12
to clo...@googlegroups.com
Hi  Rob,

Appreciate it, I like the code and explanation, great!

Simon

Sean Corfield

unread,
Mar 30, 2012, 11:56:13 PM3/30/12
to clo...@googlegroups.com
On Fri, Mar 30, 2012 at 11:01 AM, Larry Travis <tra...@cs.wisc.edu> wrote:
> user=> (time (dotimes [i 10] (average1 mill-float-numbs)))
> "Elapsed time: 526.674 msecs"
>
> user=> (time (dotimes [i 10] (average2 mill-float-numbs)))
> "Elapsed time: 646.608 msecs"
>
> user=> (time (dotimes [i 10] (average3 mill-float-numbs)))
> "Elapsed time: 405.484 msecs"
>
> user=> (time (dotimes [i 10] (average4 mill-float-numbs)))
> "Elapsed time: 394.31 msecs"

I can understand the first one being "slow" but I'm a bit surprised
about the loop being the slowest of the four options. Can someone shed
some light on that?

Nice to see the accumulating reduce being faster since that's how I've
settled in to solving this kind of problem in our code, without really
wondering about performance (it's "fast enough" and I think it's the
more elegant solution).
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

David Powell

unread,
Mar 31, 2012, 5:42:15 AM3/31/12
to clo...@googlegroups.com

As an aside:

Fingertrees are an interesting way to keep a collection that can efficiently compute means over its values, or a window of its values.


-- 
Dave

Larry Travis

unread,
Apr 28, 2012, 3:21:25 AM4/28/12
to clo...@googlegroups.com, Larry Travis
I have installed Leiningen not so much to manage projects but to enable use of clojure-jack-in as a means of getting Swank and Slime to work.  And they do work for me.  But now I have a question that I can't find an answer for in any  Leiningen documentation I know about. I have a largish, previously created group of clj local files that exist on my system and whose function definitions I want to use in my further work.  Is there a way to specify such local files in a project :dependencies declaration -- or do I have to do a load-file on each of them once I get a Slime REPL running for the project (or maybe insert them all into the src/core.clj file of the project and do a single load-file on it)?

That question is surely complicated enough to indicate how badly confused I am! Thanks for help.
  --Larry


Sean Corfield

unread,
Apr 28, 2012, 5:46:41 AM4/28/12
to clo...@googlegroups.com
On Sat, Apr 28, 2012 at 12:21 AM, Larry Travis <tra...@cs.wisc.edu> wrote:
> I have installed Leiningen not so much to manage projects but to enable use
> of clojure-jack-in as a means of getting Swank and Slime to work.  And they
> do work for me.  But now I have a question that I can't find an answer for
> in any  Leiningen documentation I know about. I have a largish, previously
> created group of clj local files that exist on my system and whose function
> definitions I want to use in my further work.  Is there a way to specify
> such local files in a project :dependencies declaration -- or do I have to
> do a load-file on each of them once I get a Slime REPL running for the
> project (or maybe insert them all into the src/core.clj file of the project
> and do a single load-file on it)?

Once possibility is to put all those files into a project together,
let's called it utilities and assume it has a 0.0.1-SNAPSHOT version.

Then you can run 'lein install' and it'll package them up and put them
in your local repository.

Then, wherever you want to use them in another project, just declare a
dependency on [utilities "0.0.1-SNAPSHOT"] and Leiningen will
automatically pull them in (from your local repo).

Then you just use/require the namespace(s) containing the relevant functions.

Does that help?

Larry Travis

unread,
Apr 28, 2012, 5:41:16 PM4/28/12
to clo...@googlegroups.com, larry Travis
Sean,
Your advice makes good sense, but I can't make it work. Per that advice,
I paste some of my function definitions into the core.clj file of the
project "prjctOne" and proceed thusly:

larrytravis$ lein install
Copying 1 file to /Users/larrytravis/prjctOne/lib
No namespaces to :aot compile listed in project.clj.
Created /Users/larrytravis/prjctOne/prjctOne-1.0.0-SNAPSHOT.jar
Wrote pom.xml
[INFO] Installing
/Users/larrytravis/prjctOne/prjctOne-1.0.0-SNAPSHOT.jar to
/Users/larrytravis/.m2/repository/prjctOne/prjctOne/1.0.0-SNAPSHOT/prjctOne-1.0.0-SNAPSHOT.jar

But when I create a new project prjctTwo, I can't figure out what
dependency declaration for it gives a prjctTwo Slime REPL access to the
functions in prjctOne. That is, what would correspond to the [utilities
"0.0.1-SNAPSHOT"] vector in your example?
--Larry

Sean Corfield

unread,
Apr 28, 2012, 6:07:18 PM4/28/12
to clo...@googlegroups.com
On Sat, Apr 28, 2012 at 2:41 PM, Larry Travis <tra...@cs.wisc.edu> wrote:
> Created /Users/larrytravis/prjctOne/prjctOne-1.0.0-SNAPSHOT.jar
...
> prjctOne.  That is, what would correspond to the [utilities
> "0.0.1-SNAPSHOT"] vector in your example?

Try [prjctOne "1.0.0-SNAPSHOT"]

Larry Travis

unread,
Apr 28, 2012, 7:42:30 PM4/28/12
to clo...@googlegroups.com, Larry Travis
Sean:
Your suggestion doesn't work. The Slime REPL comes up fine when I use
the dependency vector you suggest (some of the vectors I have tried
prevent the Swank server from starting), but the REPL doesn't know
anything about the functions defined in prjctOne or about the name-space
in which they exist.

So I guess for the nonce I'll just use load-file commands in the Slime
REPL to make use of my local stash of clj files, and I'll worry some
time later about getting Leiningen to construct dependencies on those
files. I'd rather spend my time on getting my programs to work than on
getting my programming environment to work.

If you get any more ideas about how I might try to solve my problem, let
me know. Thanks.
--Larry

Neale Swinnerton

unread,
Apr 28, 2012, 8:01:42 PM4/28/12
to clo...@googlegroups.com
leiningen relies on maven dependency resolution...

the dependency entry is of the form

[groupId/artifactId "version"]

You have the groupId and the artifactId both set to prjctOne. You can tell this from the path it's installed into your local repo. I believe this is default behaviour if you specify a simple project name in project.clj (i.e one without a /)

This is the key entry in your logging:

[INFO] Installing /Users/larrytravis/prjctOne/prjctOne-1.0.0-SNAPSHOT.jar to /Users/larrytravis/.m2/repository/prjctOne/prjctOne/1.0.0-SNAPSHOT/prjctOne-1.0.0-SNAPSHOT.jar

So you need...

[prjctOne/prjctOne "1.0.0-SNAPSHOT"]

Neale
{t: @sw1nn, w: sw1nn.com }




--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to

Larry Travis

unread,
Apr 29, 2012, 12:47:04 AM4/29/12
to clo...@googlegroups.com, Larry Travis
Neale:
Indeed, that's exactly the dependency vector I needed.  I'm impressed by your expertise. Thanks very much to both you and Sean.
  --Larry

Phil Hagelberg

unread,
Apr 30, 2012, 12:14:31 PM4/30/12
to clo...@googlegroups.com
Neale Swinnerton <ne...@isismanor.com> writes:

> So you need...
>
> [prjctOne/prjctOne "1.0.0-SNAPSHOT"]

Actually this is incorrect; you never need to specify the group ID if
it's the same as the artifact ID.

-Phil

Larry Travis

unread,
May 1, 2012, 12:28:45 AM5/1/12
to clo...@googlegroups.com, Larry Travis
Phil, Neale, Sean:
You guys are all way ahead of me as to why I am getting the results I am
getting, but it is only Neale's advice that works. That is

[prjctOne/prjctOne "1.0.0-SNAPSHOT"] works, but


[prjctOne "1.0.0-SNAPSHOT"] does not.

--Larry

Phil Hagelberg

unread,
May 1, 2012, 12:22:21 PM5/1/12
to clo...@googlegroups.com
On Mon, Apr 30, 2012 at 9:28 PM, Larry Travis <tra...@cs.wisc.edu> wrote:
> Phil, Neale, Sean:
> You guys are all way ahead of me as to why I am getting the results I am
> getting, but it is only Neale's advice that works.  That is
>
> [prjctOne/prjctOne "1.0.0-SNAPSHOT"]   works, but
>
>
> [prjctOne "1.0.0-SNAPSHOT"]  does not.

Interesting. I've never seen that behaviour before; it sounds like a bug.

I'm trying to reproduce this problem here but am unable to. Can you
provide steps to reproduce it locally? What version of Leiningen is
this?

-Phil

Larry Travis

unread,
May 2, 2012, 8:44:28 PM5/2/12
to clo...@googlegroups.com, Larry Travis
Phil:
I now can't get the behavior to reproduce either. I have no idea what
kind of dumb mistake I was making in the first place, and I'm very sorry
to have wasted your time. (For what it's worth, both dependency-vector
versions work in my reproduction attempts -- but you probably already
knew that they would!)
--Larry

Sean Corfield

unread,
May 3, 2012, 1:46:32 AM5/3/12
to clo...@googlegroups.com
On Wed, May 2, 2012 at 5:44 PM, Larry Travis <tra...@cs.wisc.edu> wrote:
> I now can't get the behavior to reproduce either.  I have no idea what kind
> of dumb mistake I was making in the first place, and I'm very sorry to have
> wasted your time. (For what it's worth, both dependency-vector versions work
> in my reproduction attempts -- but you probably already knew that they
> would!)

That makes me feel better because I couldn't understand why my
suggestion didn't work :)

Glad it's working for you now - welcome to the wonderful world of Leiningen!
Reply all
Reply to author
Forward
0 new messages