Measuring code complexity

59 views
Skip to first unread message

John Harrop

unread,
Oct 25, 2009, 10:08:23 PM10/25/09
to clo...@googlegroups.com
Reading http://www.paulgraham.com/power.html and specifically the section titled "Metrics" near the top, I realized that it would be very easy to calculate such metrics for Lisp code, and it took me literally only seconds to hack something in Clojure:

user=> (defn count-nodes [data] (if (clojure.contrib.core/seqable? data) (+ 1 (reduce + 0 (map count-nodes data))) 1))
#'user/count-nodes
user=> (count-nodes '(+ 3 (* 2 7)))
7

Each node of the AST is counted: 1 for the argument and if the argument is seqable the sum of count-nodes applied recursively to the child nodes. The example uses a simple math calculation and correctly counts seven nodes: five leaves (three integer literals and two symbols) and two non-leaf nodes (one for the product and one for the sum).

The count-nodes function itself has a code complexity of 22 by the same measure:

user=> (count-nodes '(defn count-nodes [data] (if (clojure.contrib.core/seqable? data) (+ 1 (reduce + 0 (map count-nodes data))) 1)))
22

An alternative version also has 22:

user=> (count-nodes '(defn count-nodes [data] (+ 1 (if (clojure.contrib.core/seqable? data) (reduce + 0 (map count-nodes data)) 0))))
22

The alternative version might be considered preferable for other reasons; the + 1 for the node passed to the current call is in a single place instead of repeated. On the other hand it always does an addition even with a leaf node, so is slightly less processor-efficient.

Both versions can be reduced to 21 by noting that the 0 in the reduce's argument list is optional. The first version can be reduced further though by exploiting this argument to reduce:

user=> (count-nodes '(defn count-nodes [data] (if (clojure.contrib.core/seqable? data) (reduce + 1 (map count-nodes data))) 1))
19

The downside is that the relationship between the two 1s is even more obscured.

The drop from 22 to 19 exactly corresponds to the complexity cost of adding something to something: since (+ x y) adds a +, a y, and an invocation to just x, that cost is 3. This doesn't depend on whether x is a complicated subexpression or just a constant.

Perhaps something like count-nodes would be a better metric for clojure golf than counting characters. The effects include that use of #(foo) has no benefit (the expanded (fn [] (foo)) will be seen by count-nodes regardless) and there's no penalty for using meaningful identifiers, but there are still big wins from using higher-order functions, macros, and other potent abstractions.

Consider this alternative implementation of count-nodes:

(defn count-nodes [data]
  (loop [s [data] n 0]
    (if (empty? s)
      n
      (let [e (first data) r (rest data) x (inc n)]
        (if (clojure.contrib.core/seqable? e)
          (recur (concat r e) x)
          (recur r x))))))

It has, itself, 50 nodes, more than double the initial implementation using map and reduce and nearly triple the winner above! The imperative style also is full of bookkeeping complexity, such as managing an explicit stack, which is abstracted away in the map based implementations; the latter have the code focused mainly on the node-counting itself rather than on the gritty details of tree traversal and sum accumulation. The map versions are also easy to parallelize simply by changing "map" to "pmap"; parallelizing the imperative version requires explicit futures and in Java would furthermore require mucking about with parts of java.util.concurrent, probably with a LinkedBlockingDeque in there somewhere as well as a Future or two and various anonymous Runnables.

Even a plain old serial implementation in Java is bad:

public class NodeCounter {
    public static int countNodes (Object data) {
        if (!(data instanceof Iterable)) return 1;
        int result = 1;
        for (Object e : (Iterable)data) {
            result += countNodes(e);
        }
        return result;
    }
}

Of course this can't be applied to its own source code without first converting it into a parse tree whose non-leaf nodes are Iterable. I'll count by hand, using the rule that a keyword, identifier, operator, non-delimiter punctuation mark, or literal counts 1 and a balanced pair of delimiters counts 1 (I'll just count open delimiters and ignore close delimiters). The 1 for each semicolon replaces the () for each invocation in a Lisp, and the reduced count exploiting operator precedence to save parentheses or using combination operators like += may roughly balance the added count from type identifiers and keywords. I'll also count a cast as 1 rather than the 2 I'd get from counting separately the parenthesis pair and the enclosed type name. Still I get:

4  public class NodeCounter {
8      public static int countNodes (Object data) {
10         if (!(data instanceof Iterable)) return 1;
5          int result = 1;
8          for (Object e : (Iterable)data) {
6              result += countNodes(e);
0          }
3          return result;
0      }
0  }
------
44

This is actually slightly better than the loop/recur version in Clojure, despite the types everywhere, probably because of the for-each loop being a bit more abstract than loop/recur (a while with an explicit iterator would perhaps be a fairer comparison; and map + reduce is clearly much more abstract still) and the use of recursion. It would surely blow up to 60 or even 70 if we added explicit stack management to traverse the tree non-recursively and used a while loop with an explicit iterator.

The Java version has hidden complexities not counted here, too: explicit mutable state, in the form of the local variable "result", and a short-circuit return. As noted previously, parallelizing this would be much harder even than parallelizing even the loop-recur Clojure implementation.

This illustrates one of Graham's points about language expressiveness well, as well as pointing up two other advantages of Clojure: the easier parallelizability of more abstractly specified algorithms using "map" and from avoiding mutable state, and the benefit of Lisp's code-is-data philosophy, which makes it easy to make a Clojure function to count AST nodes in Clojure code (a one-liner!!) and to apply it to Clojure source (just quote the argument), vs. the Java version not working on its own source code without adding in a significant amount of parsing code to turn a Java string containing source code into a count of terminal-tokens-minus-close-delimiters. (Even a Ruby version would have this problem!)

MarkSwanson

unread,
Oct 25, 2009, 11:56:17 PM10/25/09
to Clojure
That's interesting.
I ran this against a quoted Clojure fn of mine and received 92.

I'm curious (general Clojure question) about your use of the quoted
form. The Clojure docs state that this results in an unevaluated form,
but I had trouble finding more details on this. F.E. I'd like to run
count-nodes against a compiled fn that I've previously defined, but I
could not get an existing fn into quoted form - and (count-nodes a-fn)
always returns 1. Is using a macro the only way to get an expression's
unevaluated form?

Do you (or anyone) believe some general guidelines could be gathered
from running count-nodes against all of the individual fns in a
project. F.E. a number > X(?) for a fn means you might be using an
imperative style and should be looking at breaking up the fn or
looking for ways to use map/reduce/other idiomatic Clojure fns to
simplify your code?

If yes, maybe the people working on Clojure/Maven (or other Clojure
build/package projects) would be interested in making this one of the
standard reports.

Every time I engage a company for contract work I wonder what I'm in
for ( the same goes when I dive in to another open source project). I
think it's possible that viewing this metric by fn and by file would
give some insight.



John Harrop

unread,
Oct 26, 2009, 12:22:28 AM10/26/09
to clo...@googlegroups.com
On Sun, Oct 25, 2009 at 10:08 PM, John Harrop <jharr...@gmail.com> wrote:
4  public class NodeCounter {
8      public static int countNodes (Object data) {
10         if (!(data instanceof Iterable)) return 1;
5          int result = 1;
8          for (Object e : (Iterable)data) {
6              result += countNodes(e);
0          }
3          return result;
0      }
0  }
------
44

This is actually slightly better than the loop/recur version in Clojure, despite the types everywhere, probably because of the for-each loop being a bit more abstract than loop/recur (a while with an explicit iterator would perhaps be a fairer comparison; and map + reduce is clearly much more abstract still) and the use of recursion. It would surely blow up to 60 or even 70 if we added explicit stack management to traverse the tree non-recursively and used a while loop with an explicit iterator.

It's worse than I thought:

4  public class NodeCounter {
8      public static int countNodes (Object data) {
7          LinkedList stack = new LinkedList();
6          stack.add(data);
5          int result = 1;
9          while (!(stack.isEmpty())) {
3              result++;
8              Object front = stack.getFirst();
6              if (front instanceof Iterable) {
10                 Iterator i = ((Iterable)front).iterator();
6                  while (i.hasNext())
9                      stack.addFirst(i.next());
0              }
0          }
3          return result;
0      }
0  }
----
84

This is the closest possible Java to the loop-recur implementation. It's about 68% larger in complexity by the AST-size metric.

John Harrop

unread,
Oct 26, 2009, 12:31:54 AM10/26/09
to clo...@googlegroups.com
On Sun, Oct 25, 2009 at 11:56 PM, MarkSwanson <mark.sw...@gmail.com> wrote:
I'm curious (general Clojure question) about your use of the quoted
form. The Clojure docs state that this results in an unevaluated form,
but I had trouble finding more details on this. F.E. I'd like to run
count-nodes against a compiled fn that I've previously defined, but I
could not get an existing fn into quoted form - and (count-nodes a-fn)
always returns 1. Is using a macro the only way to get an expression's
unevaluated form?

Basically. Compiled functions don't have source. The clojure.contrib.repl-utils namespace has a (source foo) function, but this relies on .clj files in the classpath so won't work for functions defined at the REPL. It also uses println and returns nil; you'd have to rebind *out* to a pipe and scrape the text that comes out the other end of the pipe.
 
Do you (or anyone) believe some general guidelines could be gathered
from running count-nodes against all of the individual fns in a
project. F.E. a number > X(?) for a fn means you might be using an
imperative style and should be looking at breaking up the fn or
looking for ways to use map/reduce/other idiomatic Clojure fns to
simplify your code?

I'd expect that to depend on the function's responsibility, but 100 might be a reasonable guideline. Of course, it's sometimes useful to break a function up by using internal letfns in a bigger function -- I have one 300-line function in one project but it's mostly a bunch of internal letfns. The internal functions have no other use so don't need to be top-level, and making them top-level complicates things because a lot of them use the parent function's arguments and some computed values as a context; all of them would gain many more parameters, or else require top-level vars rebound by the parent function.

Every time I engage a company for contract work I wonder what I'm in
for ( the same goes when I dive in to another open source project). I
think it's possible that viewing this metric by fn and by file would
give some insight.

Try it and let us all know what you find.

Timothy Pratley

unread,
Oct 26, 2009, 12:45:19 AM10/26/09
to Clojure
This sounds like a neat idea to me. Maybe the way to get the
'complexity' is to calculate it at definition, this macro doesn't work
for obvious reasons:
(defmacro defnc
[n & body]
`(let [f# (defn ~n ~@body)]
(alter-meta! (var ~n) assoc :complexity (count-nodes ~body))
f#))
But I think something similar could do the trick?

Chouser

unread,
Oct 26, 2009, 1:09:17 AM10/26/09
to clo...@googlegroups.com
On Sun, Oct 25, 2009 at 10:08 PM, John Harrop <jharr...@gmail.com> wrote:
> Reading http://www.paulgraham.com/power.html and specifically the section
> titled "Metrics" near the top, I realized that it would be very easy to
> calculate such metrics for Lisp code, and it took me literally only seconds
> to hack something in Clojure:
>
> user=> (defn count-nodes [data] (if (clojure.contrib.core/seqable? data) (+
> 1 (reduce + 0 (map count-nodes data))) 1))
> #'user/count-nodes
> user=> (count-nodes '(+ 3 (* 2 7)))
> 7

Fun!

> The count-nodes function itself has a code complexity of 22 by the same
> measure:
> user=> (count-nodes '(defn count-nodes [data] (if
> (clojure.contrib.core/seqable? data) (+ 1 (reduce + 0 (map count-nodes
> data))) 1)))

Well, speaking of higher-order functions and golf...

(count-nodes
'(defn count-nodes [data]
(count (tree-seq clojure.contrib.core/seqable? seq data))))

; 12

Unfortunately I don't think contrib's sequable is quite what we want:

(count-nodes "abcde")
; 6

This is a bit closer I think:

(count-nodes
'(defn count-nodes [data]
(count (tree-seq (partial instance? clojure.lang.Seqable) seq data))))
; 15

...though preferring 'partial' over #() is of debatable merit.

--Chouser

John Harrop

unread,
Oct 26, 2009, 2:05:42 AM10/26/09
to clo...@googlegroups.com
On Mon, Oct 26, 2009 at 12:45 AM, Timothy Pratley <timothy...@gmail.com> wrote:

This sounds like a neat idea to me. Maybe the way to get the
'complexity' is to calculate it at definition, this macro doesn't work
for obvious reasons:
(defmacro defnc
 [n & body]
 `(let [f# (defn ~n ~@body)]
    (alter-meta! (var ~n) assoc :complexity (count-nodes ~body))
    f#))
But I think something similar could do the trick?

Try

(defmacro defnc
 [n & body]
 `(let [f# (defn ~n ~@body)]
    (alter-meta! (var ~n) assoc :complexity (count-nodes (quote ~body)))
    f#))
 

John Harrop

unread,
Oct 26, 2009, 2:08:15 AM10/26/09
to clo...@googlegroups.com
On Mon, Oct 26, 2009 at 1:09 AM, Chouser <cho...@gmail.com> wrote:
   (count-nodes "abcde")
   ; 6 

Yikes.

Maybe special-casing strings would be best: change (seqable? x) into  (and (seqable? x) (not (string? x))). I don't know if Seqable is synonymous with that; might there be other seqable?s that aren't Seqables? DOM nodes and some other Java objects perhaps?

Meikel Brandmeyer

unread,
Oct 26, 2009, 3:21:56 AM10/26/09
to Clojure
Hi,

On Oct 26, 7:08 am, John Harrop <jharrop...@gmail.com> wrote:

> >    (count-nodes "abcde")
> >    ; 6
>
> Yikes.
>
> Maybe special-casing strings would be best: change (seqable? x) into  (and
> (seqable? x) (not (string? x))). I don't know if Seqable is synonymous with
> that; might there be other seqable?s that aren't Seqables? DOM nodes and
> some other Java objects perhaps?

Another problematic case might be maps. seq'ing a map gives
MapEntries, each with a key and a value. So you also count the
MapEntries. Eg. {:a :b :c :d} should give 7 instead of 5. Not tested,
though. I'm not sure how MapEntries are handled by sequable? and such.
(If they are treated opaque, you should even get 3 instead of 5...)

Sincerely
Meikel

Stuart Sierra

unread,
Oct 26, 2009, 9:20:56 AM10/26/09
to Clojure
On Oct 25, 11:56 pm, MarkSwanson <mark.swanson...@gmail.com> wrote:
> I'd like to run
> count-nodes against a compiled fn that I've previously defined, but I
> could not get an existing fn into quoted form

Can't be done. Once a fn is compiled, it's just Java bytecode.

-SS

kyle smith

unread,
Oct 26, 2009, 6:09:11 PM10/26/09
to Clojure
Rather than the number of nodes in a tree, I think a better metric
would be the number of edges in a graph. Variables that are
referenced on many different lines should get double counted. I think
this would explain why imperative spaghetti code is so complex. On
the other hand, I suspect functional code is generally more tree-like.

Daniel Simms

unread,
Oct 26, 2009, 5:52:58 PM10/26/09
to clo...@googlegroups.com
On Mon, Oct 26, 2009 at 6:20 AM, Stuart Sierra
<the.stua...@gmail.com> wrote:
> Can't be done.  Once a fn is compiled, it's just Java bytecode.

On the other hand, size of the generated byte code is moderately
interesting (as is runtime performance in time and heap).

I do *love* the fact that the analysis can be done in Clojure itself.
Can you imagine doing that in Java? Maybe that (how many lines to
analyze the source in the language itself) should be the power
meta-metric. Lisps clearly win big here.

John Harrop

unread,
Oct 27, 2009, 8:39:59 AM10/27/09
to clo...@googlegroups.com
A variable already gets counted everywhere the symbol for it appears in the code's AST.
Reply all
Reply to author
Forward
0 new messages