I did think about moving this logic to the database, but I am toying around with a different model - having the entire data set in memory (possibly across multiple nodes, using messaging infrastructure to communicate). The reasons are:
- write volume is very small but read volume is very high
- each read typically requires complex processing
- most operations cover a large part of the entire dataset
Paying the cost of having the entire data set *efficiently* available for the application (Clojure in this case) means:
- less dependence on (probably hard-to-test) yet-another-bit-of-tech. Integration testing DAOs or Repositories always seems like a lot of work; reducing the number of technical pieces just makes things much easier
- I am hoping clever use of persistent structures will help here, as there is a lot of commonality in the data itself (e.g. 5 projects might actually share 80% of the same state). Clever construction of these might pay dividends - see the sketch after this list
- I don't think I can offload *all* processing onto a third-party technology, so I need the ability to deal with large data sets in memory in real time (whatever that means) - and if I need it for one case, I may as well use it for all.
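To make the structural-sharing point concrete, here is a minimal Clojure sketch (the `project` shape and names are invented for illustration):

```clojure
;; Two versions of a project: the second is derived from the first by
;; changing one field. Clojure's persistent maps share the unchanged
;; structure rather than copying it.
(def project-v1 {:name  "acme"
                 :tasks (vec (range 10000)) ; a large payload
                 :state :active})

;; assoc returns a *new* map; the old one is untouched, and the big
;; :tasks vector is the same object in both versions.
(def project-v2 (assoc project-v1 :state :archived))

(identical? (:tasks project-v1) (:tasks project-v2)) ;=> true
```

So keeping many largely-similar projects (or many historical versions of one project) around is far cheaper than it first sounds.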
Ambitious, and full of hairy concerns! But the idea of moving away from single-threaded web-based applications with big powerful data engines to a single chunk of logic that occasionally throws state to a fairly dumb persistent store is certainly not new ground, and seems to offer a much more powerful architecture.
For example, dealing with historical data is always a pain point. What I want is the ability to snapshot the entire system whenever anything changes, so we can see how the system (or rather the client) has improved over time. In a relational database this would be ridiculous, so I captured a "snapshot of interesting data". Then tomorrow they realise that something else was interesting... We also played with a document store (MongoDB), which makes the job much smaller - just cloning a single document (and related data) - but then it has to be hydrated, so for ease of use a snapshot is taken every X period even if the data hasn't changed. Yuck.
Now Clojure appears, with its extremely memory-efficient way of storing data, and suddenly storing a representation every time the structure changes (which is only once or twice a week) and then realising the entire history in memory feels do-able. This means if a project only changed 5 times over a 3-month period, there would only be 5 instances of that project in storage. Calculating how each project contributes to a historical chart broken down by day (or hour, whatever) is much easier to do in Java/Clojure/whatever than in the third-party store of choice. I am asserting that producing a day-by-day sequence for a project over the last year, when there are only 5 snapshots, will certainly not consume sizeOfProject * daysInYear memory - see the sketch below.
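A rough sketch of that assertion in Clojure (the snapshot representation and names are made up here, with dates simplified to day numbers):

```clojure
;; A handful of snapshots: [day-number project-value] pairs, ordered
;; by day. Hypothetical data standing in for the real store.
(def snapshots
  [[1  {:name "acme" :score 10}]
   [40 {:name "acme" :score 12}]
   [65 {:name "acme" :score 9}]])

;; The effective state on a given day is the latest snapshot taken
;; on or before that day.
(defn state-on [day]
  (->> snapshots
       (take-while (fn [[d _]] (<= d day)))
       last
       second))

;; A value for all 365 days, but only 3 distinct project objects in
;; memory - each day's entry just points at a shared snapshot.
(def daily-states (map state-on (range 1 366)))
```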
(Not sure that was the best example of the pain points I am trying to solve actually :), but anyway).
I guess, after 15 years of using the "web, app-logic, database" template-cutter, I am giving myself a clean piece of paper and asking "what do you want to do, and what is the simplest way to do it?" - and keeping everything in the application layer (rather than the persistence layer) seems appealing.
We aren't dealing with billions of rows - I still need to experiment, but it feels like having our entire data set in memory is possible on a fairly beefy server. I appreciate the JVM isn't the best with huge heaps, but I can work around that (with multiple machines each running their own JVM and communicating over ActiveMQ, for example). Clojure's STM seems to be the final step on the ladder to reach this goal.
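The shape I have in mind is something like the sketch below - the whole data set behind one ref, with the "fairly dumb persistent store" fed occasionally from it (the names and storage format are my own invention, not a worked-out design):

```clojure
;; The entire data set lives behind a single ref. Readers just deref
;; it and get an immutable, consistent snapshot for free.
(def world (ref {:projects {}}))

(defn project [id]
  (get-in @world [:projects id]))

;; Writes are rare: a transaction swaps in a new (structurally
;; shared) value of the world.
(defn update-project! [id f & args]
  (dosync
    (alter world update-in [:projects id] #(apply f % args))))

;; The dumb persistent store: occasionally dump the current value
;; out as Clojure data. A file stands in for the real store here.
(defn persist! [path]
  (spit path (pr-str @world)))
```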
I have previously considered CouchDB (for its views), Hadoop (for its highly scalable and parallelisable map/reduce execution), Cassandra (for its ability to store huge amounts of highly nested structure), and Neo4j (for storing large numbers of small, heavily inter-related nodes). And of course MongoDB, which I am currently using in production. I also considered Erlang and Scala for their distributed actor models, but I am really, really sold on the power of LISP macros.
I dunno - it might be a fool's errand, but spreading the complexity over that much technology just seems like hard work. *If* the working set can be stored in current memory, then I think a much simpler and much more powerful solution will emerge. Sure, I am putting all my eggs in the Clojure-plus-my-own-ability basket, at the risk of re-inventing the wheel, but maybe that is the right thing to do - building the simplest and most elegant solution with new tools.
I probably ate something that disagreed with me, but I just want to break free from the shackles of these heavy-weight tools and fly! OK - that's enough.
Or, it might all be a catastrophic failure and I will be signing up to Careers 2.0 :)
Col
P.S. Usual disclaimer - I've still only written three lines of Clojure :)