I'm making a little library for myself to represent a kind of graph. I want to
make it simple, the way Rich Hickey designed much of Clojure. My designs so
far have lacked that kind of plainness and ease of use. I keep wondering,
"What would Rich Hickey do?" Even if you're not Rich Hickey, maybe you can
suggest something. I'd love to hear some ideas about how to design this.
This kind of graph is different from most graphs in two ways:
(1) Edges connect to "ports" rather than simply to nodes. A port is a tuple
(node, port-label). So, for example, if you had nodes :elizabeth and :charles,
you might have an edge connecting :elizabeth's "child" port to :charles's
"mother" port, rather than simply an edge connecting :elizabeth and :charles.
(2) Edges can connect to edges as well as to ports. For example, the
aforementioned edge might itself be connected by an edge to the "attests" port
of a node :royal-document-22416.
These differences from ordinary graphs preclude the use of the loom protocols
or implementations such as ubergraph. I would like also to allow any arbitrary
map to be associated with any node or edge, just as ubergraph allows. By the
way, if you know of a standard name for this kind of graph, please tell me.
It's not quite the same as a hypergraph. Note also that directed graphs are a
special case of this kind of graph, since the port labels can imply asymmetry
if you like. For example, if you only use labels :in and :out for ports, then
you have an ordinary directed graph.
The fact that edges can connect to edges seems to be the main source of my
difficulties. I would like to allow nodes and port labels to be of any data
type: keywords, strings, vectors, anything, even graphs and edges from graphs.
The library should make no assumptions about the data types of the nodes and
port labels. So, when passing arguments to functions in the library, you can't
have a convention like "a 2-element vector represents a port" or "a 2-element
vector of 2-element vectors represents an edge".
------------------------------------------------------------------------
Here are the main design ideas that I've entertained so far:
1. I could wrap nodes, ports, and edges in records, like this:
(defrecord Node [nodeid])
(defrecord Port [nodeid port-label])
(defrecord Edge [connpt1 connpt2])
where connpt1 and connpt2 are "connection points": either a port or an edge.
This completely disambiguates everything. Library functions never look inside
a nodeid or port-label, but they can look inside a connection point as needed.
To add a node or edge to a graph g, you could call:
(defn add [g elem] ...)
where elem must be a Node or Edge record.
A problem with this is that it's a pain to use. Every time you call a function
like 'add', you have to wrap all the nodes. Wrapping edges is slightly more of
a nuisance.
Another problem is that some decisions seem unclear and arbitrary, hence
error-prone for a caller. For example, what should the 'nodes' function
return: nodes wrapped in Node records, or unwrapped nodes? Each choice seems
reasonable—hence you can easily assume the wrong choice when writing code that
calls the library.
2. I could put two kinds of function in the library: some with wrapped
arguments and return values, and equivalents with unwrapped values. So, for
example, there could be:
(defn add [g elem] ...) ;same as above
(defn add-nodes [g & nodeids] ...)
(defn add-edges [g & edges] ...) ;each edge is, uh, I'm not sure
(defn has? [g elem] ...) ;elem must be wrapped
(defn has-node? [g nodeid] ...) ;unwrapped
(defn has-edge? [g connpt1 connpt2) ...) ;uh...
(defn nodes [g] ...) ;returns Node records
(defn nodes-unwrapped [g] ...) ;returns nodeids
It's still not fully clear how to pass edges to 'add-edges' and 'has-edge?'.
You can't assume a convention like [nodeid1 port-label1 nodeid2 port-label2]
because edges are allowed as connection points. Maybe edges would still always
need to be wrapped.
3. Inspired by this hypergraph library by bdesham:
I could give ids to edges as well as to nodes. However, most edges that I
anticipate working with will not have a meaningful id other than "the edge
between these connection points."
4. I could have a few "shorthand" functions for describing graphs with a
little DSL, in addition to the core API which would require all graph
elements to be wrapped. The shorthand DSL could be written to make the most
common kinds of graphs and subgraphs very convenient to describe, maybe even
sacrificing full generality.
------------------------------------------------------------------------
OK, that's enough. I'm probably missing something obvious here—obvious, at
least, to Rich Hickey or someone with more Clojure experience. Would you care
to smack me on the side of the head with the bit of sense that I'm missing?
How could you keep this simple and easy to use?
--
Ben Kovitz