Port graphs: What would Rich Hickey do?

720 views
Skip to first unread message

Ben Kovitz

unread,
Jan 1, 2018, 5:07:28 PM1/1/18
to Clojure
I'm making a little library for myself to represent a kind of graph. I want to
make it simple, the way Rich Hickey designed much of Clojure. My designs so
far have lacked that kind of plainness and ease of use. I keep wondering,
"What would Rich Hickey do?" Even if you're not Rich Hickey, maybe you can
suggest something. I'd love to hear some ideas about how to design this.

This kind of graph is different from most graphs in two ways:

(1) Edges connect to "ports" rather than simply to nodes. A port is a tuple
(node, port-label). So, for example, if you had nodes :elizabeth and :charles,
you might have an edge connecting :elizabeth's "child" port to :charles's
"mother" port, rather than simply an edge connecting :elizabeth and :charles.

(2) Edges can connect to edges as well as to ports. For example, the
aforementioned edge might itself be connected by an edge to the "attests" port
of a node :royal-document-22416.

These differences from ordinary graphs preclude the use of the loom protocols
or implementations such as ubergraph. I would like also to allow any arbitrary
map to be associated with any node or edge, just as ubergraph allows. By the
way, if you know of a standard name for this kind of graph, please tell me.
It's not quite the same as a hypergraph. Note also that directed graphs are a
special case of this kind of graph, since the port labels can imply asymmetry
if you like. For example, if you only use labels :in and :out for ports, then
you have an ordinary directed graph.

The fact that edges can connect to edges seems to be the main source of my
difficulties. I would like to allow nodes and port labels to be of any data
type: keywords, strings, vectors, anything, even graphs and edges from graphs.
The library should make no assumptions about the data types of the nodes and
port labels. So, when passing arguments to functions in the library, you can't
have a convention like "a 2-element vector represents a port" or "a 2-element
vector of 2-element vectors represents an edge".

------------------------------------------------------------------------

Here are the main design ideas that I've entertained so far:

1. I could wrap nodes, ports, and edges in records, like this:

  (defrecord Node [nodeid])
  (defrecord Port [nodeid port-label])
  (defrecord Edge [connpt1 connpt2])

where connpt1 and connpt2 are "connection points": either a port or an edge.
This completely disambiguates everything. Library functions never look inside
a nodeid or port-label, but they can look inside a connection point as needed.

To add a node or edge to a graph g, you could call:

  (defn add [g elem] ...)

where elem must be a Node or Edge record.

A problem with this is that it's a pain to use. Every time you call a function
like 'add', you have to wrap all the nodes. Wrapping edges is slightly more of
a nuisance.

Another problem is that some decisions seem unclear and arbitrary, hence
error-prone for a caller. For example, what should the 'nodes' function
return: nodes wrapped in Node records, or unwrapped nodes? Each choice seems
reasonable—hence you can easily assume the wrong choice when writing code that
calls the library.

2. I could put two kinds of function in the library: some with wrapped
arguments and return values, and equivalents with unwrapped values. So, for
example, there could be:

  (defn add [g elem] ...)  ;same as above
  (defn add-nodes [g & nodeids] ...)
  (defn add-edges [g & edges] ...)  ;each edge is, uh, I'm not sure
  (defn has? [g elem] ...)  ;elem must be wrapped
  (defn has-node? [g nodeid] ...)  ;unwrapped
  (defn has-edge? [g connpt1 connpt2) ...)  ;uh...
  (defn nodes [g] ...)  ;returns Node records
  (defn nodes-unwrapped [g] ...)  ;returns nodeids

It's still not fully clear how to pass edges to 'add-edges' and 'has-edge?'.
You can't assume a convention like [nodeid1 port-label1 nodeid2 port-label2]
because edges are allowed as connection points. Maybe edges would still always
need to be wrapped.

3. Inspired by this hypergraph library by bdesham:
I could give ids to edges as well as to nodes. However, most edges that I
anticipate working with will not have a meaningful id other than "the edge
between these connection points."

4. I could have a few "shorthand" functions for describing graphs with a
little DSL, in addition to the core API which would require all graph
elements to be wrapped. The shorthand DSL could be written to make the most
common kinds of graphs and subgraphs very convenient to describe, maybe even
sacrificing full generality.

------------------------------------------------------------------------

OK, that's enough. I'm probably missing something obvious here—obvious, at
least, to Rich Hickey or someone with more Clojure experience. Would you care
to smack me on the side of the head with the bit of sense that I'm missing?
How could you keep this simple and easy to use?

--
Ben Kovitz


Raoul Duke

unread,
Jan 1, 2018, 5:13:54 PM1/1/18
to Clojure
€0.02 i like option #3, i think it would be possibly nice for edges to be named based on the ports they connect.

Christopher Small

unread,
Jan 1, 2018, 7:57:11 PM1/1/18
to Clojure

Do you need to be able to attach data to specific edges instances? Or would it be sufficient to be able to have different types of edges, and be able to associate data with those abstract edge types?

If the latter, you might want to look at Datomic (also authored by Rich Hickey), DataScript (an in-memory approximation) or RDF (data language of the Semantic Web). They model graphs as `(entity, attribute, value)` triples, where `attribute`s can act as labeled, directional arrows between entities, and can themselves show up in the `entity` position of other triples, thus allowing you to have "meta-attributes" (attributes pointing to/from other attributes).

Obviously, this is a little bit different from your "ports" idea (I haven't read about this; is it published on anywhere?), but in some ways a helpful restriction. For example, `mom` vs `child` is isomorphic to an edge labeled `isMotherOf`. But simpler, because there's less to specify, and in particular, less to get wrong with respect to which kinds of ports can connect to which kinds of other ports. So IMHO, simple labeled directional edges are best when semantically sufficient.

If either data needs to be attached to edges, or there are semantics of the modeling of things in terms of ports which are advantageous, you can still use EAV to model this data. You would just have entities modeling the edges and/or ports, and connect them using triples along the lines of `[[edge :my.graph.edge/from node-or-port-1] [edge :my.graph.edge/to node-or-port-2] [edge :my.graph.edge/label "a really cool edge"]]`. With Datomic or DataScript, you'd then be able to put together a set of Datalog rules which represent whatever higher level graph relations that might be useful. (Datalog is awesome, BTW)

Interested to hear what you think of this idea :-)

Chris

Ben Kovitz

unread,
Jan 1, 2018, 11:41:14 PM1/1/18
to Clojure
On Monday, January 1, 2018 at 7:57:11 PM UTC-5, Christopher Small wrote:

…you might want to look at Datomic (also authored by Rich Hickey), DataScript (an in-memory approximation) or RDF (data language of the Semantic Web). They model graphs as `(entity, attribute, value)` triples, where `attribute`s can act as labeled, directional arrows between entities, and can themselves show up in the `entity` position of other triples, thus allowing you to have "meta-attributes" (attributes pointing to/from other attributes).

Datomic might be just the dose of simplicity I was looking for! I haven't yet used Datomic, so now is probably a very good time to try it out. I had thought that Datomic might make a good underlying implementation rather than the API, but now I wonder if the Datomic API has ideas I could use. The entity-attribute-value approach sounds to me like one of the best ideas ever (long predating computers, even). Thanks! I'll have a closer look. This might even be what Rich Hickey would do! :)

Do you need to be able to attach data to specific edges instances?

Yes. Every edge needs to be able to have an arbitrary map of attributes.

Or would it be sufficient to be able to have different types of edges, and be able to associate data with those abstract edge types?

I'd like to be able to treat all edges the same.
 
This is for a research project, so I need to be able to try out new ideas quickly to see how they perform. I've been using an implementation of port graphs modeled on ubergraph with some success, where all nodes and edges can have arbitrary attribute maps associated with them. Where I've needed edges that link to edges, I've used a couple hacks, like making a separate graph where some of the nodes are edges from the main graph. The nuisance of using those hacks has been steering me away from trying out promising ideas that freely make use of edge-to-edge edges. So, I figure that it's time to just implement this right and be done with it.

The fact that it's a research project makes minimal error-proneness especially important. It's often hard to know in advance what the correct behavior looks like. The reason for writing the program is to find out.
 
Obviously, this is a little bit different from your "ports" idea (I haven't read about this; is it published on anywhere?), but in some ways a helpful restriction.

If you google for "port graphs", you'll find some stuff. It's not clear to me why, but port graphs seem to be popular for "graph-rewriting systems" (also googlable). I'm working on something like a graph rewriting system. I hit on the idea as a way to make the graph simpler, and only later found out that there's already a standard name for these graphs and even some good research literature. There's *some* sort of natural fit here.
 
For example, `mom` vs `child` is isomorphic to an edge labeled `isMotherOf`. But simpler, because there's less to specify, and in particular, less to get wrong with respect to which kinds of ports can connect to which kinds of other ports.

Something I've liked about the port graphs up until now is that it's easy to add an explicit representation of rules for which kinds of ports can connect and which can't. The current implementation (the one without edge-to-edge edges) has some "shorthand" functions that exploit those rules so that you can specify some edges very simply, often without explicitly providing the port labels, since the system can (in some cases) deduce them. Most of my previous experiences with programming involving graphs have involved a lot of hard-to-read code and a lot of sweating and gritting my teeth while trying to avoid bugs. The ports-rules-and-shorthand approach has enabled to me keep the code mostly readable. In fact, much of the code is just static data that specifies small subgraphs—in a pretty readable way.

The rules are also nicely separable from the generic aspects of the graph. I added a couple protocols beyond the loom protocols: one for creating new nodes of a given 'class', so they can get ids and attributes assigned to them automatically; and one for consulting the rules for legal connections.

But it's clearly time for me to look at this in a new way. I like your observation that (in effect) edge classes of the form (port-label1, port-label2) are isomorphic to edges between ports. That and Datomic might be exactly the whack(s) on the side of the head that I needed!

Ben Kovitz

unread,
Jan 2, 2018, 2:31:56 PM1/2/18
to Clojure
Christopher Small,

Your suggestion to look into Datomic definitely turned up a new idea. I went through the Datalog tutorial at http://www.learndatalogtoday.org (which is excellent, by the way), and noticed that the attribute names in the datoms are a lot like the port labels in port graphs. A port label is essentially an attribute key whose value is the edges incident to it. And when you chain data patterns in Datalog queries, that's a lot like navigating through a graph.

The new, hopefully simple, easy-to-use, and not error-prone idea is to use the clojure.lang.IPersistentMap  interface to query and update graphs. Nodes, edge, attribute names, port labels, and a few "reserved keywords" like :adj-nodes and :edge, could navigate the graph as if it were an ordinary nested map with get-in, assoc-in, and the like, without ever needing to wrap anything inside a record to tell what it is. Here are some sample queries and what they could return:

(get-in g [node port-label :adj-nodes])
  a coll of all the nodes with an edge to [node port-label]
(get-in g [node port-label :adj-node])
  one node with an edge to [node port-label], or nil
(get-in g [node port-label])
  true or false, according to whether node has a port named port-label
(get-in g [node k])
  value of node's attr k
(get-in g [edge k])
  value of edge's attr k
(get g node-or-edge)
  the entire map of attrs associated with node-or-edge, or nil if it does not exist
(get-in g [node k1 k2 k3])
  treat node's attrs as a nested map
(get-in [node port-label :edges])
  a coll of all the edges that link to [node port-label]
(get-in [node port-label :edge])
  a single edge that links to port-label, or nil
(get-in [node port-label1 port-label2])
  coll of all nodes that link to [node port-label1] from a port named port-label2

And here are some function calls that would return a modified graph:

(assoc-in g [node k] v)
  sets node attr k to v
(assoc-in g [edge k] v)
  sets edge attr k to v  (and similarly for multiple keys)
(assoc-in g [node1 port-label1 port-label2 node2] :edge)
  makes an edge between [node1 port-label1] and [node2 port-label2]
(assoc-in g [node1 port-label1 port-label2 :edge k] v)
  sets value of edge attr k to v

I haven't yet thought this through completely, and I suspect that some parts are still error-prone. For example, notice in the last function call that it's not clear whether you should include node2. The approach might be "too cute"—that is, brittle because it tries to impose one kind of structure (nested maps) onto one that isn't fully isomorphic (graphs).

But I like the basic idea of having a "little language" for designating whatever you want, and the little language is nothing but a sequence that tells how to navigate through the graph to find (or make) what you're interested in. 

This would solve the fundamental problem of how to designate an edge without imposing type restrictions. For each connection point, you just have a sequence that describes a path to that point. If the sequence ends on a port label, the connection point is a port. If it ends on :edge, the connection point is an edge. So, instead of type restrictions, there are just a few reserved words in the tiny DSL for navigating the graph.

This hopefully makes simple editing and querying of port graphs easy, but it does not even try to solve the problem of conveniently entering a graph. I figure that a "shorthand", i.e. a separate tiny DSL, is the solution for that. I found with my current implementation of port graphs that it really helped to use a custom shorthand for each specific type of graph. For example, there's a obvious and very nice shorthand for specifying graphs that represent mathematical expressions, but it's terrible for graphs of social networks. Graphs of social networks, though, can be conveniently described by their own custom sexprs.

I figure that implementing it will force me to think this idea through the rest of the way. Here goes…

Christopher Small

unread,
Jan 2, 2018, 3:11:45 PM1/2/18
to clo...@googlegroups.com
> ...noticed that the attribute names in the datoms are a lot like the port labels in port graphs. A port label is essentially an attribute key whose value is the edges incident to it. And when you chain data patterns in Datalog queries, that's a lot like navigating through a graph.

That's very much in line with what I was thinking you might want to do in this setting.

You should also take a look at the entity api. It let's you traverse the graph via these entity objects which behave more or less like a dictionary of the av pairs associated with a particular entity in the database. Importantly though, when an attribute points to another entity (a reference attribute), it is returned as yet another of these entity objects, letting you traverse the graph using get, in a pretty similar fashion to what you're describing.


You may also check out the pull api. It let's you pull facts out of the graph as regular old maps by specifying what attributes & paths you wish to pull out of the graph. The result is that you can read from the underlying graph database as though it was a document store, but where the documents are dynamically computed/queried from the graph based on the pull expression you provide. You can also use pull expressions inside the `:find` clause of either a DataScript of Datomic datalog query, which is often a convenient way of specifying both a set of entities (via a datalog query) and statement of facts to pull about them.

http://docs.datomic.com/pull.html

Collectively, these three interfaces give you quite a lot of expressiveness and flexibility in how you query/traverse your data.



--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/uSwY475pbGA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gary Verhaegen

unread,
Jan 4, 2018, 4:01:41 AM1/4/18
to clo...@googlegroups.com
If you’re shopping around for ideas as to how to define data manipulation API, it may be worth taking some time looking at Specter too.

Have you considered encoding your port graph into a normal graph? That would require you to define a mapping, but would then allow you to fairly easily reuse theorems, algorithms and libraries by having them operate on the encoded graph, with a hopefully thin layer of translation.

One possible encoding based on what you’ve described could be to turn nodes and edges in the port graph into nodes in the encoded graph, and ports in the port graph into edges in the encoded graph. It’s hard to say how well that specific transform would work in your case without more details about your type of graph and somewhat more considerate thinking than I’ve given this, but I have seen this approach of encoding graphs into other graphs work out pretty well in other contexts.

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

Ben Kovitz

unread,
Jan 4, 2018, 12:48:17 PM1/4/18
to Clojure
On Tuesday, January 2, 2018 at 3:11:45 PM UTC-5, Christopher Small wrote:
 
 

Thanks—this is really good stuff! Not that I expected anything less, but it's a happy surprise that it applies so well to graphs. Now I'm wondering if I should just use Datomic rather than borrow ideas from it.

One thing I'm convinced of now is that it's OK to assign ids to everything. I was cringing at that idea before, because it seems like an unnecessary layer of indirection and one more thing that could go wrong. But clearly Datomic's approach of assigning every "entity" an id enables some wonderful simplicity, since it enables one to very conveniently chain entities together. And when basically everything is an entity, that means you can chain anything to anything—which addresses the "type headache" I was trying to avoid.

Ben Kovitz

unread,
Jan 4, 2018, 1:19:20 PM1/4/18
to Clojure
On Thursday, January 4, 2018 at 4:01:41 AM UTC-5, Gary Verhaegen wrote:

Have you considered encoding your port graph into a normal graph?

That was the first idea I had, and I quickly rejected it as adding complication. The consideration that led me to port graphs in the first place was that when, say, I have a node for "plus", and it has neighbor nodes for "operand" and "sum", which other nodes connect to, an easy bug is that when the "plus" is deleted, its "operand" and "sum" nodes might not get deleted—leaving the graph in an invalid state. I figured it would be best to just absorb those connection nodes into the "plus" node in the form of a list of "ports", and include the port labels in the edges. Then deleting the "plus" automatically deletes the ports, you can easily query for "what nodes are connected to the plus" (where you want to skip connection nodes), and generally there are fewer ways to make mistakes in the code.

However, following Christopher Small's suggestion to check out the Datomic APIs, I'm now thinking that a layer to map the port graphs to simplicial ("normal", undirected) graphs might actually be the simplest thing. Nodes, ports, and edges are easily represented by nodes in a simplicial graph. This escapes the "type headache" that led me to start this thread, since all edges in the port graph are just plain nodes in the simplicial graph, whether they're port-to-port edges, port-to-edge edges, or edge-to-edge edges. The layer of indirection might cost some performance—and performance does matter here, since this program is extremely CPU-bound—but it's more important to preëmpt bugs by keeping code simple.

And now you make this interesting point, which I hadn't previously considered:

That would require you to define a mapping, but would then allow you to fairly easily reuse theorems, algorithms and libraries by having them operate on the encoded graph, with a hopefully thin layer of translation.

I could even use ubergraph to represent the simplicial graph, and thereby get access to all code that follows the loom protocols. 

I have seen this approach of encoding graphs into other graphs work out pretty well in other contexts.

Do you remember any of those other contexts? Maybe I could check one out and learn something from it.

If you’re shopping around for ideas as to how to define data manipulation API, it may be worth taking some time looking at Specter too.

Thanks for this suggestion, too. I'm now checking it out.
 

Gary Verhaegen

unread,
Jan 4, 2018, 5:13:43 PM1/4/18
to clo...@googlegroups.com
Unfortunately most of the contexts were fairly specific and a few
years ago, so I don't remember all of the details, and mostly in the
context of private research that I'm not at liberty to share. Some of
the results have been published (though not my specific work) by a
then-colleague of mine; you can find his papers by googling "amine
ghrab graph olap".

Christopher Small

unread,
Jan 4, 2018, 5:21:09 PM1/4/18
to clo...@googlegroups.com
Yes; Datomic really is great. There's plenty of wonderful stuff I haven't even mentioned, like database as a value, persistent history, snapshots at a timepoint, transaction metadata, smart peer replication etc. It's kind of the best thing since sliced bread. But if you don't need all that jazz and are fine with keeping things in memory for your problem, you may wish to consider DataScript. It's free and open source and has remarkably good coverage of the entity, pull and q APIs, and can run in cljs or clj.

One final note: with respect to "type headache" and "chain[ing] anything to anything", you're right; this is a major benefit over traditional SQL. Attribute are fundamentally polymorphic, which can be super helpful with tricky modelling problems.

Chris


--

Ben Kovitz

unread,
Jan 4, 2018, 6:51:27 PM1/4/18
to Clojure
I replied to Gary Verhaegen:

If you’re shopping around for ideas as to how to define data manipulation API, it may be worth taking some time looking at Specter too.

Thanks for this suggestion, too. I'm now checking it out.

Well! So far, Specter appears to have taken the "path map" idea that I'd been toying with to its ultimate logical conclusion—far better than I could ever do it. Nathan Marz even says that needing to manipulate tricky graphs "forced" him to come up with Specter. If I understand it correctly, Specter would even allow me to map these peculiar port graphs to simplicial graphs without adding any performance overhead.

It looks like I'd need to write a custom navigator or two. I haven't yet found a good tutorial for getting beyond the most elementary concepts of Specter, though. Can you recommend one?

James Gatannah

unread,
Jan 8, 2018, 12:14:59 AM1/8/18
to Clojure


On Thursday, January 4, 2018 at 5:51:27 PM UTC-6, Ben Kovitz wrote:

Well! So far, Specter appears to have taken the "path map" idea that I'd been toying with to its ultimate logical conclusion—far better than I could ever do it. Nathan Marz even says that needing to manipulate tricky graphs "forced" him to come up with Specter. If I understand it correctly, Specter would even allow me to map these peculiar port graphs to simplicial graphs without adding any performance overhead.

It looks like I'd need to write a custom navigator or two. I haven't yet found a good tutorial for getting beyond the most elementary concepts of Specter, though. Can you recommend one?

FWIW, I think https://leanpub.com/specter looks extremely interesting. (Or it may be awful...I haven't had a chance to read it yet, much less work through the exercises).

Regards,
James

Ben Kovitz

unread,
Jan 10, 2018, 5:56:07 PM1/10/18
to Clojure
On Monday, January 8, 2018 at 12:14:59 AM UTC-5, James Gatannah wrote:
 
FWIW, I think https://leanpub.com/specter looks extremely interesting. (Or it may be awful...I haven't had a chance to read it yet, much less work through the exercises).

Actually, I worked through that ebook last week. (Or at least the version of the ebook in Google's cache. Leanpub's web site appears to be down.) It's excellent! It didn't go all that far, but it got across the main ideas very effectively.

I've been continuing to look at Specter, and the more I've learned, the better it's seemed. I have yet to try it myself on anything beyond toy examples, though. Currently I'm stumped on how to make a navigator for a data structure that doesn't work with 'get' and 'assoc'. There's probably still something elementary that I haven't understood yet. I just posted a question on the github site:
Reply all
Reply to author
Forward
0 new messages