Extending Graph for Scala for Datomic

Joe Barnes

Nov 16, 2012, 6:07:18 PM
to scala...@googlegroups.com
I'm interested in extending the Graph for Scala to support Datomic persistence.  Has anyone tried this?  I'm a seasoned Java developer, and recent grad of Odersky's Scala course.  I know I'll be in way over my head if I attempt it.

Joe

Daniel Sobral

Nov 16, 2012, 9:43:17 PM
to Joe Barnes, scala-user
Some of the most important Scala libraries were written by seasoned Java developers who were learning Scala, and wanted to do X. I suggest you go for it.



--
Daniel C. Sobral

I travel to the future all the time.

Sonnenschein

Nov 17, 2012, 4:52:11 AM
to scala...@googlegroups.com
Hi Joe,

this is great news. I don't think anybody has tried this.

Scala users have been asking for Graph4Scala to be able to reflect graphs stored in graph databases such as Neo4j but, I regret, I haven't found time to implement such extensions, and so far nobody has teamed up with me. What I have already managed is a kind of preparation for such extensions: whenever you instantiate a Graph, there is a seamless way to provide configuration data. This is essentially similar to the recent extension of parallel collections.

Could you tell us more about why you think Datomic persistence, which is afaics object-based as opposed to graph-based, is still suited for a Graph4Scala facade?

Peter

Joe Barnes

Nov 17, 2012, 11:11:06 AM
to scala...@googlegroups.com
Hey Peter, thanks for your response.  I recall reading in the Graph for Scala documentation that you envision integration with persistence libraries.  That is partly what has inspired me to consider trying this out.

There are several reasons why I'm more interested in Datomic than Neo4j, both of which I have played around with.  The most compelling and unique feature of Datomic is the built-in history.  It never deletes data, so you can make queries against the past, queries over deltas of time, etc.  I can imagine lots of useful applications of this powerful capability.
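
To make the "queries against the past" idea concrete, here is a toy sketch in plain Scala. It is not Datomic's API: `Fact`, `asOf`, the transaction counter `tx`, and the attribute names are all hypothetical, and every attribute is assumed to hold a single value.

```scala
object HistorySketch {
  // Hypothetical fact shape: entity, attribute, value, plus the transaction that asserted it.
  final case class Fact(entity: String, attribute: String, value: String, tx: Long)

  // The log is append-only: a changed value is a new fact, and old facts are never removed.
  // asOf replays the log up to and including `tx`; later assertions win over earlier ones.
  def asOf(log: Seq[Fact], tx: Long): Map[(String, String), String] =
    log
      .filter(_.tx <= tx)
      .sortBy(_.tx)
      .map(f => (f.entity, f.attribute) -> f.value)
      .toMap
}
```

Because nothing is overwritten, the state at any earlier transaction can be recomputed from the same log, which is the property that makes historical queries possible.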

The second reason is that Datomic embraces the functional paradigm by making all data immutable (hence the keeping of all data).  I believe it makes Datomic a better fit for Scala persistence.

While I find the Neo4j concept very intriguing, I think the two reasons I just gave are compelling enough for me to prefer Datomic.  Rather than treating a graph as the fundamental relationship, as Neo4j does, I would make a graph just another data structure that can be used, like a list is.  (To drive the point home, we could likewise think of every field of an object as a list, with size == 1 as a special case.)  Since Datomic supports recursive querying, it should be a good fit for Graph for Scala.

Finally, I think a little background on me will be helpful too.  I'm a seasoned Java developer, as I previously mentioned.  I've done a lot of work in Hibernate, so ORM has certainly poisoned my thoughts on persistence. :)  Scala and functional programming are a very recent discovery for me.  I consider myself more of a lazy math geek who found out that computers will do the hard work for me.  I took graduate math courses in graph theory while earning my Master's in CS.  I would like to develop this Datomic capability for use at my current employer.  We're in the telecom industry, and I need a robust way to store and reason about network graphs.  Because Datomic isn't free, there's still a chance I'll have to fall back to Neo4j in the end.

Joe

Sonnenschein

Nov 17, 2012, 3:23:12 PM
to scala...@googlegroups.com

Thank you for elaborating on my question, Joe. Looking at Datomic basics I realized that you can persist a graph by defining references in your Datomic schema. Is that the way you’d go?

What kind of integration do you envision? In a loose integration, you could call toDatomic to persist the graph and fromDatomic to load it again by means of a Datomic query. A tight integration would mean a specific implementation that reacts to any method call in a Datomic-specific way, for instance by utilizing the Datomic cache…
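
A minimal sketch of what the loose variant might look like, using only the standard library. Nothing here is taken from Graph for Scala or from Datomic: `Edge`, `Fact`, `toFacts`, `fromFacts`, and the attribute name `:node/connectsTo` are hypothetical stand-ins for toDatomic and fromDatomic.

```scala
object LooseIntegrationSketch {
  // A directed edge between two named nodes (hypothetical, not the Graph for Scala edge type).
  final case class Edge(from: String, to: String)

  // A fact in entity/attribute/value form, loosely modelled on how a reference would be stored.
  final case class Fact(entity: String, attribute: String, value: String)

  // Assumed name of the reference attribute that links one node to another.
  val ConnectsTo = ":node/connectsTo"

  // "toDatomic" direction: flatten every edge into one reference fact.
  def toFacts(edges: Set[Edge]): Set[Fact] =
    edges.map(e => Fact(e.from, ConnectsTo, e.to))

  // "fromDatomic" direction: rebuild edges from the facts a query would return.
  def fromFacts(facts: Set[Fact]): Set[Edge] =
    facts.collect { case Fact(from, ConnectsTo, to) => Edge(from, to) }
}
```

The round trip only preserves edges; isolated nodes would need facts of their own, which this sketch leaves out.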

Joe Barnes

Nov 17, 2012, 5:43:06 PM
to scala...@googlegroups.com
Yes, I would define references in the schema to allow me to create graphs.  I plan to start tinkering with this idea by writing loosely-coupled code to convert to and from the Datomic data model.  However, I think ultimately I will need a tight integration.  That will allow calls such as pathTo() to be performed against the DB, rather than having to pull the entire graph into memory.
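
The difference can be sketched with a path search that asks for successors one node at a time instead of receiving a fully loaded graph. This is an illustration only: the `pathTo` below is not Graph for Scala's method, and the `successors` function is a hypothetical stand-in for a database lookup.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object LazyPathSketch {
  // Breadth-first search from `start` to `goal`.
  // `successors` is called only for nodes that are actually visited, so a
  // database-backed implementation would never load the whole graph.
  def pathTo[N](start: N, goal: N)(successors: N => Iterable[N]): Option[List[N]] = {
    @tailrec
    def loop(frontier: Queue[List[N]], seen: Set[N]): Option[List[N]] =
      frontier.dequeueOption match {
        case None => None // frontier exhausted: goal is unreachable
        case Some((path, rest)) =>
          val node = path.head // paths are stored in reverse, newest node first
          if (node == goal) Some(path.reverse)
          else {
            val next = successors(node).toList.distinct.filterNot(seen)
            loop(rest ++ next.map(_ :: path), seen ++ next)
          }
      }
    loop(Queue(List(start)), Set(start))
  }
}
```

In a tight integration, `successors` would be backed by a query per visited node (or a batched equivalent), while the traversal logic stays the same.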

This is my vision for the integration: I want to be able to write functions that transform persistent graph data without using the Datomic interface.  I figure I'll still have a way to write Datalog queries to retrieve a particular graph or set of data.  As long as manipulating the data is straightforward, I'll say it's a success.

Joe

Sonnenschein

Nov 18, 2012, 4:37:02 AM
to scala...@googlegroups.com
Hi Joe,

it's a good idea to start with a loosely-coupled preview which will prove to be an easy task. You probably will have seen that JSON export/import and Dot export, both separate modules within Graph for Scala, follow this strategy.

Later on, you may also want to examine the ‘constrained’ module, which falls into the tightly-coupled category. It is a good example of how to override behaviour at specific points by referring to extended configuration parameters. While the configuration parameter of a constrained graph contains its constraint, graphs reflecting persistent data will have a configuration parameter providing the database connection, schema description and the like.

As an aside, if you have specific restrictions on the types of connecting nodes and their cardinalities, as will be the case with Datomic schemas, these restrictions could potentially be managed by a graph constraint in the sense of the constrained module.

How about sketching the needed transformation process by writing a few lines of test-first, non-compiling code right now? As I’m unsure who is interested in following the details, we may also go off list, with the option to come back if somebody wants to join.

Peter

Tim Pigden

Nov 18, 2012, 4:44:53 AM
to Sonnenschein, scala-user
Peter, Joe,
I'm not interested in details right now as I've too much else on the go, but if you make progress I'm interested in results. This potentially addresses an area in our Vehicle Routing software where the solution is a graph and the model constraints (e.g. customer opening hours) change over time. So having Datomic address the issue that a solution is valid for the set of data constraints at the time the solution was constructed solves a real problem.
Tim
--
Tim Pigden
Optrak Distribution Software Limited
+44 (0)1992 517100
http://www.linkedin.com/in/timpigden
http://optrak.com

Joe Barnes

Nov 18, 2012, 10:27:59 AM
to scala...@googlegroups.com
This sounds like a great approach.  I'll certainly write tests first.  I'm familiar with JSON, so I'll start with mimicking that module.  

I'll try to get started on this tomorrow.  My main task at work is to actually spec out the features that will use this graph persistence library, so I'll be splitting time with that.  I'm also off two days next week due to Thanksgiving.  I may not get rolling too quickly in the short term.

I'm in favor of keeping this on the forum.  Perhaps it'll prove useful to someone who wants to contribute in the future.

Thanks,
Joe

Joe Barnes

Nov 18, 2012, 10:31:50 AM
to scala...@googlegroups.com, Sonnenschein
Hey Tim, I'm glad to hear that this library may help you as well.  I've merely played with Datomic thus far, but I'm optimistic that it, coupled with a good graph API, will solve problems in many domains.

Joe

Stefan Ollinger

Nov 18, 2012, 2:19:47 PM
to scala...@googlegroups.com
There is also a rudimentary Scala implementation:
https://github.com/tinkerpop/tinkubator/tree/master/gremlin-scala

Regards,
Stefan

On 18.11.2012 16:37, Joe Barnes wrote:
Thanks for the tip, Stefan.  I stumbled across TinkerPop and Blueprints while investigating.  I'll have to take a closer look at the Datomic code.  I didn't know they had made it beyond investigation with Datomic.

Joe


On Sun, Nov 18, 2012 at 6:21 AM, Stefan Ollinger <Stefan....@gmx.de> wrote:
Hi,

there is Blueprints, which is basically a Graph API with implementations for several graph databases. Currently they have support for Neo4j [1], SAIL [2] and also (not official) for Datomic [3]. Maybe it would be an option to use their API.

Regards,
Stefan

[1] https://github.com/tinkerpop/blueprints/wiki/Neo4j-Implementation
[2] https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation
[3] https://github.com/datablend/blueprints/tree/master/blueprints-datomic-graph

Joe Barnes

Nov 20, 2012, 12:35:57 PM
to scala...@googlegroups.com
Peter,

I've created a public git repo on assembla: https://www.assembla.com/code/graph-datomic/git/nodes

I used your TJsonDemo.scala as a starting point to create DatomicDemo.scala as my non-compilable test suite.  Take a look when you get time.

Joe



Sonnenschein

Nov 21, 2012, 3:29:18 AM
to scala...@googlegroups.com

Hi Joe,

Thanks for creating the graph-datomic repository. My comments/ideas:

val schemaStr = …

Why would you want to describe the Datomic schema by means of a String? I’d opt for a type-safe alternative. Also, analyzing a string at run-time is expensive. How do you plan to map between Scala types and Datomic entries in detail?

val graphSchema = Graph.schema…

Graph companion objects don’t have schema factory methods, just as they don’t have JSON-descriptor methods either. The same holds for Edge, Node, etc. Adding them would also contradict loose coupling.

assert(library.toDatomic(schema).toString === facts)

We won’t be able to check for equality on a toString basis unless the ordering is defined. Either we use Graph methods like toSortedString that ensure the right ordering, or we compare the results with the expected results by equals. I prefer the latter.
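
The ordering problem can be shown with plain collections. The sketch below is an assumption-laden illustration, not the code from the repository: `Fact` and the sample values are hypothetical, and a `Set` stands in for whatever unordered result type the conversion ends up returning.

```scala
object OrderAgnosticSketch {
  // Hypothetical fact shape used only for this comparison.
  final case class Fact(entity: String, attribute: String, value: String)

  val expected: List[Fact] = List(
    Fact("n1", ":node/name", "a"),
    Fact("n2", ":node/name", "b"))

  // Same facts, produced in a different order.
  val actual: List[Fact] = expected.reverse

  // Comparing string renderings depends on element order.
  val equalAsStrings: Boolean = expected.toString == actual.toString

  // Comparing as sets uses equals on the elements and ignores order.
  val equalAsSets: Boolean = expected.toSet == actual.toSet
}
```

Here the string comparison fails although both sides contain the same facts, while the set comparison succeeds.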

Peter

Joe Barnes

Nov 21, 2012, 9:53:48 AM
to scala...@googlegroups.com
Peter,

We are certainly in agreement on your first and last comments.  I had already begun making changes to not use strings and make the comparisons order-agnostic.  I'm well on my way to having that implemented.  However, I may have to override the === operator to make it work because of the temporary IDs that are used.
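
One crude way to compare results in the presence of temporary IDs, sketched under stated assumptions: the types `EntityId`, `TempId`, `Permanent`, `Fact`, and the helper `sameUpToTempIds` are hypothetical and do not come from Datomic or from the repository. The idea is to erase the temporary IDs before an order-agnostic comparison instead of overriding ===.

```scala
object TempIdSketch {
  // Hypothetical model of entity identifiers.
  sealed trait EntityId
  final case class TempId(n: Long) extends EntityId
  final case class Permanent(n: Long) extends EntityId

  final case class Fact(entity: EntityId, attribute: String, value: String)

  // Replace every temporary id with one shared placeholder so that facts
  // differing only in their temporary ids become equal.
  private def erase(f: Fact): Fact = f.entity match {
    case TempId(_) => f.copy(entity = TempId(0L))
    case _         => f
  }

  // Order-agnostic comparison that ignores the concrete temporary ids.
  def sameUpToTempIds(a: Iterable[Fact], b: Iterable[Fact]): Boolean =
    a.map(erase).toSet == b.map(erase).toSet
}
```

The limitation is that this no longer checks which facts belong to the same new entity; a stricter test would have to map temporary IDs consistently on both sides.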

As for the schema factory methods, I arrived at this approach because I wanted something like your predefined edge descriptors.  Any hints on how to better approach this are welcomed.

How do you like the package name I used?  I know it's only a minor detail, but I later began to think perhaps "scalax.collection.db.datomic" may be preferable because of its shorter length.  

Joe

Sonnenschein

Nov 21, 2012, 2:41:50 PM
to scala...@googlegroups.com
Joe:
 
As for the schema factory methods, I arrived at this approach because I wanted something like your predefined edge descriptors.  Any hints on how to better approach this are welcomed.

I may have misunderstood you on this point, assuming that you aimed to enhance the objects in core. If that's not the case, feel free to define your own objects for this purpose, like scalax.collection.io.json.descriptor does.

How do you like the package name I used?  I know it's only a minor detail, but I later began to think perhaps "scalax.collection.db.datomic" may be preferable because of its shorter length.

I've been using scalax.collection.io.<x> but I also like scalax.collection.db.datomic.

Assembla could also serve as our collaboration platform. On this list, we are advised to post user-related questions only. I'll sign up as a watcher to your repository...

Peter

Joe Barnes

Nov 22, 2012, 9:04:49 PM
to scala...@googlegroups.com
How can we communicate on assembla?  I've not seen a way to do that yet.

In the meantime, can you give me some pointers on how your json library is able to extract the fields and values from the scala objects?  I need some way to convert a Scala object into a map of property names to values.

Joe

Sonnenschein

Nov 23, 2012, 3:02:32 AM
to scala...@googlegroups.com
Joe:
 
How can we communicate on assembla?  I've not seen a way to do that yet.
 
Tickets could be a way. I'll also take a look at it. If nothing is near acceptable there, we should drop the idea.

In the meantime, can you give me some pointers on how your json library is able to extract the fields and values from the scala objects?  I need some way to convert a Scala object into a map of property names to values.
 
Nodes and edges are (de)serialized by means of (lift-)JSON Serializer. The user may pass his own serializers in NodeDescriptor/EdgeDescriptor. Otherwise default serialization takes place.

In the case of key-value stores, we need corresponding (de)serialization to and from maps. Either you can spot an appropriate open source library or we must implement it on our own :(. Such libraries often require case classes because they are easier to process, but it's better to support custom map serializers working with any type. To get some good ideas we could also post a question here. Sorry for not being able to give you any concrete hint at the moment.
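
For the case-class route, a minimal sketch is possible with the standard library alone, assuming Scala 2.13 or later, where `Product` provides `productElementNames` (this postdates the thread). The names `ToMapSketch`, `toMap`, and `Node` are hypothetical, and this is not how the JSON module works.

```scala
object ToMapSketch {
  // Pair each field name of a case class with its value.
  // Requires Scala 2.13+; works for any Product, most usefully case classes.
  def toMap(p: Product): Map[String, Any] =
    p.productElementNames.zip(p.productIterator).toMap

  // Example node type used only for illustration.
  final case class Node(name: String, capacity: Int)
}
```

For example, `ToMapSketch.toMap(ToMapSketch.Node("router", 4))` yields a map with the keys `name` and `capacity`. Nested case classes are left as values, so a real serializer would need to recurse or accept custom converters for arbitrary types.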

Peter