I've just started using neo4j for working with large graphs (mostly social networks), and it looks very promising. After getting some mysterious results I discovered the reference node (OK, I should have read the docs more carefully...). Browsing the discussion lists, I see it has come up before [1], and I agree that it should be removed, although I couldn't find any decision on what was to be done, or any outstanding issue. I have my own graph data which I load into neo4j, creating the nodes and relationships as I go. I don't want or need the reference node, and it's really an inconvenience. I decided the best thing to do was just to delete it when I create the database:
import neo4j

graphdb = neo4j.GraphDatabase(path)
with graphdb.transaction:
    # remove the automatically created reference node
    graphdb.getReferenceNode().delete()
but now I'm not sure whether this might cause problems later on. I could work around the reference node instead, but it is sometimes handy to have a list of nodes that is 0-indexed by id.
The notion of a fixed reference node is flawed in several ways. The database is creating data you didn't ask for: how do you start with a clean database (first node with id=0)? If you delete the reference node and then try to access it later, you get a NotFoundException, yet adding and removing nodes are fairly fundamental operations for a graph database. It seems the idea behind the reference node is to give you a starting point for traversals, but why not just choose the first node:
referenceNode = graphdb.nodes[0]
or a random one:
referenceNode = graphdb.getRandomNode()
In general, one would not expect the reference node to be the root node of your graph from which all other nodes are reachable; that assumes your graph is fully connected. If you know this is the case (it's not for any of my datasets), then why not just set the reference yourself in your own code? Is it really that difficult to keep track of a reference to one node?
myReferenceNode = node0
or, if it's really easier, the graph db could store the reference for you:
graphdb.setReferenceNode(myRootNode)
At least this would allow you to set a specific node as the reference node after you had created it, and to change it if required.
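To make the idea concrete, here is a rough sketch using a toy in-memory store in plain Python. It is not the neo4j API: ToyGraph, create_node, set_reference_node and get_reference_node are all names made up for illustration. The point is only that a fresh store holds no data the user didn't create, and that the reference is whatever node the user chooses.

```python
class ToyGraph:
    """In-memory stand-in for a graph database (hypothetical, not neo4j)."""

    def __init__(self):
        self.nodes = {}             # node id -> property dict
        self._next_id = 0
        self._reference_id = None   # no reference node until one is set

    def create_node(self, **props):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = props
        return node_id

    def set_reference_node(self, node_id):
        # Only an existing node can become the reference node.
        if node_id not in self.nodes:
            raise KeyError("no such node: %r" % node_id)
        self._reference_id = node_id

    def get_reference_node(self):
        # Returns None rather than raising when nothing has been set.
        return self._reference_id


graph = ToyGraph()
first = graph.create_node(name="alice")   # the first user node gets id 0
second = graph.create_node(name="bob")
graph.set_reference_node(second)          # the user picks the reference
```

With this shape, code that wants a reference node sets one explicitly, and code that doesn't never sees one.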
Obviously, since some people have already come to depend on the presence of the referenceNode, perhaps there could be some option to create it or not when the database is created:
There are now a number of projects and a fair amount of code depending on the reference node, but we could start by deprecating it if there is enough opinion behind that. I will check with the others.
I am not sure where this stands. I run into issues if I create an empty graph, remove the reference node, shut down, start up, and add new nodes. The first new node added is given an id of 0. This was OK until I started working in a High Availability environment, because until then I would know whether I was creating a new graph or opening an existing one. In HA, if I create a "new" graph, I am in the dark as to whether I am the actual creator or whether it already exists in the system and updates will be injected into my graph (including reference node removal) under the covers. If I then remove the reference node, I can get exceptions if it was already removed by another instance, or I can end up removing a non-reference node if id 0 was reused, as can happen.
I suggest two improvements:
1) Never reuse id 0, unless the method used to do so is explicitly called "createReferenceNode" or the like.
2) Allow a user to create a graph with no reference node. I saw comments in another thread (dated 2010) about making the creation lazy - was this done? If so, in what release?
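The id-reuse hazard and the first suggestion can be shown with a toy allocator in plain Python. This is only a model of the behaviour described above, not how neo4j allocates ids internally; IdAllocator and its reserve_zero flag are hypothetical names.

```python
class IdAllocator:
    """Toy node-id allocator that recycles freed ids, smallest first."""

    def __init__(self, reserve_zero=False):
        self._free = []             # ids released by deleted nodes
        self._next_id = 0
        self._reserve_zero = reserve_zero

    def allocate(self):
        if self._free:
            self._free.sort()
            return self._free.pop(0)
        node_id = self._next_id
        self._next_id += 1
        return node_id

    def release(self, node_id):
        if self._reserve_zero and node_id == 0:
            return                  # id 0 is retired and never handed out again
        self._free.append(node_id)


def first_id_after_deleting_reference(reserve_zero):
    ids = IdAllocator(reserve_zero=reserve_zero)
    reference = ids.allocate()      # the auto-created reference node gets id 0
    ids.release(reference)          # the reference node is deleted
    return ids.allocate()           # id handed to the first real node


reused = first_id_after_deleting_reference(reserve_zero=False)
reserved = first_id_after_deleting_reference(reserve_zero=True)
```

When id 0 is recycled, the first real node ends up with id 0, so a second instance that blindly deletes "node 0" would remove user data. When id 0 is reserved, the first real node gets id 1 and that mistake cannot happen.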
Basically, Neo4j doesn't handle deletion of the reference node very well, and
nowadays it would make much more sense to have it created on demand, with all
dbs created without it. Also, would named reference nodes be of any use, in
that by default there are none but you can get or create reference nodes by
name when needed? It's a slightly redundant feature though (there are
indexes of course).
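A minimal sketch of what get-or-create by name could look like, again as a toy in plain Python and not an existing Neo4j feature. NamedReferences, get and get_or_create are hypothetical names.

```python
class NamedReferences:
    """Toy registry: a reference node exists only once it is asked for by name."""

    def __init__(self):
        self._by_name = {}          # name -> node id
        self._next_id = 0

    def get(self, name):
        # None if no reference node with this name has been created.
        return self._by_name.get(name)

    def get_or_create(self, name):
        if name not in self._by_name:
            self._by_name[name] = self._next_id
            self._next_id += 1
        return self._by_name[name]


refs = NamedReferences()
before = refs.get("users")            # nothing exists by default
users = refs.get_or_create("users")   # created on first request
again = refs.get_or_create("users")   # the same node on later requests
```

A fresh registry holds no nodes at all, which matches the idea that databases start without a reference node.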
On Saturday, May 5, 2012 5:00:37 AM UTC-4, Mattias Persson wrote:
> Hi Paul,
> basically Neo4j doesn't handle deletion of the reference node very well
> and nowadays it makes much sense to have it created on demand and that all
> dbs are created without it. Also, would named reference nodes be of any
> use, in that by default there are none but you can get or create reference
> nodes by name when needed? It's a slightly redundant feature though (there
> are indexes of course).
> 2012/5/3 Paul Jackson
>> I am not sure where this stands. I run into issues if I create an empty
>> graph, remove the reference node, shut down, start up, and add new nodes.
>> The first new node added is given an id of 0. This was OK until I started
>> working in a High Availability environment, because I would know if I am
>> creating a new graph or opening an existing one. In HA, if I create a "new"
>> graph, I am in the dark as to whether I am the actual creator or if it
>> already exists in the system and updates will be injected into my graph
>> (including reference node removal) under the covers. If I then remove the
>> reference node, I can get exceptions if it was already removed by another
>> instance, or I can end up removing a non-reference node if id 0 was reused
>> as can happen.
>> I suggest two improvements:
>> 1) Never reuse id 0, unless the method used to do so is explicitly called
>> "createReferenceNode" or the like.
>> 2) Allow a user to create a graph with no reference node. I saw comments
>> in another thread (dated 2010) about making the creation lazy - was this
>> done? If so, in what release?