[ANN] Neo4j 1.9.M01 released

Peter Neubauer

Oct 26, 2012, 4:39:21 PM

to Neo4j User, neo4jrb, neo4j-...@googlegroups.com

Hi all,

We are very happy to announce Neo4j 1.9.M01, the first milestone release of Neo4j 1.9.

The highlight in terms of new functionality is a completely new High Availability cluster communication framework, based on Paxos, which gets rid of the hard-to-configure ZooKeeper coordinator subsystem. Testing, feedback, and comments are VERY welcome!

In this release we would like to thank Wes Freeman, who has been contributing a lot of great features to Cypher and console.neo4j.org, and to the discussions on this list. You rock, Wes.

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

Axel Morgner

Oct 26, 2012, 4:58:05 PM

to ne...@googlegroups.com

Cool Peter!

Watched your HA screencast, the new HA architecture sounds really good!

Thanks!

--

Axel Morgner · ax...@morgner.de · @amorgner

c/o Morgner UG · Hanauer Landstr. 291a · 60314 Frankfurt · Germany
phone: +49 151 40522060 · skype: axel.morgner

structr - Open Source CMS and Web Framework based on Neo4j: http://structr.org
structr Mailing List and Forum: https://groups.google.com/forum/#!forum/structr
Graph Database Usergroup "graphdb-frankfurt", sponsored by Neo4j: http://www.meetup.com/graphdb-frankfurt
Das Sport-Sharing-Netzwerk des Deutschen Olympischen Sportbundes (DOSB): https://splink.de

Peter Neubauer

Oct 26, 2012, 5:06:14 PM

to ne...@googlegroups.com

Please test and report!

Cheers,

/peter neubauer


Javier de la Rosa

Oct 26, 2012, 5:18:58 PM

to ne...@googlegroups.com, neo4jrb, neo4j-...@googlegroups.com

On Fri, Oct 26, 2012 at 4:39 PM, Peter Neubauer
<peter.n...@neotechnology.com> wrote:
> Read more at http://blog.neo4j.org/2012/10/neo4j-19m01-self-managed-ha.html

This new setup for HA is awesome.
Just a couple of questions. At the end of the screencast you mention
something related to node ids. Did you mean that node ids don't change
across the instances? So, start n=node(2) return n will always return
the same node?

And the second one: imagine a very intensive operation, like creating
thousands of nodes, or cloning or importing a graph. How long does it
take to replicate to the other instances?

Best regards.

--
Javier de la Rosa
http://versae.es

Peter Neubauer

Oct 26, 2012, 6:20:44 PM

to ne...@googlegroups.com, neo4jrb, neo4j-...@googlegroups.com

Javier,

cool you like the new setup and screencast - actually it is fun to do these!

Regarding your questions: node IDs do not change across instances. Yes, start n = node(2) will always return the same node on all instances.

For replication, there are actually two protocols, I think. If there are no transactions from a slave to merge (e.g. a new cluster member is joining), the whole store is copied upon first connect, making this a comparatively fast operation. After that, transactions are propagated using a transaction protocol. So bringing new instances online should not take much time.

Cheers,

/peter neubauer


RickBullotta

Oct 27, 2012, 10:29:41 AM

to ne...@googlegroups.com, neo4jrb, neo4j-...@googlegroups.com

My #1 request would be to lift the 32 billion relationship/node and 64 billion property limit, or to implement distributed graphs. That is quickly becoming a very restrictive limitation. We're going to have to create our own sharding scheme as a workaround for now (and as a result, we've had to do a lot of "non-graphy" things since we can't maintain relationships across shards very easily).

Niels Hoogeveen

Oct 28, 2012, 8:49:04 AM

to ne...@googlegroups.com, neo4jrb, neo4j-...@googlegroups.com

Hi Rick,

Slightly off topic: I have been thinking about sharding lately, since I want to introduce it in one of the next versions of my software. One of the strategies I am considering now has the following properties:

Each node belongs to a shard.

Relationships between nodes belonging to the same shard are treated the same as relationships are treated now.

Creating a relationship between nodes belonging to different shards is treated differently. Suppose we want to create a relationship from node1 (in shard1) to node2 (in shard2). First we look up a node in shard2 that represents node1; if none exists, we create it. Then we do the same for node2 in shard1. Then we create two relationships: one between node1 and the representative of node2 in shard1, and one between node2 and the representative of node1 in shard2.

Representative nodes contain the uuid of the original node and have a relationship to a representative node of the shard of the original node, so it can transparently be looked up.

Taking these steps guarantees that shards are effectively disconnected from one another and can thus be distributed over different databases.

When a shard is moved from one database to another, all nodes representing that shard in all other shards need to be updated, unless we devise some central repository for shards.
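
The strategy described above could be sketched roughly as follows. This is a minimal in-memory illustration in Python, not Neo4j API code: the `Shard` class, the `connect` function, and the `home`/`proxy` fields are hypothetical names invented for this example, and the link from a representative to "a representative node of the shard" is simplified to a plain `home` field.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory stand-in for one database holding one shard."""
    name: str
    nodes: dict = field(default_factory=dict)          # uuid -> node record
    relationships: list = field(default_factory=list)  # (from_uuid, type, to_uuid)

    def create_node(self):
        """Create a regular node that lives in this shard."""
        node_id = str(uuid.uuid4())
        self.nodes[node_id] = {"uuid": node_id, "home": self.name, "proxy": False}
        return node_id

    def representative_of(self, original_uuid, home_shard):
        """Look up the local stand-in for a remote node, creating it if missing."""
        if original_uuid not in self.nodes:
            self.nodes[original_uuid] = {
                "uuid": original_uuid,  # uuid of the original node
                "home": home_shard,     # shard to ask for the real node
                "proxy": True,
            }
        return original_uuid


def connect(shard_a, node_a, shard_b, node_b, rel_type):
    """Relate node_a (in shard_a) to node_b (in shard_b)."""
    if shard_a is shard_b:
        # Same shard: an ordinary relationship.
        shard_a.relationships.append((node_a, rel_type, node_b))
        return
    # Different shards: each shard stores the relationship against a local
    # representative of the remote node, so no relationship spans two stores.
    rep_b = shard_a.representative_of(node_b, shard_b.name)
    rep_a = shard_b.representative_of(node_a, shard_a.name)
    shard_a.relationships.append((node_a, rel_type, rep_b))
    shard_b.relationships.append((rep_a, rel_type, node_b))
```

In this sketch, moving a shard to another database would mean rewriting the `home` field of every representative that points at it, which is the update cost mentioned above.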

Any thoughts?

RickBullotta

Oct 28, 2012, 9:36:44 AM

to ne...@googlegroups.com, neo4jrb, neo4j-...@googlegroups.com

Hi, Niels.

I suppose that a couple of the challenges would involve:

- Creating/managing node UUIDs (this would/could consume a lot of properties and a lot of cache memory, since the Long node id is not a reliable UUID)

- Looking up UUIDs to resolve them to a node, since Lucene doesn't seem to like very large indices and potentially every node would be in that index

- The number of extra nodes/relationships required to maintain connections between shards could be substantial depending on the specific graph's complexity

We're trying to keep fairly clear isolation between our shards so that we don't keep any significant "relationships" across nodes in different shards. In our model, most subgraphs are really discrete collections and it makes it (somewhat) easier for us to move them around between databases and servers.

Rick

Dmitriy Shabanov

Oct 28, 2012, 3:13:31 PM

to ne...@googlegroups.com, neo4jrb, neo4j-...@googlegroups.com

On Sun, Oct 28, 2012 at 6:36 PM, RickBullotta <rick.b...@gmail.com> wrote:

> I suppose that a couple of the challenges would involve:
>
> - Creating/managing node UUIDs (this would/could consume a lot of properties and a lot of cache memory, since the Long node id is not a reliable UUID)

A UUID is just two longs, so it doubles memory consumption ... hmm ... not much from one point of view and a lot from another. Maybe a switch to run the db in two different modes? Or something similar.

> - Looking up UUIDs to resolve them to a node, since Lucene doesn't seem to like very large indices and potentially every node would be in that index
> - The number of extra nodes/relationships required to maintain connections between shards could be substantial depending on the specific graph's complexity

It is simpler if you think in terms of a discovery service à la JXTA: there is no need to remember where something is stored, only to know where to ask (several places, or all of them).

> We're trying to keep fairly clear isolation between our shards so that we don't keep any significant "relationships" across nodes in different shards. In our model, most subgraphs are really discrete collections and it makes it (somewhat) easier for us to move them around between databases and servers.

I agree that 32 billion is too small a figure. If my site has 1M accounts, only 32k nodes are left for objects per account, which is not much. Having only one db is much better than having several, for many reasons.

--
Dmitriy Shabanov

Michael Hunger

Oct 28, 2012, 6:21:28 PM

to ne...@googlegroups.com

The store-size issue is planned to be addressed in 1.10 in spring 2013.

Michael

--

Niels Hoogeveen

Oct 28, 2012, 7:16:03 PM

to ne...@googlegroups.com

When addressing the store size, would it be an option to include an id offset for nodes and relationships, as a parameter that can be set upon database creation? This would allow for cheap storage of sharding information. The ids are currently longs, so theoretically 64 bits can be used to address nodes in the database. However, a database cannot contain more than 2^64 / record size nodes. This leaves room for database ids. If the record size is somewhere in the order of 32 bytes, this would mean we don't need 8 bits of the 64-bit address space, leaving room for at least 256 unique database ids.

Any node or relationship with an id outside the range of the current database can be identified as such, and the corresponding database id can be determined for free.
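
To make the arithmetic concrete, here is a small Python sketch. It assumes fixed-size records and a store file addressable with at most 2^64 bytes; the function name `spare_id_bits` is made up for this example. Under exactly these assumptions, 32-byte records free up 5 bits (32 database ids), while 8 spare bits (256 database ids) would correspond to 256-byte records or a tighter cap on file size.

```python
def spare_id_bits(record_size_bytes, address_bits=64):
    """Bits of an id that are never needed if id * record_size must stay
    below 2**address_bits (hypothetical helper, not a Neo4j function)."""
    max_records = (1 << address_bits) // record_size_bytes
    # Bits needed to number the records 0 .. max_records - 1.
    needed_bits = (max_records - 1).bit_length()
    return address_bits - needed_bits


def database_id_count(record_size_bytes, address_bits=64):
    """Number of distinct database ids that fit into the spare bits."""
    return 1 << spare_id_bits(record_size_bytes, address_bits)
```

For example, `spare_id_bits(32)` gives 5 and `spare_id_bits(256)` gives 8.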

Niels

Dmitriy Shabanov

Oct 29, 2012, 5:23:43 AM

to ne...@googlegroups.com

Well, making it 128 bits would allow sharing the same id for the same node across any db (a globally unique id). That is much better than the workaround with a database id as part of the node id. A global address space is the dream of dreams -)

Axel Morgner

Oct 29, 2012, 5:30:17 AM

to ne...@googlegroups.com

+1 for UUIDs as optional/additional node id


Dmitriy Shabanov

Oct 29, 2012, 5:32:11 AM

to ne...@googlegroups.com

I mean UUIDs for nodes & relationships, not just nodes -)

--
Dmitriy Shabanov

Axel Morgner

Oct 29, 2012, 5:33:13 AM

to ne...@googlegroups.com

Yes! (forgot to write)

--

Rick Bullotta

Oct 29, 2012, 10:30:38 AM

to ne...@googlegroups.com

Realize that the node ID is a sequential #, and this is essential to preserve since (I assume) it provides extremely fast random retrieval from a fixed offset. Therefore a UUID would be an additional memory item. Adding 16 bytes (two longs) + overhead (let's just estimate 24 bytes in total) on a system with a billion nodes or so quickly adds up: 24 GB of storage/RAM!
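
As a quick back-of-the-envelope check of that estimate, here is a tiny Python sketch. The 24 bytes per node is the rough guess from the paragraph above, not a measured figure, and `uuid_overhead_gb` is a made-up helper name.

```python
def uuid_overhead_gb(node_count, bytes_per_node):
    """Extra storage, in decimal gigabytes, for keeping one UUID per node."""
    return node_count * bytes_per_node / 1e9


# One billion nodes, 16 bytes of raw UUID vs. an estimated 24 bytes with overhead.
raw_gb = uuid_overhead_gb(10**9, 16)
estimated_gb = uuid_overhead_gb(10**9, 24)
```

With these inputs the raw UUIDs alone come to 16 GB, and the estimate with overhead comes to 24 GB.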

Regarding discovery versus indices, it really doesn't matter - you'll still need a monstrously huge index to do the lookup, won't you?

Regarding current size limitations, the one that we find more restrictive is the # of properties. We hit that limit long before we hit the node/relationship limit.


Niels Hoogeveen

Oct 29, 2012, 11:07:19 AM

to ne...@googlegroups.com

A GUID is of course nice to have, but can easily be added as a property. What GUIDs lack is structural information.

The current node-id and relationship-id contain information where to find the corresponding record. Record length * id = position in the file.
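
The offset calculation can be illustrated with a short Python sketch over an in-memory byte buffer. The 9-byte record size and the helper names are assumptions made for this example only; they are not taken from the Neo4j source.

```python
import io

RECORD_SIZE = 9  # hypothetical fixed record length in bytes


def record_offset(record_id, record_size=RECORD_SIZE):
    """Position of a record in the store file: record length * id."""
    return record_id * record_size


def read_record(store, record_id, record_size=RECORD_SIZE):
    """Fetch one fixed-size record with a single seek, no index needed."""
    store.seek(record_offset(record_id, record_size))
    data = store.read(record_size)
    if len(data) != record_size:
        raise IndexError(f"no record with id {record_id}")
    return data


# Usage: a toy store with three records, each filled with its own id byte.
store = io.BytesIO(b"".join(bytes([i]) * RECORD_SIZE for i in range(3)))
second = read_record(store, 2)
```

Because the id maps directly to a file position, the lookup cost does not depend on how many records the store holds.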

As I stated in a previous message, not the entire address space can be used to locate node and relationship records, so the remaining space could in principle be used for other purposes, like a store id.

This would give a nice structural key, making it possible to locate a node or relationship within a particular store.

GUIDs are too opaque for this purpose, requiring an index to link a GUID to a particular node or relationship in a particular store. Such an index can easily become very big and would not only require a lot of storage, but also increase lookup time.

The proposal for a structural key based on store-id and node-id/relationship-id adds no overhead. It does, however, place a limit on the number of databases one installation can serve.

8 bit store-id + 56 bit node-id/relationship-id: 256 stores with approximately 10^17 nodes/relationships

12 bit store-id + 52 bit node-id/relationship-id: 4096 stores with approximately 10^16 nodes/relationships

16 bit store-id + 48 bit node-id/relationship-id: 65,536 stores with approximately 10^14 nodes/relationships
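
A structural key of this kind could be packed and unpacked as in the Python sketch below. The function names are hypothetical, and the key is treated as an unsigned 64-bit value; a Java `long` is signed, so a real implementation would have to handle the top bit with care.

```python
ID_BITS = 64


def pack_id(store_id, local_id, store_bits):
    """Combine a store id and a node/relationship id into one 64-bit key."""
    local_bits = ID_BITS - store_bits
    if not 0 <= store_id < (1 << store_bits):
        raise ValueError("store id does not fit into store_bits")
    if not 0 <= local_id < (1 << local_bits):
        raise ValueError("local id does not fit into the remaining bits")
    return (store_id << local_bits) | local_id


def unpack_id(key, store_bits):
    """Split a key back into (store_id, local_id) with a shift and a mask."""
    local_bits = ID_BITS - store_bits
    return key >> local_bits, key & ((1 << local_bits) - 1)


def capacity(store_bits):
    """(number of stores, ids per store) for a given split of the 64 bits."""
    return 1 << store_bits, 1 << (ID_BITS - store_bits)
```

For the three splits listed above, `capacity` gives 2^56 (about 7.2 * 10^16), 2^52 (about 4.5 * 10^15), and 2^48 (about 2.8 * 10^14) ids per store.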

Niels

Rick Bullotta

Oct 29, 2012, 11:11:55 AM

to ne...@googlegroups.com

The other consideration in these discussions is the portability of the IDs - backup/archive/transfer of nodes or subgraphs between graphs should be supported somehow (which may require making the ability to reuse IDs a configurable option), as well as determining how to assign/manage the store IDs.

--

Dmitriy Shabanov

Oct 29, 2012, 12:22:40 PM

to ne...@googlegroups.com

Well, the property limit comes into play because of the node/relationship limits (my guess). It is possible to move properties into the node/relationship area. We can look at the problem from different points of view:

- memory size (storage size) ... if you have a small db, you will have small requirements. If your db grows, you have to provide more memory anyway.

- "structure" design ... that is always a question of finding a way to fit into the limits

- "ideas" design ... that is the most interesting point, because it relates to the way we think. Very often we need to find workarounds for our systems to support growth (in most cases because of decisions made at the "structure" design stage). But things become very simple as soon as we start to think in terms of a global address space. Of course, I can continue on this subject (and will, if anyone is interested). For now, I hope these points are understandable.

--
Dmitriy Shabanov

Dmitriy Shabanov

Oct 29, 2012, 12:33:50 PM

to ne...@googlegroups.com

Niels, what you write is right from the storage infrastructure point of view, BUT from the systems design point of view it gives nothing. I don't want to say that you are wrong. I just want to say that I (at a minimum) have to support UUID to node/relationship id mapping anyway.

Maybe it has to stay this way, and a single solution for these two problems at different levels doesn't exist at all.

To be clear, here they are:
- lookup at physical storage
- lookup at global address space

--
Dmitriy Shabanov
