--
Axel Morgner · ax...@morgner.de · @amorgner
c/o Morgner UG · Hanauer Landstr. 291a · 60314 Frankfurt ·
Germany
phone: +49 151 40522060 · skype: axel.morgner
structr - Open Source CMS and Web Framework based on Neo4j: http://structr.org
structr Mailing List and Forum: https://groups.google.com/forum/#!forum/structr
Graph Database Usergroup "graphdb-frankfurt", sponsored by
Neo4j: http://www.meetup.com/graphdb-frankfurt
The sport-sharing network of the German Olympic Sports
Confederation (DOSB): https://splink.de
--
--
I suppose that a couple of the challenges would involve:
- Creating/managing node UUIDs (this would/could consume a lot of properties and a lot of cache memory, since the Long node id is not a reliable UUID)
- Looking up UUIDs to resolve them to a node, since Lucene doesn't seem to like very large indices and potentially every node would be in that index
- The number of extra nodes/relationships required to maintain connections between shards could be substantial depending on the specific graph's complexity
We're trying to keep fairly clear isolation between our shards so that we don't keep any significant "relationships" across nodes in different shards. In our model, most subgraphs are really discrete collections and it makes it (somewhat) easier for us to move them around between databases and servers.
--
--
Axel Morgner · ax...@morgner.de · @amorgner
+1 for UUIDs as optional/additional node id
On 29.10.2012 10:23, Dmitriy Shabanov wrote:
Well, making it 128 bits would allow the same node to have the same id in any database (a globally unique id). That is much better than the workaround of using a database id as part of the node id. A global address space is the dream of dreams :-)
On Mon, Oct 29, 2012 at 4:16 AM, Niels Hoogeveen <nielsh...@gmail.com> wrote:
When addressing the store size, would it be an option to include an id offset for nodes and relationships, a parameter that can be set upon database creation? This would allow for cheap storage of sharding information. The ids are currently longs, so theoretically 64 bits can be used to address nodes in the database. However, a database cannot contain more than 2^64 / record size nodes. This leaves room for database ids. If the record size is somewhere on the order of 32 bytes (2^5), the top 5 bits of the 64-bit address space are never needed, leaving room for at least 32 unique database ids. Any node or relationship whose id is not in the range of the current database can be identified, and the corresponding database id can be determined for free.
--
--
Dmitriy Shabanov
--
--
Realize that the node ID is a sequential #, and this is essential to preserve since (I assume) it provides extremely fast random retrieval from a fixed offset. Therefore a UUID would be an additional memory item. Adding 16 bytes (two longs) plus overhead (let's just estimate 24 bytes in total per node) on a system with a billion nodes or so quickly adds up: 24 GB of storage/RAM.
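To make that estimate concrete, here is a rough back-of-the-envelope sketch. The function name is made up for illustration, and the 24 bytes per node is only the guess from above (16 bytes of UUID plus assumed overhead), not a measured Neo4j figure.

```python
def uuid_overhead_bytes(node_count: int, bytes_per_node: int = 24) -> int:
    """Extra bytes needed to keep one UUID per node.

    bytes_per_node is an assumption: 16 bytes for the UUID itself
    (two longs) plus an estimated 8 bytes of overhead.
    """
    return node_count * bytes_per_node


if __name__ == "__main__":
    total = uuid_overhead_bytes(1_000_000_000)  # a billion nodes
    print(f"{total / 10**9:.0f} GB")  # decimal gigabytes
```

The cost scales linearly, so ten billion nodes would need ten times as much under the same assumption.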
Regarding discovery versus indices, it really doesn't matter - you'll still need a monstrously huge index to do the lookup, won't you?

Regarding current size limitations, the one that we find more restrictive is the # of properties. We hit that limit long before we hit the node/relationship limit.
A GUID is of course nice to have, but can easily be added as a property. What GUIDs miss is structural information.

The current node-id and relationship-id contain information on where to find the corresponding record. Record length * id = position in the file.

As I stated in a previous message, not the entire address space can be used to locate node and relationship records, so the remaining space could in principle be used for other purposes, like a store id.

This would give a nice structural key, making it possible to locate a node or relationship within a particular store.

GUIDs are too opaque for this purpose, requiring an index to link a GUID to a particular node or relationship in a particular store. Such an index can easily become very big and would not only require a lot of storage, but also increase lookup time.

The proposal for a structural key based on store-id and node-id/relationship-id adds no overhead. It does however place a limit on the number of databases one installation can serve.

- 8 bit store-id + 56 bit node-id/relationship-id: 256 stores with approximately 10^17 nodes/relationships
- 12 bit store-id + 52 bit node-id/relationship-id: 4096 stores with approximately 10^16 nodes/relationships
- 16 bit store-id + 48 bit node-id/relationship-id: 65,536 stores with approximately 10^14 nodes/relationships
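A minimal sketch of how such a structural key could be packed and unpacked, purely to illustrate the idea. The function names are hypothetical and nothing here is an existing Neo4j API. It also treats the 64-bit id as unsigned, whereas a Java long is signed, so a real implementation would have to deal with the sign bit.

```python
ID_BITS = 64  # total width of the key, treated as unsigned here (assumption)


def pack_key(store_id: int, local_id: int, store_bits: int) -> int:
    """Put the store id in the top bits and the node/relationship id below it."""
    local_bits = ID_BITS - store_bits
    if not 0 <= store_id < (1 << store_bits):
        raise ValueError("store_id does not fit in store_bits")
    if not 0 <= local_id < (1 << local_bits):
        raise ValueError("local_id does not fit in the remaining bits")
    return (store_id << local_bits) | local_id


def unpack_key(key: int, store_bits: int) -> tuple[int, int]:
    """Recover (store_id, local_id) from a key, with no index lookup."""
    local_bits = ID_BITS - store_bits
    return key >> local_bits, key & ((1 << local_bits) - 1)


def record_offset(local_id: int, record_length: int) -> int:
    """Record length * id = position in the file."""
    return local_id * record_length


if __name__ == "__main__":
    key = pack_key(store_id=3, local_id=42, store_bits=8)
    print(unpack_key(key, store_bits=8))
    # Capacity of the three layouts listed above.
    for store_bits in (8, 12, 16):
        ids_per_store = 1 << (ID_BITS - store_bits)
        print(f"{store_bits}-bit store id: {1 << store_bits} stores, "
              f"{ids_per_store:.1e} ids per store")
```

The exact capacities are 2^56, 2^52 and 2^48 ids per store (roughly 7.2 * 10^16, 4.5 * 10^15 and 2.8 * 10^14), in line with the order-of-magnitude figures above.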