I'm after some database design help, which will need some explanation to set the context.

I'm using Neo4J (1.8 community edition, server mode - not embedded) to store a map of the type relationships within the DLLs deployed within my organisation. I don't think it matters, but they're .NET DLLs, generated from C# code.

I have about 2500 DLLs to parse. In the process of parsing I build up a list of all C# types detected, and for each type the list of other types that it uses (by variable reference, method reference or base type reference). Hope that makes sense - I want to end up with a complete tree of type dependencies to allow impact analysis.

Once I have all of the types, certain known types are identified - we call them 'Entity' types. Example entity types are web pages, web service proxies and database commands.

There is then a GUI that sits on top of this information that allows a user to select a known entity and see its tree of entity dependencies: up the tree to see the entities which use the entity in question (and all the way up the tree, not just one layer); and down the tree to see the entities used by the entity in question (and again further down to see downstream entity dependencies). In this view, the types that are not known entities are not shown, although their existence is still needed to link known entities.

So basically I'm looking at a very deep tree, with certain types of nodes to be shown, but the intervening nodes not shown.

I can successfully get the data into the Neo4J database - I manage to get all of the DLL, type and type relationship information into a QuickGraph bidirectional graph which I then save as a GraphML file.
Using Neo4jClient's Gremlin support it's then a one-step operation to get the data imported into Neo4J via loadGraphML.

Once the data is there, I end up with 120702 different type nodes and 471144 relationships between these types - these relationships I have given the type 'TypeUsedByType'.

From these 120702 type nodes I then manage to identify 6500 types that are my special 'Entity' types.

And then the uncertainty about which approach to take begins, as I'm having trouble getting Neo4J to return the tree (up and down) for a single type.

So far I have given each Type node that refers to an 'Entity' an additional property - the numeric EntityTypeID of the particular entity type (eg WebPage, DBCommand, WebService etc). I have also created a set of nodes for the different entity types and added a relationship between each identified entity and the node for that entity type.

I would then like a set of relationships between the known entities - say 'EntityUsedByEntity'. This would cut out the middle men - the types that come between entities that aren't entities themselves. This is the set of relationships I would like to show in the GUI.

So to identify these EntityUsedByEntity relationships, I need to take each of my 6500 identified entities and go down its 'TypeUsedByType' relationships until a known entity is found. At that stage, I would like to stop (I don't need to go any further) and create an EntityUsedByEntity relationship between the source entity and the found entity.

Which comes to my question: how best to do this in a way that Neo4J can cope with? I should say up front, I'm trying this in cypher, but if gremlin seems a better choice I'll try that.
Using the Java traversal API doesn't seem an option as I'm accessing Neo4J as a server - and I'm writing in C#.

Ideally I'd like a single cypher query to create all the EntityUsedByEntity relationships:

    start entity=node:node_auto_index(NodeType = 'TypeVertex') // currently I only have an index on the node type, but could create one for the entity nodes
    match entity<-[:TypeUsedByType*1..]-entity2 // but how to match only the first one down each path?
    where has(entity.EntityTypeId) and has(entity2.EntityTypeId)
    create unique entity<-[:EntityUsedByEntity]-entity2

I had no luck with this - it went away and never came back, so I just took a single entity node and changed the start expression to refer only to it, ie start entity=node(99731). Even then it had troubles...

I've tried making the length of the relationships unbounded, but in the particular case I'm looking at, that doesn't seem to come back - I can bound it at 11 and it comes back in 17 seconds on my machine (i7 laptop running Windows XP). If I leave it unbounded, it doesn't seem to come back.
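To make the stopping rule concrete, here is a minimal in-memory sketch in Python of the traversal being asked for: walk down from one entity through non-entity types only, and stop each branch at the first entity it reaches. This is not Neo4J, Cypher or Gremlin code. The names `uses`, `entity_ids` and `first_entities_down`, and the sample type names, are made up for illustration, and the graph is assumed to fit in a plain dictionary.

```python
from collections import deque

# Hypothetical sample data: each type maps to the types it uses.
uses = {
    "WebPage1": ["Helper", "Proxy1"],
    "Helper": ["Util", "DbCmd1"],
    "Util": ["Helper"],        # a cycle between two non-entity types
    "Proxy1": ["DbCmd2"],
}

# Hypothetical set of types that have been identified as entities.
entity_ids = {"WebPage1", "Proxy1", "DbCmd1", "DbCmd2"}


def first_entities_down(start, uses, entity_ids):
    """Return the entities reachable from `start` through non-entity types only."""
    found = set()
    visited = {start}                      # each type is expanded at most once
    queue = deque(uses.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        if node in entity_ids:
            found.add(node)                # record it and do not go any further
            continue
        queue.extend(uses.get(node, ()))   # non-entity type: keep going down
    return found


print(sorted(first_entities_down("WebPage1", uses, entity_ids)))
# ['DbCmd1', 'Proxy1']
```

In this sample, DbCmd2 is not reported for WebPage1 because it is only reachable through Proxy1, which is itself an entity. The visited set means each type is expanded once per starting entity, rather than once per path leading to it.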
As mentioned, when I identified my entities I also created a relationship back to the underlying entity node. I tried including that in the match clause, but that didn't seem to make any difference.

The problem, it seems to me, is that I don't know how many relationships to go down, so I can't use an upper bound. But that then means that cypher will find all the paths before applying the where clause. From my cursory look at the traversal API it seems possible to apply a filter at each step down, but I can't see how to do this in cypher - is it possible?

I must say that I'm a bit surprised that Neo4J has trouble with a deep tree like this. Is there something else I should be doing - smarter cypher, different relationships etc?

Thanks heaps if you've managed to read this far, I hope this makes enough sense. Any advice/suggestions would be greatly appreciated.

BTW I used to have this data in Oracle and tried to find the relationships with connect by queries. It was having trouble (aka not coming back) when I was trying to populate all of the EntityUsedByEntity type relationships (by inserting into an EntityRelationship table) in one operation. But I'm pretty sure it would cope with the single type case that's causing me issues in Neo4J, and I was expecting Neo4J to be better at coping with these heavy relationships. Hopefully there's something reasonably obvious I'm missing here.

Ian
--
Hi,
Have you looked at Gremlin? It is lazy and has path and tree operators.
Good luck,
Marko.
--