I'm after some database design help, which will need some explanation to set the context.
I'm using Neo4J (1.8 community edition, server mode - not embedded) to store a map of the type relationships within the DLLs deployed within my organisation. I don't think it matters, but they're .NET DLLs, generated from C# code.
I have about 2500 DLLs or so to parse. In the process of parsing I build up a list of all C# types detected, and for each type the list of other types that it uses (by variable reference, method reference or base type reference). Hope that makes sense - I want to end up with a complete tree of type dependencies to allow impact analysis.
Once I have all of the types, certain known types are identified - we call them 'Entity' types. Example entity types are web pages, web service proxies, database commands.
There is then a GUI that sits on top of this information that allows a user to select a known entity and see it's tree of entity dependencies: up the tree to see the entities which use the entity in question (and all the way up the tree, not just one layer); and down the tree to see the entities used by the entity in question (and again further down to see downstream entity dependencies). In this view, the types that are not known entities are not shown, although their existence is still needed to link known entities.
So basically I'm looking at a very deep tree, with certain types of nodes to be shown, but the intervening nodes not shown
I can successfully get the data into the Neo4J database - I manage to get all of the DLL, type and type relationship information into a QuickGraph bidirectional graph which I then save as a GraphML file. Using Neo4jClient's Gremlin support it's then a one step operation to get the data imported into Neo4J via loadGraphML.
Once the data is there, I end up with 120702 different type nodes and 471144 relationships between these types - these relationships I have given the type 'TypeUsedByType'.
From these 120702 type nodes I then manage to identify 6500 types that are my special 'Entity' types.
And then the uncertainty about what approach begins as I'm having trouble getting Neo4J to return the tree (up and down) for a single type.
So far I have given each Type node that refers to an 'Entity' an additional property - the numberic EntityTypeID of the particular entity type (eg WebPage, DBCommand, WebService etc). I have also created a set of nodes for the different entity types and added a relationship between each identified entity and the node for that entity type.
I would then like a set of relationships between the known entities - say 'EntityUsedByEntity'. This would cut out the middle men - the types that come between entities that aren't entities themselves. This is the set of relationships I would like to show in the GUI.
So to identify these EntityUsedByEntity relationships, I need to take each of my 6500 identified entities and go down its 'TypeUsedByType' relationships until a known entity is found. At that stage, I would like to stop (I don't need to go any further) and create an EntityUsedByEntity relationship between the source entity and the found entity.
Which comes to my question: how best to do this in a way that Neo4J can cope with. I should say up front, I'm trying this in cypher, but if gremlin seems a better choice I'll try that. Using the java traversal API doesn't seem an option as I'm accessing Neo4J as a server - and I'm writing in C#.
Ideally I'd like a single cypher query to create all the EntityUsedByEntity relationships
start entity=node:node_auto_index(NodeType = 'TypeVertex') // currently I only have an index on the node type, but could create one for the entity nodes match entity<-[:TypeUsedByType*1..]-entity2 // but how to match only the first one down each path? where has(entity.EntityTypeId) and has(entity2.EntityTypeId) create unique entity<-[:EntityUsedByEntity]-entity2
I had no luck with this - it went away and never came back, so I just took a single entity node and changed the start expression to just refer to it. ie start entity=node(99731). Even then it had troubles...
I've tried making the length of the relationships unbounded, but in the particular case I'm looking at, that doesn't seem to come back - I can bound it at 11 and it comes back in 17 seconds on my machine (i7 laptop running Windows XP). If I leave it unbound, it doesn't seem to come back.
As mentioned, when I identified my entities I also created a relationship back to the underlying entity node. I tried including that in the match clause, but that didn't seem to make any difference.
The problem seems to me that I don't know how many relationships to go down, so I can't use an upper bound. But that then means that cypher will find all the paths before applying the where clause. From my cursory look at the traversal API it seems possible to apply a filter and each step down, but I can't see how to do this in cypher - is it possible?
I must say that I'm a bit surprised that Neo4J has trouble with a deep tree like this. Is there something else I should be doing - smarter cypher, different relationships etc.
Thanks heaps if you've managed to read this far, I hope this makes enough sense. Any advice/suggestions would be greatly appreciated
BTW I used to have this data in Oracle and tried to find the relationships with connect by queries. It was having trouble (aka not coming back) when I was trying to populate all of the EntityUsedByEntity type relationships (by inserting into an EntityRelationship table) in one operation. But I'm pretty sure it would cope with my single type case that's causing me issues in Neo4J. But I was expecting Neo4J to be better at coping with these heavy relationships. Hopefully there's something reasonably obvious I'm missing here.
Ian,
I think we can figure this out. Could you do a samll-ish sample over at
http://console.neo4j.org/usage.html, so we have a dataset to play with?
Then, we could reason about this. To start with, what do you mean by "the
first one down each path"? In your example (in the console) what would that
path be?
> I'm after some database design help, which will need some explanation to
> set the context.
> I'm using Neo4J (1.8 community edition, server mode - not embedded) to
> store a map of the type relationships within the DLLs deployed within my
> organisation. I don't think it matters, but they're .NET DLLs, generated
> from C# code.
> I have about 2500 DLLs or so to parse. In the process of parsing I build
> up a list of all C# types detected, and for each type the list of other
> types that it uses (by variable reference, method reference or base type
> reference). Hope that makes sense - I want to end up with a complete tree
> of type dependencies to allow impact analysis.
> Once I have all of the types, certain known types are identified - we call
> them 'Entity' types. Example entity types are web pages, web service
> proxies, database commands.
> There is then a GUI that sits on top of this information that allows a
> user to select a known entity and see it's tree of entity dependencies: up
> the tree to see the entities which use the entity in question (and all the
> way up the tree, not just one layer); and down the tree to see the entities
> used by the entity in question (and again further down to see downstream
> entity dependencies). In this view, the types that are not known entities
> are not shown, although their existence is still needed to link known
> entities.
> So basically I'm looking at a very deep tree, with certain types of nodes
> to be shown, but the intervening nodes not shown
> I can successfully get the data into the Neo4J database - I manage to get
> all of the DLL, type and type relationship information into a QuickGraph
> bidirectional graph which I then save as a GraphML file. Using
> Neo4jClient's Gremlin support it's then a one step operation to get the
> data imported into Neo4J via loadGraphML.
> Once the data is there, I end up with 120702 different type nodes and
> 471144 relationships between these types - these relationships I have given
> the type 'TypeUsedByType'.
> From these 120702 type nodes I then manage to identify 6500 types that are
> my special 'Entity' types.
> And then the uncertainty about what approach begins as I'm having trouble
> getting Neo4J to return the tree (up and down) for a single type.
> So far I have given each Type node that refers to an 'Entity' an
> additional property - the numberic EntityTypeID of the particular entity
> type (eg WebPage, DBCommand, WebService etc). I have also created a set of
> nodes for the different entity types and added a relationship between each
> identified entity and the node for that entity type.
> I would then like a set of relationships between the known entities - say
> 'EntityUsedByEntity'. This would cut out the middle men - the types that
> come between entities that aren't entities themselves. This is the set of
> relationships I would like to show in the GUI.
> So to identify these EntityUsedByEntity relationships, I need to take each
> of my 6500 identified entities and go down its 'TypeUsedByType'
> relationships until a known entity is found. At that stage, I would like to
> stop (I don't need to go any further) and create an EntityUsedByEntity
> relationship between the source entity and the found entity.
> Which comes to my question: how best to do this in a way that Neo4J can
> cope with. I should say up front, I'm trying this in cypher, but if gremlin
> seems a better choice I'll try that. Using the java traversal API doesn't
> seem an option as I'm accessing Neo4J as a server - and I'm writing in C#.
> Ideally I'd like a single cypher query to create all the
> EntityUsedByEntity relationships
> start entity=node:node_auto_index(NodeType = 'TypeVertex') // currently
> I only have an index on the node type, but could create one for the entity
> nodes
> match entity<-[:TypeUsedByType*1..]-entity2 // but how to match only
> the first one down each path?
> where has(entity.EntityTypeId) and has(entity2.EntityTypeId)
> create unique entity<-[:EntityUsedByEntity]-entity2
> I had no luck with this - it went away and never came back, so I just took
> a single entity node and changed the start expression to just refer to it.
> ie start entity=node(99731). Even then it had troubles...
> I've tried making the length of the relationships unbounded, but in the
> particular case I'm looking at, that doesn't seem to come back - I can
> bound it at 11 and it comes back in 17 seconds on my machine (i7 laptop
> running Windows XP). If I leave it unbound, it doesn't seem to come back.
> As mentioned, when I identified my entities I also created a relationship
> back to the underlying entity node. I tried including that in the match
> clause, but that didn't seem to make any difference.
> The problem seems to me that I don't know how many relationships to go
> down, so I can't use an upper bound. But that then means that cypher will
> find all the paths before applying the where clause. From my cursory look
> at the traversal API it seems possible to apply a filter and each step
> down, but I can't see how to do this in cypher - is it possible?
> I must say that I'm a bit surprised that Neo4J has trouble with a deep
> tree like this. Is there something else I should be doing - smarter cypher,
> different relationships etc.
> Thanks heaps if you've managed to read this far, I hope this makes enough
> sense. Any advice/suggestions would be greatly appreciated
> BTW I used to have this data in Oracle and tried to find the relationships
> with connect by queries. It was having trouble (aka not coming back) when I
> was trying to populate all of the EntityUsedByEntity type relationships (by
> inserting into an EntityRelationship table) in one operation. But I'm
> pretty sure it would cope with my single type case that's causing me issues
> in Neo4J. But I was expecting Neo4J to be better at coping with these heavy
> relationships. Hopefully there's something reasonably obvious I'm missing
> here.
I'll have a look at the best way to get a sample for you - I should be able to choose a subset DLLs that still have the appropriate depth. I haven't played with the console, and my initial data load was done via a bit of gremlin from within neo4jclient so that might take me a little while. There'll also be a timezone lag too - I'm in Australia.
In the meantime, re "the first down each path": my cypher queries return a (large) number of paths. Each of those paths follows a tree of type dependencies. Within each of those paths some of the nodes (ie types) will be my known entities. So far each path, I only want the path down as far as the first matching entity - even though the path could extend much further.
For example given a single path of type usage: *A* ->uses->B->uses->C->uses->*D*->uses->E->uses->*F*->uses->G->uses->H
And say A, D and F are known entity types, when starting from A I would like my query to return the path to D, without the need to get to F. I would then create a direct relationship between A and D, but no direct relationship between A and F. Separately I would create a relationship between D and F.
So far I've only managed to get my queries to return all of the paths in their entirety and then perform the filtering afterwards (using a where clause). This means that all of the paths are hugely deep and have a lot of repeated information.
On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
> Ian, > I think we can figure this out. Could you do a samll-ish sample over at > http://console.neo4j.org/usage.html, so we have a dataset to play with? > Then, we could reason about this. To start with, what do you mean by "the > first one down each path"? In your example (in the console) what would that > path be?
if you're in Australia and using neo4jclient you should try to chat to Tatham and Romiko which are also in downunder and the authors of that library. They have quite a lot of modeling experience.
> I'll have a look at the best way to get a sample for you - I should be able to choose a subset DLLs that still have the appropriate depth. I haven't played with the console, and my initial data load was done via a bit of gremlin from within neo4jclient so that might take me a little while. There'll also be a timezone lag too - I'm in Australia.
> In the meantime, re "the first down each path": my cypher queries return a (large) number of paths. Each of those paths follows a tree of type dependencies. Within each of those paths some of the nodes (ie types) will be my known entities. So far each path, I only want the path down as far as the first matching entity - even though the path could extend much further.
> For example given a single path of type usage: A->uses->B->uses->C->uses->D->uses->E->uses->F->uses->G->uses->H
> And say A, D and F are known entity types, when starting from A I would like my query to return the path to D, without the need to get to F. I would then create a direct relationship between A and D, but no direct relationship between A and F. Separately I would create a relationship between D and F.
> So far I've only managed to get my queries to return all of the paths in their entirety and then perform the filtering afterwards (using a where clause). This means that all of the paths are hugely deep and have a lot of repeated information.
> Hope that makes it a bit clearer.
> Thanks
> Ian
> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
> Ian,
> I think we can figure this out. Could you do a samll-ish sample over at http://console.neo4j.org/usage.html, so we have a dataset to play with? Then, we could reason about this. To start with, what do you mean by "the first one down each path"? In your example (in the console) what would that path be?
I had wondered the same thing, but for now my issue has morphed into more of a cypher question rather than a Neo4jClient issue. But drawing on Tatham and Romiko's modeling experience would be easier given the timezone compatibility - any idea how to get their attention here? Do you know if they cruise these forums much?
On Friday, November 16, 2012 8:14:20 AM UTC+11, Michael Hunger wrote:
> Ian,
> if you're in Australia and using neo4jclient you should try to chat to > Tatham and Romiko which are also in downunder and the authors of that > library. They have quite a lot of modeling experience.
> Michael
> Am 15.11.2012 um 21:53 schrieb Ian Tompsett:
> Thanks Peter
> I'll have a look at the best way to get a sample for you - I should be > able to choose a subset DLLs that still have the appropriate depth. I > haven't played with the console, and my initial data load was done via a > bit of gremlin from within neo4jclient so that might take me a little > while. There'll also be a timezone lag too - I'm in Australia.
> In the meantime, re "the first down each path": my cypher queries return a > (large) number of paths. Each of those paths follows a tree of type > dependencies. Within each of those paths some of the nodes (ie types) will > be my known entities. So far each path, I only want the path down as far as > the first matching entity - even though the path could extend much further.
> For example given a single path of type usage: *A* > ->uses->B->uses->C->uses->*D*->uses->E->uses->*F*->uses->G->uses->H
> And say A, D and F are known entity types, when starting from A I would > like my query to return the path to D, without the need to get to F. I > would then create a direct relationship between A and D, but no direct > relationship between A and F. Separately I would create a relationship > between D and F.
> So far I've only managed to get my queries to return all of the paths in > their entirety and then perform the filtering afterwards (using a where > clause). This means that all of the paths are hugely deep and have a lot of > repeated information.
> Hope that makes it a bit clearer.
> Thanks > Ian > On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>> Ian, >> I think we can figure this out. Could you do a samll-ish sample over at >> http://console.neo4j.org/usage.html, so we have a dataset to play with? >> Then, we could reason about this. To start with, what do you mean by "the >> first one down each path"? In your example (in the console) what would that >> path be?
I had to deal with a similar scenario - finding patterns in software dependency graphs that need reasoning about paths (= computation of transitive closure). If you are interested, this has resulted in this tools: http://xplrarc.massey.ac.nz/ , the queries are in the analyse menu. The largest graphs (extracted from the JRE) have a similar size to your graphs.
In my experience the current version of cypher does not cope very well with this (there are some older discussions on this issue in the forum, and a benchmarking project to demonstrate some of the problems: http://code.google.com/p/graph-query-benchmarks/). You might want to try guery as an alternative to query your neo4j database - http://code.google.com/p/gueryframework/ - the latest version has a blueprints adapter. Have a look at the test cases, they contain some examples similar to what you need.
On Friday, 16 November 2012 10:34:26 UTC+13, Ian Tompsett wrote:
> Thanks Michael
> I had wondered the same thing, but for now my issue has morphed into more > of a cypher question rather than a Neo4jClient issue. But drawing on Tatham > and Romiko's modeling experience would be easier given the timezone > compatibility - any idea how to get their attention here? Do you know if > they cruise these forums much?
> Ian
> On Friday, November 16, 2012 8:14:20 AM UTC+11, Michael Hunger wrote:
>> Ian,
>> if you're in Australia and using neo4jclient you should try to chat to >> Tatham and Romiko which are also in downunder and the authors of that >> library. They have quite a lot of modeling experience.
>> Michael
>> Am 15.11.2012 um 21:53 schrieb Ian Tompsett:
>> Thanks Peter
>> I'll have a look at the best way to get a sample for you - I should be >> able to choose a subset DLLs that still have the appropriate depth. I >> haven't played with the console, and my initial data load was done via a >> bit of gremlin from within neo4jclient so that might take me a little >> while. There'll also be a timezone lag too - I'm in Australia.
>> In the meantime, re "the first down each path": my cypher queries return >> a (large) number of paths. Each of those paths follows a tree of type >> dependencies. Within each of those paths some of the nodes (ie types) will >> be my known entities. So far each path, I only want the path down as far as >> the first matching entity - even though the path could extend much further.
>> For example given a single path of type usage: *A* >> ->uses->B->uses->C->uses->*D*->uses->E->uses->*F*->uses->G->uses->H
>> And say A, D and F are known entity types, when starting from A I would >> like my query to return the path to D, without the need to get to F. I >> would then create a direct relationship between A and D, but no direct >> relationship between A and F. Separately I would create a relationship >> between D and F.
>> So far I've only managed to get my queries to return all of the paths in >> their entirety and then perform the filtering afterwards (using a where >> clause). This means that all of the paths are hugely deep and have a lot of >> repeated information.
>> Hope that makes it a bit clearer.
>> Thanks >> Ian >> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>> Ian, >>> I think we can figure this out. Could you do a samll-ish sample over at >>> http://console.neo4j.org/usage.html, so we have a dataset to play with? >>> Then, we could reason about this. To start with, what do you mean by "the >>> first one down each path"? In your example (in the console) what would that >>> path be?
Not surprisingly this returns queries pretty quickly, but should be enough to get the structure in place. I've added some noise in there to include some paths that shouldn't be followed.
Hopefully it's clear enough - TypeA is the start point (node 4) and it's an entity. It has 3 direct type relationships - to TypeB, Type AC and Type AD. And then there's downstream type relationships, some of which are known entity types (reflected by both an EntityTypeId property, and a relationship back to the EntityType nodes). TypeA has entity relationships to 2 and 3, but not 4 - as that's via TypeAD
So initially what I'd like is the most efficient cypher query for detecting the direct entity relationships for TypeA - EntityType2 and EntityType3 in my example. That is, I'd like to see the following relationships returned
On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
> Thanks Peter
> I'll have a look at the best way to get a sample for you - I should be > able to choose a subset DLLs that still have the appropriate depth. I > haven't played with the console, and my initial data load was done via a > bit of gremlin from within neo4jclient so that might take me a little > while. There'll also be a timezone lag too - I'm in Australia.
> In the meantime, re "the first down each path": my cypher queries return a > (large) number of paths. Each of those paths follows a tree of type > dependencies. Within each of those paths some of the nodes (ie types) will > be my known entities. So far each path, I only want the path down as far as > the first matching entity - even though the path could extend much further.
> For example given a single path of type usage: *A* > ->uses->B->uses->C->uses->*D*->uses->E->uses->*F*->uses->G->uses->H
> And say A, D and F are known entity types, when starting from A I would > like my query to return the path to D, without the need to get to F. I > would then create a direct relationship between A and D, but no direct > relationship between A and F. Separately I would create a relationship > between D and F.
> So far I've only managed to get my queries to return all of the paths in > their entirety and then perform the filtering afterwards (using a where > clause). This means that all of the paths are hugely deep and have a lot of > repeated information.
> Hope that makes it a bit clearer.
> Thanks > Ian > On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>> Ian, >> I think we can figure this out. Could you do a samll-ish sample over at >> http://console.neo4j.org/usage.html, so we have a dataset to play with? >> Then, we could reason about this. To start with, what do you mean by "the >> first one down each path"? In your example (in the console) what would that >> path be?
I don't have a direct answer to the situation described here, though I would like to add my two cents anyway.
Thinking about this problem yesterday, it occurred to me that a traversal description over a graph can be viewed as a formal language expression, where the alphabet of the language consists of RelationshipTypes.
The following operations can easily be achieved:
concatenation: creation of paths union: choice between two traveral descriptions kleene star: repeated traversal description
These three operations give a regular language, and with that have the same expressive power as a regular expression.
It is possible to add intersection and complement, resulting in a visibly pushdown language. I am not an expert on Cypher or Gremlin, but as far as I am concerned, all these operations can be supported in a query language. For the content management system multispective.com, I have implemented these operations, giving a reasonably powerful language for querying a Neo4j database.
-----Original Message-----
From: Ian Tompsett <iantompset...@gmail.com>
Sender: neo4j@googlegroups.com
Date: Thu, 15 Nov 2012 21:09:05 To: <neo4j@googlegroups.com>
Reply-To: neo4j@googlegroups.com
Subject: Re: [Neo4j] Need help with designing deep tree of type relationships
- can Neo4J cope?
Not surprisingly this returns queries pretty quickly, but should be enough to get the structure in place. I've added some noise in there to include some paths that shouldn't be followed.
Hopefully it's clear enough - TypeA is the start point (node 4) and it's an entity. It has 3 direct type relationships - to TypeB, Type AC and Type AD. And then there's downstream type relationships, some of which are known entity types (reflected by both an EntityTypeId property, and a relationship back to the EntityType nodes). TypeA has entity relationships to 2 and 3, but not 4 - as that's via TypeAD
So initially what I'd like is the most efficient cypher query for detecting the direct entity relationships for TypeA - EntityType2 and EntityType3 in my example. That is, I'd like to see the following relationships returned
TypeA-TypeD
TypeA-TypeAC
TypeA-TypeAD
Thanks again for your help
On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
> Thanks Peter
> I'll have a look at the best way to get a sample for you - I should be > able to choose a subset DLLs that still have the appropriate depth. I > haven't played with the console, and my initial data load was done via a > bit of gremlin from within neo4jclient so that might take me a little > while. There'll also be a timezone lag too - I'm in Australia.
> In the meantime, re "the first down each path": my cypher queries return a > (large) number of paths. Each of those paths follows a tree of type > dependencies. Within each of those paths some of the nodes (ie types) will > be my known entities. So far each path, I only want the path down as far as > the first matching entity - even though the path could extend much further.
> For example given a single path of type usage: *A*
> ->uses->B->uses->C->uses->*D*->uses->E->uses->*F*->uses->G->uses->H
> And say A, D and F are known entity types, when starting from A I would > like my query to return the path to D, without the need to get to F. I > would then create a direct relationship between A and D, but no direct > relationship between A and F. Separately I would create a relationship > between D and F.
> So far I've only managed to get my queries to return all of the paths in > their entirety and then perform the filtering afterwards (using a where > clause). This means that all of the paths are hugely deep and have a lot of > repeated information.
> Hope that makes it a bit clearer.
> Thanks
> Ian
> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>> Ian,
>> I think we can figure this out. Could you do a samll-ish sample over at >> http://console.neo4j.org/usage.html, so we have a dataset to play with? >> Then, we could reason about this. To start with, what do you mean by "the >> first one down each path"? In your example (in the console) what would that >> path be?
> Not surprisingly this returns queries pretty quickly, but should be enough
> to get the structure in place. I've added some noise in there to include
> some paths that shouldn't be followed.
> Hopefully it's clear enough - TypeA is the start point (node 4) and it's
> an entity. It has 3 direct type relationships - to TypeB, Type AC and Type
> AD. And then there's downstream type relationships, some of which are known
> entity types (reflected by both an EntityTypeId property, and a
> relationship back to the EntityType nodes). TypeA has entity relationships
> to 2 and 3, but not 4 - as that's via TypeAD
> So initially what I'd like is the most efficient cypher query for
> detecting the direct entity relationships for TypeA - EntityType2 and
> EntityType3 in my example. That is, I'd like to see the following
> relationships returned
> TypeA-TypeD
> TypeA-TypeAC
> TypeA-TypeAD
> Thanks again for your help
> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>> Thanks Peter
>> I'll have a look at the best way to get a sample for you - I should be
>> able to choose a subset DLLs that still have the appropriate depth. I
>> haven't played with the console, and my initial data load was done via a
>> bit of gremlin from within neo4jclient so that might take me a little
>> while. There'll also be a timezone lag too - I'm in Australia.
>> In the meantime, re "the first down each path": my cypher queries return
>> a (large) number of paths. Each of those paths follows a tree of type
>> dependencies. Within each of those paths some of the nodes (ie types) will
>> be my known entities. So far each path, I only want the path down as far as
>> the first matching entity - even though the path could extend much further.
>> For example given a single path of type usage: *A*
>> ->uses->B->uses->C->uses->*D*-**>uses->E->uses->*F*->uses->G->**uses->H
>> And say A, D and F are known entity types, when starting from A I would
>> like my query to return the path to D, without the need to get to F. I
>> would then create a direct relationship between A and D, but no direct
>> relationship between A and F. Separately I would create a relationship
>> between D and F.
>> So far I've only managed to get my queries to return all of the paths in
>> their entirety and then perform the filtering afterwards (using a where
>> clause). This means that all of the paths are hugely deep and have a lot of
>> repeated information.
>> Hope that makes it a bit clearer.
>> Thanks
>> Ian
>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>> Ian,
>>> I think we can figure this out. Could you do a samll-ish sample over at
>>> http://console.neo4j.org/**usage.html<http://console.neo4j.org/usage.html>,
>>> so we have a dataset to play with? Then, we could reason about this. To
>>> start with, what do you mean by "the first one down each path"? In your
>>> example (in the console) what would that path be?
What I'm after is a reusable way of only picking up the types that are entities that are directly related to TypeA. By directly related, I mean there are no other entities between the type in question and TypeA.
If you try the following
start t1=node(5) match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : n.name)
You'll see the TypeAF appears in the list. You've managed to filter out by basically knowing the results up front (ie the Limit 3). What I'd like to do is filter it out by detecting that there is another entity (TypeAD) between TypeAF and TypeA.
In the cypher query I've played around with trying a second match clause to exclude t2 values that satisfy the first match, but do not have an entity between them and t1. So that would exclude TypeAF in our example.
Hope that makes sense? Any ideas how to cypher that? Something like
Something like: start t1=node(5) match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 , p2=t2<-[:TypeUsedByType*1..]-t3<-[:TypeUsedByType]-t1, p3=t3-[:Entity]-e2 where p1 exists but p2 and p3 don't??? return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : n.name)
I think I'm looking for an outer join that fails sort of concept here, to use SQL parlance.
>> Not surprisingly this returns queries pretty quickly, but should be >> enough to get the structure in place. I've added some noise in there to >> include some paths that shouldn't be followed.
>> Hopefully it's clear enough - TypeA is the start point (node 4) and it's >> an entity. It has 3 direct type relationships - to TypeB, Type AC and Type >> AD. And then there's downstream type relationships, some of which are known >> entity types (reflected by both an EntityTypeId property, and a >> relationship back to the EntityType nodes). TypeA has entity relationships >> to 2 and 3, but not 4 - as that's via TypeAD
>> So initially what I'd like is the most efficient cypher query for >> detecting the direct entity relationships for TypeA - EntityType2 and >> EntityType3 in my example. That is, I'd like to see the following >> relationships returned
>> TypeA-TypeD >> TypeA-TypeAC >> TypeA-TypeAD
>> Thanks again for your help
>> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>>> Thanks Peter
>>> I'll have a look at the best way to get a sample for you - I should be >>> able to choose a subset DLLs that still have the appropriate depth. I >>> haven't played with the console, and my initial data load was done via a >>> bit of gremlin from within neo4jclient so that might take me a little >>> while. There'll also be a timezone lag too - I'm in Australia.
>>> In the meantime, re "the first down each path": my cypher queries return >>> a (large) number of paths. Each of those paths follows a tree of type >>> dependencies. Within each of those paths some of the nodes (ie types) will >>> be my known entities. So far each path, I only want the path down as far as >>> the first matching entity - even though the path could extend much further.
>>> For example given a single path of type usage: *A* >>> ->uses->B->uses->C->uses->*D*-**>uses->E->uses->*F*->uses->G->**uses->H
>>> And say A, D and F are known entity types, when starting from A I would >>> like my query to return the path to D, without the need to get to F. I >>> would then create a direct relationship between A and D, but no direct >>> relationship between A and F. Separately I would create a relationship >>> between D and F.
>>> So far I've only managed to get my queries to return all of the paths in >>> their entirety and then perform the filtering afterwards (using a where >>> clause). This means that all of the paths are hugely deep and have a lot of >>> repeated information.
>>> Hope that makes it a bit clearer.
>>> Thanks >>> Ian >>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>>> Ian, >>>> I think we can figure this out. Could you do a samll-ish sample over at >>>> http://console.neo4j.org/**usage.html<http://console.neo4j.org/usage.html>, >>>> so we have a dataset to play with? Then, we could reason about this. To >>>> start with, what do you mean by "the first one down each path"? In your >>>> example (in the console) what would that path be?
Have you had a chance to look at this? I'm getting concerned that the answer to my initial question is "no" - especially given Jens' comment.
Whilst I do have a deep hierarchy to contend with, I don't need to look at it all, I'd just like to find a way of getting my cypher query to understand how early it can stop.
On Monday, November 19, 2012 1:22:54 AM UTC+11, Ian Tompsett wrote:
> Thanks Peter
> What I'm after is a reusable way of only picking up the types that are > entities that are directly related to TypeA. By directly related, I mean > there are no other entities between the type in question and TypeA.
> If you try the following
> start t1=node(5) > match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 > return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : > n.name)
> You'll see the TypeAF appears in the list. You've managed to filter out by > basically knowing the results up front (ie the Limit 3). What I'd like to > do is filter it out by detecting that there is another entity (TypeAD) > between TypeAF and TypeA.
> In the cypher query I've played around with trying a second match clause > to exclude t2 values that satisfy the first match, but do not have an > entity between them and t1. So that would exclude TypeAF in our example.
> Hope that makes sense? Any ideas how to cypher that? Something like
> Something like: > start t1=node(5) > match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 , > p2=t2<-[:TypeUsedByType*1..]-t3<-[:TypeUsedByType]-t1, > p3=t3-[:Entity]-e2 > where p1 exists but p2 and p3 don't??? > return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : > n.name)
> I think I'm looking for an outer join that fails sort of concept here, to > use SQL parlance.
> Thanks
> On Sunday, November 18, 2012 2:21:14 AM UTC+11, Peter Neubauer wrote:
>> Ian, >> I am not sure I am totally with you, but this query gives you the results >> you want:
>> start t1=node(5) >> match p=t1-[:TypeUsedByType*1..]-t2-[:Entity]->e1 >> return t1.name, e1.name,t2.name, length(p) as l >> order by l asc >> limit 3
>>> Not surprisingly this returns queries pretty quickly, but should be >>> enough to get the structure in place. I've added some noise in there to >>> include some paths that shouldn't be followed.
>>> Hopefully it's clear enough - TypeA is the start point (node 4) and it's >>> an entity. It has 3 direct type relationships - to TypeB, Type AC and Type >>> AD. And then there's downstream type relationships, some of which are known >>> entity types (reflected by both an EntityTypeId property, and a >>> relationship back to the EntityType nodes). TypeA has entity relationships >>> to 2 and 3, but not 4 - as that's via TypeAD
>>> So initially what I'd like is the most efficient cypher query for >>> detecting the direct entity relationships for TypeA - EntityType2 and >>> EntityType3 in my example. That is, I'd like to see the following >>> relationships returned
>>> TypeA-TypeD >>> TypeA-TypeAC >>> TypeA-TypeAD
>>> Thanks again for your help
>>> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>>>> Thanks Peter
>>>> I'll have a look at the best way to get a sample for you - I should be >>>> able to choose a subset DLLs that still have the appropriate depth. I >>>> haven't played with the console, and my initial data load was done via a >>>> bit of gremlin from within neo4jclient so that might take me a little >>>> while. There'll also be a timezone lag too - I'm in Australia.
>>>> In the meantime, re "the first down each path": my cypher queries >>>> return a (large) number of paths. Each of those paths follows a tree of >>>> type dependencies. Within each of those paths some of the nodes (ie types) >>>> will be my known entities. So far each path, I only want the path down as >>>> far as the first matching entity - even though the path could extend much >>>> further.
>>>> For example given a single path of type usage: *A* >>>> ->uses->B->uses->C->uses->*D*-**>uses->E->uses->*F*->uses->G->**uses->H
>>>> And say A, D and F are known entity types, when starting from A I would >>>> like my query to return the path to D, without the need to get to F. I >>>> would then create a direct relationship between A and D, but no direct >>>> relationship between A and F. Separately I would create a relationship >>>> between D and F.
>>>> So far I've only managed to get my queries to return all of the paths >>>> in their entirety and then perform the filtering afterwards (using a where >>>> clause). This means that all of the paths are hugely deep and have a lot of >>>> repeated information.
>>>> Hope that makes it a bit clearer.
>>>> Thanks >>>> Ian >>>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>>>> Ian, >>>>> I think we can figure this out. Could you do a samll-ish sample over >>>>> at http://console.neo4j.org/**usage.html<http://console.neo4j.org/usage.html>, >>>>> so we have a dataset to play with? Then, we could reason about this. To >>>>> start with, what do you mean by "the first one down each path"? In your >>>>> example (in the console) what would that path be?
I think you want to filter paths, depending on the nodes that are in there:
something like this:
start t1=node(5) match p=t1-[:TypeUsedByType*1..]->t2,t2-[:Entity]->e1
where SINGLE(n in tail(nodes(p)) where has(n.EntityTypeId))
return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : n.name)
I check if there is only a single node (aka t2) that has an EntityType property
tail(nodes(p)) because otherwise t1 would also be in there.
and e1 also has a property EntityType so I split the path :)
> Have you had a chance to look at this? I'm getting concerned that the answer to my initial question is "no" - especially given Jens' comment.
> Whilst I do have a deep hierarchy to contend with, I don't need to look at it all, I'd just like to find a way of getting my cypher query to understand how early it can stop.
> Unless my only option is the traversal API?
> Thanks
> Ian
> On Monday, November 19, 2012 1:22:54 AM UTC+11, Ian Tompsett wrote:
> Thanks Peter
> What I'm after is a reusable way of only picking up the types that are entities that are directly related to TypeA. By directly related, I mean there are no other entities between the type in question and TypeA.
> If you try the following
> start t1=node(5) > match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 > return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : n.name)
> You'll see the TypeAF appears in the list. You've managed to filter out by basically knowing the results up front (ie the Limit 3). What I'd like to do is filter it out by detecting that there is another entity (TypeAD) between TypeAF and TypeA.
> In the cypher query I've played around with trying a second match clause to exclude t2 values that satisfy the first match, but do not have an entity between them and t1. So that would exclude TypeAF in our example.
> Hope that makes sense? Any ideas how to cypher that? Something like
> Something like:
> start t1=node(5) > match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 ,
> p2=t2<-[:TypeUsedByType*1..]-t3<-[:TypeUsedByType]-t1, p3=t3-[:Entity]-e2
> where p1 exists but p2 and p3 don't???
> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : n.name)
> I think I'm looking for an outer join that fails sort of concept here, to use SQL parlance.
> Thanks
> On Sunday, November 18, 2012 2:21:14 AM UTC+11, Peter Neubauer wrote:
> Ian,
> I am not sure I am totally with you, but this query gives you the results you want:
> start t1=node(5) > match p=t1-[:TypeUsedByType*1..]-t2-[:Entity]->e1 > return t1.name, e1.name,t2.name, length(p) as l > order by l asc > limit 3
> On Fri, Nov 16, 2012 at 6:09 AM, Ian Tompsett <iantom...@gmail.com> wrote:
> OK, I think I've created a suitable (albeit very small) test case at http://tinyurl.com/chyu863
> Not surprisingly this returns queries pretty quickly, but should be enough to get the structure in place. I've added some noise in there to include some paths that shouldn't be followed.
> Hopefully it's clear enough - TypeA is the start point (node 4) and it's an entity. It has 3 direct type relationships - to TypeB, Type AC and Type AD. And then there's downstream type relationships, some of which are known entity types (reflected by both an EntityTypeId property, and a relationship back to the EntityType nodes). TypeA has entity relationships to 2 and 3, but not 4 - as that's via TypeAD
> So initially what I'd like is the most efficient cypher query for detecting the direct entity relationships for TypeA - EntityType2 and EntityType3 in my example. That is, I'd like to see the following relationships returned
> TypeA-TypeD
> TypeA-TypeAC
> TypeA-TypeAD
> Thanks again for your help
> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
> Thanks Peter
> I'll have a look at the best way to get a sample for you - I should be able to choose a subset DLLs that still have the appropriate depth. I haven't played with the console, and my initial data load was done via a bit of gremlin from within neo4jclient so that might take me a little while. There'll also be a timezone lag too - I'm in Australia.
> In the meantime, re "the first down each path": my cypher queries return a (large) number of paths. Each of those paths follows a tree of type dependencies. Within each of those paths some of the nodes (ie types) will be my known entities. So far each path, I only want the path down as far as the first matching entity - even though the path could extend much further.
> For example given a single path of type usage: A->uses->B->uses->C->uses->D->uses->E->uses->F->uses->G->uses->H
> And say A, D and F are known entity types, when starting from A I would like my query to return the path to D, without the need to get to F. I would then create a direct relationship between A and D, but no direct relationship between A and F. Separately I would create a relationship between D and F.
> So far I've only managed to get my queries to return all of the paths in their entirety and then perform the filtering afterwards (using a where clause). This means that all of the paths are hugely deep and have a lot of repeated information.
> Hope that makes it a bit clearer.
> Thanks
> Ian
> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
> Ian,
> I think we can figure this out. Could you do a samll-ish sample over at http://console.neo4j.org/usage.html, so we have a dataset to play with? Then, we could reason about this. To start with, what do you mean by "the first one down each path"? In your example (in the console) what would that path be?
> Have you had a chance to look at this? I'm getting concerned that the
> answer to my initial question is "no" - especially given Jens' comment.
> Whilst I do have a deep hierarchy to contend with, I don't need to look at
> it all, I'd just like to find a way of getting my cypher query to
> understand how early it can stop.
> Unless my only option is the traversal API?
> Thanks
> Ian
> On Monday, November 19, 2012 1:22:54 AM UTC+11, Ian Tompsett wrote:
>> Thanks Peter
>> What I'm after is a reusable way of only picking up the types that are
>> entities that are directly related to TypeA. By directly related, I mean
>> there are no other entities between the type in question and TypeA.
>> If you try the following
>> start t1=node(5)
>> match p=t1-[:TypeUsedByType*1..]->**t2-[:Entity]->e1
>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p)
>> : n.name)
>> You'll see the TypeAF appears in the list. You've managed to filter out
>> by basically knowing the results up front (ie the Limit 3). What I'd like
>> to do is filter it out by detecting that there is another entity (TypeAD)
>> between TypeAF and TypeA.
>> In the cypher query I've played around with trying a second match clause
>> to exclude t2 values that satisfy the first match, but do not have an
>> entity between them and t1. So that would exclude TypeAF in our example.
>> Hope that makes sense? Any ideas how to cypher that? Something like
>> Something like:
>> start t1=node(5)
>> match p=t1-[:TypeUsedByType*1..]->**t2-[:Entity]->e1 ,
>> p2=t2<-[:TypeUsedByType*1..]-**t3<-[:TypeUsedByType]-t1,
>> p3=t3-[:Entity]-e2
>> where p1 exists but p2 and p3 don't???
>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p)
>> : n.name)
>> I think I'm looking for an outer join that fails sort of concept here, to
>> use SQL parlance.
>> Thanks
>> On Sunday, November 18, 2012 2:21:14 AM UTC+11, Peter Neubauer wrote:
>>> Ian,
>>> I am not sure I am totally with you, but this query gives you the
>>> results you want:
>>> start t1=node(5)
>>> match p=t1-[:TypeUsedByType*1..]-t2-**[:Entity]->e1
>>> return t1.name, e1.name,t2.name, length(p) as l
>>> order by l asc
>>> limit 3
>>> On Fri, Nov 16, 2012 at 6:09 AM, Ian Tompsett <iantom...@gmail.com>wrote:
>>>> OK, I think I've created a suitable (albeit very small) test case at
>>>> http://tinyurl.com/chyu863
>>>> Not surprisingly this returns queries pretty quickly, but should be
>>>> enough to get the structure in place. I've added some noise in there to
>>>> include some paths that shouldn't be followed.
>>>> Hopefully it's clear enough - TypeA is the start point (node 4) and
>>>> it's an entity. It has 3 direct type relationships - to TypeB, Type AC and
>>>> Type AD. And then there's downstream type relationships, some of which are
>>>> known entity types (reflected by both an EntityTypeId property, and a
>>>> relationship back to the EntityType nodes). TypeA has entity relationships
>>>> to 2 and 3, but not 4 - as that's via TypeAD
>>>> So initially what I'd like is the most efficient cypher query for
>>>> detecting the direct entity relationships for TypeA - EntityType2 and
>>>> EntityType3 in my example. That is, I'd like to see the following
>>>> relationships returned
>>>> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>>>>> Thanks Peter
>>>>> I'll have a look at the best way to get a sample for you - I should be
>>>>> able to choose a subset DLLs that still have the appropriate depth. I
>>>>> haven't played with the console, and my initial data load was done via a
>>>>> bit of gremlin from within neo4jclient so that might take me a little
>>>>> while. There'll also be a timezone lag too - I'm in Australia.
>>>>> In the meantime, re "the first down each path": my cypher queries
>>>>> return a (large) number of paths. Each of those paths follows a tree of
>>>>> type dependencies. Within each of those paths some of the nodes (ie types)
>>>>> will be my known entities. So far each path, I only want the path down as
>>>>> far as the first matching entity - even though the path could extend much
>>>>> further.
>>>>> For example given a single path of type usage: *A*
>>>>> ->uses->B->uses->C->uses->*D*-****>uses->E->uses->*F*->uses->G->**us**
>>>>> es->H
>>>>> And say A, D and F are known entity types, when starting from A I
>>>>> would like my query to return the path to D, without the need to get to F.
>>>>> I would then create a direct relationship between A and D, but no direct
>>>>> relationship between A and F. Separately I would create a relationship
>>>>> between D and F.
>>>>> So far I've only managed to get my queries to return all of the paths
>>>>> in their entirety and then perform the filtering afterwards (using a where
>>>>> clause). This means that all of the paths are hugely deep and have a lot of
>>>>> repeated information.
>>>>> Hope that makes it a bit clearer.
>>>>> Thanks
>>>>> Ian
>>>>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>>>>> Ian,
>>>>>> I think we can figure this out. Could you do a samll-ish sample over
>>>>>> at http://console.neo4j.org/**us**age.html<http://console.neo4j.org/usage.html>,
>>>>>> so we have a dataset to play with? Then, we could reason about this. To
>>>>>> start with, what do you mean by "the first one down each path"? In your
>>>>>> example (in the console) what would that path be?
That looks promising, although I haven't had a chance to try it with my real set of data (I'm not working today) to guage performance. I'll let you know.
On Wednesday, 21 November 2012 10:22:32 UTC+11, Michael Hunger wrote:
> Ian,
> I think you want to filter paths, depending on the nodes that are in there:
> something like this:
> start t1=node(5) > match p=t1-[:TypeUsedByType*1..]->t2,t2-[:Entity]->e1
> where SINGLE(n in tail(nodes(p)) > where has(n.EntityTypeId))
> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) : > n.name)
> I check if there is only a single node (aka t2) that has an EntityType > property > tail(nodes(p)) because otherwise t1 would also be in there. > and e1 also has a property EntityType so I split the path :)
> Have you had a chance to look at this? I'm getting concerned that the > answer to my initial question is "no" - especially given Jens' comment.
> Whilst I do have a deep hierarchy to contend with, I don't need to look at > it all, I'd just like to find a way of getting my cypher query to > understand how early it can stop.
> Unless my only option is the traversal API?
> Thanks > Ian
> On Monday, November 19, 2012 1:22:54 AM UTC+11, Ian Tompsett wrote:
>> Thanks Peter
>> What I'm after is a reusable way of only picking up the types that are >> entities that are directly related to TypeA. By directly related, I mean >> there are no other entities between the type in question and TypeA.
>> If you try the following
>> start t1=node(5) >> match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 >> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >> : n.name)
>> You'll see the TypeAF appears in the list. You've managed to filter out >> by basically knowing the results up front (ie the Limit 3). What I'd like >> to do is filter it out by detecting that there is another entity (TypeAD) >> between TypeAF and TypeA.
>> In the cypher query I've played around with trying a second match clause >> to exclude t2 values that satisfy the first match, but do not have an >> entity between them and t1. So that would exclude TypeAF in our example.
>> Hope that makes sense? Any ideas how to cypher that? Something like
>> Something like: >> start t1=node(5) >> match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 , >> p2=t2<-[:TypeUsedByType*1..]-t3<-[:TypeUsedByType]-t1, >> p3=t3-[:Entity]-e2 >> where p1 exists but p2 and p3 don't??? >> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >> : n.name)
>> I think I'm looking for an outer join that fails sort of concept here, to >> use SQL parlance.
>> Thanks
>> On Sunday, November 18, 2012 2:21:14 AM UTC+11, Peter Neubauer wrote:
>>> Ian, >>> I am not sure I am totally with you, but this query gives you the >>> results you want:
>>> start t1=node(5) >>> match p=t1-[:TypeUsedByType*1..]-t2-[:Entity]->e1 >>> return t1.name, e1.name,t2.name, length(p) as l >>> order by l asc >>> limit 3
>>> On Fri, Nov 16, 2012 at 6:09 AM, Ian Tompsett <iantom...@gmail.com>wrote:
>>>> OK, I think I've created a suitable (albeit very small) test case at >>>> http://tinyurl.com/chyu863
>>>> Not surprisingly this returns queries pretty quickly, but should be >>>> enough to get the structure in place. I've added some noise in there to >>>> include some paths that shouldn't be followed.
>>>> Hopefully it's clear enough - TypeA is the start point (node 4) and >>>> it's an entity. It has 3 direct type relationships - to TypeB, Type AC and >>>> Type AD. And then there's downstream type relationships, some of which are >>>> known entity types (reflected by both an EntityTypeId property, and a >>>> relationship back to the EntityType nodes). TypeA has entity relationships >>>> to 2 and 3, but not 4 - as that's via TypeAD
>>>> So initially what I'd like is the most efficient cypher query for >>>> detecting the direct entity relationships for TypeA - EntityType2 and >>>> EntityType3 in my example. That is, I'd like to see the following >>>> relationships returned
>>>> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>>>>> Thanks Peter
>>>>> I'll have a look at the best way to get a sample for you - I should be >>>>> able to choose a subset DLLs that still have the appropriate depth. I >>>>> haven't played with the console, and my initial data load was done via a >>>>> bit of gremlin from within neo4jclient so that might take me a little >>>>> while. There'll also be a timezone lag too - I'm in Australia.
>>>>> In the meantime, re "the first down each path": my cypher queries >>>>> return a (large) number of paths. Each of those paths follows a tree of >>>>> type dependencies. Within each of those paths some of the nodes (ie types) >>>>> will be my known entities. So far each path, I only want the path down as >>>>> far as the first matching entity - even though the path could extend much >>>>> further.
>>>>> For example given a single path of type usage: *A* >>>>> ->uses->B->uses->C->uses->*D*-**>uses->E->uses->*F*->uses->G->** >>>>> uses->H
>>>>> And say A, D and F are known entity types, when starting from A I >>>>> would like my query to return the path to D, without the need to get to F. >>>>> I would then create a direct relationship between A and D, but no direct >>>>> relationship between A and F. Separately I would create a relationship >>>>> between D and F.
>>>>> So far I've only managed to get my queries to return all of the paths >>>>> in their entirety and then perform the filtering afterwards (using a where >>>>> clause). This means that all of the paths are hugely deep and have a lot of >>>>> repeated information.
>>>>> Hope that makes it a bit clearer.
>>>>> Thanks >>>>> Ian >>>>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>>>>> Ian, >>>>>> I think we can figure this out. Could you do a samll-ish sample over >>>>>> at http://console.neo4j.org/**usage.html<http://console.neo4j.org/usage.html>, >>>>>> so we have a dataset to play with? Then, we could reason about this. To >>>>>> start with, what do you mean by "the first one down each path"? In your >>>>>> example (in the console) what would that path be?
Firstly, apologies if my reply gets duplicated - I posted a reply (or at least thought I did) but it doesn't seem to have appeared.
I had a little look at gremlin - enough to think it has potential. But it's a bit foreign to me so I haven't quite got my head around how to try it. Don't suppose you've any examples doing something similar that you could point me at? (Gremlin isn't the only lazy one).
On Wednesday, 21 November 2012 09:05:44 UTC+11, Marko Rodriguez wrote:
> Hi,
> Have you looked at Gremlin? It is lazy and has path and tree operators.
> Good luck, > Marko.
> http://markorodriguez.com > On Nov 20, 2012 3:03 PM, "Ian Tompsett" <iantom...@gmail.com <javascript:>> > wrote:
>> HI Peter (or anyone else..)
>> Have you had a chance to look at this? I'm getting concerned that the >> answer to my initial question is "no" - especially given Jens' comment.
>> Whilst I do have a deep hierarchy to contend with, I don't need to look >> at it all, I'd just like to find a way of getting my cypher query to >> understand how early it can stop.
>> Unless my only option is the traversal API?
>> Thanks >> Ian
>> On Monday, November 19, 2012 1:22:54 AM UTC+11, Ian Tompsett wrote:
>>> Thanks Peter
>>> What I'm after is a reusable way of only picking up the types that are >>> entities that are directly related to TypeA. By directly related, I mean >>> there are no other entities between the type in question and TypeA.
>>> If you try the following
>>> start t1=node(5) >>> match p=t1-[:TypeUsedByType*1..]->**t2-[:Entity]->e1 >>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >>> : n.name)
>>> You'll see the TypeAF appears in the list. You've managed to filter out >>> by basically knowing the results up front (ie the Limit 3). What I'd like >>> to do is filter it out by detecting that there is another entity (TypeAD) >>> between TypeAF and TypeA.
>>> In the cypher query I've played around with trying a second match clause >>> to exclude t2 values that satisfy the first match, but do not have an >>> entity between them and t1. So that would exclude TypeAF in our example.
>>> Hope that makes sense? Any ideas how to cypher that? Something like
>>> Something like: >>> start t1=node(5) >>> match p=t1-[:TypeUsedByType*1..]->**t2-[:Entity]->e1 , >>> p2=t2<-[:TypeUsedByType*1..]-**t3<-[:TypeUsedByType]-t1, >>> p3=t3-[:Entity]-e2 >>> where p1 exists but p2 and p3 don't??? >>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >>> : n.name)
>>> I think I'm looking for an outer join that fails sort of concept here, >>> to use SQL parlance.
>>> Thanks
>>> On Sunday, November 18, 2012 2:21:14 AM UTC+11, Peter Neubauer wrote:
>>>> Ian, >>>> I am not sure I am totally with you, but this query gives you the >>>> results you want:
>>>> start t1=node(5) >>>> match p=t1-[:TypeUsedByType*1..]-t2-**[:Entity]->e1 >>>> return t1.name, e1.name,t2.name, length(p) as l >>>> order by l asc >>>> limit 3
>>>> On Fri, Nov 16, 2012 at 6:09 AM, Ian Tompsett <iantom...@gmail.com>wrote:
>>>>> OK, I think I've created a suitable (albeit very small) test case at >>>>> http://tinyurl.com/chyu863
>>>>> Not surprisingly this returns queries pretty quickly, but should be >>>>> enough to get the structure in place. I've added some noise in there to >>>>> include some paths that shouldn't be followed.
>>>>> Hopefully it's clear enough - TypeA is the start point (node 4) and >>>>> it's an entity. It has 3 direct type relationships - to TypeB, Type AC and >>>>> Type AD. And then there's downstream type relationships, some of which are >>>>> known entity types (reflected by both an EntityTypeId property, and a >>>>> relationship back to the EntityType nodes). TypeA has entity relationships >>>>> to 2 and 3, but not 4 - as that's via TypeAD
>>>>> So initially what I'd like is the most efficient cypher query for >>>>> detecting the direct entity relationships for TypeA - EntityType2 and >>>>> EntityType3 in my example. That is, I'd like to see the following >>>>> relationships returned
>>>>> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>>>>>> Thanks Peter
>>>>>> I'll have a look at the best way to get a sample for you - I should >>>>>> be able to choose a subset DLLs that still have the appropriate depth. I >>>>>> haven't played with the console, and my initial data load was done via a >>>>>> bit of gremlin from within neo4jclient so that might take me a little >>>>>> while. There'll also be a timezone lag too - I'm in Australia.
>>>>>> In the meantime, re "the first down each path": my cypher queries >>>>>> return a (large) number of paths. Each of those paths follows a tree of >>>>>> type dependencies. Within each of those paths some of the nodes (ie types) >>>>>> will be my known entities. So far each path, I only want the path down as >>>>>> far as the first matching entity - even though the path could extend much >>>>>> further.
>>>>>> For example given a single path of type usage: *A* >>>>>> ->uses->B->uses->C->uses->*D*-****>uses->E->uses->*F*->uses->G->**us* >>>>>> *es->H
>>>>>> And say A, D and F are known entity types, when starting from A I >>>>>> would like my query to return the path to D, without the need to get to F. >>>>>> I would then create a direct relationship between A and D, but no direct >>>>>> relationship between A and F. Separately I would create a relationship >>>>>> between D and F.
>>>>>> So far I've only managed to get my queries to return all of the paths >>>>>> in their entirety and then perform the filtering afterwards (using a where >>>>>> clause). This means that all of the paths are hugely deep and have a lot of >>>>>> repeated information.
>>>>>> Hope that makes it a bit clearer.
>>>>>> Thanks >>>>>> Ian >>>>>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>>>>>> Ian, >>>>>>> I think we can figure this out. Could you do a samll-ish sample over >>>>>>> at http://console.neo4j.org/**us**age.html<http://console.neo4j.org/usage.html>, >>>>>>> so we have a dataset to play with? Then, we could reason about this. To >>>>>>> start with, what do you mean by "the first one down each path"? In your >>>>>>> example (in the console) what would that path be?
Well I've finally managed to try it with my real data. It worked out for a simple case - a type that had one Entity related to it directly. That type wasn't using many others. Then I went a little up the type usage hierarchy, but by no means to the top, and it collapsed in a heap. I tried it once and went to lunch (after it had been going for about half an hour). When I came back it had stopped with an "Invalid Query" error (why it couldn't tell me that straight away I don't quite understand). I tried again and several hours later still no result.
The type I chose used 5 entities directly (detected by leaving out the '*1..' from your query), but I don't know how deep the hierarchy would be underneath - looking at the hierarchy in an unbounded manner like that seems a recipe for failure.
This I'm doing via the Data browser web interface.
I used your suggested query verbatim (apart from changing the start node).
I think you now understand my data requirements, do you think there is another way I should structure things (eg create different relationships) that would help?
Or perhaps I'm faced with the choice of trying to work out how to do this using Gremlin or just accepting that this isn't Neo4J's thing and looking elsewhere.
On Wednesday, 21 November 2012 23:13:52 UTC+11, Ian Tompsett wrote:
> Thanks Michael
> That looks promising, although I haven't had a chance to try it with my > real set of data (I'm not working today) to guage performance. I'll let you > know.
> Ian
> On Wednesday, 21 November 2012 10:22:32 UTC+11, Michael Hunger wrote:
>> Ian,
>> I think you want to filter paths, depending on the nodes that are in >> there:
>> something like this:
>> start t1=node(5) >> match p=t1-[:TypeUsedByType*1..]->t2,t2-[:Entity]->e1
>> where SINGLE(n in tail(nodes(p)) >> where has(n.EntityTypeId))
>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >> : n.name)
>> I check if there is only a single node (aka t2) that has an EntityType >> property >> tail(nodes(p)) because otherwise t1 would also be in there. >> and e1 also has a property EntityType so I split the path :)
>> Have you had a chance to look at this? I'm getting concerned that the >> answer to my initial question is "no" - especially given Jens' comment.
>> Whilst I do have a deep hierarchy to contend with, I don't need to look >> at it all, I'd just like to find a way of getting my cypher query to >> understand how early it can stop.
>> Unless my only option is the traversal API?
>> Thanks >> Ian
>> On Monday, November 19, 2012 1:22:54 AM UTC+11, Ian Tompsett wrote:
>>> Thanks Peter
>>> What I'm after is a reusable way of only picking up the types that are >>> entities that are directly related to TypeA. By directly related, I mean >>> there are no other entities between the type in question and TypeA.
>>> If you try the following
>>> start t1=node(5) >>> match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 >>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >>> : n.name)
>>> You'll see the TypeAF appears in the list. You've managed to filter out >>> by basically knowing the results up front (ie the Limit 3). What I'd like >>> to do is filter it out by detecting that there is another entity (TypeAD) >>> between TypeAF and TypeA.
>>> In the cypher query I've played around with trying a second match clause >>> to exclude t2 values that satisfy the first match, but do not have an >>> entity between them and t1. So that would exclude TypeAF in our example.
>>> Hope that makes sense? Any ideas how to cypher that? Something like
>>> Something like: >>> start t1=node(5) >>> match p=t1-[:TypeUsedByType*1..]->t2-[:Entity]->e1 , >>> p2=t2<-[:TypeUsedByType*1..]-t3<-[:TypeUsedByType]-t1, >>> p3=t3-[:Entity]-e2 >>> where p1 exists but p2 and p3 don't??? >>> return t1.name, e1.name,t2.name, length(p) as l, extract (n in nodes(p) >>> : n.name)
>>> I think I'm looking for an outer join that fails sort of concept here, >>> to use SQL parlance.
>>> Thanks
>>> On Sunday, November 18, 2012 2:21:14 AM UTC+11, Peter Neubauer wrote:
>>>> Ian, >>>> I am not sure I am totally with you, but this query gives you the >>>> results you want:
>>>> start t1=node(5) >>>> match p=t1-[:TypeUsedByType*1..]-t2-[:Entity]->e1 >>>> return t1.name, e1.name,t2.name, length(p) as l >>>> order by l asc >>>> limit 3
>>>> On Fri, Nov 16, 2012 at 6:09 AM, Ian Tompsett <iantom...@gmail.com>wrote:
>>>>> OK, I think I've created a suitable (albeit very small) test case at >>>>> http://tinyurl.com/chyu863
>>>>> Not surprisingly this returns queries pretty quickly, but should be >>>>> enough to get the structure in place. I've added some noise in there to >>>>> include some paths that shouldn't be followed.
>>>>> Hopefully it's clear enough - TypeA is the start point (node 4) and >>>>> it's an entity. It has 3 direct type relationships - to TypeB, Type AC and >>>>> Type AD. And then there's downstream type relationships, some of which are >>>>> known entity types (reflected by both an EntityTypeId property, and a >>>>> relationship back to the EntityType nodes). TypeA has entity relationships >>>>> to 2 and 3, but not 4 - as that's via TypeAD
>>>>> So initially what I'd like is the most efficient cypher query for >>>>> detecting the direct entity relationships for TypeA - EntityType2 and >>>>> EntityType3 in my example. That is, I'd like to see the following >>>>> relationships returned
>>>>> On Friday, 16 November 2012 07:53:37 UTC+11, Ian Tompsett wrote:
>>>>>> Thanks Peter
>>>>>> I'll have a look at the best way to get a sample for you - I should >>>>>> be able to choose a subset DLLs that still have the appropriate depth. I >>>>>> haven't played with the console, and my initial data load was done via a >>>>>> bit of gremlin from within neo4jclient so that might take me a little >>>>>> while. There'll also be a timezone lag too - I'm in Australia.
>>>>>> In the meantime, re "the first down each path": my cypher queries >>>>>> return a (large) number of paths. Each of those paths follows a tree of >>>>>> type dependencies. Within each of those paths some of the nodes (ie types) >>>>>> will be my known entities. So far each path, I only want the path down as >>>>>> far as the first matching entity - even though the path could extend much >>>>>> further.
>>>>>> For example given a single path of type usage: *A* >>>>>> ->uses->B->uses->C->uses->*D*-**>uses->E->uses->*F*->uses->G->** >>>>>> uses->H
>>>>>> And say A, D and F are known entity types, when starting from A I >>>>>> would like my query to return the path to D, without the need to get to F. >>>>>> I would then create a direct relationship between A and D, but no direct >>>>>> relationship between A and F. Separately I would create a relationship >>>>>> between D and F.
>>>>>> So far I've only managed to get my queries to return all of the paths >>>>>> in their entirety and then perform the filtering afterwards (using a where >>>>>> clause). This means that all of the paths are hugely deep and have a lot of >>>>>> repeated information.
>>>>>> Hope that makes it a bit clearer.
>>>>>> Thanks >>>>>> Ian >>>>>> On Friday, November 16, 2012 2:29:04 AM UTC+11, Peter Neubauer wrote:
>>>>>>> Ian, >>>>>>> I think we can figure this out. Could you do a samll-ish sample over >>>>>>> at http://console.neo4j.org/**usage.html<http://console.neo4j.org/usage.html>, >>>>>>> so we have a dataset to play with? Then, we could reason about this. To >>>>>>> start with, what do you mean by "the first one down each path"? In your >>>>>>> example (in the console) what would that path be?