On Wednesday, September 3, 2014 3:29:34 AM UTC-7, Mark Findlater wrote:
I guess that my question as a developer is why I would want the inverse relationship label defined?
With the triple stores I have used there is usually an inference engine which then exposes to queries the union of both the asserted triples (the data that you explicitly added) and the inferred triples (the ones that rules were used to create). The important thing about the inferred triples is that they are ephemeral, meaning that they do not impact the underlying datastore (although you could choose to materialize them), which stops unnecessary bloat (at the cost of processing).
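To make the asserted/inferred split concrete, here is a minimal Python sketch that is not tied to any particular triple store. The rule table `INVERSE_OF` and the functions `inferred`, `query`, and `materialize` are hypothetical names invented for illustration; the only point is that inferred triples are computed at query time and never written back unless you explicitly materialize them.

```python
# Hypothetical rule table: predicate -> its inverse predicate.
INVERSE_OF = {"PARENT_OF": "CHILD_OF", "CHILD_OF": "PARENT_OF"}


def inferred(asserted):
    """Yield triples implied by the inverse rules but not explicitly asserted."""
    for s, p, o in asserted:
        inv = INVERSE_OF.get(p)
        if inv is not None and (o, inv, s) not in asserted:
            yield (o, inv, s)


def query(asserted, predicate):
    """Answer over the union of asserted and inferred triples (nothing is stored)."""
    union = set(asserted) | set(inferred(asserted))
    return sorted(t for t in union if t[1] == predicate)


def materialize(asserted):
    """Opt-in step: persist the inferred triples, trading storage for processing."""
    return set(asserted) | set(inferred(asserted))


store = {("alice", "PARENT_OF", "bob")}
print(query(store, "CHILD_OF"))  # answered from the inferred triple
print(len(store))                # the store itself is unchanged
```

Only `materialize` grows the data; `query` leaves the store untouched, which is the "ephemeral" behaviour described above.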
As these new relationships are sugar which is ultimately bloat would you ever want to persist them? I also suspect that the directional trick you suggest could have serious implications to the performance of the query engine (but that's just a hunch).
The Sibling/Spouse example has limited mileage for me without some clever chaining (e.g. how would I represent cousins? Harden SIBLING and PARENT/SPOUSE relationships and then chain SIBLING->PARENT->SIBLING->CHILD nodes, hardening along the way?) and again I think I would see more value in adding this a level above the underlying store.
On Wednesday, September 3, 2014 3:29:34 AM UTC-7, Mark Findlater wrote:
This is definitely about making the schema/engine smarter about "semantics".
> I guess that my question as a developer is why I would want the inverse relationship label defined?
I can see where a single app written by a single developer or two, who are all the combined schema master, data integrity maintainer, and code author, likely wouldn't need "help" to translate into their schema to query it; they wrote it, and likewise if they want to extend the app, they can just write more code to extend the app; if they need a new relationship, they just invent it. I'd argue that even at that small scale, while it's not "needed", it's still helpful.
> With the triple stores I have used there is usually an inference engine which then exposes to queries the union of both the asserted triples (the data that you explicitly added) and the inferred triples (the ones that rules were used to create). The important thing about the inferred triples is that they are ephemeral, meaning that they do not impact the underlying datastore (although you could choose to materialize them), which stops unnecessary bloat (at the cost of processing).
Well, storage-wise, the db is storing a numeric relationship type identifier, and then separately storing the relationship type label information. So storage-wise, adding multiple relationship type entries for the same type id isn't a problem. The on-disk structure is totally capable of handling this; it's just changing the type of block pointed to in the RelationshipTypeStore from String to String[]. It's the code that loads this from the disk that would need to be updated to understand it's getting a String[] back and not just a String.
I'm now however convinced that updating the code to detect a String versus a String[] in the RelationshipTypeStore is going to be way easier than all the splitting and parsing gymnastics I was thinking of earlier! :) Though I still might encode the "direction" as part of the string.
> As these new relationships are sugar which is ultimately bloat would you ever want to persist them? I also suspect that the directional trick you suggest could have serious implications to the performance of the query engine (but that's just a hunch).
We can query (a)<-[:REL]-(b) for the same performance as (a)-[:REL]->(b), so the only trick is getting the planner to know what's being expressed, and that's what I think the parser's job is.
All relationships in neo4j have a from/to direction by design. The label for that direction is rather arbitrarily created at design time. Being able to define the inverse labels for a relationship type eliminates that required design-time arbitrary selection, and I can't see how it has any performance impact. The planner still knows it's looking for a numeric relationship id (e.g. 62), and what it needs to know about from and to; I suspect most of the heavy lifting on that was created in the parser. From what I've gathered, a "directionless" relationship match actually becomes two matches under the hood, one in each direction. This would just make a similar kind of translation.
> The Sibling/Spouse example has limited mileage for me without some clever chaining (e.g. how would I represent cousins? Harden SIBLING and PARENT/SPOUSE relationships and then chain SIBLING->PARENT->SIBLING->CHILD nodes, hardening along the way?) and again I think I would see more value in adding this a level above the underlying store.
You jumped the gun on me! :) I'm actually working on a "relationships as sub-queries" plan, and [:COUSIN], [:NIECE], [:NEPHEW] are perfect examples of that. I guess I just ought to have simply proposed these named sub-queries as part of the initial discussion. :)
It's cool that you used the PARENT/CHILD pairing in the example (n)-[:PARENT]->(p)-[:SIBLING]->(s)-[:CHILD]->(c). :) I see using the PARENT/CHILD pair as way more straightforward to read/follow than (n)-[:PARENT]->(p)-[:SIBLING]->(s)<-[:PARENT]-(c). :)
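As a rough illustration of the inverse-label idea (several labels sharing one numeric type id, with the direction flipped at parse time), here is a small Python sketch. It is not how neo4j's store or parser is implemented; `RelTypeRegistry`, `Resolved`, `define`, and `resolve` are all made-up names, and the id 62 is just the example number from above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolved:
    type_id: int   # the numeric relationship type id actually stored
    reverse: bool  # True when the label names the inverse direction


class RelTypeRegistry:
    """Hypothetical label table: many labels may point at one type id."""

    def __init__(self):
        self._labels = {}

    def define(self, type_id, forward, inverse=None):
        self._labels[forward] = Resolved(type_id, False)
        if inverse is not None:
            self._labels[inverse] = Resolved(type_id, True)

    def resolve(self, start, label, end):
        """Translate (start)-[:label]->(end) into a stored-direction lookup."""
        r = self._labels[label]
        # An inverse label swaps the endpoints; the stored type id is unchanged.
        return (end, r.type_id, start) if r.reverse else (start, r.type_id, end)


registry = RelTypeRegistry()
registry.define(62, "PARENT", inverse="CHILD")

print(registry.resolve("n", "PARENT", "p"))  # forward label, stored direction
print(registry.resolve("s", "CHILD", "c"))   # inverse label, endpoints swapped
```

Both lookups end up asking for type id 62; only the from/to order differs, which is why (s)-[:CHILD]->(c) and (s)<-[:PARENT]-(c) would reach the planner as the same request.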
So let's talk about how this can be used to create the (n)-[:COUSIN]->(c) query.
This would be a new kind of relationship type. If the Cousin type captured a MATCH query like the one above, then it would be a kindred spirit to a SQL VIEW or FUNCTION as part of the DB schema. Again, it's primarily just creating a syntactic shortcut for a larger query, but it's one that makes the database more comprehensible to those interfacing with it.
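To show what "a relationship type that is really a stored query" might look like, here is a toy Python sketch over an in-memory graph. Nothing here is neo4j API: `EDGES`, `NAMED`, `expand`, and `match` are hypothetical, and CHILD edges are stored explicitly just to keep the example short.

```python
# Toy graph: (start node, relationship type) -> list of end nodes.
EDGES = {
    ("ME", "PARENT"): ["MYDAD"],
    ("MYDAD", "SIBLING"): ["DAD'S BROTHER"],
    ("DAD'S BROTHER", "CHILD"): ["DAD'S BROTHER'S DAUGHTER"],
}

# Hypothetical "view" table: a named type expands to a chain of stored types.
NAMED = {"COUSIN": ["PARENT", "SIBLING", "CHILD"]}


def expand(rel_type):
    """Return the chain of stored types a (possibly named) type stands for."""
    return NAMED.get(rel_type, [rel_type])


def match(start, rel_type):
    """Follow the expanded chain from `start` and return the end nodes."""
    frontier = [start]
    for step in expand(rel_type):
        frontier = [end for node in frontier for end in EDGES.get((node, step), [])]
    return frontier


print(match("ME", "COUSIN"))
```

The caller writes one hop, [:COUSIN], and the expansion into three hops happens before any traversal, much like a SQL VIEW is expanded before the query runs.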
This new relationship type could even be used to create a really powerful result set reduction replacement function. Imagine taking a graph result from the prior query and running the [:COUSIN] relationship replacement function on it.
So if we take a row from the results of the above query, e.g.
(r1) = "ME", [:PARENT], "MYDAD", [:SIBLING], "DAD'S BROTHER", [:CHILD], "DAD'S BROTHER'S DAUGHTER"
and run GraphMapReduce(r1, [:COUSIN]), then anywhere the intermediate nodes match the query provided by [:COUSIN] they get replaced with the relationship [:COUSIN]. Giving us:
(r2) = "ME"-[:COUSIN]->"DAD'S BROTHER'S DAUGHTER"
If the [:COUSIN] type also provided some kind of MAP/REDUCE type functions as part of its description/definition, then it could even build its own properties dynamically from the underlying properties on the intervening nodes.
This would be especially useful if we were to create a "correct" implementation of Cousin, which really means everyone you are directly related to through marriage or parentage. Your third cousin twice removed, for example, is still technically a "Cousin"; the map/reduce function on "Cousin" could take the intervening Path P, count the number of Parent hops involved to get the cousin's "order", then count the balance of the parent links up/down to get the cousin's "removal", and make those both properties part of the "COUSIN" relationship connecting us.
So the idea is that you pass a sub-graph or a result set to a named relationship type (which is actually a query). It operates on each row in the result set; when the row matches the query, the links between the matching endpoint nodes are replaced by the relationship, and the results of the relationship query become properties on the relationship.
So the final result of this would be that we started with this:
(r1) = "ME", [:PARENT], "MYDAD", [:SIBLING], "DAD'S BROTHER", [:CHILD], "DAD'S BROTHER'S DAUGHTER"
and reduced it to this:
(r3) = "ME"-[:COUSIN {order: 1, removed: 0}]->"DAD'S BROTHER'S DAUGHTER"
This, at least in my mind's eye, would involve adding a new field to the relationship type to persist across reboots (though technically it is still just a "String").
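Here is a hedged Python sketch of that row reduction. `reduce_cousin` is a hypothetical stand-in for the GraphMapReduce idea, not an existing function anywhere. One assumption of mine goes beyond the post: "order" is taken as the smaller of the PARENT-hop and CHILD-hop counts around the SIBLING link, and "removed" as their difference.

```python
def reduce_cousin(row):
    """Collapse a row like r1 into a single COUSIN relationship, or None.

    `row` alternates node, relationship type, node, ... as in (r1).
    """
    nodes, rels = row[0::2], row[1::2]
    if "SIBLING" not in rels:
        return None
    pivot = rels.index("SIBLING")
    up, down = rels[:pivot], rels[pivot + 1:]
    # The row must be PARENT hops up, one SIBLING hop, then CHILD hops down.
    if not up or not down or set(up) != {"PARENT"} or set(down) != {"CHILD"}:
        return None
    props = {
        "order": min(len(up), len(down)),      # assumption: smaller hop count
        "removed": abs(len(up) - len(down)),   # balance of up vs. down hops
    }
    return (nodes[0], "COUSIN", props, nodes[-1])


r1 = ["ME", "PARENT", "MYDAD", "SIBLING", "DAD'S BROTHER",
      "CHILD", "DAD'S BROTHER'S DAUGHTER"]
print(reduce_cousin(r1))
```

Rows that do not fit the PARENT/SIBLING/CHILD shape come back as None and would simply be left unreduced.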
hood stuff as I do not know enough about how Neo maintains its indexes, how the query/relationship/object caches work, and whether there is any performance impact of creating twice as many relationships (I expect that if I do not get smarter with my data I would reach the maximum capacity of Neo in 3 years; if I doubled the relationships I suppose I would halve that, but that is very naive!).
Running with a team of developers, I would not expect all or even many of them to be operating on raw queries, and, much as we do not produce inverse methods all over the place (although you could cite Commons StringUtils isEmpty/isNotEmpty type design), I expect people to be comfortable with negation or with applying more complex logical analysis. That is what I meant about exposing the raw query interface to the world, at which point I might expect less of what people are comfortable with!
If you could define the inferred relationships such that the planner could use them (he said :SIBLING but he means :SISTER or :BROTHER) or, as you touched on below, define the rules and then explicitly run a MapReduce type function to expose the data, that sounds very interesting.
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Definitely food for thought, and the ideas definitely have merit! I was once developing an algorithm to identify groups based on clusters of in-common friend links, and as you touched on in your post, naming/identifying these fledgling anonymous groups was one of the first things I had to deal with. I hadn't even considered adding an optimizer/classifier to detect often-matched relationships. I mean, it makes sense: "these two nodes with this intervening path come up often in queries; perhaps we can ask to name it?"
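Just to sketch how such a detector might start out, here is a tiny Python example that counts how often each multi-hop path shape shows up in a query log. The log format (one list of relationship types per matched path) and the name `frequent_patterns` are assumptions made up for this example.

```python
from collections import Counter


def frequent_patterns(query_log, min_count=2):
    """Return multi-hop path shapes seen at least `min_count` times."""
    counts = Counter(tuple(path) for path in query_log)
    return [
        (list(path), n)
        for path, n in counts.most_common()
        if n >= min_count and len(path) > 1  # single hops already have a name
    ]


log = [
    ["PARENT", "SIBLING", "CHILD"],
    ["PARENT"],
    ["PARENT", "SIBLING", "CHILD"],
    ["PARENT", "SIBLING"],
]
print(frequent_patterns(log))
```

Anything this returns is a candidate for "perhaps we can ask to name it?"; the actual naming would still be up to a person.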
The other cool thing you touched on was a practical mechanism for how to track and manage these derived graphs; basically "create a new graph for them!". That's way better (and not to mention easier) than what I was thinking of. I could even see using the hooks mechanism to create and maintain these derived graph links.
As far as applications go, my brain breaks if I think about it too much; the implications are way too staggering. Especially if the thing actually performs even half-way decently! What's also great is that family trees are complicated enough, yet have answers that are thoroughly comprehended enough, to model the "correct" behaviour for a lot of this derived link stuff!
In the meantime, what's fantastic is I think neo4j already has the necessary query infrastructure for expanding these sub-queries at runtime through CYPHER's support of multiple starting points using aggregates from earlier clauses. I'm thinking that's going to help.
Mike