Question about Relationships number and their influence on algorithms

36 views
Skip to first unread message

Angelo Immediata

unread,
May 19, 2014, 3:04:51 AM5/19/14
to ne...@googlegroups.com
Hi there

With my colleague, we are are buillding a route system by using neo4j 2.0.3; so we are suing A* and Dijkstra algorithms in order to calculate the shortest path,
I was wondering if the relationships number can affect the algorithm perfomance. I mean, we have a graph with around 1 million (or more) of nodes and 50 million of relationships. We have several types of relationship; specifically we have:
  • relationships for cars: the most of relationships are of this type
  • relationships for bikes
  • relationships for pedestrian
  • relationships for public transports
When we execute Dijkstra and/or A* we can specify, in our PathExpander, the type of the relationships we want to consider during the traverser, so, my sensation is that the relationships number should not affect algorithm performance since we will sparsely (almost never) consider all the relationships types. Am I right?

Thak you
Angelo

Angelo Immediata

unread,
May 19, 2014, 6:40:42 AM5/19/14
to ne...@googlegroups.com
Hi there

We have some questions about A* Implementation; we were thinking to use the TraversalAStar, since we really like the traversal framework and it's simpler to wotk on it
By the way, also by using the not traversal AStar (the classical implementatin), as estimateevaluator we used the CommonEvaluators.geoEstimateEvaluator; now in our routing calculation, we noticed some strange paths (we checked for several points and we compared the result with google and other routing systems)
We, then, created a custom estimateevaluator; we simply created the following:
 EstimateEvaluator<Double> estimateEvaluator = new EstimateEvaluator<Double>()
       
{
           
public Double getCost( final Node node, final Node goal )
           
{
               
double dx = (Double) node.getProperty(OSMAttribute.LONGITUDE_PROPERTY ) - (Double) goal.getProperty( OSMAttribute.LONGITUDE_PROPERTY );
               
double dy = (Double) node.getProperty( OSMAttribute.LATITUDE_PROPERTY ) - (Double) goal.getProperty( OSMAttribute.LATITUDE_PROPERTY );
               
double result = Math.sqrt( Math.pow( dx, 2 ) + Math.pow( dy, 2 ) );
               
return result;
           
}
       
};

Well by using this evaluator the returned path are pretty similar to the ones returned by google and the other softwares. Does anybody know the reason of this behaviour? Are we missing anything?

Moreover.....may you tell us the maturity of the TraversalAStar implementation? In the source code we saw that it's in experimental status but for the few tests we did it seems to work good....is it possible to use it in production environment?

Thank you
Angelo

Michael Hunger

unread,
May 19, 2014, 7:22:47 AM5/19/14
to ne...@googlegroups.com
Is this for a hot dataset, or one that has to be fetched from disk?
How many rels do you usually have per node?


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Angelo Immediata

unread,
May 19, 2014, 8:30:23 AM5/19/14
to ne...@googlegroups.com
Hi Michael

For hot dataset do you mean a dataset stored in memory? Well we tried both for in memory dataset and for dataset on the disk
Well I don't know exactly how many relationships can have a node....the worst case is the each node contains 4 relationship each one with direction BOTH


--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/YtOt_rNy9sA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.

Mattias Persson

unread,
May 26, 2014, 5:24:40 PM5/26/14
to Neo4j Development
Hi,

About the number of relationships per node and how that affects things: it affects the load performance of a node the first time its relationships are touched after the node being freshly loaded into memory (either first time, or after eviction from cache). Iterating over all of a certain type, as in your case, requires all relationships to be loaded for that node. In 2.1 there will be a store format change where only relationships of the requested type and direction are loaded. Nodes that go over a certain threshold number of relationships will take advantage of such a representation on disk, so in your case where a worst case is 4 relationships, or more specifically 1 relationship of each type, there will no gain by that store format change.

The geo estimate evaluator in neo4j takes into consideration the fact that the earth aint flat, but if a simpler version works, then by all means :)

I think it's safe to say that the TraversalAStar isn't really experimental anymore, it's probably a notice left in there by mistake.

Mattias Persson
Neo4j Hacker at Neo Technology

Michael Hunger

unread,
May 26, 2014, 5:25:28 PM5/26/14
to ne...@googlegroups.com
Actually there is no direction both in neo4j's datastore, you can use BOTH when querying.

Angelo Immediata

unread,
May 27, 2014, 4:14:58 AM5/27/14
to ne...@googlegroups.com
Hi Mattias, hi Michael
Thank you for answering me

@Mattias: I was sure about what you wrote about relationships and influence on algorithms, but now I'm confident that I was right :). But, on the other side, now I'm totally confused on the reason why the AStar algorithm has so poor performance (at least in my case); on Stackoverflow I had a rapid chat with Stefan and he was surprised too about this poor performance....and honestly I don't know what else to try to improve performance (I posted a topic where you can find a sample project on github and test what I tested)
@Michael: I know that BOTH direction is for querying and/or traverser...I was wrong in writing :) 


Angelo
Il giorno lunedì 19 maggio 2014 09:04:51 UTC+2, Angelo Immediata ha scritto:

Craig Taverner

unread,
May 27, 2014, 5:33:00 AM5/27/14
to ne...@googlegroups.com
Hi,

I just took a peek at the algorthm for the cost at https://github.com/neo4j/neo4j/blob/master/community/graph-algo/src/main/java/org/neo4j/graphalgo/impl/util/GeoEstimateEvaluator.java.

I did not double-check the maths, but this does not look like a distance function over a sphere. This looks like a 3D pythagoras, which means it does not fully take into account the curvature of the earth. But in principle, for most cases, it should work exactly like the 2D pythagoras that Angelo coded himself.

The two main differences between Angelos code and the GeoEstimateEvaluator are:
  • GeoEstimateEvaluator does 3D instead of 2D. This will give better results if parts of the route are noticably closer to the poles than other parts (or some routes go closer to the poles than others). The 2D evaluator will underestimate the costs of routes closer to the poles. For most local routing, the difference should not matter.
  • GeoEstimateEvaluator converts the cost to meters. Angelo does not do anything with the units. I think this should not affect the algorithm in any way. One advantage of Angelo's version is that it does not matter what units are in the nodes. In GeoEstimateEvaluator it assumes that lat and lon are in decimal degrees.
Some suggestions:
  • Double check the maths in GeoEstimateEvaluator. If this is giving strange results, it might be worth checking why.
  • Support alternative units (not everyone uses decimal degrees for location). I think we should not have code that makes assumptions about the units of properties like this. Or at least allow the user to set the units if they are not the expected defaults
  • Consider writing a better distance function that is not pythagoras, but rather distance of the surface of the sphere. This will allow for accurate calculations of much larger distances than the current code will.
Regards, Craig


--
Reply all
Reply to author
Forward
0 new messages