Neo4j A Star algorithm performance questions

477 views
Skip to first unread message

Angelo Immediata

unread,
May 19, 2014, 1:41:05 PM5/19/14
to ne...@googlegroups.com
Hi there

We are using Neo4j 2.0.3 in order to build a routing system. We are using the classical A Start implementation, tough we saw that there is a TraversalAstar implementation but it seems to be in experimental status yet.

By the way, we read an OSM file and create our graph; actually we have around 1 million of points. around 2 million of relationships and around 5 million of properites on relationships; by reading neo4j documentation it seems to me that this DB dimension is really small and it's really well handled by neo4j but in my case this doesn't seem to be real (maybe I'm missing something)

Actually our node creation code is the following one:

 private long createGraphNode(OsmNodeWrapper osmNodeWrapper)
 
{
 
try
 
{
 
Map<String, Object> nodeProps = new HashMap<String, Object>();
 
double x = osmNodeWrapper.getLongitude();
 
double y = osmNodeWrapper.getLatitude();
 nodeProps
.put(OSMAttribute.LATITUDE_PROPERTY, y);
 nodeProps
.put(OSMAttribute.LONGITUDE_PROPERTY, x);
 nodeProps
.put(OSMAttribute.OSM_NODE_ID_PROPERTY, osmNodeWrapper.getOsmNodeId());
 
if (osmNodeWrapper.isTrafficSignal())
 nodeProps
.put(OSMAttribute.TRAFFIC_SIGNAL, true);
 
// Creo il nodo
 
long graphNodeId = inserter.createNode(nodeProps, mainNodeLabel);
 
// aggiungo l'identificativo del nodo sul grafo
 osmNodeWrapper
.setGraphNodeId(graphNodeId);
 
return graphNodeId;
 
}

In our graph, we have only 2 types of relationships:
  • CAR_ONE_WAY_RELATION for one way road with Direction OUTCOMING
  • CAR_BIDIRECTIONAL_WAY_RELATION for bidirectional road with Direction BOTH
This is the relationship creation code:
//Create RelationShip
Map<String, Object> relationProps = new HashMap<String, Object>();
relationProps
.put(OSMAttribute.EDGE_LENGTH_PROPERTY, lunghezzaArco);
....//other properties
long relId = inserter.createRelationship(startNode, endNode, "CAR_ONE_WAY_RELATION", relationProps);

osmWayIdPropertyIndex
.add(relId, relationProps)

We configured the neo4j embedded by using the following parameters:

neostore.nodestore.db.mapped_memory=100M
neostore
.relationshipstore.db.mapped_memory=3G
neostore
.propertystore.db.mapped_memory=100M
neostore
.propertystore.db.strings.mapped_memory=200M
neostore
.propertystore.db.arrays.mapped_memory=50M
cache_type
=strong

As you can see we use a strong cache type; this means we cache all data in memory. Moreover when out application starts we load all nodes and properties in memory; n order to do this we use this code:


 private void loadWholeGraphInMemory() throws PinfGraphServiceException{
 
StopWatch sw = new StopWatch();
 sw
.start();
 
Node start;
 
for ( Node n : ggo.getAllNodes() ) {
 n
.getPropertyKeys();
 
for ( Relationship relationship : n.getRelationships() ) {
 start
= relationship.getStartNode();
 
}
 
}
 
for ( Relationship r : ggo.getAllRelationships() ) {
 start
= r.getStartNode();
 r
.getPropertyKeys();
 
}
 sw
.stop();
 logger
.info("Grafo caricato in "+sw.getLastTaskTimeMillis()+" millis");


 
}

We tried to execute the AStar algorithm between 2 points. The distance between points is around 130 kilometers. In order to execute the algorithm Neo4j takes around 4 seconds for returning a path and it seems to me really too much; if i compare it to other software on the same OSM I have really really really really different performance.
Is there any way to improve the Algorithm performance? Where can we act to have better performance? would be better to use the Traversal AStar? Moreover is there any plan to implement bi-directional AStar and/or Dijkstra?

I guess that with this kind of perfomance, Neo4j is not suitable for my scenario

Thank you
Angelo




Angelo Immediata

unread,
May 20, 2014, 12:39:46 PM5/20/14
to ne...@googlegroups.com
Any tip on how and if i can improve algorithm perfomances?

Antonio Grimaldi

unread,
May 23, 2014, 5:23:11 AM5/23/14
to ne...@googlegroups.com
No one can answer???


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
May 23, 2014, 10:36:14 AM5/23/14
to ne...@googlegroups.com
Hi there

We are still facing the low performance issue we talked in this post.
In order to let you to do some tests, I reproduced a test project based on maven; you can find it here: https://github.com/angeloimm/neo4jAstarTest 
By doing my own tests I had the following results:

  • A* from node 1 to node 2: 1416 millis
  • A* from node 1 to node 300000: 3428 millis
  • A* from node 1 to node 525440: 4128 millis

As you can see, times are really huge; you can find my laptop information in the file LaptopInformation.html generated with hardinfo tool
Are these times normal? Can I improve them?
Thank you
Angelo

Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
May 23, 2014, 10:41:37 AM5/23/14
to ne...@googlegroups.com
Ah I forgot to add that in the configuration file you can see that I configured neo4j ith this settings:

  • nodestore_mapped_memory_size=250M
  • relationshipstore_mapped_memory_size=3G
  • nodestore_propertystore_mapped_memory_size=250M
  • strings_mapped_memory_size=500M
  • arrays_mapped_memory_size=50
  • cache_type=strong

The neo4j version is 2.0.3

Any tips would be really appreciate.

Thank you

Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Davide

unread,
May 24, 2014, 9:06:15 AM5/24/14
to ne...@googlegroups.com
On Mon, May 19, 2014 at 7:41 PM, Angelo Immediata <ange...@gmail.com> wrote:
>
> We tried to execute the AStar algorithm between 2 points. The distance between points is around 130 kilometers. In order to execute the algorithm > Neo4j takes around 4 seconds for returning a path and it seems to me really too much; if i compare it to other software on the same OSM I have
> really really really really different performance.

Other software usually don't use just A*, they also implement
Contraction hierarchies
(http://en.wikipedia.org/wiki/Contraction_hierarchies), you should try
to do the same on Neo4j.

--
Davide Savazzi

Angelo Immediata

unread,
May 24, 2014, 9:42:59 AM5/24/14
to ne...@googlegroups.com
Hi Davide

Thank you for the feedback; I know about Contraciton Herarchies (CH); sadly this is not suitable for my scenario becouse my customer wants that the routing system takes care of some events and it must be able in providing the best route in that moment and by excluding events; by using CH, every time there is an event I should re-create the contracted graph and this can be a very long operation and it's not the best solution for me.
By the way I saw softwares where you can choose to use only Dijkstra (ot classical or bi-directional) or A* (both classical and bi-directional); I loaded the the same OSM file and performance are really really really different; for the same points the longest time (from point 0 to poin 525440) was 0.3 millis...
I really can't understand this difference of performances; moreover for Neo4j I use a cache_type strong and I load the full graph in memory when my server starts.....  


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
May 26, 2014, 12:10:01 PM5/26/14
to ne...@googlegroups.com
Any idea about these low performances?


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Craig Taverner

unread,
May 27, 2014, 5:51:22 AM5/27/14
to ne...@googlegroups.com
Reading your previous messages I see a comment that "A* from node 1 to node 2: 1416 millis". Are you saying that A* from a node to a directly connected node is taking 1416ms? That does seem insanely long. I could only imagine two possible reasons:
  • Your JVM is using OS swap space (ie. heap is too big, due to the strong cache. Java performance really sucks if it uses OS swap space, consider not caching so much, or add more RAM). 
  • Serious bug in A* or use of it. I'm not familiar with the implementation of A* in neo4j, so I cannot comment further, but I assume running a profiler on the code might highlight something useful.
Regarding contraction hierarchies. I thought they would only need to be recalculated if the road network changed. What kind of events are you getting that would change the road network and require this recalculation?



--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Angelo Immediata

unread,
May 27, 2014, 6:01:54 AM5/27/14
to ne...@googlegroups.com
Hi Craig

thank you for answering to me

Actually I tested the code with this JVM settings: -Xms750m -Xmx4G  -XX:+UseConcMarkSweepGC (on my laptop I have 8G of RAM with Ubuntu OS)
I used this kind of settings for neo4j:
nodestore_mapped_memory_size=250M
relationshipstore_mapped_memory_size
=3G
nodestore_propertystore_mapped_memory_size
=250M
strings_mapped_memory_size
=500M
arrays_mapped_memory_size
=50
cache_type
=strong

Note these are values in a properties file; I use them by this code:

PropertiesLoader pl = PropertiesLoader.getInstance();
 
GraphDatabaseFactory gdbf = new GraphDatabaseFactory();
 
GraphDatabaseBuilder gdbb = gdbf.newEmbeddedDatabaseBuilder(pl.getNeo4jDbPath());
 gdbb
.setConfig(GraphDatabaseSettings.nodestore_mapped_memory_size, pl.getNodestoreMappedMemorySize());
 gdbb
.setConfig(GraphDatabaseSettings.relationshipstore_mapped_memory_size, pl.getRelationshipstoreMappedMemorySize());
 gdbb
.setConfig(GraphDatabaseSettings.nodestore_propertystore_mapped_memory_size, pl.getNodestorePropertystoreMappedMemorySize());
 gdbb
.setConfig(GraphDatabaseSettings.strings_mapped_memory_size, pl.getStringsMappedMemorySize());
 gdbb
.setConfig(GraphDatabaseSettings.arrays_mapped_memory_size, pl.getArraysMappedMemorySize());
 gdbb
.setConfig(GraphDatabaseSettings.cache_type,pl.getCacheType());
 gdbb
.setConfig(GraphDatabaseSettings.use_memory_mapped_buffers, "true");
 graphDbService
= gdbb.newGraphDatabase();

And, as you said Craig, I'm insisting on this issue becouse the AStar times seem to me really too much (also for 2 close nodes the ones with ID 1 and ID 2)
Actually I really don't know what else to try....

Angelo

Angelo Immediata

unread,
May 27, 2014, 6:08:27 AM5/27/14
to ne...@googlegroups.com
Ah Craig...two last info.....:

I was wrong previously....I'm not sure if node 1 and node 2 are really directly connected...it depends on the OSM file; what I'm sure about is that I reduced to the minimum the complexity of my code by considering only 2 kind of relationships (the ones I put in the A* test)
Moreover I use this code for A* code

 public String testAStar(long idNodeStart, long idNodeEnd)
 
{


 
Transaction tx = this.graphDbService.beginTx();
 
try
 
{
 
Node startNode = graphDbService.getNodeById(idNodeStart);
 
Node endNode = graphDbService.getNodeById(idNodeEnd);
 
EstimateEvaluator<Double> estimateEvaluator = new EstimateEvaluator<Double>()
 
{
 
public Double getCost(final Node node, final Node goal)
 
{
 
double dx = (Double) node.getProperty(OSMAttribute.LONGITUDE_PROPERTY) - (Double) goal.getProperty(OSMAttribute.LONGITUDE_PROPERTY);
 
double dy = (Double) node.getProperty(OSMAttribute.LATITUDE_PROPERTY) - (Double) goal.getProperty(OSMAttribute.LATITUDE_PROPERTY);
 
double result = Math.sqrt(Math.pow(dx, 2) + Math.pow(dy, 2));
 
return result;
 
}
 
};
 
PathExpander expander = PathExpanders.forTypesAndDirections(RelTypes.CAR_BIDIRECTIONAL_RELATION, Direction.BOTH, RelTypes.CAR_ONEWAY_RELATION, Direction.OUTGOING);
 
List<WeightedPath> paths = new ArrayList<WeightedPath>();
 
WeightedPath path = GraphAlgoFactory.aStar(expander, CommonEvaluators.doubleCostEvaluator(OSMAttribute.EDGE_LENGTH_PROPERTY), estimateEvaluator).findSinglePath(startNode, endNode);// tas.findSinglePath(startNode,
 
String result = null;
 
if( path != null )
 
{
 result
= path.toString();
 
}
 tx
.success();
 
return result;
 
}
 
catch (Exception e)
 
{


 tx
.failure();
 logger
.fatal(e.getMessage(), e);
 
throw new IllegalStateException(e);
 
}
 
finally
 
{
 
if (tx != null)
 
{


 tx
.close();
 
}
 
}
 
}


So I don't know if I'm wrong in using it or less...it seems to me I'm not wrong
Anyway....I guess times are really too big...

Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
May 28, 2014, 7:55:41 AM5/28/14
to ne...@googlegroups.com
Hi

Craig regarding to your question about which events I have to consider in my scenario...I can have events that tell me a road is blocked...so I need to present to the customer a different shortestpath in order to arrive to destination....so in this case it's like if costs relationships between nodes change and this mean that potentially the minimum path can change.....as far as I know if there are changes on the minimum path, the contracted graph should be re-generated

Anyway....from what I saw in the theory, Dijkstra and A* should have these low performances when there are around 1 Million of nodes and several million of relationships; in our case we have around 570000 nodes and 660000 relationships....I really can't figure why this behaviour....

Angelo 


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
May 30, 2014, 4:35:21 AM5/30/14
to ne...@googlegroups.com
Hi there

I'm still facing this very big issue....did anybody check my test project on github?

Angelo

Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
Jun 2, 2014, 7:14:01 AM6/2/14
to ne...@googlegroups.com
No update?


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Craig Taverner

unread,
Jun 2, 2014, 7:34:50 AM6/2/14
to ne...@googlegroups.com
I did git clone your project, but have not had the chance to try run it yet. I still think we need to actually run a performance profiler on it to track what it is actually doing to take so much time. Have you tried that yourself yet?


Angelo Immediata

unread,
Jun 3, 2014, 4:05:38 AM6/3/14
to ne...@googlegroups.com
I tired to investigate but I was not able in finding the issue....


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
Jun 9, 2014, 12:07:22 PM6/9/14
to ne...@googlegroups.com
Hi there

any feedback regarding this big issue? We are still facing this issue and since we have no feedback we were thinking to use some workaround and/or alternative....

Any feedabck is really appreciated
Angelo


Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
Jun 15, 2014, 2:50:49 PM6/15/14
to ne...@googlegroups.com
well i tried ti figure it out....but sadly the only workaround i could find....is to no more use neo4j in my project.....anyway i'm wondering if u have any update regarding this very very very big issue (at least according to me)

Angelo

Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:

Angelo Immediata

unread,
Jul 3, 2014, 3:44:36 AM7/3/14
to ne...@googlegroups.com
Hi there

any update on this issue?

Angelo

Il giorno lunedì 19 maggio 2014 19:41:05 UTC+2, Angelo Immediata ha scritto:
Reply all
Reply to author
Forward
0 new messages