Spring Data Neo4j performance issue using template.fetch()


Ümit Seren

May 13, 2013, 8:21:24 AM
to ne...@googlegroups.com

I have a DAG-like data structure that is currently stored in a PostgreSQL database and accessed from a Spring web application using Spring Data JPA (Hibernate).

There are basically two important tables: Term and Term2Term. Term2Term is the many-to-many reference table that contains the ids of the parent (Term) and the child (Term) together with a relationship type (is_a or part_of).
AFAIK this is a good fit for Neo4j, so I imported the data into Neo4j. Currently there are 1310 nodes with 2825 properties and 1449 relationships with 2 relationship types (so the data set is quite small).

The Term entity and the Term2Term relationship entity look like this:

@NodeEntity
public class Term {

    @GraphId
    private Long nodeId;

    @Indexed(unique=true)
    private String id;
    private String name;
    private String definition;
    private String synonyms;

    @RelatedToVia(type="is_a",direction=Direction.INCOMING)
    private Set<IsATerm> is_a_children;

    @RelatedToVia(type="is_a",direction=Direction.OUTGOING)
    Set<IsATerm> is_a_parents;

    @RelatedToVia(type="part_of",direction = Direction.INCOMING)
    Set<PartOfTerm> part_of_children;

    @RelatedToVia(type="part_of",direction = Direction.OUTGOING)
    Set<PartOfTerm> part_of_parents;
}

@RelationshipEntity
public class Term2Term {
    @GraphId
    private Long nodeId;

    @StartNode
    private Term child;
    @EndNode
    private Term parent;
}

I configured Spring Data Neo4j's graphDatabaseService to use SpringRestGraphDatabase:


<neo4j:config graphDatabaseService="graphDatabaseService"/>
<bean id="graphDatabaseService"
      class="org.springframework.data.neo4j.rest.SpringRestGraphDatabase">
    <constructor-arg index="0" value="${neo4j.host}" />
</bean>
<bean id="typeRepresentationStrategyFactory"
      class="org.springframework.data.neo4j.support.typerepresentation.TypeRepresentationStrategyFactory">
    <constructor-arg index="0" ref="graphDatabaseService"/>
    <constructor-arg index="1" value="Noop"/>
</bean>
<neo4j:repositories base-package="com.gmi.nordborglab.browser.server.repository.ontology" />

I want to display the data in a tree widget that supports asynchronous loading of the data.
Initially I only display the root-level nodes. When the user opens a specific node, the widget calls the backend and loads the n+1 level nodes. I also have to load the n+2 level nodes in order to display the number of child nodes.
The tree widget looks something like this:

- Term 1 (3)
    - Term 1.1 (1) (is_a)
        - Term 1.1.1 (0) (part_of)
    - Term 1.2 (0) (part_of)
    - Term 1.3 (2) (is_a)
        - Term 1.3.1 (0) (part_of)
        - Term 1.3.2 (0) (is_a)

This is how I load a specific node of the Tree widget:

@Override
public Term2Term findOneTerm2Term(Long id) {
    Term2Term term2Term = term2TermRepository.findOne(id);
    // Load the n+1 level: the child term of this relationship
    template.fetch(term2Term.getChild());
    // Load the n+2 level so the widget can show each child's child count
    for (Term2Term subTerm2Term : term2Term.getChild().getChildren()) {
        template.fetch(subTerm2Term.getChild());
    }
    return term2Term;
}

The JPA code is similar, except that I don't have to call fetch manually on the children because Hibernate supports lazy loading.

I compared the two implementations, and the JPA/PostgreSQL version is almost 6 times faster (I know that this is not a precise benchmark, but my Neo4j implementation is significantly slower).

I am looking into ways to improve performance.
I guess the initial findOne() call should be equally fast in Spring Data JPA and Spring Data Neo4j. To me it seems that calling template.fetch on every child for n+1 and n+2 levels is causing the performance issues.

Would adding a @Fetch annotation to the collections improve the performance?
When I tried to add @Fetch, I ended up with an infinite loop and a stack overflow (I guess because of the cyclic dependency between Terms).

I also guess that switching from the SpringRestGraphDatabase configuration to an embedded Neo4j database should improve performance, because there is no JSON serialization overhead?
Currently I have to use SpringRestGraphDatabase because my backend also depends on Elasticsearch, which has a transitive dependency on Lucene 4, which doesn't work with Neo4j.

Versions used:

Spring Data Neo4j: 2.2.0.RELEASE
Neo4j: 1.8.2
Spring: 3.2.2.RELEASE

Michael Hunger

May 13, 2013, 9:13:46 AM
to ne...@googlegroups.com
The current REST integration of SDN is not really performant, because there are too many network interactions.

I would recommend that you use a repository with a query method that returns the full next level of your DAG; this can be an annotated finder or a derived finder method.

It is best to return raw data, because populating full entities is more expensive.
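
As a rough illustration of this suggestion, the sketch below fetches one tree level in a single Cypher query and maps the raw rows to lightweight value objects instead of entities. Everything named here is an assumption made for illustration and does not come from the thread: the class NextLevelSketch, the TreeNode type, the parentId parameter, and the column aliases. The query is written in Neo4j 1.8 style Cypher and assumes that every Term has id and name properties and that is_a/part_of relationships point from child to parent, as in the Term2Term entity above. With SDN 2.2 such a query could presumably be attached to a repository method via @Query or run through template.query(...); the mapping code only relies on the rows being an Iterable of Map<String, Object>.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: class, field, parameter, and column names are illustrative.
class NextLevelSketch {

    // One round trip per expanded node: returns the children (n+1) and,
    // via the optional "?" relationship, the number of grandchildren (n+2).
    static final String NEXT_LEVEL_QUERY =
          "START parent=node({parentId}) "
        + "MATCH parent<-[r:is_a|part_of]-child<-[?:is_a|part_of]-grandchild "
        + "RETURN child.id AS id, child.name AS name, type(r) AS relType, "
        + "count(distinct grandchild) AS childCount";

    /** Lightweight value object for one row of the tree widget. */
    static final class TreeNode {
        final String id;
        final String name;
        final String relType;
        final long childCount;

        TreeNode(String id, String name, String relType, long childCount) {
            this.id = id;
            this.name = name;
            this.relType = relType;
            this.childCount = childCount;
        }

        /** Label in the same shape as the tree shown in the question. */
        String label() {
            return name + " (" + childCount + ") (" + relType + ")";
        }
    }

    /** Maps raw result rows to TreeNode objects without loading entities. */
    static List<TreeNode> toTreeNodes(Iterable<Map<String, Object>> rows) {
        List<TreeNode> nodes = new ArrayList<TreeNode>();
        for (Map<String, Object> row : rows) {
            nodes.add(new TreeNode(
                (String) row.get("id"),
                (String) row.get("name"),
                (String) row.get("relType"),
                ((Number) row.get("childCount")).longValue()));
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Hand-built row standing in for a real query result.
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", "example-id");
        row.put("name", "Term 1.1");
        row.put("relType", "is_a");
        row.put("childCount", 1L);

        List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        rows.add(row);

        for (TreeNode node : toTreeNodes(rows)) {
            System.out.println(node.label());
        }
    }
}
```

The point of the design is that expanding a node costs one request regardless of how many children it has, whereas the fetch loop in the question issues at least one request per child.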

We'll work on that with the Neo4j 2.0 version of SDN where we can use the transactional Cypher endpoint to perform the mapping more efficiently.

With an embedded database it will be very fast, because it accesses the db directly and even multiple requests carry no penalty.

Michael
