How to reindex Spring Data neo4j indexes?

Hendy Irawan

Dec 21, 2011, 1:21:57 PM
to ne...@googlegroups.com
Hi Neo4j developers,

In my app, I need to do data integration using Pentaho Data Integration (PDI, aka Kettle), so I use the very cool Neo4j REST API to insert nodes.

The problem is that my nodes (and rels) are not indexed "properly" the way they are when using the Spring Data Neo4j API. Using Spring Data Neo4j for a data integration scenario is impractical because of the overhead code needed and also because of performance (thousands of records for now, and it will grow).

What I mean by indexes are:
  1. __types__ node index
  2. __rel_types__ relationship index
  3. custom @Indexed properties on nodes and relationships

As Michael Hunger suggested here (https://twitter.com/#!/mesirii/status/149517922707054593), for #3, either:

  1. use auto-indexes, or
  2. load and persist the new nodes using Spring Data Neo4j

However, indexes #1 and #2 (__types__ and __rel_types__) are proprietary Spring Data Neo4j structures (see https://jira.springsource.org/browse/DATAGRAPH-160 ).

Is there a way to avoid updating these indexes myself and let Spring Data Neo4j do the job? Perhaps something like a GraphRepository.reindex() method?

Thank you.


Hendy

Hendy Irawan

Dec 21, 2011, 1:54:27 PM
to Neo4j
I tried this:

public void reindexNodeProperties() throws ClassNotFoundException {
    Iterable<Node> allNodes = gds.getAllNodes();
    for (Node node : allNodes) {
        logger.info("Reindexing node {}: {}", node.getId(),
                node.getProperty("__type__"));
        // forName is a static method of Class, so call it on Class directly
        Class<?> clazz = Class.forName((String) node.getProperty("__type__"));
        NodeBacked obj = (NodeBacked) neo4j.findOne(node.getId(), clazz);
        logger.info("Persisting {}: {}", node.getId(), obj);
        obj.persist();
    }
}

but it doesn't work:

Caused by: java.lang.UnsupportedOperationException
    at org.neo4j.rest.graphdb.AbstractRemoteDatabase.getAllNodes(AbstractRemoteDatabase.java:72) [neo4j-rest-graphdb-1.5.jar:]
    at org.neo4j.rest.graphdb.RestGraphDatabase.getAllNodes(RestGraphDatabase.java:32) [neo4j-rest-graphdb-1.5.jar:]
    at com.satukancinta.ui.admin.InterestsAdmin.reindex(InterestsAdmin.java:88) [classes:]
    at com.satukancinta.ui.admin.InterestsAdmin$Proxy$_$$_WeldClientProxy.reindex(InterestsAdmin$Proxy$_$$_WeldClientProxy.java) [classes:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [:1.7.0_147-icedtea]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [:1.7.0_147-icedtea]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [:1.7.0_147-icedtea]
    at java.lang.reflect.Method.invoke(Method.java:601) [:1.7.0_147-icedtea]
    at org.apache.el.parser.AstValue.invoke(AstValue.java:196) [jbossweb-7.0.1.Final.jar:7.0.2.Final]
    at org.apache.el.MethodExpressionImpl.invoke(MethodExpressionImpl.java:276) [jbossweb-7.0.1.Final.jar:7.0.2.Final]
    at org.jboss.weld.util.el.ForwardingMethodExpression.invoke(ForwardingMethodExpression.java:43) [weld-core-1.1.2.Final.jar:2011-07-26 15:02]
    at org.jboss.weld.el.WeldMethodExpression.invoke(WeldMethodExpression.java:56) [weld-core-1.1.2.Final.jar:2011-07-26 15:02]
    at com.sun.faces.facelets.el.TagMethodExpression.invoke(TagMethodExpression.java:105) [jsf-impl-2.1.3-b02-jbossorg-2.jar:2.1.3-SNAPSHOT]
    at javax.faces.component.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:88) [jboss-jsf-api_2.1_spec-2.0.0.Beta1.jar:2.0.0.Beta1]
    ... 32 more



Hendy Irawan

Dec 21, 2011, 2:31:20 PM
to Neo4j
I finally have a "working" poor man's reindexer:

// Poor man's reindex
Transaction tx = neo4j.beginTx();
try {
    Index<Node> typesIndex = gds.index().forNodes("__types__");
    for (long id = 1; id <= 1500; id++) {
        try {
            Node node = gds.getNodeById(id);
            final Object className = node.getProperty("__type__");
            try {
                logger.info("Reindexing node {}: {}", id, className);
                // forName is a static method of Class, so call it on Class directly
                Class<?> clazz = Class.forName((String) className);
                NodeBacked obj = (NodeBacked) neo4j.findOne(id, clazz);
                // reindex __types__
                typesIndex.remove(node);
                typesIndex.add(node, "_id_", id);
                typesIndex.add(node, "className", className);
                // reindex fields
                logger.info("Persisting {}: {}", id, obj);
                neo4j.save(obj);
            } catch (ClassNotFoundException e) {
                logger.error("ClassNotFound {}: {}", id, className);
            }
        } catch (NotFoundException e) {
            // skip IDs that have no node
        }
    }
    tx.success();
} finally {
    tx.finish();
}

It uses a hardcoded upper limit for the node IDs, so it definitely does not scale
(there's no way I'm brute-forcing a 64-bit integer space!).
It's also *VERY* slow (about 5 nodes/sec on a Core i7 2630QM) :-(
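
For anyone adapting the loop above: the __type__ lookup can be pulled out into a small helper that caches each resolved class and reports an unknown name as "absent" instead of throwing. This is only a sketch using plain JDK classes. TypeNameResolver and its resolve method are hypothetical names, not part of Neo4j or Spring Data Neo4j, and it does nothing about the hardcoded ID limit or the slowness.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not a Neo4j or SDN class): resolves class names read from __type__. */
final class TypeNameResolver {
    // Remembers every name already looked up; Optional.empty() marks a name that failed to load.
    private final Map<String, Optional<Class<?>>> cache = new HashMap<>();

    /** Returns the class for the given name, or empty if it cannot be found on the classpath. */
    Optional<Class<?>> resolve(String className) {
        return cache.computeIfAbsent(className, name -> {
            try {
                return Optional.<Class<?>>of(Class.forName(name));
            } catch (ClassNotFoundException e) {
                return Optional.empty();
            }
        });
    }

    public static void main(String[] args) {
        TypeNameResolver resolver = new TypeNameResolver();
        // A JDK class stands in for an entity class here.
        System.out.println(resolver.resolve("java.lang.String").isPresent());
        System.out.println(resolver.resolve("com.example.DoesNotExist").isPresent());
    }
}
```

In the reindex loop, the inner try/catch for ClassNotFoundException could then become a check on whether resolve(...) returned a value.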

Michael Hunger

Dec 21, 2011, 7:40:50 PM
to ne...@googlegroups.com
Hendy,

The most sensible thing to do would be to write a server-side extension that you deploy to your server and that uses just the normal SDN code to accept and write entities for the nodes (or rather batches of nodes) to be created.

The "reindex()" method that you are looking for is on Neo4jTemplate and is called postEntityCreation().
But it does just the "type" indexing.
For indexing the properties themselves, the save() you did is needed.

Perhaps we can figure out a way together to solve your problem in a good way.

Cheers

Michael
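
As a rough illustration of the "batches of nodes" idea above, the client side could group its records into fixed-size batches before handing each batch to such an extension. The sketch below uses only the JDK. BatchPartitioner, partition and the batch size are hypothetical, and the call that would send a batch to the server is left out because no such endpoint is described in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a list of records into consecutive fixed-size batches. */
final class BatchPartitioner {
    /** Returns batches of at most batchSize items each, preserving the input order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Node IDs stand in for whatever records the integration job produces.
        List<Long> nodeIds = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        for (List<Long> batch : partition(nodeIds, 2)) {
            // A real client would send this batch to the server-side extension here.
            System.out.println(batch);
        }
    }
}
```

A suitable batch size is something to measure rather than assume.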
