[Tinkerpop 3] Trying to create indices with already existing data


Damian Wloch

Oct 5, 2015, 9:22:58 AM
to Gremlin-users
Hello,

I'm running Titan 1.0.0 bundled with TinkerPop 3.0.1-incubating. I can define a property using the management system, build an index, then add a vertex with the indexed property and query it through the index just fine. But when I define a property, create a vertex with that property, and only then build the index, the index seems to get stuck in status "INSTALLED", and I can't do anything with it, such as enabling or removing it. Here's how I was doing it:
mgmt = graph.openManagement()
notname = mgmt.makePropertyKey('notname').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()
graph.addVertex("notname","me")
graph.tx().commit()
mgmt = graph.openManagement()
mgmt.buildIndex('bynotname', Vertex.class).addKey(notname).buildCompositeIndex()
// tried the lines below both after and before committing, and waited on the graph index status: it always gave "INSTALLED" and the reindex always failed
mgmt.updateIndex(mgmt.getGraphIndex("bynotname"), SchemaAction.REINDEX).get()
mgmt.commit()


Effy

Oct 5, 2015, 9:49:23 AM
to Gremlin-users
+1 on that issue.
I managed to get it working on BerkeleyDB, but couldn't get it working on Cassandra.
I read in another post that it might work when using Gremlin Console instead of Gremlin Server, but I haven't managed to get Cassandra running with Gremlin Console yet.

Effy

Stephen Mallette

Oct 7, 2015, 7:31:07 AM
to Gremlin-users
Presumably it should work in either place, but I'd say that using Gremlin Console for schema updates is the recommended convention. I'm not sure why it isn't working for either of you; perhaps others can chime in to help.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/d242c432-5c33-4c39-8e7e-c8d7f79a49e0%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jonathan Kelsey

Oct 7, 2015, 8:02:19 AM
to Gremlin-users
I've tried the code in gremlin-console with Cassandra and can confirm it doesn't work as expected.

Stephen Mallette

Oct 7, 2015, 8:50:17 AM
to Gremlin-users
Here's what worked for me on BerkeleyDB and Cassandra in Gremlin Console:

graph = TitanFactory.open('conf/titan-cassandra-embedded.properties')
mgmt = graph.openManagement()
notname = mgmt.makePropertyKey('notname').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()
graph.addVertex("notname","me")
graph.tx().commit()
mgmt = graph.openManagement()
notname = mgmt.getPropertyKey("notname")
mgmt.buildIndex('bynotname', Vertex.class).addKey(notname).buildCompositeIndex()
mgmt.commit()
com.thinkaurelius.titan.graphdb.database.management.ManagementSystem.awaitGraphIndexStatus(graph, 'bynotname').status(SchemaStatus.REGISTERED).call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("bynotname"), SchemaAction.REINDEX).get()
mgmt.commit()

Note the use of awaitGraphIndexStatus() after buildIndex() and before updateIndex().
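Why the await matters can be sketched with a toy Java model of the index statuses discussed in this thread. The names IndexStatus and ToyIndex are hypothetical, and this is not Titan's implementation; it only encodes two assumptions consistent with what is reported here: REINDEX is refused while the index is still INSTALLED, and a successful REINDEX leaves the index ENABLED (as Dan describes later in the thread).

```java
// Hypothetical toy model of the index statuses in this thread.
// This is NOT Titan's implementation; names are illustrative only.
enum IndexStatus { INSTALLED, REGISTERED, ENABLED }

class ToyIndex {
    private IndexStatus status = IndexStatus.INSTALLED; // a newly built index on an existing key

    IndexStatus status() {
        return status;
    }

    // In real Titan this step happens asynchronously once the cluster has picked up
    // the committed index; awaitGraphIndexStatus() is what waits for it.
    void markRegistered() {
        if (status == IndexStatus.INSTALLED) {
            status = IndexStatus.REGISTERED;
        }
    }

    // Assumption: REINDEX is only accepted once the index is REGISTERED (or already ENABLED),
    // mirroring the failures reported when reindexing an INSTALLED index.
    void reindex() {
        if (status != IndexStatus.REGISTERED && status != IndexStatus.ENABLED) {
            throw new IllegalStateException("cannot REINDEX an index in status " + status);
        }
        status = IndexStatus.ENABLED; // a successful REINDEX also enables the index
    }
}
```

Under this model, calling reindex() straight after buildIndex() fails, which is why the console session above commits, waits for REGISTERED, and only then calls updateIndex().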



-------------------------
The information in this message is private and confidential and may be legally privileged. If you have received this message in error, please notify us and remove it from your system. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.

For more information:
     Twitter: @semblent1
     LinkedIn: https://www.linkedin.com/company/semblent
     The Internet: www.semblent.com






Damian Wloch

Oct 7, 2015, 10:07:36 AM
to Gremlin-users
Hi Stephen,

I tried that step as well; I've attached a stack trace.
registered.log

Stephen Mallette

Oct 7, 2015, 10:11:17 AM
to Gremlin-users
The interesting difference here, of course, is this:

gremlin> com.thinkaurelius.titan.graphdb.database.management.ManagementSystem.awaitGraphIndexStatus(graph, 'bynotname').status(SchemaStatus.REGISTERED).call()
==>GraphIndexStatusReport[success=true, indexName='bynotname', targetStatus=REGISTERED, notConverged={}, converged={notname=REGISTERED}, elapsed=PT6.065S]

Note that my output reads "success=true" while yours reads "false", but I don't know why yours is failing. Are you using Cassandra?

Anyone know how to get the reason for failure on awaitGraphIndexStatus?
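The report printed above already names the keys that did not converge and the status each one is stuck in. The sketch below is a hypothetical Java model of that relationship (ToyStatusReport and ToyStatusWatcher are illustrative names, not Titan's API). It assumes success means notConverged is empty, as Dan also states further down, and it polls with a timeout so a stuck key ends up in notConverged instead of the caller waiting forever.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not Titan's GraphIndexStatusReport: success is true exactly
// when no key is left in notConverged.
class ToyStatusReport {
    final String indexName;
    final String targetStatus;
    final Map<String, String> converged;
    final Map<String, String> notConverged;

    ToyStatusReport(String indexName, String targetStatus, Map<String, String> keyStatuses) {
        this.indexName = indexName;
        this.targetStatus = targetStatus;
        Map<String, String> done = new LinkedHashMap<>();
        Map<String, String> pending = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : keyStatuses.entrySet()) {
            if (targetStatus.equals(e.getValue())) {
                done.put(e.getKey(), e.getValue());
            } else {
                pending.put(e.getKey(), e.getValue()); // key is stuck in some other status
            }
        }
        this.converged = Collections.unmodifiableMap(done);
        this.notConverged = Collections.unmodifiableMap(pending);
    }

    boolean success() {
        return notConverged.isEmpty();
    }
}

class ToyStatusWatcher {
    // Polls per-key statuses until every key reaches targetStatus or the timeout elapses,
    // and returns the last report either way.
    static ToyStatusReport await(String indexName, String targetStatus,
                                 Supplier<Map<String, String>> keyStatuses,
                                 Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            ToyStatusReport report = new ToyStatusReport(indexName, targetStatus, keyStatuses.get());
            if (report.success() || System.nanoTime() - deadline >= 0) {
                return report;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag, return what we have
                return report;
            }
        }
    }
}
```

With a report like Damian's ("success=false"), the notConverged entry is the first thing to look at: it shows which key never left its old status.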

Damian Wloch

Oct 7, 2015, 10:20:24 AM
to Gremlin-users
Yep, Cassandra from DSE 4.7.3. I'll try to get hold of a plain Apache Cassandra as well to see if that makes a difference. What version are you running?

Dan LaRocque

Oct 7, 2015, 11:10:44 AM
to gremli...@googlegroups.com
Hi Damian,
 
You must close transactions between defining a new index and attempting to reindex it.  Don't use the sequence of management operations quoted above.  Attempting to define and reindex in the same transaction is where you run into trouble.  Here's the relevant doc chapter: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html.  Also, here are some commands showing how to approach this:
 
 
// cd titan-1.0.0-hadoop1
// rm -rf ./db
// bin/titan.sh start
// bin/gremlin.sh
 
// Start Titan against C* + ES and define a property key called "foo"
graph = TitanFactory.open("conf/titan-cassandra-es.properties")
mgmt = graph.openManagement()
pkey = mgmt.makePropertyKey("foo").dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()
 
// Add a vertex with the property key set to "bar"
graph.addVertex("foo","bar")
graph.tx().commit()
 
// Define a composite graph index on the property key created above
mgmt = graph.openManagement()
pkey = mgmt.getPropertyKey("foo")
mgmt.buildIndex("index", Vertex.class).addKey(pkey).buildCompositeIndex()
mgmt.commit()
 
// Wait for the index to transition from INSTALLED to REGISTERED
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem
ManagementSystem.awaitGraphIndexStatus(graph, "index").call()
// The await call() must return something like this:
// ==>GraphIndexStatusReport[success=true, indexName='index', targetStatus=REGISTERED, notConverged={}, converged={foo=REGISTERED}, elapsed=PT0.009S]
// the success property must be true (implying that notConverged is empty)
 
// Retrieve a vertex by property condition foo=bar;
// this logs a linear scan warning because we haven't reindexed yet,
// and the index doesn't know about any properties that predate it
graph.query().has("foo", "bar").vertices()
 
// Reindex
mgmt = graph.openManagement()
index = mgmt.getGraphIndex("index")
mgmt.updateIndex(index, SchemaAction.REINDEX).get()
mgmt.commit()
 
// Reopen Titan and retrieve a vertex by property condition foo=bar;
// this hits the index and does not log a linear scan warning
graph.close()
graph = TitanFactory.open("conf/titan-cassandra-es.properties")
graph.query().has("foo", "bar").vertices()
 
// Note: the index's status has transitioned from
// REGISTERED to ENABLED as part of the REINDEX action
mgmt = graph.openManagement()
pkey = mgmt.getPropertyKey("foo")
index = mgmt.getGraphIndex("index")
index.getIndexStatus(pkey) // ENABLED
mgmt.rollback()
 
 
 
thanks,
Dan

Damian Wloch

Oct 28, 2015, 10:38:19 AM
to Gremlin-users
Thank you!

I didn't have time to test reindexing again until now, but your solution worked perfectly.

ar...@biginfolabs.com

Jul 28, 2016, 6:00:35 AM
to Gremlin-users
Hi,
I am facing a similar issue while coding in Java.

#SCENARIO 1: INDEXING DOES NOT WORK
This is what I observed:

TitanManagement mgmt = graph.openManagement();
PropertyKey node = mgmt.makePropertyKey("node").dataType(String.class).make();
mgmt.commit();

mgmt = graph.openManagement();
PropertyKey prop = mgmt.getPropertyKey("node");
TitanManagement.IndexBuilder indexBuilder = mgmt.buildIndex("node",Vertex.class).addKey(prop);
indexBuilder.unique().buildCompositeIndex();
mgmt.commit();

Now when I try
System.out.println(graph.openManagement().containsGraphIndex("node"));
-> true
but
System.out.println(graph.query().has("node", "123").vertices().iterator().hasNext());
-> logs a warning that the query iterates through all vertices and suggests using indexes
-> false

Here, even though the index exists, the query still iterates through all vertices.

#SCENARIO 2: INDEXING WORKS when the intermediate commit is removed
This is what I observed:

TitanManagement mgmt = graph.openManagement();
PropertyKey node = mgmt.makePropertyKey("node").dataType(String.class).make();
TitanManagement.IndexBuilder indexBuilder = mgmt.buildIndex("node",Vertex.class).addKey(node);
indexBuilder.unique().buildCompositeIndex();
mgmt.commit();

Now when I try 
System.out.println(graph.openManagement().containsGraphIndex("node"));
-> true
System.out.println(graph.query().has("node", "123").vertices().iterator().hasNext());
-> false

Here the index exists, and the query does not iterate over all vertices.

But my use case requires creating the property key first, committing, and only then creating the index,
so I tried the following:

#SCENARIO 3: INDEXING DOES NOT WORK, and the status does not change from INSTALLED to REGISTERED

TitanManagement mgmt = graph.openManagement();
PropertyKey node = mgmt.makePropertyKey("node").dataType(String.class).make();
mgmt.commit();

mgmt = graph.openManagement();
PropertyKey prop = mgmt.getPropertyKey("node");
TitanManagement.IndexBuilder indexBuilder = mgmt.buildIndex("node",Vertex.class).addKey(prop);
indexBuilder.unique().buildCompositeIndex();
mgmt.commit();

com.thinkaurelius.titan.graphdb.database.management.ManagementSystem
.awaitGraphIndexStatus(graph,"node").call(); //FAILING

mgmt = graph.openManagement();
mgmt.updateIndex(mgmt.getGraphIndex("node"), SchemaAction.REINDEX).get();
mgmt.commit();

The status does not change from INSTALLED to REGISTERED in this scenario.

My question is: is there any way to create the property key first, commit, and then create the index later?

Regards
Arun
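The difference between scenarios 1 and 2 can be illustrated with a hypothetical toy model (ToySchema is not Titan's API). It assumes what scenario 2 shows empirically and what, as far as I know, the Titan 1.0 indexing docs describe: an index built in the same management transaction as its new property key is usable immediately, while an index on a key committed earlier starts out INSTALLED. It also assumes queries only use ENABLED indexes and otherwise fall back to a full scan (the linear scan warning Dan mentions).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of scenarios 1-3 above; not Titan's API or implementation.
class ToySchema {
    enum Status { INSTALLED, REGISTERED, ENABLED }

    private final Set<String> keysInOpenTx = new HashSet<>(); // keys created in the current management tx
    private final Map<String, Status> indexStatus = new HashMap<>();
    private final Map<String, String> indexKey = new HashMap<>();

    void makePropertyKey(String key) {
        keysInOpenTx.add(key);
    }

    // Assumption: an index on a key created in the same, still-open management transaction
    // starts ENABLED; an index on a key committed earlier starts INSTALLED.
    void buildIndex(String indexName, String key) {
        indexKey.put(indexName, key);
        indexStatus.put(indexName, keysInOpenTx.contains(key) ? Status.ENABLED : Status.INSTALLED);
    }

    // Committing ends the management transaction; its keys are no longer "new".
    void commit() {
        keysInOpenTx.clear();
    }

    Status status(String indexName) {
        return indexStatus.get(indexName);
    }

    // Assumption: a has(key, value) lookup can use an index only if it is ENABLED;
    // otherwise the query falls back to iterating over all vertices.
    boolean queryUsesIndex(String key) {
        for (Map.Entry<String, String> e : indexKey.entrySet()) {
            if (e.getValue().equals(key) && indexStatus.get(e.getKey()) == Status.ENABLED) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, scenario 3 is the right shape for a key that already exists, but the index still has to reach REGISTERED and then be reindexed (or, when there is no existing data to index, enabled with SchemaAction.ENABLE_INDEX) before queries use it, which is the sequence Dan posted.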

Ryan Spangler

Jun 16, 2017, 6:25:25 PM
to Gremlin-users
I am also having this problem, for what it is worth.

I have created an index for a key that already exists and contains a lot of
data.

Following Dan's reindexing steps above exactly, the process fails
at this step:

`ManagementSystem.awaitGraphIndexStatus(graph, "index").call()`

It spins and spins forever but never changes from INSTALLED to REGISTERED.

Is there anything else to try? Is the index stuck forever now, so that I
cannot reindex this property?
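As I read the Titan 1.0 docs, INSTALLED means the index is installed but not yet registered with all Titan instances in the cluster. One possible cause of a wait that never finishes, offered as an assumption rather than a diagnosis, is an instance that is still listed as open but never acknowledges the new index, for example one left behind by a process that was killed without closing the graph. The hypothetical ToyCluster below only models that rule; it is not Titan code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model (not Titan code): an INSTALLED index becomes REGISTERED only
// after every instance recorded as open has acknowledged it. A stale instance that
// never answers blocks the transition indefinitely.
class ToyCluster {
    private final Set<String> openInstances = new HashSet<>();
    private final Set<String> acknowledged = new HashSet<>();

    void open(String instanceId) {
        openInstances.add(instanceId);
    }

    // A live instance picks up the new index and acknowledges it.
    void acknowledge(String instanceId) {
        if (openInstances.contains(instanceId)) {
            acknowledged.add(instanceId);
        }
    }

    // Remove a dead instance from the open set, as an operator might do by hand.
    void forceClose(String instanceId) {
        openInstances.remove(instanceId);
        acknowledged.remove(instanceId);
    }

    boolean registered() {
        return acknowledged.containsAll(openInstances);
    }
}
```

In Titan 1.0, TitanManagement also provides getOpenInstances() and forceCloseInstance(String) for inspecting and removing stale instances. Check the docs for your version first, and only force-close instances you are sure are no longer running.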