Not using indexes


Sudha Subramanian

Jul 18, 2016, 10:15:10 AM
to Aurelius
Hi,

I'm creating my indexes using TitanManagement right after I set up my Gremlin Server. I'm using a Titan 1.0.1 snapshot with Cassandra and Elasticsearch.

This is how I create my indexes:

Configuration graphConf = new BaseConfiguration();
InputStream in = TitanGraphFactory.class.getResourceAsStream("/titan_full_en.properties");
Properties prop = new Properties();
prop.load(in);
graphConf.setProperty("storage.backend", prop.get("storage.backend"));
graphConf.setProperty("storage.hostname", prop.get("storage.hostname"));
graphConf.setProperty("index.search.backend", prop.get("index.search.backend"));
graphConf.setProperty("index.search.hostname", prop.get("index.search.hostname"));
TitanGraph graph = TitanFactory.open(graphConf);
TitanManagement mgmt = graph.openManagement();

final PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class).make();
mgmt.buildIndex("byType", Vertex.class).addKey(type).buildCompositeIndex();
ManagementSystem.awaitGraphIndexStatus(graph, "byType").call();
mgmt.updateIndex(mgmt.getGraphIndex("byType"), SchemaAction.ENABLE_INDEX).get();
mgmt.commit();

mgmt.buildIndex("byChannelUniqueIdSearchable", Vertex.class).addKey(channelId, Mapping.TEXT.asParameter()).buildMixedIndex("search");
ManagementSystem.awaitGraphIndexStatus(graph, "byChannelUniqueIdSearchable").call();
mgmt.updateIndex(mgmt.getGraphIndex("byChannelUniqueIdSearchable"), SchemaAction.ENABLE_INDEX).get();
mgmt.commit();


I use the Gremlin driver (org.apache.tinkerpop.gremlin.driver) from my Java application to call the Groovy methods on the server. However, the indexes are not being recognized when I run g.V().has('type', 'something'). I also don't see any records being created in Elasticsearch.

Should I create the indexes using gremlin itself? 

Thanks,
Sudha


Jason Plurad

Jul 18, 2016, 11:50:47 AM
to Aurelius
Hi Sudha,

Not sure what is wrong here. Are you creating the index after you loaded the data? If so, you'd need to do a reindex.

The composite index seems to work fine. Composite indexes are stored in Cassandra, not in Elasticsearch. The mixed index will end up in Elasticsearch.

gremlin> graph = TitanFactory.open('inmemory'); g = graph.traversal()
==>graphtraversalsource[standardtitangraph[inmemory:[127.0.0.1]], standard]
gremlin> mgmt = graph.openManagement();
==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@2849434b
gremlin> type = mgmt.makePropertyKey("type").dataType(String.class).make();
==>type
gremlin> mgmt.buildIndex("byType", Vertex.class).addKey(type).buildCompositeIndex();
==>byType
gremlin> mgmt.commit();
gremlin> v = graph.addVertex("type", "person", "name", "sudha")
==>v[4144]
gremlin> graph.tx().commit()
==>null
gremlin> g.V().has("type", "person").next()
==>v[4144]

-- Jason

Sudha Subramanian

Jul 18, 2016, 11:58:32 PM
to Aurelius
Hi Jason,

It is a new database. My Gremlin Server connects to Cassandra and Elasticsearch containers. Using a separate Java process, I first create the indexes and schema. The actual application uses the Gremlin driver client to invoke Groovy scripts on the server.

Do I have to use Gremlin to create these indexes?

When I issue queries from both the Gremlin Console and my application, I get a warning saying 'Query iterating over all rows'.

Thanks,
Sudha

Jason Plurad

Jul 19, 2016, 6:53:59 AM
to Aurelius
Hi Sudha,

Can you show a full Gremlin Console session where you create the index and then query against it? Your specific queries are important here because this might be a case where the indexes were created but the queries don't leverage them.

Using driver vs console shouldn't make a difference.

-- Jason

Sudha Subramanian

Jul 19, 2016, 9:07:32 AM
to Aurelius
Hi Jason,

This is my Gremlin Console session and the logs from Gremlin Server. I connect to the server running on my localhost. This Gremlin Server on port 8182 connects to Cassandra and Elasticsearch, which have the indexes pre-defined (from when the schema was created using the TitanManagement APIs). Is this the right usage?

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Connected - localhost/127.0.0.1:8182
gremlin> :> g.addV("type", "person", "name", "sudha")
==>v[40964312]
gremlin> :> g.tx().commit()
==>null
gremlin> :> g.V().has("type", "person").next()
==>v[40964312]
gremlin> 

Gremlin console: 
com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(type = person)]. For better performance, use indexes




Thanks,
Sudha

Jason Plurad

Jul 19, 2016, 9:37:41 PM
to Aurelius
Hi Sudha,

You'll need to double check your configuration settings. Is the gremlin server starting with the same graph properties that you used when you created the schema?
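
One way to check this is to compare the two properties files key by key. Here is a minimal Java sketch, assuming both the schema-creation process and Gremlin Server read plain .properties files. The class name GraphConfigDiff and its methods are made up for illustration; the four keys are the ones set in the schema-creation code earlier in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;

// Hypothetical helper, not part of Titan or TinkerPop.
final class GraphConfigDiff {
    // Keys copied into the graph configuration by the schema-creation code.
    static final List<String> KEYS = Arrays.asList(
            "storage.backend", "storage.hostname",
            "index.search.backend", "index.search.hostname");

    // Returns key -> "schemaValue != serverValue" for every key whose values differ.
    static Map<String, String> diff(Properties schemaProps, Properties serverProps) {
        Map<String, String> mismatches = new LinkedHashMap<>();
        for (String key : KEYS) {
            String schemaValue = schemaProps.getProperty(key);
            String serverValue = serverProps.getProperty(key);
            if (!Objects.equals(schemaValue, serverValue)) {
                mismatches.put(key, schemaValue + " != " + serverValue);
            }
        }
        return mismatches;
    }

    // Loads a .properties file from disk.
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: properties used by the schema-creation process
        // args[1]: properties file referenced by the Gremlin Server YAML
        diff(load(args[0]), load(args[1]))
                .forEach((key, values) -> System.out.println(key + ": " + values));
    }
}
```

Run it with the two file paths as arguments; no output means the four settings agree. Note that in the server log further down, the stock gremlin-server.yaml loads conf/gremlin-server/titan-berkeleyje-server.properties, so a mismatch is easy to get if the YAML was never pointed at the Cassandra properties.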

You didn't show any of the index creation in your reply. Anyway, I just tried it with a clean unzipped titan-1.0.0-hadoop1.zip.

First I started gremlin server with ./bin/gremlin-server.sh ./conf/gremlin-server/gremlin-server.yaml

Then I started a gremlin console with ./bin/gremlin.sh

Here's the gremlin console output:

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Connected - localhost/127.0.0.1:8182

gremlin> :> mgmt = graph.openManagement(); type = mgmt.makePropertyKey("type").dataType(String.class).make(); mgmt.buildIndex("byType", Vertex.class).addKey(type).buildCompositeIndex(); mgmt.commit();
==>null
gremlin> :> graph.addVertex("type", "person", "name", "sudha");
==>v[8400]
gremlin> :> g.V().has("type", "person").next()
==>v[8400]
gremlin> :remote close
==>Removed - Gremlin Server - [localhost/127.0.0.1:8182]

Then in the Gremlin Server output, no sign of the query warning:

0    [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  -
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----

100  [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Configuring Gremlin Server from ./conf/gremlin-server/gremlin-server.yaml
162  [main] INFO  org.apache.tinkerpop.gremlin.server.util.MetricManager  - Configured Metrics ConsoleReporter configured with report interval=180000ms
164  [main] INFO  org.apache.tinkerpop.gremlin.server.util.MetricManager  - Configured Metrics CsvReporter configured with report interval=180000ms to fileName=/tmp/gremlin-server-metrics.csv
218  [main] INFO  org.apache.tinkerpop.gremlin.server.util.MetricManager  - Configured Metrics JmxReporter configured with domain= and agentId=
219  [main] INFO  org.apache.tinkerpop.gremlin.server.util.MetricManager  - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
682  [main] INFO  com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration  - Set default timestamp provider MICRO
737  [main] INFO  com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration  - Generated unique-instance-id=c0a8000c28716-u1401-local1
769  [main] INFO  com.thinkaurelius.titan.diskstorage.Backend  - Initiated backend operations thread pool of size 16
828  [main] INFO  com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog  - Loaded unidentified ReadMarker start time 2016-07-20T01:26:15.031Z into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller@5505ae1a
828  [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Graph [graph] was successfully configured via [conf/gremlin-server/titan-berkeleyje-server.properties].
829  [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
1067 [main] INFO  org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines  - Loaded nashorn ScriptEngine
1319 [main] INFO  org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines  - Loaded gremlin-groovy ScriptEngine
1885 [main] INFO  org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor  - Initialized gremlin-groovy ScriptEngine with scripts/generate-modern.groovy
1885 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized GremlinExecutor and configured ScriptEngines.
1891 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardtitangraph[berkeleyje:db/berkeley], standard]
1908 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Executing start up LifeCycleHook
1918 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Loading 'modern' graph data, if necessary.
6223 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
6224 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
6380 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
6381 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
6452 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
6452 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Channel started at port 8182.
33358 [gremlin-server-worker-1] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the standard OpProcessor.
33359 [gremlin-server-worker-1] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the control OpProcessor.
33362 [gremlin-server-worker-1] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the session OpProcessor.
^C79055 [gremlin-server-shutdown] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Shutting down thread pools.
79057 [gremlin-server-stop] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Executing shutdown LifeCycleHook


-- Jason

Sudha Subramanian

Jul 20, 2016, 10:20:59 AM
to Aurelius
Hi Jason,


Thanks a lot for the detailed response.

The configuration settings are OK, and Gremlin Server is using the same graph properties as in schema creation. I can see the vertices and edges being created when my application runs. It's only the index that is not being used in queries, and I see warning messages.

I created an index using the Gremlin Console (the way you described), and it works fine for me. The indexes are recognized and I don't see any warnings. However, I was hoping that the indexes I had created previously on my Cassandra instance would be effective, which is why I did not show any index creation in my previous reply.

Here is the snippet that worked for me:

gremlin> :> mgmt = graph.openManagement(); token=mgmt.makePropertyKey("token").dataType(String.class).make(); mgmt.buildIndex("byToken", Vertex.class).addKey(token).buildCompositeIndex(); mgmt.commit();
==>null
gremlin> :> graph.addVertex("token", "identifier", "id", "1")
==>v[4240]
gremlin> :> g.V().has("token", "identifier").next()
==>v[4240]
gremlin> :remote close


Now I want to check the Cassandra instance for the index that I created above. This is my CQL query:

SELECT column_name, index_name, index_options, index_type, component_index  FROM system.schema_columns  WHERE keyspace_name='titan';

1. Why am I not seeing any indexes? I was hoping to see the 'byToken' that I created above. I see all nulls.

 column_name | index_name | index_options | index_type | component_index
-------------+------------+---------------+------------+-----------------
     column1 |       null |          null |       null |            null
         key |       null |          null |       null |            null
       value |       null |          null |       null |            null
     column1 |       null |          null |       null |            null
         key |       null |          null |       null |            null
       value |       null |          null |       null |            null



2. Should I create the index via gremlin?

I connect to gremlin using the following construct and submit queries. 

GryoMapper mapper = GryoMapper.build().addRegistry(TitanIoRegistry.INSTANCE).create();

cluster = Cluster.build().serializer(new GryoMessageSerializerV1d0(mapper)).create();

client = cluster.connect();

client.init();

The groovy file in the server has the methods that I invoke:

def globals = [:]

globals << [g : graph.traversal()]
......



In my current implementation, I have a different Java process that creates the indexes for me during installation, as shown below

Configuration graphConf = new BaseConfiguration();
InputStream in = TitanGraphFactory.class.getResourceAsStream("/titan_full_en.properties");
Properties prop = new Properties();
prop.load(in);
graphConf.setProperty("storage.backend", prop.get("storage.backend"));
graphConf.setProperty("storage.hostname", prop.get("storage.hostname"));
graphConf.setProperty("index.search.backend", prop.get("index.search.backend"));
graphConf.setProperty("index.search.hostname", prop.get("index.search.hostname"));
TitanGraph graph = TitanFactory.open(graphConf);
TitanManagement mgmt = graph.openManagement();

final PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class).make();
mgmt.buildIndex("byType", Vertex.class).addKey(type).buildCompositeIndex();

ManagementSystem.awaitGraphIndexStatus(graph, "byType").call();

mgmt.updateIndex(mgmt.getGraphIndex("byType"), SchemaAction.ENABLE_INDEX).get();
mgmt.commit();


Should I create the index using client.submit("mgmt=.....") instead of directly using the Titan APIs? Do you think that's the root cause for the index not being used during queries?

Jason Plurad

Jul 20, 2016, 2:14:00 PM
to Aurelius
Hi Sudha,


1. Why am I not seeing any indexes? I was hoping to see the 'byToken' that I created above. I see all nulls.

Because that's not how the indexes are stored. I think you'd have to poke around in the graphindex table in your titan keyspace, but then you'd find out that everything is non-readable from cqlsh anyway.

If you want to verify that the index is there, go through TitanManagement:

gremlin> :> mgmt = graph.openManagement(); hasIndex = mgmt.containsGraphIndex("byType"); mgmt.rollback(); hasIndex
==>true

gremlin> :> mgmt = graph.openManagement(); idx = mgmt.getGraphIndex("byType"); pk = idx.getFieldKeys()[0]; s = "index name="+idx.name()+" isCompositeIndex="+idx.isCompositeIndex()+" fieldKey="+pk+" status="+idx.getIndexStatus(pk); mgmt.rollback(); s
==>index name=byType isCompositeIndex=true fieldKey=type status=ENABLED


2. Should I create the index via gremlin? Should I create the index using client.submit("mgmt=.....") instead of directly using the Titan APIs? Do you think that's the root cause for the index not being used during queries?

It shouldn't make a difference whether you create the index via the Gremlin Console or via a Gremlin Driver program or via direct java calls to TitanManagement -- ultimately it is the same exact code. Note that many people on the mailing lists use Gremlin Console in responses just because it's easy to try out quickly.

You can remove these 2 lines from your index creation code. Composite indexes are enabled immediately after mgmt.commit() if you are indexing a property key that hasn't been used previously.


ManagementSystem.awaitGraphIndexStatus(graph, "byType").call();
mgmt.updateIndex(mgmt.getGraphIndex("byType"), SchemaAction.ENABLE_INDEX).get();
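
To spell out the rule about new versus previously used property keys, here is a toy Java model. It is not Titan code: the class and method names are invented for illustration, the status names follow Titan's SchemaStatus values, and the steps listed for an already-used key reflect the reindexing procedure as I understand it from the Titan 1.0 documentation, so treat it as a sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the composite index lifecycle; hypothetical names throughout.
final class IndexLifecycleSketch {
    // Subset of status names, mirroring Titan's SchemaStatus values.
    enum Status { INSTALLED, REGISTERED, ENABLED }

    // A key created in the same management transaction as the index has no
    // existing data, so the index is usable right after commit (assumption
    // based on the rule stated above).
    static Status statusAfterCommit(boolean keyCreatedInSameTransaction) {
        return keyCreatedInSameTransaction ? Status.ENABLED : Status.INSTALLED;
    }

    // Lists what is still left to do before queries can use the index.
    static List<String> remainingSteps(Status status) {
        List<String> steps = new ArrayList<>();
        if (status == Status.INSTALLED) {
            steps.add("wait until the index is REGISTERED");
        }
        if (status != Status.ENABLED) {
            steps.add("run updateIndex with SchemaAction.REINDEX in a new management transaction");
            steps.add("wait until the index is ENABLED");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(remainingSteps(statusAfterCommit(true)));
        System.out.println(remainingSteps(statusAfterCommit(false)));
    }
}
```

For a brand-new key the list of remaining steps is empty, which is why the await and ENABLE_INDEX calls are unnecessary in your case.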


-- Jason

Sudha Subramanian

Jul 21, 2016, 11:01:14 AM
to Aurelius
Thanks Jason. 

I've created all my indexes using console for now. That worked for me. 

Thanks again,
Sudha