Multiple writers cause inconsistent vertices


Ajay Srivastava

Sep 23, 2017, 6:36:27 AM
to JanusGraph users
Hi,

I am using janusgraph-0.1.1 with HBase.
The data is being loaded into the graph by three clients connected to the same Gremlin Server. All three clients run the same code: it checks whether a vertex is already present in the graph and inserts the vertex only if it is not.
While verifying the data, I found the following problem:

scala> graph.V().hasLabel("Root").toList
15:27:22,361  WARN StandardJanusGraphTx:1273 - Query requires iterating over all vertices [(~label = Root)]. For better performance, use indexes
res11: List[gremlin.scala.Vertex] = List(v[737304], v[4136], v[442432])
The result is three vertices.

scala> graph.V().hasLabel("Root").properties("URI").toList
15:27:52,275  WARN StandardJanusGraphTx:1273 - Query requires iterating over all vertices [(~label = Root)]. For better performance, use indexes
res13: List[gremlin.scala.Property[Any]] = List(vp[URI->Root], vp[URI->Root], vp[URI->Root])
The result is three vertices with the same URI.

scala> val uri = Key[String]("URI")
scala> graph.V().has(uri, "Root").toList
res12: List[gremlin.scala.Vertex] = List(v[442432])
Since vertices are uniquely indexed on URI, this result (a single vertex) is the one I expect. JanusGraph should not have allowed vertices with the same URI to be inserted, but, as the two outputs above show, it did.
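A simplified, single-threaded model of how three clients running the same check-then-insert code can end up in this state. It is a sketch under loud assumptions, not JanusGraph's storage layout: `ToyGraph`, `replay`, and `StoredVertex` are hypothetical names, `vertices` stands in for the stored vertices, and `uniqueIndex` stands in for the UniqueURI index, modeled as a map where the last write for a key wins. Without locking, nothing stops every existence check from running before any insert lands, which is consistent with the outputs above (three `Root` vertices, one index hit).

```scala
import scala.collection.mutable

// Simplified model (an assumption, not JanusGraph internals): stored vertices plus
// a unique index on URI where the last write for a given key overwrites earlier ones.
object CheckThenInsertRace {
  final case class StoredVertex(id: Long, label: String, uri: String)

  final class ToyGraph {
    val vertices = mutable.ArrayBuffer.empty[StoredVertex]
    val uniqueIndex = mutable.Map.empty[String, Long] // URI -> vertex id
    private var nextId = 0L

    def exists(uri: String): Boolean = uniqueIndex.contains(uri)

    def insert(label: String, uri: String): Long = {
      nextId += 1
      vertices += StoredVertex(nextId, label, uri)
      uniqueIndex(uri) = nextId // no lock: silently overwrites another client's entry
      nextId
    }
  }

  // Replays one unlucky interleaving: every client checks before any client inserts.
  def replay(clients: Int): ToyGraph = {
    val g = new ToyGraph
    val sawAbsent = (1 to clients).map(_ => !g.exists("Root")) // every check sees "absent"
    sawAbsent.foreach(absent => if (absent) g.insert("Root", "Root"))
    g
  }

  def main(args: Array[String]): Unit = {
    val g = replay(clients = 3)
    println(g.vertices.count(_.label == "Root")) // 3, like hasLabel("Root")
    println(g.uniqueIndex.get("Root"))           // Some(3): only the last writer's id survives
  }
}
```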

I am new to JanusGraph and have several questions:

1) What am I doing wrong here?
2) Can multiple clients writing to the same Gremlin Server cause this problem?
3) How can I read back the schema I created?
4) Below is the code that creates the schema. Is it correct?

    import org.apache.tinkerpop.gremlin.structure.Vertex

    /* Creating three types of vertices having the same properties, indexed on the same property URI.
       `mgt` is an open JanusGraphManagement (graph.openManagement()); the caller commits it. */
    def createVertexSchema : Boolean = {
        val vertexLabels = Array("Root", "Lang", "Cocpt")

        val GUID = mgt.makePropertyKey("GUID").dataType(classOf[String]).make()
        val Name = mgt.makePropertyKey("Name").dataType(classOf[String]).make()
        val URI  = mgt.makePropertyKey("URI").dataType(classOf[String]).make()

        // One vertex label per entry; the returned VertexLabel is not needed here
        vertexLabels.foreach(label => mgt.makeVertexLabel(label).make())

        // Unique composite index on URI, shared by all vertex labels
        mgt.buildIndex("UniqueURI", classOf[Vertex]).addKey(URI).unique().buildCompositeIndex()
        true
    }

Regards,
Ajay

Robert Dale

Sep 23, 2017, 7:52:51 AM
to Ajay Srivastava, JanusGraph users
Do you call `mgt.commit()`? Do you call `mgt.awaitGraphIndexStatus(graph, 'UniqueURI').call()`?

You can use this script in the console to see the state of the index: https://gist.github.com/robertdale/ad4c63910009dd1118abe67b33ce41e1


Robert Dale

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/D8F1F502-3482-4A8E-AB9A-5021273DC697%40guavus.com.
For more options, visit https://groups.google.com/d/optout.

Ajay Srivastava

Sep 24, 2017, 5:47:46 AM
to Robert Dale, JanusGraph users
Thanks Robert.
I do call commit and await in my code. Here is more information:
scala> val mgt = graph.openManagement()
mgt: org.janusgraph.core.schema.JanusGraphManagement = org.janusgraph.graphdb.database.management.ManagementSystem@46f31564

scala> val index = mgt.getGraphIndexes(classOf[Vertex]).iterator.next
index: org.janusgraph.core.schema.JanusGraphIndex = UniqueURI

scala> val properties = index.getFieldKeys
properties: Array[org.janusgraph.core.PropertyKey] = Array(URI)

scala> properties.toList
res30: List[org.janusgraph.core.PropertyKey] = List(URI)

scala> index.getIndexStatus(properties(0))
res29: org.janusgraph.core.schema.SchemaStatus = ENABLED


So the status of the index is “ENABLED”. Should it be “REGISTERED”?
I deleted the database and recreated the schema; now the awaitGraphIndexStatus call times out:
14:03:04,435  INFO GraphIndexStatusWatcher:81 - Some key(s) on index UniqueURI do not currently have status REGISTERED: URI=ENABLED
14:03:04,435  INFO GraphIndexStatusWatcher:90 - Timed out (PT1M) while waiting for index UniqueURI to converge on status REGISTERED

I waited for half an hour, but the status remains “ENABLED”.
Note that there are no records in the database, and I create all vertex/edge properties and indexes in one transaction. I have also tried creating only the vertex properties, labels, and index in one transaction, and that does not work either.


Regards,
Ajay

Robert Dale

Sep 24, 2017, 7:33:08 AM
to Ajay Srivastava, JanusGraph users
Enabled is good.
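Why the wait above timed out, as a toy model: the log lines show the watcher waiting for REGISTERED while the URI key is already ENABLED, which is what happens when an index is built in the same management transaction as its new property keys. A watcher that polls until the status equals its target never converges once the index has moved past that target. Everything below (`awaitStatus`, `maxPolls`, the status objects) is a hypothetical stand-in, not JanusGraph API. If I read the JanusGraph API correctly, the real watcher's target can be changed with `ManagementSystem.awaitGraphIndexStatus(graph, "UniqueURI").status(SchemaStatus.ENABLED).call()`, but check that against your version.

```scala
// Hypothetical toy model of index-status polling; these names are not JanusGraph API.
object IndexStatusWait {
  sealed trait SchemaStatus
  case object INSTALLED  extends SchemaStatus
  case object REGISTERED extends SchemaStatus
  case object ENABLED    extends SchemaStatus
  case object DISABLED   extends SchemaStatus

  // Polls `current` up to maxPolls times; succeeds only on an exact match with `target`.
  def awaitStatus(current: () => SchemaStatus, target: SchemaStatus, maxPolls: Int): Boolean =
    Iterator.continually(current()).take(maxPolls).exists(_ == target)

  def main(args: Array[String]): Unit = {
    // An index built together with its new property keys is ENABLED straight away.
    val uniqueUriStatus: () => SchemaStatus = () => ENABLED
    println(awaitStatus(uniqueUriStatus, REGISTERED, maxPolls = 5)) // false: "timed out"
    println(awaitStatus(uniqueUriStatus, ENABLED, maxPolls = 5))    // true
  }
}
```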

Robert Dale

Robert Dale

Sep 24, 2017, 7:34:10 AM
to Ajay Srivastava, JanusGraph users
Have you read through this section on data consistency?  http://docs.janusgraph.org/latest/eventual-consistency.html

Robert Dale

Kevin Schmidt

Sep 24, 2017, 10:25:47 AM
to Robert Dale, Ajay Srivastava, JanusGraph users
That is the pertinent section, but also see https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/aureliusgraphs/z6kyGSlifXE/aLc2Zwb_BAAJ, a thread about Titan 1.0 with Cassandra that probably still applies.

Ajay Srivastava

Sep 24, 2017, 10:38:25 AM
to Robert Dale, JanusGraph users
Enabling locking on the property and the index solved the consistency problem, but it took 10 minutes to ingest the data from three clients, while the same data can be loaded from one client in less than 2 minutes.
Our system will eventually be 99% reads and 1% writes, so a single writing client would be enough in steady state. But I need to ingest millions of files to initialise the graph database, and at this speed the ingest will take years.


Regards,
Ajay



Ajay Srivastava

Sep 24, 2017, 10:56:17 AM
to Kevin Schmidt, Robert Dale, JanusGraph users
Hi Kevin,
This is exactly the problem I am facing.
How are you handling duplicates? I assume dedup() will not work, since duplicate vertices have different IDs. And if indexes are not used, read queries will be slower.
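The point about `dedup()` can be illustrated with plain collections: deduplicating by element identity keeps all three vertices, because each has its own ID, whereas deduplicating by the URI value collapses them. This is a stdlib-only sketch; `VertexRow`, `dedupById`, and `dedupByUri` are hypothetical names, and the IDs are copied from the output in the first message. TinkerPop's `dedup()` step also accepts a `by("URI")` modulator, which applies the key-based idea inside a traversal.

```scala
import scala.collection.mutable

object DedupByKey {
  // Hypothetical stand-in for a vertex id plus its URI property value.
  final case class VertexRow(id: Long, uri: String)

  val roots = List(VertexRow(737304L, "Root"), VertexRow(4136L, "Root"), VertexRow(442432L, "Root"))

  // Identity-based dedup: the ids differ, so nothing is removed.
  def dedupById(vs: List[VertexRow]): List[VertexRow] = vs.distinct

  // Key-based dedup: keep the first vertex seen for each URI, preserving order.
  def dedupByUri(vs: List[VertexRow]): List[VertexRow] = {
    val seen = mutable.Set.empty[String]
    vs.filter(v => seen.add(v.uri)) // add returns false if the URI was already seen
  }

  def main(args: Array[String]): Unit = {
    println(dedupById(roots).size)  // 3
    println(dedupByUri(roots).size) // 1
  }
}
```

Key-based dedup only hides duplicates at read time; the extra vertices (and any edges attached to them) remain in the graph.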


Regards,
Ajay

Ted Wilmes

Oct 2, 2017, 10:30:28 AM
to JanusGraph users
Hi Ajay,
If at all possible, I usually try to remove the need for unique constraints on indexes. You mentioned that you'll eventually have about a 99:1 read/write ratio. Does this mean that you'll do a big bulk load up front? If so, could you structure your load so that you do not need the unique index enabled, and instead build it after the load? For example, you could load all of your vertices first and then load the edges. This would require some preprocessing but would speed things up greatly. This is how the TinkerPop BulkLoaderVertexProgram [1], which can be run against Janus, works; granted, you must first put your data in one of the supported adjacency list formats or provide a custom reader.
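A minimal sketch of the vertices-first, edges-second idea, assuming the input can be preprocessed into vertex records keyed by URI and edge records given as pairs of URIs. It is stdlib-only and is not the BulkLoaderVertexProgram: `VertexRec`, `EdgeRec`, `loadVertices`, and `resolveEdges` are hypothetical, `insertVertex` stands in for whatever actually creates a vertex and returns its ID, and the in-memory map from URI to ID replaces per-edge index lookups during the load.

```scala
object TwoPhaseLoad {
  final case class VertexRec(uri: String, label: String)
  final case class EdgeRec(fromUri: String, toUri: String, label: String)

  // Phase 1: deduplicate by URI in preprocessing, then create each vertex exactly once.
  // `insertVertex` is a hypothetical callback standing in for the real graph write.
  def loadVertices(recs: Seq[VertexRec], insertVertex: VertexRec => Long): Map[String, Long] = {
    val firstPerUri = recs.groupBy(_.uri).values.map(_.head)
    firstPerUri.map(r => r.uri -> insertVertex(r)).toMap
  }

  // Phase 2: resolve edge endpoints from the id map instead of querying an index.
  // Edges whose endpoints were never loaded are skipped.
  def resolveEdges(edges: Seq[EdgeRec], ids: Map[String, Long]): Seq[(Long, String, Long)] =
    edges.flatMap { e =>
      for (from <- ids.get(e.fromUri); to <- ids.get(e.toUri)) yield (from, e.label, to)
    }

  def main(args: Array[String]): Unit = {
    var nextId = 0L
    val ids = loadVertices(
      Seq(VertexRec("Root", "Root"), VertexRec("Root", "Root"), VertexRec("en", "Lang")),
      _ => { nextId += 1; nextId })
    println(ids.size) // 2: "Root" appears twice in the input but is created once
    println(resolveEdges(Seq(EdgeRec("Root", "en", "hasLang")), ids).size) // 1
  }
}
```

Because phase 1 runs from a single deduplicated list, the unique index can stay off during the load and be built afterwards, as suggested above.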

If you can't load the vertices separately, maybe you could partition your input data so that reads and writes for any specific vertex are isolated to the same thread. This would let you safely perform a read before a write to check for existence without having to worry about race conditions. Combine this with an in-thread cache of the vertices already inserted and their corresponding Janus IDs, and you'll speed things up.
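A minimal sketch of the partitioning idea, assuming one loader process with a fixed number of worker tasks (the same routing could presumably be applied across several client processes). Each record is routed by a hash of its URI, so all reads and writes for a given URI happen in one task, and each task keeps a local cache from URI to vertex ID. `PartitionedLoader`, `partitionOf`, and `createVertex` are hypothetical; `createVertex` only records the URI and hands out IDs from a shared counter in place of a real insert, and the sketch assumes the graph starts empty.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PartitionedLoader {
  // Route every record with the same URI to the same worker.
  def partitionOf(uri: String, workers: Int): Int = Math.floorMod(uri.hashCode, workers)

  def load(uris: Seq[String], workers: Int): ConcurrentLinkedQueue[(String, Long)] = {
    val created = new ConcurrentLinkedQueue[(String, Long)]() // stands in for the graph
    val ids = new AtomicLong(0L)

    // Hypothetical insert: records the vertex and returns a fresh id.
    def createVertex(uri: String): Long = {
      val id = ids.incrementAndGet()
      created.add(uri -> id)
      id
    }

    val tasks = uris.groupBy(partitionOf(_, workers)).values.toSeq.map { batch =>
      Future {
        // Task-local cache: no other task ever handles these URIs, so the
        // read-before-write check cannot race. With a non-empty graph, a cache
        // miss would also need to query the graph before inserting.
        val cache = mutable.HashMap.empty[String, Long]
        batch.foreach(uri => cache.getOrElseUpdate(uri, createVertex(uri)))
      }
    }
    Await.result(Future.sequence(tasks), 30.seconds)
    created
  }

  def main(args: Array[String]): Unit = {
    val input = Seq.fill(3)(Seq("Root", "en", "fr", "Cocpt-1")).flatten
    println(load(input, workers = 3).size) // 4: one vertex per distinct URI
  }
}
```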

--Ted

Kevin Schmidt

Oct 2, 2017, 10:38:59 AM
to Ajay Srivastava, Robert Dale, JanusGraph users
We handled it by not using locks or a unique index. We kept a non-unique index instead and accepted that there may be duplicate vertices; we then either construct our traversals to handle them, or periodically detect the duplicates and remove or fix them.
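A sketch of the periodic clean-up step, assuming the candidate vertices can be listed as (vertex ID, URI) pairs, for example through the non-unique index. It picks one canonical vertex per URI and returns a mapping from each duplicate to its canonical vertex; a real clean-up job would then move the duplicates' edges onto the canonical vertex and drop the duplicates. The `DuplicateSweep` and `mergePlan` names, and the choice of the smallest ID as canonical, are assumptions for illustration.

```scala
object DuplicateSweep {
  // For each URI held by more than one vertex, map every non-canonical id to the canonical one.
  // Canonical = smallest id (an arbitrary but deterministic choice for this sketch).
  def mergePlan(vertices: Seq[(Long, String)]): Map[Long, Long] =
    vertices
      .groupBy(_._2)
      .values
      .filter(_.size > 1)
      .flatMap { group =>
        val ids = group.map(_._1).sorted
        ids.tail.map(dup => dup -> ids.head)
      }
      .toMap

  def main(args: Array[String]): Unit = {
    val rows = Seq(737304L -> "Root", 4136L -> "Root", 442432L -> "Root", 9L -> "Lang-en")
    println(mergePlan(rows)) // 442432 and 737304 both map to 4136; "Lang-en" is untouched
  }
}
```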

Ajay Srivastava

Oct 4, 2017, 6:54:47 AM
to Ted Wilmes, JanusGraph users
Thanks Ted.
I am working on it.


Regards,
Ajay
