JanusGraph transaction was closed all of a sudden


naresh...@gmail.com

Sep 25, 2020, 8:52:34 AM
to JanusGraph users

Hi,
I am using Spark for parallel processing, with a mix of batch loading (at the transaction level) and normal transactions.

Case 1: in some cases I am using bulk loading at the transaction level:

JanusGraphTransaction txn = janusGraph.buildTransaction().enableBatchLoading().start();
// ... create vertices and edges ...
txn.commit();

Case 2: with a normal transaction:

JanusGraphTransaction txn = janusGraph.newTransaction();
// ... create vertices and edges ...
txn.commit();

I got the exception below in the middle of processing; the transaction did not commit, so vertex creation failed.

java.lang.IllegalStateException: Cannot access element because its enclosing transaction is closed and unbound
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getNextTx(StandardJanusGraphTx.java:305)
    at org.janusgraph.graphdb.vertices.AbstractVertex.tx(AbstractVertex.java:60)
    at org.janusgraph.graphdb.vertices.AbstractVertex.property(AbstractVertex.java:152)
    at org.janusgraph.core.JanusGraphVertex.property(JanusGraphVertex.java:72)
    at org.janusgraph.core.JanusGraphVertex.property(JanusGraphVertex.java:33)


It happens very rarely, and I am not sure in which of the two cases it occurs.

Can you please suggest: is there any case where JanusGraph can commit/close a transaction automatically?
We are explicitly opening, committing, and closing the transactions, so there is no other place where a commit/close could happen in the middle of processing.

Thanks,
Naresh

HadoopMarc

Sep 25, 2020, 11:39:45 AM
to JanusGraph users
Hi Naresh,

It is the responsibility of the application to commit transactions. One application example is Gremlin Server, which can do that for you, but this may not be the most convenient for bulk loading.

If you use Spark, a nice way is to use the RDD.mapPartitions() function. If your partitions have the size of a single transaction (1,000-10,000 vertices), you can catch any exceptions, roll back the transaction on failure, and commit on success. Spark will automatically retry a failed partition, and by using mapPartitions() you can be sure that there is exactly one successful run for any partition.
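To make that concrete, here is a minimal sketch of the pattern, not taken from your code: the input RDD of property maps, the vertex label "item" and the GraphHolder singleton are all assumptions for illustration.

import java.util.Collections;
import java.util.Map;
import org.apache.spark.api.java.JavaRDD;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;

// inputRdd: JavaRDD<Map<String, Object>> of vertex properties (assumed)
JavaRDD<Integer> committed = inputRdd.mapPartitions(rows -> {
    JanusGraph graph = GraphHolder.get();  // hypothetical per-executor singleton
    JanusGraphTransaction txn = graph.buildTransaction().enableBatchLoading().start();
    int count = 0;
    try {
        while (rows.hasNext()) {
            JanusGraphVertex v = txn.addVertex("item");  // example label
            rows.next().forEach(v::property);
            count++;
        }
        txn.commit();        // exactly one commit per partition
    } catch (Exception e) {
        txn.rollback();      // leave no dangling transaction behind
        throw e;             // Spark will retry the failed partition
    }
    return Collections.singletonList(count).iterator();
});
int total = committed.reduce(Integer::sum);  // a later action is needed to trigger execution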

Reasons for occasional failure may be transactions that are too large or an indexing backend that cannot keep up with the ingestion. ID block exhaustion generates its own exceptions.
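As an aside: if ID block exhaustion ever does show up, ID allocation is tunable in the graph's properties file. A sketch with two of the documented options (the values are illustrative, not recommendations):

# Reserve larger ID blocks so heavy parallel ingest makes fewer
# round trips to the ID authority (default is 10000)
ids.block-size=100000
# Start renewing the local ID block when 30% of it is left (the default)
ids.renew-percentage=0.3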

HTH,    Marc


naresh...@gmail.com

Sep 25, 2020, 5:29:16 PM
to JanusGraph users
Thanks Marc,
I think I got the answer from you: it might be because of too many transactions, or an indexing backend that cannot keep up with the ingestion. But I have a few questions on this.

I am using the JanusGraph client 0.3.2 with HBase (8 region servers) and Elasticsearch.

1) What is the recommended number of transactions per JanusGraph instance? And should I be able to replicate the problem by creating too many transactions, or is there a better way to replicate and test it?
2) An indexing backend that cannot keep up with the ingestion: any idea in which case this happens? Please suggest the best way to replicate and test it.


And thanks for the suggestions on the Spark RDD.mapPartitions() function.
Yes, we have enough partitions, with 500 vertices max each.
We are not using RDD.mapPartitions() exactly, but foreachPartition(): vertices are created in a Spark action/operation, i.e. stream.foreachRDD -> foreachPartition(... creating vertices here ...). Please suggest if this is not the right way.

Thanks,
Naresh

HadoopMarc

Sep 26, 2020, 7:47:21 AM
to JanusGraph users
Hi Naresh,

First, about the Spark RDD: you are absolutely right that RDD.foreachPartition() is the right method to use, my bad. Because it returns void, there are no later Spark stages that could trigger a second execution. But does that mean that your Spark job did not finish successfully, despite the few transaction failures? I would expect Spark to reschedule the corresponding task until it succeeds. The only problem you can have then is that transactions are not properly closed (the reason for the exception you showed?), so that is why I suggested catching the exception, rolling back the transaction, and raising your own exception towards Spark.
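In code, the handling I mean is roughly the following; createVerticesAndEdges() stands in for your own loading logic and GraphHolder for a hypothetical per-executor singleton connection.

inputRdd.foreachPartition(rows -> {
    JanusGraphTransaction txn = GraphHolder.get().newTransaction();
    try {
        while (rows.hasNext()) {
            createVerticesAndEdges(txn, rows.next());  // your own code here
        }
        txn.commit();
    } catch (Exception e) {
        txn.rollback();  // close the transaction properly ...
        throw e;         // ... then let Spark reschedule the task
    }
});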

Your other questions.
1) If you use Spark, I would expect that you have a singleton object per Spark executor that contains the JanusGraph connection, and that you manage parallelism on the executor with the number of cores per executor (see the sketch after point 2). If you use more transactions per Spark task/core, you lose the option to roll back the transaction when needed and have Spark reschedule the task.

2) It is just something that people sometimes complain about. I guess it should be recognizable from the exceptions raised. Of course, it will not hurt to monitor CPU and RAM usage of your Elasticsearch instances. It will only happen if the Elastic cluster is the weakest link in the chain, that is, if JanusGraph and HBase can process more transactions than Elastic can handle.
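For point 1, a sketch of what I mean by a singleton per executor; the class name and the properties file path are made up, only JanusGraphFactory.open() is the real API.

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public final class GraphHolder {
    private static volatile JanusGraph graph;

    private GraphHolder() {}

    // One JanusGraph connection per executor JVM, shared by all task
    // threads; each Spark task still opens its own transaction on it.
    public static JanusGraph get() {
        if (graph == null) {
            synchronized (GraphHolder.class) {
                if (graph == null) {
                    graph = JanusGraphFactory.open("conf/janusgraph-hbase-es.properties");
                }
            }
        }
        return graph;
    }
}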

Last remark: it is not unusual that a few Spark tasks fail; that just happens for all kinds of reasons in complex distributed setups. Your application must simply be able to handle these failures and reschedule the task.

Best wishes,      Marc


Naresh Babu Y

Sep 26, 2020, 8:44:23 AM
to janusgra...@googlegroups.com
Hi Marc, thanks for replying so quickly.

I agree with your remark: the Spark task is able to retry, hence it can be handled easily.

I am mostly focusing on the root cause, so that we can fix the actual problem.

My comments on the same questions:
1) Yes, the JanusGraph instance is a singleton, and we create one transaction per task.
Since "too many transactions" was raised, I am just checking: will there be any issue if we have many tasks/cores (indirectly, too many transactions per single JanusGraph instance at the same time), and is there any limit on this?

2) Yes, good to monitor Elastic.
    Will it happen only for writes, or can it happen for reads also?

The bad thing is that no other errors are printed apart from the IllegalStateException, and I am not able to replicate it to see exactly what is happening.


Thanks,
Naresh


HadoopMarc

Sep 27, 2020, 2:38:51 PM
to JanusGraph users
Hi Naresh,

As said, for finding a root cause, first make sure that failing transactions are rolled back (you did not confirm that you took care of that).

1) I would expect that you use just 4 or 8 cores per executor, so this number of parallel transactions is insignificant to JanusGraph. If there were a problem, Spark would simply raise an OOM exception. Also note that JanusGraph instances in different executors are mostly independent, apart from the ID manager and their load on the backends.

2) Good question on the read/write load in Elastic. My mind was on the writes, because of the processing related to indexing, but during ingestion of edges there may also be a lot of lookups for vertices that are not present in application memory or in the JanusGraph cache. So, I guess both reads and writes matter.
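If those lookups turn out to be the bottleneck, the JanusGraph database-level cache can absorb part of the read load. A sketch with the documented cache options (the size is illustrative only):

# Enable the JVM-wide database-level cache, shared by all transactions
cache.db-cache=true
# Values between 0 and 1 are interpreted as a fraction of the JVM heap
cache.db-cache-size=0.25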

Best wishes,    Marc

Naresh Babu Y

Sep 27, 2020, 5:44:54 PM
to janusgra...@googlegroups.com
Hello Marc,
Yes, I am doing a rollback on any failure and committing on success.


Thanks,
Naresh


HadoopMarc

Sep 28, 2020, 2:02:58 AM
to JanusGraph users
Hi Naresh,

I did a quick search of the JanusGraph issue list. This did not reveal much, but the issue below suggests that the exception you face only occurs after earlier exceptions (in this case a NullPointerException). So make sure that you did not skip other exceptions in your stack trace.


Best wishes,    Marc


