co.cask.tephra.TransactionNotInProgressException


Jyotirmoy Sundi

Dec 17, 2014, 9:41:01 PM
to cdap...@googlegroups.com
We are seeing a lot of transaction commit errors. Do we need to bump some config for this?

2014-12-18T02:16:04,578Z WARN  c.c.t.TransactionContext [ip] [FlowletProcessDriver-flowletTest-3-executor] TransactionContext:commit(TransactionContext.java:201) - Transaction 1418868875970000003 is not in progress.
co.cask.tephra.TransactionNotInProgressException: canCommit() is called for transaction 1418868875970000003 that is not in progress (it is known to be invalid)
at co.cask.tephra.distributed.TransactionServiceClient$5.execute(TransactionServiceClient.java:335)
at co.cask.tephra.distributed.TransactionServiceClient$5.execute(TransactionServiceClient.java:328)
at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:219)
at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:184)
at co.cask.tephra.distributed.TransactionServiceClient.commit(TransactionServiceClient.java:327)
at co.cask.tephra.TransactionContext.commit(TransactionContext.java:198)
at co.cask.tephra.TransactionContext.finish(TransactionContext.java:80)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.postProcess(FlowletProcessDriver.java:326)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.handleProcessEntry(FlowletProcessDriver.java:287)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.access$000(FlowletProcessDriver.java:61)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver$1.run(FlowletProcessDriver.java:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
2014-12-18T02:16:04,583Z ERROR c.c.c.i.a.r.f.FlowletProcessDriver [ip] [FlowletProcessDriver-flowletTest-3-executor] FlowletProcessDriver:postProcess(FlowletProcessDriver.java:333) - Transaction operation failed: Transaction 1418868875970000003 is not in progress.
co.cask.tephra.TransactionFailureException: Transaction 1418868875970000003 is not in progress.
at co.cask.tephra.TransactionContext.commit(TransactionContext.java:202)
at co.cask.tephra.TransactionContext.finish(TransactionContext.java:80)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.postProcess(FlowletProcessDriver.java:326)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.handleProcessEntry(FlowletProcessDriver.java:287)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.access$000(FlowletProcessDriver.java:61)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver$1.run(FlowletProcessDriver.java:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: co.cask.tephra.TransactionNotInProgressException: canCommit() is called for transaction 1418868875970000003 that is not in progress (it is known to be invalid)
at co.cask.tephra.distributed.TransactionServiceClient$5.execute(TransactionServiceClient.java:335)
at co.cask.tephra.distributed.TransactionServiceClient$5.execute(TransactionServiceClient.java:328)
at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:219)
at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:184)
at co.cask.tephra.distributed.TransactionServiceClient.commit(TransactionServiceClient.java:327)
at co.cask.tephra.TransactionContext.commit(TransactionContext.java:198)
at co.cask.tephra.TransactionContext.finish(TransactionContext.java:80)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.postProcess(FlowletProcessDriver.java:326)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.handleProcessEntry(FlowletProcessDriver.java:287)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver.access$000(FlowletProcessDriver.java:61)
at co.cask.cdap.internal.app.runtime.flow.FlowletProcessDriver$1.run(FlowletProcessDriver.java:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)

Terence Yim

Dec 17, 2014, 10:28:32 PM
to Jyotirmoy Sundi, cdap...@googlegroups.com
Hi Sundi,

Sorry that the error message is not clear enough. This basically means the flowlet method (whether it’s a @Tick or a @ProcessInput method) takes longer than 30 seconds to return (30 seconds is the default timeout limit of a flowlet transaction). Try reducing the fetch size to see if that helps. We are also working on a kafka-flowlet-pack improvement that performs the Kafka fetch outside of the transaction, to reduce the chance of a transaction timeout.
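
To illustrate the idea of keeping each invocation short, here is a minimal sketch in plain Java. It is not CDAP or Tephra code: the class `BoundedBatch`, the method `take`, and the injected clock are all hypothetical names made up for this example. It caps the work done in one call by both an item count and a time budget, so that whatever remains can be handled by a later call (and therefore a later transaction).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration only; not part of CDAP or Tephra.
final class BoundedBatch {

    /**
     * Removes items from the head of {@code pending} until either
     * {@code maxItems} have been taken or {@code budgetMillis} has elapsed
     * according to {@code clockMillis}. Items left in {@code pending} are
     * meant to be picked up by a later call.
     */
    static <T> List<T> take(Deque<T> pending, int maxItems,
                            long budgetMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        List<T> batch = new ArrayList<>();
        while (!pending.isEmpty()
                && batch.size() < maxItems
                && clockMillis.getAsLong() < deadline) {
            batch.add(pending.pollFirst());
        }
        return batch;
    }

    public static void main(String[] args) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) {
            pending.add(i);
        }
        // Assumed numbers: at most 4 items and a 20-second budget,
        // chosen to stay well under a 30-second transaction timeout.
        List<Integer> batch = take(pending, 4, 20_000L, System::currentTimeMillis);
        System.out.println(batch.size() + " taken, " + pending.size() + " left");
    }
}
```

The budget and item limit are placeholders; the right values depend on how long each write actually takes in your flowlet.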

Terence

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To post to this group, send email to cdap...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/4a045fa4-4fd0-4c23-9722-e4b44eab8254%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jyotirmoy Sundi

Dec 18, 2014, 12:09:22 AM
to Terence Yim, cdap...@googlegroups.com
Hi Terence,
Thanks. Is it possible to get the link to the JIRA?
We iterate through a Trove map and write each key to the dataset, which seems likely to be causing the issue. Is it possible to do batch puts through a Dataset?
Is it advisable (or even possible) to have an async thread do the writes, so that we don't have to worry about the 30-second timeout?
--
Best Regards,
Jyotirmoy Sundi

Jyotirmoy Sundi

Dec 18, 2014, 11:25:05 AM
to Terence Yim, cdap...@googlegroups.com
Hi Terence,
               
1. Which config should be used to increase the default timeout limit of a flowlet transaction?
2. Is it possible to do batch emits using the OutputEmitter?

Thanks
Sundi

Terence Yim

Dec 18, 2014, 2:54:32 PM
to Jyotirmoy Sundi, cdap...@googlegroups.com
Hi Sundi,

You can watch for the issue at https://issues.cask.co/browse/CDAP-1067

What type of write are you doing on the Dataset? Is it an increment or just a simple put? All put updates to a Dataset from a Flowlet are batched in an in-memory buffer and are written to the backing store only upon commit.
Currently we only support transactional updates to a Dataset, meaning they have to be done in the flowlet method. However, depending on your Flow logic, you can have the Flowlet that polls from Kafka simply emit objects to the next Flowlet (using hash partitioning) and let the next Flowlet perform the writes to the Dataset, so that you can scale the second Flowlet to parallelize writes.
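
As a rough illustration of why hash partitioning helps here, the sketch below uses plain Java only. It does not use the CDAP API, and the modulo-of-hashCode rule is an assumption for the example, not necessarily the function CDAP applies internally. The names `HashPartitionSketch`, `instanceFor`, and `route` are hypothetical. The point it shows is that every object with the same key lands on the same consumer instance, while different keys are spread across instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: an assumed partitioning rule, not CDAP's implementation.
final class HashPartitionSketch {

    /** Maps a key to one of instanceCount consumer instances. */
    static int instanceFor(Object key, int instanceCount) {
        // floorMod keeps the result in [0, instanceCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), instanceCount);
    }

    /** Groups keys by the instance that would receive them. */
    static List<List<String>> route(List<String> keys, int instanceCount) {
        List<List<String>> perInstance = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            perInstance.add(new ArrayList<>());
        }
        for (String key : keys) {
            perInstance.get(instanceFor(key, instanceCount)).add(key);
        }
        return perInstance;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "user-1", "user-3");
        List<List<String>> routed = route(keys, 3);
        for (int i = 0; i < routed.size(); i++) {
            System.out.println("instance " + i + " -> " + routed.get(i));
        }
    }
}
```

Because both occurrences of a repeated key go to the same instance, each writer instance handles a disjoint set of keys, and adding instances shrinks the amount of work each transaction has to finish before the timeout.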

Terence

Terence Yim

Dec 18, 2014, 3:01:53 PM
to Jyotirmoy Sundi, cdap...@googlegroups.com
Hi Sundi,

1. The transaction timeout is controlled by a property called “data.tx.timeout”, which you can set in cdap-site.xml (the unit is seconds). For example:

    <property>
        <name>data.tx.timeout</name>
        <value>40</value>
    </property>

2. What do you mean by batch emit? You can call emit() multiple times from a flowlet method. Every emitted object is appended to an internal list, and the whole list is written to HBase in a batch on commit.
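
The buffer-then-flush behaviour described above can be pictured with a small plain-Java sketch. This is not the CDAP or Tephra implementation: `BufferedStoreSketch` and all of its methods are hypothetical, and a `HashMap` stands in for the backing store. It only shows that individual `put()` calls are cheap in-memory operations and that the store is touched once per commit.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "buffer in memory, write on commit"; not CDAP code.
final class BufferedStoreSketch {
    private final Map<String, String> backingStore = new HashMap<>(); // stand-in for the real store
    private final Map<String, String> buffer = new LinkedHashMap<>(); // pending, uncommitted writes
    private int flushCount = 0;

    /** Records a write in memory only; the backing store is not touched. */
    void put(String key, String value) {
        buffer.put(key, value);
    }

    /** Writes all buffered updates to the backing store in one batch. */
    void commit() {
        if (buffer.isEmpty()) {
            return;
        }
        backingStore.putAll(buffer);
        buffer.clear();
        flushCount++;
    }

    /** Discards buffered updates, as would happen when a transaction fails. */
    void rollback() {
        buffer.clear();
    }

    /** Returns the committed value for a key, or null if none was committed. */
    String readCommitted(String key) {
        return backingStore.get(key);
    }

    /** Number of batch writes made to the backing store so far. */
    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        BufferedStoreSketch store = new BufferedStoreSketch();
        for (int i = 0; i < 1000; i++) {
            store.put("key-" + i, "value-" + i);
        }
        store.commit();
        System.out.println("batch writes to store: " + store.flushCount());
    }
}
```

In this model a thousand puts still result in a single batch write, which is why calling put() or emit() in a loop does not by itself need an explicit batch API.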

Terence

Jyotirmoy Sundi

Dec 18, 2014, 3:06:28 PM
to Terence Yim, cdap...@googlegroups.com
1. Thanks, Terence.
2. Sorry, I was unclear. I was thinking about batch updates through a dataset with put() https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put), but since it's an in-memory buffer, it's clear now.


You received this message because you are subscribed to a topic in the Google Groups "CDAP User" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cdap-user/jq216bAag20/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cdap-user+...@googlegroups.com.

To post to this group, send email to cdap...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jyotirmoy Sundi

Dec 18, 2014, 3:10:23 PM
to Terence Yim, cdap...@googlegroups.com
Is the data.tx.timeout config mentioned on some other page? I don't see it in http://docs.cdap.io/cdap/2.5.0/en/admin.html.

Terence Yim

Dec 18, 2014, 3:19:17 PM
to Jyotirmoy Sundi, cdap...@googlegroups.com
Hi Sundi,

Looks like we missed that documentation. Can you open a JIRA at https://issues.cask.co ?

Terence

Jyotirmoy Sundi

Dec 18, 2014, 3:24:51 PM
to Terence Yim, cdap...@googlegroups.com

Priyanka Nambiar

Dec 18, 2014, 3:28:45 PM
to Jyotirmoy Sundi, Terence Yim, cdap...@googlegroups.com
Thanks Jyotirmoy! I have updated your ticket to be assigned to our documentation engineer. 

- Priyanka




--

Priyanka Nambiar
Project Manager | Cask 

Dario Gonzalez

Aug 11, 2017, 3:33:55 PM
to CDAP User
Hi,

The same happened to me. Changing the property in the XML worked for CDAP 4.2.0.

Is there a way to change this timeout value through the UI, or to change it per application?

Thanks

Ali Anwar

Aug 11, 2017, 3:48:52 PM
to cdap...@googlegroups.com
Hi Dario.

Since CDAP 4.0.0, this has been configurable on a per-app or per-program basis (rather than only system-wide): https://issues.cask.co/browse/CDAP-6103.
You can read more about how to do so in the relevant documentation:
https://docs.cask.co/cdap/4.2.0/en/developer-manual/building-blocks/transaction-system.html#controlling-the-transaction-timeout

Regards,
Ali Anwar


