[ISSUE] (TEPHRA-169) Transactionally deleted cells become visible after few hours

Poorna Chandra (JIRA)

unread,

Feb 5, 2016, 2:32:21 PM2/5/16

to tephr...@googlegroups.com

Poorna Chandra created an issue

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

Issue Type:	Bug
Affects Versions:	0.6.4
Assignee:	Poorna Chandra
Components:	tephra-hbase-compat-1.0-cdh, tephra-hbase-compat-1.1
Created:	05/Feb/16 11:31 AM
Fix Versions:	0.6.5
Priority:	Blocker
Reporter:	Poorna Chandra

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress transactions. Transaction co-processors filter out in-progress and deleted cells during scans and other read operations. To do this the co-processor sets max versions to INT_MAX for read operations. While going through all versions of a cell, when the co-processor sees that a particular version satisfies transaction visibility, it returns NEXT_COL (for deleted cells) or INCLUDE_AND_NEXT_COL (for non-deleted cells).

However the optimize method is changing the returned value to a SKIP instead, based on next index key (We're completely not sure of the optimize logic yet). This is causing the delete marker to be skipped, and then an older version of the same cell to be passed to the transaction co-processor. Since the older version also passes the visibility check for the transaction, the older version gets a INCLUDE_AND_NEXT_COL. Thus deletes are becoming visible again.

We should safeguard against such issues in the transaction co-processor by remembering the last skipped column, and then returning SKIP to all further versions of the same cell. But this will mean that we would process more versions of a cell than necessary, and is not optimal. But, it is good to have this safeguard nevertheless.

We'll follow up with an HBase ticket.

Add Comment

This message was sent by Atlassian JIRA (v6.1.5#6160-sha1:a61a0fc)

Poorna Chandra (JIRA)

unread,

Feb 5, 2016, 2:36:21 PM2/5/16

to tephr...@googlegroups.com

Poorna Chandra updated an issue

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

Change By:	Poorna Chandra
Assignee:	Poorna Chandra Sagar Kapare

Add Comment

Terence Yim (JIRA)

unread,

Feb 7, 2016, 7:09:22 PM2/7/16

to tephr...@googlegroups.com

Terence Yim commented on an issue

Re: Transactionally deleted cells become visible after few hours

The code path introduced by HBASE-13109 was there since HBase 1.0, so this mean it affects CDH 5.4 - 5.5 and HDP 2.3 as well, right?

Add Comment

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress...

Terence Yim (JIRA)

unread,

Feb 7, 2016, 7:39:22 PM2/7/16

to tephr...@googlegroups.com

Terence Yim commented on an issue

Re: Transactionally deleted cells become visible after few hours

I think a better fix is to first return INCLUDE followed by SEEK_NEXT_USING_HINT or NEXT_COL depending on Scan/Get, because in those cases, the optimization in HBase-13109 won't kick in.

Add Comment

Poorna Chandra (JIRA)

unread,

Feb 8, 2016, 1:54:22 AM2/8/16

to tephr...@googlegroups.com

Poorna Chandra commented on an issue

Re: Transactionally deleted cells become visible after few hours

Terence Yim I verified that CDH 5.4.9 does not have optimization introduced in HBASE-13109. It was only added from CDH 5.5.0.

Sagar Kapare said he is able to see this issue in HDP 2.3 as well.

Add Comment

Terence Yim (JIRA)

unread,

Feb 8, 2016, 1:55:23 PM2/8/16

to tephr...@googlegroups.com

Terence Yim commented on an issue

Re: Transactionally deleted cells become visible after few hours

Probably the "HBase 1.0" from CDH 5.4 doesn't backport all changes in HBase 1.0 branch.

Add Comment

Terence Yim (JIRA)

unread,

Feb 8, 2016, 1:55:25 PM2/8/16

to tephr...@googlegroups.com

Terence Yim commented on an issue

Re: Transactionally deleted cells become visible after few hours

BTW, I think we should use the SEEK_NEXT_USING_HINT if possible, since we know what columns / rows to fetch, hence using the SEEK_NEXT_USING_HINT might be better than always returning NEXT_COL.

Add Comment

Poorna Chandra (JIRA)

unread,

Feb 8, 2016, 3:16:27 PM2/8/16

to tephr...@googlegroups.com

Poorna Chandra commented on an issue

Re: Transactionally deleted cells become visible after few hours

I agree using SEEK_NEXT_USING_HINT would be better than returning NEXT_COL if the columns are known.

However, TransactionVisibilityFilter is used also by Data Janitor to during compactions and flushes. Changing TransactionVisibilityFilter to use SEEK_NEXT_USING_HINT will need more code changes. As we discussed offline, lets go with returning NEXT_COL for now. I've filed TEPHRA-170 for looking into returning SEEK_NEXT_USING_HINT.

Add Comment

Poorna Chandra (JIRA)

unread,

Feb 10, 2016, 2:37:22 PM2/10/16

to tephr...@googlegroups.com

Poorna Chandra commented on an issue

Re: Transactionally deleted cells become visible after few hours

PR - https://github.com/caskdata/tephra/pull/111

Add Comment

Poorna Chandra (JIRA)

unread,

Feb 10, 2016, 2:39:21 PM2/10/16

to tephr...@googlegroups.com

Poorna Chandra resolved an issue as Fixed

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

Change By:	Poorna Chandra
Status:	Open Resolved
Resolution:	Fixed

Add Comment

James Taylor (JIRA)

unread,

Feb 10, 2016, 4:39:23 PM2/10/16

to tephr...@googlegroups.com

James Taylor commented on an issue

Re: Transactionally deleted cells become visible after few hours

We're seeing the following test failures in Phoenix with this patch applied:

                                                    Running org.apache.phoenix.end2end.index.txn.MutableRollbackIT
Tests run: 6, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 90.77 sec <<< FAILURE! - in org.apache.phoenix.end2end.index.txn.MutableRollbackIT
testCheckpointAndRollback[localIndex = false](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 13.225 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<a[]> but was:<a[a]>
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testCheckpointAndRollback(MutableRollbackIT.java:504)

testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate[localIndex = false](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 0.306 sec  <<< FAILURE!
java.lang.AssertionError
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate(MutableRollbackIT.java:405)

testCheckpointAndRollback[localIndex = true](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 6.686 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<a[]> but was:<a[a]>
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testCheckpointAndRollback(MutableRollbackIT.java:504)

testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate[localIndex = true](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 0.248 sec  <<< FAILURE!
java.lang.AssertionError
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate(MutableRollbackIT.java:405)


Results :

Failed tests: 
  MutableRollbackIT.testCheckpointAndRollback:504 expected:<a[]> but was:<a[a]>
  MutableRollbackIT.testCheckpointAndRollback:504 expected:<a[]> but was:<a[a]>
  MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate:405
  MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate:405

Tests run: 6, Failures: 4, Errors: 0, Skipped: 0

                                                

Add Comment

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress...

Poorna Chandra (JIRA)

unread,

Feb 10, 2016, 5:00:23 PM2/10/16

to tephr...@googlegroups.com

Poorna Chandra reopened an issue

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

Change By:	Poorna Chandra
Resolution:	Fixed
Status:	Resolved Reopened

Add Comment

Poorna Chandra (JIRA)

unread,

Feb 10, 2016, 5:00:25 PM2/10/16

to tephr...@googlegroups.com

Poorna Chandra commented on an issue

Re: Transactionally deleted cells become visible after few hours

Thanks for the update James Taylor, we'll look into this.

Add Comment

Tephra /

TEPHRA-169

Transactionally deleted cells become visible after few hours

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress...

Poorna Chandra (JIRA)

unread,

Feb 11, 2016, 3:44:22 AM2/11/16

to tephr...@googlegroups.com

Poorna Chandra commented on an issue

Re: Transactionally deleted cells become visible after few hours

James Taylor From the above test cases it looks like the rollback of cells in data table is happening correctly. However, the rollback of cells in index table is not happening correctly. Can you briefly explain how the rollback of cells in data table triggers rollback of corresponding cells in index table? It would be helpful if you can include a link to the source where this rollback is done.

The only change PR 111 does is to make sure that older versions of a cell are skipped after transaction visibility filter returns INCLUDE_AND_NEXT_COL or NEXT_COL for a cell. Does a rollback of index table run a scan to fetch versions that need to be deleted?

Add Comment

James Taylor (JIRA)

unread,

Feb 11, 2016, 11:27:23 AM2/11/16

to tephr...@googlegroups.com

James Taylor commented on an issue

Re: Transactionally deleted cells become visible after few hours

See PhoenixTransactionalIndexer.processRollback() which is triggered by the coprocessor when the data table rows are deleted. We use tx.setVisibility(VisibilityLevel.SNAPSHOT_ALL) to see all cell versions that occurred during the commit. Maybe the change in behavior is with VisibilityLevel.SNAPSHOT_ALL?

Add Comment

James Taylor (JIRA)

unread,

Feb 11, 2016, 11:30:22 AM2/11/16

to tephr...@googlegroups.com

James Taylor edited a comment on an issue

Re: Transactionally deleted cells become visible after few hours

See PhoenixTransactionalIndexer.processRollback() which is triggered by the coprocessor when the data table rows are deleted. We use {{tx.setVisibility(VisibilityLevel.SNAPSHOT_ALL)}} to see all cell versions that occurred during the commit. We order the versions in timestamp batches, starting with the earliest and walk through them to generate the corresponding delete for the index row. Maybe the change in behavior is with {{VisibilityLevel.SNAPSHOT_ALL}}?

Add Comment

Poorna Chandra (JIRA)

unread,

Feb 11, 2016, 11:32:23 PM2/11/16

to tephr...@googlegroups.com

Poorna Chandra commented on an issue

Re: Transactionally deleted cells become visible after few hours

James Taylor PhoenixTransactionalIndexer.processRollback() adds SkipScanFilter while scanning for all snapshots. I added some debug logging statements, and this is what I see for scan during rollback with VisibilityLevel.SNAPSHOT_ALL -

TransactionVisibilityFilter's return code for snapshot cell is INCLUDE
TransactionVisibilityFilter's sub-filter SkipScanFilter changes this return code into INCLUDE_AND_NEXT_COL
CellSkipFilter (the new filter added recently) sees INCLUDE_AND_NEXT_COL for a cell, and makes sure earlier versions of the same cell are skipped.

When I changed SkipScanFilter to return INCLUDE instead of INCLUDE_AND_NEXT_COL, the tests pass.

If the SkipScanFilter needs to see additional versions for the same cell then it should return INCLUDE, right? Any reason why it returns INCLUDE_AND_NEXT_COL instead?