[ISSUE] (TEPHRA-169) Transactionally deleted cells become visible after few hours

8 views
Skip to first unread message

Poorna Chandra (JIRA)

unread,
Feb 5, 2016, 2:32:21 PM2/5/16
to tephr...@googlegroups.com
Poorna Chandra created an issue
 
Tephra / Bug TEPHRA-169
Transactionally deleted cells become visible after few hours
Issue Type: Bug Bug
Affects Versions: 0.6.4
Assignee: Poorna Chandra
Components: tephra-hbase-compat-1.0-cdh, tephra-hbase-compat-1.1
Created: 05/Feb/16 11:31 AM
Fix Versions: 0.6.5
Priority: Blocker Blocker
Reporter: Poorna Chandra

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress transactions. Transaction co-processors filter out in-progress and deleted cells during scans and other read operations. To do this the co-processor sets max versions to INT_MAX for read operations. While going through all versions of a cell, when the co-processor sees that a particular version satisfies transaction visibility, it returns NEXT_COL (for deleted cells) or INCLUDE_AND_NEXT_COL (for non-deleted cells).

However the optimize method is changing the returned value to a SKIP instead, based on next index key (We're completely not sure of the optimize logic yet). This is causing the delete marker to be skipped, and then an older version of the same cell to be passed to the transaction co-processor. Since the older version also passes the visibility check for the transaction, the older version gets a INCLUDE_AND_NEXT_COL. Thus deletes are becoming visible again.

We should safeguard against such issues in the transaction co-processor by remembering the last skipped column, and then returning SKIP to all further versions of the same cell. But this will mean that we would process more versions of a cell than necessary, and is not optimal. But, it is good to have this safeguard nevertheless.

We'll follow up with an HBase ticket.

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v6.1.5#6160-sha1:a61a0fc)
Atlassian logo

Poorna Chandra (JIRA)

unread,
Feb 5, 2016, 2:36:21 PM2/5/16
to tephr...@googlegroups.com
Poorna Chandra updated an issue
Change By: Poorna Chandra
Assignee: Poorna Chandra Sagar Kapare

Terence Yim (JIRA)

unread,
Feb 7, 2016, 7:09:22 PM2/7/16
to tephr...@googlegroups.com
Terence Yim commented on an issue
 
Re: Transactionally deleted cells become visible after few hours

The code path introduced by HBASE-13109 was there since HBase 1.0, so this mean it affects CDH 5.4 - 5.5 and HDP 2.3 as well, right?

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress...

Terence Yim (JIRA)

unread,
Feb 7, 2016, 7:39:22 PM2/7/16
to tephr...@googlegroups.com
Terence Yim commented on an issue

I think a better fix is to first return INCLUDE followed by SEEK_NEXT_USING_HINT or NEXT_COL depending on Scan/Get, because in those cases, the optimization in HBase-13109 won't kick in.

Poorna Chandra (JIRA)

unread,
Feb 8, 2016, 1:54:22 AM2/8/16
to tephr...@googlegroups.com
Poorna Chandra commented on an issue

Terence Yim I verified that CDH 5.4.9 does not have optimization introduced in HBASE-13109. It was only added from CDH 5.5.0.

Sagar Kapare said he is able to see this issue in HDP 2.3 as well.

Terence Yim (JIRA)

unread,
Feb 8, 2016, 1:55:23 PM2/8/16
to tephr...@googlegroups.com
Terence Yim commented on an issue

Probably the "HBase 1.0" from CDH 5.4 doesn't backport all changes in HBase 1.0 branch.

Terence Yim (JIRA)

unread,
Feb 8, 2016, 1:55:25 PM2/8/16
to tephr...@googlegroups.com
Terence Yim commented on an issue

BTW, I think we should use the SEEK_NEXT_USING_HINT if possible, since we know what columns / rows to fetch, hence using the SEEK_NEXT_USING_HINT might be better than always returning NEXT_COL.

Poorna Chandra (JIRA)

unread,
Feb 8, 2016, 3:16:27 PM2/8/16
to tephr...@googlegroups.com
Poorna Chandra commented on an issue

I agree using SEEK_NEXT_USING_HINT would be better than returning NEXT_COL if the columns are known.

However, TransactionVisibilityFilter is used also by Data Janitor to during compactions and flushes. Changing TransactionVisibilityFilter to use SEEK_NEXT_USING_HINT will need more code changes. As we discussed offline, lets go with returning NEXT_COL for now. I've filed TEPHRA-170 for looking into returning SEEK_NEXT_USING_HINT.

Poorna Chandra (JIRA)

unread,
Feb 10, 2016, 2:37:22 PM2/10/16
to tephr...@googlegroups.com

Poorna Chandra (JIRA)

unread,
Feb 10, 2016, 2:39:21 PM2/10/16
to tephr...@googlegroups.com
Poorna Chandra resolved an issue as Fixed
Change By: Poorna Chandra
Status: Open Resolved
Resolution: Fixed

James Taylor (JIRA)

unread,
Feb 10, 2016, 4:39:23 PM2/10/16
to tephr...@googlegroups.com
James Taylor commented on an issue
 
Re: Transactionally deleted cells become visible after few hours

We're seeing the following test failures in Phoenix with this patch applied:

Running org.apache.phoenix.end2end.index.txn.MutableRollbackIT
Tests run: 6, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 90.77 sec <<< FAILURE! - in org.apache.phoenix.end2end.index.txn.MutableRollbackIT
testCheckpointAndRollback[localIndex = false](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 13.225 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<a[]> but was:<a[a]>
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testCheckpointAndRollback(MutableRollbackIT.java:504)

testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate[localIndex = false](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 0.306 sec  <<< FAILURE!
java.lang.AssertionError
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate(MutableRollbackIT.java:405)

testCheckpointAndRollback[localIndex = true](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 6.686 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<a[]> but was:<a[a]>
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testCheckpointAndRollback(MutableRollbackIT.java:504)

testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate[localIndex = true](org.apache.phoenix.end2end.index.txn.MutableRollbackIT)  Time elapsed: 0.248 sec  <<< FAILURE!
java.lang.AssertionError
	at org.apache.phoenix.end2end.index.txn.MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate(MutableRollbackIT.java:405)


Results :

Failed tests: 
  MutableRollbackIT.testCheckpointAndRollback:504 expected:<a[]> but was:<a[a]>
  MutableRollbackIT.testCheckpointAndRollback:504 expected:<a[]> but was:<a[a]>
  MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate:405
  MutableRollbackIT.testMultiRollbackOfUncommittedExistingRowKeyIndexUpdate:405

Tests run: 6, Failures: 4, Errors: 0, Skipped: 0
While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress...

Poorna Chandra (JIRA)

unread,
Feb 10, 2016, 5:00:23 PM2/10/16
to tephr...@googlegroups.com
Poorna Chandra reopened an issue
Change By: Poorna Chandra
Resolution: Fixed
Status: Resolved Reopened

Poorna Chandra (JIRA)

unread,
Feb 10, 2016, 5:00:25 PM2/10/16
to tephr...@googlegroups.com
Poorna Chandra commented on an issue
 
Re: Transactionally deleted cells become visible after few hours

Thanks for the update James Taylor, we'll look into this.

While we were testing Tephra with CDH 5.5.1, we noticed that cells deleted transactionally were becoming visible to clients after a few hours. We traced the cause for this issue to be the StoreScanner.optimize() method added in HBASE-13109 to change SEEKs to SKIPs in certain cases.

Tehpra maintains multiple versions of a cell to account for in-progress...

Poorna Chandra (JIRA)

unread,
Feb 11, 2016, 3:44:22 AM2/11/16
to tephr...@googlegroups.com
Poorna Chandra commented on an issue

James Taylor From the above test cases it looks like the rollback of cells in data table is happening correctly. However, the rollback of cells in index table is not happening correctly. Can you briefly explain how the rollback of cells in data table triggers rollback of corresponding cells in index table? It would be helpful if you can include a link to the source where this rollback is done.

The only change PR 111 does is to make sure that older versions of a cell are skipped after transaction visibility filter returns INCLUDE_AND_NEXT_COL or NEXT_COL for a cell. Does a rollback of index table run a scan to fetch versions that need to be deleted?

James Taylor (JIRA)

unread,
Feb 11, 2016, 11:27:23 AM2/11/16
to tephr...@googlegroups.com
James Taylor commented on an issue

See PhoenixTransactionalIndexer.processRollback() which is triggered by the coprocessor when the data table rows are deleted. We use tx.setVisibility(VisibilityLevel.SNAPSHOT_ALL) to see all cell versions that occurred during the commit. Maybe the change in behavior is with VisibilityLevel.SNAPSHOT_ALL?

James Taylor (JIRA)

unread,
Feb 11, 2016, 11:30:22 AM2/11/16
to tephr...@googlegroups.com
James Taylor edited a comment on an issue
See PhoenixTransactionalIndexer.processRollback() which is triggered by the coprocessor when the data table rows are deleted. We use {{tx.setVisibility(VisibilityLevel.SNAPSHOT_ALL)}} to see all cell versions that occurred during the commit.  We order the versions in timestamp batches, starting with the earliest and walk through them to generate the corresponding delete for the index row.  Maybe the change in behavior is with {{VisibilityLevel.SNAPSHOT_ALL}}?

Poorna Chandra (JIRA)

unread,
Feb 11, 2016, 11:32:23 PM2/11/16
to tephr...@googlegroups.com
Poorna Chandra commented on an issue

James Taylor PhoenixTransactionalIndexer.processRollback() adds SkipScanFilter while scanning for all snapshots. I added some debug logging statements, and this is what I see for scan during rollback with VisibilityLevel.SNAPSHOT_ALL -

  • TransactionVisibilityFilter's return code for snapshot cell is INCLUDE
  • TransactionVisibilityFilter's sub-filter SkipScanFilter changes this return code into INCLUDE_AND_NEXT_COL
  • CellSkipFilter (the new filter added recently) sees INCLUDE_AND_NEXT_COL for a cell, and makes sure earlier versions of the same cell are skipped.

When I changed SkipScanFilter to return INCLUDE instead of INCLUDE_AND_NEXT_COL, the tests pass.

If the SkipScanFilter needs to see additional versions for the same cell then it should return INCLUDE, right? Any reason why it returns INCLUDE_AND_NEXT_COL instead?

James Taylor (JIRA)

unread,
Feb 12, 2016, 1:46:21 AM2/12/16
to tephr...@googlegroups.com
James Taylor commented on an issue

Thanks so much, Poorna Chandra - good find. I agree - that's the issue. I'll fix it on our end.

Poorna Chandra (JIRA)

unread,
Feb 12, 2016, 2:35:25 PM2/12/16
to tephr...@googlegroups.com

Poorna Chandra (JIRA)

unread,
Feb 12, 2016, 2:35:26 PM2/12/16
to tephr...@googlegroups.com
Poorna Chandra resolved an issue as Fixed
Change By: Poorna Chandra
Status: Reopened Resolved
Resolution: Fixed
Reply all
Reply to author
Forward
0 new messages