[ISSUE] (TEPHRA-35) Prune invalid transaction set once all data for a given invalid transaction has been dropped


James Taylor (JIRA)

Jan 9, 2015, 8:23:47 PM
to tephr...@googlegroups.com
James Taylor commented on an issue
 
Re: Prune invalid transaction set once all data for a given invalid transaction has been dropped

This would seem to inherently limit the scalability of Tephra. Any insight on when this will be implemented?

 
Tephra / Bug TEPHRA-35
Prune invalid transaction set once all data for a given invalid transaction has been dropped
In addition to dropping the data from invalid transactions, we need to be able to prune from the invalid set any transactions whose data has been completely cleaned up. Without this, the invalid set will grow indefinitely and impose an ever-growing cost on in-progress transactions.

To do this correctly, the TransactionDataJanitor copr...
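The cost mentioned in the issue description comes from the fact that every transaction snapshot carries the invalid set, and every read must check cell versions against it. The sketch below is a simplified, hypothetical model loosely based on snapshot-isolation visibility; the `Snapshot` class and its field names are illustrative assumptions, not Tephra's actual API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified view of what each transaction is handed at start."""

    read_pointer: int                           # newest write pointer this tx may see
    invalid: FrozenSet[int]                     # write pointers of invalid transactions
    in_progress: FrozenSet[int] = frozenset()   # write pointers still running

    def is_visible(self, write_pointer: int) -> bool:
        # A cell version is visible only if it is not newer than the read pointer
        # and was not written by an invalid or still-running transaction.
        # Because the invalid set is copied into every snapshot and consulted on
        # every read, an ever-growing invalid set is paid for by every transaction.
        return (
            write_pointer <= self.read_pointer
            and write_pointer not in self.invalid
            and write_pointer not in self.in_progress
        )
```

For example, with `Snapshot(read_pointer=100, invalid=frozenset({42}), in_progress=frozenset({99}))`, a version written at 50 is visible, while versions written at 42 (invalid), 99 (in progress) and 101 (too new) are not.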
This message was sent by Atlassian JIRA (v6.1.5#6160-sha1:a61a0fc)

Alex Baranau (JIRA)

Jan 9, 2015, 10:04:47 PM
to tephr...@googlegroups.com
Alex Baranau commented on an issue

James Taylor There's no target fix version or date set for it.

Note that in a healthy system it is very rare for a transaction to become invalid; this usually happens only when a client process or the datastore crashes. Normally, even if a commit fails, the transaction is rolled back and not added to the invalid list, so it is highly unlikely for the invalid list to grow large. Though it can happen, hence the priority of fixing it is one of the highest.

James Taylor (JIRA)

Jan 9, 2015, 10:25:47 PM
to tephr...@googlegroups.com
James Taylor commented on an issue

Hmm. That's like saying transactions aren't important because usually everything works correctly. Unhealthy systems are one of the big reasons people want transactions.

Gary Helmling (JIRA)

Jan 9, 2015, 10:31:48 PM
to tephr...@googlegroups.com
Gary Helmling commented on an issue

This is definitely a scalability concern and we are working on a plan for addressing it. However, we don't yet have a target date.

There are two approaches we can take to mitigate this. The first is an operational approach, where Tephra would provide the ability for an administrator to truncate the invalid list up to a given point. The idea is that, as part of a normal operational policy for handling major compactions, an admin would know up to what time all tables in the cluster have been major compacted. Since Tephra transaction IDs are time based, you could then manually issue a command to truncate the invalid list up to this time, since you know, by virtue of the major compactions completing, that any data from invalid transactions prior to that point has been purged. This isn't ideal, as it requires some operational coordination, but it is doable.
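The operational truncation described above can be sketched as a filter over the invalid list. This is a minimal sketch under stated assumptions: the helper names and the `TX_PER_MS` factor are hypothetical, and it assumes transaction IDs encode a millisecond timestamp multiplied by a fixed factor. Check the actual ID encoding and admin tooling in your Tephra version before relying on anything like this.

```python
from typing import Iterable, Set

# Assumption: a transaction ID is a millisecond timestamp multiplied by a fixed
# factor, so IDs issued within the same millisecond stay unique. This value is
# illustrative only (hypothetical constant).
TX_PER_MS = 1_000_000


def tx_id_to_millis(tx_id: int) -> int:
    """Recover the (assumed) millisecond in which a transaction ID was issued."""
    return tx_id // TX_PER_MS


def truncate_invalid_before(invalid: Iterable[int], compacted_up_to_ms: int) -> Set[int]:
    """Return the invalid set with entries issued before compacted_up_to_ms removed.

    Hypothetical helper. Only safe if the admin has confirmed that every table
    was major compacted, with invalid-data cleanup, after all of the dropped
    transactions had already been marked invalid.
    """
    return {tx for tx in invalid if tx_id_to_millis(tx) >= compacted_up_to_ms}
```

The filter keeps every ID at or after the cutoff. An invalid list that is too long only costs performance, whereas dropping an entry too early would make uncleaned data from that invalid transaction visible again.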

The second approach would build this processing and tracking into Tephra itself, so that it could make the determination to automatically truncate the invalid list. This will require quite a bit more complexity to do the tracking, and needs a detailed design around it.

Gary Helmling (JIRA)

Jan 9, 2015, 10:35:48 PM
to tephr...@googlegroups.com
Gary Helmling commented on an issue

James Taylor I don't think anyone is saying this issue isn't important. It is high priority.

As I mentioned, there are two approaches with differing levels of complexity. We will likely implement this in those two stages.

Alex Baranau (JIRA)

Jan 10, 2015, 4:30:47 PM
to tephr...@googlegroups.com
Alex Baranau commented on an issue

> James Taylor I don't think anyone is saying this issue isn't important. It is high priority.

Yes! Exactly:

> Though it can happen, hence the priority of fixing it is one of the highest.

Gary Helmling (JIRA)

Jan 30, 2015, 3:44:50 PM
to tephr...@googlegroups.com