Tephra has its own column family attribute to specify TTL. Applying the TTL is not the issue. After this PR, Tephra will always apply the right TTL to both pre-existing cells and transactional cells. The co-processor determines how to apply the TTL by looking at a cell's timestamp. If the timestamp is in the range of the current time in milliseconds, the cell is treated as a pre-existing cell. If not, the cell is transactional.
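To make the timestamp check concrete, here is a rough sketch of how a cell could be classified and its TTL applied. The class and method names are hypothetical, and the scaling factor and 10% headroom are assumptions for illustration (transactional timestamps are taken to be wall-clock millis multiplied by a large constant); this is not the actual co-processor code.

```java
public final class CellTimestampSketch {
    // Assumption: a transactional timestamp is wall-clock millis times this factor.
    static final long MAX_TX_PER_MS = 1_000_000L;

    // A timestamp is treated as pre-existing if it falls in the plain
    // "current time millis" range. The 10% headroom is an assumed allowance.
    static boolean isPreExisting(long cellTimestamp, long nowMillis) {
        return cellTimestamp < nowMillis + nowMillis / 10;
    }

    // Apply the TTL in the unit that matches the kind of cell.
    static boolean isExpired(long cellTimestamp, long ttlMillis, long nowMillis) {
        long oldestAllowedMillis = nowMillis - ttlMillis;
        if (isPreExisting(cellTimestamp, nowMillis)) {
            // Pre-existing cell: compare directly in milliseconds.
            return cellTimestamp < oldestAllowedMillis;
        }
        // Transactional cell: scale the cutoff into the transactional range.
        return cellTimestamp < oldestAllowedMillis * MAX_TX_PER_MS;
    }
}
```

Because the two kinds of timestamps differ by several orders of magnitude, a single comparison against the cell's own timestamp is enough to pick the right TTL cutoff.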
The issue is with user scans. In user scans, we apply an extra filter to exclude expired cells. Since the timestamps of pre-existing cells are in a much smaller range than those of transactional cells, any change to this filter to include pre-existing cells' timestamps will render the filter ineffective for purely transactional tables. Also, since this filter is applied before a cell reaches the co-processor, we cannot adjust it based on the cell's time range like we do for TTL.
Having a table attribute like "data.tx.read.pre.existing=true" will allow us to determine whether or not to add a scan filter that allows pre-existing cells. Not having to read expired cells while doing user scans is the performance optimization I was talking about.
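As a rough illustration of the idea, the scan's lower time bound could be chosen from the table attribute along these lines. The class, method, and constant names are hypothetical, and the scaling factor for transactional timestamps is an assumption; only the attribute key comes from the proposal above.

```java
import java.util.Map;

public final class ScanLowerBoundSketch {
    // Assumption: a transactional timestamp is wall-clock millis times this factor.
    static final long MAX_TX_PER_MS = 1_000_000L;
    static final String READ_PRE_EXISTING = "data.tx.read.pre.existing";

    // Returns the smallest cell timestamp a user scan should read.
    static long oldestVisibleTimestamp(Map<String, String> tableAttributes,
                                       long ttlMillis, long nowMillis) {
        boolean readPreExisting = Boolean.parseBoolean(
            tableAttributes.getOrDefault(READ_PRE_EXISTING, "false"));
        long cutoffMillis = nowMillis - ttlMillis;
        if (readPreExisting) {
            // The bound must stay in the millis range so pre-existing cells pass.
            // It is far below every transactional timestamp, so it cannot skip
            // expired transactional cells.
            return cutoffMillis;
        }
        // Purely transactional table: a tight bound that skips expired cells.
        return cutoffMillis * MAX_TX_PER_MS;
    }
}
```

With the attribute unset, purely transactional tables keep the tight lower bound, so expired cells are never read during the scan.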