Dead Series management

Steven Le Roux

Feb 26, 2020, 1:37:52 PM
to Warp 10 users
(This topic is a follow-up to the GitHub issue: https://github.com/senx/warp10-platform/issues/674)

There are a few ways to manage data retention. One we have pushed in the past is to support a TTL, so that datapoints are stored with the internal HBase insert time and expire according to this TTL.

When an operator implements a TTL-based data eviction policy, a situation can occur where a series has had no new datapoints pushed during the TTL period: there are no datapoints left for that series, yet the series itself still exists.

I’ve called this the Dead Series pattern, and we've thought of different ways to address this need.


The first one would be to propagate the TTL from the token into the metadata.
The Directory would then be TTL-aware and could run a routine to garbage collect dead series that carry a TTL in their metadata structure: process a find on a selector and compare the last activity (LA) with the TTL field.
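
To illustrate the comparison such a routine would make, here is a rough Python sketch (the lastActivity and ttl field names are illustrative placeholders, not the actual Directory metadata structures):

import time

def is_dead(metadata, now_ms=None):
    # metadata is assumed to carry a lastActivity timestamp and a ttl,
    # both in milliseconds; these field names are illustrative only,
    # not the actual Directory structures.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - metadata["lastActivity"] > metadata["ttl"]

# The garbage collection routine would walk the series matched by a
# selector and drop those for which is_dead(...) returns True.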

The second one would be to add a specific egress call.
Alongside /update, /meta, /delete, /fetch and /find, there would be, for example, a /clean?ttl= endpoint, so that the TTL is not baked into the metadata structure but passed as a parameter to a dedicated method. This way the cleaning process can still be implemented inside the Directory itself, which would: scan the series like a FIND, compare the last activity with the given TTL, and delete the series directly. The problem here is that it would require querying each directory to make it happen. So I propose that this routine could be enabled on a special directory dedicated to this job, which would push a delete message on the Kafka metadata topic so that all directories consume it.
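
As a client-side sketch of what such a call could look like (assuming the hypothetical /clean endpoint followed the same conventions as the existing egress endpoints: X-Warp10-Token header, selector parameter; none of this exists today, it only illustrates the proposal):

import requests

# Hypothetical call: /clean does not exist today, this only illustrates
# the proposal. Host, token and selector are placeholders.
WARP10 = "https://warp10.example.org/api/v0"
TOKEN = "A_DELETE_TOKEN"

resp = requests.get(
    WARP10 + "/clean",
    headers={"X-Warp10-Token": TOKEN},
    params={
        "selector": "my.class.name{dc=paris}",   # series to consider
        "ttl": 7 * 24 * 3600 * 1000,             # e.g. one week, in milliseconds
    },
)
resp.raise_for_status()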


I feel that the second proposition is more efficient and less intrusive than the first one. The first one requires modifying the token and metadata structures and offers less flexibility, whereas the second would let a user run a clean process with an arbitrary TTL value (one week for example, while on the operator side it could be the TTL defined on the platform).


Also, since we rely on a TTL implemented by the LSM machinery in HBase, the series are decoupled from their datapoints, and the TTL applies to the datapoints only. This mechanism is a proposal to help customers manage the entire dataset by also covering the metadata part.

What do you think?

Mathias Herberts

Feb 28, 2020, 2:54:08 AM
to Warp 10 users
Hi,

The TTL is not linked to the GTS itself but to each datapoint pushed to it. As the TTL can be set in the token (see https://blog.senx.io/all-there-is-to-know-about-warp-10-tokens/), a single GTS can have datapoints with differing TTLs.

As of today, the purge of what you call dead series can be performed with a combination of last activity, FIND, META and DELETE (or /find, /meta, /delete) in the following way (a rough sketch follows the list below):

1) Identify the series with no activity after a cut-off timestamp (via FIND or /find)
2) Mark those GTS with a special attribute (via META or /meta)
3) Fully delete the GTS you select using the special attribute set in 2 (via DELETE or /delete)
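
A minimal Python sketch of those three steps over the HTTP endpoints (host, tokens, selector, attribute name and the exact name of the activity cut-off parameter are assumptions to be checked against your Warp 10 version):

import requests

WARP10 = "https://warp10.example.org/api/v0"
READ_TOKEN = "A_READ_TOKEN"
WRITE_TOKEN = "A_WRITE_TOKEN"   # /meta and /delete need a write token

# 1) Find series with no activity after a cut-off timestamp.
#    The quietafter parameter name is an assumption.
find = requests.get(
    WARP10 + "/find",
    headers={"X-Warp10-Token": READ_TOKEN},
    params={
        "selector": "my.class.name{dc=paris}",
        "quietafter": 1546300800000,
    },
)
find.raise_for_status()
# Assuming each non-empty line of the response is a class{labels} selector.
dead = [line for line in find.text.splitlines() if line]

# 2) Mark those GTS with a special attribute via /meta,
#    one "class{labels}{attributes}" line per GTS.
meta_body = "\n".join(gts + "{purge=true}" for gts in dead)
requests.post(
    WARP10 + "/meta",
    headers={"X-Warp10-Token": WRITE_TOKEN},
    data=meta_body,
).raise_for_status()

# 3) Fully delete everything carrying the attribute set in step 2.
requests.get(
    WARP10 + "/delete",
    headers={"X-Warp10-Token": WRITE_TOKEN},
    params={"selector": "my.class.name{purge=true}", "deleteall": ""},
).raise_for_status()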

The overall process could be made a little simpler if support for quiet after/active after were added to the /delete endpoint; so far it has been withheld intentionally to avoid accidental deletes caused by a misinterpretation of the last activity window semantics.

Mathias Herberts

Feb 28, 2020, 2:57:11 AM
to Warp 10 users
The other important point is that last activity tracks when a GTS was last updated (or had its attributes modified), but it does not tell you anything about which datapoints were written. This means a series updated 2 years ago with datapoints carrying a 1-year TTL could very well still have data in the 1-year period ending now, if the datapoints written 2 years ago had timestamps in the future and the HBase cell timestamp was set to that of the datapoints (again, see https://blog.senx.io/all-there-is-to-know-about-warp-10-tokens/ and more specifically the .dpts attribute).
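
To make that timeline concrete, a back-of-the-envelope Python sketch of the scenario (all values are made up):

# Back-of-the-envelope illustration; all values are made up.
MONTH = 30 * 24 * 3600           # seconds, roughly

now = 0                          # "today", used as the reference point
push_time = now - 24 * MONTH     # pushed 2 years ago -> last activity is 2 years old
datapoint_ts = push_time + 18 * MONTH   # datapoint timestamped 18 months after the push
ttl = 12 * MONTH                 # 1-year TTL

# With .dpts, the HBase cell timestamp is that of the datapoint,
# so the cell only expires at datapoint_ts + ttl.
cell_expiry = datapoint_ts + ttl

print(cell_expiry > now)   # True: the data stays live for another 6 months,
                           # even though the series shows no activity for 2 years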

Steven Le Roux

Mar 3, 2020, 4:23:48 AM
to Warp 10 users
I'm well aware of all this :)

A few points here:

1/ Having one GTS with a different retention per datapoint should be banned, since on the operator side it's very hard to manage. Based on our experience, we will enforce a single retention policy per user account. You can still have the same GTS with different retention policies, but spread across different accounts. This way we can have an autonomous system that cleans accounts based on the account-defined TTL.

2/ The process you're proposing is already what we do, but it doesn't work for us. It's way too slow for highly dynamic environments where you create more series than you delete. For big accounts with tens of millions of series, the FIND/META/DELETE process just doesn't keep up, hence the idea of identifying deletes in the Directory itself through the internal scanner.

3/ For the last-activity example, it's perfectly acceptable to me that batch-produced data pushed 2 years ago with a TTL of one year could have its datapoints purged. If the user wants a longer TTL, it's up to them to define it, but the TTL should also be associated with the time at which the datapoints were pushed, not with their own timestamp value. In analytics for example, if you have a forensic job and need to compute datapoints for the next 6 months, your series could have a 10-year lifetime, but in the end you know that when the job is finished, it's done, so the TTL is here to help the customer clean up its dataset.

Like you said with .dpts, since it is customer-specific, the clean process should also be customer-scoped.
The proposed solution may not be the best one, but there is currently no existing solution to this problem. Still, I'm open to any other idea that eases delete operations if you see an alternative.

Otherwise, if you agree with this approach, we can start working on a PR.

Mathias Herberts

Mar 3, 2020, 10:33:10 AM
to Warp 10 users
Can you be more specific about what "doesn't work"?

A. Hébert

Mar 6, 2020, 11:54:25 AM
to Warp 10 users
Hello, technically it's working as expected. However, for large accounts (around 50 million series), deleting empty series with this method takes time (a FIND with high cardinality is slow, 5-60s depending on the selector, and then you have to produce a META message before applying the DELETE). This also means the Directory has 3 messages to process to complete a clean of those empty series. That's why we are thinking about how to simplify it and came up with the idea of a /clean directly inside the Directory.

The idea of the "clean" endpoint is really to be able to clean series according to a selector and a TTL. As mentioned, it can remove a series that still has points; for me that's a valid trade-off, and it can even be great to delete only the META entry of an unused series and let HBase purge the datapoints once they reach their TTL.

Mathias Herberts

Mar 7, 2020, 9:11:42 AM
to Warp 10 users
Can you give some details as to the number of classes, the number of GTS per class, etc.? Also the output of FINDSTATS and the size of the arrays returned by FINDSETS. Do your FIND requests contain activeafter/quietafter specifiers?

A. Hébert

Mar 9, 2020, 6:05:45 AM
to Warp 10 users
Sorry if my answer led you to think we had a performance issue. That isn't the point of this topic; the DELETE endpoint is great for deleting DATA.
But how do we handle massive META deletes? The current process with the existing delete isn't trivial for a Warp 10 user.
The proposal of a clean endpoint allows a user to simply delete series along with their META.
For example, they can delete empty/invalid/unused series (which may still have points that will be deleted later once they expire with a TTL).
Using the TTL on such an entry point can be considered optional here; it acts more as a safeguard to ensure that no recent series are deleted.

A. Hébert

Mar 9, 2020, 2:08:54 PM
to Warp 10 users
Hello, I just opened a first PR on this subject: https://github.com/senx/warp10-platform/pull/687. Let me know your thoughts about it.

A. Hébert

Apr 6, 2020, 9:46:14 AM
to Warp 10 users
We tested and are currently running the new delete version, and it's working great: we are able to delete time series based on last activity. However, I didn't find any way to send several selectors to the delete endpoint. Is that possible?

Mathias Herberts

Apr 10, 2020, 3:33:23 AM
to Warp 10 users
It is not possible to specify multiple selectors in a /delete or DELETE request. The reason is that those multiple selectors could match the same GTS multiple times, and as no deduplication is done on the Directory side, multiple delete messages could be sent for the same GTS, which would not be efficient.

The selector can contain regular expressions for both the class and the labels, so you can use this if what you are trying to match can be expressed as a regexp, or you can issue multiple /delete or DELETE requests if not.
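
For instance, a rough Python sketch of both options (host, token and selectors are placeholders; the ~ prefix denotes a regular expression in Warp 10 class and label selectors):

import requests

WARP10 = "https://warp10.example.org/api/v0"
TOKEN = "A_WRITE_TOKEN"

def delete_all(selector):
    # One /delete request per selector; placeholders throughout.
    r = requests.get(
        WARP10 + "/delete",
        headers={"X-Warp10-Token": TOKEN},
        params={"selector": selector, "deleteall": ""},
    )
    r.raise_for_status()

# Option 1: a single selector using regexps for the class and a label value.
delete_all("~sensor\\.(temp|hum)\\..*{dc~paris.*}")

# Option 2: several plain selectors, one request each.
for sel in ["sensor.temp.room1{}", "sensor.hum.room1{}"]:
    delete_all(sel)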