How to invalidate historical node cache after certain time?


Dayananda MR

Sep 3, 2014, 12:23:51 PM
to druid-de...@googlegroups.com
Hi,

I have a use case in which I need to invalidate the historical node cache after a certain time.

Currently, when a historical node notices a new entry in its load queue path, it first checks a local disk directory (cache) for information about the segment. If no information about the segment exists in the cache, the historical node downloads metadata about the new segment to serve from ZooKeeper.

But I don't want the historical node to use the local disk directory (cache) if that segment is 90 days old; instead it should download the metadata and continue with the further steps. How do I do this? Also, how do I flush invalidated (more than 90 days old) data from the cache?

Appreciate any help

Thanks
Daya

Xavier Léauté

Sep 3, 2014, 12:44:00 PM
to druid-de...@googlegroups.com
Hi Daya, can you describe your use-case a little bit more? It's not clear to me in which kind of situations you would want to do that.


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/63ff2fd9-a2e2-4196-8131-23726a3a2643%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dayananda MR

Sep 3, 2014, 1:18:52 PM
to druid-de...@googlegroups.com
Xavier, thank you for your response.

My use case is: people query different kinds of segments. The possibility of querying the same segment after 90 days is nil, but that segment will still be a burden on historical node (HN) cache space. I don't want to keep that data forever, and I want to clear that space.

Thanks
Daya

Xavier Léauté

Sep 3, 2014, 1:53:27 PM
to druid-de...@googlegroups.com
You can configure the coordinator to tell historical nodes to drop data after 90 days, this will remove the data from the cluster and free up space.


Dayananda MR

Sep 3, 2014, 3:04:50 PM
to druid-de...@googlegroups.com
Thank you again. My understanding is that we can configure a date interval to drop segments. But is there a way to associate a validity period with each segment loaded to the HN? (Kind of a queue.)

Fangjin Yang

Sep 4, 2014, 12:44:10 PM
to druid-de...@googlegroups.com
Hi Dayananda, can you clarify what you mean by associating the validity of each segment loaded to the HN?

Dayananda MR

Sep 11, 2014, 3:09:43 PM
to druid-de...@googlegroups.com
We have different data sets in historical nodes, for example Network, Region, and Property level. I would like to set a different validity (e.g. 90 days, 100 days, 120 days) for each of these data sets. As soon as a data set completes its respective number of days, I expect that data set to be flushed out of the HN. It would be helpful if you could explain how to do this. - Thanks

Nishant Bangarwa

Sep 12, 2014, 12:25:27 PM
to druid-de...@googlegroups.com
Hi Dayananda,

One way to achieve this is to ingest your different datasets as separate dataSources in Druid and configure separate loadRules for each dataSource on the coordinator node.
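For example, each dataSource could get its own retention rule chain on the coordinator. A sketch of what that might look like, assuming hypothetical dataSource names matching the data sets above (each rule list is configured separately per dataSource; the outer map is just for illustration):

```json
{
  "network":  [ { "type" : "loadByPeriod", "period" : "P90D" },
                { "type" : "dropForever" } ],
  "region":   [ { "type" : "loadByPeriod", "period" : "P100D" },
                { "type" : "dropForever" } ],
  "property": [ { "type" : "loadByPeriod", "period" : "P120D" },
                { "type" : "dropForever" } ]
}
```

The periods are ISO-8601 durations, so each dataSource keeps only its most recent 90/100/120 days loaded on historical nodes.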




Dayananda MR

Sep 23, 2014, 10:50:56 AM
to druid-de...@googlegroups.com
Hi Nishant,

In http://druid.io/docs/latest/Rule-Configuration.html, for "dropByPeriod" it states that

"The interval of a segment will be compared against the specified period. The period is from some time in the past to the current time. The rule matches if the period contains the interval"

If you consider the below example:

Current time is: 30th September
dataSource segment interval is: 15th/16th Sept

If I consider the rule { "type" : "dropByPeriod", "period" : "P1M" }

then the period will be 30th Aug - 30th Sept.

The segment interval falls within the period mentioned above, so the segment will be dropped. But the segment is not one month old yet.

My question is,

How do I drop only the segments that are more than N months old?

Thanks
Daya

Fangjin Yang

Sep 23, 2014, 2:33:28 PM
to druid-de...@googlegroups.com
Dayananda: set your rules in this order:

loadByPeriod: P1M
dropForever

All segments in the most recent month are loaded; all other segments are dropped.
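As a JSON rule list for a dataSource (a minimal sketch; rules are evaluated top to bottom, so the first match wins and segments older than one month fall through to the drop rule):

```json
[
  { "type" : "loadByPeriod", "period" : "P1M" },
  { "type" : "dropForever" }
]
```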

Krzysztof Zarzycki

Jul 6, 2015, 9:19:07 AM
to druid-de...@googlegroups.com
Hi there, 
I was looking for information on how to drop segments from the cache on historical nodes and I noticed this post. But as I understand it, your answers describe how to drop segments from the cluster *including* deep storage, via coordinator rules. What if I would like to drop segments only from the historical cache, leaving them in deep storage, and not load them until they are really needed again, either automatically or manually? (That means they are pretty cold.)
Is it possible with Druid? If not, it would mean for me that deep storage is only for backing up the cache or for coordination, not for larger-than-cache storage.

Disabling the segment might be the manual way to go, am I right?

Please tell me if I'm wrong,
Thanks!
Krzysztof

Nishant Bangarwa

Jul 6, 2015, 10:09:08 AM
to druid-de...@googlegroups.com
Hi Krzysztof,
When you modify rules to drop a segment, it means the segment will no longer be loaded on the historical nodes, but it will still remain in deep storage and can be loaded again by a historical node if needed.
Druid never deletes segments from deep storage unless it is explicitly told to do so, via a kill task or by invoking coordinator endpoints to delete the segment.
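For reference, a kill task (which permanently removes unused segments from deep storage) is submitted to the indexing service with a spec along these lines; the dataSource name and interval here are purely illustrative:

```json
{
  "type" : "kill",
  "dataSource" : "network",
  "interval" : "2014-01-01/2014-04-01"
}
```

It only affects segments in that interval that have already been disabled/dropped, so it is an explicit, separate step from the coordinator drop rules.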


