Load rule doesn't honor intervals properly


pala.m...@airbnb.com

Mar 19, 2018, 1:54:17 PM
to Druid Development
Hi,

In our deployment, we enabled background segment merging and found that some of the data within the load period was actually getting dropped. 

My suspicion is that when a merged segment only partially overlaps the load period (e.g., the rule says to keep data from Jan 1st onwards, and I have a segment that spans Dec 25th - Jan 2nd), the segment should be kept for correctness, but the current implementation seems to drop it.

I checked the code and found that Rules.eligibleForLoad() indeed only keeps segments that fall entirely within the rule's interval.
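
To make the difference concrete, here is a minimal sketch of the two checks. It is not the actual Druid code: the Interval class, the method names, and the rule's end date are hypothetical stand-ins chosen for illustration, and intervals are assumed to be half-open [start, end).

```java
import java.time.LocalDate;

public class LoadRuleOverlap {

    /** Hypothetical half-open date interval [start, end); a stand-in, not Druid's interval type. */
    public static final class Interval {
        final LocalDate start;
        final LocalDate end;

        public Interval(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        /** True only when {@code other} lies entirely inside this interval. */
        boolean contains(Interval other) {
            return !other.start.isBefore(start) && !other.end.isAfter(end);
        }

        /** True when the two intervals share at least part of their range. */
        boolean overlaps(Interval other) {
            return start.isBefore(other.end) && other.start.isBefore(end);
        }
    }

    /** Strict check: the segment must sit fully inside the rule interval. */
    public static boolean eligibleByContainment(Interval rule, Interval segment) {
        return rule.contains(segment);
    }

    /** Relaxed check: any partial overlap with the rule interval is enough. */
    public static boolean eligibleByOverlap(Interval rule, Interval segment) {
        return rule.overlaps(segment);
    }

    public static void main(String[] args) {
        // Rule: keep data from Jan 1st onwards (the end date is an arbitrary assumption).
        Interval rule = new Interval(LocalDate.of(2018, 1, 1), LocalDate.of(2018, 4, 1));
        // Merged segment that straddles the start of the load period.
        Interval merged = new Interval(LocalDate.of(2017, 12, 25), LocalDate.of(2018, 1, 2));

        System.out.println("containment: " + eligibleByContainment(rule, merged));
        System.out.println("overlap:     " + eligibleByOverlap(rule, merged));
    }
}
```

With these inputs, the containment check rejects the Dec 25th - Jan 2nd segment, while the overlap check accepts it.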

Is this a bug, or is there some other reason behind this? In our case, we have data sources that are highly aggregated, so a single segment could span a month, for example.

I can submit a patch but wanted to get proper context.


Thanks,
pala

Pala Muthiah

Mar 29, 2018, 8:43:52 PM
to druid-de...@googlegroups.com
Hello folks,

Does anybody have insight on my earlier question? I'm curious whether there would be unforeseen side effects if we counted even a partial overlap as valid.

--
You received this message because you are subscribed to a topic in the Google Groups "Druid Development" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-development/QYMhjGup2RI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/d8e3df1d-699e-4747-a681-ebba91327388%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gian Merlino

Mar 30, 2018, 5:12:37 PM
to druid-de...@googlegroups.com, d...@druid.incubator.apache.org
Hi Pala,

That sounds like a bug to me - a patch would be welcome!

Btw, since we are trying to migrate the dev mailing list to Apache, please cross-post this sort of thing to d...@druid.incubator.apache.org, or even post only to that list.

Gian

Pala Muthiah

Apr 8, 2018, 11:04:40 PM
to d...@druid.incubator.apache.org, druid-de...@googlegroups.com
Hi Gian,

Thanks for following up. I have submitted a patch: https://github.com/druid-io/druid/pull/5595.

Whoever the right owner is, please take a look. Let me know if I should @-mention a specific person, and I will do that.


Thanks,
pala
Gian Merlino

Apr 9, 2018, 1:38:56 PM
to druid-de...@googlegroups.com, d...@druid.incubator.apache.org
Looks like it is under review right now. Thanks for the patch.

Gian
