Compaction with filter creates empty segment, but won't overshadow original segment


Daniel Nash

Mar 28, 2025, 4:19:19 PM
to Druid User
Hey All,

I'm running compaction with a filter to remove some data from a datasource. I've found that if the filter removes every row in the time chunk, the compaction task publishes 0 segments, and the original segment, which still has the bad data, remains. I feel like there must be a flag I could specify to drop the original segment when the compaction task removes all the rows, but I haven't found it in the documentation yet. Does anyone have any thoughts?

I see the `dropExisting` flag, but it's still marked as beta and normally isn't needed for compaction, even with filtering. Everything works as intended as long as some data remains in the segment after the filtered compaction runs: the old segment is overshadowed and dropped, and the newly compacted segment takes its place.

Thanks,
Dan

John Kowtko

Mar 28, 2025, 4:55:50 PM
to druid...@googlegroups.com
Hi Daniel,  

Try using dropExisting=true. It should create a tombstone (empty) segment to overshadow the existing one.
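For reference, here is a rough sketch of what a compaction task spec with a filter and `dropExisting` enabled might look like, built as a Python dict so it can be dumped to JSON and submitted to the Overlord. The datasource name, interval, and filter dimension/value are all made up for illustration, and the helper `build_compaction_spec` is hypothetical; adjust everything to match your own setup and check the field names against the docs for your Druid version.

```python
import json


def build_compaction_spec(datasource, interval, bad_dimension, bad_value):
    """Build a compaction task spec that filters out rows and sets dropExisting.

    All argument values are placeholders supplied by the caller; nothing here
    comes from the original poster's cluster.
    """
    return {
        "type": "compact",
        "dataSource": datasource,
        "ioConfig": {
            "type": "compact",
            "inputSpec": {"type": "interval", "interval": interval},
            # Ask Druid to replace everything in the interval, which should
            # produce tombstones for time chunks that end up with no rows.
            "dropExisting": True,
        },
        "transformSpec": {
            # Keep only rows where bad_dimension != bad_value.
            "filter": {
                "type": "not",
                "field": {
                    "type": "selector",
                    "dimension": bad_dimension,
                    "value": bad_value,
                },
            }
        },
    }


# Hypothetical values, for illustration only.
spec = build_compaction_spec(
    "example_datasource", "2025-03-01/2025-03-02", "status", "bad"
)
print(json.dumps(spec, indent=2))
```

The printed JSON is what you would submit as the task payload. Note that `dropExisting` applies to the whole interval in the `inputSpec`, so it is worth keeping that interval aligned with the time chunks you actually mean to replace.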

If that doesn't work, try the same thing with an MSQ SQL "REPLACE" statement. MSQ may create the tombstones by default.
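As a sketch of the MSQ route, the snippet below assembles a `REPLACE ... OVERWRITE WHERE` statement that rewrites one day of a datasource while keeping only the rows that pass a condition, and wraps it in the JSON body for the SQL task endpoint (`/druid/v2/sql/task`). The datasource name, time range, and `"status" <> 'bad'` condition are assumptions for illustration, and `build_replace_query` is a hypothetical helper. The code only builds the payload; it does not send anything.

```python
import json


def build_replace_query(datasource, start, end, keep_condition, granularity="DAY"):
    """Build an MSQ REPLACE statement that overwrites [start, end) in place.

    keep_condition is a SQL boolean expression selecting the rows to keep.
    The OVERWRITE WHERE range should align with the PARTITIONED BY granularity.
    """
    time_range = f"__time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}'"
    return (
        f'REPLACE INTO "{datasource}"\n'
        f"OVERWRITE WHERE {time_range}\n"
        f"SELECT *\n"
        f'FROM "{datasource}"\n'
        f"WHERE {time_range}\n"
        f"  AND {keep_condition}\n"
        f"PARTITIONED BY {granularity}"
    )


# Hypothetical values, for illustration only.
sql = build_replace_query(
    "example_datasource",
    "2025-03-01 00:00:00",
    "2025-03-02 00:00:00",
    "\"status\" <> 'bad'",
)

# Request body to POST to the SQL task endpoint of the Router or Overlord.
payload = {"query": sql}
print(sql)
print(json.dumps(payload))
```

If the `SELECT` returns no rows for the overwritten range, this is the case where MSQ would need to write a tombstone, so it is worth testing on a small interval first.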

Thanks.  John

--
John Kowtko
Senior Customer Architect