This is super helpful, Jihoon! I have a couple questions.
Just to be super explicit: when you say "timeChunk can be further
partitioned into segments" and that the segments "belongs" to the same
interval — when we look at the name of any of these segments, we will
see the same timeChunk represented, right? It might be that the
segments in the time chunk are (roughly) split up by time (because the
index task handed off the segment before the end of the time chunk)
but we won't observe that from looking at the segment's given
interval/version, right?
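To make the first question concrete, here's a rough sketch of what I mean. It assumes the segment identifier layout described in the Druid docs: dataSource_start_end_version, with a trailing _partitionNum on every partition except 0. It also assumes the version string contains no underscores. The example IDs and the helper names (parse_segment_id, time_chunk) are hypothetical and only for illustration. The point is that the ID carries the time chunk's interval, the version, and a partition number, and nothing about which timestamps actually ended up in each segment.

```python
import re
from dataclasses import dataclass

# Druid renders interval endpoints like 2015-09-12T00:00:00.000Z.
_TS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


@dataclass(frozen=True)
class SegmentId:
    data_source: str
    start: str
    end: str
    version: str
    partition: int

    @property
    def time_chunk(self):
        # Data source, interval, and version: the part shared by every
        # segment of one time chunk (hypothetical helper, not a Druid API).
        return (self.data_source, self.start, self.end, self.version)


def parse_segment_id(segment_id: str) -> SegmentId:
    # Split from the right, because the data source name may itself contain
    # underscores. Assumes the version contains no underscores.
    parts = segment_id.rsplit("_", 4)
    if len(parts) == 5 and _TS.fullmatch(parts[1]) and _TS.fullmatch(parts[2]):
        ds, start, end, version, partition = parts
        return SegmentId(ds, start, end, version, int(partition))
    # No trailing partition number: this is partition 0.
    ds, start, end, version = segment_id.rsplit("_", 3)
    return SegmentId(ds, start, end, version, 0)


# Hypothetical IDs for three segments in the same day-long time chunk.
chunk = "wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_v1"
segs = [parse_segment_id(s) for s in (chunk, chunk + "_1", chunk + "_2")]
print(len({s.time_chunk for s in segs}))  # 1
print([s.partition for s in segs])        # [0, 1, 2]
```

So, if I have it right, even if partition 1 only holds rows from the first few hours of the day, its ID would still show the full 2015-09-12/2015-09-13 interval.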
And the overall effect of having multiple segments per partition in
KIS is primarily about:
- letting the hand-off from index task to historical be more
incremental rather than one fell swoop
- allowing late-arriving data to be added to a new segment in a timeChunk
- allowing Kafka partitions to be processed in parallel
... but not about minimizing the work done at query time by allowing
you to separate data that differs along
frequently-filtered-at-query-time dimensions?
I do see that there's something that sounds like the latter
(PartitionsSpec) but that seems to be specific to Hadoop batch
ingestion (not even native batch ingestion).
--dave