Hi Kaus,
Q1.
Each update job will create new segments, that overlay on top of the old segments. If the new segments completely cover any older segment, then that older segment is considered "fully overshadowed" and will be automatically dropped from the cluster.
Otherwise, if any portion of the old segment is still not overshadowed, then that segment must remain active to serve the unovershadowed portion of its data. So over time segments can pile up and the segment count can increase, which starts to put a burden on queries and on cluster maintenance activities.
This is where compaction comes in ... compaction merges the layered sets of segments together into a single set of segments, so you are back to roughly the original segment count. As long as you have compaction running, you should be okay here.
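As a sketch of what enabling this looks like, here is a minimal auto-compaction config of the kind you can submit to the Coordinator per datasource (the datasource name and tuning values below are placeholders, not recommendations):

```json
{
  "dataSource": "my_table",
  "skipOffsetFromLatest": "P1D",
  "tuningConfig": {
    "maxRowsPerSegment": 5000000
  }
}
```

The `skipOffsetFromLatest` setting keeps compaction away from the most recent interval, where streaming ingestion may still be appending data.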
Now, once segments are marked as unused, they will still hang around in the system until they are permanently removed. This is where kill tasks come in ... they remove unused segments from the system, which prevents record buildup in the metadata DB as well as in deep storage.
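For reference, a kill task spec is quite small ... something along these lines, where the datasource name and interval are placeholders you would fill in for your own cluster:

```json
{
  "type": "kill",
  "dataSource": "my_table",
  "interval": "2023-01-01/2023-02-01"
}
```

This permanently deletes unused segments in that interval from both the metadata DB and deep storage, so make sure the interval really only covers data you no longer need.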
So just make sure you have compaction and kill tasks running, and there should be no issues regardless of the volume.
Q2.
Streaming ingestion is append only ... it cannot be used for updates. When people want to apply updates that arrive via streaming data, there are a couple of options:
a) ingest the updates as new records, then at query time use the LATEST() aggregator to pull only the latest values for each event. This leaves all versions of the data in the cluster, which you could view as a pro or a con. You can eventually remove the old records through reindex batch jobs, in some cases even using the batch update method.
b) if the updates arrive on a separate stream, ingest them into a side table, then issue a reindex batch job that merges the update data back into the main table using some form of join.
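To illustrate option (a), here is a rough query sketch ... the table and column names are made up for the example. LATEST() picks the value from the row with the latest __time per group:

```sql
SELECT
  event_id,
  LATEST(status, 1024) AS current_status
FROM my_table
GROUP BY event_id
```

The second argument to LATEST() is the max bytes per string value; it only applies to string columns.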
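And a sketch of option (b), assuming you are on a Druid version with SQL-based (multi-stage) ingestion ... again, the table names, columns, and interval here are placeholders. The REPLACE rewrites the chosen interval of the main table, taking the updated value from the side table where one exists:

```sql
REPLACE INTO my_table
OVERWRITE
  WHERE __time >= TIMESTAMP '2023-01-01' AND __time < TIMESTAMP '2023-02-01'
SELECT
  m.__time,
  m.event_id,
  COALESCE(u.status, m.status) AS status
FROM my_table m
LEFT JOIN updates_table u ON m.event_id = u.event_id
WHERE m.__time >= TIMESTAMP '2023-01-01' AND m.__time < TIMESTAMP '2023-02-01'
PARTITIONED BY DAY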
===
Let me know if you have any additional questions on this.
Thanks. John