Hi all,
I am planning an upgrade from Druid 29.0.1 to 30.0.1. Normally we double capacity and replace Historical nodes, but currently we can’t provision extra EC2 instances, and decommissioning/replacing Historicals one by one would be very time-consuming (each node has ~10–28 TB of local segment cache on instance-store NVMe).
Since this is an application-level upgrade, we’re considering an in-place upgrade for Historicals (a rough sketch follows the list):
- stop the Historical process
- clean up tmp directories and old Druid 29.x binaries
- keep the local segmentCache directory (the data dir) intact
- install Druid 30.x and restart the service
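For concreteness, here is a per-node sketch of what we have in mind. The systemd unit name, install paths, and symlink layout below are placeholders for our environment, not Druid defaults:

```python
#!/usr/bin/env python3
"""Rough sketch of the in-place Historical upgrade, per node.

The service name, install paths, and symlink layout are placeholders
for our environment, not Druid defaults.
"""
import shutil
import subprocess

SERVICE = "druid-historical"               # hypothetical systemd unit
OLD_INSTALL = "/opt/druid-29.0.1"          # old 29.x binaries to remove
NEW_INSTALL = "/opt/druid-30.0.1"          # pre-staged 30.x install
CURRENT_LINK = "/opt/druid/current"        # symlink the unit points at
TMP_DIR = "/opt/druid/var/tmp"             # processing tmp dir, safe to clear
SEGMENT_CACHE = "/mnt/nvme/segment-cache"  # druid.segmentCache.locations, left intact


def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)


# 1. Stop the Historical process.
run("systemctl", "stop", SERVICE)

# 2. Clean up tmp dirs and the old 29.x binaries.
shutil.rmtree(TMP_DIR, ignore_errors=True)
shutil.rmtree(OLD_INSTALL, ignore_errors=True)

# 3. SEGMENT_CACHE is deliberately untouched so 30.x reuses it.

# 4. Swap the symlink to the 30.x install and restart.
run("ln", "-sfn", NEW_INSTALL, CURRENT_LINK)
run("systemctl", "start", SERVICE)
```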
The expectation is that on startup the Historical will re-announce its existing local segments and serve them without re-downloading from deep storage, so downtime per node should be ~10–15 minutes.
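To verify a node actually came back from its local cache rather than re-pulling from deep storage, we’d poll the Historical’s load-status endpoint after restart; it reports cacheInitialized: true once all locally cached segments are loaded (host and port below are placeholders):

```python
import json
import time
import urllib.error
import urllib.request

HISTORICAL = "http://historical-node:8083"  # placeholder host:port

# GET /druid/historical/v1/loadstatus returns {"cacheInitialized": <bool>};
# true means the node has finished loading segments from its local cache.
while True:
    try:
        with urllib.request.urlopen(f"{HISTORICAL}/druid/historical/v1/loadstatus") as resp:
            if json.load(resp).get("cacheInitialized"):
                print("local segment cache reused; node is serving")
                break
    except urllib.error.URLError:
        pass  # process still starting up
    time.sleep(10)
```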
We have replicas, so no query downtime is expected.
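Between nodes, we’d also gate on the Coordinator’s load status so replication is fully restored before the next Historical goes down (coordinator URL is a placeholder):

```python
import json
import urllib.request

COORDINATOR = "http://coordinator:8081"  # placeholder host:port

# GET /druid/coordinator/v1/loadstatus reports, per datasource, the percentage
# of segments loaded versus segments that should be loaded in the cluster.
with urllib.request.urlopen(f"{COORDINATOR}/druid/coordinator/v1/loadstatus") as resp:
    load = json.load(resp)

behind = {ds: pct for ds, pct in load.items() if pct < 100.0}
if behind:
    print("still replicating, hold off:", behind)
else:
    print("all datasources at 100%; safe to take down the next node")
```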
Is this approach supported for major version upgrades? Are there any known issues to watch out for (segment format compatibility, tmp directories, memory-mapped segment state, or other on-disk artifacts) when reusing the same instance and segment cache across 29.x -> 30.x? Any guidance or confirmation would be appreciated.
Thanks