Do your clients all start up at the same time? If not, it's not clear to me why your setup wouldn't work. If the clients' start times are randomly distributed and the server closes each connection after 30m, then the connection close times will be just as randomly distributed as the start times, which means clients should start trickling into the new server as soon as it comes up. It may take 30m for the load to fully balance, but the new server should start getting new load immediately, and that load should ramp up steadily over the 30m window.
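If you're doing this with gRPC, the 30m close is normally configured through the server's keepalive parameters rather than hand-rolled. A minimal Go sketch, assuming grpc-go; the port and the 5m grace period are placeholder values:

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":50051") // placeholder port
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		// Close each connection after roughly 30 minutes, forcing
		// clients to reconnect and go back through load balancing.
		MaxConnectionAge: 30 * time.Minute,
		// Give in-flight streams a grace period to finish before the
		// connection is forcibly closed (placeholder value).
		MaxConnectionAgeGrace: 5 * time.Minute,
	}))

	// Register your services on srv here, then:
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```

Note that grpc-go adds a random jitter of up to +/-10% to MaxConnectionAge specifically to spread out the close times, which helps even if your clients did all start at the same moment.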
I don't know anything about the Kubernetes autoscaling side of things, but maybe there are parameters you can tune there to give the new server more time to accumulate load before Kubernetes kills it?
In general, it's not clear to me that there's a better approach than the one you're already taking. There's always some tension between load balancing and streaming RPCs: the whole point of a streaming RPC is that individual messages don't go through load balancing, so every message on the stream goes to the same backend.
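On the client side, the usual way to live with this is a reconnect loop: when the server ends the stream, the client immediately re-opens it, and that reconnect is what goes back through name resolution and the balancer. A rough Go sketch, where Watch, WatchRequest, WatcherClient, and handle are hypothetical stand-ins for your actual streaming method, messages, and handler:

```go
import (
	"context"
	"time"

	pb "example.com/yourservice/proto" // hypothetical generated package
)

// watchForever keeps a server-streaming RPC open, reconnecting whenever
// the server ends the stream.
func watchForever(ctx context.Context, client pb.WatcherClient) {
	for ctx.Err() == nil {
		stream, err := client.Watch(ctx, &pb.WatchRequest{})
		if err != nil {
			time.Sleep(time.Second) // crude backoff before retrying
			continue
		}
		for {
			msg, err := stream.Recv()
			if err != nil {
				// The server closed the stream (e.g. it hit its 30m
				// connection age). Break out and reconnect, which sends
				// the client back through the load balancer, possibly
				// onto a newer backend.
				break
			}
			handle(msg)
		}
	}
}
```

The key property is that it's the reconnect, not each message, that gets balanced, so the periodic server-side close is exactly what redistributes load.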
I hope this information is helpful.