Agones Fleet Rolling Update: New servers don't spin up until old Allocated servers fully shut down, even with maxSurge


Devansh Gupta

Apr 4, 2026, 6:46:34 PM
to agones-discuss
Hi All,
Hopefully this is the correct place to ask this question; if not, please point me to the right place. Thank you very much!

  Problem

  We run an Agones Fleet with replicas: 8, where each GameServer handles 500+ concurrent game sessions. Our allocator routes games only to servers labeled canAcceptGames: true. We use allocationOverflow to mark old servers as canAcceptGames: false during deployments, so they stop accepting new games and drain existing sessions.
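  A minimal sketch of that routing rule, assuming the allocator sees each GameServer as a dict shaped like the Kubernetes object (metadata.labels) and that fresh servers carry canAcceptGames: "true" (for example, from the GameServer template, which isn't shown in the config below). routable_servers is a hypothetical helper; the allocator itself is our own code, not part of Agones.

```python
from typing import Any


def routable_servers(servers: list[dict[str, Any]]) -> list[str]:
    """Names of GameServers the (hypothetical) allocator may route new games to."""
    names = []
    for gs in servers:
        labels = gs.get("metadata", {}).get("labels", {})
        # Kubernetes label values are strings, so compare against "true", not True.
        if labels.get("canAcceptGames") == "true":
            names.append(gs["metadata"]["name"])
    return names


servers = [
    # Old server that allocationOverflow has relabelled to stop new games.
    {"metadata": {"name": "gs-old-1", "labels": {"canAcceptGames": "false"}}},
    {"metadata": {"name": "gs-old-2", "labels": {"canAcceptGames": "true"}}},
    {"metadata": {"name": "gs-new-1", "labels": {"canAcceptGames": "true"}}},
]
print(routable_servers(servers))  # ['gs-old-2', 'gs-new-1']
```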

  Fleet config:
  spec:
    replicas: 8
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 99%
        maxUnavailable: 50%
    allocationOverflow:
      labels:
        canAcceptGames: "false"
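  To make the percentages concrete, here is a sketch that resolves the int-or-percent values against replicas: 8. It assumes Kubernetes Deployment-style rounding (maxSurge rounds up, maxUnavailable rounds down); Agones's own rounding may differ, and scaled_value is a hypothetical helper.

```python
from typing import Union


def scaled_value(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Resolve an int-or-percent value such as 8 or "99%" against replicas."""
    if isinstance(value, int):
        return value
    pct = int(value.rstrip("%"))
    scaled = pct * replicas
    # Integer ceil/floor avoids float rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


replicas = 8
surge = scaled_value("99%", replicas, round_up=True)         # 7.92 rounds up to 8
unavailable = scaled_value("50%", replicas, round_up=False)  # exactly 4
print(surge, unavailable, replicas + surge)  # 8 4 16
```

  Under that assumption the fleet may temporarily run up to 16 GameServers, which is the headroom we expected the new GameServerSet to use.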

  Current behavior during deployment

  1. kubectl apply with a new image
  2. allocationOverflow immediately marks 4 old Allocated servers as canAcceptGames: false — they stop accepting new games
  3. Those 4 servers begin draining their existing 500+ game sessions (takes up to 10 minutes)
  4. No new servers spin up until old servers finish draining and call SDK.Shutdown()
  5. Only when an old server dies does Agones create a replacement

  The result: for the entire drain period (up to 10 minutes), only 4 servers accept new games instead of 8. All new traffic concentrates on half the fleet, and new servers don't appear until old ones fully terminate.
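  A rough model of that capacity dip, assuming four hypothetical drain times (3, 6, 8, and 10 minutes) and that a replacement accepts games the moment its predecessor shuts down. Real startup time would make the dip longer; observed_accepting is a hypothetical helper.

```python
def observed_accepting(replicas: int, drain_minutes: list[int], horizon: int) -> list[int]:
    """Servers accepting new games at each minute under replace-on-shutdown.

    Old servers that were not relabelled keep accepting; each draining server
    is replaced (instantly, by assumption) once its drain time has elapsed.
    """
    active_old = replicas - len(drain_minutes)
    return [
        active_old + sum(1 for d in drain_minutes if d <= t)
        for t in range(horizon + 1)
    ]


print(observed_accepting(8, [3, 6, 8, 10], 10))
# [4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]
```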

  What we expected

  With maxSurge: 99%, we expected Agones to immediately spin up new servers alongside the old ones. The total would temporarily exceed replicas (e.g., 12 servers: 4 old draining + 4 old active + 4 new), then converge back to 8 as old servers shut down.

  What we want

  1. When a deployment happens → immediately spin up new servers (e.g., 4) while old servers are still draining
  2. Mark 50% of old servers as canAcceptGames: false via allocationOverflow (this part works)
  3. As old draining servers finish and shut down, spin up remaining new servers until we reach replicas capacity

  This way the fleet never drops below full capacity. New servers absorb load while old servers drain gracefully.
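  The surge-first ordering above can be sketched as a small simulation. It assumes new servers accept games as soon as they are created, a whole wave drains in one step, and later waves repeat the pattern for the remaining old servers (step 3 leaves the exact wave structure open). surge_first is a hypothetical helper, not an Agones API.

```python
def surge_first(replicas: int, batch: int) -> list[tuple[str, int, int]]:
    """Simulate a hypothetical surge-first rollout in waves of `batch` servers.

    Returns (step, accepting, total) after each step, where `accepting` counts
    servers labelled canAcceptGames: "true" and `total` counts all servers.
    """
    old_accepting, old_draining, new = replicas, 0, 0
    log = []

    def record(step: str) -> None:
        log.append((step, old_accepting + new, old_accepting + old_draining + new))

    while old_accepting > 0:
        n = min(batch, old_accepting)
        new += n                # 1. surge: start new servers first
        record("create new")
        old_accepting -= n      # 2. overflow label: stop routing to n old servers
        old_draining += n
        record("mark draining")
        old_draining -= n       # 3. drained old servers call SDK.Shutdown()
        record("old shut down")
    return log


log = surge_first(replicas=8, batch=4)
print(min(a for _, a, _ in log), max(t for _, _, t in log))  # 8 12
```

  In this model accepting capacity never drops below 8, and the total peaks at 12, inside the 16-server ceiling that maxSurge: 99% would allow under Deployment-style rounding.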

  What we observe

  The new GameServerSet is created but stays at desired: 0. Agones won't scale it up because it can't reduce the old GameServerSet: all old servers are Allocated and can't be removed until they call SDK.Shutdown().

  NAME             SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY
  fleet-old-gss    Packed       7         8         8           0       # can't reduce — all Allocated
  fleet-new-gss    Packed       0         0         0           0       # never scales up

  It appears the rolling update algorithm requires old servers to be removed before creating new ones, even when maxSurge should allow headroom for additional servers.

  Our workaround

  We temporarily set replicas: 16 (2x) during deployment so Agones creates new servers to meet the higher count. After the new servers are Allocated and receiving traffic, we scale back to replicas: 8. The old servers drain and shut down without being replaced (the total is still above 8).

  This works but adds CI/CD orchestration complexity that we'd prefer the rolling update to handle natively.
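  The workaround's two scaling steps can be sketched as helpers a CI/CD job might use: one builds a JSON merge patch for spec.replicas, the other checks GameServerSet status before scaling back. Both helpers are hypothetical; they read status.allocatedReplicas and metadata.name from items shaped like `kubectl get gameserverset -o json` output, and the name fleet-new-gss comes from the table above.

```python
import json
from typing import Any


def replicas_patch(replicas: int) -> str:
    """JSON merge-patch body that sets the Fleet's spec.replicas."""
    return json.dumps({"spec": {"replicas": replicas}})


def safe_to_scale_back(gss_list: list[dict[str, Any]], new_gss: str, target: int) -> bool:
    """True once the named GameServerSet reports at least `target` Allocated servers."""
    for gss in gss_list:
        if gss["metadata"]["name"] == new_gss:
            return gss.get("status", {}).get("allocatedReplicas", 0) >= target
    return False  # new GameServerSet not found yet


items = [
    {"metadata": {"name": "fleet-old-gss"}, "status": {"allocatedReplicas": 8}},
    {"metadata": {"name": "fleet-new-gss"}, "status": {"allocatedReplicas": 4}},
]
print(replicas_patch(16))                             # {"spec": {"replicas": 16}}
print(safe_to_scale_back(items, "fleet-new-gss", 4))  # True
```

  A job could apply the body with `kubectl patch fleet <fleet-name> --type merge -p '{"spec": {"replicas": 16}}'`, poll the GameServerSets until safe_to_scale_back returns True, then patch replicas back to 8.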

  Questions

  1. Is this intended behavior? Should maxSurge create new servers even when old Allocated ones can't be removed yet?
  2. Is there a way to tell Agones "create new servers first, then wait for old ones to drain" — rather than the current "reduce old first, then create new"?
  3. Are there plans to support a surge-first rolling update strategy for long-lived Allocated GameServers?

  Environment

  - Agones version: 1.39.0
  - Kubernetes: GKE
  - All 8 GameServers are Allocated at deploy time, each handling 500+ active game sessions
  - Drain time: up to 10 minutes per server

Thank you very much

Mark Mandel

Apr 4, 2026, 6:47:54 PM
to agones-...@googlegroups.com
Pretty sure we covered this in your cross-post to Slack.

--
You received this message because you are subscribed to the Google Groups "agones-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to agones-discus...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/agones-discuss/2e76803c-931d-4c83-87b0-65d54e7370efn%40googlegroups.com.