Long server freeze during Mark-Sweep GC

128 views
Skip to first unread message

Sergei Marochkin

unread,
Oct 19, 2024, 3:53:47 PM10/19/24
to v8-users
  • Gaming server with 1k+ concurrent players online
  • A lot of data in memory: `max heap` ~5.5 GIB, `old_space` - ~4 GiB
  • Flags: `--max-semi-space-size=512 --max_old_space_size=8192`
  • NodeJS v22.8 (V8 12.4.254.21-node.19)
  • Debian 10 x86_64
  • Xeon E-2186G
  • 32 GB RAM

Problem: during Mark-Sweep GC execution the server freezes for a long time.

I made the logging of these freezes this way:

```js
const repeatMs = 1000;

setInterval(() => {
const time = Date.now();

const betweenMs = (time - prevCallTime) - repeatMs;

// log betweenMs

prevCallTime = time;
}, repeatMs);
```

(there is also an implementation via `monitorEventLoopDelay`, but these results are nothing to compare with)

Here are the results for similar time periods with similar load on different days (y-axis is milliseconds, x-axis is time). For comparison, I have attached the results from Bun, where JavaScriptCore is used instead of V8.

NodeJS:
node1.png

Bun:
bun1.png

A little closer:

NodeJS:
node2.png

Bun:
bun2.png

As you can see in the graphs, the execution time of the 1-second timer in NodeJS can slow down by more than a second (max 1186, average 5.04). While in Bun, these slowdowns do not exceed 120 ms (max 112, average 8.92).

But between these freezes (second pair of graphs), NodeJS performs noticeably better (max 57, average 1.23) than Bun (max 33, average 9.92).

The `heapUsed` drop in NodeJS corresponds to a graph of freezes:
node3.png

Bun has noticeably higher memory consumption, more infrequent `heapUsed` release, but no freezes during `heapUsed` clearing.
bun3.png

Tried various flags for NodeJS:
  • --predictable-gc-schedule
  • --optimize-for-size
  • --max-semi-space-size=512
  • --gc-interval=100
  • --use-largepages=on
  • --v8-pool-size=12

These flags except of `--max-semi-space-size` didn't have much positive impact on this situation.

I also tried reducing the number of cyclic links used. I changed the code that was written like this:
```ts
export const vehicles = new Map<number, Vehicle>();
export const players = new Map<number, Player>();

export class Vehicle {
  readonly id: number;
  private player: Player | null = null;

  setPlayer(player: Player | null) {
    this.player = player;
  }
}

export class Player {
  readonly id: number;
  private vehicle: Vehicle | null = null;

  setVehicle(vehicle: Vehicle | null) {
    this.vehicle = vehicle;
  }
}
```
to something like this:
```ts
export const vehicles = new Map<number, Vehicle>();
export const players = new Map<number, Player>();

export class Vehicle {
  readonly id: number;
  private playerId: number | null = null;

  setPlayer(player: Player | null) {
    this.playerId = player?.id ?? null;
  }

  get player() {
    if (this.playerId === null) {
      return null;
    }

    return players.get(this.playerId);
  }
}

export class Player {
  readonly id: number;
  private vehicleId: number | null = null;

  setVehicle(vehicle: Vehicle | null) {
    this.vehicleId = vehicle?.id ?? null;
  }
 
  get vehicle() {
    if (this.vehicleId === null) {
      return null;
    }

    return vehicles.get(this.vehicleId);
  }
}
```
As a result, the frequency of GC calls was reduced from once every 2 minutes to once every 3 minutes, heap usage reduced about for 300 MiB. But it didn't affect the duration of GC operation.

The problem is complicated by the fact that I can't take a `Heap snapshot` through Chrome DevTools because it hangs the server for a few minutes and there is a high probability of DevTools crashing. `Allocation instrumentation on timeline` also causes long hangs, so I can't use it. I can use `Allocation sampling` and `Performance Profiler` they work well.

Trying to take any dump in Bun just crashes the server.

Example gc-verbose log (freeze time 913ms)

```
33273892 ms: [HeapController] factor 1.2 based on mu=0.970, speed_ratio=184 (gc=567994, mutator=3093)
33273892 ms: [HeapController] Limit: old size: 4154105 KB, new limit: 5303847 KB (1.2)
33273892 ms: [GlobalMemoryController] Limit: old size: 4154105 KB, new limit: 5303847 KB (1.2)
33273892 ms: Mark-Compact 5047.6 (6069.3) -> 4056.7 (6055.0) MB, pooled: 14 MB, 135.18 / 0.09 ms  (+ 865.7 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1063 ms) (average mu = 0.992, current mu = 0.994) finalize incremental marking via stack guard; GC in old space requested
Memory allocator,       used: 6200356 KB, available: 11363292 KB
Read-only space,        used:    107 KB, available:      0 KB, committed:      0 KB
New space,              used:      0 KB, available: 257728 KB, committed: 524288 KB
New large object space, used:      0 KB, available: 262144 KB, committed:      0 KB
Old space,              used: 3757386 KB, available:   1341 KB*, committed: 5271512 KB
Code space,             used:  23834 KB, available:      0 KB*, committed:  28928 KB
Large object space,     used: 360350 KB, available:      0 KB, committed: 361152 KB
Code large object space,     used:    135 KB, available:      0 KB, committed:    152 KB
Trusted space,              used:  12398 KB, available:      0 KB*, committed:  14324 KB
Trusted large object space,     used:      0 KB, available:      0 KB, committed:      0 KB
All spaces,             used: 4154213 KB, available: 11884505 KB*, committed: 6200356 KB
Pool buffering 57 chunks of committed:  14592 KB
External memory reported:  72027 KB
Backing store memory:  74301 KB
External memory global 0 KB
Total time spent in GC  : 71362.6 ms
(*) Sweeping is still in progress, making available sizes inaccurate.
```

How to reduce Mark-Sweep execution time in V8? Why Bun does not have such freezes?

Omer Katz (chromium.org)

unread,
Oct 31, 2024, 4:34:41 AM10/31/24
to v8-u...@googlegroups.com
Can you file a bug for this?
Please include instructions for reproducing it as well. Preferably the repro should be as minimal as possible, and even better if we can reproduce it in d8 without needing NodeJS.

It seems like you're getting longer GC pauses than expected, but the info here is not enough to determine why.
GC should be mostly concurrent. Perhaps, for some reason, your setup is not getting the background GC threads.
Am I right to assume that this part of a long running javascript script? That could affect GC incremental progress on the main thread.

--
--
v8-users mailing list
v8-u...@googlegroups.com
http://groups.google.com/group/v8-users
---
You received this message because you are subscribed to the Google Groups "v8-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-users+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-users/1172da80-1ac4-426c-8ea4-56c24a59df11n%40googlegroups.com.

Sergei Marochkin

unread,
Oct 31, 2024, 6:59:05 AM10/31/24
to v8-users
> Can you file a bug for this?

Sure: https://issues.chromium.org/issues/376493207
Reply all
Reply to author
Forward
0 new messages