Performance question on Gofer cleanup

Zoey Han

Dec 3, 2025, 3:23:55 PM
to gVisor Users [Public]
Hi gVisor Team,

I’ve been investigating a performance issue in gVisor that seems related to file-system cleanup during container exit.

The main bottleneck appears to be a noticeable stall in the Gofer process during the execution worker’s cleanup phase. From the attached gVisor logs, we occasionally observe ~2-second gaps between the `sock read failed, closing connection: EOF` entries.

We also ran experiments with different directory structures. Mounting a folder with 1,000 subdirectories × 1 small file each results in normal cleanup performance, whereas 10 subdirectories × 100 files each (same total file count) sometimes triggers significant slowdown during Gofer cleanup.
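For reference, the two layouts can be generated with something like the following (paths, file sizes, and names are illustrative, not our exact setup):

# Layout A: 1,000 subdirectories with 1 small file each -- cleanup looks normal.
mkdir -p /tmp/layout_a
for i in $(seq 1 1000); do
  mkdir -p "/tmp/layout_a/dir_$i"
  head -c 1024 /dev/urandom > "/tmp/layout_a/dir_$i/file_1"
done

# Layout B: 10 subdirectories with 100 small files each -- sometimes shows the slow cleanup.
mkdir -p /tmp/layout_b
for i in $(seq 1 10); do
  mkdir -p "/tmp/layout_b/dir_$i"
  for j in $(seq 1 100); do
    head -c 1024 /dev/urandom > "/tmp/layout_b/dir_$i/file_$j"
  done
done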

Could someone from the team help answer the following questions?
1. Is this proportional, blocking slowdown during Gofer’s cleanup on process exit expected when a mount contains a large amount of file state, and could the directory/file structure also contribute to this behavior?
2. What are the recommended best practices for handling this volume of file state? Are there configurations or flags (e.g., a runsc flag) that allow earlier cleanup of stale file state?

Thank you for your time and expertise.

Best,
Zoey


Slower gVisor log: 
D1029 22:12:13.522557 1 task_exit.go:215] [ 1: 2] Transitioning from exit state TaskExitNone to TaskExitInitiated
D1029 22:12:13.525479 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528093 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528146 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528186 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528233 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528273 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528316 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528364 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528484 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.528498 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:13.533563 1 sampler.go:191] Time: Adjusting syscall overhead down to 671
D1029 22:12:15.534241 1 sampler.go:191] Time: Adjusting syscall overhead down to 875
D1029 22:12:15.934951 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:15.934988 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:15.935030 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:15.935083 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:12:15.935220 1 connection.go:127] sock read failed, closing connection: EOF
I1029 22:12:15.935212 1 loader.go:1289] Gofer socket disconnected, killing container
...
Normal gVisor log:
D1029 22:20:04.135268 1 task_exit.go:215] [ 1: 2] Transitioning from exit state TaskExitNone to TaskExitInitiated
D1029 22:20:04.138439 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138460 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138515 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138537 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138567 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138654 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138660 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138710 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.138797 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.188498 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.188534 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.190730 1 connection.go:127] sock read failed, closing connection: EOF
D1029 22:20:04.190764 1 connection.go:127] sock read failed, closing connection: EOF
I1029 22:20:04.190882 1 loader.go:1289] Gofer socket disconnected, killing container
...

Ayush Ranjan

Dec 26, 2025, 1:10:39 AM
to Zoey Han, gVisor Users [Public]
Hi Zoey,

Thanks for the bug report. I tried reproducing this issue:
  • Created 2 directories: one with 1,000 subdirectories and 1 small file inside each subdirectory, and one with 10 subdirectories and 100 small files inside each subdirectory.
  • Bind-mounted each directory inside a Docker gVisor container using "docker run --runtime=runsc --rm -v /tmp/random_data_dir:/data ubuntu /run.sh".
  • Bind-mounted a bash script at /run.sh which basically runs "find /data -type f -exec sh -c 'cat "$@" > /dev/null' _ {} +". This reads every file, pulling the entire directory structure into gVisor's filesystem layer (the whole setup is sketched below).
  • Checked the gofer logs for slower cleanup. I could not find any slowdown for either directory structure.
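A rough sketch of that setup (the image, paths, and script contents below are illustrative rather than an exact copy of what I ran):

# Script that gets bind-mounted into the container at /run.sh.
# It reads every file under /data, forcing gVisor to walk the whole mounted tree.
cat > /tmp/run.sh <<'EOF'
#!/bin/bash
find /data -type f -exec sh -c 'cat "$@" > /dev/null' _ {} +
EOF
chmod +x /tmp/run.sh

# Run the sandbox with the test directory and the script bind-mounted.
docker run --runtime=runsc --rm \
  -v /tmp/random_data_dir:/data \
  -v /tmp/run.sh:/run.sh \
  ubuntu /run.sh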
Hence, I could not reproduce the issue. Could you provide a more accurate reproducer? Can you also share the exact flags with which you are running gVisor?

> 1. Is this proportional, blocking slowdown during Gofer’s cleanup on process exit expected when a mount contains a large amount of file state, and could the directory/file structure also contribute to this behavior?

It is certainly possible that the directory structure is somehow impacting the gofer cleanup. However, I cannot think of why; I would need a reproducer to investigate.

> 2. What are the recommended best practices for handling this volume of file state? Are there configurations or flags that allow early cleanup of stale files (e.g., any file flag)?

The default runsc flags should be good enough. I hypothesize that the "dentry cache" in gVisor might be involved. There is a flag, --dcache, which controls the size of the global dentry cache in gVisor. This is basically the number of leaf nodes (of the filesystem tree) that gVisor will cache before evicting them. It defaults to 1000 dentries per bind mount. If you set --dcache=10, it enforces that only 10 leaf dentries are cached across all bind mounts. You can try playing with this flag to see if the cleanup slowdown improves. However, reducing it can hurt runtime performance: if the same file is re-accessed, it will miss our cache and we will need to fetch it again from the host (slow).
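For example, one way to set the flag when running through Docker is via the runsc entry's runtimeArgs in /etc/docker/daemon.json (the runsc path and cache size below are placeholders):

# Register runsc with a smaller global dentry cache (values are illustrative).
# NOTE: this overwrites daemon.json; merge with your existing contents in practice.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--dcache=100"]
    }
  }
}
EOF
# Restart Docker so the runtime change takes effect.
sudo systemctl restart docker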

--
Ayush Ranjan

Zoey Han

4:03 AM
to Ayush Ranjan, gVisor Users [Public]
Hey Ayush,

Thanks for the reply! Here are some of the configs we used that could potentially impact the gofer process. I also noticed that when I test with runc using the same file structure, it doesn't slow down; hope this additional info helps.
1. --copy-on-write-memory-allocation-size
2. --gofer-host-fd-translation-marx-bytes=0
3. --pre-allocate-host-fd-table-size

Best,
Zoey

Ayush Ranjan

4:06 AM
to gVisor Users [Public]
Hey Zoey,

None of the flags you mentioned are part of open-source runsc. So I can't set those. Are you using a fork of gVisor?
Moreover, can you share your reproducer (e.g. the script used to generate the directories/files and the docker/runsc commands you are using to run the sandbox)?

- Ayush