What happened:
The new k8s.io/kubernetes/cmd/kube-apiserver/app/testing.StartTestServerOrDie function (introduced in #46865) returns a teardown function which doesn't cleanly shut down the test server. This results in the accumulation of goroutines and log spam, which prevents effective use of the test server across multiple test functions within the same process.

What you expected to happen:
Calling the teardown function should gracefully terminate everything that started up when StartTestServerOrDie was called.
How to reproduce it (as minimally and precisely as possible):
Using the following sample integration test code:
```go
import (
	"fmt"
	"runtime"
	"testing"

	apitesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
)

func TestTeardown(t *testing.T) {
	_, tearDown := apitesting.StartTestServerOrDie(t)
	tearDown()

	stack := make([]byte, 8196)
	size := 0
	for {
		size = runtime.Stack(stack, true)
		if size < len(stack) {
			break
		}
		stack = make([]byte, len(stack)*2)
	}
	fmt.Printf("%s\n", string(stack[0:size]))
}
```
After tearDown() returns, there are several lingering goroutines which forever attempt to maintain etcd connections to the terminated etcd instance. Here's an example stack dump:
https://gist.github.com/ironcladlou/52b3e3306948db3943b426c70ce7f85b
Among all the etcd connection goroutines, you'll notice lingering Cacher instances (created because of the default EnableWatchCache storage setting) which appear to be trying to hold watches, and a configuration_manager (which may or may not hold connections; I'm not sure yet). This seems to indicate that various components started during apiserver setup aren't actually shutting down.
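If the watch cache really is the source of the lingering Cacher goroutines, one way to test that theory is to start the server with the watch cache disabled and compare the leftovers. This is only a sketch: it assumes the 4-argument StartTestServerOrDie form and framework.SharedEtcd shown in the updated reproducer later in this thread, plus the kube-apiserver --watch-cache flag.

```go
// Sketch (assumptions labeled above): disable the watch cache so no Cachers
// are created, then look at what survives teardown.
package apiserver

import (
	"runtime"
	"testing"

	apitesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
	"k8s.io/kubernetes/test/integration/framework"
)

func TestTeardownWithoutWatchCache(t *testing.T) {
	s := apitesting.StartTestServerOrDie(t, nil,
		[]string{"--watch-cache=false"}, // assumed flag: skip Cacher creation
		framework.SharedEtcd())
	s.TearDownFn()
	t.Logf("goroutines after teardown: %d", runtime.NumGoroutine())
}
```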
Anything else we need to know?:
This is important for enabling integration testing of custom resource garbage collection (#47665).
Environment:
Kubernetes version (use kubectl version): master (as of 088141c)
Kernel (e.g. uname -a):

/cc @sttts @caesarxuchao @deads2k @liggitt @kubernetes/sig-api-machinery-bugs
/kind bug
/sig api-machinery
/cc @frobware
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.

/lifecycle stale
/remove-lifecycle stale
We don't use this testing framework in the service-catalog, but we have a similar problem.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten
/lifecycle frozen
Still an issue. Updated reproducer:

```go
package apiserver

import (
	"math"
	"runtime"
	"testing"
	"time"

	apitesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
	"k8s.io/kubernetes/test/integration/framework"
)

func TestCleanTeardown(t *testing.T) {
	etcdConfig := framework.SharedEtcd()

	// Watch goroutines for 5 seconds before starting the server to get a high-water mark
	before := 0
	for i := 0; i < 500; i++ {
		if goroutines := runtime.NumGoroutine(); goroutines > before {
			before = goroutines
		}
		time.Sleep(10 * time.Millisecond)
	}
	beforeStack := make([]byte, 1024*1024)
	size := runtime.Stack(beforeStack, true)
	beforeStack = beforeStack[:size]

	s := apitesting.StartTestServerOrDie(t, nil, nil, etcdConfig)
	s.TearDownFn()

	// Watch goroutines for 5 seconds after stopping the server to get a low-water mark
	after := math.MaxInt32
	for i := 0; i < 500; i++ {
		if goroutines := runtime.NumGoroutine(); goroutines < after {
			after = goroutines
		}
		time.Sleep(10 * time.Millisecond)
	}

	if after > before {
		afterStack := make([]byte, 1024*1024)
		size := runtime.Stack(afterStack, true)
		afterStack = afterStack[:size]
		t.Logf("before:\n%s", string(beforeStack))
		t.Logf("after:\n%s", string(afterStack))
		t.Errorf("expected %d or fewer, got %d", before, after)
	} else {
		t.Logf("expected %d or fewer, got %d", before, after)
	}
}
```
```
go test ./test/integration/apiserver -run TestCleanTeardown -v
    teardown_test.go:47: expected 6 or fewer, got 60
```
Seems much better than before; not sure if this is within the threshold for considering it resolved.
```
go test ./test/integration/apiserver -run TestCleanTeardown -v
    teardown_test.go:47: expected 9 or fewer, got 10
```
It's a lot closer, definitely. Can we identify the particular goroutine that's still hanging and make sure it's in the list of specifically recorded exceptions to goroutine checking (these should be enumerated in the integration test leak detection code, I think)?

At the very least, updating this to point at the specific goroutines would help clarify the remaining work, and then we can decide how important the remaining ones are.
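One way to point at the specific goroutines, as a sketch: the standard runtime/pprof "goroutine" profile with debug=2 prints one full stack per goroutine after teardown, which makes it easy to match the leftovers against the exception list. Test server setup below mirrors the reproducer above.

```go
// Sketch: dump every goroutine that survives teardown (name, creator, file:line).
package apiserver

import (
	"os"
	"runtime/pprof"
	"testing"

	apitesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
	"k8s.io/kubernetes/test/integration/framework"
)

func TestDumpLeakedGoroutines(t *testing.T) {
	s := apitesting.StartTestServerOrDie(t, nil, nil, framework.SharedEtcd())
	s.TearDownFn()
	// debug=2 prints full stacks per goroutine, like an unrecovered panic.
	_ = pprof.Lookup("goroutine").WriteTo(os.Stdout, 2)
}
```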
Hanging goroutine:
```
goroutine 156 [select]:
k8s.io/kubernetes/test/utils/ktesting.withTimeout.func2()
	/kubernetes/test/utils/ktesting/contexthelper.go:56 +0x9a
created by k8s.io/kubernetes/test/utils/ktesting.withTimeout in goroutine 155
	/kubernetes/test/utils/ktesting/contexthelper.go:55 +0x20b
```
From 63aa261: "The new TContext interface combines a normal context and the testing interface, then adds some helper methods. The context gets canceled when the test is done, but that can also be requested earlier via Cancel."
Doesn't seem to be in the goroutine checking exception list.
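For context, a minimal sketch of the general pattern behind a helper like withTimeout (not the actual ktesting implementation; names and structure here are assumptions): a goroutine selects on the parent context and a "test finished" signal, then cancels the derived context. If neither fires for the lifetime of the process, that goroutine sits in the select forever, which is consistent with the stack above.

```go
// Illustrative sketch only; not the real k8s.io/kubernetes/test/utils/ktesting code.
package sketch

import "context"

// withTimeout derives a context that is canceled when the test finishes or
// the parent context is canceled.
func withTimeout(parent context.Context, testDone <-chan struct{}) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	go func() {
		// This is the kind of goroutine shown in the stack trace: it blocks
		// in select until the test finishes or the parent is canceled.
		select {
		case <-testDone: // closed via t.Cleanup in the real helper
		case <-parent.Done():
		}
		cancel()
	}()
	return ctx, cancel
}
```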
@pohly has better insights on the ktesting plumbing
/assign
Ah, the reproducer in #49489 (comment) runs its checks before the things that hooked into test cleanup have run... I think the ktesting context is cleaned up properly as part of that cleanup.
I think GoleakCheck / EtcdMain goleak detection covers the remaining goroutines, so we can probably close this issue.
Indeed. We now use goleak in each test's TestMain via EtcdMain. I cleaned up some more leaks a while back and recorded the remaining known exceptions with comments on goleak.IgnoreTopFunction:

https://github.com/kubernetes/kubernetes/blob/7f4f1c29a97de6712d341bf3d724fb122c828772/test/integration/framework/etcd.go#L254-L297
I think those exceptions are acceptable, so we could close this issue.
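For reference, a sketch of that wiring in an integration test package's TestMain, assuming framework.EtcdMain behaves as described in the comment above; the ignore list itself lives in the linked etcd.go.

```go
// Sketch: EtcdMain starts the shared etcd, runs the tests, and (per the
// comment above) runs the goleak check afterwards, skipping the known
// exceptions recorded with goleak.IgnoreTopFunction in
// test/integration/framework/etcd.go.
package apiserver

import (
	"testing"

	"k8s.io/kubernetes/test/integration/framework"
)

func TestMain(m *testing.M) {
	framework.EtcdMain(m.Run)
}
```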
Closed #49489 as completed.