RFC: Where should Linux-specific kernel integration tests live in kubernetes/kubernetes?


Kartavya Sonar

May 13, 2026, 8:01:54 AM (10 days ago) May 13
to kubernetes-...@googlegroups.com
Context:
PR #138993 adds a test that verifies kube-proxy nftables rule installation inside
an unprivileged user+network namespace. This requires Linux and unprivileged user
namespace support. The question is where this category of test should live and how
it should run in CI.

Options considered:

1. Existing unit tests
   Pros: runs locally and fast
   Cons: only works on Linux with user namespaces enabled, not portable

2. Integration tests
   Pros: already assumes Linux
   Cons: already slow, currently means running against an apiserver

3. Standalone target (separate from unit/integration)
   Pros: clean separation
   Cons: maintenance overhead for a new test category

4. e2e tests
   Pros: already handles Linux-specific infra
   Cons: defeats the purpose, goal is lightweight local testing without a full cluster

Question for sig-testing:
What is the recommended home for tests that require Linux kernel features
(unprivileged user namespaces, nftables) but do not need an apiserver, and how
should they be wired into CI?

PR: https://github.com/kubernetes/kubernetes/pull/138993
Related issue: https://github.com/kubernetes/kubernetes/issues/130926

Patrick Ohly

May 13, 2026, 9:51:33 AM (10 days ago) May 13
to Kartavya Sonar, kubernetes-...@googlegroups.com
Kartavya Sonar <sonark...@gmail.com> writes:

> Context:
> PR #138993 adds a test that verifies kube-proxy nftables rule installation
> inside
> an unprivileged user+network namespace. This requires Linux and
> unprivileged user
> namespace support. The question is where this category of test should live
> and how
> it should run in CI.
>
> Options considered:
>
> 1. Existing unit tests
> Pros: runs locally and fast
> Cons: only works on Linux with user namespaces enabled, not portable
>
> 2. Integration tests
> Pros: already assumes Linux
> Cons: already slow, currently means running against an apiserver

Cons: might run on a Linux host where unprivileged user
namespace support is not enabled - basically the same problem as unit tests.

> 3. Standalone target (separate from unit/integration)
> Pros: clean separation
> Cons: maintenance overhead for a new test category

I've done such a separate target for DRA upgrade/downgrade testing because
it needs special invocation (see the exception list in
hack/make-rules/test.sh). It's then your responsibility to set up a
suitable job and monitor it.

> 4. e2e tests
> Pros: already handles Linux-specific infra
> Cons: defeats the purpose, goal is lightweight local testing without a
> full cluster

Cons: you cannot make assumptions about unprivileged user namespace
support in the Prow cluster where the e2e test runs.

Another alternative: implement it inside e2e_node and use the existing
mechanisms to spin up a VM with the right kernel config. You might not
even need new jobs for it, just write it as an `It(...,
features.UserNamespacesSupport)` (which already exists!) and your test
will be executed together with other E2E node tests in jobs that are
configured to allow tests with that feature dependency - check with SIG Node.

--
Best Regards

Patrick Ohly
Cloud Software Architect

antonio.o...@gmail.com

May 18, 2026, 7:28:07 AM (5 days ago) May 18
to kubernetes-sig-testing
I like the standalone execution approach Patrick used for DRA, but moving the tests to a separate standalone directory won't work for us. These tests often need to validate private, unexported methods; see https://github.com/kubernetes/kubernetes/pull/138993/ for example.

I propose we use a custom build tag so:

- We gate the file with //go:build linux && usernstest. Standard make test runs and CI unit test jobs will completely ignore it. Zero noise, zero skipped test logs. 
- We create a specific Prow job for SIG Network that provisions an environment where unprivileged user namespaces are allowed, and runs with the tag: go test -v -tags=usernstest ./pkg/proxy/...

If people agree on the approach, then we just need to update the file header to //go:build linux && usernstest and set up the new Prow job
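
To illustrate the gating behaviour of such a build constraint, here is a minimal sketch that evaluates the proposed //go:build line against different tag sets using the standard go/build/constraint package. The helper name `enabled` is made up for this example; it only models how the constraint is evaluated and is not part of the PR.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// enabled reports whether a file carrying the given //go:build line would be
// compiled when exactly the listed tags are set.
func enabled(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	const line = "//go:build linux && usernstest"
	cases := [][]string{
		{"linux"},               // plain unit test run on Linux
		{"linux", "usernstest"}, // dedicated job passing -tags=usernstest
		{"darwin", "usernstest"},
	}
	for _, tags := range cases {
		ok, err := enabled(line, tags...)
		if err != nil {
			panic(err)
		}
		fmt.Printf("tags=%v compiled=%v\n", tags, ok)
	}
}
```

Only the combination of linux and usernstest satisfies the constraint, which is why an untagged run never compiles the gated files.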

Patrick Ohly

May 18, 2026, 7:56:38 AM (5 days ago) May 18
to antonio.o...@gmail.com, kubernetes-sig-testing
"antonio.o...@gmail.com" <antonio.o...@gmail.com> writes:
> I propose we use a custom build tag so:
>
> - We gate the file with //go:build linux && usernstest. Standard make test
> runs and CI unit test jobs will completely ignore it. Zero noise, zero
> skipped test logs.
> - We create a specific Prow job for SIG Network that provisions an
> environment where unprivileged user namespaces are allowed,

Provision how? Let's clarify that first because it might not be simple -
I honestly don't know.

Antonio Ojea

May 18, 2026, 9:31:45 AM (5 days ago) May 18
to Patrick Ohly, kubernetes-sig-testing
Provision means setting up the job to run in a cluster where user namespaces work.
I didn't check, but I don't think we need to do anything special, except maybe set securityContext.privileged: true, which we already do for the kind and dind jobs.
Basically, we clone one of the existing kind jobs and run go test -v -tags=usernstest.

Patrick Ohly

May 18, 2026, 11:00:54 AM (4 days ago) May 18
to Antonio Ojea, kubernetes-sig-testing
Antonio Ojea <antonio.o...@gmail.com> writes:
> Provision means to set up the job to run in a cluster where user namespaces
> work.
> I didn't check but I don't think we need to do anything special, just maybe
> use securityContext.privileged: true

So the assumption is that the Linux kernel has that feature enabled and
it just has to be made available. Fair enough, in this case. But I'd
like to hear from others (Ben?) what they think about such bespoke build
flags and one-off test jobs.

Benjamin Elder

May 18, 2026, 12:38:12 PM (4 days ago) May 18
to Patrick Ohly, Antonio Ojea, kubernetes-sig-testing
I don't think "linux specific" is accurate here. These are *unprivileged usernamespace* specific, which is not all linux hosts either (e.g. security concerns).

In fact, I don't think our CI clusters currently support it, and I suspect these tests are silently skipped in CI as-is.


My primary concern is that we don't introduce a new class of tests that are assumed to cover some functionality but are widely not actually running.
We've had bad experiences with this previously.

So I suggest we:
1. Remove any self-skip functionality.
2. Put them into dedicated packages, so in CI we can trivially detect if they ran versus other tests co-located in the package
3. Put them in a dedicated directory tree so you can `make test-withusernamespace` (or whatever we call it), also then CI can avoid double-calling the normal unit tests.
4. We can follow-up with attempts to make `make test-withusernamespaces` (we really need a better name) do the right thing (e.g. use build/run.sh on mac)
5. When we can't create an unprivileged user namespace, we should fail loudly, and provide a pointer for how to make the tests work.
6. We make sure CI does in fact run them.

Earlier there was a comparison to sockets ... sockets work *everywhere* and we fail the tests if we can't provision them.
I think we need to explicitly categorize these.

- Ben


--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-testing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-te...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/kubernetes-sig-testing/yrjhv7ckwyn8.fsf%40pohly-mobl1.fritz.box.

antonio.o...@gmail.com

May 18, 2026, 2:34:15 PM (4 days ago) May 18
to kubernetes-sig-testing
I just tested that user namespaces work in the prow-build cluster. I think they have worked on GKE COS nodes since 1.33.

I think the build tags I proposed earlier address those concerns: if the tag is not set, the tests will not even compile.

Benjamin Elder

May 18, 2026, 2:41:33 PM (4 days ago) May 18
to antonio.o...@gmail.com, kubernetes-sig-testing
> I think that tags as I said previously address those concerns, if the tag is not set it will not even compile

tags *and a dedicated set of packages* / target, so we know where to run those and can specifically test them, see previous comments.

Patrick Ohly

May 19, 2026, 6:04:32 AM (4 days ago) May 19
to Benjamin Elder, antonio.o...@gmail.com, kubernetes-sig-testing
"'Benjamin Elder' via kubernetes-sig-testing"
<kubernetes-...@googlegroups.com> writes:

>> I think that tags as I said previously address those concerns, if the tag
> is not set it will not even compile
>
> tags *and a dedicated set of packages* / target, so we know where to run
> those and can specifically test them, see previous comments.

Dedicated packages then would enable running inside e2e_node because
they can be imported to register the tests there. But as Antonio said,
that doesn't work for unit tests which need to access unexported
functions or fields.

Are you 100% sure that these tests cannot be rewritten as black-box
tests of the package?

In the PRs you suggested maintaining the list of packages which must be
tested like this in the test-infra job, combined with the build tag to
exclude them from "make test", aka pull-kubernetes-unit. That works, but
it's fragile: suppose someone adds such a unit test in a new
package. That package then has to be merged without running the tests.
Only then can the job be updated to include it, potentially breaking
presubmit and periodic if the tests were broken.

This assumes that contributor and reviewer are careful and know that
test-infra needs to be updated. If they aren't, the new test will
silently not be executed. This is the risk that Ben called out. We've
had that before with unit tests that got added under hack/tools/* at a
time when test.sh didn't run unit tests outside of the workspace.

>> I just tested that user namespaces work in the prow-build cluster, I think
>> it works in gke COS nodes since 1.33.

Even if it works now, you are basically making assumptions about how
the Kubernetes Prow is configured. That feels a bit dirty.

Antonio Ojea

May 19, 2026, 6:54:49 AM (4 days ago) May 19
to Patrick Ohly, Benjamin Elder, kubernetes-sig-testing
On Tue, 19 May 2026 at 12:04, Patrick Ohly <patric...@intel.com> wrote:
> "'Benjamin Elder' via kubernetes-sig-testing"
> <kubernetes-...@googlegroups.com> writes:
>
> >> I think that tags as I said previously address those concerns, if the tag
> >> is not set it will not even compile
> >
> > tags *and a dedicated set of packages* / target, so we know where to run
> > those and can specifically test them, see previous comments.
>
> Dedicated packages then would enable running inside e2e_node because
> they can be imported to register the tests there. But as Antonio said,
> that doesn't work for unit tests which need to access unexported
> functions or fields.
>
> Are you 100% sure that these tests cannot be rewritten as black-box
> tests of the package?


The whole benefit of user namespaces is that they let us run integration tests against the kernel as unit tests. We already have black-box tests, so just replicating them with user namespaces adds no value.
 
> In the PRs you suggested maintaining the list of packages which must be
> tested like this in the test-infra job, combined with the build tag to
> exclude them from "make test", aka pull-kubernetes-unit. That works, but
> it's fragile: suppose someone adds such a unit test in a new
> package. That package then has to be merged without running the tests.
> Only then can the job be updated to include it, potentially breaking
> presubmit and periodic if the tests were broken.


Those tests can only run using the helper that is behind the tag, so AFAIK they will not compile without it.
And if something breaks, it cannot affect anything other than the SIG Network unit tests that use user namespaces.
Maintaining those jobs is a SIG Network responsibility, not a project one. We already have plenty of examples of SIGs and WGs with custom jobs (Windows, DRA, node, ...), so I do not fully understand the concern here. We are not trying to define a project-wide testing strategy that shifts the entire project; we just want to discuss the best way to integrate this so that it is scalable and has zero impact. The Windows tests that use tags may be the best comparison ...
 

> This assumes that contributor and reviewer are careful and know that
> test-infra needs to be updated. If they aren't, the new test will
> silently not be executed. This is the risk that Ben called out. We've
> had that before with unit tests that got added under hack/tools/* at a
> time when test.sh didn't run unit tests outside of the workspace.
>
> >> I just tested that user namespaces work in the prow-build cluster, I think
> >> it works in gke COS nodes since 1.33.

That is already fixed: if user namespaces are not supported, the test fails instead of being skipped.
 

> Even if it works now, you are basically making assumptions about how
> the Kubernetes Prow is configured. That feels a bit dirty.

It is not dirty, it is how things work today. The Kubernetes Prow clusters that run unit and integration tests depend on GKE and other cloud providers like AWS, and we already make a lot of assumptions about the kernel in CI. Some examples:

- Back in the day we didn't have IPv6 support because the modules were disabled by default, so we explicitly enable them: https://github.com/kubernetes/test-infra/blob/2b936b7df9057d5d01a4013291d674e1335367a8/images/bootstrap/runner.sh#L59-L65

- Because we use GKE and COS images, and those lacked specific kernel modules, I had to ask internal teams to enable those modules. They were released as part of GKE so that we can run the nftables proxy in our CI: https://github.com/kubernetes/kubernetes/issues/128829#issuecomment-2609369264
 
User namespaces have been in the kernel for a long time, and the feature is also GA in Kubernetes. The main problem with them, AFAIK, is that they have had several critical CVEs, so distros with strong security concerns disable them by default. That is no longer the case in GKE since 1.33. I couldn't test other distros, but I assume it is the same. GitHub Actions also allows running them after disabling the AppArmor rule added in 24.04: https://github.com/lima-vm/lima/issues/2319

Benjamin Elder

May 19, 2026, 10:55:36 AM (3 days ago) May 19
to Antonio Ojea, Patrick Ohly, kubernetes-sig-testing
> Those tests can only run using the helper that is under a tag, so AFAIK those tests will not compile without the tag.

As long as the test files are also tagged, yes.


> In the PRs you suggested maintaining the list of packages which must be
> tested like this in the test-infra job, combined with the build tag to
> exclude them from "make test", aka pull-kubernetes-unit. That works, but
> it's fragile: suppose someone adds such a unit test in a new
> package. That package then has to be merged without running the tests.
> Only then can the job be updated to include it, potentially breaking
> presubmit and periodic if the tests were broken.

This is why I believe the list needs to be maintained in the Kubernetes repo.

I suggested explicit packages with these tests and a make target.

Even if we can't separate the packages, we should add a target in k/k that anyone can execute without knowledge of the job, and which can be updated without touching test-infra.

Ideally it should only run the packages with these tests, that is a gap vs eg DRA e2e tests (and why I suggested a dedicated package tree, like test/usernamespaces/... or similar)
We could accomplish something similar by maintaining a package list in the make target.

> Of course if something breaks it can not impact anything that are not those sig-network unit tests using user namespaces.
> It is a SIG network responsibility to maintain those jobs, not a project one, we already have plenty of examples of SIGs and WG with custom jobs (windows, DRA, node, ... ), so I do not fully understand what is the concern here, we are not trying to do a project wide testing strategy to shift the entire project, just to discuss the best way to integrate that is scalable and has zero impact, windows tests that use tags may be the best comparison ...

Eh, even with SIG Windows and DRA we are involved in how those are being run and they have been discussed with SIG Testing many times.
Presubmits are NOT something that is just ignored by anyone except the maintainers of the job.
Other contributors need to be able to interact with these reasonably if they're sending a fix to SIG Net.

> User namespaces are in the kernel for a long time, the feature is also GA in kubernetes, the main problem with them AFAIK is that they had several criticals CVEs and distros with strong security concerns disable them by default

It's actually very much still a security concern, but that shouldn't stop us from using it.

Benjamin Elder

May 19, 2026, 11:31:32 AM (3 days ago) May 19
to Antonio Ojea, Patrick Ohly, kubernetes-sig-testing
Things to consider:

1. We previously had many unit tests relying on "the CI runs unit tests as root on linux".
We fixed or disabled those and made the CI run as non-root.
Some of them are likely candidates for leveraging this functionality.
For example storage mount utils may be a good next candidate.

We will need a pattern, it does not need to be a KEP, nor does it need to be excessively complex, but we should not rush heedlessly.

2. When we put tests into the upstream project, we expect both CI and contributors to run them.
Otherwise there is no point in upstreaming them.
We must clarify how that will work in both cases.

3. We are introducing a new requirement, even though we hope it is widely available.

Additionally, the Windows tests/jobs are a good example, they're a very active topic at the moment and have been ... problematic.
We're working through that with SIG Windows (Patrick and I are both active there in the past week ...).

Let's take a moment to make something maintainable and generally useful.
I think we can start with a manual list behind a make target and go from there, but let's not start with something difficult to iterate on.

In further iterations we can:
- Automatically find `//go:build linux && usernstest` and run them with the correct options.
- Test for userns support and give the user instructions when it is not available.
- Automatically run under build/run.sh when on macOS hosts
- Enable non-networking tests to leverage userns

- Ben

Antonio Ojea

May 19, 2026, 5:24:30 PM (3 days ago) May 19
to Benjamin Elder, Patrick Ohly, kubernetes-sig-testing

OK, I was missing the point that we want to build a pattern out of this; I had tunnel vision on solving the SIG Network problem.
The original goal was to test kernel-level dataplane configurations (netlink, iptables, nftables) as internal Go tests, locally and in CI, without requiring root.
The underlying mechanism wraps the test binary in unshare -r -n (credit to Rodrigo Campos for the pattern in kube-network-policies#156), mapping the unprivileged user to root inside an isolated network namespace.
This way of testing has a lot of benefits, especially for networking features that depend on the kernel netlink interface: that interface requires root privileges, and e2e tests are slow and affected by many environmental variables that cause flakiness.

Based on the feedback from this thread, this is the concrete technical proposal:

1. All user namespace tests (standard _test.go files) and their specific helper will be gated behind a Go build tag: //go:build userns.
We will put the helper in a dedicated package (e.g., test/userns/userns.go) so it cannot be misused.
A standard make test or make test-integration will ignore these files. If a standard untagged test accidentally imports the userns helper, it will fail at compile time.

2. To address the concern about test-infra fragility and maintaining package lists, we will introduce a "make test-userns" target, similar to the existing test, test-integration, and test-e2e-node.
This target will list the corresponding _test.go files with the -tags=userns flag. If a developer adds a new userns unit test to a new package, it is automatically picked up by this target.

3. There will be no silent skips.
The userns test helper will explicitly check for system compatibility (e.g., verifying sysctl kernel.unprivileged_userns_clone and namespace capabilities) and fail if it is not supported.

4. We will create a dedicated Prow job (pull-kubernetes-test-userns) that executes make test-userns.

Let me know if this aligns with your thoughts and addresses your concern for an initial iteration.



Patrick Ohly

May 20, 2026, 5:32:51 AM (3 days ago) May 20
to Antonio Ojea, Benjamin Elder, kubernetes-sig-testing
Antonio Ojea <antonio.o...@gmail.com> writes:
> Let me know if this aligns with your thoughts and addresses your concern
> for an initial iteration.

Thanks, this sounds good to me.

It's worth calling out that the build tag has one implication:
golangci-lint in pull-kubernetes-verify will not check those files. This
might be surprising to developers. Other build tags (for example,
windows) have the same problem, so this is not new. I can't think of a
good solution. Running golangci-lint multiple times would be slow
and confusing when the different runs report the same issues in shared
code.

--
Best Regards

Patrick Ohly
Cloud Software Architect