SCTP: KEPs vs CVEs

Dan Winship

Jul 9, 2019, 12:01:42 PM
to kubernetes-...@googlegroups.com
There was discussion in the SIG meeting a few weeks ago about moving
the SCTP KEP forward, which worried some people because of a history
of CVEs involving SCTP. Tim wanted a write-up on the issues.

TL;DR: The thing admins need to worry about most is pods fiddling
with SCTP in their own network namespace, which can be done whether
or not Kubernetes supports SCTP (and we should do a better job of
documenting that). If you have your cluster correctly configured to
avoid that problem, then nothing Kubernetes itself does will
introduce additional attack vectors. And if you *don't* have your
cluster configured correctly to avoid that problem, then the design
of the SCTP KEP still does its best to avoid adding any *new* attack
vectors.


I. Background / What We Need To Worry About

There are occasional SCTP-related CVEs; e.g., most recently
CVE-2019-8956, which is a local root exploit. In general, the kernel
SCTP code hasn't had "many eyes" on it, and people are pretty dubious
of it.

Specifically, the dubiousness concerns the 40,000 lines of code
in net/sctp/, which is normally compiled into its own kernel module,
sctp.ko, which implements IPPROTO_SCTP sockets for userspace. There is
also a small amount of SCTP-related code in the kernel that is *not*
part of sctp.ko, and which people are generally *not* especially
worried about. In particular, the routing code can move SCTP packets
between interfaces as needed, and iptables/nft, OVS, and eBPF all
support matching/filtering SCTP packets, without needing to load
sctp.ko (because for their purposes, SCTP isn't *that* much different
from TCP and UDP).
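For instance, matching SCTP in an nftables ruleset never pulls in
sctp.ko; an illustrative fragment (table/chain names and port number
arbitrary) might look like:

```
table inet filter {
        chain input {
                type filter hook input priority 0; policy accept;
                # protocol-header match on SCTP; no sctp.ko needed
                sctp dport 9999 drop
        }
}
```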

One of the things that makes sctp.ko in particular a big problem for
Kubernetes (as compared to other random optional kernel modules of
dubious quality) is that, by default, unprivileged processes can cause
socket-related kernel modules to be autoloaded just by creating a
socket of the appropriate type; so if sctp.ko is on disk, and it's not
blacklisted, and it's not blocked by SELinux, then any pod can open an
IPPROTO_SCTP socket on 127.0.0.1, fiddle with it appropriately to
trigger an exploit, and then do bad things.

But that is completely independent of whether Kubernetes itself
supports SCTP; if you are an admin who is worried about
sctp.ko-related security exploits, and you don't have any need for
SCTP support in your cluster, then you should already be doing one of:

1. Uninstalling sctp.ko from the machine.

2. Blacklisting or blocking the module via /etc/modprobe.d/.

3. Using SELinux (or another Linux Security Module) to deny
"module_request" permission to containers, so pods can't cause
kernel modules to be autoloaded in general.

(And this applies doubly so to dccp.ko [which is also autoloadable via
socket creation and is considered even more dubious than sctp.ko]. And
depending on your level of paranoia, perhaps also to other kernel
modules that can be autoloaded via sockets, like bluetooth,
appletalk(!), etc.)
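For option 2, a fragment along these lines (filename and exact policy
are illustrative) covers both modules:

```
# /etc/modprobe.d/blacklist-sctp.conf (illustrative)
# "blacklist" only blocks alias-based autoloading; overriding "install"
# makes an explicit "modprobe sctp" fail as well.
blacklist sctp
install sctp /bin/false
blacklist dccp
install dccp /bin/false
```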

If nodes are configured so that *nobody* can autoload sctp.ko, then
there is basically nothing else for us to worry about, whether or not
SCTP is enabled in Kubernetes.

If nodes are configured with sctp.ko already loaded, or such that pods
can autoload it themselves (either because the administrator intends
for SCTP to be used, or because they didn't know to block it), then
the cluster is vulnerable to attacks from authenticated users, and
there's nothing we can do about it.

If nodes are configured so that *pods* can't autoload sctp.ko, but
system components still can (most likely meaning that the
administrator did not take any explicit steps to protect against SCTP,
but SELinux has got their back), then we're safe from malicious
attempts to use SCTP from within pods, as long as we make sure that
there's no way for an attacker to trick Kubernetes into loading
sctp.ko on its behalf. Most of our ability to potentially screw things
up falls in this case.


II. A Digression: "Userspace SCTP" - The Complication That Ends Up
Being a Simplification

One important use case in the SCTP KEP is people who are doing
"userspace SCTP". That means that instead of pods opening IPPROTO_SCTP
sockets and using the kernel's SCTP support, they open SOCK_RAW
sockets, and marshal and unmarshal SCTP packets directly. (This
requires CAP_NET_RAW.) There are a few reasons people do this,
including alleged performance problems with the kernel SCTP code, and
missing features in the sockets API. But the important part for us is
that the kernel only lets you do this if sctp.ko *hasn't* been loaded.

To support this use case without having to add another feature flag,
the decision was made that Kubernetes itself would never open SCTP
sockets (and therefore never cause sctp.ko to be loaded).
Specifically, when setting up a pod HostPort or service NodePort,
Kubernetes does not open a placeholder listening socket on that port
the way it does with TCP and UDP. (This is theoretically bad, since it
means the kernel might try to assign that port to an unrelated
listening socket later, which would then not work because of the
iptables rules diverting traffic away from it. In practice... people
generally do not open SCTP sockets on the host network and ask to be
assigned a random port, so...)

While this was decided on for functionality reasons, it ends up
helping us out with security too; if Kubernetes itself never opens
SCTP sockets under any circumstances, whether SCTP is enabled or not,
then it will never cause sctp.ko to be loaded and so most of the rest
of the security issue is solved.


III. Are We Sure There Are No Other SCTP-Related Security Issues?

The SCTPSupport feature gate technically only has one effect: it
controls whether validation accepts or rejects "SCTP" as a Protocol
value in Services, Pod ContainerPorts, and NetworkPolicies. Nothing
else in Kubernetes looks at the feature gate itself; everything just
assumes that if it sees an SCTP port on a resource, then that must
mean that the feature is enabled. (Even though it technically only
means that the feature was enabled *at some point in the past*, not
necessarily that it's enabled currently...)

So to figure out what other security problems SCTPSupport might
introduce, we need to look at what everything in Kubernetes might do
in response to a "Protocol: SCTP" socket somewhere. (For bonus points,
we should also figure out if there are any places where Kubernetes
might work with SCTP sockets which do *not* involve the data
controlled by the feature gate. [Answer: no, AFAICT])

Here's what I could find:

- pkg/util/conntrack/, pkg/util/ipset/, pkg/util/iptables/,
pkg/util/ipvs/ - These utility libraries all pass the protocol
value on to external programs or lower-level APIs, but they all
interact with "good" kernel SCTP code, not socket-related code.

- pkg/controller/endpoint/, pkg/controller/service/,
pkg/kubelet/envvars/, pkg/registry/core/service/storage/
(NodePort allocation), pkg/scheduler/ (HostPort conflict
avoidance) - These all look at protocol values but just treat
them as opaque strings.

- pkg/kubectl/, cmd/kubectl/:
- "kubectl expose" handles SCTP, but it just operates on
Kubernetes resources; it does not call network APIs
- "kubectl port-forward" and "kubectl proxy" don't accept a
protocol value and only work with TCP.

- cmd/* - apiserver, etc only support listening on TCP

- pkg/proxy/iptables/, pkg/proxy/ipvs/ - deal correctly with SCTP
services, except they intentionally avoid opening placeholder
sockets for SCTP NodePort Services as described above.

- pkg/proxy/userspace/, pkg/proxy/winuserspace/ - log errors when
they see SCTP Service ports, and don't attempt to proxy them.

- pkg/proxy/winkernel/ - I'm not 100% sure here; Windows does not
have kernel SCTP support, but there are references to SCTP in
this file. It's possible that Windows has the equivalent of
routing/iptables SCTP support but not SCTP socket support?
Anyway, it doesn't really matter for this discussion since this
is non-Linux anyway...

- pkg/kubelet/dockershim/network/hostport/ - Like the proxy code,
this creates iptables rules for SCTP HostPorts but does not open
listening sockets for them.

- pkg/kubelet/dockershim/network/cni/ - passes HostPort information
to CNI (see below)

- pkg/kubelet/dockershim/ - passes SCTP HostPort information to
docker, which will then open placeholder SCTP sockets. :-(

- pkg/kubelet/kuberuntime/ - passes HostPort info to the runtime.
- CRI-O - vendors pkg/kubelet/dockershim/network/hostport to do
hostports, so new-enough versions will do the right thing
and older versions will return errors about unsupported
protocol value.
- containerd - defers HostPort handling to CNI (see below)
- ... other (non-docker) runtimes?

- pkg/cloudprovider/providers/, staging/src/k8s.io/legacy-cloud-providers/
- aws: only supports TCP
- azure: only supports TCP and UDP
- gce: only supports TCP and UDP
- openstack: only supports TCP
- vsphere: doesn't support LoadBalancers at all?

Kubernetes-external:

- github.com/containernetworking/plugins/plugin/meta/portmap/ -
default CNI HostPort implementation. Handles SCTP, but doesn't do
placeholder listening sockets for *any* protocol type (since it
doesn't have a daemon that can hold the sockets open). So it's
fine.


So of all that stuff, the only "bad" behavior comes from docker, which
will create placeholder listening sockets for SCTP HostPorts, meaning:

1. It's incompatible with "userspace SCTP"

2. In the case where an admin didn't know to block sctp.ko from
being loaded, but was being rescued by the fact that SELinux
prevented containers from being able to load it themselves, then
an attacker would be able to subvert this by getting docker to
load the module for them.

(I say "docker" but the code is actually in libnetwork, so there could
be other people using that code. Also, to be clear, Docker's behavior
here is absolutely correct, it's just that we don't want the correct
behavior.)

It is also possible that some network plugins, external cloud
providers, container runtimes, or other add-ons might do "bad" things
with SCTP.

But mostly we're good.


IV. The Light at the End of the Tunnel

One of our kernel developers notes:

SCTP is not as stable as TCP, but not as buggy as DCCP. Recently
syzkaller in upstream is fuzzing sctp also. We are getting reports
and fixing mostly as they come. SCTP got much better from where it
was 4 years ago.


V. Next Steps

- Document the inherent issues with sctp.ko (and dccp.ko and the like),
and the various solutions.
https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/
might be a good place for this.

- Note in the network plugin conformance documentation (ha ha ha) that
plugins SHOULD support SCTP, but SHOULD NOT open SCTP sockets.

- Maybe abstract out the "hold open a host port unless it's SCTP"
code rather than having identical copies in pkg/proxy/iptables,
pkg/proxy/ipvs, and pkg/kubelet/dockershim/network/hostport.

- Move the SCTP KEP forward

- Add e2e tests

- Maybe add a BeforeSuite() check to confirm that sctp.ko is
not loaded on any node when the tests start, and an
AfterSuite() check to confirm that it's still not loaded
afterward. That could help us to notice if any future code
change breaks the "don't open SCTP sockets" rule. (Because
any test that needed to open an SCTP socket would either
fail [if sctp.ko wasn't available on the test host] or
trigger the AfterSuite() check [if it was].)

- Figure out and document the Windows SCTP status

- Figure out if we want to make SCTP support be
disabled-by-default even after the feature goes GA (to better
protect clusters in the edge cases even when the admin hasn't
taken any other steps to protect them).

- Figure out if we should do anything about the docker behavior
(maybe just documenting it).

Tim Hockin

Jul 9, 2019, 7:56:25 PM
to Dan Winship, kubernetes-...@googlegroups.com
A+ would read again.

SGTM