Proposals: Optionally HTTP/TCP probe from pod's netns


halfcrazy

May 5, 2021, 3:45:10 AM
to kubernetes-sig-network

Proposals
Add the ability to perform HTTP/TCP probes from the pod's network namespace, e.g. by adding a boolean field `podnetns` to the probe spec.

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#tcpsocketaction-v1-core
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#httpgetaction-v1-core

Background
Currently, HTTP/TCP liveness and readiness probes are performed by the kubelet from the node's default network namespace. This assumes the node can reach the hosted pod's endpoint.

Why we need it
Some pods cannot be reached from outside their own network namespace: for example, a server listening on `127.0.0.1:8080` can only be reached from within the pod's network namespace, so today the only option is an exec probe. Another scenario is a VPC: we expose pods outside the cluster with an EIP and don't care about reachability from the host, but we still want to use the built-in liveness probe mechanism.
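
A minimal sketch in Go of what the proposed knob could look like, trimmed to the relevant fields; `PodNetns` is just the field name suggested here, not an existing Kubernetes API field:

package sketch

// TCPSocketAction is trimmed to the fields relevant for this example.
type TCPSocketAction struct {
	Port int32  `json:"port"`
	Host string `json:"host,omitempty"`
}

// Probe is a cut-down stand-in for core/v1 Probe, showing only the
// proposed addition.
type Probe struct {
	TCPSocket *TCPSocketAction `json:"tcpSocket,omitempty"`

	// PodNetns (proposed, not in the current API): when true, the
	// HTTP/TCP check is performed from inside the pod's network
	// namespace instead of the node's default namespace.
	PodNetns bool `json:"podnetns,omitempty"`
}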

antonio.o...@gmail.com

May 5, 2021, 4:15:51 AM
to kubernetes-sig-network
Also discussed here:

* Pod readiness probe cannot be directed at specific IP family
  https://github.com/kubernetes/kubernetes/issues/101324#issuecomment-831563048
* Pod probes lead to blind SSRF from the node
  https://github.com/kubernetes/kubernetes/issues/99425#issuecomment-829712715

Should this be the default and not opt-in?


Side note: an important golang limitation to keep in mind for the implementation, if we want to move forward:
* net: Dial is not safe to use in namespaces with dual-stack enabled
  https://github.com/golang/go/issues/44922

halfcrazy

May 5, 2021, 6:53:27 AM
to kubernetes-sig-network
Addendum:
After reading the related issues, making this the default would make the network policy behavior more straightforward.
IMHO, as long as it doesn't break backward compatibility, it's OK to change the behavior.

Antonio Ojea

May 12, 2021, 5:40:34 AM
to halfcrazy, kubernetes-sig-network
hmm, I thought this was simpler; it seems that the exec probe uses the CRI-API to execute the command in the container:

https://github.com/kubernetes/cri-api/blob/708d0d76e582f969d2a1056b07eaf19109b76b9e/pkg/apis/runtime/v1/api.proto#L91-L92

and the suggestion is to make it runtime agnostic:

https://github.com/kubernetes/kubernetes/issues/99425#issuecomment-839276282

I can't see any other way to run this in the namespace than kubelet "entering" the pod namespace directly to execute the probe :/
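
For reference, a rough sketch of how an exec probe call to the runtime could look with the generated Go client types for the api.proto linked above (k8s.io/cri-api/pkg/apis/runtime/v1); this is an illustration, not kubelet's actual prober code, and error handling is trimmed:

package probes

import (
	"context"
	"fmt"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// runExecProbe asks the runtime to run the probe command inside the
// container via ExecSync; the runtime, not kubelet, is the one inside
// the pod's namespaces here.
func runExecProbe(ctx context.Context, rt runtimeapi.RuntimeServiceClient, containerID string, cmd []string) (bool, error) {
	resp, err := rt.ExecSync(ctx, &runtimeapi.ExecSyncRequest{
		ContainerId: containerID,
		Cmd:         cmd,
		Timeout:     5, // seconds
	})
	if err != nil {
		return false, err
	}
	fmt.Printf("probe stdout: %s\n", resp.Stdout)
	return resp.ExitCode == 0, nil
}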


Dan Williams

May 13, 2021, 9:43:25 AM
to Antonio Ojea, halfcrazy, kubernetes-sig-network
On Wed, 2021-05-12 at 11:39 +0200, Antonio Ojea wrote:
> hmm, I thought this was simpler, it seems that the exec probe uses
> the CRI-API to execute the command in the container
>
> https://github.com/kubernetes/cri-api/blob/708d0d76e582f969d2a1056b07eaf19109b76b9e/pkg/apis/runtime/v1/api.proto#L91-L92
>
> and the suggestion is to make it runtime agnostic
>
> https://github.com/kubernetes/kubernetes/issues/99425#issuecomment-839276282
>
> I can't see any other way to run this in the namespace than kubelet
> "entering" directly the pod namespace to execute the probe :/

Which means the runtime has to do it via CRI requests...

Dan

Antonio Ojea

May 13, 2021, 9:53:04 AM
to Dan Williams, halfcrazy, kubernetes-sig-network
On Thu, 13 May 2021 at 15:43, Dan Williams <dc...@redhat.com> wrote:
On Wed, 2021-05-12 at 11:39 +0200, Antonio Ojea wrote:
> hmm, I thought this was simpler, it seems that the exec probe uses
> the CRI-API to execute the command in the container
>
> https://github.com/kubernetes/cri-api/blob/708d0d76e582f969d2a1056b07eaf19109b76b9e/pkg/apis/runtime/v1/api.proto#L91-L92
>
> and the suggestion is to make it runtime agnostic
>
> https://github.com/kubernetes/kubernetes/issues/99425#issuecomment-839276282
>
> I can't see any other way to run this in the namespace than kubelet
> "entering" directly the pod namespace to execute the probe :/

Which means the runtime has to do it via CRI requests...



Is that really an option? 

Dan Williams

May 13, 2021, 12:55:04 PM
to Antonio Ojea, halfcrazy, kubernetes-sig-network
On Thu, 2021-05-13 at 15:52 +0200, Antonio Ojea wrote:
>
>
> On Thu, 13 May 2021 at 15:43, Dan Williams <dc...@redhat.com> wrote:
> > On Wed, 2021-05-12 at 11:39 +0200, Antonio Ojea wrote:
> > > hmm, I thought this was simpler, it seems that the exec probe uses
> > > the CRI-API to execute the command in the container
> > >
> > > https://github.com/kubernetes/cri-api/blob/708d0d76e582f969d2a1056b07eaf19109b76b9e/pkg/apis/runtime/v1/api.proto#L91-L92
> > >
> > > and the suggestion is to make it runtime agnostic
> > >
> > > https://github.com/kubernetes/kubernetes/issues/99425#issuecomment-839276282
> > >
> > > I can't see any other way to run this in the namespace than kubelet
> > > "entering" directly the pod namespace to execute the probe :/
> >
> > Which means the runtime has to do it via CRI requests...
>
> Is that really an option?

Sure it's an option, it's just code :)

The 'exec' probers are already done by the CRI via ExecSync(). It's
just that kubelet handles the network ones because the Kube network
model defines that all pods are reachable from kubelet for exactly this
reason.

So you can imagine a new CRI call like Probe() that takes a probeType
(exec, TCP, HTTP) and some other data depending on the type, and the
runtime does it within the pod's namespace (net, filesystem, etc).
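
To make that concrete, a hypothetical Go sketch of what such a runtime-side call could look like; none of these names exist in the CRI today:

package crisketch

import "context"

// ProbeType mirrors the three probe handlers kubelet knows about.
type ProbeType string

const (
	ProbeExec ProbeType = "Exec"
	ProbeTCP  ProbeType = "TCP"
	ProbeHTTP ProbeType = "HTTP"
)

// ProbeRequest carries per-type data; all names here are hypothetical.
type ProbeRequest struct {
	PodSandboxID string
	ContainerID  string
	Type         ProbeType

	// TCP/HTTP: target as seen from inside the pod's network namespace.
	Host string
	Port int32
	Path string // HTTP only

	// Exec: command to run, as ExecSync does today.
	Command []string
}

// ProbeResponse reports the outcome back to kubelet.
type ProbeResponse struct {
	Healthy bool
	Message string
}

// Prober is the runtime-side interface: the runtime performs the check
// from within the pod's namespaces (net, filesystem, ...) and returns
// the result, so kubelet never has to enter the namespaces itself.
type Prober interface {
	Probe(ctx context.Context, req *ProbeRequest) (*ProbeResponse, error)
}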

Dan

Tim Hockin

May 13, 2021, 8:45:07 PM
to Dan Williams, Antonio Ojea, halfcrazy, kubernetes-sig-network
Namespaces and threads don't always play nicely together and Go and threads DEFINITELY do not.  To do this, kubelet would have to exec() a local binary which changed into the pod's netns.  Or probably more pedantically, kubelet makes a CRI call to the runtime to run a probe; the runtime execs a binary which changes into the netns, does the HTTP, and then returns the result (e.g. stdout); CRI returns the result to kubelet.

While this fixes some issues, it introduces new ones - old probes that used the node's localhost will be broken, for example.  I don't see how we could do it by default.


John Belamaric

May 14, 2021, 12:40:15 AM
to Tim Hockin, Dan Williams, Antonio Ojea, halfcrazy, kubernetes-sig-network
Hmm. If you open a socket in a namespace and hold it open, IIRC you shouldn't have issues. So you switch the process into the namespace, open the socket, and stash it away. Of course you have to mutex around opening the socket, since the namespace is at the process level. But I *think* that can work.

Dan Winship

May 17, 2021, 11:11:16 AM
to Tim Hockin, Dan Williams, Antonio Ojea, halfcrazy, kubernetes-sig-network
On 5/13/21 8:44 PM, 'Tim Hockin' via kubernetes-sig-network wrote:
> Namespaces and threads don't always play nicely together and Go and
> threads DEFINITELY do not.

There used to be a problem that runtime.LockOSThread() would ensure that
_your_ goroutine didn't get moved to another thread, but it wouldn't
prevent _other_ goroutines from being moved to your thread if you were
blocked and they weren't. So if you changed the netns in one goroutine,
it could cause random code running in other goroutines to start using
the wrong netns. It was great fun. But that was fixed in golang 1.11 or
1.12 or so. "github.com/containernetworking/plugins/pkg/ns" has helper
functions for correctly doing things in other network namespaces and
they work.
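
For example, a minimal sketch of a TCP probe run from a pod's netns with those helpers; the netns path is a placeholder, and the target is a literal IP:port to sidestep the dual-stack net.Dial caveat mentioned earlier:

package main

import (
	"fmt"
	"net"
	"time"

	"github.com/containernetworking/plugins/pkg/ns"
)

// tcpProbeInNetns dials the target from inside the given network namespace.
func tcpProbeInNetns(netnsPath, addr string) error {
	return ns.WithNetNSPath(netnsPath, func(_ ns.NetNS) error {
		// The callback runs on a locked OS thread that has been switched
		// into the target namespace, so this dial uses the pod's netns.
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			return err
		}
		return conn.Close()
	})
}

func main() {
	// Placeholder netns path; on a real node this would come from the runtime.
	if err := tcpProbeInNetns("/var/run/netns/pod-1234", "127.0.0.1:8080"); err != nil {
		fmt.Println("probe failed:", err)
	} else {
		fmt.Println("probe succeeded")
	}
}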

-- Dan

Tim Hockin

May 17, 2021, 11:20:35 AM
to Dan Winship, Dan Williams, Antonio Ojea, halfcrazy, kubernetes-sig-network
I stand corrected :)

Casey Callendrello

May 17, 2021, 11:48:38 AM
to Dan Winship, Tim Hockin, Dan Williams, Antonio Ojea, halfcrazy, kubernetes-sig-network
On Mon, May 17, 2021 at 11:11 AM Dan Winship <dwin...@redhat.com> wrote:
On 5/13/21 8:44 PM, 'Tim Hockin' via kubernetes-sig-network wrote:
> Namespaces and threads don't always play nicely together and Go and
> threads DEFINITELY do not.

There used to be a problem that runtime.LockOSThread() would ensure that
_your_ goroutine didn't get moved to another thread, but it wouldn't
prevent _other_ goroutines from being moved to your thread if you were
blocked and they weren't.


While that little problem has gone away, there are others. For example, net.Dial() may spawn new goroutines as part of its implementation of Happy-Eyeballs[1]. This means it's still not really safe to dance around network namespaces within a single Go process.

But all of this is mostly moot, due to some CRI implementations using other isolation systems (e.g. virtualization). So this would have to be a part of the CRI API.

Personally, I feel the complexity is not worth it. If you want a probe that connects from inside the container, just add "curl" to your image and use an ExecProbe.

--Casey

Antonio Ojea

Jun 3, 2021, 12:41:33 PM
to Casey Callendrello, Dan Winship, Tim Hockin, Dan Williams, halfcrazy, kubernetes-sig-network
Is it worth exploring the CRI API option?

Maybe it pays off in the long term, and since this doesn't seem to be a pressing problem right now, we don't have to rush it.

antonio.o...@gmail.com

Jun 5, 2021, 5:37:35 AM
to kubernetes-sig-network
I've opened a feature request https://github.com/kubernetes/kubernetes/issues/102613 to continue the discussion.
I've also added this topic to the agenda for the next sig-network meeting; it's likely that some people from containerd and CRI-O will attend, so the CRI implementations can give their point of view.

khn...@gmail.com

Mar 1, 2022, 2:23:15 PM
to kubernetes-sig-network
Maybe a simpler option is to run a container inside the pod's namespace that performs the probes locally (same namespace) and pushes the results to node:kubelet-port? The container itself would be managed via the same lifecycle as other containers, and maybe we don't report it under pod.containers.