The example uses the syscall pidfd_getfd [5], which allows duplicating a file descriptor from another process into the calling process. With this, we can connect two processes running in two different containers with a UNIX socket. The socket doesn't need to be present in the container of the calling unprivileged process, as it is able to retrieve the fd with the syscall.
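For illustration, a minimal sketch of that retrieval step (assuming Linux 5.6+, that the target PID and the fd number inside the target process are already known, and that the caller has ptrace-level access to the target; error handling kept minimal):

/* Sketch: duplicate a fd (e.g. one end of a UNIX socket) owned by
 * another process into the calling process via pidfd_getfd(2).
 * Requires Linux >= 5.6 and PTRACE_MODE_ATTACH access to the target. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <target-pid> <target-fd>\n", argv[0]);
        return 1;
    }
    pid_t pid = atoi(argv[1]);
    int target_fd = atoi(argv[2]);

    /* Obtain a pidfd referring to the target process. */
    int pidfd = syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0) { perror("pidfd_open"); return 1; }

    /* Duplicate the target's fd into our own fd table. */
    int fd = syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);
    if (fd < 0) { perror("pidfd_getfd"); return 1; }

    printf("fd %d of pid %d is now local fd %d\n", target_fd, (int)pid, fd);
    return 0;
}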
Hi Alice,

I'm curious about the general problem of cross-namespace UNIX domain sockets, which also comes up with vhost-user devices in Kubernetes.
If QEMU had a listen socket for pr-helper to connect to (inverting the direction of the socket connection), then could the proxy process be eliminated? QEMU listens and the privileged pr-helper container temporarily enters the mount namespace and connects to QEMU's UNIX domain socket. Now QEMU and pr-helper are connected without a proxy.
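Roughly like this (just a sketch; the socket path and the capability requirements are assumptions, not existing pr-helper code):

/* Sketch: a privileged helper joins the mount namespace of the QEMU
 * process so that QEMU's listening UNIX socket becomes visible, then
 * connects to it. Joining another mount namespace typically requires
 * CAP_SYS_ADMIN (and CAP_SYS_PTRACE to open /proc/<pid>/ns/mnt). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define QEMU_SOCK_PATH "/var/run/qemu-pr.sock"   /* hypothetical path */

int connect_in_qemu_mntns(pid_t qemu_pid)
{
    char ns_path[64];
    snprintf(ns_path, sizeof(ns_path), "/proc/%d/ns/mnt", (int)qemu_pid);

    int nsfd = open(ns_path, O_RDONLY);
    if (nsfd < 0) { perror("open mntns"); return -1; }

    /* Enter QEMU's mount namespace so its socket path is resolvable. */
    if (setns(nsfd, CLONE_NEWNS) < 0) { perror("setns"); close(nsfd); return -1; }
    close(nsfd);

    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return -1; }

    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, QEMU_SOCK_PATH, sizeof(addr.sun_path) - 1);

    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect"); close(sock); return -1;
    }
    return sock;   /* QEMU and the helper are now connected, no proxy */
}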
Hi Alice,

Some time back I was thinking about a similar problem for vhost-user connections and containers, i.e. how to connect DPDK applications from inside a container with OVS on the host, or with each other. One idea I had is to create a separate daemon called a 'socket-pair broker' that can create socket pairs and give them to other processes, so all containers only need to have a single main UNIX socket mounted and can talk with a service to negotiate creation of socket pairs for different needs.

I have a reference implementation of the protocol [1] and of the daemon and a client library [2]. It doesn't use any cutting-edge system calls, so it can be used pretty much anywhere. The code was never used in a production environment, so there might be some issues and some permissions handling should be added, but it should be enough for a controlled environment.

The work diagram is here: https://github.com/igsilya/one-socket/blob/main/doc/socketpair-broker.rst

Not sure if it is suitable for your use case; I'm just sharing what I have, in case it might be interesting.
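For reference, the core primitive behind the broker is just socketpair() plus SCM_RIGHTS fd passing over the per-client connections. A rough sketch of the broker side (not the actual one-socket code):

/* Sketch: create a connected socket pair and hand one end to each of
 * two clients that are already connected to the broker. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send 'fd_to_send' over the UNIX socket 'conn' as SCM_RIGHTS data. */
static int send_fd(int conn, int fd_to_send)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
    return sendmsg(conn, &msg, 0) < 0 ? -1 : 0;
}

/* Give two broker clients a private, already-connected channel. */
int broker_pair_clients(int client_a, int client_b)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int rc = 0;
    if (send_fd(client_a, sv[0]) < 0 || send_fd(client_b, sv[1]) < 0)
        rc = -1;

    /* The broker drops its references; the clients now talk directly. */
    close(sv[0]);
    close(sv[1]);
    return rc;
}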
Hi Ilya,

I have a couple of questions on this.

How does the listening server connect to the broker daemon? I understand the client part, but I'm missing the server in the picture. Additionally, this would require changes in the pr-helper, right?
Secondly, where would you deploy this broker daemon? Is this a privileged component? I'm not sure if the unprivileged client could just connect to a privileged component. I see this as a bit problematic. Could you expand a bit on this?
Does the broker socket need to be mounted inside the container? We'd like to avoid the need to mount a socket in the virt-launcher pod for the reasons I described in the first email of this thread. This is also the main reason for the pidfd_getfd syscall.
--
Many thanks,
Alice
Hi Stefan,

On Thu, Oct 13, 2022 at 10:25 AM Stefan Hajnoczi <shaj...@redhat.com> wrote:
> Hi Alice,
>
> I'm curious about the general problem of cross-namespace UNIX domain sockets, which also comes up with vhost-user devices in Kubernetes.

I still need to carefully read Ilya's previous email and his work on vhost-user.

> If QEMU had a listen socket for pr-helper to connect to (inverting the direction of the socket connection), then could the proxy process be eliminated? QEMU listens and the privileged pr-helper container temporarily enters the mount namespace and connects to QEMU's UNIX domain socket. Now QEMU and pr-helper are connected without a proxy.

Yes, the direction is the actual issue here. Having a listening socket in the unprivileged container and connecting from the privileged container isn't an issue. It is even the way virt-handler (privileged) establishes a communication channel with virt-launcher (unprivileged). Unfortunately, in my case, the direction is the opposite: QEMU needs to connect to the pr-helper.
> Stefan
On Thu, Oct 13, 2022 at 12:07 PM Alice Frosi <afr...@redhat.com> wrote:
> How does the listening server connect to the broker daemon? I understand the client part, but I'm missing the server in the picture. Additionally, this would require changes in the pr-helper, right?

For the broker daemon, both the server and a client are just clients. Two clients connect to the broker, the broker gives them a socket pair, both clients disconnect from the broker and communicate directly with each other over the socket pair they now have. The connection is already established and there is no need to listen() or connect(), just send()/recv() right away.

This is more optimized for 1:1 communications. For a 1:N topology, one of the broker clients (a.k.a. the server) may request a new connection from the broker right after receiving one for a previous client. The broker holds client connections, so we will not miss them. This case can also be optimized by not closing the server-to-broker connection, so the broker can keep supplying the 'server' with socket pairs for new clients. I don't remember if that is implemented in one-socket, but it should not be hard to add support for that case.
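For illustration, the receiving side of that handoff is just standard SCM_RIGHTS fd passing; a sketch (not the actual client library from [2]):

/* Sketch: receive one end of the broker-created socket pair over the
 * existing connection to the broker. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Returns the received fd, or -1 on error. */
int recv_fd(int broker_conn)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    if (recvmsg(broker_conn, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;   /* ready for send()/recv() with the peer right away */
}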
I'm not familiar enough with KubeVirt, but I would say that pr-helper will need to be able to talk with the broker, i.e. it needs to understand the workflow, so some changes will be needed to add support.
> Secondly, where would you deploy this broker daemon? Is this a privileged component? I'm not sure if the unprivileged client could just connect to a privileged component. I see this as a bit problematic. Could you expand a bit on this?

It's not a privileged component; all it does is listen()/socketpair()/send(), fairly basic socket operations. Clients just need a way to connect to it. It can run as a systemd daemon on the host (possibly socket-activated) or anywhere else from where you can get the one socket file out.
> Does the broker socket need to be mounted inside the container? We'd like to avoid the need to mount a socket in the virt-launcher pod for the reasons I described in the first email of this thread. This is also the main reason for the pidfd_getfd syscall.

Yes, you need to mount the broker socket into the containers, so it might not be an option for you. The point is that it is the one and only socket that needs to be mounted. It will be the same socket file for every container, and it is not really service-specific; it's a very generic service.

IMHO, the world would be a much simpler place if some one-socket-like service were available system-wide in Kubernetes and containers were able to request it as a resource. I think this could be achieved by creating a Kubernetes device plugin.
On Thu, Oct 13, 2022 at 1:41 PM Ilya Maximets <i.max...@redhat.com> wrote:
> For the broker daemon, both the server and a client are just clients. Two clients connect to the broker, the broker gives them a socket pair, both clients disconnect from the broker and communicate directly with each other over the socket pair they now have. The connection is already established and there is no need to listen() or connect(), just send()/recv() right away.

Thanks!

> I'm not familiar enough with KubeVirt, but I would say that pr-helper will need to be able to talk with the broker, i.e. it needs to understand the workflow, so some changes will be needed to add support.

This isn't really related to KubeVirt but to QEMU.

> It's not a privileged component; all it does is listen()/socketpair()/send(), fairly basic socket operations. Clients just need a way to connect to it. It can run as a systemd daemon on the host (possibly socket-activated) or anywhere else from where you can get the one socket file out.

Yes, I understand this. But we could have privileged and unprivileged components that want to communicate. AFAIU, the broker doesn't make this difference, and in this case, it is the broker that manages the connection.
For example, unwanted components might manage to connect to privileged containers. In my example, the model is a bit different because the connection is established by the privileged component. Or am I missing something here?
> Yes, you need to mount the broker socket into the containers, so it might not be an option for you. The point is that it is the one and only socket that needs to be mounted. It will be the same socket file for every container, and it is not really service-specific; it's a very generic service.
>
> IMHO, the world would be a much simpler place if some one-socket-like service were available system-wide in Kubernetes and containers were able to request it as a resource. I think this could be achieved by creating a Kubernetes device plugin.

Well, the mount of the socket is problematic because the bind mount isn't transparent to Kubernetes, as it is done by KubeVirt. If the unmount of the socket isn't done properly by KubeVirt, Kubernetes cannot clean up and unmount the container filesystem. Otherwise, I could simply mount the pr-helper socket and QEMU could connect to it directly; this is how I implemented the first PoC.

What would be, in your case, the resource managed by the Kubernetes device plugin? I think you are talking about the resource exposed through vhost-user here, not the socket, right?
--
Alice