Bpf_l4_csum_replace

7 views
Skip to first unread message

Victorio Galindo

unread,
Jul 25, 2024, 5:24:57 AM7/25/24
to ghetonitad

LinkedIn and 3rd parties use essential and non-essential cookies to provide, secure, analyze and improve our Services, and to show you relevant ads (including professional and job ads) on and off LinkedIn. Learn more in our Cookie Policy.

bpf_l4_csum_replace


Download --->>> https://shoxet.com/2zNIeu



The adoption of eBPF (Extended Berkeley Packet Filter) has revolutionized high-performance applications, tracing, security, and packet filtering within the Linux kernel. Specifically, TC-BPF, a type of eBPF program attached to the Traffic Control (TC) layer, has emerged as a powerful tool for packet manipulation in both ingress and egress. This blog delves into the practical application of TC-BPF to redirect DNS queries in a Docker environment.

The intricate workings of Docker networking involve network namespaces, veth pairs, and a bridge interface, all contributing to the isolation of containers. Docker employs its own DNS server, and understanding how it intercepts DNS queries through iptables is crucial for our redirection strategy.

The TC-BPF program, attached to both loopback and eth0 interfaces, plays a pivotal role in this redirection process. We explore the use of the bpf_redirect_neigh helper function for efficient egress redirection, ensuring correct MAC addresses for the new route. The blog provides a step-by-step guide, including setting up the environment, writing TC-BPF programs, and deploying them using the tc tool.

The TC-BPF programs inspect DNS packets, redirecting them based on protocol and destination. UDP packets destined for Docker's DNS resolver trigger a redirection to an external DNS server. The programs intelligently modify packet headers, correcting checksums for a seamless redirection experience. Wireshark captures validate the successful redirection, showcasing the power and flexibility of TC-BPF in a Dockerized network.

This blog not only serves as a practical guide for implementing DNS redirection but also offers insights into the versatility of TC-BPF in enhancing network control within containerized environments.

Docker uses network namespaces for network isolation. Inside each container namespace. We have a loopback interface and a eth0 interface. The eth0 interface is connected to the host network namespace with a veth pair. The veth interface of each container is connected together using a bridge interface. This interface is connected to the main host's interface for outgoing traffic using a NAT.

The resolvectl configuration in every docker container is 127.0.0.11:53 as the DNS server. The docker daemon runs its own DNS server that is exposed on all containers. As an example, let's say it is exposed on 127.0.0.11:41552. Now, docker configures iptables SNAT and DNAT rules to map port 53 to the port docker daemon is exposed in.

The application first makes a query to the DNS server specified in the resolvectl config. The DNS query made to 127.0.0.11:53 is intercepted by iptables, and the port is changed to reflect the actual DNS server port. Docker daemon gets the query and resolves it if it is container name/id. If not, then it is an external query so it uses the DNS resolver of the host to resolve the query and then sends a reply back to the application. Just like before, iptables is set up to intercept all traffic from the dockerd DNS resolver. It changes the source port of the reply to 53.

TC-BPF has access to a bpf helper function called bpf_redirect. This can be used to redirect the packet to the TC hookpoint in both ingress and egress of any other interface. However, if redirecting to egress of a different interface, we must change the MAC addresses in the packets so they are correct for the new route. There is an egress side optimization with the bpf_redirect_neigh helper function. This fills the L2 layer based on the L3 Layer IP addresses using the routing table of the kernel. There also is a bpf_redirect_peer helper, which can do ingress-to-ingress redirecting but across namespaces. But since we won't need to do any redirecting across namespaces, we will not use this.

The dotted lines show the flow of the packet when redirected. First, the DNS query enters the loopback interface and then redirects it to the egress of the eth0 interface after changing the destination IP. Now, we must also redirect the reply that comes to the eth0 interface back to loopback after changing the IP address.

We need to perform some basic checks on the packet to make sure that the packet size is correct so we don't access memory that is out of bounds. The eBPF verifier will reject the program if we don't do this.

We can now load the IP address and ports in the packet for modification. We keep a version of the original fields in l3_original_fields and l4_original_fields since we need these to compare with to change the checksum.

Now, we have to correct the checksum in the IP header and UDP header. We use bpf_csum_diff to find the difference between old and new fields in the IP header. This is then used in bpf_l3_csum_replace to change the IP header checksum

For UDP header checksum it is calculated using a pseudo-header, which includes not only the UDP source and destination ports but also the source and destination IP as well. Thus, for this, we can use the chaining feature of bpf_csum_diff to find the total difference by passing in l3sum as one of the parameters. We finally change the checksum using bpf_l4_csum_replace.

We see this since Wireshark captures packets after the tc hook. Since we are attaching the tc-bpf program on the ingress of loopback, Wireshark is able to capture the query from 127.0.0.1 -> 127.0.0.11. This would not have been possible if we had attached the TC program in the egress of the interface. The packet would be redirected before Wireshark can capture it. Similarly, in the reply, we are redirecting to the egress hook of the loopback interface which is before the wireshark hook. Hence, it can capture the 127.0.0.11 -> 127.0.0.1 packet. Again if we used bpf_redirect to redirect packets directly to the ingress of the loopback interface, we would not see this packet captured.

We were able to successfully redirect traffic using tc-bpf in docker containers. Tc-bpf is used in many applications to elevate existing networking or as a replacement for iptables itself in Kubernetes CNIs like Cillium. Redirecting DNS queries is also an important requirement in Keploy, where we use eBPF to redirect DNS queries made only by the application. There is a whole host of use cases for tc-bpf in intelligent networking, with its full potential only beginning to surface now.

TC-BPF, is a type of eBPF program attached to the Traffic Control layer in the kernel. In Docker environments, it plays a crucial role in packet manipulation for both ingress and egress traffic. It can be used to redirect packets efficiently, inspect and modify packet headers, and control network traffic within containerized environments. TC-BPF enables advanced network control and manipulation, making it ideal for tasks such as DNS redirection in Docker.

Docker utilizes its own DNS server, typically exposed on 127.0.0.11:53 within container namespaces. Understanding Docker's DNS handling is crucial for DNS redirection strategies because Docker intercepts DNS queries through iptables, mapping port 53 to the port where the Docker daemon's DNS server is exposed.

  1. Ensuring compatibility with Docker networking configurations and iptables rules.
  2. Handling packet checksums and header modifications correctly to avoid packet corruption or rejection.
  3. Managing namespace isolation and ensuring that redirection occurs within the appropriate network context.
  4. Testing and validating the redirection process to ensure correct functionality and performance.
  5. Considering security implications and potential impact on network stability when deploying TC-BPF program based solutions in production environments.

4a15465005
Reply all
Reply to author
Forward
0 new messages