Hi folks!
So I read through this whole thread. A few things come to mind, somewhat unrelated to each other:
- The history of the KPNG project is documented in the notes here: https://docs.google.com/document/d/1yW3AUp5rYDLYCAtZc6e4zeLbP5HPLXdvuEFeVESOTic/edit. There's a lot of interesting history there, and it's fun to read and look back on, like the early debates on diffstore, gRPC, and so on. Or the time Mikael figured out that we needed proto.MarshalOptions{Deterministic: true} to get the hashing of the endpoints fixed. If anyone wants to know what it's like to try to make a battle-hardened kube-proxy, that's a good starting point :)
- Me and Amim shot nerf darts at Ricardo during KPNGCON (https://youtu.be/GT_p2mkbn2E?t=252). That slowed the project down by at least a few months while Ricardo recovered (the wounds were emotional as well as physical).
- Regarding Shane's recent post on K8s certification: if that had been around at the time we were doing KPNG, there could have been a good alternate reality where we ended up as a "Certifiable" kube-proxy that passed all 250 or so k8s sig-net tests + Conformance. Then the question of in-tree or out-of-tree would have been irrelevant. Of course, the effort of creating such a certification might not be worth it given that... well... how many people are actually trying to rewrite the kube-proxy? I'm assuming it's a handful, but it's not like there are going to be hundreds of companies in that business. Most folks are happy with the stock in-tree proxy.
- The idea of making it "in-tree" was always confusing to me. I think the goal was always (in my mind) to have (like Dan said) kind of our own parallel mirror universe of sig-network and see how far we could go. The reality is that not having a large corporate sponsor made it virtually impossible for that universe to keep co-existing forever. People came and went. Nobody was paid to work on KPNG. It was ultimately destined to lack the consistency and polish that other initiatives in this area would have.
Hi Mikael,
Thanks for coming back to this and sharing your perspective. I really appreciate the transparency about the emotional side of it. KPNG was a huge effort, and it's clear a lot of heart went into it. As a SIG lead, I also want to offer a perspective from the maintenance side to help put things into context.
I’m a big fan of hacking and new ideas; I always encourage people to start new projects. However, my primary commitment is the stability of the ecosystem. For those at the last SIG Network meeting, the fact that we had zero open bugs was a testament to the incredibly high standards we maintain for our in-tree codebase.
The main friction with KPNG started when the goal shifted toward moving the code in-tree. There is a common misconception that "in-tree" solves adoption or maintenance. In reality, it doesn't always attract more contributors; often, it just shifts the heavy lifting—CVEs, kernel regressions, dependency upgrades—onto the core maintainers who remain for the long haul. As Solomon Hykes famously said: 'Rule #1 of open-source: no is temporary, yes is forever.' We have to be very careful about what we say 'yes' to.
Regarding the flakiness: as someone who has developed a 'weird skill' for hunting regressions by fixing flakes, I've learned these small signals are almost always the tip of a very deep iceberg. I saw those same jobs running cleanly in other projects (kindnet, Cilium, etc.), which is why I pushed for that same level of evidence for KPNG. I just caught a perfect example this week: two regressions in the mdlayher/netlink library (mdlayher/netlink#283 and mdlayher/netlink#280), found only because of GitHub Actions errors in the kube-network-policies repo.
To give you an idea of the "deep regressions" I'm talking about that often start as "just a flake":
- IPv6 UDP regressions: packets larger than the MTU returned EMSGSIZE instead of fragmenting (Issue #133361; kernel regression).
- Netlink library breaks: impacting Cilium, Calico, and OVN GitHub Actions (Netlink PR #925; golang netlink library regression).
- Race conditions in net.InterfaceAddrs: causing NodePort Services to become inaccessible (Issue #129146; golang standard library bug).
I’m super happy to hear the ideas are continuing with knls. This out-of-tree project allows you to move fast and experiment without the weight of millions of production clusters on your shoulders, and we can always have the conversation again later about the benefits for the project.
Let’s keep the conversation positive. To me, the main goal of the retrospective is to learn how to better support innovation without compromising the core.
Best,
Antonio