On Sun, Oct 23, 2016 at 03:15:31PM -0700, Yaroslav Molochko wrote:
> I’m really happy that this topic raised some attention. I can’t open all the
> cards, but as I’ve said previously we are dealing with VPN and anonymity on
> the internet. What this means: we must make sure none of the user’s information
> is exposed, otherwise in some countries a person can even be beheaded. And
> obviously, countries like this try to block our solution. So, each node
> has about 100+ outgoing IPs, which are dynamic and managed by our security
> officers; they check banned and compromised IPs and change them on the hosts
> dynamically. We must also make sure that the source IP of the client arrives
> without changes. We also have pretty strict delay metrics which we must
> follow, otherwise techniques such as delay measurement may be used to identify
> a user; more info here:
> http://dimacs.rutgers.edu/Workshops/Anonymous/parv.pdf
Oh, cool use case :)
> Answering your questions:
>
> > On Oct 23, 2016, at 1:16 PM, Rodrigo Campos <rod...@sdfg.com.ar> wrote:
> >
> > On Sun, Oct 23, 2016 at 12:20:08PM -0700, Yaroslav Molochko wrote:
> >> About host networking
> >> We deal with some sort of VPN solution; that is why the client source address is a must, and extra delay affects user experience under heavy load.
> >
> > Which client source address? Which is the client and which one is the server in
> > the case where you want to preserve the "source address"?
>
> In our case it is both. We must make sure we identified user and there is no
> MitM involved, as well as we must make sure we set correct outgoing IP for
> that particular user.
Oh, okay. I see
> >> We did an experiment under docker-compose and it is working way better with host networking for us.
> >
> > I don't know what docker compose does, but it might not be comparable.
> >
> > In kubernetes you have plenty of options for networking, and can use AWS VPC in
> > AWS or the private Google Cloud network, with kubernetes on those cloud
> > providers at least. And there are tons of other options too (like Weave, etc.,
> > and those may affect performance, I don't know).
> >
>
> Even though AWS and GCP are two amazing cloud providers, and we would love to be able to migrate there, we can’t do this for several reasons:
> 1. They charge a lot for traffic.
> 2. They are not available in all the countries we try to be present in.
Sure, I'm just saying that there are options and the docker-compose comparison
probably isn't fair. Just that.
>
> > Also, coreos uses flannel and I doubt you will see an impact with that either.
> >
> > But in any case, I really doubt the docker-compose experiment implies
> > hostNetworking in kubernetes is faster.
> >
> >>
> >> As David Oppenheimer pointed out, there is nodePort predicate,
> >
> > nodePort? hostPort maybe?
>
> Excuse me for mixing this up, you are right, it is hostPort.
No problem, wasn't sure I was following :-D
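For reference, hostPort is declared per container port in the pod spec, and since only one pod on a node can bind a given host port, the scheduler won't co-locate two pods that request the same hostPort. A minimal sketch of such a container spec (image name and port number are made up for illustration):

```yaml
containers:
  - name: out
    image: example/out        # placeholder image
    ports:
      - containerPort: 8080
        hostPort: 8080        # binds the node's port 8080; the scheduler
                              # will not place a second pod requesting the
                              # same hostPort on this node
```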
>
> >
> >> which makes your solution and his, with the required CPU parameter, work, at
> >> least in theory (did not try it yet). I’d completely missed this nodePort
> >> predicate, which made this solution seem less preferable because of long
> >> convergence time; with the predicate the solution is completely valid.
> >
> > Not sure what you want to say with this. Can you please elaborate? Why is node
> > port important to you? Why not use the scheduler to spread the pods in different
> > VMs? And why do you have a hard requirement on that? Are you sure it's a hard
> > requirement?
>
> Sure, will be happy to clear things up for you. As I’ve mentioned, we have
> pretty strict delay requirements, which led us to patch the kernel with our
> modified network stack (the 4.8 kernel has part of our patches, BTW), and a lot of
> our applications communicate with each other through unix sockets
> because it is almost 50% faster in our case.
Aham...
> So, we have one application (let’s
> call it IN) which deals with customers’ encrypted connections, and we
> place as many instances as we have CPU cores. We also have an outgoing gateway
> app (let’s call it OUT) which applies some anonymizing rules and sends
> traffic through some specific IP. So basically IN communicates with OUT through
> unix sockets, and as OUT is not so CPU-demanding we run only one instance of
> it. But we can’t run as many instances of it as CPUs, because the OUT app is
> RAM-demanding due to extended caching. So what I’ve thought of: I could run
> (CPUCores * replicaSets) replicas of IN, with an exposed port (one for each IN
> instance on the node), and create one replicaSet for OUT with hostPort, which
> would prevent other OUT instances from being placed on that node.
Oh, okay. Now I understand. It makes sense, although you will probably lose the
IP address of the connecting client. There are efforts to preserve it (it's
easy with HTTP, as you can add it in a header), but I think they are alpha
right now.
Also, just as there is an "Ingress" kind, an "Egress" kind has been discussed
too. You can check its state or push it forward and make sure it works for you ;)
But, to have a solution today, what you say makes sense. I'm just not sure how
you will communicate between the IN and OUT pods if they are different pods,
given that you need unix sockets and it is SO sensitive to performance.
I would consider doing: one pod that has several containers, several IN
containers that each reserve the CPU usage you want, and one OUT container
(that also reserves the memory usage you want). All in one pod.
This way, you can communicate via unix sockets using an emptyDir volume, or
hostPath if that is more performant. Also, the OUT container may need
hostNetwork to set the outgoing IP you need (note that hostNetwork applies to
the whole pod, not to a single container).
And if a logical host consists of several IN instances and one OUT instance,
then you really want them all in the same pod. That is what a pod tries to
abstract, really.
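Roughly, a sketch of such a pod (names, images, the socket path, and the resource numbers are all placeholders I'm making up; in practice you'd repeat the IN container once per core and tune the requests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vpn-node            # hypothetical name
spec:
  hostNetwork: true          # shared by all containers; lets OUT use the
                             # node's outgoing IPs directly
  volumes:
    - name: sockets
      emptyDir: {}           # shared directory for the IN<->OUT unix sockets
  containers:
    - name: in-0             # one IN container per CPU core; add in-1, in-2, ...
      image: example/in      # placeholder image
      resources:
        requests:
          cpu: "1"           # reserve roughly one core per IN instance
      volumeMounts:
        - name: sockets
          mountPath: /var/run/app
    - name: out
      image: example/out     # placeholder image
      resources:
        requests:
          memory: "4Gi"      # OUT is RAM-demanding due to extended caching
      volumeMounts:
        - name: sockets
          mountPath: /var/run/app
```

Because the containers share the emptyDir mount, a socket created by IN under /var/run/app is directly reachable by OUT with no network hop in between.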
Longer term, something like an egress kind might be a real win for this use
case, but it's not something available today.
Thanks a lot,
Rodrigo