Merging pipework into docker


Solomon Hykes

Dec 1, 2013, 8:03:56 PM
to docke...@googlegroups.com, docker-dev
Over the months pipework has become an unofficial catchall for networking hacks not officially supported by docker. Michael and Jerome have mentioned it might be time to look at merging some of it in. I agree.

A good start would be to list the different features people use pipework for and identify those we want to merge. And for those we don't, what should be merged instead.

Calum LaCroix

Dec 1, 2013, 9:34:27 PM
to Solomon Hykes, docker-dev, docker-user

Hi Solomon,

Awesome! I saw a discussion about this on GitHub earlier today - I think integrating pipework would be a great opportunity to refactor the networking side a bit. Currently it very much assumes one IPv4 address with port forwarding, which covers a pretty limited range of use cases. Yesterday I hacked up IPv6 support (https://github.com/dotcloud/docker/pull/2974), but it's very much embedded in the current framework - it works well, since IPv6 doesn't require NAT or anything particularly fancy, but it still shares the limitations of the original code (docker0 bridge, single addresses).

Personally, I use Docker for Orchestrator (https://github.com/cvlc/orchestrator) and on a few of my development servers, almost exclusively with pipework. 

It'd be great to throw around some new network architecture ideas - starting with user-customizable IPv4 and IPv6 addresses and ranges (with multiple addresses of each possible), the ability to choose between DHCP, static custom addresses, and automatically generated local networks, and a configurable target bridge (brctl or OpenVZ-style? NATed with exposed ports or NATed by IP, or even routed according to predefined IP-MAC pairs in an address pool). Most of these features are already available through pipework, and those that aren't shouldn't require any more refactoring than getting pipework integrated would in the first place. Obviously there's a line between customizability and the ability to support users and build stable software, but I think a bottom-up redesign could strike that balance better than the current code and make everything simpler and more maintainable in the future.
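To make that concrete, here's a purely illustrative strawman of how such options might look on the command line - none of these flags exist today, the names are just made up for discussion:

    # hypothetical flags, for illustration only:
    #   -bridge     attach to a custom bridge instead of docker0
    #   -ip/-ip6    static addresses (possibly repeatable for multiple addresses)
    #   -ip-alloc   "dhcp", "static", or "auto" (generated local network)
    docker run -d -bridge=br0 -ip=192.0.2.10/24 -ip6=2001:db8::10/64 -ip-alloc=dhcp myimage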

Regards,

Calum

Solomon Hykes

Dec 1, 2013, 11:44:44 PM
to Calum LaCroix, docker-dev, docker-user
I absolutely agree - now that we've made progress refactoring storage drivers, we can look into networking.

This goes hand in hand with refactoring lxc into an execution driver. Since lxc currently spans both execution and network configuration, we need to break it up into 2 distinct drivers, so that you can keep the default lxc sandboxing without being stuck with its default bridging configuration.

One thing that worries me is the overlap between some pipework hacks and current docker features - for example, service discovery. In your case, may I ask how you configure your containers to connect to each other?

James Turnbull

Dec 2, 2013, 12:40:04 AM
to Solomon Hykes, docker-dev, docker-user
Solomon Hykes wrote:
> I absolutely agree, now that we've made progress refactoring storage
> drivers we can look into networking.
>

In this process, though, I'd also want to make sure we don't sacrifice ease of use for functionality. Whilst the current setup has some limitations, especially at scale, it's also incredibly easy to use. Basic container networking requires the user to know nothing about Docker networking except how to find the IP and the relevant ports. That's awesome magic we should keep.

Cheers

James



--
* The Docker Book (http://dockerbook.com)
* The LogStash Book (http://logstashbook.com)
* Pro Puppet (http://tinyurl.com/ppuppet)
* Pro Linux System Administration (http://tinyurl.com/linuxadmin)
* Pro Nagios 2.0 (http://tinyurl.com/pronagios)
* Hardening Linux (http://tinyurl.com/hardeninglinux)

kiorky

Dec 2, 2013, 3:23:05 AM
to Solomon Hykes, Calum LaCroix, docker-dev, docker-user, Jean-Philippe Camguilhem, r...@makina-corpus.com, Guillaume Cheramy
Hi,

Happy to hear that networking will get some work.
Regarding container IP allocation, I would love to have a quasi-static or predictable IP allocation scheme:
    - At least, something that won't change once allocated
    - Or at best, something that I can fix myself!

Indeed, prior to 0.6.6, when a container was newly fired up it got an IP, and at each reboot it got the same IP again. Since then, I've been bitten by https://github.com/dotcloud/docker/issues/2801: Docker now changes the IP of my containers at each restart.

So, I just wanted to point out that a lot of iptables frontends (like Shorewall) don't allow much customization around this, so problems arise when Docker and the firewall fight to set up their own networking rules. The simplest approach is therefore to do this kind of setup on the firewall end.
In the Shorewall case, I found two solutions to handle Docker with Shorewall:
    - Don't use Docker's port mapping and do the NAT (PAT) to the container IP myself (see the sketch below); I found this the easiest, and it was flawless until the IP of the containers started changing at each restart.
    - Use Docker's port mapping, and do some Shorewall scripting to restart it via cron, plus restart wrappers and the like to re-add the DOCKER chain and save/reload the DOCKER nat chain; this is far from free of flaws and bugs.
A third option would be to write a Perl Shorewall plugin that hooks its reload to save/restore the DOCKER chain, but that is not simple to do.
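For the first solution, the firewall-side rules look roughly like this (addresses and ports are examples, not my real configuration):

    # forward host port 8080 straight to a container, bypassing docker -p
    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 \
        -j DNAT --to-destination 172.17.0.5:8080
    iptables -A FORWARD -d 172.17.0.5 -p tcp --dport 8080 -j ACCEPT

This only keeps working while 172.17.0.5 stays attached to the same container, which is exactly why the IP change at each restart breaks it.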
-- 
Regards,
KiOrKY
GPG Key FingerPrint: 0x1A1194B7681112AF
Think of the environment: print this email only if you really need to.

Calum LaCroix

Dec 2, 2013, 12:36:01 PM
to Solomon Hykes, docker-dev, docker-user
Hi Solomon,

You asked how I configure my containers:

On my dev servers, I use a modified version of pipework, very similar to the one in my own fork (https://github.com/cvlc/pipework/blob/master/pipework), to create macvlan subinterfaces for each container. These containers are assigned static MAC addresses, IPs and routes from a proxy script that also includes custom iptables and ip6tables configurations (note that I need to apply my custom MAC address to the subinterface, not within the container's namespace, for my ISP to recognize the traffic as originating from the assigned static MAC address). For IPv6, I need to use a static DUID, so I have pipework automatically launch dhclient in the namespace.

I don't take advantage of the new Docker features such as linking quite yet, instead using the containers almost as classical VMs, with all the var/data stored on the host and exposed via volumes. The primary benefits of docker+pipework for those services are the image management/git-like utilities and the resulting easy, fast automated failover with low resource usage compared to layer-1 virtualization. I chose LXC over OpenVZ since it's provided by default in the kernel, and Docker over pure LXC for its ease of configuration and the aforementioned git-like features. I rely on network defences, common sense and grsecurity for security at the moment, but I've been considering applying SELinux MAC profiles to constrain the container's root account permissions and cover the well-known holes that exist in LXC at the moment (though I have no idea how well they work with Docker, or whether suitable profiles exist or I'd need to make my own).
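For reference, the macvlan part of that setup boils down to roughly this (interface names, MAC and addresses here are illustrative, not my real configuration):

    # expose the container's netns to ip(8); recent docker has
    # "inspect -format", otherwise parse the PID out of the JSON
    pid=$(docker inspect -format '{{ .State.Pid }}' mycontainer)
    mkdir -p /var/run/netns
    ln -sf /proc/$pid/ns/net /var/run/netns/$pid

    # macvlan subinterface on the host NIC; the static MAC goes on the
    # subinterface itself, before it moves into the namespace
    ip link add link eth0 dev mvl$pid type macvlan mode bridge
    ip link set mvl$pid address 02:42:c0:a8:01:05
    ip link set mvl$pid netns $pid name eth1

    # static IPv4 + route, then dhclient for the stateful IPv6 lease
    ip netns exec $pid ip link set eth1 up
    ip netns exec $pid ip addr add 192.0.2.5/24 dev eth1
    ip netns exec $pid ip route add default via 192.0.2.1
    ip netns exec $pid dhclient -6 eth1   # static DUID set in dhclient's config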

Orchestrator currently uses the above fork of pipework as-is to launch a container for each DHCPv6 client that connects to the configured 'client' network. The container is then assigned its own stateful IPv6 address by dhclient, launched in the container's namespace. When persistence/container group support is completed, I expect to add ip6tables support to limit communication between unrelated containers/clients, and to have the MAC of any previously launched container stored by both the DHCPv6 server and orchestrator so it can be automatically re-assigned when a previously stopped container is restarted. In this case, I expect I will want the static MAC inside the guest's namespace (as opposed to the macvlan subinterface configuration above).

James, the current configuration does work well for a wide range of setups and is a safe default - but it could be improved even more! One example would be to *require* exposure of ports, even within a local network. This would encourage careful configuration of containers on the part of users, in addition to promoting a more containerized mindset (as opposed to the 'VM' mindset that I'm guilty of myself in the first use case :)). This would be relatively easy to implement: simply apply a stateful firewall in the container namespace's netfilter/iptables with a default of DROP for incoming and forwarded traffic. When the -p 8321 option is used to expose port 8321, it would by default only add an ACCEPT rule, with the NAT provided when the full form -p 8321:8321 is used. The current behaviour of NATing a random external port would be provided by something like '-p any:8321'. This is only a small change from the defaults, and it greatly enhances security where users have enabled IP forwarding for Docker - especially those who may have done so without fully realizing the implications and are on semi-trusted networks with limited client-side protection.
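Concretely, the per-namespace rules would be something like this (a sketch only - the -p semantics are my proposal, not current Docker behaviour):

    # default-deny inside the container's namespace
    ip netns exec $pid iptables -P INPUT DROP
    ip netns exec $pid iptables -P FORWARD DROP
    ip netns exec $pid iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

    # "-p 8321" would then only add the ACCEPT:
    ip netns exec $pid iptables -A INPUT -p tcp --dport 8321 -j ACCEPT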

kiorky: Compatibility with other applications that add/remove iptables rules is very important, and it's the reason I think Docker's iptables rules should be constrained to the appropriate container namespace wherever possible. With Shorewall and the various issues you've had with IP changes, it sounds like you'd find use in a lower-level, more granular configuration similar to that offered by pipework. In this case, I'd imagine that switches to disable NAT and the automatic installation of iptables rules by the docker daemon would help most, alongside more extensive test methods for the IP allocation side of the network code. Alternatively, if you had the option to explicitly specify a desired IP+subnet both when starting the Docker daemon and when starting any container, you could script up a sequential IP allocator yourself.
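For instance, given a hypothetical -ip= flag on docker run (nothing like it exists today), the allocator could be as dumb as:

    # hand out 172.17.0.10, .11, .12, ... in sequence - purely a sketch
    STATE=/var/lib/my-ip-counter
    n=$(cat "$STATE" 2>/dev/null || echo 10)
    echo $((n + 1)) > "$STATE"
    docker run -d -ip=172.17.0.$n/16 myimage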

Danny: I'm not so sure that Docker should get involved with external interfaces to apply SNAT - this definitely feels like something an administrator would want to vet and apply themselves. That said, it would be nice to be able to add hooks to the daemon for container creation tasks that could create custom iptables rules. I'm not entirely sure of the best way to go about that - the obvious choice is configuration files, but those are notably absent from Docker in general, and switches could very quickly turn the docker -d command into a mess. Possibly environment variables like IPTABLES_PRE and IPTABLES_POST for pre/post creation, with some syntax verification? Something like 'export IPTABLES_PRE=-t nat -A POSTROUTING -o eno1 -p tcp -s @CONTAINER_IP -j SNAT --to-source 87.155.54.12'? This would solve your race condition, as long as the hook was triggered early enough during container creation.
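That is, for a container that ended up with 172.17.0.5, the daemon would expand @CONTAINER_IP and run (a hypothetical mechanism, nothing Docker does today):

    iptables -t nat -A POSTROUTING -o eno1 -p tcp -s 172.17.0.5 \
        -j SNAT --to-source 87.155.54.12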

Regards,

Calum


Calum LaCroix

Dec 2, 2013, 2:09:30 PM
to Danny Yates, docker-user, Solomon Hykes, docker-dev
Hey Danny,

Yes, I get you now! Having the option to choose between macvlan subinterfaces and the current bridge topology would be great. Even something as simple as -n=false, -n=bridge:docker0 or -n=macvlan:eno1 when starting docker -d could provide that. Another feature that would help is the ability to bring in additional applications or configuration after namespace setup but before the container is fully initialized.

Considering this, maybe it would be better to add several hooks throughout the start and stop process - starting with, for example, $PRE_NS and $POST_NS; these could call scripts or commands specified directly before namespace creation (natively) and directly after (through netns). This feature would also provide the $PRE_IPTABLES and $POST_IPTABLES mentioned earlier. 
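A very rough sketch of how the daemon could fire them (the hook variables are just this proposal, nothing that exists today):

    # before the namespace exists: run on the host
    [ -n "$PRE_NS" ] && sh -c "$PRE_NS"

    # ...namespace created, interfaces configured...

    # afterwards: run inside the new namespace
    [ -n "$POST_NS" ] && ip netns exec "$NSPID" sh -c "$POST_NS"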

Any other ideas of 'checkpoints' that could be used for PRE_ and POST_ hooks? This seems like it could be an interesting way to add 'power user' extensibility without impacting ease of use or default use cases.


On 2 December 2013 17:57, <da...@codeaholics.org> wrote:
Hey Calum,

There's a lot to think about in there. Thanks for sharing.

On the SNAT thing: we use iptables to link our MACVLAN interface with the Docker-provided (172.17.0.0) address, rather than using pipework, because having two interfaces inside the namespace was causing issues with some of our stuff. However, doing that made all outbound traffic appear to come from the host IP address, just as it does in a "vanilla" Docker setup. Hence we added SNAT to rewrite the outbound source address to that of the MACVLAN interface, so that traffic originates from the IP of the container.
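The rule is roughly this (addresses are placeholders, not our real ones):

    # rewrite outbound traffic from the container's bridge address to the
    # MACVLAN interface's address
    iptables -t nat -A POSTROUTING -s 172.17.0.5 -o macvlan0 \
        -j SNAT --to-source 203.0.113.5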

So my ideal setup would be not to use SNAT at all, but just to have the equivalent of pipework merged into Docker: a correctly configured (veth) interface inside the namespace such that all inbound and outbound traffic is correctly addressed, BUT as the sole interface in the namespace (i.e. not the current two-interface situation that pipework sets up), AND set up at the right point in the lifecycle so that it's ready to go as soon as the containerised process starts, not some number of milliseconds later.

Make sense?

D.

Kevin Wallace

Dec 2, 2013, 9:40:14 PM
to Calum LaCroix, Danny Yates, docker-user, Solomon Hykes, docker-dev
(Re-sending, now that I’m properly subscribed to docker-dev. Sorry
for the duplicate message to anyone originally CC’d)

I would love to be able to create multiple guest macvlan interfaces,
each attached to a different host interface. Right now I’m using a
modified pipework (https://github.com/jpetazzo/pipework/pull/8) to do
this.
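Conceptually it's just this, repeated per host NIC (names are illustrative):

    ip link add link eth0 dev mvlA type macvlan mode bridge
    ip link add link eth1 dev mvlB type macvlan mode bridge
    ip link set mvlA netns $pid name eth1
    ip link set mvlB netns $pid name eth2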

kiorky

Dec 3, 2013, 3:24:02 AM
to Calum LaCroix, Solomon Hykes, docker-dev, docker-user
Hi Calum,

I am not sure I follow you 100%, but it sounds like one of the two solutions I proposed:

A way to configure Docker IP allocation myself - and yes, in that case the allocation and NAT stuff is up to me, and I'm fine with that. Docker would just set up the bridge interface at startup.

On the user side, the only thing we would have to do would be to fix the IP/netmask of the containers, which Docker applies to the network interfaces (as it does today via the lxc config).
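Concretely, what I'd like to be able to set myself are the standard lxc.conf network keys, something like (values illustrative - whether Docker sets the address there or in its own init, the idea is the same):

    lxc.network.type = veth
    lxc.network.link = docker0
    lxc.network.ipv4 = 172.17.0.42/16
    lxc.network.ipv4.gateway = 172.17.42.1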

Today, this part cannot be touched by default AFAIK; we can only hack around it with pipework.

I would be happy to have such a feature - it would unblock me!

