Wifi ClusterHat setup tutorial (DHCP, DNS, Ansible, Docker, and hostapd)


Robert Metcalf

Dec 30, 2016, 4:00:46 PM
to ClusterHAT
Hi everyone.

I have just completed my Christmas project: setting up my cluster hat! It works completely via wifi by having two interfaces. On one interface it is a client and connects to the internet, and on the other it acts as a wifi access point.

I have written up my notes on how I did this and posted them here: https://code2.metcarob.com/node/294

I would be very interested if anyone has any comments on what I have done in setting this up. Also I am sure some people may find the notes useful if they want to set up something similar.

I am a programmer, but this was the first time I tried anything like this, so I am sure the experts can point out hundreds of things I have done wrong.

I have had a really fun few days with the cluster hat and I have learnt a tonne of stuff going through this! This is going to be an excellent base for me to go back to and learn more.

Happy new year all
Robert

Cid Felipe

Jan 26, 2017, 6:42:46 AM
to ClusterHAT
Wow, I've been waiting for something like this for some time, I think I'll have "fun" for some time.

Rich Conescu

Feb 13, 2017, 8:33:55 PM
to ClusterHAT
Hi Robert,

I have tried going through your tutorial for a few days now, and I noticed something peculiar. Not sure if it is because I am using the current GUI controller and lite images for the zeroes, but I can follow your tutorial up until I enable IP forwarding. Once I do that I can no longer access the Pi zeroes. If I do not do that, I can control the zeroes up until I install docker on the controller. 

At that point, I lose the ability to utilize DNS and "ssh p1" and have to use the hard coded ip addresses as "ssh 192.168.2.101" for example.

I then attempted to use squid proxy to bypass IP forwarding, and could not get that to work to save my life. So I went back and re-attempted to enable IP forwarding and now I can ping p1 through p4, but with no response. At least the system recognizes the DNS though.

Any suggestions would be greatly appreciated! And thank you again for the tutorial! 

Robert Metcalf

Feb 14, 2017, 4:04:58 AM
to ClusterHAT
Hi Rich,
Glad you found the tutorial helpful.

I am not sure exactly which point in the tutorial broke. Setting up IP forwarding is Step 6, but the DNS server is only installed in Step 8, so before Step 8 I was always using IP addresses to refer to P1-4.

What do you get from the nslookup commands? "nslookup 192.168.2.101", "nslookup p1"?

The second-to-last line in your message suggests the DNS is working, as you can ping p1-p4 by name, but the Pis are not responding.
Is the DNS using the correct IP address for p1-p4?

Also, there are other posts on here commenting that a new version of docker might interfere with the br0 interface. I don't know if this is a related issue.

I am away from my cluster at the moment but I will be back next week and can check details and compare notes then if required.
Robert

Rich Conescu

Feb 14, 2017, 7:33:01 AM
to ClusterHAT
Hi, thanks for the quick response!

I have uninstalled docker from the controller, and the behavior is the same as before. Also noting that this behavior was observed prior to installing docker 1.13.1 as well. I have walked through the tutorial about 4 times now with fresh cards each time just to retrace my steps.

If I skip step 6 altogether, I can access the pis, but only via IP 192.168.2.101-104 for the zeroes after completing the DNS step. If I try to ping p1, I get unknown host. But, if I go back and redo step 6, I can ping host p1, but no response. So it is as if when I perform step 6 the onboard DNS works, but I cannot communicate with the pis. When I disable step 6, I lose the benefit of the DNS, but I can regain access to the pis. 

I am not sure why enabling the net.ipv4.ip_forward=1 in sysctl.conf seems to mess up the connection, but it appears to break the link to br0. I have also attempted to "chmod +x /etc/network/iptables" to see if that changes anything. In fact, I am curious if this is part of the docker issue people are seeing. I state that because docker did not change the behavior I am experiencing. 

Do you think it would be possible to instead of using ip forwarding to use a local proxy like squid instead? 

Thanks,
Rich

Robert Metcalf

Feb 14, 2017, 9:33:42 AM
to ClusterHAT
(Sorry, I accidentally emailed reply rather than posting here)

Hi,
I think there might be a mistake in my tutorial, but I am not sure; I need to check.

wlan0 is the Pi 3's onboard wifi interface
wlan1 is the WiFi dongle I plugged into the Pi 3.

What happened is I wrote the entire tutorial using wlan0 as the internet access.
When I got to Step 23 I added the dongle wlan1 but I found that the dongle wasn't able to be used as an access point.

But I realized that wlan0 WAS capable of acting as an access point.

So I swapped wlan0 and wlan1 around. I changed to using the Pi 3's wlan1 to access the internet, and the controller's wlan0 is configured in the final step as the access point wifi.

So I am wondering to myself if I have my wlan0's and wlan1's the correct way round in the tutorial.

Looking at the config files in parts 1 and 2 I don't think this is the problem 

In every step before 23, wlan0 is used for internet access. (in Step 23 I just say go back and swap them all round)

I was wondering if you have everything connected to the bridge.
ethpi1-4 should be connected to br0 (/etc/network/interfaces)
The DHCP server should have INTERFACES="br0" (/etc/default/isc-dhcp-server)

Actually, never mind about the above; this must be working for you, since your Pi has got an IP address successfully.

Can you make sure that when you start Step 6 you are able to:
1. Ping google from the controller
2. SSH to P1 via 192.168.2.201
3. Ping google from P1 but it will FAIL as there is no internet access

Then complete Step 6, this will send traffic from br0 to the internet (wlan0)
Make sure the interfaces are correct:
LOCAL_IFACE=br0
INET_IFACE=wlan0
(/etc/network/iptables)

When you finish step 6 then you should be able to ping google from P1.
Can you check if this is the case?
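
The checks above depend on the kernel's IPv4 forwarding flag actually being on after Step 6. A minimal sketch of how that could be confirmed on the controller is below. The function names are made up for illustration, and the optional file argument exists only so the check can be tried against a copy of the flag file.

```shell
# Sketch: report whether IPv4 forwarding is switched on.
# With no argument it reads the live kernel flag; a different path can be
# passed in for testing. Function names are hypothetical.
forwarding_enabled() {
    flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

# Print a one-line status that can be pasted into a reply
report_forwarding() {
    if forwarding_enabled "$@"; then
        echo "ip_forward: enabled"
    else
        echo "ip_forward: disabled"
    fi
}
```

Running report_forwarding on the controller before and after Step 6, followed by "ping -c 3 google.com" from P1, should show whether the change took effect.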

Robert

Rich Conescu

Feb 14, 2017, 6:29:25 PM
to ClusterHAT
Ok - I believe everything that you mentioned here is the same as in the tutorial. I was able to access the pis by hard coded ip addresses of 192.168.1.101-104. I then followed step 6 to the letter, making sure that the iptables file is setup and referenced appropriately. My wlan0 is my inet_iface and br0 is my local_iface.

I even chmod +x iptables and ran as sudo. This did nothing, and no access to the pis.

In fact, weirder note, I ran "cat /var/lib/dhcp/dhcpd.leases" and noticed the controller was assigned to 192.168.2.210. I could not ping that address nor could I ssh in. That was however the only assigned IP to appear.

I feel like the controller is operating outside of the local network and cannot access inside the DHCP server. If I disable the ipv4 forwarding however, everything returns to normal. It is just strange.

Thank you!

Rich Conescu

Feb 14, 2017, 6:30:55 PM
to ClusterHAT
Correction: I was using 192.168.2.101-104 to access the pis. Not what was mentioned in the previous post.



Rich Conescu

Feb 14, 2017, 7:16:15 PM
to ClusterHAT
I think I found the problem! The DHCP server is no longer launching, and specifically because of IPv4 forwarding. Now the question is, what did I do incorrectly? Here are the details:

Feb 14 19:09:16 controller dhcpd[3301]: 
Feb 14 19:09:16 controller dhcpd[3301]: No subnet declaration for br0 (no IPv4 addresses).
Feb 14 19:09:16 controller dhcpd[3301]: ** Ignoring requests on br0.  If this is not what
Feb 14 19:09:16 controller dhcpd[3301]: you want, please write a subnet declaration
Feb 14 19:09:16 controller dhcpd[3301]: in your dhcpd.conf file for the network segment
Feb 14 19:09:18 controller isc-dhcp-server[3292]: Starting ISC DHCP server: dhcpdcheck syslog for diagnostics. ... failed!
Feb 14 19:09:18 controller isc-dhcp-server[3292]: failed!
Feb 14 19:09:18 controller systemd[1]: isc-dhcp-server.service: control process exited, code=exited status=1
Feb 14 19:09:18 controller systemd[1]: Failed to start LSB: DHCP server.
Feb 14 19:09:18 controller systemd[1]: Unit isc-dhcp-server.service entered failed state.

This is my dhcpd.conf:

ddns-update-style none;
option domain-name "metcarob-local.com";
##option domain-name-servers ns1.example.org, ns2.example.org;

##default-lease-time 600;
##max-lease-time 7200;

# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
authoritative;

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;

subnet 192.168.2.0 netmask 255.255.255.0 {
        range 192.168.2.201 192.168.2.250;

        option routers                  192.168.2.1;
        option subnet-mask              255.255.255.0;
        option broadcast-address        192.168.2.255;
        option domain-name-servers      192.168.2.1;
        ##option ntp-servers              192.168.1.1;  #DO NOT SEND ntp server info
        option netbios-name-servers     192.168.1.1;
        option netbios-node-type 2;
        default-lease-time 86400;
        max-lease-time 86400;
        host p1 {
                hardware ethernet 00:22:82:FF:FF:01;
                fixed-address 192.168.2.101;
        }
        host p2 {
                hardware ethernet 00:22:82:FF:FF:02;
                fixed-address 192.168.2.102;
        }
        host p3 {
                hardware ethernet 00:22:82:FF:FF:03;
                fixed-address 192.168.2.103;
        }
        host p4 {
                hardware ethernet 00:22:82:FF:FF:04;
                fixed-address 192.168.2.104;
        }
}
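
The "(no IPv4 addresses)" part of the dhcpd log line usually means br0 had no IPv4 address at the moment dhcpd started, so the subnet block above had nothing to match against. A hedged sketch for checking that is below; it only parses the output of "ip -4 -o addr show", and the function name is hypothetical.

```shell
# Sketch: list the IPv4 addresses (CIDR form) found in the output of
# `ip -4 -o addr show dev <iface>`, read from stdin.
# An empty result means the interface has no IPv4 address.
ipv4_addresses() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1) }'
}
```

For example, "ip -4 -o addr show dev br0 | ipv4_addresses" should print something like 192.168.2.1/24 if the bridge is configured as in the tutorial. If it prints nothing, the bridge address needs fixing before "sudo systemctl restart isc-dhcp-server" can succeed.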

Rich Conescu

Feb 15, 2017, 6:36:55 AM
to ClusterHAT
Ok - so I have done the steps out of order, and somehow the DHCP service is now working and DNS is not responding. I have set the DNS everywhere mentioned in the tutorial, and this time it is still defaulting the DNS to 192.168.1.1, even when I execute iptables manually through sudo.

Would it not be simpler to manually define p1-p4 in /etc/hosts and just give up on DNS? DHCP is hardcoding the IP addresses correctly.
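
A minimal sketch of the hosts-file approach is below, assuming the fixed addresses 192.168.2.101 to 192.168.2.104 from the dhcpd.conf posted earlier. The function name is hypothetical.

```shell
# Sketch: print /etc/hosts lines for p1..pN (default 4), assuming the
# fixed addresses start at 192.168.2.101 as in the dhcpd.conf above.
cluster_hosts() {
    count="${1:-4}"
    i=1
    while [ "$i" -le "$count" ]; do
        printf '192.168.2.%d\tp%d\n' "$((100 + i))" "$i"
        i=$((i + 1))
    done
}
```

The output could be appended with "cluster_hosts | sudo tee -a /etc/hosts", and the same would have to be repeated on every machine that needs to resolve the names.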



Robert Metcalf

Feb 20, 2017, 6:37:23 AM
to ClusterHAT

Hi,

Sorry for the delay in response. I have a lot going on at the moment.

Yes, it would be perfectly possible to use hosts files instead of running a DNS server, or even to specify IP addresses. I had a few reasons for wanting to run a DNS server on the cluster controller.

1. I would have to maintain a set of hosts files (one on the controller, one on each of the four Pi Zeros, and one on my laptop when I connect it to the access point).

2. It would be tricky for my laptop, as I connect it to different networks and I would have to keep swapping out hosts files.

3. I want to take my cluster hat to hack sessions and work with other developers who may want to connect to the access point. It would be nicer to demo it to them without getting them to set up hosts files.

4. I want to learn how to use Docker, Docker Swarm and Ansible, run my own Docker repo, and maybe try Kubernetes. I would like my environment to be as 'real' as possible, and not having a DNS server with host names set up may complicate things, especially the networking parts.

Also, this whole thing is a learning project for me, so I wanted to learn how to set up my own DNS server. Now that I have it working, I am thinking of using what I learnt to somehow replace my ISP router's DNS server, because one I create myself might be more maintainable.

Robert

Alexis Iglauer

Mar 3, 2017, 6:14:17 PM
to ClusterHAT
Hi

Thanks for this great tutorial -- so far I am only reading it (not implementing), but will do that when I get time, hopefully soon.  One simplification I can offer immediately though: In step 11, you can copy your ssh keys to the target computer with ssh-copy-id -i ~/.ssh/id_rsa.pub pX .
Given that your pi zeroes are in a private network, you could argue you don't need to change the passwords anyway, but you can do that when you ssh to the pi to check that the ssh-copy-id worked.
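
A small sketch of how the ssh-copy-id step could be repeated for all four Pi Zeros is below. The function name and the DRY_RUN and KEY variables are made up for illustration; the host names p1 to p4 assume the DNS names from the tutorial.

```shell
# Sketch: run ssh-copy-id for every host given as an argument.
# DRY_RUN=1 only prints the commands; KEY overrides the public key path.
copy_keys() {
    key="${KEY:-$HOME/.ssh/id_rsa.pub}"
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $key $host"
        else
            ssh-copy-id -i "$key" "$host"
        fi
    done
}
```

Calling "DRY_RUN=1 copy_keys p1 p2 p3 p4" first shows what would run. Each real call still prompts for the Pi's current password once.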

Regards
Alexis

Robert Metcalf

Mar 5, 2017, 11:04:44 AM
to ClusterHAT
Hi,
Thanks for that improvement. Sorry for the delay in replying but I took my cluster to an open conference in London yesterday and I was busy preparing stuff for that.
I have tested it out on my cluster and it worked so I have updated Step 11. It will still have the bit changing the pi password as I think that keeping it as default may be a security risk. (I know the network is private but I prefer belt and braces with security and I could theoretically route services to the pi and make a mistake exposing ssh.)
If you have any other comments or improvements as you go through it I would be interested to hear from you.
Robert

Alexis Iglauer

Mar 6, 2017, 6:43:04 PM
to ClusterHAT
OK, going through this step-by-step, so I'm listing ideas/experiences as I go along, and maybe helping others along the way.  Apologies that this is coming in so drip-fed, and kudos once again for your excellent guide.  Several of my suggestions are personal preference and seem simpler to me; however, this does not mean they appear simpler to anyone else. No worries if you decide not to use them.

I know dnsmasq relatively well, and have based my installation on that.  I think it is simpler than using ISC + bind, as dnsmasq has DHCP and DNS rolled into one (so only one package, only one config file, and anything that gets a DHCP lease will be known to the DNS).  So my version of Step 3 is:

sudo apt install dnsmasq

The relevant config (place this in /etc/dnsmasq.d/pi-cluster) is

interface=br0
dhcp-range=10.0.0.100,10.0.0.200,12h

dhcp-host=00:22:82:ff:ff:01,10.0.0.21,pi1
dhcp-host=00:22:82:ff:ff:02,10.0.0.22,pi2
dhcp-host=00:22:82:ff:ff:03,10.0.0.23,pi3
dhcp-host=00:22:82:ff:ff:04,10.0.0.24,pi4

Note that I am using different IP ranges (as my 192.168.2.x is occupied).

Later on, the command to see whether leases have been assigned is:
pi@controller:/etc/dnsmasq.d $ cat /var/lib/misc/dnsmasq.leases
1488883491 00:22:82:ff:ff:04 10.0.0.24 pi4 01:00:22:82:ff:ff:04
1488881867 00:22:82:ff:ff:03 10.0.0.23 pi3 01:00:22:82:ff:ff:03
1488881816 00:22:82:ff:ff:02 10.0.0.22 pi2 01:00:22:82:ff:ff:02
1488881902 00:22:82:ff:ff:01 10.0.0.21 pi1 01:00:22:82:ff:ff:01
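
As the listing shows, each line of the leases file holds the expiry time, MAC address, IP address, host name and client ID. A rough sketch for turning that into a short name-to-address table is below; the function name is hypothetical.

```shell
# Sketch: print "hostname ip" pairs from a dnsmasq leases file, sorted by
# host name. Fields per line: expiry, MAC, IP, hostname, client ID.
lease_table() {
    awk '{ print $4, $3 }' "${1:-/var/lib/misc/dnsmasq.leases}" | sort
}
```

Run without arguments on the controller, it reads /var/lib/misc/dnsmasq.leases. A missing pi1 to pi4 entry would suggest that Zero never requested a lease.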

Steps 8 and 9 (DNS / bind) are not necessary in my setup.

For step 6, I prefer using ufw (uncomplicated firewall) to set up iptables rules etc.  It is available on Debian, and also takes care of things staying in place through reboots etc.  Docs are here: https://help.ubuntu.com/community/UFW

sudo apt install ufw
sudo ufw enable
sudo ufw allow ssh
sudo ufw allow from 10.0.0.0/24

The above allows ssh from 'outside' and allows the pi zeros to access any service on the controller.  Now set up masquerading as per https://help.ubuntu.com/lts/serverguide/firewall.html#ip-masquerading, taking care to put the appropriate network and outbound interface in the line:

-A POSTROUTING -s 192.168.0.0/24 -o eth0 -j MASQUERADE
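
In the Ubuntu guide, that line sits inside a nat block placed at the top of /etc/ufw/before.rules, alongside DEFAULT_FORWARD_POLICY="ACCEPT" in /etc/default/ufw and net/ipv4/ip_forward=1 in /etc/ufw/sysctl.conf. A sketch that prints the block for a given subnet and interface is below. The function name is hypothetical, and 10.0.0.0/24 with wlan0 is only an assumption based on the address range used in this post and a wifi uplink.

```shell
# Sketch: print the nat block for /etc/ufw/before.rules.
# $1 = cluster subnet to masquerade, $2 = interface facing the internet.
ufw_nat_block() {
    subnet="$1"
    out_iface="$2"
    cat <<EOF
# nat table rules
*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s $subnet -o $out_iface -j MASQUERADE
COMMIT
EOF
}
```

For example, "ufw_nat_block 10.0.0.0/24 wlan0" prints the five lines to paste above the existing filter rules; "sudo ufw disable && sudo ufw enable" then reloads the firewall.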


That's it from me most likely -- I'll be moving on to ansible now, where I am a total beginner so I doubt I will be able to offer any improvements to your guide.

Thanks again and cheers!
Alexis

Alexis Iglauer

Mar 7, 2017, 7:31:29 AM
to ClusterHAT
One addition: ufw needs
sudo ufw allow bootps
or else DHCP doesn't work.

Thank you and regards
Alexis

Robert Metcalf

Mar 25, 2017, 6:50:59 AM
to ClusterHAT
Hi, 
Thanks for all that info. Sorry for the delay in posting a response.
I am working out how to incorporate other program choices into the tutorial.
I am not an expert in different Linux programs; I just got the choices I made working satisfactorily for me and then moved on.
Robert

Giorgio

Apr 8, 2017, 10:00:01 AM
to ClusterHAT
Hello, 
thank you for sharing this experience with us.
I was following the steps in the tutorial and have encountered the same problem as Rich: after setting the IP tables, I can't access the zeros.

Robert Metcalf

Apr 10, 2017, 2:21:45 PM
to ClusterHAT
Two people with the same issue means there is definitely something wrong with the tutorial.
I have double-checked my iptables config files against the live ones I am using and there don't seem to be any transcription errors. It's possible there is some difference due to version changes in the images.
I think when I have some time I can get myself a new set of SD cards and retry the setup from scratch and alter the tutorials.
It might be some months before this can happen though.

Alexis Iglauer

Apr 10, 2017, 2:35:27 PM
to ClusterHAT
Just checking something -- are you trying to access the pi0s from the controller pi or from the broader network?  If from the broader network, is the controller on wired ethernet or wifi (wifi needs extra steps).

Have you tried my version of the config which uses ufw?

Thank you and regards
Alexis

Robert Metcalf

Apr 10, 2017, 2:49:00 PM
to ClusterHAT
On my setup I can access the Pi from the controller with:
ssh p1
At this stage in the tutorial (part 2, Step 6) the controller isn't yet a wifi controller. However, when the tutorial is complete it is, and at that point you can log in to the controller's wifi network, use its DHCP server, and use the same command to access the pi. (Although you may need to do:
ssh pi@p1
)

If you have plugged the controller in to the wired network then I am sure the /etc/network/iptables file will need to be different. The network interfaces file (/etc/network/interfaces) will also need to be different.
The dhcp and other files might need changing too.
If you want a way to set up a cluster from a wired network, I think there are other tutorials out there. The reason I did my own rather than follow another one was that all the ones I found described how to set up a cluster on a wired network, and I wanted wireless.

Giorgio

Apr 10, 2017, 3:08:06 PM
to ClusterHAT
Thank you for the quick response.
The network comes in to the controller via wlan0. After some tweaks I had no more DHCP errors and I was able to ssh from the controller, but I was unable to ping from the Pi Zero W (rpzw) to the outside.

PS: As my Zeros are W models, I connected them to the local wireless and, strangely, I was unable to reach the outside. It looks like a conflict or something: if I disable usb0, which comes from the HAT, wlan0 on the Zero gets out...
PS: Alexis's method gets me into the same situation.

I don't think it's a problem in the *.img; it's mostly down to my being a novice. So my question is: how should I backtrack to see where I go wrong? Like Rich, I did it step by step 10, maybe 20 times, each time on a new *.img, and I'm still stuck...

Patrick Nooijen

Apr 11, 2017, 2:02:55 AM
to ClusterHAT
Isn't it a routing problem? Do a route -n and check that table. You can also post it here.

Giorgio

Apr 11, 2017, 2:52:05 AM
to ClusterHAT

Giorgio

Apr 11, 2017, 3:15:16 AM
to ClusterHAT
Alexis,
If my connection to the internet on the controller is wlan0 with 192.168.1.86, is this ufw rule right:

-A POSTROUTING -s 192.168.1.0/24 -o wlan0 -j MASQUERADE
??

Thanks

Alexis Iglauer

Apr 11, 2017, 4:19:34 AM
to ClusterHAT

I think for wlan you also need to use ARP masquerading (hence my question on whether it is wired or wlan).  I've done this using parprouted: https://wiki.debian.org/BridgeNetworkConnectionsProxyArp.

Patrick Nooijen

Apr 11, 2017, 5:05:05 AM
to ClusterHAT
You have 2 gateways on the zero. Try removing the one you don't need: either br0 or wlan0. It depends on what you're trying to do.
Can you reach your router from the zero?
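
A hedged sketch for spotting a second default gateway is below. It reads "ip route" output from stdin and prints one line per default route. The function name is hypothetical, and the interface names in the example (usb0 for the HAT link, wlan0 for wifi) are assumptions about how the Zero is connected.

```shell
# Sketch: print "interface gateway" for every default route found in the
# output of `ip route`, read from stdin. More than one line of output
# means the machine has competing default gateways.
default_routes() {
    awk '$1 == "default" {
        via = "?"; dev = "?"
        for (i = 2; i < NF; i++) {
            if ($i == "via") via = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print dev, via
    }'
}
```

On the Zero, "ip route | default_routes" might print both "usb0 192.168.2.1" and "wlan0 192.168.1.1". The unwanted one could then be dropped with "sudo ip route del default via <gateway> dev <interface>".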


Giorgio

Apr 11, 2017, 6:24:33 AM
to ClusterHAT
Thanks for your involvement.
I need br0 to give the zero access to the internet, while using wlan0 to create an access point.
At this point, I can ssh into the controller and from the controller to the zero, BUT I can't access the zero from outside and I can't get out from it. If I disable br0 I can ssh directly to the zero and reach outside, but of course there is no access to the controller. If I disable wlan0 on the zero I can't ping outside of it...
PS: Does disabling a gateway mean bringing its interface down? ;)

Patrick Nooijen

Apr 11, 2017, 8:20:38 AM
to ClusterHAT
That is the WLAN on the Pi 3 controller. Here you build the hostapd config.
Try disabling your wlan on the pi zero. Then everything should be routed over br0.

If you then connect wlan0 on the zero, the pi zero knows your home network over both br0 and wlan0. Then you should remove one route to make it work.


Scott Beeker

Apr 21, 2017, 10:51:19 PM
to ClusterHAT
In your clusterhat tutorial part II, are the first steps (1-6) performed on the controller or on the first RPi?  I have just completed part I and am confused about which Pi we are operating on for steps 1-6 of part II.  Also, when I set up the cluster I used the four cluster zero images, and not the controller image, when configuring all the pi zeros.  Is that an issue?

Robert Metcalf

Apr 22, 2017, 4:17:09 AM
to ClusterHAT
Hi,
Step 6 - Forward traffic from Pi's to Internet
Performed on the controller.

Step 7 - Setup Pi Zero's 2,3 and 4
Start by setting up the SD cards on another machine. (I used the controller image, but note there is a change to a file that needs to be made. Remember to change PX to P1, P2, P3 or P4.)
The step then instructs you to ssh into the controller and turn on P1-4.
You need to ssh into each pi to test that you can, but then the command 'clusterhat off p1' has to be executed on the controller.

Step 8 - Install and configure a DNS server on the controller (bind)
Performed on the controller.

Step 9 - Add a DNS zone for the cluster
Performed on the controller.

Step 10 - Checkpoint
Performed on the controller.

Hope that helps

Robert

Scott Beeker

Apr 23, 2017, 3:09:01 AM
to ClusterHAT
Thanks! Helps a lot.

A couple more questions / issues, having only completed parts 1-4:
1) When you create the second series of networks, 192.168.2.* (I am not exactly sure what to call it, since they appear to actually be individual point-to-point USB networks bridged to ethernet), I lose the ability for the pi zeros to access the Internet.  This is important because none of the pi zeros can update or install anything from the web.  I would expect that, since there is now a 2.* network, there need to be routes in the router to handle this.
2) When I run the docker script the template part still does not copy the docker template file since it is part of a path and path is not fully there
3) The whole cluster loses it when I try to do any of the steps in part 5 to configure the wlan1.

Scott Beeker

Apr 23, 2017, 3:24:18 AM
to ClusterHAT
I have a problem where the resolv.conf file keeps changing back to its original config. When this happens I get something similar to what you describe,
until I reset the nameserver and domain in resolv.conf back to 127.0.0.1 and sab-local.com, because this determines whether to use the local controller DNS or forward the query to your router.

Matt Harris

Apr 27, 2019, 10:50:24 AM
to ClusterHAT
Hi, I have just come across this post and am very keen to give this tutorial a go; however, the link appears to be dead.


Is this tutorial posted anywhere else?

Patrick Nooijen

Sep 30, 2019, 4:52:25 AM
to ClusterHAT
https://code.metcarob.com/node/174
