Having trouble with passwordless SSH and NFS

Jeffrey Layton

May 28, 2017, 11:10:20 AM

to ClusterHAT

Good morning,

I tried to configure passwordless SSH to all of the nodes in the ClusterHAT and ran into some problems. I set up the SSH keys on the controller without a passphrase, then ran the command recommended on 8086.net (https://8086.support/content/23/83/en/how-do-i-create-and-copy-the-ssh-key-from-the-controller-pi-to-the-pi-zeros-to-allow-passwordless-logins.html). Here is the output:

pi@controller:~/.ssh $ for I in 1 2 3 4; do echo "Copying to p$I.local";ssh-copy-id pi@p$I.local;done
Copying to p1.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p1.local: Name or service not known
Copying to p2.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p2.local: Name or service not known
Copying to p3.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p3.local: Name or service not known
Copying to p4.local
The authenticity of host 'p4.local (192.168.1.99)' can't be established.
ECDSA key fingerprint is 3b:10:cd:84:68:21:36:82:61:c8:d2:b6:bb:e1:05:7d.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.

For some reason it doesn't like the ".local" extension. So I took that off the command and it seemed to work. Any ideas why it doesn't like ".local"?
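ssh, ping and mount all look hostnames up through the system resolver (NSS), so "getent hosts" reproduces the exact lookup that is failing here without involving SSH at all. A minimal sketch, assuming the default p1.local..p4.local names from this thread and a glibc-based system that provides getent; check_names is a hypothetical helper name, not part of the ClusterHAT tools:

```shell
#!/bin/bash
# check_names (hypothetical helper): report whether each hostname resolves
# through NSS, the same lookup path used by ssh, ping and mount.
check_names() {
  local host
  for host in "$@"; do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "$host: resolves"
    else
      echo "$host: NOT resolvable"
    fi
  done
}

# Example (on the controller):
#   check_names p1.local p2.local p3.local p4.local
```

On Raspbian, ".local" names are normally answered by Avahi (mDNS) through the mdns4_minimal entry on the "hosts:" line of /etc/nsswitch.conf, so a name that fails here usually means that Zero has not (yet) announced itself on the network.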

Also, when I ssh to a node such as p1, I can't do a passwordless ssh to any other p node. Any thoughts? (I usually just NFS-export /home from the "controller" to the "p" nodes, but I'm having problems with that too; it may be related to the ".local" issue.)

TIA!

Jeff

Chris Burton

May 28, 2017, 12:15:42 PM

to ClusterHAT

Hi,

> For some reason it doesn't like the ".local" extension. So I took that off the command and it seemed to work. Any ideas why it doesn't like ".local"?

If you can't log into them using "ssh p...@p1.local" then I would guess they haven't been powered on long enough to boot up properly, or they haven't joined the bridge correctly.

Depending on what the Pi Zero is doing whilst booting, it can take longer than a minute to boot, so it might be worth waiting a little longer before trying to copy the keys. If you've waited long enough, I'd advise checking the following, in order:

1. The network interfaces exist: "ifconfig -a" should show the ethpi1 interface.
2. They've been added to the bridge correctly: "brctl show" should list br0 with eth0/ethpi1.
3. Once the Zeros have booted, you should be able to ping them: "ping -c1 p1.local".
4. A few seconds after they become pingable, you should be able to access them via SSH: "ssh p...@p1.local" (and similarly for the other Pi Zeros).

Once you can SSH in with "ssh p...@p1.local", you should be able to copy the keys over.
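The wait-then-copy procedure can be scripted so that ssh-copy-id only runs once a node actually answers. A minimal sketch, assuming the default pi user and p1-p4.local names, the iputils ping found on Raspbian (where -W1 is a one-second reply timeout), and an arbitrary five-minute limit; wait_for_host is a hypothetical helper:

```shell
#!/bin/bash
# wait_for_host (hypothetical helper): ping HOST once every INTERVAL seconds
# until it answers, giving up after TIMEOUT seconds. Returns 0 once the host
# has replied and 1 on timeout.
wait_for_host() {
  local host=$1 timeout=${2:-300} interval=${3:-5} waited=0
  until ping -c1 -W1 "$host" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$host: no reply after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$host: up after about ${waited}s"
}

# Example (on the controller, after "clusterhat on all"):
#   for I in 1 2 3 4; do
#     wait_for_host "p$I.local" 300 && ssh-copy-id "pi@p$I.local"
#   done
```

Printing the elapsed time also gives a rough measure of how long each Zero takes to become reachable.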

> Also, when I ssh to a node such as p1, I can't do a passwordless ssh to any other p node. Any thoughts?

This is to be expected, as you'd need to create a new key pair on p1, copy its public key to controller/p2/p3/p4, and then repeat for the other Pi Zeros.
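If /home is shared from the controller over NFS (as Jeff plans later in the thread), a single key pair is enough, because every node then reads the same ~/.ssh/authorized_keys. A sketch assuming a shared home directory and a default RSA key; add_key_once is a hypothetical helper that appends a public key only if it is not already listed:

```shell
#!/bin/bash
# add_key_once (hypothetical helper): append the key in PUBKEY_FILE to
# AUTH_FILE unless an identical line is already present, then tighten the
# file's permissions (sshd's StrictModes rejects group/world-writable files).
add_key_once() {
  local pubkey_file=$1 auth_file=$2 key
  key=$(cat "$pubkey_file") || return 1
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  if ! grep -qxF -- "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Example, run once on the controller with /home exported to the Zeros:
#   mkdir -p ~/.ssh && chmod 700 ~/.ssh
#   [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Running it twice is harmless, which matters when the same home directory is visible from every node.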

> (I usually just NFS export /home from the "controller" to the "p" nodes but I'm having problems with that too (may be related to the ".local" issue.

You didn't say what your problem was but when I was looking at nfsroot on the Pi Zeros I found rpcbind wasn't always starting on boot (https://github.com/dagon666/rpcbindplumber links to some of the issues) so that might be something to check.
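One quick way to check whether rpcbind is actually up on a host is "rpcinfo -p HOST", which lists the RPC programs registered with it: rpcbind itself appears as "portmapper", and a working NFS server also shows "mountd" and "nfs". A sketch of a hypothetical helper that tests that listing for a given service name:

```shell
#!/bin/bash
# has_rpc_service (hypothetical helper): read "rpcinfo -p" output on stdin
# and succeed if any line's last column (the service name) equals $1.
# Empty input (e.g. rpcinfo could not reach the host) counts as "not found".
has_rpc_service() {
  awk -v svc="$1" '$NF == svc { found = 1 } END { exit !found }'
}

# Example (from a Zero, checking the controller):
#   rpcinfo -p controller.local | has_rpc_service portmapper || echo "rpcbind not answering"
#   rpcinfo -p controller.local | has_rpc_service nfs        || echo "nfsd not registered"
```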

Chris.

Jeffrey Layton

May 28, 2017, 1:09:35 PM

to Chris Burton, ClusterHAT

Chris,

Thanks for the tips! My answers are in-line below.

On Sun, May 28, 2017 at 4:15 PM, Chris Burton <bur...@gmail.com> wrote:

> Hi,
>> For some reason it doesn't like the ".local" extension. So I took that off the command and it seemed to work. Any ideas why it doesn't like ".local"?
>
> If you can't log into them using "ssh p...@p1.local" then I would guess they haven't been powered on long enough to boot up properly, or they haven't joined the bridge correctly.
>
> Depending on what the Pi Zero is doing whilst booting up it can take longer than a minute for it to boot up so it might be worth waiting a little longer before trying to copy the keys. If you've waited long enough I'd advise checking the network interfaces exist "ifconfig -a" should show the ethpi1 interface, they've been added to the bridge correctly "brctl show" should list br0 with eth0/ethpi1, then once they've booted up you should be able to ping them "ping -c1 p1.local" and once pingable you should be able to access them via SSH a few seconds later "ssh p...@p1.local" and similar for the other Pi Zeros. Once you can SSH in with "ssh p...@p1.local" you should be able to copy the keys over.

pi@controller:~ $ clusterhat on all
Turning on P1
Turning on P2
Turning on P3
Turning on P4
pi@controller:~ $ date
Sun 28 May 16:36:28 UTC 2017
pi@controller:~ $ date
Sun 28 May 16:37:42 UTC 2017
pi@controller:~ $ date
Sun 28 May 16:39:25 UTC 2017
pi@controller:~ $ date
Sun 28 May 16:42:10 UTC 2017
pi@controller:~ $ for I in 1 2 3 4; do echo "Copying to p$I.local";ssh-copy-id pi@p$I.local;done
Copying to p1.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p1.local: Name or service not known
Copying to p2.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p2.local: Name or service not known
Copying to p3.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p3.local: Name or service not known
Copying to p4.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
pi@controller:~ $ ssh -v p1.local
OpenSSH_6.7p1 Raspbian-5+deb8u3, OpenSSL 1.0.1t 3 May 2016
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
ssh: Could not resolve hostname p1.local: Name or service not known
pi@controller:~ $ ssh p1
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun May 28 15:04:59 2017 from p4
pi@p1:~ $

On the controller node I took a look at the interfaces:

pi@controller:~ $ ifconfig -a
br0       Link encap:Ethernet HWaddr 00:22:82:ff:fe:01
          inet addr:192.168.1.95 Bcast:192.168.1.255 Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:fe39:c0c7/64 Scope:Link
          inet6 addr: 2602:304:cf01:8a00:89a0:ba8a:9db1:2a88/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8255 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4369 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5876243 (5.6 MiB) TX bytes:702485 (686.0 KiB)

eth0      Link encap:Ethernet HWaddr b8:27:eb:39:c0:c7
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:9259 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4740 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6469887 (6.1 MiB) TX bytes:784942 (766.5 KiB)

ethpi1    Link encap:Ethernet HWaddr 00:22:82:ff:fe:01
          inet6 addr: fe80::222:82ff:feff:fe01/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:173 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1644 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15845 (15.4 KiB) TX bytes:122321 (119.4 KiB)

ethpi2    Link encap:Ethernet HWaddr 00:22:82:ff:fe:02
          inet6 addr: fe80::222:82ff:feff:fe02/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:104 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1557 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7212 (7.0 KiB) TX bytes:111494 (108.8 KiB)

ethpi3    Link encap:Ethernet HWaddr 00:22:82:ff:fe:03
          inet6 addr: fe80::222:82ff:feff:fe03/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:109 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1541 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7864 (7.6 KiB) TX bytes:110089 (107.5 KiB)

ethpi4    Link encap:Ethernet HWaddr 00:22:82:ff:fe:04
          inet6 addr: fe80::222:82ff:feff:fe04/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:192 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1491 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20224 (19.7 KiB) TX bytes:107021 (104.5 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:148 errors:0 dropped:0 overruns:0 frame:0
          TX packets:148 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:12192 (11.9 KiB) TX bytes:12192 (11.9 KiB)

wlan0     Link encap:Ethernet HWaddr b8:27:eb:6c:95:92
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:153 errors:0 dropped:153 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:54087 (52.8 KiB) TX bytes:0 (0.0 B)

Looks like they are all there. Looking at the bridges:

pi@controller:~ $ brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.002282fffe01       no              eth0
                                                        ethpi1
                                                        ethpi2
                                                        ethpi3
                                                        ethpi4

I'm not familiar with bridging, so I'm not sure what this means.
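In "brctl show" output, each row names a bridge, and the interfaces column (continued on the following lines) lists the ports attached to it. Here eth0 and ethpi1 through ethpi4 are all ports of br0, so the Zeros sit on the same Ethernet segment as the controller's LAN port, which is also why the home router's DHCP server can hand them addresses. A sketch of a hypothetical helper that pulls one bridge's ports out of that output, assuming the usual brctl column layout:

```shell
#!/bin/bash
# bridge_members (hypothetical helper): read "brctl show" output on stdin and
# print the interfaces attached to bridge $1, one per line.
bridge_members() {
  awk -v br="$1" '
    NR == 1 { next }                          # skip the header row
    NF >= 3 {                                 # a bridge row: name, id, STP[, first port]
      cur = $1
      if (cur == br && NF == 4) print $4
      next
    }
    NF == 1 && cur == br { print $1 }         # continuation row: one more port
  '
}

# Example (on the controller):
#   brctl show | bridge_members br0
```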

I can't ping p1.local from the controller:

pi@controller:~ $ ping -c1 p1.local
ping: unknown host p1.local

The controller has been up for 15 minutes and the compute nodes have been up 13 minutes when I tried the commands.

I waited another 10 minutes or so, and then I was able to ping p1.local and log into it with "ssh p...@p1.local". So I tried copying the public key to the nodes again:

pi@controller:~ $ for I in 1 2 3 4; do echo "Copying to p$I.local";ssh-copy-id pi@p$I.local;done
Copying to p1.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
Copying to p2.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p2.local: Name or service not known
Copying to p3.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: ERROR: ssh: Could not resolve hostname p3.local: Name or service not known
Copying to p4.local
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.

I can ssh into the nodes now, but 15+ minutes seems like a long time. Any suggestions on debugging this (or speeding it up)?

>> Also, when I ssh to a node such as p1, I can't do a passwordless ssh to any other p node. Any thoughts?

> This is to be expected, as you'd need to create a new key pair on p1, copy its public key to controller/p2/p3/p4, and then repeat for the other Pi Zeros.

Yep. This is why I want to use NFS to export /home from the controller to all of the compute nodes :)

>> (I usually just NFS export /home from the "controller" to the "p" nodes but I'm having problems with that too (may be related to the ".local" issue.

> You didn't say what your problem was but when I was looking at nfsroot on the Pi Zeros I found rpcbind wasn't always starting on boot (https://github.com/dagon666/rpcbindplumber links to some of the issues) so that might be something to check.

I haven't thought about nfsroot - cool idea (I'll try that at some other point).

In the meantime, I installed the NFS server on the controller and created /etc/exports:

pi@controller:~ $ more /etc/exports
/home *(rw,sync,no_subtree_check)
pi@controller:~ $ sudo exportfs -a
pi@controller:~ $ sudo showmount -v
showmount for 1.2.8
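Two notes on the export. "showmount -v" only prints showmount's version (hence the single "showmount for 1.2.8" line); "showmount -e" lists what is actually exported. Also, the "*" wildcard lets any host that can reach the controller mount /home. A sketch that builds a subnet-restricted /etc/exports line, assuming the cluster shares the 192.168.1.0/24 LAN visible in the ifconfig output; export_line is a hypothetical helper:

```shell
#!/bin/bash
# export_line (hypothetical helper): print an /etc/exports entry sharing DIR
# with CLIENTS (a host, a network such as 192.168.1.0/24, or *), using the
# same options as the wildcard entry.
export_line() {
  local dir=$1 clients=$2
  printf '%s %s(rw,sync,no_subtree_check)\n' "$dir" "$clients"
}

# Example (on the controller; "tee" without -a replaces the whole file):
#   export_line /home 192.168.1.0/24 | sudo tee /etc/exports
#   sudo exportfs -ra
#   showmount -e localhost
```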

I used the wildcard to get started and picked p4 for testing. I edited its /etc/fstab to look like this:

pi@p4:~ $ more /etc/fstab
proc /proc proc defaults 0 0
/dev/mmcblk0p1 /boot vfat defaults 0 2
/dev/mmcblk0p2 / ext4 defaults,noatime 0 1
controller.local:/home /home nfs defaults 0 0
# a swapfile is not a swap partition, no line here
# use dphys-swapfile swap[on|off] for that
pi@p4:~ $ ls -s
total 0

The last line shows no files (I created a "junk" file in the /home of the controller). Running "mount" shows it's not mounted:

pi@p4:~ $ mount
/dev/mmcblk0p2 on / type ext4 (rw,noatime,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,size=218256k,nr_inodes=54564,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mmcblk0p1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
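Rather than reading through the whole "mount" listing by eye, the kernel's mount table can be queried directly. A sketch assuming Linux's /proc/mounts layout (device, mount point, filesystem type, options, ...); is_nfs_mounted is a hypothetical helper:

```shell
#!/bin/bash
# is_nfs_mounted (hypothetical helper): succeed if MOUNTPOINT appears in the
# mount table with filesystem type nfs or nfs4. The table defaults to
# /proc/mounts; a second argument points it at another file instead.
is_nfs_mounted() {
  local mountpoint=$1 table=${2:-/proc/mounts}
  awk -v mp="$mountpoint" '$2 == mp && $3 ~ /^nfs4?$/ { found = 1 } END { exit !found }' "$table"
}

# Example (on p4):
#   is_nfs_mounted /home && echo "/home is on NFS" || echo "/home is local"
```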

So I started a root shell on p4 with sudo, changed to /tmp, and tried mounting /home:

pi@p4:~ $ sudo /bin/bash
root@p4:/home/pi# cd /tmp
root@p4:/tmp# mount -a
^C
root@p4:/tmp# mount -a -v
/proc : already mounted
/boot : already mounted
/ : ignored
mount.nfs: timeout set for Sun May 28 17:06:37 2017
mount.nfs: trying text-based options 'vers=4,addr=192.168.1.95,clientaddr=192.168.1.99'
mount.nfs: mount(2): Connection refused
mount.nfs: trying text-based options 'vers=4,addr=192.168.1.95,clientaddr=192.168.1.99'
mount.nfs: mount(2): Connection refused
^C

I'm not sure why it's refusing to mount. Iptables? (It's running, but there aren't any rules.) There is nothing in the logs on the controller or p4; the only entry is an 'action 17' message from rsyslogd.
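The "vers=4" lines show the client trying NFSv4, which talks to the server on TCP port 2049, and "Connection refused" most likely means nothing was accepting connections on that port, which fits the NFS server not running rather than a name-resolution problem. A sketch for testing that port directly from p4, assuming bash (for its /dev/tcp redirection) and GNU coreutils' timeout are installed; port_open is a hypothetical helper:

```shell
#!/bin/bash
# port_open (hypothetical helper): succeed if a TCP connection to HOST:PORT
# can be opened within SECONDS (default 3). Relies on bash's /dev/tcp
# redirection, run in a child bash so a hang can be cut off by "timeout".
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (on p4):
#   port_open controller.local 2049 && echo "something is listening on 2049" \
#     || echo "nothing accepting connections on 2049"
```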

Thanks!

Jeff

Jeffrey Layton

May 29, 2017, 4:03:15 PM

to Chris Burton, ClusterHAT

Chris (or whoever),

Sorry to jump back a little. I got NFS working (I needed to restart rpcbind and nfs-kernel-server), but I'm having some trouble getting the bridging to work. For example, I booted all four Zeros, and after about 35 minutes only one of them (p4.local) responded to ping. The others can't be pinged using their .local names ("unknown host").

pi@controller:~ $ brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.002282fffe01       no              eth0
                                                        ethpi1
                                                        ethpi2
                                                        ethpi3
                                                        ethpi4

I didn't add anything to the images on the zeros so I don't know why it takes so long for the bridging to come up.

I'm not sure what's causing the issue. Does anyone have any suggestions?

TIA!

Jeff

Chris Burton

May 31, 2017, 7:41:55 PM

to ClusterHAT, bur...@gmail.com

Hi,

> I didn't add anything to the images on the zeros so I don't know why it takes so long for the bridging to come up.

> I'm not sure what's causing the issue. Does anyone have any suggestions?

I'm not sure how you can contact the Pi Zero using just "ssh pi@p4", as the controller doesn't have this information in the standard setup. The only thing I can think of is that your local router/gateway is handing out the IP via DHCP and adding the hostname to its local DNS server (and possibly populating /etc/resolv.conf with a "search" value to use). But that still leaves the question of why p4.local works but not the others, unless the router has run out of IP addresses on the DHCP side of things?
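This theory can be checked on the controller: if the router registers DHCP hostnames in its DNS, /etc/resolv.conf (normally generated from the DHCP lease) will usually list the router as "nameserver" and may carry a "search" or "domain" line, which is what lets a bare "p4" resolve. A sketch of a hypothetical helper that prints just those entries:

```shell
#!/bin/bash
# resolver_entries (hypothetical helper): print the search, domain and
# nameserver lines from a resolv.conf-style file (default /etc/resolv.conf).
# Comment lines never match because their first field starts with "#".
resolver_entries() {
  local file=${1:-/etc/resolv.conf}
  awk '$1 == "search" || $1 == "domain" || $1 == "nameserver"' "$file"
}

# Example (on the controller):
#   resolver_entries
#   getent hosts p4    # does the bare name resolve through DNS?
```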

Can you try the Cluster HAT with a local keyboard/mouse/monitor (no Ethernet cable on the controller) and see whether, after waiting a minute from powering on p1-p4, you can "ping -c1 p1.local" .. "ping -c1 p4.local"? If that works, I'd look at your router/gateway settings/manual.

Chris.

Jeffrey Layton

Jun 4, 2017, 8:57:58 AM

to Chris Burton, ClusterHAT

On Wed, May 31, 2017 at 7:41 PM, Chris Burton <bur...@gmail.com> wrote:

> Hi,
>> I didn't add anything to the images on the zeros so I don't know why it takes so long for the bridging to come up.
>>
>> I'm not sure what's causing the issue. Does anyone have any suggestions?
>
> I'm not sure how you can contact the Pi Zero using just "ssh pi@p4" as the Controller doesn't have this information in the standard setup. The only thing I can think of is your local router/gateway is handing out the IP via DHCP, adding the hostname to the local DNS server (and possibly populating /etc/resolv.conf with a "search" value to use). But that still leaves the question of why p4.local works but not the others, unless it has maybe run out of IP addresses on the DHCP side of things?

This was indeed the problem. My home router was screwing up DHCP on the cluster. So I pulled the network cable, and the system booted just fine; I can then plug the cable back in to access the Internet. If I bring a node down and then back up, I have to unplug and re-plug the network cable. Sort of a pain, but not a show stopper. Any way I can fix this?

Thanks for all of the help! The ClusterHAT just rocks!

Jeff


Chris Burton

Jun 5, 2017, 4:19:59 PM

to ClusterHAT

Hi,

> This was indeed the problem. My home router was screwing up DHCP on the cluster. So I pulled the network cable, and the system booted just fine; I can then plug the cable back in to access the Internet. If I bring a node down and then back up, I have to unplug and re-plug the network cable. Sort of a pain, but not a show stopper. Any way I can fix this?

I would assume switching to static IP addresses for the controller and Pi Zeros should work around it.

Depending on the router, you can probably either set fixed leases on the router based on the MAC addresses (so that the statically configured IPs aren't handed out to anything else), or use IPs that are outside the DHCP range but inside the IP range used on the LAN.
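On Raspbian of this era, static addresses are normally set in /etc/dhcpcd.conf rather than /etc/network/interfaces. A sketch that prints such a stanza, assuming dhcpcd manages the interface in question (check how br0 is configured on your controller image first), that the controller's address goes on br0 and each Zero's on its USB gadget interface usb0, and using placeholder addresses that must be replaced with ones outside the router's DHCP pool; dhcpcd_static is a hypothetical helper:

```shell
#!/bin/bash
# dhcpcd_static (hypothetical helper): print a dhcpcd.conf stanza giving
# IFACE the fixed ADDRESS/PREFIX, with ROUTER as both gateway and DNS server.
dhcpcd_static() {
  local iface=$1 addr=$2 router=$3
  printf 'interface %s\n' "$iface"
  printf 'static ip_address=%s\n' "$addr"
  printf 'static routers=%s\n' "$router"
  printf 'static domain_name_servers=%s\n' "$router"
}

# Example (placeholder addresses; pick ones outside the DHCP pool):
#   on the controller: dhcpcd_static br0  192.168.1.200/24 192.168.1.1 | sudo tee -a /etc/dhcpcd.conf
#   on p1:             dhcpcd_static usb0 192.168.1.201/24 192.168.1.1 | sudo tee -a /etc/dhcpcd.conf
```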

Chris.
