On Sunday, May 5, 2019 at 1:15:12 PM UTC+1, Marek Marczykowski-Górecki wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On Sun, May 05, 2019 at 03:19:50AM -0700, tal...@gmail.com wrote:
> > Hi,
> >
> > I'm trying to find out why HVM qubes using DHCP don't work with mirage-firewall (https://github.com/mirage/qubes-mirage-firewall/issues/56).
> >
> > The process seems to go like this:
> >
> > 1. The HVM qube makes a DHCP request over its emulated network device.
> > 2. The DHCP server in the stub domain replies, saying the router is 10.137.0.1 (see https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/master/rootfs/init#L37).
>
> Oh, you've found a bug here. It was this way in R3.2, but not in R4.0
> anymore.
> Actually, the addresses we use are not really DHCP-friendly. The expected
> configuration (from the client VM's side) is:
> IP: VM's actual IP
> netmask: 255.255.255.255
> routing table:
> 1. gw IP directly on eth0 (ip route add GW_IP/32 dev eth0)
> 2. default gateway via gw IP (ip route add default via GW_IP)
>
> I think it is possible to express this in a DHCP response, but it
> definitely isn't what we do right now.
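The expected client-side setup described above could be sketched with iproute2 like this (a sketch only; the addresses are illustrative placeholders, not values prescribed by the thread):

```shell
# Sketch of the expected client-VM configuration (run as root inside the VM).
# VM_IP and GW_IP are placeholders for the VM's and gateway's actual addresses.
VM_IP=10.137.0.4
GW_IP=10.137.0.1

# VM's own IP with a /32 netmask (i.e. 255.255.255.255)
ip addr add "$VM_IP/32" dev eth0

# 1. host route to the gateway directly on eth0
ip route add "$GW_IP/32" dev eth0

# 2. default route via the gateway
ip route add default via "$GW_IP"
```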
Just returning the IP address of the firewall as the router would be enough for us, I think. If I delete the routes and recreate them with the right IP then it works.
Or, I can just use "arp -s ..." to set a MAC address for 10.137.0.1, and that works too.
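Both workarounds could look roughly like this (a sketch; FW_IP is a placeholder for the firewall's actual address, and the MAC is the one sys-firewall answers with in the arp transcript below):

```shell
FW_IP=10.137.0.1   # placeholder: the firewall's real IP

# a) Delete the DHCP-provided routes and recreate them with the right IP:
ip route del default
ip route add "$FW_IP/32" dev eth0
ip route add default via "$FW_IP"

# b) Or keep the DHCP routes and pin a static ARP entry for the
#    advertised router address instead:
arp -s 10.137.0.1 fe:ff:ff:ff:ff:ff
```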
> > 3. The Qube tries to use this and fails, because that's not the IP address of the firewall.
> >
> > Testing with sys-firewall, it seems that sys-firewall responds to all ARP requests with its own address. e.g.
> >
> > [user@test ~]$ sudo route add 1.2.3.4 eth0
> > [user@test ~]$ timeout 1s ping 1.2.3.4
> > PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
> > [user@test ~]$ sudo arp -an
> > ? (1.2.3.4) at fe:ff:ff:ff:ff:ff [ether] on eth0
> >
> > Is this the expected behaviour? What are the rules about what addresses the firewall should answer for?
>
> I think this is a result of proxy_arp being enabled[1], which is basically
> a workaround for a misconfigured VM (default route directly through eth0,
> instead of via the gw IP).
>
> [1] https://github.com/QubesOS/qubes-core-agent-linux/blob/e3db225aab74c26ff12d4a4e544cc5d60e1effd7/network/vif-route-qubes#L68
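For reference, proxy_arp is a per-interface sysctl that the vif-route-qubes script linked above enables on the backend vif. A minimal sketch (the interface name is illustrative):

```shell
# Enable proxy ARP on the backend's vif interface (name is illustrative).
# With proxy_arp=1 the kernel answers ARP requests for any destination it
# can route, replying with the interface's own MAC -- which is why every
# address resolves to fe:ff:ff:ff:ff:ff in the "arp -an" output above.
echo 1 > /proc/sys/net/ipv4/conf/vif10.0/proxy_arp
```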
Ah, I see. What do you suggest we do about this? I see several options:
- Wait for a fix.
- File an issue on qubes-issues.
- Get mirage-firewall to respond to *.1 ARP requests.
> BTW there is one more problem you may also hit when talking to an HVM
> through the emulated NIC. There are two interfaces to such an HVM, with
> the same IP on the other side. The other interface is a paravirtual one.
> If the emulated one is in use (present), you should ignore the
> paravirtual one. I'm not sure how you handle it on the mirage-firewall
> side, but it's also visible in the xenstore state of that interface - it
> stays at state "2" in the backend and "1" in the frontend.
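If it helps, the xenstore state mentioned above can be inspected from dom0 with xenstore-read; this is a sketch, and the domain/device IDs are illustrative:

```shell
# Illustrative IDs: BACKEND is the netback domain (the firewall VM's domid,
# or 0 if dom0 is the backend), DOMID the client HVM, DEVID the vif index.
BACKEND=0
DOMID=105
DEVID=0

# XenBus states: 1 = Initialising, 2 = InitWait, 4 = Connected.
# Per the message above, a PV vif shadowed by the emulated NIC stays at
# state "2" on the backend side and "1" on the frontend side.
xenstore-read "/local/domain/$BACKEND/backend/vif/$DOMID/$DEVID/state"
xenstore-read "/local/domain/$DOMID/device/vif/$DEVID/state"
```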
mirage-firewall only allows one client with a given IP address at a time. If another comes along, it waits for the first to disconnect before accepting the new one.
If I use the emulated Realtek driver, I see the firewall log this:
2019-05-03 18:35:30 -00:00: INF [client_net] add client vif {domid=106;device_id=0} with IP 10.137.0.4
2019-05-03 18:35:30 -00:00: INF [client_net] Client 106 (IP: 10.137.0.4) ready
2019-05-03 18:35:30 -00:00: INF [ethernet] Connected Ethernet interface fe:ff:ff:ff:ff:ff
2019-05-03 18:35:31 -00:00: INF [client_net] add client vif {domid=105;device_id=0} with IP 10.137.0.4
Here, the stubdom (106) connected successfully and the main domain (105) never tried to connect, so it works.
If the qube instead uses the PV device:
2019-05-03 18:39:04 -00:00: INF [client_net] add client vif {domid=108;device_id=0} with IP 10.137.0.4
2019-05-03 18:39:05 -00:00: INF [client_net] Client 108 (IP: 10.137.0.4) ready
2019-05-03 18:39:05 -00:00: INF [ethernet] Connected Ethernet interface fe:ff:ff:ff:ff:ff
2019-05-03 18:39:05 -00:00: INF [client_net] add client vif {domid=107;device_id=0} with IP 10.137.0.4
[...]
2019-05-03 18:39:16 -00:00: INF [client_net] Client 107 (IP: 10.137.0.4) ready
2019-05-03 18:39:16 -00:00: INF [ethernet] Connected Ethernet interface fe:ff:ff:ff:ff:ff
2019-05-03 18:39:16 -00:00: INF [client_eth] Waiting for old client 10.137.0.4 to go away before accepting new one
2019-05-03 18:39:16 -00:00: INF [net-xen:backend] Frontend asked to close network device dom:108/vif:0
2019-05-03 18:39:16 -00:00: INF [client_net] client {domid=108;device_id=0} has gone
So in this case the stubdom connected first, then the PV driver tried to connect and was put on hold; when the stubdom disconnected, the PV driver's connection went through. So that also works :-)