I had a working setup, then something in the network happened that broke
the DHCP client. ...
Basically, the interface goes UP (via a simple netmonitor daemon), but the
logic in net/udp/udp_psock_sendto_unbuffered.c finds it DOWN.
The reason it finds it DOWN is that the interface pointer whose flags are
checked is not the same one...
I now have a TUN interface, which is created BEFORE DHCP is negotiated.
This interface is still DOWN.
In the DHCP client, even if we pass an interface name, that name is not
used to select the actual interface on which the request is sent.
So, with TUN enabled (and a tunnel created but not yet initialized),
the DHCP request is sent to the TUN interface (which is DOWN) and not to
the Ethernet interface (which is UP), just because there is no
interface selection.
How can we force a UDP socket to bind to a specific interface (by MAC
address?) when no IP has been set up yet and both interfaces have an
INADDR_ANY address?
I had a look at the standard ISC DHCP source code for Linux, and it seems
that at least one way to do this is to implement something called
SO_BINDTODEVICE, which allows a UDP socket to be bound to a specific
Ethernet interface without resolving the device through an IP address.
See here:
https://source.isc.org/cgi-bin/gitweb.cgi?p=dhcp.git;a=blob;f=common/socket.c;h=483eb9c3bd53260a5c27f0849f8bc1c148e65777;hb=HEAD
Line 265
Also documented here https://linux.die.net/man/7/socket
What do you think of this? Maybe I'm missing something simpler? Or
another solution?
It gets the device like this:
    dev = netdev_ifr_dev(req);
where netdev_ifr_dev() is in the same file. The request, req, contains the device name string as its first field, and the function simply does a lookup for the device with the matching name:
    return netdev_findbyname(req->ifr_name);
I can't imagine how that would find an interface whose name does not match. netdev_findbyname() basically just searches through the devices for the matching name. The logic would have to be broken to get the wrong device. You would have to tell me.
Hello
The netmonitor is a copy of the NSH net monitor. I am asking it to supervise eth0, which is the STM32 MAC.
https://github.com/f4grx/hn70ap/blob/master/apps/sysdaemon/sysdaemon_netmonitor.c
(this is also the file that calls DHCP whenever the link is up)
However, the order of initialization of my app happens to create a tun interface (I have named it uhf%d) BEFORE the DHCP request is sent after the link becomes ready.
Note that disabling TUN fixes the DHCP issue, so there must be some influence from having a second interface...
To send packets, the DHCP client (apps/netutils/dhcpc/dhcpc.c)
creates a standard UDP socket.
To send the request, the DHCP client does this:
    pdhcpc->sockfd = socket(PF_INET, SOCK_DGRAM, 0);
--
You received this message because you are subscribed to the Google Groups "NuttX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nuttx+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
nsh> stm32_ifup: Bringing up: 0.0.0.0
Interface is going UP
netdev_ifr_ioctl: cmd: 1812
The UDP socket is bound to INADDR_ANY.
There is no way to say whether I want to send on eth0 or on tun0.
We just ask to bind to 0.0.0.0, but at that point this address is used by BOTH eth0 and tun0, so there is no way to select the proper interface.
We would need something like 'bind_to_interface' (by MAC or name) instead of bind() by IP, which is ambiguous.
A bit later, in dhcpc_request(), we just set the selected interface to INADDR_ANY, but that is of no use. We should bind to the required interface instead.
netdev_findbyname() is obviously used within the IP stack, but we need a way to use it from userspace.
That is the problem: when two devices have the same IP address (here
0.0.0.0), the key is not unique and the intended interface cannot be
selected anymore! (the first sentence probably has a typo)
more logs and instrumentation to make it clear that IFUP is called successfully for eth0 but DHCP tries to send data to tun0
netdev_register: Registered MAC: 00:00:00:00:00:00 as dev: tun0
Started tun RX thread
Mass Storage mounted at /data
Mounted /proc
Set MAC: D8:80:39:9C:0C:0D
netdev_ifr_ioctl: cmd: 1813
Entry
netdev_ifr_ioctl: cmd: 1827
arch_phy_irq: Attach PHY IRQ
netdev_ifr_ioctl: cmd: 1819
netdev_ifr_ioctl: cmd: 1828
netdev_ifr_ioctl: cmd: 1829
Bringing the link up
*** Launching nsh
NuttShell (NSH)
nsh> netdev_ifr_ioctl: cmd: 1818
netdev_ifr_ioctl: Called SIOCSIFFLAGS with pointer 20000ff8, flags 00000002
stm32_ifup: Bringing up: 0.0.0.0
stm32_ethconfig: Reset the Ethernet block
stm32_ethconfig: Initialize the PHY
stm32_phy_boardinitialize: called (intf=0)
stm32_phy_boardinitialize: PHY reset...
stm32_phy_boardinitialize: PHY reset done.
stm32_phyinit: PHYSR[30]: 0116
stm32_phyinit: Duplex: FULL Speed: 100 MBps
stm32_ethconfig: Initialize the MAC and DMA
stm32_ethconfig: Enable normal operation
stm32_macaddress: eth0 MAC: d8:80:39:9c:0c:0d
netdev_ifup: ifup method success, dev for d_flags = 20000ff8
Interface is going UP
netdev_ifr_ioctl: cmd: 1812
dhcpc_open: MAC: d8:80:39:9c:0c:0d
Starting DHCP request
netdev_ifr_ioctl: cmd: 1793
netdev_ifr_ioctl: cmd: 1794
dhcpc_request: Broadcast DISCOVER
psock_udp_sendto: WARNING: device (200040e4) is DOWN
psock_sendto: ERROR: Family-specific send failed: -118
DHCP request failed: -1 errno 118
netdev_ifr_ioctl: cmd: 1827
netdev_ifr_ioctl: cmd: 1819
netdev_ifr_ioctl: cmd: 1828
netdev_ifr_ioctl: cmd: 1829
The dev pointer for eth0 is 20000ff8; netdev_ifup is called and succeeds.
But after dhcpc_request we see that psock_udp_sendto FAILS because it has determined that the device for 0.0.0.0 is at pointer 0x200040e4, which is tun0.
This happens even though tun0 is DOWN and eth0 is UP.
The reason is that in udp_psock_send_unbuffered, the dev used for testing the interface flags was selected a few lines earlier using udp_find_raddr_device(conn). I have instrumented this function (in udp_finddev), and this is what happens:
udp_find_raddr_device -> udp_find_ipv4_device -> netdev_findby_ipv4addr
So yes, the search is by IP and not by device name.
This then goes to net/netdev/netdev_findbyaddr.c
I have added more instrumentation here.
This is the output again:
Set MAC: D8:80:39:9C:0C:0D
netdev_ifr_ioctl: cmd: 1813
Entry
netdev_ifr_ioctl: cmd: 1827
arch_phy_irq: Attach PHY IRQ
netdev_ifr_ioctl: cmd: 1819
netdev_ifr_ioctl: cmd: 1828
netdev_ifr_ioctl: cmd: 1829
Bringing the link up
*** Launching nsh
NuttShell (NSH)
nsh> netdev_ifr_ioctl: cmd: 1818
netdev_ifr_ioctl: Called SIOCSIFFLAGS with pointer 20000ff8, flags 00000002
stm32_ifup: Bringing up: 0.0.0.0
stm32_ethconfig: Reset the Ethernet block
stm32_ethconfig: Initialize the PHY
stm32_phy_boardinitialize: called (intf=0)
stm32_phy_boardinitialize: PHY reset...
stm32_phy_boardinitialize: PHY reset done.
stm32_phyinit: PHYSR[30]: 0136
stm32_phyinit: Duplex: FULL Speed: 100 MBps
stm32_ethconfig: Initialize the MAC and DMA
stm32_ethconfig: Enable normal operation
stm32_macaddress: eth0 MAC: d8:80:39:9c:0c:0d
netdev_ifup: ifup method success, dev for d_flags = 20000ff8
Interface is going UP
netdev_ifr_ioctl: cmd: 1812
dhcpc_open: MAC: d8:80:39:9c:0c:0d
Starting DHCP request
netdev_ifr_ioctl: cmd: 1793
netdev_ifr_ioctl: cmd: 1794
dhcpc_request: Broadcast DISCOVER
udp_find_raddr_device: called for PF_INET with conn->u.ipv4.raddr=FFFFFFFF
netdev_findby_ipv4addr: called
netdev_findby_ipv4addr: return g_netdevices because rip=BROADCAST and lip=ANY
psock_udp_sendto: WARNING: device (200040e4) is DOWN
psock_sendto: ERROR: Family-specific send failed: -118
DHCP request failed: -1 errno 118
netdev_ifr_ioctl: cmd: 1827
netdev_ifr_ioctl: cmd: 1819
netdev_ifr_ioctl: cmd: 1828
netdev_ifr_ioctl: cmd: 1829
So: We are in the situation where RIP=broadcast and LIP=ANY
This goes to a branch of the code where I can read:
    /* First, check if this is the broadcast IP address */

    if (net_ipv4addr_cmp(ripaddr, INADDR_BROADCAST))
      {
        /* Yes.. Check the local, bound address.  Is it INADDR_ANY? */

        if (net_ipv4addr_cmp(lipaddr, INADDR_ANY))
          {
            ninfo("return g_netdevices because rip=BROADCAST and lip=ANY\n");
            /* <-- this is my added instrumentation */

            /* Yes.. In this case, I think we are supposed to send the
             * broadcast packet out ALL locally available networks.  I am not
             * sure of that and, in any event, there is nothing we can do
             * about that here.
             *
             * REVISIT:  For now, arbitrarily return the first network
             * interface in the list of network devices.  The broadcast
             * will be sent on that device only.
             */

            return g_netdevices;
          }
        else
          {
            /* Return the device associated with the local address */

            ninfo("return based on lipaddr\n");
            return netdev_finddevice_ipv4addr(lipaddr);
          }
      }
Admittedly this is a bit different from what I first thought. I assumed
the problem was the duplicated local address, but in fact DHCP requires
sending a BROADCAST on eth0, and in that situation you just return
g_netdevices, the head of the device list, which is whatever interface
happens to have been registered last: here, tun0.
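To make the failure concrete, here is a small stand-alone model of that branch. It is not the NuttX code: struct toy_netdev, toy_register() and toy_findby_ipv4addr() are hypothetical names, and the model assumes that registration inserts new devices at the head of the list, which is what the trace above suggests.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INADDR_ANY       UINT32_C(0x00000000)
#define TOY_INADDR_BROADCAST UINT32_C(0xffffffff)

struct toy_netdev
{
  struct toy_netdev *flink;   /* Next device in the list */
  const char *ifname;         /* Interface name, e.g. "eth0" */
  uint32_t ipaddr;            /* Local address, 0.0.0.0 before DHCP */
  int up;                     /* Non-zero when the interface is UP */
};

static struct toy_netdev *g_toy_devices;   /* Head of the device list */

/* Assumption: a newly registered device becomes the head of the list */

void toy_register(struct toy_netdev *dev)
{
  dev->flink = g_toy_devices;
  g_toy_devices = dev;
}

/* Models only the broadcast case discussed in this thread */

struct toy_netdev *toy_findby_ipv4addr(uint32_t lipaddr, uint32_t ripaddr)
{
  struct toy_netdev *dev;

  if (ripaddr != TOY_INADDR_BROADCAST)
    {
      return NULL;   /* Normal routing is outside the scope of this model */
    }

  if (lipaddr == TOY_INADDR_ANY)
    {
      /* No usable address information: arbitrary choice, the list head */

      return g_toy_devices;
    }

  /* Otherwise return the device that owns the local address */

  for (dev = g_toy_devices; dev != NULL; dev = dev->flink)
    {
      if (dev->ipaddr == lipaddr)
        {
          return dev;
        }
    }

  return NULL;
}
```

With eth0 (UP) registered first and tun0 (DOWN) registered second, a lookup with a broadcast destination and an INADDR_ANY local address returns tun0 in this model, which matches the "device is DOWN" warning in the log.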
Sending on ALL interfaces does not look like a good option, and an arbitrary device is what we get if we don't bind a UDP socket to a particular interface. So again: how do we make sure that the choice is not arbitrary, and that the packet is sent on the interface passed to the DHCP client?
I insist that this is a DHCP-specific issue, because we want
to send a broadcast on a controlled interface with no bound source
address. All other broadcasts do not have this problem, since
the correct device can be selected via its correctly defined
source address.
Sebastien
I cannot do that, since this is about a DHCP client!
At this point I have no valid IP, neither on the tun (just created) nor on the
ethernet.
So both interfaces have a 0.0.0.0 local address.
This is really a specific situation.
We could turn that SO_ option to a UDP specific thing. This is the only case
where this shit (sorry) has to happen. Other protocols will probably deal with
initialized local addresses, and that's no problem.
The problem only happens when BOTH conditions are true:
- Destination is INADDR_BROADCAST;
- Local address is INADDR_ANY.
Changing the init order: in my case, at boot time, this means waiting for the
end of DHCP negotiation before starting the tunnels. I can do that as a temporary
workaround, but it is not really future-proof...
Also, DHCP is renegotiated every time the Ethernet cable is plugged into the
board... and I can't stop the tunnels every time Ethernet is disconnected,
then run the DHCP negotiation and wait for it to finish before restarting the tunnels...
I understand that you don't want a UDP-only feature, even if I wonder what the
use of this feature would be in other protocols.
Would it be possible to insert the "interface selection bypass" directly in
netdev_findbyaddr(), if the socket calling it is bound to a device? This way,
you could get support for all protocols 'at once'.
PS: There is another ugly workaround: assign a random IP before DHCP negotiation
and use that to bind to the correct interface. But I really don't like that!
Please don't ask me to do that :)
There are also protocol-specific socket options which could be used and which would affect only one protocol. All of the SOL_SOCKET socket options have a global effect on all socket protocols, but SOL_UDP would affect only UDP sockets. The socket option could then be something like UDP_BINDTODEVICE. UDP socket options would be defined in include/netinet/udp.h, which does not yet exist. I will look around to see if there are any similar UDP socket options.
Meanwhile, please find attached a patch that reformats some comments that were longer than 80 chars per line.
Sebastien
Would that be possible to insert the "interface selection bypass" directly in
netdev_findbyaddr(), if the socket calling this is bound to a device? This way,
you could get support for all protocols 'at once'.
I agree with you, that's not the correct place. So what can we do? Implement the thing in the sendto() methods of the protocols?
First we need a list of required changes.
It does not look so complex. The presence of the option on a socket just says that the interface is predefined instead of being chosen in the normal way.
My opinion is that when the user sets this option on a socket, they become accountable for the new behaviour, even if things don't work as intended.
This option only makes real sense for UDP broadcasts with no
source IP. In all other cases it is irrelevant, since the local or
remote IP is enough to choose the interface and normal routing
happens.
Do you have any other idea of required changes?
Sebastien
Can we agree on a way forward here? This is blocking my development at this
point (and an opportunity to showcase a working NuttX board at Nuit du Hack, a
French security/hacking convention).
At least, can we start a branch and define this option in the headers and setsockopt,
so we can start implementing it in all protocols?
I'm ready to follow whatever path pleases you, but I need DHCP to work
correctly in this setup. The workarounds are not really practical. I can poke
code all around to make it work, but it's better to do things correctly and officially.
You can use the TCP_KEEPALIVE protocol option as a model.
Can we agree on a way forward here?
Thanks, also for the TCP keepalive specific options. That's a good example.
OK, will do that in the UDP connection instead of the socket
unsigned if_nametoindex(const char *);
char *if_indextoname(unsigned, char *);
struct if_nameindex *if_nameindex(void);
void if_freenameindex(struct if_nameindex *);
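As an illustration of how these calls fit together on a POSIX host, here is a minimal sketch. The helper ifname_roundtrip() is hypothetical; only if_nametoindex(), if_indextoname() and IF_NAMESIZE come from <net/if.h>.

```c
#include <net/if.h>   /* if_nametoindex(), if_indextoname(), IF_NAMESIZE */
#include <stddef.h>

/* Hypothetical helper: resolve a name to its index, then map the index
 * back to a name.  'out' must provide at least IF_NAMESIZE bytes.
 * Returns the interface index, or 0 if no such interface exists.
 */

unsigned int ifname_roundtrip(const char *ifname, char *out)
{
  unsigned int ifindex = if_nametoindex(ifname);

  if (ifindex == 0)
    {
      /* 0 is never a valid index, so it can double as "no interface" */

      return 0;
    }

  if (if_indextoname(ifindex, out) == NULL)
    {
      /* The interface went away between the two calls */

      return 0;
    }

  return ifindex;
}
```

Because 0 is reserved as "no such interface", a connection structure can store the index in an unsigned field and treat 0 as "not bound".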
Will the lookup by index also fail if we delete an interface and then create a new
one? The index would be reassigned and become valid again in this unfortunate case?
It would be better to store the full interface name and look that up when required,
but that would be a pretty large lookup overhead each time we send data.
Can you please push a stub and declaration for if_nametoindex in the bindtodev branch?
I see that it will collide names with sim/src/up_tapdev.c line 226...
a7c1394d broke netdev_register: net_unlock() at line 400 finds g_count equal to zero and asserts.
Probably because a lock() is missing in the new interface handling functions.
Sebastien
I meant commit 7c1394d8
Sebastien
Found it; the fix will be included in the incoming patch.
Meanwhile:
Interface is going UP
udp_setsockopt: UDP_BINDTODEVICE: conn 20005ce8 interface eth0, index 1
Starting DHCP request
psock_udp_sendto: conn 20005ce8: Retrieving interface via BOUNDTODEVICE option (index 1 -> dev 0)
psock_udp_sendto: ERROR: udp_find_raddr_device failed
DHCP request failed: -1 errno 114
We're close, but netdev_findbyindex does not return eth0 for index 1. Looking into this now.
Sebastien
IT WORKS!
Interface is going UP
udp_setsockopt: UDP_BINDTODEVICE: conn 20005ce8 interface eth0, index 1
Starting DHCP request
psock_udp_sendto: conn 20005ce8: Retrieving interface via BOUNDTODEVICE option (index 1 -> dev 20000ff8)
*** Launching nsh
NuttShell (NSH)
nsh> psock_udp_sendto: conn 20005ce8: Retrieving interface via BOUNDTODEVICE option (index 1 -> dev 20000ff8)
IP addr: 192.168.0.19
Net mask: 255.255.255.0
Default router: 192.168.0.254
Please find patch attached
-fixes two typos introduced by commit 7c1394d8
-define and use UDP_BINDTODEVICE option in UDP psock_send_*
I have left two generic _info() calls so you can easily test the
option was activated and used. You can remove them.
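For readers without the attachment, the idea can be modelled in a few lines. This is a simplified stand-alone sketch, not the patch itself: every model_* name is hypothetical, the index 2 for tun0 is an assumption (the log only shows eth0 as index 1), and only the "broadcast with no local address" case is covered.

```c
#include <stddef.h>
#include <string.h>

struct model_netdev
{
  const char *ifname;     /* Interface name */
  unsigned int ifindex;   /* One-based index; 0 means "no interface" */
  int up;                 /* Non-zero when the interface is UP */
};

struct model_udp_conn
{
  unsigned int boundto;   /* Index stored by the option; 0 = not bound */
};

/* Two devices, with tun0 first in the list as in the failing trace */

static struct model_netdev g_model_devs[] =
{
  { "tun0", 2, 0 },
  { "eth0", 1, 1 },
};

#define MODEL_NDEVS (sizeof(g_model_devs) / sizeof(g_model_devs[0]))

/* Stand-in for setting the option: remember the index for this name */

int model_bindtodevice(struct model_udp_conn *conn, const char *ifname)
{
  size_t i;

  for (i = 0; i < MODEL_NDEVS; i++)
    {
      if (strcmp(g_model_devs[i].ifname, ifname) == 0)
        {
          conn->boundto = g_model_devs[i].ifindex;
          return 0;
        }
    }

  return -1;   /* Unknown interface name */
}

/* Stand-in for device selection in the send path */

struct model_netdev *model_select_device(const struct model_udp_conn *conn)
{
  size_t i;

  if (conn->boundto == 0)
    {
      return &g_model_devs[0];   /* Unbound: arbitrary first device */
    }

  for (i = 0; i < MODEL_NDEVS; i++)
    {
      if (g_model_devs[i].ifindex == conn->boundto)
        {
          return &g_model_devs[i];
        }
    }

  return NULL;   /* Bound index no longer matches any device */
}
```

An unbound connection resolves to tun0 in this model; after binding to "eth0" the same lookup returns the device with index 1, as in the "index 1 -> dev 20000ff8" line of the log.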
Sebastien
I don't remember, sorry a million times...
I did that to (try to) match what Linux does.
But I think I was confused because the man 7 socket page talks about reception, not transmission. Sorry about that :(
Thanks for your help.
Sebastien
There are a few things that I disagree with in this patch. I will incorporate the change. For example, as we discussed before, the sendto address must take precedence over the bound device address. Didn't we agree on that? There is another change that looks odd to me, but I will need to study the code more first.
Hello
Your changes are good. I had chosen -1 thinking that interface 0 was valid, which it is not; 0 is a better "not enabled" value.
But it does not work. The interface is bound, but DHCP still
fails because the interface is resolved to null.
See this patch against current master. You have discarded this important change: interface indexes are ONE-based, but this function assumed ZERO-based.
Also, I don't agree (but it does not matter): I insist that the bound device should be tried before the search by remote address. If we don't do that, bindtodevice has no effect except when broadcasting with the socket bound to INADDR_ANY.
The point of this option is to force the device, whatever the source/destination address would select. This is an override.
But it seems that you don't want this override. That's fine with me; the important use case still works.
Hello,
Please see my replies below.
Hi, Sebastien,
See this patch against current master. You have discarded this important change: itf indexes are ONE based, but this function assumed ZERO based.
I still don't understand it. The index should be one-based everywhere in the system and converted to zero-based only just before accessing the set. Which function is assuming zero-based?
Also I dont agree (but it does not matter): I insist that the bound device should be tried before the search by remote address.If we dont do that, the effect of bindtodevice is useless except when broadcasting and the socket is bound to INADDR_ANY.
The point of this option is to force the device whatever the source/destination address would select. This is an override.
But it seems that you dont want this override. That's fine for me, the important use case still works.
Inconsistent implementations are what we get when we go with non-standard, kludge interfaces. I think there will be inconsistencies no matter what you do.
The sendto() syntax permits UDP to send datagrams to any available network through the device serving that network. If you send the datagram through the wrong device, then you have screwed up. Other than your particular usage, I can't see any general use for this socket option.
I prefer to keep it out of the way of the address-based packet flow.
Greg
The bit test in g_devset requires a zero based index, otherwise bit 0 will never be used (since interface indexes start at 1)
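A small sketch of that conversion, assuming the device set is a single 32-bit word (the toy_* names and MAX_IFINDEX are illustrative, not the NuttX symbols):

```c
#include <stdint.h>

#define MAX_IFINDEX 32   /* Assumption: one 32-bit word of device bits */

static uint32_t g_toy_devset;   /* Bit n set => interface index n + 1 in use */

/* Returns 1 if the one-based ifindex is in use, 0 otherwise */

int toy_index_in_use(unsigned int ifindex)
{
  if (ifindex < 1 || ifindex > MAX_IFINDEX)
    {
      return 0;   /* 0 is reserved for "no interface" */
    }

  /* Convert to a zero-based bit number only at the point of access */

  return (g_toy_devset & (UINT32_C(1) << (ifindex - 1))) != 0;
}

/* Allocates the lowest free one-based index, or 0 if the set is full */

unsigned int toy_index_alloc(void)
{
  unsigned int bit;

  for (bit = 0; bit < MAX_IFINDEX; bit++)
    {
      if ((g_toy_devset & (UINT32_C(1) << bit)) == 0)
        {
          g_toy_devset |= UINT32_C(1) << bit;
          return bit + 1;
        }
    }

  return 0;
}
```

Testing bit ifindex instead of bit (ifindex - 1) would leave bit 0 unused and would look at the wrong bit for every interface.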