On Fri, Aug 09, 2019 at 05:03:34PM +0200, Vincent Lefevre wrote:
> On 2019-08-09 16:05:11 +0200, Beniamino Galvani wrote:
> > In the traces I see that there are 3 servers, and one of them advertises a different subnet from the other two. This setup makes the behavior non-deterministic, because clients can get an address either in the 10.0.1.0/24 or in the 140.77.12.0/23 network. Do you know if the network is configured this way on purpose?
>
> I think so, as there are 2 kinds of machines: those that are supposed to have a fixed IP address on the main network, and the others, which will be on a secondary network. My machine is in the former class. I don't know how such machines are supposed to be identified (probably with a weak identification), but I can see my machine name in the DHCP Discover and DHCP Request packets.
OK, then this mechanism is not working, since your machine receives offers for both networks and takes one at random. I don't know how common this setup is, but the RFC seems to warn against it:

   Once the network address and lease have been determined, the server constructs a DHCPOFFER message with the offered configuration parameters. It is important for all DHCP servers to return the same parameters (with the possible exception of a newly allocated network address) to ensure predictable client behavior regardless of which server the client selects.
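The inconsistency the RFC warns about can be checked mechanically: map each offered address to its subnet and see whether all servers agree. A small Python sketch; the helper names are hypothetical, and the offer tuples are illustrative only (the 140.77.x values follow addresses mentioned in this thread, while 10.0.1.50 is a made-up address from the 10.0.1.0/24 pool).

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple

# (server identifier, offered address, prefix length of the offered subnet)
Offer = Tuple[str, str, int]

def offered_networks(offers: Iterable[Offer]) -> Dict[ipaddress.IPv4Network, List[str]]:
    """Group the servers by the network their offered address belongs to."""
    nets: Dict[ipaddress.IPv4Network, List[str]] = {}
    for server, addr, prefix in offers:
        net = ipaddress.IPv4Interface(f"{addr}/{prefix}").network
        nets.setdefault(net, []).append(server)
    return nets

def offers_consistent(offers: Iterable[Offer]) -> bool:
    """True when every server places the client on the same network."""
    return len(offered_networks(offers)) <= 1

# Illustrative offers (hypothetical, see the note above).
offers = [
    ("140.77.1.11", "140.77.13.17", 23),
    ("140.77.1.12", "140.77.13.17", 23),
    ("10.0.1.1", "10.0.1.50", 24),
]
print(offers_consistent(offers))  # False
for net, servers in offered_networks(offers).items():
    print(net, servers)
```

With two servers on 140.77.12.0/23 and one on 10.0.1.0/24, the result depends on which offer the client happens to pick, which is exactly the unpredictability the RFC describes.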
> > Looking at dhcp-int-failure.pcap, there is an offer from 140.77.1.11:
> > [...]
> > to which the internal client replies with a request. Note the server-id set to 140.77.1.11:
> > [...]
> > The DHCP server at 10.0.1.1 NAKs the request even though the request carries a different server-id; I don't think this is correct:
> > [...]
>
> RFC 2131 says: "If a server receives a DHCPREQUEST message with an invalid 'requested IP address', the server SHOULD respond to the client with a DHCPNAK message and may choose to report the problem to the system administrator." So this seems correct. Note that it does not say that the server must check the server-id, and the fact that it says "a server" instead of "the server" makes me think that this is how it works. BTW, if the server implicitly needs to check the server-id, why doesn't the internal client do the same with the DHCPNAK response?
This part of RFC 2131 is quite clear in my opinion:

   3. The client receives one or more DHCPOFFER messages from one or more servers. The client may choose to wait for multiple responses. The client chooses one server from which to request configuration parameters, based on the configuration parameters offered in the DHCPOFFER messages. The client broadcasts a DHCPREQUEST message that MUST include the 'server identifier' option to indicate which server it has selected, and that MAY include other options specifying desired configuration values. [...]

   4. The servers receive the DHCPREQUEST broadcast from the client. Those servers not selected by the DHCPREQUEST message use the message as notification that the client has declined that server's offer. The server selected in the DHCPREQUEST message commits the binding for the client to persistent storage and responds with a DHCPACK message containing the configuration parameters for the requesting client. [...]

So the behavior of server 10.0.1.1 does not seem to be compliant with the RFC.
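To make the two readings concrete, here is a simplified Python sketch of how a single server might decide its answer to a DHCPREQUEST, following my reading of RFC 2131 section 4.3.2. The function and parameter names are made up for illustration; the plain subnet test stands in for the giaddr/subnet-mask logic, and lease storage, relays and the RENEWING/REBINDING states are left out.

```python
import ipaddress
from typing import Optional

def handle_request(my_server_id: str, my_subnet: str,
                   server_id: Optional[str], requested_ip: str,
                   knows_client: bool = True) -> Optional[str]:
    """Return "ACK", "NAK" or None (stay silent) for one DHCPREQUEST."""
    if server_id is not None:
        # SELECTING state: the client answers an OFFER and names the chosen server.
        if server_id != my_server_id:
            return None   # not selected: treat the REQUEST as a decline, send nothing
        return "ACK"      # selected: commit the binding (assumes the offer is still valid)
    # INIT-REBOOT state: no server identifier; the client wants to confirm
    # an address it remembers from a previous lease.
    on_my_subnet = ipaddress.IPv4Address(requested_ip) in ipaddress.IPv4Network(my_subnet)
    if not on_my_subnet:
        return "NAK"      # requested address is on the wrong network
    if not knows_client:
        return None       # no record of this client: remain silent
    return "ACK"          # assumes the remembered address matches our records

# The pcap case: 10.0.1.1 sees a REQUEST that selected 140.77.1.11.
print(handle_request("10.0.1.1", "10.0.1.0/24", "140.77.1.11", "140.77.13.17"))  # None
```

Under this reading, a NAK from 10.0.1.1 to a REQUEST carrying server-id 140.77.1.11 is the non-compliant case, while a NAK to a REQUEST without any server-id (an INIT-REBOOT request, which is presumably what dhclient sends right after it starts) would be allowed.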
> > Also, RFC 2131 says that "If the client receives a DHCPNAK message, the client restarts the configuration process", which is what the internal client does, repeating until an ACK arrives before the NAK or until the timeout expires. dhclient apparently ignores the NAK, but I haven't yet found where in the code this is done, and based on what.
>
> It seems that RFC 2131 has some contradictions in the case of several DHCP servers on several networks. IMHO, the client should be tolerant and ignore a DHCPNAK whose server-id differs from that of the selected server.
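The difference between the RFC's restart-on-NAK rule and the tolerant behavior proposed above can be modeled in a few lines of Python. This is a sketch, not code from the internal client or dhclient: replies are assumed to be reduced to (message type, server identifier) pairs, and the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple

# A reply reduced to (message type, server identifier option), e.g. ("NAK", "10.0.1.1").
Reply = Tuple[str, str]

def request_outcome(replies: Iterable[Reply], selected: str,
                    tolerant: bool) -> Optional[str]:
    """Process the replies to one DHCPREQUEST in arrival order.

    Returns "bound" on an ACK from the selected server, "restart" on a NAK
    the client honors, or None if nothing decisive arrived (timeout).
    """
    for msg_type, server_id in replies:
        if msg_type == "ACK" and server_id == selected:
            return "bound"
        if msg_type == "NAK":
            if tolerant and server_id != selected:
                # Proposed behavior: ignore a NAK from a server we did not select.
                continue
            # RFC 2131: on DHCPNAK the client restarts the configuration process.
            return "restart"
    return None

race = [("NAK", "10.0.1.1"), ("ACK", "140.77.1.11")]
print(request_outcome(race, "140.77.1.11", tolerant=False))  # restart
print(request_outcome(race, "140.77.1.11", tolerant=True))   # bound
```

With the strict rule the outcome depends only on whether the NAK or the ACK arrives first; with the tolerant rule the stray NAK from 10.0.1.1 no longer matters. Such filtering can only apply when the REQUEST carries a server-id, though: an INIT-REBOOT request has no selected server to compare against.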
I checked again and the internal client doesn't do any filtering based on the server-id. In the dhclient log at:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=933930#42

Aug 06 15:48:15 cventin NetworkManager[797]: <info> [1565099295.6037] dhcp4 (enp0s25): dhclient started with pid 1052
Aug 06 15:48:15 cventin dhclient[1052]: DHCPREQUEST for 140.77.13.17 on enp0s25 to 255.255.255.255 port 67
Aug 06 15:48:21 cventin dhclient[1052]: DHCPREQUEST for 140.77.13.17 on enp0s25 to 255.255.255.255 port 67
Aug 06 15:48:21 cventin dhclient[1052]: DHCPNAK from 10.0.1.1
Aug 06 15:48:21 cventin NetworkManager[797]: <info> [1565099301.7956] dhcp4 (enp0s25): state changed unknown -> expire
Aug 06 15:48:21 cventin NetworkManager[797]: <info> [1565099301.8037] dhcp4 (enp0s25): state changed expire -> unknown
Aug 06 15:48:21 cventin dhclient[1052]: DHCPDISCOVER on enp0s25 to 255.255.255.255 port 67 interval 4
Aug 06 15:48:21 cventin dhclient[1052]: DHCPOFFER of 140.77.13.17 from 140.77.1.12
Aug 06 15:48:21 cventin dhclient[1052]: DHCPREQUEST for 140.77.13.17 on enp0s25 to 255.255.255.255 port 67
Aug 06 15:48:21 cventin dhclient[1052]: DHCPACK of 140.77.13.17 from 140.77.1.12

the transaction succeeds because the ACK arrives before any NAK, which is the same thing that happens when the transaction succeeds with the internal client. Could you perhaps capture more logs with dhclient, to see how it handles multiple NAKs?

Thank you
Beniamino