On Thu, 2020-10-22 at 15:51 +0200, Guillaume Betous wrote:
Hello everyone, I'm working on a connected device, and we are producing hundred/thousand of them. The system is based on Debian 9 on armhf. We use NetworkManager to allow the device to automatically connect on WiFi. Usually the client AP is down, and, for maintenance reason, the client can open the dedicated SSID and our device is connecting, then offers an HTTP interface. It works perfectly 99% of time. But sometimes, on a brand newly flashed device, it can't connect to the specified WiFi. Every time I saw that, I used an UART connection to manually log, and a simple "nmcli conn up <conn_name>" makes the client connecting immediately, so it seems that my configuration file is correct. After I manually connected, rebooting the device makes it reconnect every time, as expected. I was not able to systematically investigate on every failing device but I could saw on journalctl the following logs (MAC and SSID are anonymized) sept. 14 15:02:20 2004000276 NetworkManager[589]: <info> [1536930140.1218] Config: added 'ssid' value 'xxxxxx' sept. 14 15:02:20 2004000276 NetworkManager[589]: <info> [1536930140.1237] Config: added 'scan_ssid' value '1' sept. 14 15:02:20 2004000276 NetworkManager[589]: <info> [1536930140.1286] Config: added 'key_mgmt' value 'WPA-PSK' sept. 14 15:02:20 2004000276 NetworkManager[589]: <info> [1536930140.1288] Config: added 'psk' value '<hidden>' sept. 14 15:02:20 2004000276 NetworkManager[589]: <info> [1536930140.2031] device (wlan0): supplicant interface state: inactive -> scanning sept. 14 15:02:45 2004000276 NetworkManager[589]: <warn> [1536930165.4771] device (wlan0): Activation: (wifi) association took too long, failing activation sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.4772] device (wlan0): state change: config -> failed (reason 'ssid-not-found') [50 120 53] sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.4784] manager: NetworkManager state is now DISCONNECTED sept. 14 15:02:45 2004000276 NetworkManager[589]: <warn> [1536930165.4977] device (wlan0): Activation: failed for connection 'xxxxxx' sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.5109] device (wlan0): supplicant interface state: scanning -> disconnected sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.5375] device (wlan0): state change: failed -> disconnected (reason 'none') [120 30 0] sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.5546] device (wlan0): set-hw-addr: set MAC address to yy:yy:yy:D4:0F:70 (scanning) sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.5835] device (wlan0): supplicant interface state: disconnected -> disabled sept. 14 15:02:45 2004000276 NetworkManager[589]: <info> [1536930165.5865] device (wlan0): supplicant interface state: disabled -> disconnected It seems to me that for some reason (maybe my AP, maybe the WiFi component..) the connection is having a timeout, but after that NetworkManager is not retrying. - Is there a kind of blacklist of non functional connections/APs ? - Is there a timeout which after expiration NetworkManager will reconnect ? - Any advice to improve connection reliability ? Thank you very much for your advice,
Hi, It would be better to look at the full debug logs. Enable level=TRACE for NetworkManager (seee [1]) and pass "-ddd" to wpa_supplicant. [1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 If the password is wrong, then autoactivation gets blocked indefinitely (or until a new secret agent connects who could provide the password). That is a problem, particularly because sometimes wpa_supplicant (wrongly) thinks that the PSK is wrong. Optimally, those cases where wpa_supplicant makes a wrong guess should be fixed. But realistically, it would need a workaround in NetworkManager. Maybe the profile should have a property "always-assume-the-secret-is-correct". Or maybe the blocking should only be for a certain time (e.g. 5 minutes). Dunno. But first it requires to understand why it doesn't autoconnect anymore. Please check the logs. best, Thomas
Attachment:
signature.asc
Description: This is a digitally signed message part