Re: What happens after a timeout during connection ?



On Thu, 2020-10-22 at 15:51 +0200, Guillaume Betous wrote:
Hello everyone,

I'm working on a connected device, and we are producing
hundred/thousand of them. The system is based on Debian 9 on armhf.
We use NetworkManager to 
allow the device to automatically connect on WiFi. Usually the client
AP is down, and, for maintenance reason, the client can open the
dedicated SSID 
and our device is connecting, then offers an HTTP interface.

It works perfectly 99% of time.

But sometimes, on a brand newly flashed device, it can't connect to
the specified WiFi.

Every time I saw that, I used an UART connection to manually log, and
a simple "nmcli conn up <conn_name>" makes the client connecting
immediately, so 
it seems that my configuration file is correct. After I manually
connected, rebooting the device makes it reconnect every time, as
expected.

I was not able to systematically investigate on every failing device
but I could saw on journalctl the following logs (MAC and SSID are
anonymized)

sept. 14 15:02:20 2004000276 NetworkManager[589]:
<info>  [1536930140.1218] Config: added 'ssid' value 'xxxxxx'
sept. 14 15:02:20 2004000276 NetworkManager[589]:
<info>  [1536930140.1237] Config: added 'scan_ssid' value '1'
sept. 14 15:02:20 2004000276 NetworkManager[589]:
<info>  [1536930140.1286] Config: added 'key_mgmt' value 'WPA-PSK'
sept. 14 15:02:20 2004000276 NetworkManager[589]:
<info>  [1536930140.1288] Config: added 'psk' value '<hidden>'
sept. 14 15:02:20 2004000276 NetworkManager[589]:
<info>  [1536930140.2031] device (wlan0): supplicant interface state:
inactive -> scanning
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<warn>  [1536930165.4771] device (wlan0): Activation: (wifi)
association took too long, failing 
activation
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.4772] device (wlan0): state change: config ->
failed (reason 'ssid-not-found') 
[50 120 53]
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.4784] manager: NetworkManager state is now
DISCONNECTED
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<warn>  [1536930165.4977] device (wlan0): Activation: failed for
connection 'xxxxxx'
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.5109] device (wlan0): supplicant interface state:
scanning -> disconnected
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.5375] device (wlan0): state change: failed ->
disconnected (reason 'none') [120 
30 0]
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.5546] device (wlan0): set-hw-addr: set MAC
address to yy:yy:yy:D4:0F:70 (scanning)
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.5835] device (wlan0): supplicant interface state:
disconnected -> disabled
sept. 14 15:02:45 2004000276 NetworkManager[589]:
<info>  [1536930165.5865] device (wlan0): supplicant interface state:
disabled -> disconnected

It seems to me that for some reason (maybe my AP, maybe the WiFi
component..) the connection is having a timeout, but after that
NetworkManager is not 
retrying.

- Is there a kind of blacklist of non functional connections/APs ?
- Is there a timeout which after expiration NetworkManager will
reconnect ?
- Any advice to improve connection reliability ?

Thank you very much for your advice,



Hi,


It would be better to look at the full debug logs.

Enable level=TRACE for NetworkManager (seee [1]) and pass "-ddd" to
wpa_supplicant.

[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28


If the password is wrong, then autoactivation gets blocked indefinitely (or
until a new secret agent connects who could provide the password). That is a problem,
particularly because sometimes wpa_supplicant (wrongly) thinks that the PSK
is wrong. Optimally, those cases where wpa_supplicant makes a wrong guess should be fixed.
But realistically, it would need a workaround in NetworkManager. Maybe the profile
should have a property "always-assume-the-secret-is-correct". Or maybe the blocking should
only be for a certain time (e.g. 5 minutes). Dunno.

But first it requires to understand why it doesn't autoconnect anymore. Please check the logs.


best,
Thomas

Attachment: signature.asc
Description: This is a digitally signed message part



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]