Re: BUG: NM only connects the third time



On Fri, Nov 02, 2007 at 07:19:05AM -0400, Dan Williams wrote:
> NM ignores link changes during device activation.  This is for a number
> of reasons.  The first is driver variance, and because during the
> activation, the device may bounce up and down due to dhclient starting
> up, or whatever.  Drivers are quite finicky; rtl8189 are quite notorious
> for link bouncing.
> 
> This may have gotten better as drivers get better over time, and may no
> longer be the appropriate behavior.  At a minimum, a link timer needs to
> be started that would wait 5 seconds and smooth out link state before
> terminating the activation.

Ah yes, I understand that. It makes sense not to interrupt an ongoing
connection, just in case the link information is wrong. But I am still
confused by the logs: after the line "Activation (eth0): cancelled",
dhclient is still running. Is that expected? Does NM announce that it is
giving up but go on trying just in case? And this dhclient is still
running while NM is connecting through wlan0, and even after it has
managed to connect through wlan0!

Maybe that is the expected behaviour (just run dhclient on both
interfaces and see which one wins), but then the messages in the logs
are confusing, and NM should not announce that it cancelled eth0.

> > 11:06:53 NM: <info>  Activation (eth0) Beginning DHCP transaction.
> > 11:06:53 NM: <info>  DHCP daemon state is now 12 (successfully started) for interface eth0
> > 11:06:53 NM: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
> > 11:06:53 NM: <info>  nm-device-802-3-ethernet.c - link_deactivated_helper (129) device eth0 will set active link to FALSE
> > 11:06:53 NM: <info>  nm-device.c - nm_device_set_active_link (596) device eth0 link state set to 0
> > 11:06:53 NM: <info>  SWITCH: terminating current connection 'eth0' because it's no longer valid.
> > 11:06:53 NM: <info>  Deactivating device eth0.
> > 11:06:53 NM: <info>  Activation (eth0): cancelling...
> > 11:06:53 NM: <info>  Activation (eth0) cancellation handler scheduled...
> > 11:06:53 NM: <info>  Activation (eth0): waiting for device to cancel activation.
> > 11:06:54 NM: <info>  Activation (eth0) cancellation handled.
> > 11:06:54 NM: <info>  Activation (eth0): cancelled.
> > 11:06:54 NM: nm_device_is_802_3_ethernet: assertion `dev != NULL' failed
> > 11:06:54 NM: nm_device_is_802_11_wireless: assertion `dev != NULL' failed
> > 11:06:54 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
> > 11:07:00 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10
> > 11:07:10 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
> > 11:07:23 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 21
> > 11:07:30 NM: <info>  Updating allowed wireless network lists.
> > 
> > 	[... from here, NM switches to wlan0, launches a dhclient on the wifi and gets my connection, but one can still find in the logs: ...]
> > 
> > 11:07:38 NM: <info>  Activation (wlan0) successful, device activated.
> > 11:07:44 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> > 11:07:52 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
> > 11:07:55 dhclient: No DHCPOFFERS received.

But again, it is not very important.
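
For what it's worth, the link timer you describe above could be sketched
roughly like this in Python. This is only an illustration of the idea,
not NM code: the LinkDebouncer class and its method names are made up,
and the 5-second settle time is simply the value from your mail.
Timestamps are passed in explicitly so that the logic can be followed
without a real clock.

```python
from dataclasses import dataclass
from typing import Optional

SETTLE_SECONDS = 5.0  # the 5-second wait suggested in the quoted mail


@dataclass
class LinkDebouncer:
    """Hypothetical helper: report carrier loss only once it persists."""

    settle: float = SETTLE_SECONDS
    down_since: Optional[float] = None  # when the link was first seen down

    def on_link_event(self, link_up: bool, now: float) -> None:
        """Record a raw link up/down event reported by the driver."""
        if link_up:
            # The link bounced back: forget the pending timer.
            self.down_since = None
        elif self.down_since is None:
            # First "down" of this episode: start the timer.
            self.down_since = now

    def should_cancel(self, now: float) -> bool:
        """True once the link has stayed down for the whole settle period."""
        return (self.down_since is not None
                and now - self.down_since >= self.settle)
```

With this, a link that drops and comes back within a second (as in the
log above, right after dhclient starts) never causes a cancellation;
only a link that stays down for the full 5 seconds does.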

One last thing, and then I'll stop bothering you: a usability request.

As I understand it, NM keeps a memory of the Access Points that the wifi
card found around, and it doesn't drop them at once when the card stops
reporting them, just in case they reappear.

For this reason, it makes sense to drop the connection before suspend
and reestablish it after resume, because otherwise NM would stay
associated with the AP it had before suspending until it realized the AP
was no longer around. That could take about 30 seconds, during which
there is no valid connection.

However, because of the suspend/wake mechanism, NM has to reestablish a
connection after each resume, which takes around 5 seconds. That's 5
seconds guaranteed without a connection, to avoid the possibility of
30 seconds without one if the laptop moved during the suspend. I
understand the logic, even if I am not sure it is a good deal.

[I hope I have understood the problem correctly and am not talking rubbish.]

I was wondering if there was a way to get the best of both worlds: NM's
suspend method could be made a no-op, so that the connection is not
shut down at suspend. At resume time, when NM receives the wake call,
it could immediately scan for the networks around and try to determine,
by comparing the previous list with the new one, whether the laptop
moved during the suspend. If it moved, get a new connection. If it
didn't, keep the existing good connection.

This way, a laptop that moved during the suspend gets a 5-second delay
without connectivity on resume (the time NM takes to connect to a new
AP), and a laptop that didn't move has a working connection immediately.
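
To make the comparison step concrete, here is a rough sketch in Python.
It is not based on NM's code: the laptop_moved function, the use of
BSSID strings to identify APs, and the 0.5 overlap threshold are all
assumptions of mine, chosen only to illustrate the idea.

```python
from typing import Iterable


def laptop_moved(before: Iterable[str], after: Iterable[str],
                 current_ap: str, min_overlap: float = 0.5) -> bool:
    """Guess whether the laptop moved while suspended.

    before / after: BSSIDs seen just before suspend and just after resume.
    current_ap:     BSSID the card was associated with before suspend.
    min_overlap:    hypothetical threshold on the share of APs in common.
    """
    before_set, after_set = set(before), set(after)

    # If the AP we were using is gone, the old connection is useless.
    if current_ap not in after_set:
        return True

    # Otherwise compare the two scan lists (shared APs / all APs seen).
    # The union cannot be empty here, since it contains current_ap.
    union = before_set | after_set
    overlap = len(before_set & after_set) / len(union)
    return overlap < min_overlap
```

On resume, NM would then keep the existing connection when this returns
False and start a fresh activation when it returns True. The threshold
would certainly need tuning, since neighbouring APs come and go even
when the laptop stays put.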

Does that make sense?

	Éric

