Re: supplicant interface scan state tracking



On Tue, 2009-07-07 at 12:15 -0400, Dan Williams wrote:
> On Mon, 2009-07-06 at 22:16 +0100, Daniel Drake wrote:
> > Hi Dan,
> > 
> > I'm finding it quite easy to reproduce a bug related to
> > nm_supplicant_interface_get_scanning()
> > but I'm not sure how to fix it.
> > 
> > The logic implemented in my OLPC mesh device so far is that if the
> > companion device is scanning, it postpones stage2 until the scanning has
> > finished.
> > 
> > It does this by monitoring a new "scanning" property on NMDeviceWifi
> > which is implemented based on nm_supplicant_interface_get_scanning().
> > http://dev.laptop.org/git/users/dsd/NetworkManager/commit/?h=olpc&id=111baac88318f4db467360fd9703f37ac0449023
> > 
> > Also, if a connection on eth0 is active when you activate a msh0
> > connection, msh0 moves eth0 into NM_DEVICE_STATE_DISCONNECTED and
> > disables autoconnect (via the mechanism in the patch I emailed earlier).
> > 
> > Anyway, I can now easily reproduce the following sequence of events
> > causing nm_supplicant_interface_get_scanning() to be less than truthful
> > and cause a deadlock:
> > 
> > to start with, an eth0 connection is activating:
> > 
> > <info>  Activation (eth0) Stage 2 of 5 (Device Configure) complete.
> > <info>  Config: set interface ap_scan to 1
> > 
> > at this point, inside NMDeviceOlpcMeshPrivate, "scanning" is TRUE and
> > con_state = SCANNING (I know this through some debug messages)
> > 
> > <info>  (eth0): supplicant connection state:  disconnected -> scanning
> > 
> > but I interrupt it here by starting a mesh connection
> > 
> > <info>  Activation (msh0) starting connection 'olpc-mesh-1'
> > <info>  (msh0): device state change: 3 -> 4 (reason 0)
> > <info>  Activation (msh0) Stage 1 of 5 (Device Prepare) scheduled...
> > <info>  Activation (msh0) Stage 1 of 5 (Device Prepare) started...
> > 
> > msh0 now disconnects eth0
> > 
> > <info>  (eth0): device state change: 5 -> 3 (reason 2)
> > <info>  (eth0): deactivating device (reason: 2).
> > <info>  Activation (msh0) Stage 1 of 5 (Device Prepare) complete.
> > 
> > At this point, another dbus signal comes in from wpa_supplicant so
> > "scanning" moves to FALSE. This wakes up msh0 device which calls
> > nm_supplicant_interface_get_scanning() to figure out the new state, but
> > this returns TRUE because con_state is still SCANNING, so msh0 does not
> > continue the connection process and everything stops.
> 
> It could be a signal ordering issue.  The code in
> nm-supplicant-interface.c is:
> 
> 	if (priv->scanning)
> 		return TRUE;
> 	if (priv->con_state == NM_SUPPLICANT_INTERFACE_CON_STATE_SCANNING)
> 		return TRUE;
> 
> So only if both of these are FALSE will
> nm_supplicant_interface_get_scanning() return FALSE.  Each of these
> variables is set independently: priv->scanning is based on the
> supplicant's 'scanning' property, and priv->con_state is based on the
> 'state' property.
> 
> So what you're probably getting into here is that the 'scanning'
> property is being set to FALSE in the supplicant (and D-Bus listeners
> are notified too) before the State property is being updated.
> 
> The way to fix that is to make NMSupplicantInterface's scanning property
> consistent with nm_supplicant_interface_get_scanning() by only flipping
> it to FALSE when both the supplicant's State and Scanning properties
> indicate scanning is not happening.
> 
> So you'd add a private member to the NMSupplicantInterfacePrivate struct
> for 'gboolean supplicant_scanning' and replace all usage of
> priv->scanning with priv->supplicant_scanning.

Well, not all usage...  get_property() and set_property() should use
priv->scanning.  What we're trying to do here is split priv->scanning
into two parts: priv->scanning would be the object property that
represents the combination of the supplicant's scanning and state
properties, while priv->supplicant_scanning would be a direct mirror of
the supplicant's scanning property for internal NMSupplicantInterface use
only.
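
To make the split concrete, here is a rough standalone sketch in plain C
(no GLib), not the actual nm-supplicant-interface.c code.  The Iface
struct, the ConState enum, the iface_* function names and the
notify_count field are all made up for illustration; notify_count just
stands in for g_object_notify() being emitted on "scanning".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real NM types (assumption, not NM code). */
typedef enum {
	CON_STATE_DISCONNECTED,
	CON_STATE_SCANNING,
	CON_STATE_COMPLETED
} ConState;

typedef struct {
	bool scanning;            /* exported property: combined view */
	bool supplicant_scanning; /* raw mirror of the supplicant's scanning property */
	ConState con_state;       /* mirror of the supplicant's state property */
	int notify_count;         /* stands in for g_object_notify() emissions */
} Iface;

/* Recompute the combined property; notify only when it actually changes. */
static void
iface_check_scanning (Iface *iface)
{
	bool new_scanning = iface->supplicant_scanning
	                    || iface->con_state == CON_STATE_SCANNING;

	if (new_scanning != iface->scanning) {
		iface->scanning = new_scanning;
		iface->notify_count++;
		printf ("notify: scanning=%d\n", iface->scanning);
	}

	/* Invariant: the exported value always matches both inputs. */
	assert (iface->scanning == (iface->supplicant_scanning
	                            || iface->con_state == CON_STATE_SCANNING));
}

/* Called when the supplicant's scanning property changes. */
static void
iface_handle_scanning (Iface *iface, bool scanning)
{
	iface->supplicant_scanning = scanning;
	iface_check_scanning (iface);
}

/* Called when the supplicant's state property changes. */
static void
iface_handle_state (Iface *iface, ConState state)
{
	iface->con_state = state;
	iface_check_scanning (iface);
}

/* The getter just returns the combined property. */
static bool
iface_get_scanning (const Iface *iface)
{
	return iface->scanning;
}
```

With this shape, if the supplicant's scanning property drops to FALSE
while the state is still SCANNING, no notification is emitted, so a
listener is not woken up only to read back TRUE from the getter.  The
notification is emitted once the state leaves SCANNING as well.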

Dan

> Then you'd create a new function called wpas_iface_check_scanning() that
> would basically do this:
> 
> static void
> wpas_iface_check_scanning (NMSupplicantInterface *self)
> {
> 	NMSupplicantInterfacePrivate *priv = NM_SUPPLICANT_INTERFACE_GET_PRIVATE (self);
> 	gboolean new_scanning = FALSE;
> 
> 	if (   priv->supplicant_scanning
> 	    || priv->con_state == NM_SUPPLICANT_INTERFACE_CON_STATE_SCANNING)
> 		new_scanning = TRUE;
> 
> 	if (new_scanning != priv->scanning) {
> 		priv->scanning = new_scanning;
> 		g_object_notify (G_OBJECT (self), "scanning");
> 	}
> }
> 
> and you'd call that function from wherever priv->supplicant_scanning and
> priv->con_state got changed (iface_scanning_cb,
> wpas_iface_handle_scanning, wpas_iface_handle_state_change,
> iface_state_cb).
> 
> That should do the trick.
> 
> Dan
> 
> > What confuses me a little here is that the supplicant is still alive and
> > running, even though there aren't any active connections. It did also
> > manage to raise a dbus signal indicating the termination of the scan
> > *after* NM sent the disconnection request, but it did not manage to
> > communicate any change in con_state. Also I cannot connect to it with
> > wpa_cli to see if it is still in SCANNING state.
> > 
> > Thoughts?
> > 
> > Thanks,
> > Daniel
> > 
> > 
> 
> _______________________________________________
> NetworkManager-list mailing list
> NetworkManager-list gnome org
> http://mail.gnome.org/mailman/listinfo/networkmanager-list


