Re: supplicant interface scan state tracking



On Mon, 2009-07-06 at 22:16 +0100, Daniel Drake wrote:
> Hi Dan,
> 
> I'm finding it quite easy to reproduce a bug related to
> nm_supplicant_interface_get_scanning()
> but I'm not sure how to fix it.
> 
> The logic implemented in my OLPC mesh device so far is that if the
> companion device is scanning, it postpones stage2 until the scanning has
> finished.
> 
> It does this by monitoring a new "scanning" property on NMDeviceWifi
> which is implemented based on nm_supplicant_interface_get_scanning().
> http://dev.laptop.org/git/users/dsd/NetworkManager/commit/?h=olpc&id=111baac88318f4db467360fd9703f37ac0449023
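For illustration, the deferral logic described above can be modelled in plain C, without GObject. This is a minimal sketch, not the code from the OLPC branch: `MeshDevice`, `companion_scanning`, `request_stage2()` and `on_scanning_changed()` are hypothetical stand-ins for the mesh device's private state, the companion's "scanning" property, and its notify handler.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the mesh device's private state. */
typedef struct {
    bool stage2_pending;   /* stage2 was requested but postponed */
    bool stage2_started;   /* stage2 has actually been run */
} MeshDevice;

/* Hypothetical stand-in for the companion's "scanning" property. */
static bool companion_scanning = false;

static void start_stage2(MeshDevice *mesh)
{
    mesh->stage2_pending = false;
    mesh->stage2_started = true;
}

/* Called when activation reaches stage2. */
static void request_stage2(MeshDevice *mesh)
{
    if (companion_scanning) {
        /* Companion is busy scanning: postpone until it finishes. */
        mesh->stage2_pending = true;
        return;
    }
    start_stage2(mesh);
}

/* Called whenever the companion's "scanning" property changes. */
static void on_scanning_changed(MeshDevice *mesh)
{
    if (mesh->stage2_pending && !companion_scanning)
        start_stage2(mesh);
}
```

Calling request_stage2() while companion_scanning is true leaves stage2 pending; clearing the flag and calling on_scanning_changed() then starts it. The whole scheme depends on the notification being sent when, and only when, the scanning value the handler reads has actually changed.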
> 
> Also, if a connection on eth0 is active when you activate a msh0
> connection, msh0 moves eth0 into NM_DEVICE_STATE_DISCONNECTED and
> disables autoconnect (via the mechanism in the patch I emailed earlier).
> 
> Anyway, I can now easily reproduce the following sequence of events,
> which causes nm_supplicant_interface_get_scanning() to be less than
> truthful and leads to a deadlock:
> 
> to start with, an eth0 connection is activating:
> 
> <info>  Activation (eth0) Stage 2 of 5 (Device Configure) complete.
> <info>  Config: set interface ap_scan to 1
> 
> at this point, inside NMDeviceOlpcMeshPrivate, "scanning" is TRUE and
> con_state = SCANNING (I know this through some debug messages)
> 
> <info>  (eth0): supplicant connection state:  disconnected -> scanning
> 
> but I interrupt it here by starting a mesh connection
> 
> <info>  Activation (msh0) starting connection 'olpc-mesh-1'
> <info>  (msh0): device state change: 3 -> 4 (reason 0)
> <info>  Activation (msh0) Stage 1 of 5 (Device Prepare) scheduled...
> <info>  Activation (msh0) Stage 1 of 5 (Device Prepare) started...
> 
> msh0 now disconnects eth0
> 
> <info>  (eth0): device state change: 5 -> 3 (reason 2)
> <info>  (eth0): deactivating device (reason: 2).
> <info>  Activation (msh0) Stage 1 of 5 (Device Prepare) complete.
> 
> At this point, another dbus signal comes in from wpa_supplicant so
> "scanning" moves to FALSE. This wakes up msh0 device which calls
> nm_supplicant_interface_get_scanning() to figure out the new state, but
> this returns TRUE because con_state is still SCANNING, so msh0 does not
> continue the connection process and everything stops.

It could be a signal ordering issue.  The code in
nm-supplicant-interface.c is:

	if (priv->scanning)
		return TRUE;
	if (priv->con_state == NM_SUPPLICANT_INTERFACE_CON_STATE_SCANNING)
		return TRUE;

So nm_supplicant_interface_get_scanning() returns FALSE only if both of
these are FALSE.  Each of these variables is set independently:
priv->scanning tracks the supplicant's 'scanning' property, and
priv->con_state tracks its 'state' property.

So what you're probably running into here is that the 'scanning'
property is being set to FALSE in the supplicant (and D-Bus listeners
are notified) before the State property is updated.
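To make the ordering problem concrete, here is a plain-C model of the current behaviour. It is a sketch under stated assumptions: there is no GObject or D-Bus, the con_state enum is cut down to two values, and `Iface`, `handle_scanning_property()`, `handle_state_property()` and the listener are hypothetical stand-ins. The point it shows is that only a Scanning change emits a notification, so the later State change that actually ends the scan goes unnoticed.

```c
#include <stdbool.h>

/* Cut-down con_state values; the real enum has more members. */
typedef enum {
    CON_STATE_DISCONNECTED,
    CON_STATE_SCANNING
} ConState;

typedef struct {
    bool scanning;       /* mirrors the supplicant's Scanning property */
    ConState con_state;  /* mirrors the supplicant's State property */
} Iface;

/* Same decision as nm_supplicant_interface_get_scanning(). */
static bool iface_get_scanning(const Iface *iface)
{
    if (iface->scanning)
        return true;
    if (iface->con_state == CON_STATE_SCANNING)
        return true;
    return false;
}

/* Hypothetical listener: counts wakeups, records if it saw scanning end. */
static int notify_count = 0;
static bool listener_saw_idle = false;

static void on_scanning_notify(const Iface *iface)
{
    notify_count++;
    if (!iface_get_scanning(iface))
        listener_saw_idle = true;
}

/* Current behaviour: notify only when the Scanning property changes. */
static void handle_scanning_property(Iface *iface, bool value)
{
    if (iface->scanning != value) {
        iface->scanning = value;
        on_scanning_notify(iface);
    }
}

/* Current behaviour: a State change updates con_state but notifies nobody. */
static void handle_state_property(Iface *iface, ConState value)
{
    iface->con_state = value;
}
```

Starting from { true, CON_STATE_SCANNING }, a Scanning=FALSE signal wakes the listener while con_state is still SCANNING, so iface_get_scanning() returns true and the listener keeps waiting. The following State change makes iface_get_scanning() false but emits nothing, which matches the stall described above.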

The way to fix that is to make NMSupplicantInterface's scanning property
consistent with nm_supplicant_interface_get_scanning() by only flipping
it to FALSE when both the supplicant's State and Scanning properties
indicate scanning is not happening.

So you'd add a private member to the NMSupplicantInterfacePrivate struct
for 'gboolean supplicant_scanning' and replace all usage of
priv->scanning with priv->supplicant_scanning.

Then you'd create a new function called wpas_iface_check_scanning() that
would basically do this:

static void
wpas_iface_check_scanning (NMSupplicantInterface *self)
{
	NMSupplicantInterfacePrivate *priv = NM_SUPPLICANT_INTERFACE_GET_PRIVATE (self);
	gboolean new_scanning = FALSE;

	/* Scanning if either the supplicant's Scanning property or its State says so */
	if (   priv->supplicant_scanning
	    || priv->con_state == NM_SUPPLICANT_INTERFACE_CON_STATE_SCANNING)
		new_scanning = TRUE;

	/* Only notify listeners when the combined value actually changes */
	if (new_scanning != priv->scanning) {
		priv->scanning = new_scanning;
		g_object_notify (G_OBJECT (self), "scanning");
	}
}

and you'd call that function from wherever priv->supplicant_scanning or
priv->con_state is changed (iface_scanning_cb,
wpas_iface_handle_scanning, wpas_iface_handle_state_change,
iface_state_cb).
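The same plain-C model with this fix applied might look like the following. It is only a sketch: `notify_count` stands in for g_object_notify (G_OBJECT (self), "scanning"), and `Iface`, `check_scanning()`, `handle_scanning()` and `handle_state_change()` are hypothetical names mirroring the functions mentioned above, not NetworkManager API.

```c
#include <stdbool.h>

/* Cut-down con_state values; the real enum has more members. */
typedef enum {
    CON_STATE_DISCONNECTED,
    CON_STATE_SCANNING
} ConState;

typedef struct {
    bool supplicant_scanning;  /* raw Scanning property from the supplicant */
    ConState con_state;        /* raw State property from the supplicant */
    bool scanning;             /* combined value exposed as "scanning" */
    int notify_count;          /* stands in for g_object_notify() calls */
} Iface;

/* Recompute the combined flag; notify only when it changes. */
static void check_scanning(Iface *iface)
{
    bool new_scanning = iface->supplicant_scanning
                        || iface->con_state == CON_STATE_SCANNING;

    if (new_scanning != iface->scanning) {
        iface->scanning = new_scanning;
        iface->notify_count++;
    }
}

/* Both handlers update their raw input, then re-derive "scanning". */
static void handle_scanning(Iface *iface, bool value)
{
    iface->supplicant_scanning = value;
    check_scanning(iface);
}

static void handle_state_change(Iface *iface, ConState value)
{
    iface->con_state = value;
    check_scanning(iface);
}
```

Whichever of the two signals arrives last, the combined flag drops to false exactly once and emits a single notification, so a listener woken by that notification also sees the getter return false.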

That should do the trick.

Dan

> What confuses me a little here is that the supplicant is still alive
> and running, even though there aren't any active connections. It did
> also manage to raise a dbus signal indicating the termination of the
> scan *after* NM sent the disconnection request, but it did not manage
> to communicate any change in con_state. Also I cannot connect to it
> with wpa_cli to see if it is still in SCANNING state.
> 
> Thoughts?
> 
> Thanks,
> Daniel
> 