Re: NM waiting too long to collect AP scan results?



On Tue, 2009-01-27 at 13:06 -0500, Dan Langlois wrote:
> I have recently installed Ubuntu 8.04, Hardy Heron, on my laptop.
> This is
> my first exposure to NetworkManager.  For the purposes of this
> discussion,
> I always connect to the same WPA wifi AP, named "darknet".  At any
> given
> moment 2 or 3 other WEP APs, and 1 or 2 unprotected APs are visible,
> but
> I don't connect to them.  My wifi hardware is an Intel 4965agn using
> the
> iwl4965 driver.

I guess there's no particular reason 9 seconds can't be used instead of
10 in the non-supplicant scan case.  We "fixed" this in 0.7 by always
using the supplicant to scan, since it's pointless to have scanning code
in two different places.

The "batching" behavior you saw in wireless_event_handler() was added
because ipw2200/2915 and some other cards background scan and
continuously emit scan events.  I posted a patch to ipw2200/2915 about 2
years ago that ratelimits these events to every 4 seconds in the kernel,
which is why the timeout in wireless_event_handler() is 5 seconds.

The issue here is that you do want *some* batching of scan results,
because there are drivers that emit scan events during scan processing,
because some older drivers will clear their scan list when a client asks
for access points.  Thus, you don't want to ask too often or you might
not get a complete set of APs.

However, you could try to modify wireless_event_handler() to, instead of
only scheduling the scan result timeout when there isn't one already
scheduled, cancel the outstanding scan results timeout *if* it will
trigger later than now + 5 seconds.

Dan

> Occasionally I'd boot up to find NM disconnected.  When this happens,
> APs
> are visible, but not darknet in particular.  If I wait long enough
> (over
> 20min sometimes), eventually NM finds it and connects.  This is way
> too
> long to wait for an AP that I know is up and running and has a strong
> signal where I'm sitting.
> 
> I started to feel that NM needed a manual "rescan" feature.  Googling
> this
> I found discussions from 2006 about this, including comments from
> Dan Williams about how just touching the nm-applet should schedule a
> rescan.
> (http://mail.gnome.org/archives/networkmanager-list/2006-November/msg00127.html)
> So why was it not finding darknet?
> 
> I decided to get the source and experiment (given that I use HH,
> I had to get the v0.6.6 source).  What I found was that while in the
> disconnected state, NM was indeed still scanning, but it just couldn't
> seem
> to find darknet.  So I dug a little deeper into the scanning guts of
> NM.
> 
> Already you might be thinking I should use a more recent version of
> NM.
> However, I'm not sure that would work with Ubuntu HH.  Since HH is the
> Long Term Support version of Ubuntu, I thought I'd look into fixing it
> up.  This is why I'm investigating the older source and am keen to
> improve it.
> 
> I'll skip along here as my email is getting long, my hypothesis is
> this:
> NM waits too long from the time it initiates the scan.  I believe the
> driver (or more precisely, the Wireless Extensions (WE) code) is
> expiring
> the scan result before NM gets them.
> 
> I propose that NM should read and process scan results sooner.  The
> current time of 10 seconds is right on the threshold of results
> getting expired (as I understand it).
> 
> I'm no NM or WE expert, so I'll leave Dan Williams and other experts
> to mull over my findings.  Here are the details.
> 
> I added some nm_info()'s to NM source to see what was happening:
> 
> Jan 22 14:43:45 talon NetworkManager: <info>  dan, wlan0 iw scan
> issued
> Jan 22 14:43:47 talon NetworkManager: <info>  dan, helper buf:8
> SIOCGIWSCAN
> Jan 22 14:43:55 talon NetworkManager: <info>  dan,
> schedule_scan(0x80a0110,20000ms)
> Jan 22 14:43:55 talon NetworkManager: <info>  dan,
> process_scan_results(0x80a0110,0x80a1800,211) WE version 22
> Jan 22 14:43:55 talon NetworkManager: <info>  dan, found AP 'BELL669'
> ret=0
> 
> Jan 22 14:44:15 talon NetworkManager: <info>  dan, wlan0 iw scan
> issued
> Jan 22 14:44:17 talon NetworkManager: <info>  dan, helper buf:8
> SIOCGIWSCAN
> Jan 22 14:44:25 talon NetworkManager: <info>  dan,
> schedule_scan(0x80a0110,120000ms)
> 
> Jan 22 14:46:25 talon NetworkManager: <info>  dan, wlan0 iw scan
> issued
> Jan 22 14:46:27 talon NetworkManager: <info>  dan, helper buf:8
> SIOCGIWSCAN
> Jan 22 14:46:35 talon NetworkManager: <info>  dan,
> schedule_scan(0x80a0110,120000ms)
> Jan 22 14:48:35 talon NetworkManager: <info>  dan, wlan0 iw scan
> issued
> Jan 22 14:48:37 talon NetworkManager: <info>  dan, helper buf:8
> SIOCGIWSCAN
> Jan 22 14:48:45 talon NetworkManager: <info>  dan,
> schedule_scan(0x80a0110,120000ms)
> Jan 22 14:48:45 talon NetworkManager: <info>  dan,
> process_scan_results(0x80a0110,0x80a1800,422) WE version 22
> Jan 22 14:48:45 talon NetworkManager: <info>  dan, found AP 'BELL479'
> ret=1
> Jan 22 14:48:45 talon NetworkManager: <info>  dan, found AP 'BELL669'
> ret=0
> 
> 
> So scans are triggered via iw_set_ext() and we setup to collect the
> results
> in 10 seconds with schedule_scan_results_timeout(self, 10).
> 
> 2 seconds later, wireless_event_helper() is called with SIOCGIWSCAN
> indicating results are ready.  But the code there doesn't respond
> right away, clamining that this might be the first of a burst of
> such messages.  So instead we continue to wait the full 10 seconds.
> 
> At the 10 second mark, results are fetched using iw_get_ext() and the
> next scan is scheduled.  Results (if any) are then processed.
> In the log above you can see that some scans had no results, others
> had just 1 or two.  None had darknet.
> 
> Curious about how iw_set_ext() and iw_get_ext() worked, I dug a
> little and found this interesting bit in net/mac80211/ieee80211_sta.c
> function ieee80211_sta_scan_result():
> 
> if (time_after(jiffies, bss->last_update +
> IEEE80211_SCAN_RESULT_EXPIRE))
>     return current_ev;
> 
> Note that IEEE80211_SCAN_RESULT_EXPIRE is defined to be (10 * HZ).
> 
> It seems that ieee80211_sta_scan_result() won't put a given AP
> into the scan result list if it was last seen over 10 seconds ago.
> 
> I changed the code in wireless_event_helper() to always call
> schedule_scan_results_timeout(self, 2) upon receipt of SIOCGIWSCAN
> and I've been running for a few days now without ever being stuck
> in the NM disconnected state.  I've also found this connects much
> faster (reasonable since I dropped a 6 second wait time) and
> surprisingly I found this also improved waking from suspend, which
> has been reliable since I made this change.
> 
> I'd be interested to hear what Dan Williams and others
> think of this analysis.  Is NM waiting too long to collect scan
> results?
> 
> cheers,
> Dan
> 
> _______________________________________________
> NetworkManager-list mailing list
> NetworkManager-list gnome org
> http://mail.gnome.org/mailman/listinfo/networkmanager-list



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]