NM waiting too long to collect AP scan results?



I have recently installed Ubuntu 8.04, Hardy Heron, on my laptop.  This is
my first exposure to NetworkManager.  For the purposes of this discussion,
I always connect to the same WPA wifi AP, named "darknet".  At any given
moment 2 or 3 other WEP APs, and 1 or 2 unprotected APs are visible, but
I don't connect to them.  My wifi hardware is an Intel 4965agn using the
iwl4965 driver.

Occasionally I'd boot up to find NM disconnected.  When this happens, APs
are visible, but not darknet in particular.  If I wait long enough (over
20min sometimes), eventually NM finds it and connects.  This is way too
long to wait for an AP that I know is up and running and has a strong
signal where I'm sitting.

I started to feel that NM needed a manual "rescan" feature.  Googling this
I found discussions from 2006 about this, including comments from
Dan Williams about how just touching the nm-applet should schedule a rescan.
(http://mail.gnome.org/archives/networkmanager-list/2006-November/msg00127.html)
So why was it not finding darknet?

I decided to get the source and experiment (given that I use HH,
I had to get the v0.6.6 source).  What I found was that while in the
disconnected state, NM was indeed still scanning, but it just couldn't seem
to find darknet.  So I dug a little deeper into the scanning guts of NM.

Already you might be thinking I should use a more recent version of NM.
However, I'm not sure that would work with Ubuntu HH.  Since HH is the
Long Term Support version of Ubuntu, I thought I'd look into fixing it
up.  This is why I'm investigating the older source and am keen to
improve it.

I'll skip along here as my email is getting long, my hypothesis is this:
NM waits too long from the time it initiates the scan.  I believe the
driver (or more precisely, the Wireless Extensions (WE) code) is expiring
the scan result before NM gets them.

I propose that NM should read and process scan results sooner.  The
current time of 10 seconds is right on the threshold of results
getting expired (as I understand it).

I'm no NM or WE expert, so I'll leave Dan Williams and other experts
to mull over my findings.  Here are the details.

I added some nm_info()'s to NM source to see what was happening:

Jan 22 14:43:45 talon NetworkManager: <info>  dan, wlan0 iw scan issued
Jan 22 14:43:47 talon NetworkManager: <info>  dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:43:55 talon NetworkManager: <info>  dan, schedule_scan(0x80a0110,20000ms)
Jan 22 14:43:55 talon NetworkManager: <info>  dan, process_scan_results(0x80a0110,0x80a1800,211) WE version 22
Jan 22 14:43:55 talon NetworkManager: <info>  dan, found AP 'BELL669' ret=0

Jan 22 14:44:15 talon NetworkManager: <info>  dan, wlan0 iw scan issued
Jan 22 14:44:17 talon NetworkManager: <info>  dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:44:25 talon NetworkManager: <info>  dan, schedule_scan(0x80a0110,120000ms)

Jan 22 14:46:25 talon NetworkManager: <info>  dan, wlan0 iw scan issued
Jan 22 14:46:27 talon NetworkManager: <info>  dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:46:35 talon NetworkManager: <info>  dan, schedule_scan(0x80a0110,120000ms)
Jan 22 14:48:35 talon NetworkManager: <info>  dan, wlan0 iw scan issued
Jan 22 14:48:37 talon NetworkManager: <info>  dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:48:45 talon NetworkManager: <info>  dan, schedule_scan(0x80a0110,120000ms)
Jan 22 14:48:45 talon NetworkManager: <info>  dan, process_scan_results(0x80a0110,0x80a1800,422) WE version 22
Jan 22 14:48:45 talon NetworkManager: <info>  dan, found AP 'BELL479' ret=1
Jan 22 14:48:45 talon NetworkManager: <info>  dan, found AP 'BELL669' ret=0


So scans are triggered via iw_set_ext() and we setup to collect the results
in 10 seconds with schedule_scan_results_timeout(self, 10).

2 seconds later, wireless_event_helper() is called with SIOCGIWSCAN
indicating results are ready.  But the code there doesn't respond
right away, clamining that this might be the first of a burst of
such messages.  So instead we continue to wait the full 10 seconds.

At the 10 second mark, results are fetched using iw_get_ext() and the
next scan is scheduled.  Results (if any) are then processed.
In the log above you can see that some scans had no results, others
had just 1 or two.  None had darknet.

Curious about how iw_set_ext() and iw_get_ext() worked, I dug a
little and found this interesting bit in net/mac80211/ieee80211_sta.c
function ieee80211_sta_scan_result():

if (time_after(jiffies, bss->last_update + IEEE80211_SCAN_RESULT_EXPIRE))
    return current_ev;

Note that IEEE80211_SCAN_RESULT_EXPIRE is defined to be (10 * HZ).

It seems that ieee80211_sta_scan_result() won't put a given AP
into the scan result list if it was last seen over 10 seconds ago.

I changed the code in wireless_event_helper() to always call
schedule_scan_results_timeout(self, 2) upon receipt of SIOCGIWSCAN
and I've been running for a few days now without ever being stuck
in the NM disconnected state.  I've also found this connects much
faster (reasonable since I dropped a 6 second wait time) and
surprisingly I found this also improved waking from suspend, which
has been reliable since I made this change.

I'd be interested to hear what Dan Williams and others
think of this analysis.  Is NM waiting too long to collect scan
results?

cheers,
Dan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]