Re: periodic_update(): Roamed ...



Dan Williams wrote:
On Fri, 2009-04-24 at 15:18 -0700, Howard Chu wrote:
Dan Williams wrote:
On Mon, 2009-04-20 at 15:37 -0700, Howard Chu wrote:
Howard Chu wrote:
This is probably more related to the ath9k driver, but I wanted to start here
in case anyone is familiar with it. I've been seeing this for the past couple
months, and I just now rebuilt NM fresh from git and it's still happening:

I seem to have ruled out the driver; doing a kill -9 on NetworkManager so it
doesn't have the opportunity to tear down the connection on exit, shows that
the wifi connection works perfectly once NetworkManager is gone. No
disassociation messages in dmesg, no pauses in ssh sessions, etc.

Don't rule out the driver.  Does the driver always return the currently
associated AP in the scan list?  If not, you might hit this.  And the
driver is being stupid, because of *course* the AP you're currently
connected to should always be in the scan list, unless you're no longer
connected to it.

The code in NM grabs the SSID & BSSID on a 6 second timer, and tries to
find that AP in the scan list.  If it can't find it (because the driver
didn't return that AP in the scan list) then it reports none.
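
For readers following along, the lookup described above amounts to roughly
the following. This is a minimal Python sketch, not NM's actual C code;
AccessPoint, find_current_ap and describe_current_ap are hypothetical names
used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AccessPoint:
    ssid: str
    bssid: str  # e.g. "00:11:22:33:44:55"


def find_current_ap(scan_list: List[AccessPoint], ssid: str,
                    bssid: str) -> Optional[AccessPoint]:
    """Return the scan-list entry matching the card's SSID and BSSID, if any."""
    for ap in scan_list:
        if ap.ssid == ssid and ap.bssid.lower() == bssid.lower():
            return ap
    return None


def describe_current_ap(scan_list: List[AccessPoint], ssid: str,
                        bssid: str) -> str:
    """What a periodic check would report: the BSSID, or "(none)" if not found."""
    ap = find_current_ap(scan_list, ssid, bssid)
    return ap.bssid if ap is not None else "(none)"
```

The point of the sketch is that a scan list missing the associated AP is
enough to produce "(none)", even while the card itself is still associated.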

If that's your problem, it's a driver problem.

How would I check this? Should it be obvious from "iwlist scan" ? That returns
the current AP along with the other visible ones.

Also, reviewing the comments in bug 291760, this problem is not just isolated
to the ath9k driver. It's also being reported for ath5k, wl, iwl3945, ipw2100,
rtl8187, and b43, across multiple kernel and driver revisions. As such it
seems unlikely to be the drivers' fault.

Depends; it might show up in that scan, or it might not.  If you can
reliably get it to show up every time, that's great.  But until 2.6.30,
mac80211-based drivers would not always return the current AP.  And some
of the older drivers don't either, though fullmac drivers are more
likely to be OK there.

If you already know for a fact that certain drivers are incompatible with NM, it seems you should be documenting that in your release notes or something. Or, you should be maintaining a list of tested known-to-work drivers.

There is one window where NM wouldn't be able to find the AP; if NM just
did a scan, and then the card reassociates to a different AP that's not
in the scan list, and doesn't send a GIWSCAN event so that the AP list
gets pulled (ipw2x00 do this, other drivers might not), then when NM
goes to pull the BSSID off the card, the scan list doesn't contain the
current AP.  What NM should be doing here is to request that the
supplicant grab the scan list again when it sees a BSSID it doesn't know
about, but that's somewhat complicated.

There must be more cases than this, because there are no other APs for my card to associate to. (They're all secured with WEP or WPA, and I only have credentials for mine.) The only reason I ever see the card reassociate at all is due to NM's scanning; with that patched out it just stays associated.

If the driver doesn't return the frequency of the BSSID in the scan
results, or that frequency doesn't match what the card reports from
SIOCGIWFREQ, then NM also can come up with (none).  Check that the
information from an 'iwlist scan' for that BSSID matches what 'iwconfig'
reports when associated to that specific AP.
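
One way to run that comparison repeatedly, rather than by eye, is sketched
below in Python. It assumes the usual wireless-tools text layout ("Cell NN -
Address: ...", "Frequency:2.437 GHz", "Access Point: ..."); other drivers or
versions may print something different, and the function names are
hypothetical. Scanning usually needs root to trigger a fresh scan.

```python
import re
import subprocess
import time

CELL_RE = re.compile(r"Cell \d+ - Address: ([0-9A-Fa-f:]{17})")
FREQ_RE = re.compile(r"Frequency[:=]([0-9.]+) GHz")
AP_RE = re.compile(r"Access Point: ([0-9A-Fa-f:]{17})")


def parse_scan(text):
    """Map BSSID -> frequency in GHz (None if not printed) from `iwlist scan` text."""
    cells = {}
    current = None
    for line in text.splitlines():
        m = CELL_RE.search(line)
        if m:
            current = m.group(1).upper()
            cells[current] = None
            continue
        m = FREQ_RE.search(line)
        if m and current is not None and cells[current] is None:
            cells[current] = float(m.group(1))
    return cells


def parse_iwconfig(text):
    """Return (bssid, frequency) from `iwconfig` text; either may be None."""
    ap = AP_RE.search(text)
    freq = FREQ_RE.search(text)
    return (ap.group(1).upper() if ap else None,
            float(freq.group(1)) if freq else None)


def check(scan_text, iwconfig_text):
    """Classify one scan against what the card reports for its association."""
    bssid, freq = parse_iwconfig(iwconfig_text)
    if bssid is None:
        return "not associated"
    cells = parse_scan(scan_text)
    if bssid not in cells:
        return "current AP missing from scan"
    scan_freq = cells[bssid]
    if scan_freq is None or freq is None or abs(scan_freq - freq) > 0.0005:
        return "frequency missing or mismatched"
    return "ok"


def watch(iface, runs=20, delay=6.0):
    """Repeat the check and print every run that is not "ok"."""
    for i in range(runs):
        scan = subprocess.run(["iwlist", iface, "scan"],
                              capture_output=True, text=True).stdout
        cfg = subprocess.run(["iwconfig", iface],
                             capture_output=True, text=True).stdout
        result = check(scan, cfg)
        if result != "ok":
            print(f"run {i}: {result}")
        time.sleep(delay)
```

Calling watch("wlan0", runs=100) would then log each scan in which the
associated AP is absent or its frequency disagrees; the interface name is
only an example.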

So in conclusion there are actual driver bugs; (a) not all drivers' scan
results contain the currently associated AP in every scan, and (b) not
all drivers emit scan results events when they associate to a new AP
that's not already in the scan list, and finally (c) some drivers are
just busted and return wrong channel information.

Pretty sure (c) is not the case here, the info from iwlist scan and iwconfig all matches. (b) won't happen in my current environment, so I can't test one way or another. (a) doesn't appear to happen when I look, but I have no idea how many scans are needed before the symptom occurs.

It seems to me that blaming the driver is not particularly useful unless you can provide a procedure or script to demonstrate the driver bugs. In the meantime, that whole spectrum of drivers is out there and people are trying to use them. And apart from NM's undocumented expectations, whatever they are, those cards and drivers work fine. Since only NM causes problems, it's your responsibility to either help identify the problems so the driver writers can fix them, or make NM work despite those problems. E.g., if you know that scans return unreliable information, then *stop relying on the scan results*. Clearly the driver can tell you if it's associated or not. Assuming that the association is gone because the current AP doesn't show up in the current scan list, when you know that scans can be incomplete, is stupid.
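
To make the suggestion concrete, here is a minimal Python sketch of that policy: trust the association state the card reports, and treat the scan list only as supplementary. It is not a patch against NM. The function name is hypothetical, and it assumes an unassociated card reports no BSSID or an all-zero one, which may not hold for every driver.

```python
ZERO_BSSID = "00:00:00:00:00:00"  # assumption: what an unassociated card reports


def current_ap_label(card_bssid, scan_bssids):
    """Label the current AP from the card's own BSSID, not from the scan list."""
    if card_bssid is None or card_bssid == ZERO_BSSID:
        # Only the card saying "not associated" yields "(none)".
        return "(none)"
    bssid = card_bssid.lower()
    if bssid in {b.lower() for b in scan_bssids}:
        return bssid
    # Still associated; the scan list is merely incomplete.
    return bssid + " (not in last scan)"
```

With this ordering an incomplete scan can no longer turn an associated card into "(none)".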

Likewise, continual scanning seems to be counterproductive. The impact it has on network throughput is significant:

64 bytes from 192.168.1.1: icmp_seq=7 ttl=64 time=1.57 ms
64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=1.56 ms
64 bytes from 192.168.1.1: icmp_seq=9 ttl=64 time=4607 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=64 time=3604 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=64 time=2604 ms
64 bytes from 192.168.1.1: icmp_seq=12 ttl=64 time=1604 ms
64 bytes from 192.168.1.1: icmp_seq=13 ttl=64 time=604 ms
64 bytes from 192.168.1.1: icmp_seq=14 ttl=64 time=1.54 ms
64 bytes from 192.168.1.1: icmp_seq=15 ttl=64 time=1.54 ms
64 bytes from 192.168.1.1: icmp_seq=16 ttl=64 time=1.54 ms

Obviously the usual ping time is ~1.5ms; "iwlist scan" slows that down quite a lot. I'd rather wait 6 seconds *once* to find a new AP after I've legitimately lost an AP association, instead of waiting ~10 seconds every two minutes rescanning for a list of APs that simply don't matter.
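
The staircase in those round-trip times can be checked with a little arithmetic. The Python sketch below assumes ping's default one-second send interval (the output above does not state it) and works out when each delayed reply arrived, relative to the moment icmp_seq=9 was sent.

```python
# Round-trip times in ms for the delayed replies, copied from the output above.
rtts_ms = {9: 4607, 10: 3604, 11: 2604, 12: 1604, 13: 604}

INTERVAL_MS = 1000  # assumption: default ping interval of one second

first = min(rtts_ms)
# Arrival time = send offset + round-trip time.
arrivals = {seq: (seq - first) * INTERVAL_MS + rtt
            for seq, rtt in rtts_ms.items()}
spread = max(arrivals.values()) - min(arrivals.values())

print(arrivals)  # {9: 4607, 10: 4604, 11: 4604, 12: 4604, 13: 4604}
print(spread)    # 3
```

Under that assumption all five replies arrive within about 3 ms of each other, roughly 4.6 seconds after icmp_seq=9 went out, which is consistent with traffic being held for the whole scan and released at once.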

At this point I've spent as much time as I can afford on it, and my patched NM works for me.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

