Re: Race condition while using usbnet with NM 0.7.1



On Fri, 2009-07-10 at 20:32 -0300, Ricardo Salveti de Araujo wrote:
> Hi all,
> 
> We're currently using NetworkManager in Mamona, a developer distro
> based on OpenEmbedded for Nokia tablet devices.
> 
> We're on NM 0.7 at the moment and have just created packages for NM
> 0.7.1, but while testing them I ran into a race condition with
> usbnet.

Ha!  I've been looking for that race off and on for a while.  Thanks for
finding the root cause.  Can you try out this commit from master?  If it
works for you I'll also cherry-pick to 0.7.x.

commit 302c9fcbccf3ad945afbc3f58e42013045c6e352
Author: Dan Williams <dcbw redhat com>
Date:   Mon Jul 13 19:40:39 2009 -0400

    netlink: fix race that caused stale carrier state signals
    
    Found by Ricardo Salveti de Araujo <ricardo salveti openbossa org>
    
    The link cache was updated immediately, but the carrier state signals
    were emitted a lot later, when the cache data was already stale.  So
    just update the cache at the same time we emit the signals.  The
    carrier-state-request stuff wasn't originally converted to deferred
    for any netlink-specific reason, just to smooth the initial device
    creation process in NM.
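
A rough way to picture the ordering change described in the commit
message, written as a small Python model rather than NM's actual C code.
Every name here (Monitor, request_status, run_pending, read_kernel_links)
is made up for illustration; the only point is that the snapshot is taken
inside the deferred handler, right before the signals go out.

```python
# Toy model only: hypothetical names, not NetworkManager or libnl APIs.
class Monitor:
    def __init__(self, read_kernel_links):
        # Callable returning the live link state as {ifname: carrier_on}.
        self.read_kernel_links = read_kernel_links
        self.pending = []   # stands in for deferred main-loop handlers
        self.emitted = []   # carrier signals, in emission order

    def request_status(self):
        # Only schedule the handler; the cache is not touched here.
        self.pending.append(self.deferred_emit)

    def deferred_emit(self):
        # Refill the cache at the same time the signals are emitted.
        cache = dict(self.read_kernel_links())
        for ifname, carrier_on in sorted(cache.items()):
            self.emitted.append((ifname, carrier_on))

    def run_pending(self):
        pending, self.pending = self.pending, []
        for handler in pending:
            handler()


links = {"usb0": False}
monitor = Monitor(lambda: links)
monitor.request_status()   # usb0 has no carrier yet
links["usb0"] = True       # device is brought up before the handler runs
monitor.run_pending()      # handler sees the current state, not a stale one
```

With this ordering, the deferred signal for usb0 reflects the state at
emission time, so a late handler can no longer undo a newer carrier event.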


Thanks!
Dan

> Here's the log of the Network Manager, while booting the device (with
> additional debug):
> <info>  starting...
> <info>  nm_netlink_monitor_open_connection()
> <info>  nm_netlink_monitor_request_status() <- add the handler to the main loop
> <info>  deferred_emit_carrier_state() <- consume the cache
> <info>  netlink_object_message_handler() (lo) IFF_LOWER_UP
> <info>  netlink_event_input()
> <info>  netlink_object_message_handler() (lo) IFF_LOWER_DOWN
> <info>  netlink_object_message_handler() (usb0) IFF_LOWER_DOWN
> <info>  nm_netlink_monitor_request_status() <- add the handler to the main loop
> <info>  (usb0): new Ethernet device (driver: 'ehci_udc')
> <info>  (usb0): exported as /org/freedesktop/Hal/devices/net_7a_ce_13_55_f7_81
> <info>  Trying to start the supplicant...
> <info>  netlink_event_input()
> <info>  (usb0): device state change: 1 -> 2
> <info>  (usb0): bringing up device.
> <info>  (usb0): preparing device.
> <info>  (usb0): deactivating device (reason: 2).
> <info>  Setting system hostname to 'localhost.localdomain' (no default device)
> <info>  netlink_object_message_handler() (usb0) IFF_LOWER_UP
> <info>  (usb0): carrier now ON (device state 2)
> <info>  (usb0): device state change: 2 -> 3
> <info>  netlink_event_input()
> <info>  Trying to start the system settings daemon...
> <info>  deferred_emit_carrier_state() <- consume the cache
> <info>  netlink_object_message_handler() (lo) IFF_LOWER_UP
> <info>  netlink_object_message_handler() (usb0) IFF_LOWER_DOWN
> <info>  (usb0): carrier now OFF (device state 3)
> <info>  (usb0): device state change: 3 -> 2
> <info>  (usb0): deactivating device (reason: 40).
> 
> The problem is that at the end the device carrier status is OFF, while
> it should be ON so that NM can finish setting the IP address and
> leave the device ready to use.
> 
> While trying to identify where the problem is, I found that the
> function deferred_emit_carrier_state (nm-netlink-monitor.c) takes
> longer than expected to be called, and between the nl_cache_refill
> call and the actual message handler, NM brings the device up, setting
> the carrier status to ON. By the time deferred_emit_carrier_state is
> called by the main loop, the cache data is no longer valid, and the
> usb0 carrier status is set to OFF again.
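
The sequence described above can be replayed with a toy model. This is a
Python sketch under stated assumptions, not NM's code: StaleCacheMonitor,
live_event, and replay_boot are hypothetical names, and the "kernel" is
just a dict. It shows a snapshot taken at request time overwriting a newer
carrier event once the deferred handler finally runs.

```python
# Toy model of the race; none of these names are NetworkManager APIs.
from collections import deque


class StaleCacheMonitor:
    """Snapshots link state at request time and emits it later."""

    def __init__(self, kernel_links):
        self.kernel_links = kernel_links  # live state: {ifname: carrier_on}
        self.cache = {}                   # snapshot, like the link cache
        self.pending = deque()            # stands in for main-loop handlers
        self.carrier = {}                 # what the device layer believes

    def request_status(self):
        # The cache is refilled immediately ...
        self.cache = dict(self.kernel_links)
        # ... but the signals are only queued for a later main-loop pass.
        self.pending.append(self.deferred_emit)

    def deferred_emit(self):
        # Emits whatever the snapshot says, however old it is.
        for ifname, carrier_on in self.cache.items():
            self.carrier[ifname] = carrier_on

    def live_event(self, ifname, carrier_on):
        # A netlink event delivered as it happens.
        self.kernel_links[ifname] = carrier_on
        self.carrier[ifname] = carrier_on

    def run_pending(self):
        while self.pending:
            self.pending.popleft()()


def replay_boot(monitor):
    monitor.request_status()           # usb0 still down: snapshot says OFF
    monitor.live_event("usb0", True)   # device brought up: carrier ON
    monitor.run_pending()              # deferred handler runs with old data
    return monitor.carrier["usb0"]


final_state = replay_boot(StaleCacheMonitor({"usb0": False}))
```

In this model final_state ends up False even though the live state is ON,
which matches the "carrier now ON" followed by "carrier now OFF" pattern
in the log.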
> 
> Because of this behavior, NM does not configure the device as it
> should, and the interface remains up without an IP address.
> 
> This does not happen every time, though. When NM brings up the device
> after calling deferred_emit_carrier_state, everything works fine,
> which is why it looks like a race condition.
> 
> The question is, what's the best way to fix this issue?
> 
> I see two possible directions: one is to check the cached data when
> a new event arrives (such as the usb0 interface being brought up),
> and the other is to call nl_cache_refill inside
> deferred_emit_carrier_state, which slightly changes the current
> behavior.
> 
> As I still don't understand much of the NM code (I only started
> reading it in depth today), I would like to know which solution I
> should work on, so I can send you a patch later.
> 
> Thanks!


