Re: About "Received DHCP event from unexpected PID xxxx (expected xxxx)"



On Thu, 2009-02-19 at 20:37 +0800, cee1 wrote:
> Hi, Dan
>      I've reviewed  the code: when starts a new dhcp process (on the
> same net interface), the previous one will be terminated. So the there
> should always be one dhcp process for the same net interface, am I
> right?

Correct.

>      But in my test (still with tag NM-0.7.0 & dhclient-3.1.0),
> switching between APs, some log message shows NM can't get pid from
> pidfile (the pidfile doesn't  exist), but the *previous* dhcp process
> is actully alive . This is really strange.

Yeah, it's strange.  Are you running something like SELinux or AppArmor?

Dan

> 2009/2/19 cee1 <fykcee1 gmail com>
>         Hi, Dan
>         Thanks for the replay.
>         The code of NetworkManger is downloaded from gnome's ftp
>         server with version 0.7.0.
>         I've tried this NM-0.7 with both dhcpcd-4.0.10 and
>         dhclient-3.1.0.
>         
>         Following your advice, I add some log output code, and find
>         the device->state in function
>         <nm_dhcp_manager_begin_transaction> is DHC_NBI (0) or DHC_END
>         (15).
>         
>         In the case of dhcpcd-4.0.10, I see some log messages like:
>         "wlan0: dhcpcd already running on pid 1626
>         (/var/run/dhcpcd-wlan0.pid)", and NM forgets some dhcpcd
>         processes. 
>         Also I see log messages like "dhcpcd[1630]: write signal 15:
>         Bad file descriptor", does this suggests that the dhcpcd
>         process can't write the pid file??
>         
>         In the case of dhclient-3.1.0, (in my testing period), no
>         <dhcp_watch_cb> called (doesn't like the case of
>         dhcpcd-4.0.10), but still, log messages like
>         "nm_dhcp_manager_handle_event(): Received DHCP event from
>         unexpected PID 19924 (expected 0)" appear. 
>         I see only one log message like "dhclient: There is already a
>         pid file /usr/var/run/dhclient-wlan0.pid with pid 19911".
>         This time, NM forgets dhclient processes more easily. When I
>         switch connections between APs, the number of dhclient
>         processes increases to 4 or 5 in a short time (which doesn't
>         like the case of dhcpcd-4.0.10).
>         
>         Do these suggest both of the two dhcp clients above having
>         problems?
>         Is the pidfile
>         reliable?(<nm_dhcp_manager_cancel_transaction_real> uses
>         device->pid to kill the dhcp process, but <nm_dhcp_client>
>         uses pidfile)
>         
>         BTW, thanks for your work, I will try the code with the tag
>         0.7.1rc1 later.
>         
>         2009/2/19 Dan Williams <dcbw redhat com>
>         
>         
>                 On Wed, 2009-02-18 at 17:10 +0800, cee1 wrote:
>                 > Hi,
>                 >      I'm using NetwokManager-0.7(using dhcpcd) with
>                 wifi, and see a
>                 > lot of messages like  "Received DHCP event from
>                 unexpected PID xxxx
>                 > (expected xxxx)" in the syslog.
>                 
>                 
>                 What version of NM specifically?  The final NM release
>                 from late
>                 November 2008, or a snapshot from some other point in
>                 time?
>                 
>                 >      I also see more than one dhcpcd processes
>                 running. Sometimes,
>                 > only one dhcpcd process lefts at last, but sometimes
>                 not. (searching
>                 > google, find the same problem someone has reported:
>                 >
>                 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829)
>                 >      Following the source code(NM-0.7), in the
>                 function
>                 > <nm_dhcp_client_start> (nm-dhcp-dhcpcd.c), it kills
>                 the previous
>                 > dhcpcd process and saves the new one's pid in
>                 device->pid. But the
>                 > function <dhcp_watch_cb>, knowing the termination of
>                 the previous
>                 > dhcpd process, sets device->pid to zero. So the
>                 current dhcp process's
>                 > pid is lost, and <nm_dhcp_manager_handle_event> will
>                 print log like
>                 > "Received DHCP event from unexpected PID 8812
>                 (expected 0)".
>                 
>                 
>                 Obviously the previous one isn't getting cleaned up
>                 correctly.  It might
>                 be that dhcpcd returns a state NM doesn't expect, and
>                 thus isn't getting
>                 cleaned up in nm_dhcp_manager_begin_transaction():
>                 
>                        if (state_is_bound (device->state) ||
>                 (device->state == DHC_START)) {
>                                /* Cancel any DHCP transaction already
>                 in progress */
>                                nm_dhcp_manager_cancel_transaction_real
>                 (device);
>                        }
>                 
>                 Basically, it should *always* be the case that a
>                 previous dhcp client
>                 instance is cleaned up before starting a new one, so
>                 the watch callback
>                 should certainly be removed before starting a new dhcp
>                 client run.
>                 
>                 >      Also the previous dhcpcd process's time_out
>                 handler may expire,
>                 > and affects the current running dhcpcd process?
>                 
>                 
>                 The timeout gets cleared whenever DHCP options get
>                 returned, or when the
>                 dhcp device gets destroyed, or when
>                 nm_dhcp_manager_cancel_transaction()
>                 is called to clean up any ongoing DHCP transaction.
>                 cancel_transaction() gets called when the device is
>                 deactivated.
>                 
>                 In any case, I've fixed up a few issues with the DHCP
>                 client handling
>                 code and made it somewhat more robust against this
>                 issue.  If you can
>                 possibly test the latest git, or at least figure out
>                 what state the DHCP
>                 client is left in (device->state) when
>                 nm_dhcp_manager_begin_transaction() gets called, that
>                 would be great.
>                 
>                 Dan
>                 
>                 >      It seems the function
>                 <supplicant_interface_acquire>
>                 > (nm-device-wifi.c) registers a function
>                 > <supplicant_iface_connection_state_cb> as the
>                 handler of the signal
>                 > "connection-state" .
>                 > And:
>                 > <supplicant_iface_connection_state_cb> shcedules
>                 > <supplicant_iface_connection_state_cb_handler>,
>                 which calls
>                 >
>                 <nm_device_activate_schedule_stage3_ip_config_start>,
>                 which shcedules
>                 > <nm_device_activate_stage3_ip_config_start> , which
>                 calls
>                 > <real_act_stage3_ip_config_start>, which calls
>                 > <nm_dhcp_manager_begin_transaction>.
>                 > So it is possiable that more than one dhcpcd
>                 processes running
>                 > meanwhile.
>                 >
>                 >     To fix this, shoud I do:
>                 > 1) Protect the corresponding data struct
>                 > or
>                 > 2) Add a clean step when a second dhcpcd process to
>                 be started
>                 >
>                 >      Any ideas? Thanks.
>                 
>                 > _______________________________________________
>                 > NetworkManager-list mailing list
>                 > NetworkManager-list gnome org
>                 >
>                 http://mail.gnome.org/mailman/listinfo/networkmanager-list
>                 
>         
> 



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]