Re: About "Received DHCP event from unexpected PID xxxx (expected xxxx)"



Hi, Dan
     I've reviewed  the code: when starts a new dhcp process (on the same net interface), the previous one will be terminated. So the there should always be one dhcp process for the same net interface, am I right?
     But in my test (still with tag NM-0.7.0 & dhclient-3.1.0), switching between APs, some log message shows NM can't get pid from pidfile (the pidfile doesn't  exist), but the *previous* dhcp process is actully alive . This is really strange.

2009/2/19 cee1 <fykcee1 gmail com>
Hi, Dan
Thanks for the replay.
The code of NetworkManger is downloaded from gnome's ftp server with version 0.7.0.
I've tried this NM-0.7 with both dhcpcd-4.0.10 and dhclient-3.1.0.

Following your advice, I add some log output code, and find the device->state in function <nm_dhcp_manager_begin_transaction> is DHC_NBI (0) or DHC_END (15).

In the case of dhcpcd-4.0.10, I see some log messages like: "wlan0: dhcpcd already running on pid 1626 (/var/run/dhcpcd-wlan0.pid)", and NM forgets some dhcpcd processes.
Also I see log messages like "dhcpcd[1630]: write signal 15: Bad file descriptor", does this suggests that the dhcpcd process can't write the pid file??

In the case of dhclient-3.1.0, (in my testing period), no <dhcp_watch_cb> called (doesn't like the case of dhcpcd-4.0.10), but still, log messages like "nm_dhcp_manager_handle_event(): Received DHCP event from unexpected PID 19924 (expected 0)" appear.
I see only one log message like "dhclient: There is already a pid file /usr/var/run/dhclient-wlan0.pid with pid 19911".
This time, NM forgets dhclient processes more easily. When I switch connections between APs, the number of dhclient processes increases to 4 or 5 in a short time (which doesn't like the case of dhcpcd-4.0.10).

Do these suggest both of the two dhcp clients above having problems?
Is the pidfile reliable?(<nm_dhcp_manager_cancel_transaction_real> uses device->pid to kill the dhcp process, but <nm_dhcp_client> uses pidfile)

BTW, thanks for your work, I will try the code with the tag 0.7.1rc1 later.

2009/2/19 Dan Williams <dcbw redhat com>

On Wed, 2009-02-18 at 17:10 +0800, cee1 wrote:
> Hi,
>      I'm using NetwokManager-0.7(using dhcpcd) with wifi, and see a
> lot of messages like  "Received DHCP event from unexpected PID xxxx
> (expected xxxx)" in the syslog.

What version of NM specifically?  The final NM release from late
November 2008, or a snapshot from some other point in time?

>      I also see more than one dhcpcd processes running. Sometimes,
> only one dhcpcd process lefts at last, but sometimes not. (searching
> google, find the same problem someone has reported:
> https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829)
>      Following the source code(NM-0.7), in the function
> <nm_dhcp_client_start> (nm-dhcp-dhcpcd.c), it kills the previous
> dhcpcd process and saves the new one's pid in device->pid. But the
> function <dhcp_watch_cb>, knowing the termination of the previous
> dhcpd process, sets device->pid to zero. So the current dhcp process's
> pid is lost, and <nm_dhcp_manager_handle_event> will print log like
> "Received DHCP event from unexpected PID 8812 (expected 0)".

Obviously the previous one isn't getting cleaned up correctly.  It might
be that dhcpcd returns a state NM doesn't expect, and thus isn't getting
cleaned up in nm_dhcp_manager_begin_transaction():

       if (state_is_bound (device->state) || (device->state == DHC_START)) {
               /* Cancel any DHCP transaction already in progress */
               nm_dhcp_manager_cancel_transaction_real (device);
       }

Basically, it should *always* be the case that a previous dhcp client
instance is cleaned up before starting a new one, so the watch callback
should certainly be removed before starting a new dhcp client run.

>      Also the previous dhcpcd process's time_out handler may expire,
> and affects the current running dhcpcd process?

The timeout gets cleared whenever DHCP options get returned, or when the
dhcp device gets destroyed, or when nm_dhcp_manager_cancel_transaction()
is called to clean up any ongoing DHCP transaction.
cancel_transaction() gets called when the device is deactivated.

In any case, I've fixed up a few issues with the DHCP client handling
code and made it somewhat more robust against this issue.  If you can
possibly test the latest git, or at least figure out what state the DHCP
client is left in (device->state) when
nm_dhcp_manager_begin_transaction() gets called, that would be great.

Dan

>      It seems the function <supplicant_interface_acquire>
> (nm-device-wifi.c) registers a function
> <supplicant_iface_connection_state_cb> as the handler of the signal
> "connection-state" .
> And:
> <supplicant_iface_connection_state_cb> shcedules
> <supplicant_iface_connection_state_cb_handler>, which calls
> <nm_device_activate_schedule_stage3_ip_config_start>, which shcedules
> <nm_device_activate_stage3_ip_config_start> , which calls
> <real_act_stage3_ip_config_start>, which calls
> <nm_dhcp_manager_begin_transaction>.
> So it is possiable that more than one dhcpcd processes running
> meanwhile.
>
>     To fix this, shoud I do:
> 1) Protect the corresponding data struct
> or
> 2) Add a clean step when a second dhcpcd process to be started
>
>      Any ideas? Thanks.
> _______________________________________________
> NetworkManager-list mailing list
> NetworkManager-list gnome org
> http://mail.gnome.org/mailman/listinfo/networkmanager-list





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]