> 6) Now there are two DHCP processes running
>
> May this is the reason why NM forgets some DHCP processes.
>
> 2009/2/20 cee1 <
fykcee1 gmail com>
> Hi, Dan
> I've tried 0.7.1-rc1, and modify the code slightly:
> <nm_dhcp_client_start>: kill the DHCP client according
> device->pid. In my test, the problem seems to be fixed.
>
> Is it a little reasonable that killing a DHCP client by
> device->pid?
> First, the logical is consistent with
> <nm_dhcp_manager_cancel_transaction_real>, which kills a DHCP
> client by device->pid.
> Second, it seems more reliablable.
> Third, although it is possible to kill the DHCP client which
> is not created by NM, by pidfile,
> that DHCP client may not use the same pidfile. (since the code
> in <nm-dhcp-dhclient.c> specified the path of the pidfile)
>
> BTW, I still see some log message like "NetworkManager: <WARN>
> nm_dhcp_manager_handle_event(): Received DHCP event from
> unexpected PID 16219 (expected 16221)" (but I'm sure the
> process has exited eailier.) I guess NM received the data from
> the process eailier, but was busy, and then it had chance to
> process the data, but the process had exited. Am I right?
>
> I'm runing neither SELinux nor AppArmor. I will try to find
> out the reason why the pidfile disappears while there is still
> a DHCP client running : a) The DHCP client fails to create
> pidfile ? b) The DHCP client delete the pidfile when it is
> running? c) ...
>
> 2009/2/20 Dan Williams <
dcbw redhat com>
>
>
> On Thu, 2009-02-19 at 20:37 +0800, cee1 wrote:
> > Hi, Dan
> > I've reviewed the code: when starts a new dhcp
> process (on the
> > same net interface), the previous one will be
> terminated. So the there
> > should always be one dhcp process for the same net
> interface, am I
> > right?
>
>
> Correct.
>
> > But in my test (still with tag NM-0.7.0 &
> dhclient-3.1.0),
> > switching between APs, some log message shows NM
> can't get pid from
> > pidfile (the pidfile doesn't exist), but the
> *previous* dhcp process
> > is actully alive . This is really strange.
>
>
> Yeah, it's strange. Are you running something like
> SELinux or AppArmor?
>
> Dan
>
>
> > 2009/2/19 cee1 <
fykcee1 gmail com>
> > Hi, Dan
> > Thanks for the replay.
> > The code of NetworkManger is downloaded from
> gnome's ftp
> > server with version 0.7.0.
> > I've tried this NM-0.7 with both
> dhcpcd-4.0.10 and
> > dhclient-3.1.0.
> >
> > Following your advice, I add some log output
> code, and find
> > the device->state in function
> > <nm_dhcp_manager_begin_transaction> is
> DHC_NBI (0) or DHC_END
> > (15).
> >
> > In the case of dhcpcd-4.0.10, I see some log
> messages like:
> > "wlan0: dhcpcd already running on pid 1626
> > (/var/run/dhcpcd-wlan0.pid)", and NM forgets
> some dhcpcd
> > processes.
> > Also I see log messages like "dhcpcd[1630]:
> write signal 15:
> > Bad file descriptor", does this suggests
> that the dhcpcd
> > process can't write the pid file??
> >
> > In the case of dhclient-3.1.0, (in my
> testing period), no
> > <dhcp_watch_cb> called (doesn't like the
> case of
> > dhcpcd-4.0.10), but still, log messages like
> > "nm_dhcp_manager_handle_event(): Received
> DHCP event from
> > unexpected PID 19924 (expected 0)" appear.
> > I see only one log message like "dhclient:
> There is already a
> > pid file /usr/var/run/dhclient-wlan0.pid
> with pid 19911".
> > This time, NM forgets dhclient processes
> more easily. When I
> > switch connections between APs, the number
> of dhclient
> > processes increases to 4 or 5 in a short
> time (which doesn't
> > like the case of dhcpcd-4.0.10).
> >
> > Do these suggest both of the two dhcp
> clients above having
> > problems?
> > Is the pidfile
> >
> reliable?(<nm_dhcp_manager_cancel_transaction_real>
> uses
> > device->pid to kill the dhcp process, but
> <nm_dhcp_client>
> > uses pidfile)
> >
> > BTW, thanks for your work, I will try the
> code with the tag
> > 0.7.1rc1 later.
> >
> > 2009/2/19 Dan Williams <
dcbw redhat com>
> >
> >
> > On Wed, 2009-02-18 at 17:10 +0800,
> cee1 wrote:
> > > Hi,
> > > I'm using
> NetwokManager-0.7(using dhcpcd) with
> > wifi, and see a
> > > lot of messages like "Received
> DHCP event from
> > unexpected PID xxxx
> > > (expected xxxx)" in the syslog.
> >
> >
> > What version of NM specifically?
> The final NM release
> > from late
> > November 2008, or a snapshot from
> some other point in
> > time?
> >
> > > I also see more than one
> dhcpcd processes
> > running. Sometimes,
> > > only one dhcpcd process lefts at
> last, but sometimes
> > not. (searching
> > > google, find the same problem
> someone has reported:
> > >
> >
>
https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829)
> > > Following the source
> code(NM-0.7), in the
> > function
> > > <nm_dhcp_client_start>
> (nm-dhcp-dhcpcd.c), it kills
> > the previous
> > > dhcpcd process and saves the new
> one's pid in
> > device->pid. But the
> > > function <dhcp_watch_cb>, knowing
> the termination of
> > the previous
> > > dhcpd process, sets device->pid to
> zero. So the
> > current dhcp process's
> > > pid is lost, and
> <nm_dhcp_manager_handle_event> will
> > print log like
> > > "Received DHCP event from
> unexpected PID 8812
> > (expected 0)".
> >
> >
> > Obviously the previous one isn't
> getting cleaned up
> > correctly. It might
> > be that dhcpcd returns a state NM
> doesn't expect, and
> > thus isn't getting
> > cleaned up in
> nm_dhcp_manager_begin_transaction():
> >
> > if (state_is_bound
> (device->state) ||
> > (device->state == DHC_START)) {
> > /* Cancel any DHCP
> transaction already
> > in progress */
> >
> nm_dhcp_manager_cancel_transaction_real
> > (device);
> > }
> >
> > Basically, it should *always* be the
> case that a
> > previous dhcp client
> > instance is cleaned up before
> starting a new one, so
> > the watch callback
> > should certainly be removed before
> starting a new dhcp
> > client run.
> >
> > > Also the previous dhcpcd
> process's time_out
> > handler may expire,
> > > and affects the current running
> dhcpcd process?
> >
> >
> > The timeout gets cleared whenever
> DHCP options get
> > returned, or when the
> > dhcp device gets destroyed, or when
> > nm_dhcp_manager_cancel_transaction()
> > is called to clean up any ongoing
> DHCP transaction.
> > cancel_transaction() gets called
> when the device is
> > deactivated.
> >
> > In any case, I've fixed up a few
> issues with the DHCP
> > client handling
> > code and made it somewhat more
> robust against this
> > issue. If you can
> > possibly test the latest git, or at
> least figure out
> > what state the DHCP
> > client is left in (device->state)
> when
> > nm_dhcp_manager_begin_transaction()
> gets called, that
> > would be great.
> >
> > Dan
> >
> > > It seems the function
> > <supplicant_interface_acquire>
> > > (nm-device-wifi.c) registers a
> function
> > >
> <supplicant_iface_connection_state_cb> as the
> > handler of the signal
> > > "connection-state" .
> > > And:
> > >
> <supplicant_iface_connection_state_cb> shcedules
> > >
> <supplicant_iface_connection_state_cb_handler>,
> > which calls
> > >
> >
> <nm_device_activate_schedule_stage3_ip_config_start>,
> > which shcedules
> > >
> <nm_device_activate_stage3_ip_config_start> , which
> > calls
> > > <real_act_stage3_ip_config_start>,
> which calls
> > >
> <nm_dhcp_manager_begin_transaction>.
> > > So it is possiable that more than
> one dhcpcd
> > processes running
> > > meanwhile.
> > >
> > > To fix this, shoud I do:
> > > 1) Protect the corresponding data
> struct
> > > or
> > > 2) Add a clean step when a second
> dhcpcd process to
> > be started
> > >
> > > Any ideas? Thanks.
> >
> > >
> _______________________________________________
> > > NetworkManager-list mailing list
> > >
NetworkManager-list gnome org
> > >
> >
>
http://mail.gnome.org/mailman/listinfo/networkmanager-list
> >
> >
> >
>
>
>
>