> 5) NM tries to read the file "/var/run/dhcpcd-wlan0.pid", but it gets
> nothing, so the second dhcpcd process for wlan0 is forked
>
>
>
> > 6) Now there are two DHCP processes running
> >
> > May this is the reason why NM forgets some DHCP processes.
> >
> > 2009/2/20 cee1 <
fykcee1 gmail com>
> > Hi, Dan
> > I've tried 0.7.1-rc1, and modify the code slightly:
> > <nm_dhcp_client_start>: kill the DHCP client
> according
> > device->pid. In my test, the problem seems to be
> fixed.
> >
> > Is it a little reasonable that killing a DHCP client
> by
> > device->pid?
> > First, the logical is consistent with
> > <nm_dhcp_manager_cancel_transaction_real>, which
> kills a DHCP
> > client by device->pid.
> > Second, it seems more reliablable.
> > Third, although it is possible to kill the DHCP
> client which
> > is not created by NM, by pidfile,
> > that DHCP client may not use the same pidfile.
> (since the code
> > in <nm-dhcp-dhclient.c> specified the path of the
> pidfile)
> >
> > BTW, I still see some log message like
> "NetworkManager: <WARN>
> > nm_dhcp_manager_handle_event(): Received DHCP event
> from
> > unexpected PID 16219 (expected 16221)" (but I'm sure
> the
> > process has exited eailier.) I guess NM received the
> data from
> > the process eailier, but was busy, and then it had
> chance to
> > process the data, but the process had exited. Am I
> right?
> >
> > I'm runing neither SELinux nor AppArmor. I will try
> to find
> > out the reason why the pidfile disappears while
> there is still
> > a DHCP client running : a) The DHCP client fails to
> create
> > pidfile ? b) The DHCP client delete the pidfile when
> it is
> > running? c) ...
> >
> > 2009/2/20 Dan Williams <
dcbw redhat com>
> >
> >
> > On Thu, 2009-02-19 at 20:37 +0800, cee1
> wrote:
> > > Hi, Dan
> > > I've reviewed the code: when starts
> a new dhcp
> > process (on the
> > > same net interface), the previous one will
> be
> > terminated. So the there
> > > should always be one dhcp process for the
> same net
> > interface, am I
> > > right?
> >
> >
> > Correct.
> >
> > > But in my test (still with tag
> NM-0.7.0 &
> > dhclient-3.1.0),
> > > switching between APs, some log message
> shows NM
> > can't get pid from
> > > pidfile (the pidfile doesn't exist), but
> the
> > *previous* dhcp process
> > > is actully alive . This is really strange.
> >
> >
> > Yeah, it's strange. Are you running
> something like
> > SELinux or AppArmor?
> >
> > Dan
> >
> >
> > > 2009/2/19 cee1 <
fykcee1 gmail com>
> > > Hi, Dan
> > > Thanks for the replay.
> > > The code of NetworkManger is
> downloaded from
> > gnome's ftp
> > > server with version 0.7.0.
> > > I've tried this NM-0.7 with both
> > dhcpcd-4.0.10 and
> > > dhclient-3.1.0.
> > >
> > > Following your advice, I add some
> log output
> > code, and find
> > > the device->state in function
> > >
> <nm_dhcp_manager_begin_transaction> is
> > DHC_NBI (0) or DHC_END
> > > (15).
> > >
> > > In the case of dhcpcd-4.0.10, I
> see some log
> > messages like:
> > > "wlan0: dhcpcd already running on
> pid 1626
> > > (/var/run/dhcpcd-wlan0.pid)", and
> NM forgets
> > some dhcpcd
> > > processes.
> > > Also I see log messages like
> "dhcpcd[1630]:
> > write signal 15:
> > > Bad file descriptor", does this
> suggests
> > that the dhcpcd
> > > process can't write the pid file??
> > >
> > > In the case of dhclient-3.1.0, (in
> my
> > testing period), no
> > > <dhcp_watch_cb> called (doesn't
> like the
> > case of
> > > dhcpcd-4.0.10), but still, log
> messages like
> > > "nm_dhcp_manager_handle_event():
> Received
> > DHCP event from
> > > unexpected PID 19924 (expected 0)"
> appear.
> > > I see only one log message like
> "dhclient:
> > There is already a
> > > pid
> file /usr/var/run/dhclient-wlan0.pid
> > with pid 19911".
> > > This time, NM forgets dhclient
> processes
> > more easily. When I
> > > switch connections between APs,
> the number
> > of dhclient
> > > processes increases to 4 or 5 in a
> short
> > time (which doesn't
> > > like the case of dhcpcd-4.0.10).
> > >
> > > Do these suggest both of the two
> dhcp
> > clients above having
> > > problems?
> > > Is the pidfile
> > >
> >
> reliable?(<nm_dhcp_manager_cancel_transaction_real>
> > uses
> > > device->pid to kill the dhcp
> process, but
> > <nm_dhcp_client>
> > > uses pidfile)
> > >
> > > BTW, thanks for your work, I will
> try the
> > code with the tag
> > > 0.7.1rc1 later.
> > >
> > > 2009/2/19 Dan Williams
> <
dcbw redhat com>
> > >
> > >
> > > On Wed, 2009-02-18 at
> 17:10 +0800,
> > cee1 wrote:
> > > > Hi,
> > > > I'm using
> > NetwokManager-0.7(using dhcpcd) with
> > > wifi, and see a
> > > > lot of messages like
> "Received
> > DHCP event from
> > > unexpected PID xxxx
> > > > (expected xxxx)" in the
> syslog.
> > >
> > >
> > > What version of NM
> specifically?
> > The final NM release
> > > from late
> > > November 2008, or a
> snapshot from
> > some other point in
> > > time?
> > >
> > > > I also see more
> than one
> > dhcpcd processes
> > > running. Sometimes,
> > > > only one dhcpcd process
> lefts at
> > last, but sometimes
> > > not. (searching
> > > > google, find the same
> problem
> > someone has reported:
> > > >
> > >
> >
>
https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/255829)
> > > > Following the
> source
> > code(NM-0.7), in the
> > > function
> > > > <nm_dhcp_client_start>
> > (nm-dhcp-dhcpcd.c), it kills
> > > the previous
> > > > dhcpcd process and saves
> the new
> > one's pid in
> > > device->pid. But the
> > > > function
> <dhcp_watch_cb>, knowing
> > the termination of
> > > the previous
> > > > dhcpd process, sets
> device->pid to
> > zero. So the
> > > current dhcp process's
> > > > pid is lost, and
> > <nm_dhcp_manager_handle_event> will
> > > print log like
> > > > "Received DHCP event
> from
> > unexpected PID 8812
> > > (expected 0)".
> > >
> > >
> > > Obviously the previous one
> isn't
> > getting cleaned up
> > > correctly. It might
> > > be that dhcpcd returns a
> state NM
> > doesn't expect, and
> > > thus isn't getting
> > > cleaned up in
> > nm_dhcp_manager_begin_transaction():
> > >
> > > if (state_is_bound
> > (device->state) ||
> > > (device->state ==
> DHC_START)) {
> > > /* Cancel
> any DHCP
> > transaction already
> > > in progress */
> > >
> > nm_dhcp_manager_cancel_transaction_real
> > > (device);
> > > }
> > >
> > > Basically, it should
> *always* be the
> > case that a
> > > previous dhcp client
> > > instance is cleaned up
> before
> > starting a new one, so
> > > the watch callback
> > > should certainly be
> removed before
> > starting a new dhcp
> > > client run.
> > >
> > > > Also the previous
> dhcpcd
> > process's time_out
> > > handler may expire,
> > > > and affects the current
> running
> > dhcpcd process?
> > >
> > >
> > > The timeout gets cleared
> whenever
> > DHCP options get
> > > returned, or when the
> > > dhcp device gets
> destroyed, or when
> > >
> nm_dhcp_manager_cancel_transaction()
> > > is called to clean up any
> ongoing
> > DHCP transaction.
> > > cancel_transaction() gets
> called
> > when the device is
> > > deactivated.
> > >
> > > In any case, I've fixed up
> a few
> > issues with the DHCP
> > > client handling
> > > code and made it somewhat
> more
> > robust against this
> > > issue. If you can
> > > possibly test the latest
> git, or at
> > least figure out
> > > what state the DHCP
> > > client is left in
> (device->state)
> > when
> > >
> nm_dhcp_manager_begin_transaction()
> > gets called, that
> > > would be great.
> > >
> > > Dan
> > >
> > > > It seems the
> function
> > >
> <supplicant_interface_acquire>
> > > > (nm-device-wifi.c)
> registers a
> > function
> > > >
> > <supplicant_iface_connection_state_cb> as
> the
> > > handler of the signal
> > > > "connection-state" .
> > > > And:
> > > >
> > <supplicant_iface_connection_state_cb>
> shcedules
> > > >
> >
> <supplicant_iface_connection_state_cb_handler>,
> > > which calls
> > > >
> > >
> >
> <nm_device_activate_schedule_stage3_ip_config_start>,
> > > which shcedules
> > > >
> >
> <nm_device_activate_stage3_ip_config_start> , which
> > > calls
> > > >
> <real_act_stage3_ip_config_start>,
> > which calls
> > > >
> > <nm_dhcp_manager_begin_transaction>.
> > > > So it is possiable that
> more than
> > one dhcpcd
> > > processes running
> > > > meanwhile.
> > > >
> > > > To fix this, shoud I
> do:
> > > > 1) Protect the
> corresponding data
> > struct
> > > > or
> > > > 2) Add a clean step when
> a second
> > dhcpcd process to
> > > be started
> > > >
> > > > Any ideas? Thanks.
> > >
> > > >
> >
> _______________________________________________
> > > > NetworkManager-list
> mailing list
> > > >
>
NetworkManager-list gnome org
> > > >
> > >
> >
>
http://mail.gnome.org/mailman/listinfo/networkmanager-list
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>