Re: Getting NM to re-try DHCP



On Fri, 2011-05-20 at 14:05 +0200, Jirka Klimes wrote:
> On Thursday 07 of April 2011 00:24:44 Dan Williams wrote:
> > On Tue, 2011-04-05 at 11:37 -0400, Derek Atkins wrote:
> > > Hey all,
> > > 
> > > I have a strange issue.  I lost power last night and one of my systems
> > > came up before my DHCP server did (which is surprising, because my DHCP
> > > server usually comes up pretty quick!)  This "client" system was
> > > supposed to get itself on the network (it has an auto-logon system).
> > > However, NM didn't succeed because my DHCP server wasn't responding,
> > > yet.
> > > 
> > > This is a hard-wired system (not wireless).  Is there any way to get NM
> > > to periodically retry DHCP if at first it does not succeed?
> > > 
> > > I realize that DHCP has its own retry mechanism, but if the whole
> > > process times out, can I set NM to retry every, say, 5 minutes?
> > 
> > We'd need some code changes in NM; basically for wired connections if
> > the activation attempt fails a certain number of times (currently 3)
> > then the connection is marked "invalid".  What probably should happen is
> > that internally, in nm-policy.c, a timeout handler should be scheduled
> > for the connection (using g_timeout_add_seconds()) that triggers after 5
> > minutes or so and if the connection isn't currently active (ie check the
> > NMManager's active connection list) then the invalid flag is cleared
> > from the connection, which will let it be automatically retried.
> > 
> > It's pretty simple from a code perspective, just hasn't been done yet in
> > the run-up to 0.9 and 0.8.4.  Any takers?
> > 
> > Dan
> 
> A patch doing that is in the attachment.

A few concerns with the patch; it might get more complicated because of
them.  First, if the connection is deleted before the idle handler runs,
we'll be left with garbage.  So we'd need to do g_object_weak_ref() on
the connection object, cache the idle ID, and run g_source_remove() on
that idle ID from the weak ref callback.  Next, in
reset_connection_retries() we'd want to remove that weak ref when
freeing the ResetRetriesData structure.

Second, if the connection gets activated again manually by the user
while we're waiting the 5 minutes then we should g_source_remove() the
idle ID.  That gets more complicated.

Perhaps a simpler approach would be to have a single, global idle
handler in the NMPolicy that runs over the connection list and resets
the retries as appropriate?  First, in device_state_changed() we could
attach the current seconds-since-epoch time to the connection object
using g_object_set_data() when its number of retries reaches 0, and
then, if no reset idle handler is scheduled yet, schedule one for 300
seconds later.  (Otherwise, let the existing idle handler run at its
earlier time, since presumably it was scheduled for an earlier failed
connection we want to reset.)

Then, when the reset idle handler does run, iterate over each
connection, save the earliest reset timestamp, and reschedule the idle
handler for that timestamp + 300 seconds.  During this iteration of
course we reset the retries count for every connection that has a reset
timestamp earlier than now.

If the connection gets activated, nm-policy.c's signal handlers can
listen for that and clear out the invalid timestamp data too.
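
To make the bookkeeping concrete, here is a rough sketch of the
single-handler idea in plain C, with no GLib involved.  Everything in it
is a stand-in: Conn, conn_failed(), conn_activated() and reset_pass()
are hypothetical names, not NetworkManager code.  In the real thing the
timestamp would live on the connection via g_object_set_data() and the
returned time would feed g_timeout_add_seconds().  I'm assuming the
stored timestamp is the failure time (so a reset is due at timestamp +
300), and that 0 can serve as the "nothing pending" marker.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define RESET_INTERVAL  300  /* seconds to wait before allowing retries again */
#define DEFAULT_RETRIES 3    /* "currently 3" in the quoted message */

/* Hypothetical stand-in for a connection; not NetworkManager's real type. */
typedef struct {
    int    retries;    /* activation attempts left; 0 means "invalid" */
    time_t failed_at;  /* when retries reached 0, or 0 if no reset is pending */
} Conn;

/* Call when an activation attempt fails.  next_fire is the time the single
 * global handler is currently scheduled for (0 = not scheduled).  Returns
 * the possibly updated schedule; an existing earlier schedule is kept. */
time_t conn_failed(Conn *c, time_t now, time_t next_fire)
{
    if (c->retries > 0)
        c->retries--;
    if (c->retries == 0 && c->failed_at == 0) {
        c->failed_at = now;
        if (next_fire == 0)
            next_fire = now + RESET_INTERVAL;
    }
    return next_fire;
}

/* Call when the connection gets activated (e.g. manually by the user).
 * Clearing the timestamp makes the handler skip it; restoring the retry
 * count here is an assumption of this sketch. */
void conn_activated(Conn *c)
{
    c->retries = DEFAULT_RETRIES;
    c->failed_at = 0;
}

/* Body of the global handler.  Resets every connection whose interval has
 * elapsed and returns the time of the next run, or 0 if nothing is pending. */
time_t reset_pass(Conn *conns, size_t n, time_t now)
{
    time_t earliest = 0;

    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        Conn *c = &conns[i];

        if (c->failed_at == 0)
            continue;                      /* nothing pending for this one */
        if (c->failed_at + RESET_INTERVAL <= now) {
            c->retries = DEFAULT_RETRIES;  /* due: let it be retried again */
            c->failed_at = 0;
        } else if (earliest == 0 || c->failed_at < earliest) {
            earliest = c->failed_at;       /* still waiting: track earliest */
        }
    }
    return earliest ? earliest + RESET_INTERVAL : 0;
}
```

The handler would call reset_pass() with the current time and, if the
result is non-zero, reschedule itself for that time.  A deleted
connection simply drops out of the list before the next pass, so nothing
per-connection has to be cancelled.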

If that would all work, that would allow us to avoid doing all the
alloc/dealloc of a custom data structure, plus we only have to manage
one idle handler.  Plus we don't have to care too much about stuff like
connection deletion and activation happening before our idle handler
runs since those will be easier to deal with.

Thoughts?

Dan


