Re: [RFC] Revise NM behavior for Unmanaged Devices and Assuming Connections (bgo 746440)



On Mon, 2015-11-09 at 17:43 +0100, Thomas Haller wrote:
Hi,


Regarding bug https://bugzilla.gnome.org/show_bug.cgi?id=746440, about
unmanaged devices and assuming connections.

The current behavior is broken and for nm-1-2 this shall be fixed as
follows:

Basically, let NM be less smart and instead of "assuming connections"
let it "activate-gracefully".




NetworkManager State and Device State
=====================================

For doing the right thing when NM starts (after systemctl restart),
we must know the previous device state.

Whenever NM activates, manages or unmanages a device, it will write the
device-state to /var/lib/NetworkManager/NetworkManager.state. The
statefile will get the following new entries:

    [main]
    # When we write the statefile, we also remember the boot-id
    # from /proc/sys/kernel/random/boot_id. We need it to properly
    # detect last_seen_current_boot below.
    boot_id=<ID>


    # A unique keyfile section for each device-entry.
    # The following keys are under this group.
    [d/43]
    # a mandatory ifname
    ifname=eth0

I'm wondering about device renames, which can happen at any point.
Should we just use the ifindex here instead?  That won't change for
any device unless the device has been deleted and re-created (or
hotplugged), at which point we can probably consider it a new device
anyway, at least for virtual devices.

    # optionally, the hwaddress if available. Note that the
    # ifname/hwaddr pair constitutes the key for the entry. When
    # checking the state for a device, we first look for an entry
    # which has the same ifname/hwaddr pair. If not found, we
    # ignore the hwaddr.
    hwaddr=00:11:22:33:44:55

    # the managed state with
    # STATE = true | false
    managed=STATE

    # last-seen timestamp in POSIX-time (time()). This
    # is so that we can prune the statefile from devices
    # that we didn't see for a while.
    last_seen_timestamp=1447082472

    # Did NM have a connection active on the device? This
    # only makes sense for managed=true. Note that
    # this entry will be ignored when the boot-id differs,
    # that means: it's only considered after a restart, not
    # after system reboot.
    last_connection=$CONNECTION_UUID
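
A minimal sketch of the lookup described in the comments above, in
Python rather than NM's C, just to make the matching rules concrete.
The function names, the boot_id value and the UUID are made up for
illustration; only the key names come from the proposal.

```python
import configparser

# Example statefile text following the layout proposed above.
# The boot_id and UUID values are made-up placeholders.
EXAMPLE = """\
[main]
boot_id=0f0e0d0c-0000-4000-8000-000000000001

[d/43]
ifname=eth0
hwaddr=00:11:22:33:44:55
managed=true
last_seen_timestamp=1447082472
last_connection=11111111-2222-4333-8444-555555555555
"""


def load_state(text):
    """Return (boot_id, list of per-device dicts) from statefile text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    boot_id = parser.get("main", "boot_id", fallback=None)
    entries = [dict(parser[name]) for name in parser.sections()
               if name.startswith("d/")]
    return boot_id, entries


def find_entry(entries, ifname, hwaddr=None):
    """Prefer an exact ifname/hwaddr match, else ignore the hwaddr."""
    by_name = [e for e in entries if e.get("ifname") == ifname]
    if hwaddr is not None:
        for entry in by_name:
            if entry.get("hwaddr", "").lower() == hwaddr.lower():
                return entry
    return by_name[0] if by_name else None


def last_connection_for_restart(entry, stored_boot_id, current_boot_id):
    """last_connection only counts for managed=true and the same boot."""
    if entry is None or stored_boot_id != current_boot_id:
        return None
    if entry.get("managed") != "true":
        return None
    return entry.get("last_connection")
```

The current boot-id would come from /proc/sys/kernel/random/boot_id;
the sketch takes it as a parameter so it stays testable.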


The last_seen_timestamp is there to eventually garbage collect the
state entry. Note that we write a state-entry for every device we see.
Docker for example creates veth devices with unique names so we must
eventually forget about devices that we didn't see for a while.
I think NM should handle <configurable> many devices and if the
state-list grows larger, we must prune the older ones.

I don't see a great need to make the number configurable right now.  I
know some systems have hundreds of software devices, but maybe we just
put a cap on the number (500?) and see if it needs to be configurable
through experience.
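
To illustrate the pruning with a fixed cap, here is a rough Python
sketch. The function name is hypothetical, the default of 500 is only
the number floated above, and entries are assumed to be dicts carrying
the last_seen_timestamp key from the proposal.

```python
def prune_entries(entries, max_entries=500):
    """Keep only the max_entries most recently seen device entries."""
    ranked = sorted(entries,
                    key=lambda e: int(e["last_seen_timestamp"]),
                    reverse=True)
    return ranked[:max_entries]


# Three made-up entries; with a cap of 2 the oldest one is dropped.
sample = [
    {"ifname": "veth1", "last_seen_timestamp": "1447000000"},
    {"ifname": "eth0", "last_seen_timestamp": "1447082472"},
    {"ifname": "veth2", "last_seen_timestamp": "1447050000"},
]
kept = prune_entries(sample, max_entries=2)
```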


Whether to manage a device
==========================

When NM sees a new device, it will very early decide whether to manage
it (nm_device_finish_init()).

Preferably, we look at the statefile. If the statefile from a previous
NM run indicates whether the device should be managed/unmanaged, do it.
That means, when the user sets
  nmcli device set eth0 managed on|off
we will remember that decision in the statefile (yay).

In the absence of a statefile entry, we obey other configuration, such
as udev's NM_MANAGED or NM.conf's unmanaged-devices.

In the absence of any configuration, we manage hardware devices but
don't manage software devices.

Do you mean what NM currently does, where software devices it creates
are managed, but externally created ones are unmanaged by default?  If
so I agree.
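
The precedence proposed in this section could be sketched like this
(Python for brevity, names hypothetical). It follows the wording of
the proposal for the default case, "manage hardware, don't manage
software", and does not try to settle the question above about
software devices that NM created itself.

```python
from typing import Optional


def should_manage(state_managed: Optional[bool],
                  config_managed: Optional[bool],
                  is_software: bool) -> bool:
    """Decide whether to manage a newly seen device.

    state_managed:  value from the statefile entry, None if absent.
    config_managed: verdict from other configuration (udev NM_MANAGED,
                    unmanaged-devices), None if nothing applies.
    is_software:    True for software devices, False for hardware.
    """
    if state_managed is not None:
        return state_managed      # the statefile wins
    if config_managed is not None:
        return config_managed     # then other configuration
    return not is_software        # default: hardware only
```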


Note that NM_UNMANAGED_DEFAULT_UNMANAGED goes away (WIP: refactor
default-unmanaged, bug 746566, lr/default-unmanaged-bgo746566).



Unmanaged Devices
=================

For unmanaged devices, that's pretty much it. The device is in state
unmanaged and NM doesn't touch it. No IFF_UP, no sysctl change, etc.
The user can manage the device anytime via `nmcli device set eth0
managed yes` or just by activating a connection on it (unless the
device cannot be managed for other reasons like
NM_UNMANAGED_PLATFORM_INIT).

Note that even for unmanaged devices, we already expose the
Ip4Config/Ip6Config objects and other runtime information. So the user
can see the IP configuration also for unmanaged devices.

What we will do here, however: if nm_device_generate_connection()
succeeds on the device, we will add a new in-memory
NMSettingsConnection and pretend that that connection is active on the
device. That setting will be updated whenever the device changes.
We will not nm_utils_match_connection(), we just generate a new one.
However, the user can persist the in-memory connection and can even
activate the connection (which makes the generated-unmanaged-connection
persistent and manages the device). Note that this is non-destructive
(see next point).

Agreed.


Activating a connection and Assuming Connections
================================================

Whenever we activate a connection on a device, we *always* try to
preserve the existing state.
It must always be as non-destructive as possible:
E.g.
  - if the device is already IFF_UP, we don't down&up it before 
    activating it.


Some kernel stuff requires an up/down cycle though, like if the kernel
doesn't support userspace IPv6LL we need to do this to ensure that the
device gets an IPv6LL address.

  - if the device already has an IP address that we want to add anyway,
    we ensure that we don't delete it during activation.
  - re-connecting to the same network doesn't break your connectivity.

This will violate some existing behavior "guarantees" though.  Long ago
we initially allowed this because of stupid wifi drivers where a wedged
driver could be re-initialized.  I don't think this case is very
relevant any more though.  Still, it's a behavior change, because to
ensure connectivity doesn't get broken on re-up, we can't re-initialize
any L2 properties like WiFi or WWAN or mess with pppd for PPPoE.

  - re-upping a connection doesn't break connectivity.

"Assuming a connection" then means the same as "activating a
connection", which we will always try to do non-destructively and
gracefully. As a separate concept, it doesn't exist anymore.

That I certainly agree with.
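
As a toy illustration of "activate gracefully", the sketch below plans
an activation from the device's current state instead of resetting it.
It is Python with hypothetical names, it only covers the IFF_UP and
IP address points from the list above, and it deliberately leaves out
the IPv6LL and L2 caveats raised in the replies. What should happen to
addresses that are on the device but not in the connection is not
stated above, so the sketch only reports them.

```python
def plan_activation(is_up, current_addrs, wanted_addrs):
    """Return the minimal steps to reach the wanted configuration."""
    current = set(current_addrs)
    wanted = set(wanted_addrs)
    return {
        # Never down&up a device that is already IFF_UP.
        "bring_up": not is_up,
        # Addresses already present are kept, not deleted and re-added.
        "keep": sorted(current & wanted),
        "add": sorted(wanted - current),
        # Not covered by the proposal text; reported only.
        "unlisted": sorted(current - wanted),
    }


# Made-up addresses from the documentation range 192.0.2.0/24.
plan = plan_activation(
    is_up=True,
    current_addrs=["192.0.2.10/24", "192.0.2.99/24"],
    wanted_addrs=["192.0.2.10/24", "192.0.2.11/24"],
)
```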


Managed Devices and Assume Connections
======================================

Whenever a device becomes managed we check which connection to
activate:

- After a `nmcli connection up`, it is clear which connection to
activate.

- After nm_device_finish_init(), if the statefile indicates a previous
connection of the same boot, we check whether that connection is
currently active on the device. This is the only time we do
nm_utils_match_connection(), but only to skip activating the
last-stored connection if it looks like it was not active.

- Otherwise, and after `nmcli device set eth0 managed yes`, we search
for a connection to autoactivate. If we find one, we activate it
(gracefully). If not, transition to NM_DEVICE_STATE_DISCONNECTED /
UNAVAILABLE including clearing the IP configuration so that the device
is truly disconnected. Especially, we don't try to activate the
generated-unmanaged-connection and we don't create any connections (as
we do currently).

Might be worth pointing out that devices can't get to this point with
existing configuration, since if they had existing configuration
they'd already be "assumed".  Also, what about devices that are
unmanaged, but active-and-assumed and the user sets them to be managed?
Ideally nothing happens, so I would add a case to this section saying
that "if the device is already active and becomes managed NM preserves
the existing connection without touching the device."
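
For reference, the three cases from the quoted list can be written
down as a small decision function. This is only a Python sketch with
hypothetical names; taking the first autoconnect candidate is an
assumption, since the ordering of candidates isn't discussed here, and
the extra "already active" case suggested above is not modeled.

```python
def pick_connection(requested=None, last_connection=None,
                    same_boot=False, looks_active=False,
                    autoconnect_candidates=()):
    """Return ("activate", connection) or ("disconnect", None)."""
    # Case 1: explicit `nmcli connection up`.
    if requested is not None:
        return ("activate", requested)
    # Case 2: after init, the statefile names a connection from the
    # same boot and matching says it still looks active.
    if last_connection is not None and same_boot and looks_active:
        return ("activate", last_connection)
    # Case 3: otherwise look for something to autoactivate.
    if autoconnect_candidates:
        return ("activate", autoconnect_candidates[0])
    # Nothing found: DISCONNECTED / UNAVAILABLE, IP config cleared.
    return ("disconnect", None)
```

For `nmcli device set eth0 managed yes` the caller would pass no
last_connection, so only the third case applies.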

Dan

