[RFC] Revise NM behavior for Unmanaged Devices and Assuming Connections (bgo 746440)

From: Thomas Haller <thaller redhat com>
To: networkmanager-list gnome org
Subject: [RFC] Revise NM behavior for Unmanaged Devices and Assuming Connections (bgo 746440)
Date: Mon, 09 Nov 2015 17:43:35 +0100

Hi,


Regarding bug https://bugzilla.gnome.org/show_bug.cgi?id=746440, about
unmanaged devices and assuming connections.

The current behavior is broken and for nm-1-2 this shall be fixed as
follows:

Basically, let NM be less smart and instead of "assuming connections"
let it "activate-gracefully".




NetworkManager State and Device State
=====================================

For doing the right thing when NM starts (after systemctl restart),
we must know the previous device state.

Whenever NM activates, manages or unmanages a device, it will write the
device state to /var/lib/NetworkManager/NetworkManager.state. The
statefile will get the following new entries:

    [main]
    # When we write the statefile, we also remember the boot-id
    # from /proc/sys/kernel/random/boot_id. We need it to properly
    # detect last_seen_current_boot below.
    boot_id=<ID>


    # A unique keyfile section for each device-entry.
    # The following keys are under this group.
    [d/43]

    # a mandatory ifname
    ifname=eth0

    # optionally, the hwaddress if available. Note that the
    # ifname/hwaddr pair constitutes the key for the entry. When
    # checking the state for a device, we first look for an entry
    # which has the same ifname/hwaddr pair. If not found, we
    # ignore the hwaddr.
    hwaddr=00:11:22:33:44:55

    # the managed state with
    # STATE = true | false
    managed=STATE

    # last-seen timestamp in POSIX-time (time()). This
    # is so that we can prune the statefile from devices
    # that we didn't see for a while.
    last_seen_timestamp=1447082472

    # Did NM have a connection active on the device? This
    # only makes sense for managed=true. Note that
    # this entry will be ignored when the boot-id differs,
    # that means: it's only considered after a restart, not
    # after system reboot.
    last_connection=$CONNECTION_UUID
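
To make the lookup rules in the comments above concrete, here is a
minimal sketch in Python (not NM code). It assumes the statefile
parses as a plain keyfile; the helper names load_state(),
find_device_entry() and last_connection() are made up for
illustration, and the current boot-id is passed in instead of being
read from /proc/sys/kernel/random/boot_id.

```python
import configparser

# Sample statefile content following the proposed layout.
STATEFILE_TEXT = """
[main]
boot_id=abc

[d/43]
ifname=eth0
hwaddr=00:11:22:33:44:55
managed=true
last_seen_timestamp=1447082472
last_connection=uuid-1
"""


def load_state(text):
    """Parse statefile text into a ConfigParser (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def find_device_entry(state, ifname, hwaddr=None):
    """Prefer an entry matching ifname and hwaddr, else match ifname only."""
    entries = [state[s] for s in state.sections() if s.startswith("d/")]
    same_name = [e for e in entries if e.get("ifname") == ifname]
    if hwaddr is not None:
        for entry in same_name:
            if (entry.get("hwaddr") or "").lower() == hwaddr.lower():
                return entry
    # No exact pair found: ignore the hwaddr and fall back to the ifname.
    return same_name[0] if same_name else None


def last_connection(state, entry, current_boot_id):
    """Return the stored connection UUID only for the same boot."""
    if entry is None:
        return None
    if state.get("main", "boot_id", fallback=None) != current_boot_id:
        return None  # system was rebooted: the entry is ignored
    if not entry.getboolean("managed", fallback=False):
        return None  # only makes sense for managed=true
    return entry.get("last_connection")
```

With the sample above, a lookup for eth0 with a different hwaddr still
returns the [d/43] entry, and last_connection() yields None as soon as
the boot-id differs.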


The last_seen_timestamp is there to eventually garbage collect the
state entry. Note that we write a state-entry for every device we see.
Docker for example creates veth devices with unique names so we must
eventually forget about devices that we didn't see for a while.
I think NM should track up to <configurable> many devices, and if the
state list grows larger, we must prune the oldest entries.
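
The pruning could look roughly like the following Python sketch. The
max_entries parameter stands in for the <configurable> limit and is
an assumption, as is representing each entry as a dict with an
integer last_seen_timestamp.

```python
def prune_entries(entries, max_entries):
    """Keep only the max_entries most recently seen device entries."""
    if max_entries < 0:
        raise ValueError("max_entries must not be negative")
    # Sort newest first, then drop everything beyond the limit.
    newest_first = sorted(
        entries, key=lambda e: e["last_seen_timestamp"], reverse=True
    )
    return newest_first[:max_entries]
```

Sorting by timestamp means short-lived devices such as Docker's veth
interfaces age out first, while long-lived devices keep their entry.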



Whether to manage a device
==========================

When NM sees a new device, it will very early decide whether to manage
it (nm_device_finish_init()).

Preferably, we look at the statefile. If the statefile from a previous
NM run indicates whether the device should be managed or unmanaged, we
do that. That means, when the user sets
  nmcli device set eth0 managed yes|no
we will remember that decision in the statefile (yay).

In the absence of a statefile entry, we obey other configuration, such
as udev's NM_MANAGED or NM.conf's unmanaged-devices.

In the absence of any configuration, we manage hardware devices but
don't manage software devices.
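
The precedence described above can be summarized in a small Python
sketch. The function decide_managed() and its parameters are
hypothetical: None means "no statement from that source", and
config_managed stands for whatever udev's NM_MANAGED or NM.conf's
unmanaged-devices resolve to.

```python
from typing import Optional


def decide_managed(
    state_managed: Optional[bool],
    config_managed: Optional[bool],
    is_software: bool,
) -> bool:
    """Decide whether to manage a newly seen device."""
    # 1. A statefile entry from a previous run wins.
    if state_managed is not None:
        return state_managed
    # 2. Otherwise obey other configuration (udev, NM.conf).
    if config_managed is not None:
        return config_managed
    # 3. Without any configuration: manage hardware, not software devices.
    return not is_software
```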


Note that NM_UNMANAGED_DEFAULT_UNMANAGED goes away (WIP: refactor
default-unmanaged, bug 746566, lr/default-unmanaged-bgo746566).



Unmanaged Devices
=================

For unmanaged devices, that's pretty much it. The device is in state
unmanaged and NM doesn't touch it. No IFF_UP, no sysctl change, etc.
The user can manage the device anytime via `nmcli device set eth0 managed
yes` or just by activating a connection on it (unless the device cannot
be managed for other reasons like NM_UNMANAGED_PLATFORM_INIT).

Note that even for unmanaged devices, we already expose the
Ip4Config/Ip6Config objects and other runtime information. So the user
can see the IP configuration also for unmanaged devices.

What we will do here, however: if nm_device_generate_connection()
succeeds on the device, we will add a new in-memory
NMSettingsConnection and pretend that this connection is active on the
device. That connection will be updated whenever the device changes.
We will not call nm_utils_match_connection(); we just generate a new
connection. However, the user can persist the in-memory connection and
can even activate it (which makes the generated-unmanaged-connection
persistent and manages the device). Note that this is non-destructive
(see next point).



Activating a connection and Assuming Connections
================================================

Whenever we activate a connection on a device, we *always* try to
preserve the existing state.
It must always be as non-destructive as possible. E.g.:
  - if the device is already IFF_UP, we don't down&up it before
    activating it.
  - if the device already has an IP address that we want to add anyway,
    we ensure that we don't delete it during activation.
  - re-connecting to the same network doesn't break your connectivity.
  - re-upping a connection doesn't break connectivity.
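
The first two points can be illustrated with a Python sketch that
plans an activation as a diff against the existing device state. The
function plan_activation() and the shape of its result are made up
for illustration. That addresses not wanted by the connection get
removed is an assumption; the text above only says that wanted
addresses must not be deleted.

```python
def plan_activation(current_addrs, wanted_addrs, is_up):
    """Plan a graceful activation as a diff against the current state."""
    current = set(current_addrs)
    wanted = set(wanted_addrs)
    return {
        # Only bring the link up if it is down; never down&up it.
        "set_up": not is_up,
        # Addresses already present and wanted are left untouched.
        "keep": sorted(current & wanted),
        # Only the missing addresses are added.
        "add": sorted(wanted - current),
        # Assumption: leftovers not in the connection are removed.
        "remove": sorted(current - wanted),
    }
```

Re-upping a connection whose configuration is already in place then
results in a plan with nothing to add or remove, so connectivity is
not interrupted.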

"assuming a connection" then means the same as "activating a
connection". The separate concept no longer exists: activation itself
will always try to be non-destructive and graceful.


Managed Devices and Assume Connections
======================================

Whenever a device becomes managed we check which connection to
activate:

- After a `nmcli connection up`, it is clear which connection to
activate.

- after nm_device_finish_init(), if the statefile indicates a previous
connection of the same boot, we check whether that connection is
currently active on the device. This is the only time we call
nm_utils_match_connection(), and only to skip activating the
last-stored connection if it looks like it was not active.

- Otherwise, and after `nmcli device set eth0 managed yes`, we search
for a connection to autoactivate. If we find one, we activate it
(gracefully). If not, transition to NM_DEVICE_STATE_DISCONNECTED /
UNAVAILABLE, including clearing the IP configuration so that the device
is truly disconnected. In particular, we don't try to activate the
generated-unmanaged-connection and we don't create any connections (as
we do currently).
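
The three cases above can be written down as one decision function.
This is a Python sketch of my reading of the list, not NM code: the
Trigger enum, choose_connection() and its parameters are
hypothetical, and last_looks_active stands for the result of the
nm_utils_match_connection() check.

```python
from enum import Enum


class Trigger(Enum):
    CONNECTION_UP = "nmcli connection up"
    FINISH_INIT = "nm_device_finish_init()"
    DEVICE_SET_MANAGED = "nmcli device set managed yes"


def choose_connection(
    trigger,
    requested=None,
    last_connection=None,
    last_looks_active=False,
    autoconnect_candidate=None,
):
    """Return the connection to activate, or None to go DISCONNECTED."""
    # Explicit activation: the user told us which connection to use.
    if trigger is Trigger.CONNECTION_UP:
        return requested
    # After init: reuse the statefile's connection of the same boot,
    # but only if it still looks active on the device.
    if (
        trigger is Trigger.FINISH_INIT
        and last_connection is not None
        and last_looks_active
    ):
        return last_connection
    # Otherwise: autoactivate if a candidate exists. None means the
    # device goes DISCONNECTED / UNAVAILABLE and its IP config is cleared.
    return autoconnect_candidate
```

Note that the generated-unmanaged-connection never shows up as a
return value here, matching the last point of the list.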






Comments?
Thomas


