Why does NM seem to behave differently in initrd than in real root?



Hi NM developers,

This is Coiby from the Red Hat Kernel Debug team, which is responsible for Fedora/RHEL's kexec-tools. Currently, kexec-tools parses ifcfg-* or .nmconnection files to build dracut cmdline parameters such as ip= to set up the kdump initrd network, which is tedious and error-prone. Recently, I've been implementing a different approach: setting up the kdump initrd network by copying connection profiles from the real root to the initrd directly. However, one unexpected thing is that NM seems to behave differently in the initrd than in the real root, so the same connection profiles copied from the real root lead to different results in the kdump initrd. Is there a general reason why NM behaves differently in the initrd and in the real root? And is copying connection profiles from the real root to the kdump initrd a better approach for kexec-tools to set up the kdump initrd network? I would appreciate it if NM developers could provide answers or comments on these questions, since you are the experts on this type of problem.

For details on how NM behaves differently in the kdump initrd, I've reported some of the inconsistent behaviours as bugs [1] [2]. connection.wait-device-timeout=6000 and connection.autoconnect=false can be used to work around [1] and [2] respectively, so that the same connections can be brought up in the initrd. A third issue, for which I haven't found a workaround, is the case of a bridging network over a VLAN network over a teaming network: I create a teaming network interface which is used as the parent interface of a VLAN interface, which is in turn a slave interface of a network bridge. The problem is that the network bridge sometimes gets an IP address belonging to the VLAN subnet and sometimes does not. Btw, the third issue was found on a physical machine and can't be reproduced on a VM.
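
For illustration, the two workaround properties could be written into a copied profile roughly as follows. This is only a minimal sketch and not what the modified kexec-tools [3] actually does. It assumes the profile is in NM keyfile (.nmconnection) format, where connection.wait-device-timeout and connection.autoconnect are stored as wait-device-timeout and autoconnect in the [connection] section. The function name apply_workarounds and the sample profile are hypothetical. Which profile needs which property depends on the bug being worked around, so both are optional here.

```python
import configparser
import io
from typing import Optional


def apply_workarounds(
    keyfile_text: str,
    wait_device_timeout_ms: Optional[int] = None,
    autoconnect: Optional[bool] = None,
) -> str:
    """Return keyfile text with the requested [connection] keys set.

    Note: configparser drops comments when writing the profile back out.
    """
    parser = configparser.ConfigParser(
        interpolation=None,        # values may legitimately contain '%'
        delimiters=("=",),         # keyfiles use key=value only
        comment_prefixes=("#",),
        strict=False,
    )
    parser.optionxform = str       # keep key names exactly as written
    parser.read_string(keyfile_text)

    if not parser.has_section("connection"):
        parser.add_section("connection")
    if wait_device_timeout_ms is not None:
        parser.set("connection", "wait-device-timeout",
                   str(wait_device_timeout_ms))
    if autoconnect is not None:
        parser.set("connection", "autoconnect",
                   "true" if autoconnect else "false")

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical profile, used only to exercise the function.
SAMPLE = """[connection]
id=eth0
type=ethernet
interface-name=eth0

[ipv4]
method=auto
"""

if __name__ == "__main__":
    print(apply_workarounds(SAMPLE, wait_device_timeout_ms=6000))
```

The profile is handled as text rather than edited in place, so the copy that goes into the initrd can be changed without touching the profile in the real root.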

I've tested the modified kexec-tools [3] by setting up different networks, including the aforementioned bridging network over VLAN network over teaming network. Other tests include a bridging network over a physical interface/bonding network/teaming network/VLAN network, a VLAN network over a physical interface/bonding network/teaming network, and so on. All tests have passed on a VM. Except for the bridging network over VLAN network over teaming network, the tests have also passed on one physical machine. But I'm not sure whether they are sufficient, considering there are machine-specific issues such as znet network devices. Any suggestions are welcome.

Thanks!


[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/803
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2007563
[3] https://src.fedoraproject.org/fork/coiby/rpms/kexec-tools/commits/direct_nm


--
Best regards,
Coiby


