issues with pppd error handling

I'm looking into a couple of issues we stumbled on with pppd and
NetworkManager. This was tested with NetworkManager 1.8.4, ModemManager
1.6.8 and pppd 2.4.7. In general, there are sporadic problems with
reconnecting, both when pppd exits because of an LCP echo timeout and
when it exits for other reasons.

In the simplest scenario, with an active pppd connection, if pppd crashes
or is forcibly killed and the NM pppd plugin doesn't get a chance to send
the status update to NM via D-Bus, the connection is not deactivated
(nmcli c still shows it as up).
I've done some analysis: NM does notice the pppd process exiting and
enters nm-device-modem.c:ppp_failed(), but because the ipv4 "may-fail"
setting defaults to TRUE, not much happens (the connection loses its IP
address and gateway settings but stays active).
This doesn't seem correct: according to nm-settings(5), the connection
should fail if neither IPv4 nor IPv6 configuration is available (even
when both have may-fail set to TRUE).
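
The nm-settings(5) rule can be written as a small predicate. This is a
minimal sketch, assuming a simplified three-valued outcome per IP family;
IpResult, IpFamily and ip_config_failed_overall() are hypothetical names,
not NetworkManager code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified outcome of one IP family's configuration. */
typedef enum {
    IP_RESULT_PENDING,
    IP_RESULT_SUCCEEDED,
    IP_RESULT_FAILED
} IpResult;

typedef struct {
    IpResult result;
    bool may_fail;  /* mirrors the ipv4.may-fail / ipv6.may-fail setting */
} IpFamily;

/* Returns true when the connection as a whole must fail, following the
 * nm-settings(5) wording: a family with may-fail=FALSE must succeed, and
 * even with may-fail=TRUE at least one family must succeed. */
static bool ip_config_failed_overall(IpFamily v4, IpFamily v6)
{
    if (v4.result == IP_RESULT_FAILED && !v4.may_fail)
        return true;
    if (v6.result == IP_RESULT_FAILED && !v6.may_fail)
        return true;
    return v4.result == IP_RESULT_FAILED && v6.result == IP_RESULT_FAILED;
}
```

For example, IPv4 failed plus IPv6 failed (both with may-fail=TRUE) yields
true, which is the outcome the scenario above never reaches.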

There are actually more issues with nm-device-modem.c:ppp_failed().
Normally, when a pppd session ends (e.g. because of an LCP echo timeout),
pppd seems to send two status messages that trigger ppp_failed():
NM_PPP_STATUS_DISCONNECT and NM_PPP_STATUS_DEAD.
In addition, the pppd process exiting triggers ppp_failed() a third
time. In this situation NM deactivates the connection with the error:
<warn>  [1511357452.8500] device (ttyACM0): PPP failure in unexpected state 100
(State 100 is NM_DEVICE_STATE_ACTIVATED.)
This seems to be a side effect of the may-fail logic combined with the
fact that multiple events result in calls to ppp_failed(), but this side
effect does deactivate the device and triggers a reconnect later on (i.e.
it ends up fixing the problem).
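
The dependence on the number of calls can be reproduced with a toy model.
This is a sketch consistent with the behavior described in this mail, not
the actual nm-device-modem.c code: ModemModel, model_ppp_failed() and
device_failed_after() are hypothetical, and each call is assumed to fail
the first IP family that is still configuring or configured, reaching the
"unexpected state" fallback only once no family is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-family states, loosely named after the ip4_state /
 * ip6_state values mentioned in this mail (not NM's real enum). */
typedef enum { IP_NONE, IP_CONF, IP_DONE, IP_FAIL } IpState;

typedef struct {
    IpState ip4, ip6;
    bool ip4_may_fail, ip6_may_fail;
    bool failed;  /* true once the modeled device has gone to FAILED */
} ModemModel;

/* Simplified model: a may-fail=TRUE family failing does not, by itself,
 * fail the device; only the fallback branch does. */
static void model_ppp_failed(ModemModel *m)
{
    if (m->failed)
        return;  /* assumption: later calls are ignored once FAILED */

    if (m->ip4 == IP_CONF || m->ip4 == IP_DONE) {
        m->ip4 = IP_FAIL;
        if (!m->ip4_may_fail)
            m->failed = true;
    } else if (m->ip6 == IP_CONF || m->ip6 == IP_DONE) {
        m->ip6 = IP_FAIL;
        if (!m->ip6_may_fail)
            m->failed = true;
    } else {
        fprintf(stderr, "PPP failure in unexpected state\n");
        m->failed = true;
    }
}

/* Runs n ppp_failed() calls on an activated connection with IPv4 done and
 * IPv6 still configuring; IPv6 keeps the default may-fail=TRUE. */
static bool device_failed_after(int n, bool ip4_may_fail)
{
    ModemModel m = { IP_DONE, IP_CONF, ip4_may_fail, true, false };
    for (int i = 0; i < n; i++)
        model_ppp_failed(&m);
    return m.failed;
}
```

With the defaults, one or two calls leave the device up even though both
families have failed after the second call, contrary to nm-settings(5);
only the third call fails it. With ipv4 may-fail=FALSE the first call
already fails the device, consistent with the workaround suggested below.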

However, it doesn't always work like that: I observed that sporadically
the NM pppd plugin doesn't send the NM_PPP_STATUS_DEAD state (observed
with dbus-monitor). I think this is related to g_dbus_proxy_call() being
used in the plugin code: it is asynchronous, and the call may sometimes
not complete before pppd exits.
In that case, with some bad luck, ppp_failed() is called only for
NM_PPP_STATUS_DISCONNECT and for the pppd exit (i.e. only twice). If
ip4_state is then configured (with may-fail) and e.g. ip6_state ==
IP_CONF, the connection is not deactivated and no reconnection is
triggered.
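
The suspected race can be illustrated with a counting model. This is a
hypothetical sketch (PluginModel, send_async(), flush_queue() and the
other helpers are illustrative, not GDBus or plugin APIs), assuming an
asynchronous call only queues the message and a queued message is lost if
pppd exits before it is written to the bus.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int queued;     /* async status messages not yet written to the bus */
    int delivered;  /* status messages NM actually received */
} PluginModel;

/* Async send: the message is only queued; delivery happens later. */
static void send_async(PluginModel *p) { p->queued++; }

/* Sync send: returns only after NM has handled the message. */
static void send_sync(PluginModel *p) { p->delivered++; }

/* Background delivery of whatever is queued at this point. */
static void flush_queue(PluginModel *p)
{
    p->delivered += p->queued;
    p->queued = 0;
}

/* pppd exits: anything still queued is dropped. */
static void pppd_exit(PluginModel *p) { p->queued = 0; }

/* Number of ppp_failed() calls NM sees: one per delivered status message
 * plus one for noticing the pppd process exit. */
static int ppp_failed_calls(bool sync, bool flushed_before_exit)
{
    PluginModel p = { 0, 0 };

    if (sync) {
        send_sync(&p);   /* NM_PPP_STATUS_DISCONNECT */
        send_sync(&p);   /* NM_PPP_STATUS_DEAD */
    } else {
        send_async(&p);  /* NM_PPP_STATUS_DISCONNECT */
        flush_queue(&p); /* assumed to get out in time */
        send_async(&p);  /* NM_PPP_STATUS_DEAD */
        if (flushed_before_exit)
            flush_queue(&p);
    }
    pppd_exit(&p);
    assert(p.queued == 0);   /* nothing queued survives the exit */
    return p.delivered + 1;  /* +1: NM also notices the process exit */
}
```

Without the final flush NM sees two ppp_failed() calls instead of three,
which is the case described above; a call that waits for NM's reply, as
suggested at the end of this mail, closes that window.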

I think that for the LCP echo timeout case, setting may-fail to FALSE
for ipv4 on PPP connections should work around the issue. But if any of
the above makes sense to you, there seem to be at least a couple of bugs
here. I think the simplest solution would be to replace the contents of
nm-device-modem.c:ppp_failed() with:

nm_device_state_changed (device, NM_DEVICE_STATE_FAILED, reason);

Or is there any case where a pppd failure should leave the connection active?
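
A minimal mock of the proposed change, to show why it sidesteps the
counting problem: DeviceState, MockDevice, mock_state_changed() and
proposed_ppp_failed() are hypothetical stand-ins for NMDevice and
nm_device_state_changed(), and the enum only borrows the ACTIVATED (100)
and FAILED (120) values from NMDeviceState.

```c
#include <assert.h>

/* Hypothetical mock; only the numeric values match NMDeviceState. */
typedef enum {
    DEVICE_STATE_ACTIVATED = 100,
    DEVICE_STATE_FAILED    = 120,
} DeviceState;

typedef struct {
    DeviceState state;
    int transitions;  /* how many state changes actually happened */
} MockDevice;

/* Stand-in for nm_device_state_changed(). Assumption: a request to enter
 * the state the device is already in is a no-op. */
static void mock_state_changed(MockDevice *dev, DeviceState new_state, int reason)
{
    (void) reason;  /* a real implementation would record the reason */
    if (dev->state == new_state)
        return;
    dev->state = new_state;
    dev->transitions++;
}

/* Proposed ppp_failed(): every PPP failure fails the device, regardless of
 * the ip4/ip6 state and the may-fail settings. */
static void proposed_ppp_failed(MockDevice *dev, int reason)
{
    assert(dev);
    mock_state_changed(dev, DEVICE_STATE_FAILED, reason);
}
```

The first ppp_failed() call fails the device whichever event triggers it,
so whether NM sees two or three calls no longer matters.
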
Additionally, perhaps the g_dbus_proxy_call() in
nm-pppd-plugin.c:nm_phasechange() could be changed to a synchronous call
(g_dbus_proxy_call_sync()).

Best regards, Piotr.
