Re: SARA-R4 unreliable LTE-M connection problem



Hi,

just answering my own mail with what I have found so far …

On Tue, Jun 15, 2021 at 04:52:58PM +0200, Alexander Dahl wrote:
Hello everyone,

I need some help to further debug a mobile broadband modem connection problem.

We are using a mikroe LTE IoT Click board [1] with a u-blox SARA-R410M-02B 
cellular modem (LTE-M, NB-IoT) to connect some custom embedded ARM SoC based 
hardware to the internet. The LTE module is connected through the serial UART 
only.

This is somewhat cumbersome. RTS/CTS hardware flow control is
currently not working, probably due to issues with the old 4.9 kernel
on i.MX6. On top of that, I have trouble making sense of the SARA-R4
datasheets regarding flow control. In general you can choose
between hardware (RTS/CTS), software (XON/XOFF) and none, and
different hardware variants seem to support different modes. The options
of ModemManager and NetworkManager overlap, at least for RTS/CTS.

In some places hardware flow control is highly recommended, while in
other places the module does not seem to support it.

The setting automatically chosen by ModemManager/NetworkManager works
at least sometimes. However, the log indicated that it tried to switch to
hardware flow control, which cannot have worked correctly according
to oscilloscope measurements of the RTS/CTS lines.
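To see which flow control mode is actually configured on the UART (as opposed to what the log claims), a small Python sketch can read the termios flags of the port and decode them. This is not a fix, just a way to compare the configured mode with what the oscilloscope shows. The port name /dev/ttymxc4 is the interface used in this setup; the helper names are made up for illustration. tcgetattr() only reads the settings, but opening and closing the port can still touch modem control lines (e.g. DTR on last close when HUPCL is set), so it is probably best run while nothing else depends on the modem.

```python
import os
import termios


def decode_flow_control(iflag: int, cflag: int) -> dict:
    """Decode which flow control modes a termios attribute set enables."""
    return {
        # RTS/CTS hardware flow control is a control-mode (c_cflag) bit.
        "rtscts": bool(cflag & termios.CRTSCTS),
        # XON/XOFF software flow control is split across two input-mode bits:
        # IXON for output, IXOFF for input.
        "xon": bool(iflag & termios.IXON),
        "xoff": bool(iflag & termios.IXOFF),
    }


def read_flow_control(path: str) -> dict:
    """Report the flow control currently configured on a serial port.

    O_NOCTTY keeps the port from becoming our controlling terminal and
    O_NONBLOCK avoids blocking on carrier detect; tcgetattr() itself does
    not modify any setting.
    """
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        iflag, _oflag, cflag, _lflag, _ispeed, _ospeed, _cc = termios.tcgetattr(fd)
    finally:
        os.close(fd)
    return decode_flow_control(iflag, cflag)
```

Usage would be something like `print(read_flow_control("/dev/ttymxc4"))`; `stty -F /dev/ttymxc4 -a` shows the same bits as crtscts, ixon and ixoff.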

The current software stack is running on a custom ptxdist based board support 
package (BSP) with Linux kernel 4.9.201, ModemManager 1.16.6, NetworkManager 
1.30.4, and pppd 2.4.9. I have full control over the software, and can apply 
and test patches if needed.

I will update the kernel to the v5.10 series to rule out problems with
RTS/CTS; we need that to test a different, SARA-R5 based module.
Maybe that helps somehow?

The provider is Deutsche Telekom (DT); we are using special SIM cards from the 
so-called Business Smart Connect plan.

They were actually kind and willing to help, and provided me with some
useful information on roaming etc.

The symptoms we face are like this: after a reboot of the whole system,
NetworkManager successfully connects. Everything looks fine on the Linux side:
there's a ppp0 device with the correct IPv4 address, the route setup looks fine,
resolv.conf looks fine, `mmcli -m 0` shows the modem is connected, and both
`journalctl -u ModemManager` and `journalctl -u NetworkManager` look fine. We
can also see the modem is connected in the dashboard provided by DT [2].

However we can't receive any data. :-/

Problem persists.

After roughly 10 minutes (some random time between 9 and 11 minutes) we get a 
disconnect (LCP terminated by peer), and this happens every single time. 
Sometimes we can send and receive data after the automatic reconnect, but the 
reconnect is not always successful. :-/

Here is the NetworkManager journal output for such a disconnect:

      Feb 01 00:10:17 unit pppd[230]: LCP terminated by peer
      Feb 01 00:10:17 unit pppd[230]: nm-ppp-plugin: status 8 / phase 'network'
      Feb 01 00:10:17 unit NetworkManager[230]: LCP terminated by peer
      Feb 01 00:10:17 unit pppd[230]: Connect time 9.8 minutes.
      Feb 01 00:10:17 unit pppd[230]: nm-ppp-plugin: status 5 / phase 'establish'
      Feb 01 00:10:17 unit NetworkManager[230]: Connect time 9.8 minutes.
      Feb 01 00:10:17 unit NetworkManager[230]: Sent 22602 bytes, received 0 bytes.
      Feb 01 00:10:17 unit pppd[230]: Sent 22602 bytes, received 0 bytes.
      Feb 01 00:10:17 unit NetworkManager[130]: <info>  [1612138217.8665] device (ppp0): state change: disconnected -> unmanaged (reason 'connection-assumed', sys-iface-state: 'external')
      Feb 01 00:10:20 unit pppd[230]: nm-ppp-plugin: status 11 / phase 'disconnect'
      Feb 01 00:10:20 unit NetworkManager[230]: Connection terminated.
      Feb 01 00:10:20 unit pppd[230]: Connection terminated.
      Feb 01 00:10:21 unit pppd[230]: Modem hangup
      Feb 01 00:10:21 unit pppd[230]: nm-ppp-plugin: status 1 / phase 'dead'
      Feb 01 00:10:21 unit NetworkManager[230]: Modem hangup
      Feb 01 00:10:21 unit NetworkManager[130]: <info>  [1612138221.8974] device (ttymxc4): state change: activated -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
      Feb 01 00:10:21 unit pppd[230]: Exit.
      Feb 01 00:10:21 unit pppd[230]: nm-ppp-plugin: cleaning up
      Feb 01 00:10:21 unit NetworkManager[130]: <error> [1612138221.9571] kill child process 'pppd' (230): failed due to unexpected return value -1 by waitpid (No child processes, 10) after sending SIGTERM (15)

None of this is the root cause of the problem. The provider disconnects
inactive connections (no data transmitted or received) after some
timeout. Things go wrong later, usually with the SARA-R4 no longer
answering AT commands and ModemManager eventually dropping the modem.

What I have not managed to get so far are logs from a successful
connection, to compare with the failing ones. Those might shed some
light on the root cause. :-(
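To compare failing and (eventually) successful sessions, the pppd summary lines can be pulled out of the journal automatically. A minimal Python sketch, assuming the exact pppd message formats shown in the log above; the 9 to 11 minute window combined with "received 0 bytes" is only the heuristic from this thread for the provider's idle timeout, nothing official from DT or pppd. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Only match lines logged by pppd itself; NetworkManager echoes them too.
LCP_TERM = re.compile(r"pppd\[\d+\]: LCP terminated by peer")
CONNECT_TIME = re.compile(r"pppd\[\d+\]: Connect time ([\d.]+) minutes\.")
BYTES = re.compile(r"pppd\[\d+\]: Sent (\d+) bytes, received (\d+) bytes\.")


@dataclass
class Session:
    minutes: Optional[float] = None
    sent: Optional[int] = None
    received: Optional[int] = None
    lcp_terminated_by_peer: bool = False

    def looks_like_idle_timeout(self, lo: float = 9.0, hi: float = 11.0) -> bool:
        """Heuristic: peer hung up after ~10 minutes and nothing was received."""
        return (
            self.lcp_terminated_by_peer
            and self.received == 0
            and self.minutes is not None
            and lo <= self.minutes <= hi
        )


def parse_sessions(lines: Iterable[str]) -> List[Session]:
    """Group pppd summary lines into one Session per disconnect."""
    sessions: List[Session] = []
    current = Session()
    for line in lines:
        if LCP_TERM.search(line):
            current.lcp_terminated_by_peer = True
        elif m := CONNECT_TIME.search(line):
            current.minutes = float(m.group(1))
        elif m := BYTES.search(line):
            current.sent, current.received = int(m.group(1)), int(m.group(2))
            # In the log above the byte counters follow the connect time,
            # so treat them as the end of a session.
            sessions.append(current)
            current = Session()
    return sessions
```

Fed with saved `journalctl -u NetworkManager` output, successful sessions should then show up with a non-zero received count, which makes it easier to line them up against the failing ones.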

I'm currently struggling to debug the whole thing. I see at least 4 components 
interacting (kernel, mm, nm, pppd), and I'm not sure where to start debugging, 
but I think nm is worth a try.

I get logs as shown above; however, I could not get NetworkManager to increase 
the log level. I tried to set it in /etc/NetworkManager/NetworkManager.conf like 
this:

      root@unit:~ cat /etc/NetworkManager/NetworkManager.conf
      [main]
      plugins=ifupdown,keyfile
      rc-manager=file
      
      [ifupdown]
      managed=false
      
      [logging]
      domains="MB:DEBUG,PPP:DEBUG"

This worked, in the [logging] section:

    level=DEBUG

The connection itself is defined like this:

      root@unit:~ cat /etc/NetworkManager/system-connections/gsm-ttymxc4
      [connection]
      id=gsm-ttymxc4
      type=gsm
      interface-name=ttymxc4
      permissions=
      autoconnect=yes

Must be:

    autoconnect=true

      autoconnect-retries=0
      
      [gsm]
      apn=iot.telekom.net
      
      [ipv4]
      dns-search=
      method=auto
      
      [ipv6]
      addr-gen-mode=stable-privacy
      dns-search=
      method=auto

I set ipv6 to method=ignore for now, because I read about problems with
IPv6 on u-blox modules …
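The two keyfile changes (autoconnect=true, ipv6 method=ignore) can also be checked and applied with a short Python sketch using configparser. Assumption: booleans are checked against GKeyFile's strict set true/false/1/0, which may be stricter than what NetworkManager actually accepts; BOOLEAN_KEYS and the function names are hypothetical.

```python
import configparser
import io

# Assumption: GKeyFile-style booleans; "yes" is treated as invalid here.
GKEYFILE_BOOLEANS = {"true", "false", "1", "0"}
# Hypothetical list of (section, key) pairs holding booleans; extend as needed.
BOOLEAN_KEYS = {("connection", "autoconnect")}


def load_keyfile(text: str) -> configparser.ConfigParser:
    # interpolation=None: keyfile values may legitimately contain '%'.
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep key names exactly as written
    cp.read_string(text)
    return cp


def bad_booleans(cp: configparser.ConfigParser) -> list:
    """Return boolean keys whose value is outside GKEYFILE_BOOLEANS."""
    problems = []
    for section, key in sorted(BOOLEAN_KEYS):
        value = cp.get(section, key, fallback=None)
        if value is not None and value not in GKEYFILE_BOOLEANS:
            problems.append(f"[{section}] {key}={value}")
    return problems


def apply_workarounds(cp: configparser.ConfigParser) -> str:
    """Apply the two changes from this mail (mutates cp), return new file text."""
    cp["connection"]["autoconnect"] = "true"
    if not cp.has_section("ipv6"):
        cp.add_section("ipv6")
    cp["ipv6"]["method"] = "ignore"
    out = io.StringIO()
    # Keyfiles use "key=value" without spaces around the delimiter.
    cp.write(out, space_around_delimiters=False)
    return out.getvalue()
```

After writing the text back to /etc/NetworkManager/system-connections/gsm-ttymxc4, `nmcli connection reload` makes NetworkManager pick up the hand-edited file.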

I'm a little puzzled by this log message:

      Feb 01 00:59:34 unit NetworkManager[342]: <warn>  [1612141174.8636] config: invalid logging configuration: Unknown log level 'DEBUG"'

Can certain log levels be deactivated by meson options at build time?

Yes, indeed. I had to change meson build option "more_logging" from
false to true to enable debug level log messages.
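The warning quotes the level as 'DEBUG"' with a trailing double quote, which suggests the quotes around the domains= value were taken literally rather than stripped. A rough Python pre-flight check for that value, assuming the level names OFF, ERR, WARN, INFO, DEBUG and TRACE from NetworkManager.conf(5) and assuming items are separated by commas or whitespace; it approximates, but is not, NetworkManager's own parser, and check_domains is a hypothetical name. Even a valid level only yields debug messages if NetworkManager was built with more_logging enabled, as found above.

```python
import re

# Level names for the [logging] section, as listed in NetworkManager.conf(5).
NM_LEVELS = {"OFF", "ERR", "WARN", "INFO", "DEBUG", "TRACE"}


def check_domains(value: str) -> list:
    """Roughly validate a [logging] domains= value.

    Assumes comma/whitespace separated DOMAIN[:LEVEL] items and no stripping
    of surrounding quotes. Domain names themselves are not checked.
    """
    problems = []
    for item in re.split(r"[,\s]+", value.strip()):
        if not item:
            continue
        if '"' in item:
            problems.append(f"literal double quote in {item!r}")
        _domain, sep, level = item.partition(":")
        # Level names are compared case-insensitively here (an assumption).
        if sep and level.upper() not in NM_LEVELS:
            problems.append(f"unknown level {level!r} in {item!r}")
    return problems
```

For the configuration tried above, `check_domains('"MB:DEBUG,PPP:DEBUG"')` reports three problems (two stray quotes and the unknown level 'DEBUG"'), while `check_domains("MB:DEBUG,PPP:DEBUG")` reports none.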

So far.

Greets
Alex


