Re: deployment fallback and u-boot



On 23/08/2017 03:54, Colin Walters wrote:

Hi,

On Tue, Aug 22, 2017, at 05:52 AM, Guy Shapiro wrote:
Hi all,

I am working with OSTree in the context of embedded device updates,
utilizing the meta-updater Yocto layer.
Great!

I want to add support for fallback in case of unbootable deployment.
A lot boils down to the precise definition of "unbootable"; there's
a related issue here:
https://github.com/ostreedev/ostree/issues/380

This is also potentially interesting/related:
https://github.com/projectatomic/rpm-ostree/pull/892

What specific scenarios are you thinking of here?  I can think of some:

 - Kernel breaks on just one of a few supported device types
 - Filesystem/drive corruption affecting just new tree
 - Hardware-independent logical error (e.g. shared library update breaks
   critical userspace daemon startup)

The last one for example I'd say should usually be caught by testing
in VMs/containers on the OS creator side, but perhaps it's a userspace
daemon that interacts with hardware (think video capture or the like)?
The main scenario I am aiming at is the first. My system is going to be
slightly inaccessible during its lifetime. I want to make sure that even if
some update renders some units connectionless, I won't be locked out of the
device.
Hopefully the tests performed before publishing updates will catch all
the issues, and the fallback mechanism will never be used. However, I
am trying to reduce the cost of such issues if they do happen.

Looks like most of the needed logic is already in place:
OSTree stores the last bootable deployment. On bootloaders other than
u-boot, both the new deployment and the last booted one are available on the
boot loader menu.  U-Boot itself has a mechanism for counting failed boots
and running alternative boot commands in case the count exceeds a
limit. [1]
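
As a rough illustration of the counting scheme described above, here is a
small Python model of the decision the bootloader makes on each power-on. It
is only a sketch of the behaviour as I understand it from the U-Boot boot
count documentation; the names bootcount, bootlimit, bootcmd and altbootcmd
mirror U-Boot environment variables, while select_boot_command is a
hypothetical helper invented for this example.

```python
from typing import Tuple


def select_boot_command(bootcount: int, bootlimit: int) -> Tuple[int, str]:
    """Model one power-on: bump the counter, then pick the command to run.

    Returns the new counter value and the name of the command chosen.
    """
    bootcount += 1  # incremented on every boot attempt, before the OS starts
    if bootcount > bootlimit:
        return bootcount, "altbootcmd"  # too many unconfirmed boots: fall back
    return bootcount, "bootcmd"  # normal path: try the default deployment


# Three boots in a row during which userspace never resets the counter.
count = 0
history = []
for _ in range(3):
    count, chosen = select_boot_command(count, bootlimit=2)
    history.append(chosen)
print(history)
```

With a limit of 2, the first two attempts use the normal command and the
third one switches to the alternative.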
How does it measure "failed"?   Does userspace have to set something
in a config file in /boot or so?
U-Boot supports storing the boot count on several non-volatile memory
types. In my case it will probably be stored on an EEPROM chip. Another
option is to store the count in the "u-boot environment" storage, which
resides in a configurable non-partitioned space.
The userspace is responsible for knowing where the count is stored, and for
zeroing it when the system boots "successfully". The userspace program will
do some sanity checks before performing this operation.
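
The userspace side could be sketched roughly as follows. This assumes the
counter lives in the U-Boot environment under the variable name bootcount and
that the fw_setenv utility from u-boot-tools is installed and configured; an
EEPROM-backed counter would need a board-specific write instead. The names
confirm_boot and reset_bootcount_env are hypothetical.

```python
import subprocess
from typing import Callable, Iterable


def reset_bootcount_env() -> None:
    """Zero the counter when it is kept in the U-Boot environment.

    Assumption: `fw_setenv` is available and the variable is `bootcount`.
    """
    subprocess.run(["fw_setenv", "bootcount", "0"], check=True)


def confirm_boot(checks: Iterable[Callable[[], bool]],
                 reset: Callable[[], None] = reset_bootcount_env) -> bool:
    """Run every sanity check; reset the counter only if all of them pass."""
    for check in checks:
        if not check():
            # Leave the counter untouched so the bootloader can fall back.
            return False
    reset()
    return True
```

The reset step is passed in as a parameter so that the same logic can be
exercised off-device, or pointed at a different storage backend.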

When the sanity check passes, I plan to also remove the old deployment
entirely, preventing an unintended rollback from a working version.
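
One possible way to do the removal is through "ostree admin undeploy". The
sketch below assumes that the freshly booted deployment is at index 0 and the
previous one at index 1, which should be verified with "ostree admin status"
before running anything; the helper names are hypothetical, and the default
is a dry run that only returns the command.

```python
import subprocess
from typing import List


def undeploy_command(index: int = 1) -> List[str]:
    """Build the command that removes the deployment at `index`."""
    if index < 1:
        # Index 0 is assumed to be the running deployment.
        raise ValueError("refusing to remove deployment 0")
    return ["ostree", "admin", "undeploy", str(index)]


def remove_old_deployment(dry_run: bool = True) -> List[str]:
    """Return the undeploy command, executing it only when dry_run is False."""
    cmd = undeploy_command(1)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```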

An additional hardware "watchdog" component resets the device in case the
kernel is unable to load at all.

One of my inspirations is the rollback mechanism of the Mender.io update
system:
https://docs.mender.io/development/architecture/overview#commit-and-rollback

The options I see are:
1) Write a second file, "uEnv2.txt", to "/boot/loader/".
2) Add kernel_image2/ramdisk_image2/ostree2 lines to "uEnv.txt"

Do you have a reason to prefer one of these options? Are there other options
I haven't thought about?
Does this approach sound reasonable at all?
Honestly, I don't have a lot of expertise with u-boot myself; not enough
to be able to have an informed opinion on 1) vs 2) there.   Maybe someone
else does?
As nobody shared an opinion, I picked the option that looked easier to
implement; that was 2).  I'll send a pull request soon.
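
To make option 2) concrete, here is a minimal sketch of how such a file could
be rendered. The "2" suffix follows the variable names listed in the options
above; the unsuffixed key names and all paths in the example are assumptions
made up for illustration, and render_uenv is a hypothetical helper, not the
code from the pull request.

```python
from typing import Dict, Optional


def render_uenv(primary: Dict[str, str],
                fallback: Optional[Dict[str, str]] = None) -> str:
    """Render uEnv.txt-style key=value lines for one or two deployments.

    Keys of the fallback deployment get a "2" suffix, as in option 2).
    """
    lines = [f"{key}={value}" for key, value in primary.items()]
    if fallback is not None:
        lines += [f"{key}2={value}" for key, value in fallback.items()]
    return "\n".join(lines) + "\n"


# Hypothetical paths, for illustration only.
new = {
    "kernel_image": "/ostree/example-new/vmlinuz",
    "ramdisk_image": "/ostree/example-new/initramfs",
    "ostree": "/ostree/boot.1/example/new/0",
}
old = {
    "kernel_image": "/ostree/example-old/vmlinuz",
    "ramdisk_image": "/ostree/example-old/initramfs",
    "ostree": "/ostree/boot.1/example/old/0",
}
print(render_uenv(new, old), end="")
```

The alternative boot command would then read the suffixed variables instead
of the default ones.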

I did take a glance at
http://git.denx.de/?p=u-boot.git;a=blob;f=README;h=392b5fdbbbba334b3844b543c1e38eba1b4b0adf;hb=HEAD
and
http://git.denx.de/?p=u-boot.git;a=tree;f=doc;h=d43977d6bf2ced7ff3eb0f1b81080c967fd908fa;hb=HEAD
for the first time....impressive.


