Re: API question & Rust API bindings



On Sat, 27 Oct 2018, at 10:10, Felix Krull wrote:
> I'm also interested in what you're using libostree for, if you care to share?
> (But you don't have to of course!)

My original goal was to build OSTree-based systems from Debian, i.e. like rpm-ostree compose but using Debian packages and repositories. My sub-goal with that was that I wanted an OSTree-based system to run on my Raspberry Pis. (Which I could probably do with Fedora as well, to be fair.)

I know Endless have done some work on this:

https://github.com/dbnicholson/deb-ostree-builder
https://debconf17.debconf.org/talks/41/

and so have we.  I've not looked at their implementation, but here's a description of ours:

Summary
=======

We commit a Debian package index file[3] as a kind of lockfile to our source git repository, to ensure reproducibility of built ostrees and to help manage security updates.

Background information
======================

I've also got a system for building Debian- (actually Ubuntu-) based ostree images which isn't public.  It's not really separable from the rest of our build system so it's difficult to share, but I can share some details and lessons learnt over the last 18 months of using and iterating on it.

We build ostree images on x86 for deployment to an ARMv7 device. So development happens on our developer laptops and it's deployed to the device using ostree - even during development. We're not currently using ostree admin unlock or the like.

Our build system uses:

* ninja to decide when to build what https://ninja-build.org/
* bubblewrap for chroot https://github.com/projectatomic/bubblewrap
* multistrap to create base images https://wiki.debian.org/Multistrap
* qemu with binfmt support so we can run ARM binaries (like apt-get install) inside the chroot. https://wiki.debian.org/QemuUserEmulation
* fakeroot so we don't need to run the build as root but the images will still have the right permissions https://github.com/stb-tester/fakeroot/tree/ostree

Each stage of our build essentially consists of checking out a tree, running a command in the chroot and then checking in the result.  So we create a lot of intermediate trees in the process. Thanks to OSTree this doesn't use much disk space and is quite manageable. It goes:

1. Start with an empty ostree image.
2. Run multistrap in there to get a base system.
3. Perform misc other setup, like adding users.
4. Run apt-get install with that as a base to create different roots with different selections of packages, as required.
5. Add additional binaries and data on top.
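The checkout/run/commit round-trip of each stage can be sketched as follows. This is only an illustration, not our actual build code: the ref names, paths and exact flags are invented, and a real stage needs more bubblewrap plumbing (proc, dev, tmpfs mounts) than shown here.

```python
# Sketch of one build stage: check out a tree, run a command in a
# bubblewrap chroot of it, commit the result as a new ref.  All ref
# names and paths are illustrative, not from the real build system.

def stage_commands(repo, base_ref, out_ref, workdir, cmd):
    """Return the argv lists for one checkout/run/commit stage."""
    return [
        # Materialise the parent tree as a working directory
        ["ostree", "checkout", "--repo", repo, base_ref, workdir],
        # Run the stage's command inside a chroot of that tree
        ["bwrap", "--bind", workdir, "/", cmd[0], *cmd[1:]],
        # Check the modified tree back in as a new commit
        ["ostree", "commit", "--repo", repo, "--branch", out_ref,
         "--tree=dir=" + workdir],
    ]

cmds = stage_commands("repo", "base/armhf", "stage/apt-installed",
                      "/tmp/work", ["apt-get", "install", "-y", "curl"])
for argv in cmds:
    print(" ".join(argv))
```

Because each stage is just "tree in, tree out", the intermediate trees deduplicate well in the ostree repository.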

The problem
===========

We store all our source in git. Reproducibility is an important requirement for us - when doing a build from the same source we want to end up with an image with the same packages and versions of packages installed, no matter which machine we run it on or whether we run it now or later.  This is complicated with apt because it's geared more towards keeping a traditional system up-to-date, and the apt mirrors don't keep old packages in their indices.

Solution - lockfiles
====================

To fix this we took a leaf out of modern programming language package managers' book.  We use the lockfile concept as used by Rust's cargo package manager (Cargo.lock[1]) or Node.js's npm (package-lock.json[2]).  The idea is that you have two files: one that is written by hand and lists the packages you want installed, and a second one, generated from the first, that lists the exact versions of those packages and of all their transitive dependencies.  This second file is the lockfile.

The key is that you check both files into your git repository.  The lockfile is a complete description of the packages you want to be installed on the target system. This determinism has a few advantages:

1. You can go back to a particular revision in the past and build a functionally identical image
2. Updates to the lockfile are recorded in git, so you can diff between source revisions to investigate any change in behaviour.
3. Security updates are now recorded in your git history and can be managed and deployed explicitly.
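To make the two-file idea concrete, here's a toy resolver - a sketch, not our implementation: given a hand-written top-level package list and a parsed Packages index, it computes the transitive closure that would go into the lockfile. The stanzas are made up, and real Depends fields also carry version constraints and alternatives, which this ignores.

```python
# Toy illustration of the lockfile idea: a hand-written top-level list
# plus a Packages index determines the full locked set.  Invented
# stanzas; version constraints and alternatives are ignored.

PACKAGES_INDEX = """\
Package: app
Version: 1.0
Depends: libfoo

Package: libfoo
Version: 2.3
Depends: libc6

Package: libc6
Version: 2.27

Package: unrelated
Version: 9.9
"""

def parse_index(text):
    """Parse an RFC822-style Packages index into {name: fields}."""
    pkgs = {}
    for stanza in text.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in stanza.splitlines())
        pkgs[fields["Package"]] = fields
    return pkgs

def lock(top_level, pkgs):
    """Transitive closure of top_level over Depends -> the locked set."""
    locked, todo = {}, list(top_level)
    while todo:
        name = todo.pop()
        if name in locked:
            continue
        locked[name] = pkgs[name]["Version"]
        todo += pkgs[name].get("Depends", "").replace(",", " ").split()
    return locked

print(lock(["app"], parse_index(PACKAGES_INDEX)))
```

Note that `unrelated` never makes it into the locked set: the lockfile is exactly the index filtered by what you asked for.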

We have a CI job that runs every night to update the lockfiles - this is the equivalent of an apt-get update.  This kicks off builds, and in the morning we can see exactly which packages changed, and we have a fresh build with CI passing or failing, so we have confidence that the image still works after applying the security updates.  We can then choose to roll it out (the equivalent of an apt-get upgrade).
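Seeing exactly what the nightly update changed is then just a diff of the two parsed lockfiles. A sketch, with invented package versions:

```python
# Sketch: diff two lockfiles (parsed down to {package: version} maps)
# to report what the nightly update changed.  Data is invented.

def diff_locks(old, new):
    """Return (added, removed, upgraded) between two locked sets."""
    added = {p: v for p, v in new.items() if p not in old}
    removed = {p: v for p, v in old.items() if p not in new}
    upgraded = {p: (old[p], new[p])
                for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, removed, upgraded

old = {"openssl": "1.1.0g-2", "curl": "7.58.0-2"}
new = {"openssl": "1.1.0g-2ubuntu4", "curl": "7.58.0-2",
       "zlib1g": "1.2.11"}
print(diff_locks(old, new))
```

In practice `git diff` on the committed lockfile gives you much the same information for free.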

It turns out that the lockfile is a kind of snapshot of the package metadata from the Debian mirrors, filtered by the top-level list of packages you want installed - and we implement it in exactly this way.  The format of the lockfile is a Debian Packages index[3] as used by apt.  This has a number of benefits over a plain list of package names and versions:

1. It contains MD5, SHA1 and SHA256 fields, so we can be certain we're using exactly the package we want.  This is nice and secure without having to faff around with gpg.
2. It (indirectly) contains the URL of the package, so we can implement the downloading of the packages external to the chroots where they will be installed.
3. apt understands the format, so you can keep a local (filesystem) mirror of the packages and add just the lockfile to sources.list.  The `apt install` in your chroot no longer needs network access.
4. You no longer get `E: Version ‘0.9.63’ for ‘package’ was not found` error messages from `apt install package=0.9.63` when the apt mirrors have been updated in the meantime.
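Points 1 and 2 can be illustrated together: the lockfile stanza gives you everything needed to fetch and verify a .deb outside the chroot. The stanza contents and mirror URL below are invented for illustration.

```python
import hashlib

# Sketch: a lockfile stanza carries Filename (relative to the mirror
# root) and SHA256, so a .deb can be fetched and verified outside the
# chroot.  Stanza and mirror URL are invented.

STANZA = {
    "Package": "example",
    "Version": "0.9.63-1",
    "Filename": "pool/main/e/example/example_0.9.63-1_armhf.deb",
    "SHA256": hashlib.sha256(b"fake .deb contents").hexdigest(),
}
MIRROR = "http://archive.example.org/ubuntu"

def deb_url(stanza, mirror):
    """The Filename field is relative to the mirror root."""
    return mirror + "/" + stanza["Filename"]

def verify(stanza, data):
    """Check downloaded bytes against the lockfile's SHA256."""
    return hashlib.sha256(data).hexdigest() == stanza["SHA256"]

print(deb_url(STANZA, MIRROR))
print(verify(STANZA, b"fake .deb contents"))  # True
print(verify(STANZA, b"tampered contents"))   # False
```

Once the hash check passes, gpg verification of the mirror's Release file is no longer load-bearing for the build itself.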

The concepts are good, but my existing implementation leaves something to be desired.  Updating the lockfiles is a heavyweight process that involves performing an `apt-get update && apt-get install` in a container, followed by a `dpkg -l` to get the list of installed packages.  In theory this shouldn't require entering a chroot at all - all the information required for the update is in the Packages files online anyway.  I'll have to see how multistrap does it.

[1]: https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
[2]: https://docs.npmjs.com/files/package-lock.json
[3]: https://wiki.debian.org/DebianRepository/Format#A.22Packages.22_Indices

Ambitions
=========

There are improvements I want to make in the future too.  It would be nice to remove apt entirely from the images and use dpkg directly.  Apt is no longer used to download packages, nor to choose which packages should be installed - it's only used to determine the installation order of the packages.  If this order could be recorded in the lockfile, or cheaply calculated at install time, apt wouldn't be needed in the image at all.
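The installation order apt computes is essentially a topological sort of the dependency graph, so it could indeed be precomputed and recorded in the lockfile. A sketch with toy data - this ignores versioned depends, alternatives and the cycles that Pre-Depends loops can introduce in real archives:

```python
# Sketch: compute an installation order from Depends fields with a
# depth-first topological sort.  Toy data; ignores versioned depends,
# alternatives and dependency cycles.

def install_order(depends):
    """depends: {pkg: [pkgs it depends on]} -> order, deps first."""
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in depends.get(pkg, []):
            visit(dep)
        order.append(pkg)  # all dependencies are already in order

    for pkg in depends:
        visit(pkg)
    return order

deps = {"app": ["libfoo", "libc6"], "libfoo": ["libc6"], "libc6": []}
print(install_order(deps))
```

With the order stored as, say, an extra field in the lockfile, a plain loop of `dpkg -i` calls would suffice at image-build time.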

A (more fanciful) extension to this would be to perform some of the dpkg operations in ostree directly, for the sake of speed. Extracting packages can be a little sluggish under qemu.  Speed is important: when you want a new package installed you will be installing all of the packages into a fresh image, so it's nice if the process is as quick as possible.  multistrap already does this for the base image.

I would like to extract each deb immediately after downloading into its own ostree, and then short-circuit the extracting stage by just combining the ostrees together.  Combining ostrees is fast: see [#1643].  This would also save more disk space: we'd no longer need to store the debs themselves, but could refer to their contents by ostree ref, e.g. the ref dpkg/data/<sha256> might refer to the deb with that sha256.  The lockfile has these SHA256s recorded, so you'd know which ostree refs to use.
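The mapping from lockfile to refs is then trivial to derive. A sketch, using the dpkg/data/<sha256> naming described above (the stanzas are invented):

```python
# Sketch: derive the per-deb ostree ref names from the lockfile's
# SHA256 fields, following the dpkg/data/<sha256> scheme described
# above.  The stanzas are invented.

LOCKFILE = [
    {"Package": "curl", "SHA256": "a" * 64},
    {"Package": "zlib1g", "SHA256": "b" * 64},
]

def deb_refs(lockfile):
    """Map each locked package to the ostree ref holding its contents."""
    return {s["Package"]: "dpkg/data/" + s["SHA256"] for s in lockfile}

print(deb_refs(LOCKFILE))
```

Keying the refs on content hashes also means two images that lock the same deb automatically share its extracted tree.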

This is of course a much larger step - you'd still need to handle the metadata, preinst scripts, etc. under control.tar.gz, which might be a little tricky, but multistrap manages it.

[#1643]: https://github.com/ostreedev/ostree/pull/1643

Apologies for the length.  I wonder if there's a better forum for this kind of information dump.  I think it might make an interesting conference talk, but I'm unsure of the appropriate conference.

Thanks

Will

