Re: Package manager integration with BuildStream



Hi Tristan,

On Fri, Apr 27, 2018 at 12:23 PM Tristan Van Berkom <tristan.vanberkom@codethink.co.uk> wrote:
[...]
First of all, I'll say that I am not interested in any external .bst
file generators, I think they run counter to the design and if they
exist, can exist completely outside of the scope of BuildStream as a
tool, so we need not discuss them here.

Good.  I've heard that approach suggested a number of times; it just
externalizes too much, and puts an additional burden on anyone working
with an integration that contains elements involving additional steps.

[...]
Now, I don't like the above approach much either, but in the case of
cargo/rust, it is a bit better because it doesn't require that anyone go
installing the rust toolchain on their host just because one package
out of 500 happens to use rust - eventually, when cargo and rustc
become more commonly available on distros, a source plugin to do this
legwork would be better.

You raise an interesting point there; the need for [compatible] host
tools to deal with certain element types.

Your first proposed solution here is going in the right direction, but I
think it can be simplified.  Also, I don't like making sources "treeish"
in the way you did; it would be nice to keep them in a flat list.

+1.  I prefer the ordering to be significant over a tree approach.  It keeps
things simpler to reason about.

First, let's try to think about some commonality and rules for what we
could handle with useful plugins, and see if this covers the ground.
Also, let's call these "source package managers" for technical purposes:
they are package managers, but specifically for source code as far as I
can see, not for system-installed binaries.

  * Source package managers are usually able to discover the
    dependencies by way of reading the depending source package.

    This can be actual source code, or metadata files like Cargo.toml
    or Python's setup.py.
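For illustration, in the Cargo.toml case that discovery could be as
naive as the sketch below (a real plugin would of course use a proper
TOML parser; the function name is made up):

```python
# Naive illustration of dependency discovery from a Cargo.toml:
# collect the keys of the [dependencies] section.
def discover_dependencies(cargo_toml_text):
    deps = []
    in_deps = False
    for line in cargo_toml_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # track whether we are inside the [dependencies] table
            in_deps = (line == "[dependencies]")
        elif in_deps and "=" in line:
            deps.append(line.split("=")[0].strip())
    return deps
```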

  * These source package managers MUST be able to obtain the required
    code and place it in the depending source package's subdirectory at
    build time.

    This is to say that, as much as cargo would love to put all the
    downloaded crates in some system-wide or user-wide location, we
    MUST have a way to beat it into submission, and force it to
    download the requirements into a specific location, like ./crates
    or ./vendor.

Allow me to interpret this as: a package manager that cannot support
this cannot be supported by BuildStream.

Conceptually +1.
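For what it's worth, forcing cargo into a fixed in-tree location might
be as simple as the sketch below, assuming a cargo that provides the
`vendor` subcommand (the helper name is hypothetical):

```python
# Hypothetical helper: build the command a plugin might run to make
# cargo download crates into a fixed in-tree directory instead of a
# user-wide or system-wide cache.
def vendor_command(vendor_dir="./vendor"):
    # `cargo vendor <dir>` copies every dependency pinned in
    # Cargo.lock into <dir>, suitable for fully offline builds.
    return ["cargo", "vendor", vendor_dir]
```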

  * These source package managers MUST have a technique for identifying
    an exact set of sources, such that a "ref" is a constant and there
    is a guarantee that you can never, ever get different data for the
    same ref in different fetch() sessions.

+1.

  * These source package managers MUST never take anything from the
    host system environment into account, or at least must offer
    configuration enforcing that isolation (i.e. we can NEVER allow
    Source implementations to introduce host contamination).

One question is: can we have a sandbox that can be invoked at
track/fetch time, with network access, so that we avoid requiring host
tools while at the same time isolating the tool from the host?  That
is, can we provide package-manager-specific functionality without any
additional host installation?  Let's park that question for now; it's
orthogonal.
 
SourceTransform approach
~~~~~~~~~~~~~~~~~~~~~~~~
Designing a solution for situations which conform to the above points
can potentially be straightforward.

I would suggest that we consider a "SourceTransform" kind of source,
which is also a Source but behaves a little differently.

  * A SourceTransform has an additional directory in context, which
    contains the result of all previous sources.

  * It is an error to ever place a SourceTransform *before* a regular
    Source in an element declaration.

+1.
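A sketch of how that ordering rule might be enforced (the class and
function names here are hypothetical, not BuildStream's real API):

```python
class Source:
    """Stand-in for a regular source."""

class SourceTransform(Source):
    """Stand-in for a source that transforms previous sources."""

def validate_source_order(sources):
    # Once a SourceTransform appears, every following source must
    # also be a SourceTransform; a plain Source declared after a
    # transform is a declaration error.
    seen_transform = False
    for source in sources:
        is_transform = isinstance(source, SourceTransform)
        if seen_transform and not is_transform:
            raise ValueError(
                "regular Source declared after a SourceTransform")
        seen_transform = seen_transform or is_transform
```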
 
  * SourceTransform.track()

    This requires that previous Sources are not only tracked, but also
    *fetched* at the tracked version, so that all previous sources are
    available to stage.

    Running SourceTransform.track() involves first staging all the
    previous sources to a temporary directory, and then running the
    plugin's track() implementation against that staged tree.

    The result of SourceTransform.track() is an updated ref, like any
    other Source.

    Taking rust as the example of choice, the result of its track()
    implementation is a simple Python dictionary representation of
    a Cargo.lock file.

+1.  This is fully in line with how I was thinking about this.  It results in
a simple declaration of the element.  Allows the tracking of elements
that involve package managers to be managed by BuildStream in the
common workflow.  And it keeps what is being tracked transparent.
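For illustration, such a ref might be nothing more than a plain dict
mirroring Cargo.lock; the field names below are made up, not a
proposed schema:

```python
# Illustrative ref for a 'cargo' SourceTransform: one entry per
# crate, pinning an exact version and a digest so that the same
# ref can never yield different data across fetch() sessions.
ref = {
    "crates": [
        # the digest below is a placeholder, not a real checksum
        {"name": "libc", "version": "0.2.40", "sha256": "0000..."},
    ]
}
```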
 
  * SourceTransform.fetch()

    The result of the transform's fetch() implementation is that the
    transform will download the precisely required versions of all
    dependencies according to its own ref, and cache them as normal
    in the source cache.

    Unlike SourceTransform.track(), SourceTransform.fetch() does not
    require the context of the previous sources.

+1.
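As a concrete example, a 'cargo' fetch() implementation could derive
its download URLs purely from the ref; the endpoint below is the
crates.io download API, used here only for illustration (the helper
name is hypothetical):

```python
# Hypothetical helper: map a (name, version) pair from the ref to
# the crates.io download URL for that exact crate tarball, with no
# reference to the previously staged sources.
def crate_url(name, version):
    return ("https://crates.io/api/v1/crates/"
            "{}/{}/download".format(name, version))
```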

  * Element.stage_sources()

    When it comes time to staging sources, all sources are staged in
    order in the regular way, such that:

      o The actual element's source code is staged first
      o The transform cached result is placed somewhere in the
        source's subdirectories where we expect that it will be found
      o Additional patches or downloads of auxiliary resources can
        still happen at any time here

An example of what the YAML might look like, for a rust package, might
be something like this:

    kind: rust
    sources:
    - kind: tar
      url: downloads:thispackage.tar.xz
    - kind: cargo

In the above example, we might expect to have a 'rust' element which
would take care of informing the build system that it should be looking
for its external dependencies at precisely the location where the
'cargo' SourceTransform placed them in the fully staged build
directory, with no extra typing for the user.

Otherwise, we might have an 'autotools' or 'meson' element using this,
in which case it *might* require some prepended configure commands to
ensure that the build system finds the crates at the correct location.

Note however, for the specific case of cargo/rust, there is a
prioritized configuration file which the 'cargo' SourceTransform
plugin can additionally create at the root of the build directory, so
we could have the 'cargo' plugin, at SourceTransform.stage() time, do
the following:

  o Create a ./vendor directory containing the crates
  o Create a .cargo/config file in the root of the build tree
    which informs cargo that it should look for dependencies
    in the ./vendor directory.
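A minimal sketch of that second step (the source-replacement keys are
cargo's real configuration syntax as I understand it, but the helper
function and layout are otherwise hypothetical):

```python
import os

# Cargo source-replacement configuration, telling cargo to resolve
# crates-io dependencies from the in-tree ./vendor directory.
CARGO_CONFIG = """\
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
"""

def write_cargo_config(build_root):
    # Hypothetical helper: drop .cargo/config at the root of the
    # staged build tree so cargo never reaches for the network.
    cfg_dir = os.path.join(build_root, ".cargo")
    os.makedirs(cfg_dir, exist_ok=True)
    path = os.path.join(cfg_dir, "config")
    with open(path, "w") as f:
        f.write(CARGO_CONFIG)
    return path
```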

Of course, plugins can introduce their own specific configuration
options, which can help us to deal with special circumstances and
corner cases, such as rust packages which already have a
.cargo/config or a ./vendor directory, and what to do in those cases.


While this is completely different from your first proposed solution,
it is in the same vein, in that we prefer automation and fitting into
the BuildStream ecosystem by using track()/fetch()/stage() in the
regular ways.


How do you like SourceTransform() ?

+1.
 
Any other great ideas that differ from the two presented ideas ?

Thanks Tristan.  I am already biased towards this approach.  I would
love to hear whether others can find holes in it, and/or have an even
more elegant proposal.
 
Cheers,
    -Tristan

Cheers,

Sander
 
_______________________________________________
Buildstream-list mailing list
Buildstream-list@gnome.org
https://mail.gnome.org/mailman/listinfo/buildstream-list