Re: Package manager integration with BuildStream
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Antoine Wacheux <awacheux blp gmail com>, buildstream-list gnome org
- Subject: Re: Package manager integration with BuildStream
- Date: Fri, 27 Apr 2018 19:22:54 +0900
Hi Antoine,
Thanks for getting the ball rolling on this serious problem.
On Thu, 2018-04-26 at 14:20 +0000, Antoine Wacheux wrote:
> Hi all,
>
> This thread's purpose is to discuss how we integrate build systems
> that have package managers within BuildStream. I'll first state the
> problem, then propose two different solutions.
>
> # Problem statement
>
> Most of the popular languages these days have an embedded package
> manager. This package manager is used to specify dependencies and, in
> some cases, it facilitates publication of the project on sharing
> platforms. Examples of package managers are pip for Python, npm for
> the JavaScript ecosystem, or Cargo for Rust.
Ok, I'm going to dump most of my reply here instead of commenting
inline today.

First of all, I'll say that I am not interested in any external .bst
file generators: I think they run counter to the design, and if they
exist, they can exist completely outside the scope of BuildStream as a
tool, so we need not discuss them here.
Another approach
~~~~~~~~~~~~~~~~
There is another approach which I fairly dislike but will mention
for completeness, as we've been using this for GNOME's rust
projects. This is the cargo-fetcher project here:
https://gitlab.com/BuildStream/cargo-fetcher/
What the above does in a nutshell, is externalizes the process of
fetching the dependencies which projects need, and commits the
results to an external repository; in the case of rust we call
these "vendored crates".
In the pipeline, we introduce a "crates.bst" import element which:
o Places the crates repo content at /usr/share/crates
o Adds a system wide crates configuration file, which informs
cargo that it should look at /usr/share/crates instead
of doing crazy things, like trying to contact the internet.
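To make that concrete, the crates.bst import element might look roughly
like the sketch below; treat the field names, the repo URL alias and the
target path as illustrative assumptions rather than the exact files we
use, with a companion cargo configuration file pointing lookups at
/usr/share/crates:

```yaml
# crates.bst (illustrative sketch of an import element)
kind: import

sources:
- kind: git
  url: upstream:vendored-crates.git

config:
  # Stage the vendored crates at a well known system wide location
  target: /usr/share/crates
```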
Now, I don't like the above approach much either, but in the case of
cargo/rust it is a bit better, because it doesn't require that anyone
install the rust toolchain on their host just because one package out
of 500 happens to use rust. Eventually, when cargo and rustc become
more commonly available on distros, a source plugin to do this legwork
would be better.
Your first proposed solution here is going in the right direction, but
I think it can be simplified. Also, I don't like making sources
"treeish" in the way you did; it would be nice to keep them in a flat
list.

First, let's try to think about some commonality and rules for what we
could handle with useful plugins, and see if this covers the ground.
Also, let's call these "source package managers" for technical
purposes: they are package managers, but specifically for source code
as far as I can see, not for system installed binaries.
* Source package managers are usually able to discover the
dependencies by way of reading the depending source package.
This can be actual source code, or metadata files like Cargo.toml
or python's setup.py.
* These source package managers MUST be able to obtain the required
code and place it in the depending source package's subdirectory at
build time.
  This is to say that, as much as cargo would love to put all the
  downloaded crates in some system wide or user wide location, we
  MUST have a way to beat it into submission, and force it to
  download the requirements into a specific location, like ./crates
  or ./vendor.
* These source package managers MUST have a technique for identifying
an exact set of sources, such that a "ref" is a constant and there
is a guarantee that you can never, ever get different data for the
same ref in different fetch() sessions.
* These source package managers MUST never take into account anything
  from the host system environment, or at least must have configuration
  to disable any such behavior (i.e. we can NEVER allow Source
  implementations to introduce host contamination).
SourceTransform approach
~~~~~~~~~~~~~~~~~~~~~~~~
Designing a solution for situations which conform to the above points
can potentially be straightforward.
I would suggest that we consider a "SourceTransform" kind of source,
which is also a Source but behaves a little differently.
* A SourceTransform has an additional directory in context, which
contains the result of all previous sources.
* It is an error to ever place a SourceTransform *before* a regular
Source in an element declaration.
* SourceTransform.track()
This requires that previous Sources are not only tracked, but that
they are also *fetched* for the tracked version, so we know that
all previous sources are available to stage.
Running SourceTransform.track() involves first staging all the
previous sources to a temporary directory, and then running the
plugin's track() implementation with that directory in context.
The result of SourceTransform.track() is an updated ref, like any
other Source.
Taking rust as the example of choice, the result of its track()
implementation is a simple python dictionary representation of
a Cargo.lock file.
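For illustration, a hypothetical cargo plugin's track() might boil a
staged Cargo.lock down to a plain dictionary ref roughly like this;
the parsing below is a deliberately minimal sketch of the idea, not
real BuildStream plugin code, and the function name is invented:

```python
import re

def cargo_lock_to_ref(lock_text):
    """Reduce Cargo.lock contents to a plain dict usable as a Source ref.

    Each [[package]] entry is pinned by name, exact version and
    checksum, so the same ref can never resolve to different data
    across fetch() sessions.
    """
    packages = []
    current = None
    for line in lock_text.splitlines():
        line = line.strip()
        if line == "[[package]]":
            # Start collecting a new pinned package entry
            current = {}
            packages.append(current)
        elif current is not None:
            m = re.match(r'(name|version|checksum)\s*=\s*"([^"]*)"', line)
            if m:
                current[m.group(1)] = m.group(2)
    return {"packages": packages}
```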
* SourceTransform.fetch()
The result of the transform's fetch() implementation is that the
transform will download the precisely required versions of all
dependencies according to its own ref, and cache them as normal
in the source cache.
Unlike SourceTransform.track(), SourceTransform.fetch() does not
require the context of the previous sources.
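The guarantee that fetch() can only ever produce the data named by the
ref might be enforced with checksum verification along these lines; a
rough sketch only, where the function name and the cache shape are
invented for illustration:

```python
import hashlib

def verify_and_cache(data, expected_sha256, cache, key):
    """Refuse to cache any payload whose checksum does not match the ref.

    This is the property that makes a ref constant: the same
    (name, version, checksum) triple can only ever yield one payload,
    no matter when or where fetch() runs.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            "checksum mismatch for %s: %s != %s" % (key, actual, expected_sha256))
    cache[key] = data
    return actual
```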
* Element.stage_sources()
When it comes time to staging sources, all sources are staged in
order in the regular way, such that:
o The actual element's source code is staged first
o The transform cached result is placed somewhere in the
source's subdirectories where we expect that it will be found
o Additional patches or downloads of auxiliary resources can
still happen at any time here
An example of what the YAML might look like, for a rust package, might
be something like this:
kind: rust

sources:
- kind: tar
  url: downloads:thispackage.tar.xz
- kind: cargo
In the above example, we might expect to have a 'rust' element which
would take care of informing the build system that it should be looking
for its external dependencies at precisely the location where the
'cargo' SourceTransform placed them in the fully staged build directory,
and there is no extra typing for the user.
Otherwise, we might have an 'autotools' or 'meson' element using this,
in which case it *might* require some prepended configure commands to
ensure that the build system finds the crates at the correct location.
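Such a prepended command might look something like the sketch below;
the exact config keys, and whether the element splices user commands
before its defaults this way, are assumptions for illustration:

```yaml
kind: autotools

sources:
- kind: tar
  url: downloads:thispackage.tar.xz
- kind: cargo

config:
  configure-commands:
  # Point the build at the crates staged by the cargo transform
  - mkdir -p .cargo
  - echo 'paths = ["./vendor"]' > .cargo/config
  # ...followed by the element's normal configure commands
```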
Note however, for the specific case of cargo/rust, there is a
prioritized configuration file, which the 'cargo' SourceTransform
plugin can additionally create at the root of the build directory, so
we could have the 'cargo' plugin at SourceTransform.stage() time, do
the following:
o Create a ./vendor directory containing the crates
o Create a .cargo/config file in the root of the build tree
which informs cargo that it should look for dependencies
in the ./vendor directory.
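Concretely, the generated file could contain cargo's source-replacement
stanza; the syntax below assumes a cargo version that supports source
replacement:

```toml
# .cargo/config, generated by the plugin at the root of the build tree
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "./vendor"
```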
Of course, plugins can introduce their own specific configuration
options, which can help us to deal with special circumstances and
corner cases, such as rust packages which already have a
.cargo/config, or a ./vendor directory, and what to do in those cases.
While this is completely different from your first proposed solution,
it is in the same vein as we prefer automation and fitting into the
BuildStream ecosystem by using track()/fetch()/stage() in the regular
ways.
How do you like SourceTransform() ?
Any other great ideas that differ from the two presented ideas ?
Cheers,
-Tristan