Package manager integration with BuildStream



Hi all,

The purpose of this thread is to discuss how we can integrate build systems
that have package managers into BuildStream. I'll first state the problem,
then propose two different solutions.

# Problem statement

Most popular languages these days come with an embedded package manager. This
package manager is used to specify dependencies and, in some cases, it
facilitates the publication of the project on sharing platforms. Examples of
package managers are pip for Python, npm for the JavaScript ecosystem and
Cargo for Rust.

As an example, Dep is a package manager for Go
(https://golang.github.io/dep/). From a dependency configuration file, Dep
resolves dependencies to explicit versions and generates a lock file, which
contains the specific commit hash to use for each dependency. This is how it
ensures that dependencies remain the same across builds. Next, it downloads
and installs the resolved dependencies in a subdirectory (vendor/) of the
source folder.

In addition, some build systems come directly with an integrated package
manager that allows specifying dependencies in the build specification. Ant,
Maven and Gradle are examples of such build systems.

Even though they are handy, these package managers conflict with the way
BuildStream is supposed to work. In BuildStream, dependencies should only be
provided via other BST files.

The question is: how do we integrate this "package manager" workflow into the
BuildStream ecosystem?

In the following sections, I'm going to detail some solution ideas.

# Solution 1: Pipelining source elements

The core idea is to model the package manager itself as a source element.
This source element is responsible for invoking the underlying package
manager when the sources are fetched.

If we take Go's dep as an example, building a Go project would take one BST
file: a Go build element whose source is a godep element. This godep element
would be given the URL of the repository containing the Gopkg.toml file, which
describes the dependencies of the project.

When we stage the sources, the godep element would clone the repository
containing the Gopkg.toml and then install all the dependencies specified in
that file. In the end, the sandbox would contain the source code of the
application we want to compile, plus all the dependencies required to compile
it.

However, depending on where the initial Gopkg.toml is fetched from, the godep
element needs to use git, svn, http or some other means to get it. This could
be done either by implementing one big plugin that handles all these different
technologies, or by a set of small plugins such as `godep_git`, `godep_svn`,
etc. Neither solution seems acceptable. This is where the concept of
pipelining source elements comes in.

The idea is to change the semantics of the sources section of BST files so
that users can specify a hierarchy of sources. One would end up with a BST
file such as:

```yaml
# element-specific entries skipped
sources:
   - kind: godep
     ref: "<whatever ref>"
     deps:
       - kind: git # To change the Gopkg.toml provider, just change this kind
         repo: github:myorg/my_go_repo.git
         ref: <whatever commit sha>
```

The godep element depends on the underlying git element. `my_go_repo`
contains the Go source code of the application as well as the Gopkg.toml
file:

```
- src/
    <go project sources>
- Gopkg.toml
- Gopkg.lock
```

The new behavior is that when BuildStream needs to perform an action on a
given source element, it must first stage the source elements that this
element depends on.

As an example, let's say that the user issued a command that triggers the
fetching of the above godep element.

The first step for BuildStream is to stage the git source dependency. We
assume that the git element has already been tracked and fetched, by means
that still need to be figured out. BuildStream then knows a path that
contains the staged sources of the git element.

Then, BuildStream starts fetching the godep source element. The key point is
that the previously staged git sources are now visible to the godep element.
This means that any tool used to fetch the godep source element can use any
file from the staged git element to do its work.

In this particular example, the dep command executed by the godep plugin will
be able to use the Gopkg.toml file from the cloned git repository to fetch the
dependencies needed to compile the Go project.

The final state is shown below. This is what BuildStream will stage into the
sandbox during the build phase.

```
- src/
    <go project sources>
- vendor/ # This directory has been created by the godep source element
    <go dependencies>
- Gopkg.toml
- Gopkg.lock
```

And this is what the BST file looks like with all the references set.

```yaml
# element-specific entries skipped
sources:
   - kind: godep
     ref: "<content of Gopkg.lock>"
     deps:
       - kind: git # To change the Gopkg.toml provider, just change this kind
         repo: github:myorg/my_go_repo.git
         ref: <whatever commit sha>
```

As another example, let's say that we want to build a Python project that has
some dependencies coming from `pip`. Those dependencies are specified in a
`requirements.txt` file, and the project itself is hosted in a remote git
repository.

The sources section of the BST file that builds this project would look like:
```yaml
sources:
     - kind: pip
       ref: <the pip lock file>
       deps:
         - kind: git
           repo: github:myorg/my_python_project.git
           ref: <HEAD's sha>
```

As in the godep example, when the user wants to fetch the dependencies,
BuildStream will first stage the git element. Then it will use the staged git
directory as the current working directory to execute the fetch action of the
pip source element.
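The "staged directory as working directory" step can be sketched with Python's
`subprocess` module. `run_fetch_in` is a hypothetical helper, not a BuildStream
API, and the probe command only stands in for a real fetch such as
`pip download -r requirements.txt -d <dir>`; the point is that the tool finds
`requirements.txt` by relative path because of `cwd`.

```python
"""Sketch: run a package manager's fetch step with the staged checkout as cwd."""

import os
import subprocess
import sys
import tempfile
from typing import List


def run_fetch_in(staged_dir: str, command: List[str]) -> str:
    """Run `command` inside `staged_dir` and return its stdout (hypothetical helper).

    Because the working directory is the staged git checkout, the tool can refer
    to files such as requirements.txt by relative path.
    """
    result = subprocess.run(command, cwd=staged_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staged:
        # Pretend this directory holds the staged git source.
        with open(os.path.join(staged, "requirements.txt"), "w") as f:
            f.write("example-package==1.0\n")
        # Stand-in for a real fetch, e.g.
        # [sys.executable, "-m", "pip", "download", "-r", "requirements.txt", "-d", "deps"]
        probe = [sys.executable, "-c",
                 "print(open('requirements.txt').read().strip())"]
        print(run_fetch_in(staged, probe).strip())  # example-package==1.0
```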

The new format above describes a tree of source elements. The new semantic
rules for the sources section of a BST file would be:

1. Sibling nodes can't see each other's content.
2. Sibling nodes' contents are merged.
3. A node can see the merged content of all its children.
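A minimal sketch of these three rules, using in-memory file trees
(path -> content) instead of real directories. `SourceNode`, `stage` and
`merge` are hypothetical names, not BuildStream's API, and treating a path
provided twice with different content as an error is an assumption; the
proposal does not say how conflicts are handled.

```python
"""Sketch of the proposed source-tree semantics (hypothetical names)."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tree = Dict[str, str]  # relative path -> file content


@dataclass
class SourceNode:
    name: str
    # Given the merged content of its children, return the files this source adds.
    produce: Callable[[Tree], Tree]
    children: List["SourceNode"] = field(default_factory=list)


def merge(trees: List[Tree]) -> Tree:
    """Rule 2: contents are merged; conflicting content for a path is an error."""
    merged: Tree = {}
    for tree in trees:
        for path, content in tree.items():
            if path in merged and merged[path] != content:
                raise ValueError(f"conflicting content for {path}")
            merged[path] = content
    return merged


def stage(node: SourceNode) -> Tree:
    # Rule 1: each child is staged on its own, so siblings never see each other.
    children_content = merge([stage(child) for child in node.children])
    # Rule 3: the node sees the merged content of all its children.
    produced = node.produce(children_content)
    return merge([children_content, produced])


if __name__ == "__main__":
    git = SourceNode("git", lambda _: {"src/main.go": "package main",
                                       "Gopkg.toml": "[[constraint]]"})

    def fake_dep(visible: Tree) -> Tree:
        # Stand-in for `dep ensure`: it needs the Gopkg.toml staged by git.
        assert "Gopkg.toml" in visible
        return {"vendor/example.com/lib/lib.go": "package lib"}

    godep = SourceNode("godep", fake_dep, [git])
    root = SourceNode("root", lambda _: {}, [godep])  # the BST sources section
    print(sorted(stage(root)))
    # ['Gopkg.toml', 'src/main.go', 'vendor/example.com/lib/lib.go']
```

With this model, today's flat sources list is just a `root` whose children
have no children of their own.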

Note that, from this point of view, the current sources section format is
equivalent to a one-level tree where all sources are siblings and have the
root of the tree as their parent.

Also note that the BuildStream source tree is completely distinct from the
dependency tree processed by Go's dep, Python's pip, or any other package
manager. The package manager's dependency tree is invisible from
BuildStream's perspective.

These new semantics have some drawbacks, though.

Some points still need to be defined. The tracking mechanism must be
redefined to handle a tree of sources. Do we have to track the dependencies
when we want to track a specific source element? Should the ref of a source
element depend on the refs of its dependencies? What about cache keys?
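One possible answer to the ref and cache-key questions, offered only as an
assumption for discussion and not as how BuildStream computes keys: hash each
source's own kind and ref together with the keys of the sources it depends on,
Merkle-style. `SourceRef` and `source_key` are hypothetical names.

```python
"""Sketch: a Merkle-style cache key for a tree of sources (an assumption)."""

import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceRef:
    kind: str
    ref: str
    deps: List["SourceRef"] = field(default_factory=list)


def source_key(source: SourceRef) -> str:
    """Hash the source's kind and ref together with its dependencies' keys."""
    payload = {
        "kind": source.kind,
        "ref": source.ref,
        # Sorted because sibling contents are merged, so their order is irrelevant.
        "deps": sorted(source_key(dep) for dep in source.deps),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    lock = "<content of Gopkg.lock>"
    before = SourceRef("godep", lock, [SourceRef("git", "1111111")])
    after = SourceRef("godep", lock, [SourceRef("git", "2222222")])
    # Same godep ref but a different git commit: the keys differ.
    print(source_key(before) != source_key(after))  # True
```

Under this scheme a new commit in the git source invalidates the godep source
even if the Gopkg.lock content is unchanged, which answers "yes" to the
question of whether a ref should depend on its dependencies.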

Another concern with this approach is that we delegate part of BuildStream's
purpose, dependency tracking, to the underlying package manager. In addition,
some package managers can't guarantee that they will recreate the same
environment from the same input (e.g. unpinned pip dependencies). We won't be
able to guarantee build reproducibility if those package managers are used.
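As an illustration of the pip case, here is a deliberately simplified check
that flags `requirements.txt` lines not pinned to an exact version.
`unpinned_requirements` is a hypothetical helper: it only treats `==` without a
wildcard as a pin and ignores pip's fuller syntax (extras, markers, `-r`/`-e`
options, hashes). Even a fully pinned file does not pin transitive dependencies
unless they are listed too.

```python
"""Sketch: flag requirements.txt lines that are not pinned (simplified)."""

from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that could resolve differently over time."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # `==` pins an exact version, but `==1.*` is a wildcard, not a pin.
        if "==" not in line or "*" in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    reqs = """
requests==2.31.0
flask>=2.0   # may resolve differently tomorrow
numpy
"""
    print(unpinned_requirements(reqs))  # ['flask>=2.0', 'numpy']
```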

# Solution 2: A conversion script

This approach is the opposite of the first one. Instead of relying on package
managers to build and install the dependencies of a BST file, a script is
used to generate BST files from a package manager's dependency specification.

For example, in the case of Go, this script would parse the Gopkg.toml and
fetch the requested dependencies from their sources. Then the same action
would be performed recursively on the dependencies. In the end, we would get
a set of BST files that we can use to build our project entirely within
BuildStream.
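A minimal sketch of such a conversion script, under several assumptions: the
`DEPENDENCIES` table is hypothetical and stands in for actually parsing each
project's Gopkg.toml, `go_build` is a made-up element kind, and the generated
sources reuse the `kind`/`repo`/`ref` layout from the examples above. The walk
keeps a record of already emitted files so a shared dependency gets one BST
file.

```python
"""Sketch of Solution 2: turn a dependency graph into one BST file per project."""

from typing import Dict, List, Tuple

# Hypothetical data: project -> (repository, commit, direct dependencies).
DEPENDENCIES: Dict[str, Tuple[str, str, List[str]]] = {
    "my_go_repo": ("github:myorg/my_go_repo.git", "1111111", ["liba", "libb"]),
    "liba": ("github:example/liba.git", "2222222", ["libb"]),
    "libb": ("github:example/libb.git", "3333333", []),
}


def to_bst(project: str) -> str:
    """Render one BST file for `project`."""
    repo, ref, deps = DEPENDENCIES[project]
    lines = ["kind: go_build  # hypothetical element kind"]
    if deps:
        lines.append("depends:")
        lines += [f"- {dep}.bst" for dep in deps]
    lines += ["sources:", "- kind: git", f"  repo: {repo}", f"  ref: {ref}"]
    return "\n".join(lines) + "\n"


def convert(root: str) -> Dict[str, str]:
    """Walk the graph from `root`, emitting each project's BST file once."""
    files: Dict[str, str] = {}
    stack = [root]
    while stack:
        project = stack.pop()
        name = f"{project}.bst"
        if name in files:
            continue  # already converted via another path in the graph
        files[name] = to_bst(project)
        stack.extend(DEPENDENCIES[project][2])
    return files


if __name__ == "__main__":
    for name, text in sorted(convert("my_go_repo").items()):
        print(f"# {name}\n{text}")
```

Note that files are keyed only by project name, so this sketch cannot
represent two versions of the same library.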

This approach also has drawbacks.

First of all, the user experience is not as direct as before because an
external tool is involved in the process.

Second, because BuildStream doesn't properly support element versioning,
handling multiple versions of the same dependency is not straightforward. For
instance, dep1 could depend on lib:v3 while, somewhere else in the dependency
tree, dep2 depends on lib:v4.
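A conversion script would at least need to detect this situation. Below is a
minimal sketch over a hypothetical (name, version) graph that reproduces the
dep1/dep2 example; `version_conflicts` is an illustrative name and the script
only reports conflicts, it does not resolve them.

```python
"""Sketch: find libraries required at more than one version in a dependency graph."""

from collections import defaultdict
from typing import Dict, List, Set, Tuple

Package = Tuple[str, str]  # (name, version)


def version_conflicts(graph: Dict[Package, List[Package]],
                      root: Package) -> Dict[str, List[str]]:
    """Return {name: sorted versions} for every name reached at 2+ versions."""
    seen: Set[Package] = set()
    versions: Dict[str, Set[str]] = defaultdict(set)
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        versions[pkg[0]].add(pkg[1])
        stack.extend(graph.get(pkg, []))
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}


if __name__ == "__main__":
    graph = {
        ("app", "1"): [("dep1", "1.0"), ("dep2", "1.0")],
        ("dep1", "1.0"): [("lib", "v3")],
        ("dep2", "1.0"): [("lib", "v4")],
    }
    print(version_conflicts(graph, ("app", "1")))  # {'lib': ['v3', 'v4']}
```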

