[BuildStream] Solving the non-deterministic "git describe" issue



Hello list,

For reference:
https://gitlab.com/BuildStream/buildstream/issues/487

On ticket #487 I proposed a solution for dealing with the
non-determinism of tags in git repositories and of `git describe`.
However, I have been told we should discuss it on the mailing list.

Background
----------

`git describe` uses the tags in a repository to generate a
human-friendly name for a commit. If the current commit is tagged, then
the name is the tag itself. Otherwise, it is the most recent reachable
tag combined with a shortened hash. This is of course configurable.

This is commonly used for versioning. Release commits are tagged with
their version, and then the project gets the version back by calling
`git describe`.

Issues with git tags
--------------------

Git tags are not immutable. They are not part of the hash of the
commit. It is possible to change which commit a tag points to.

For that reason it is possible that builders fetch different states of
a repository and build the exact same reference with different tags.
This potentially changes the output of `git describe`. The build is not
repeatable anymore.
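
To illustrate the problem, here is a toy model in Python. It is not
git's real algorithm: it assumes a linear history and a hypothetical
`describe` helper that mimics the usual `<tag>-<count>-g<hash>` output.
Two builders hold the same commits but saw the tag at different times.

```python
# Toy model of `git describe` on a linear history (illustration only,
# not git's actual implementation).
def describe(history, tags, commit):
    """history: commit hashes, oldest first. tags: {tag name: hash}."""
    by_commit = {h: name for name, h in tags.items()}
    index = history.index(commit)
    # Walk back from the commit towards the root, looking for a tag.
    for distance, candidate in enumerate(reversed(history[: index + 1])):
        if candidate in by_commit:
            tag = by_commit[candidate]
            if distance == 0:
                return tag  # the commit itself is tagged
            return f"{tag}-{distance}-g{commit[:7]}"
    raise LookupError("no tag reachable from " + commit)


# Made-up hashes for the example.
history = ["1111111aaa", "2222222bbb", "3333333ccc"]
builder_a_tags = {"v1.0": "1111111aaa"}  # fetched before the tag moved
builder_b_tags = {"v1.0": "2222222bbb"}  # fetched after the tag moved

out_a = describe(history, builder_a_tags, "3333333ccc")
out_b = describe(history, builder_b_tags, "3333333ccc")
print(out_a)  # v1.0-2-g3333333
print(out_b)  # v1.0-1-g3333333
```

Both builders build the same reference, yet the version string differs.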

Notes on the git history
------------------------

1.2 used to keep the whole `.git` directory, but this can be big. To
reduce the size of build artifacts, in master we remove the directory
completely. However, `git describe` then cannot work at all. We plan to
use a shallow clone of the repository in order to fix that.

Proposed solution for git tag
-----------------------------

To make git tags immutable, we can store them in the .bst file or the
project.refs. Tracking can fetch the tag and store it. Then we retag 
the shallow cloned repository with the right tag at the expected hash.

Builds are then repeatable because `git describe` will always output
the same string.

Because tags are not always on the hash we asked for, we need to store
which hash the tag is for and shallow clone down to that hash.

Also, `git describe --first-parent` might pick up a different tag. If
it does, we need to store that tag and its hash as well. We would then
shallow clone with two branches going to two different ancestors. When
it comes to the data format, we can just support a list of pairs
(hash, tag name).
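
As a rough sketch of that data format, the pairs could be modelled as
below. `PinnedTag` and `retag_commands` are hypothetical names, not
existing BuildStream code. The sketch also assumes annotated tags,
since plain `git describe` ignores lightweight tags unless `--tags` is
passed.

```python
from typing import List, NamedTuple


class PinnedTag(NamedTuple):
    """One (hash, tag name) pair as it might be stored by tracking."""
    commit: str
    name: str


def retag_commands(pinned: List[PinnedTag]) -> List[List[str]]:
    """Build the `git tag` invocations that recreate each stored tag
    at its recorded commit in the shallow clone."""
    return [
        # -a makes an annotated tag, -f replaces an existing tag of
        # that name, -m supplies the tag message non-interactively.
        ["git", "tag", "-a", "-f", "-m", tag.name, tag.name, tag.commit]
        for tag in pinned
    ]


# Made-up hashes: one tag found by plain `git describe`, one found
# by `git describe --first-parent`.
pinned = [
    PinnedTag(commit="1111111aaa", name="v1.0"),
    PinnedTag(commit="2222222bbb", name="v1.0-rc1"),
]
commands = retag_commands(pinned)
```

Each command could then be run in the staged repository, for example
with `subprocess.run(cmd, check=True, cwd=...)`.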

Now, should we enable this feature by default if we find at least one
tag during tracking?

It is obvious that if there is no tag, we probably do not want it by
default, otherwise we would need a full clone of the repository. This
has to do with the fact that `git describe` can also output a revision
number, which is the number of commits since the last tagged commit.


