Re: [BuildStream] Solving the non-deterministic "git describe" issue



On Mon, 2018-10-29 at 21:36 +0100, Valentin David via BuildStream-list wrote:
Hello list,

For reference:
https://gitlab.com/BuildStream/buildstream/issues/487

I have proposed on ticket #487 a solution for dealing with the
non-determinism of tags in git repositories and `git describe`. However,
I have been told we should discuss it on the mailing list.

Background
----------

`git describe` uses the tags in a repository to generate a human
friendly name for a commit. If the current commit is tagged, then it
is the tag. Otherwise, it is the most recent reachable tag combined
with the number of commits since that tag and a shortened hash. This is
of course configurable.
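
As a rough illustration of that output format, here is a minimal Python
sketch that splits a `git describe` string into its parts. It assumes
the default form `<tag>-<count>-g<abbreviated hash>`; the helper name
`parse_describe` is made up for this example and is not part of
BuildStream.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str
    commits_since_tag: int
    short_hash: Optional[str]


# Default `git describe` output for an untagged commit: <tag>-<n>-g<hex>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Described:
    """Split `git describe` output into tag, commit count and short hash."""
    text = text.strip()
    match = _DESCRIBE_RE.match(text)
    if match is None:
        # No suffix: the commit itself is tagged, so the output is the tag.
        # (Heuristic: a tag whose own name ends in "-<n>-g<hex>" would be
        # misread by this sketch.)
        return Described(text, 0, None)
    return Described(match["tag"], int(match["count"]), match["hash"])
```

The point for this thread is that every field of the result depends on
which tags happen to be present in the clone being described.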

This is commonly used for versioning. Release commits are tagged with
their version, and then the project gets the version back by calling
`git describe`.

Issues with git tags
--------------------

Git tags are not immutable. They are not part of the hash of the
commit. It is possible to change the commit that a tag points to.

For that reason it is possible that builders fetch different states of
a repository and build the exact same reference with different tags.
This potentially changes the output of `git describe`. The build is not
repeatable anymore.

Not only this, but as Richard Maw pointed out at GUADEC, this is
*likely* to happen.

For instance, when smoke testing and building a release of your
software with BuildStream before performing the final step of cementing
it with a tag.

Notes on the git history
------------------------

BuildStream 1.2 used to keep the whole `.git` directory, but this can
be big. To reduce the size of build artifacts, in master we remove the
directory completely. However, `git describe` then cannot work at all.
We plan to use a shallow clone of the repository in order to fix that.

Proposed solution for git tag
-----------------------------

To make git tags immutable, we can store them in the .bst file or the
project.refs. Tracking can fetch the tag and store it. Then we retag 
the shallow cloned repository with the right tag at the expected hash.

I believe this additional information becomes *a part of the ref*.

This turns the git plugin's `ref` into a dictionary, and the git plugin
needs to remain backwards compatible with older versions where the
`ref` field is a simple string.
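
A minimal sketch of what that backwards compatibility could look like,
in Python. The dictionary keys `commit` and `tags` and the function
name `normalize_ref` are assumptions made for illustration; the thread
does not fix the actual field names.

```python
from typing import Any, Dict, List, Tuple, Union


def normalize_ref(ref: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Accept an old-style string ref or a new-style dict and return a dict."""
    if isinstance(ref, str):
        # Older format: the ref is just the commit hash, with no tag data.
        return {"commit": ref, "tags": []}
    if isinstance(ref, dict):
        if "commit" not in ref:
            raise ValueError("dictionary ref is missing the 'commit' key")
        # Hypothetical layout: "tags" holds (commit hash, tag name) pairs.
        tags: List[Tuple[str, str]] = [
            (str(sha), str(name)) for sha, name in ref.get("tags", [])
        ]
        return {"commit": str(ref["commit"]), "tags": tags}
    raise TypeError("ref must be a string or a dictionary")
```

With something like this, the rest of the plugin would only ever see
the dictionary form, whichever format the project file uses.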

Builds then are repeatable because `git describe` will always output
the same.

Because tags are not always on the hash we asked for, we need to store
which hash the tag is for and shallow clone down to that hash.

Also, `git describe --first-parent` might pick up a different tag, so we
need to store that tag and hash as well if it is different. We would
shallow clone with two branches going to two different ancestors. When
it comes to the data format, we can just support a list of pairs (hash,
tag name).
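
To make the data format concrete, here is a Python sketch that turns
such a list of pairs into the `git tag` commands needed to re-create
the tags in the shallow clone. The function name `retag_commands` is
hypothetical. Note that plain `git tag <name> <commit>` creates a
lightweight tag, which `git describe` only considers when run with
`--tags`; whether the plugin should create lightweight or annotated
tags is left open here.

```python
from typing import List, Sequence, Tuple


def retag_commands(tags: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Build one `git tag` invocation per stored (commit hash, tag name) pair."""
    commands = []
    for commit, name in tags:
        # -f replaces any existing tag of the same name, so the tag always
        # points at the hash recorded in the ref.
        commands.append(["git", "tag", "-f", name, commit])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, cwd=repo_dir,
check=True)`, where `repo_dir` is the staged clone.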

Now, should we enable this feature by default if we find at least one
tag during tracking?

It is obvious that if there is no tag, we probably do not want it by
default, otherwise we would need a fully cloned repository. This has to
do with the fact that `git describe` can also output a revision number,
which is the number of commits since the last tagged commit.

An important part of the conversation we had which was omitted from
this proposal is that this functionality *forces* users to update the
refs with `bst track`, where a lot of use cases involve CI ecosystems
where people are manually proposing new refs.

In other words, this imposes a horrible user experience for the regular
use case where the `bst track` guesswork is not used at all.

This is however a good proposal for a rather tricky problem. I think
that we should accept the downsides and make the feature opt-in with a
new `git` plugin level configuration.

An edge case to handle and think about here is where a user turns on
the git describe information but has not filled in the additional
fields required at staging time.

I *think* the right thing to do in this case would be to report
Consistency.INCONSISTENT, and perhaps also issue a warning at
`Source.get_consistency()` time.
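
A sketch of that check in Python, under stated assumptions: the
`Consistency` enum below is a local stand-in so the example runs on its
own, the `describe_enabled` flag stands for the opt-in configuration,
and the `tags` key is the same hypothetical field name used for the
dictionary form of the ref.

```python
import warnings
from enum import Enum


class Consistency(Enum):
    # Stand-in for BuildStream's own Consistency type.
    INCONSISTENT = 0
    RESOLVED = 1
    CACHED = 2


def get_consistency(ref, describe_enabled: bool, is_cached: bool) -> Consistency:
    """Report INCONSISTENT when tag data is required but was never recorded."""
    if ref is None:
        return Consistency.INCONSISTENT
    has_tag_data = isinstance(ref, dict) and "tags" in ref
    if describe_enabled and not has_tag_data:
        # The user opted in but the ref was not produced by tracking.
        warnings.warn("git describe data requested but missing from ref")
        return Consistency.INCONSISTENT
    return Consistency.CACHED if is_cached else Consistency.RESOLVED
```

An empty `tags` list is treated as valid here, since tracking may
legitimately find no tag.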


Finally, I still worry about the case where a user wants to update a
git source to a specific sha without `bst track`, and what the workflow
of this will be - even though this is minimized by making the feature
opt-in.

From recollection, I think you recommended a workaround which was:

  * Edit the .bst file and set the `track` parameter to a specific ref
  * Run `bst track` on the said .bst file
  * Go back and restore the `track` parameter to the desired tracking
    branch (in case you *do* sometimes use `bst track`).

Since I think this is minimal damage, I would be fairly happy with this
behavior.

My only worry here is that contributors might insist on spending their
time trying to automate this further and improve this particular user
experience: Right now I cannot think of any way to improve this without
doing horrible things to the code, like making `bst track` more complex
than it needs to be.

So - I am happy with this proposal, as long as it is clear that I will
not be accepting added core complexity to `bst track`, only for the
sake of improving the user experience outlined above (for the handful
of sources which might opt into it).

Cheers,
    -Tristan


