[BuildStream] Expressing file attributes in the yaml format



Hi all,

I'm writing to start a discussion on how we would like to express
file attributes in the YAML format. Until last week we never really
had a need to express anything about file attributes in the format,
as all of this can usually be done with `chmod` and similar commands
run in the build environment.

This is not so much a concrete proposal as a decision we need to
take about how such things are expressed. Depending on what is
decided, a more concrete specification will need to be drawn up.


Background
~~~~~~~~~~
We recently landed a merge request[0] which adds a boolean
'executable' parameter to the recently added 'remote' source plugin.

Since this is the first time anything which expresses file attributes
has been added to the format, the question of consistency and symmetry
in the API arises.

If someone has a use case for this in the 'remote' plugin, it stands to
reason that this will not be the last time that someone might want to
express file attributes in the YAML format.

In order to expedite the patch and unblock other ongoing work, I've
decided to land the patch as is and defer discussion to the mailing
list[1]. In any case, we have around 6 months before this API needs
to be frozen.

As far as I can see, there are two avenues we can take:


  Minimum configurability
  ~~~~~~~~~~~~~~~~~~~~~~~
  Since this issue has never arisen before, it may well be that the
  rational approach is to just say that the most a source plugin
  will ever want to decide about a file attribute is whether a file
  is executable or not.

  I can imagine cases where it might be more convenient to specify
  various file attributes when using an import element, rather than
  declaring a separate element later on to set the attributes in
  a script.

  Still, minimum configurability is a plausible avenue, and does not
  prevent file attributes from being set up in other ways. It has the
  benefit of providing a simpler API surface, without preventing
  more complicated pipelines from being expressed.
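
  As a rough illustration of how little this option asks of a plugin,
  here is a minimal sketch. The function name `stage_mode` is
  hypothetical and is not taken from the merge request; it only shows
  one plausible reading of a boolean 'executable' setting, namely
  adding the execute bits for owner, group and other.

```python
import stat


def stage_mode(current_mode, executable):
    """Return the permission bits to use for a staged file.

    Hypothetical helper: `current_mode` is an st_mode value and
    `executable` is the boolean loaded from the YAML.
    """
    # Keep only the permission bits, dropping the file type bits.
    mode = stat.S_IMODE(current_mode)
    if executable:
        # Assumption: 'executable' means executable for everyone.
        mode |= stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return mode
```

  Anything beyond this one bit would still be done with `chmod` and
  similar commands in the build environment.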


  Maximum configurability
  ~~~~~~~~~~~~~~~~~~~~~~~
  I think it stands to reason that if one ever wants to express
  the 'executable' nature of a staged file, then expressing read
  and write permissions will also be desirable.

  Not only that, but other attributes such as owner/group UID and
  extended attributes will also one day be interesting to someone.

  If we were to aim for maximum configurability, I think it would
  make sense to declare a common format which can represent a superset
  of every attribute which can ever be supported on any filesystem
  in any target platform we ever support.

  This would be documented in a central location in the reference
  manual, and utility functions would be provided for the sake of
  plugins to load the value from the YAML, consider it in the cache
  key, and apply it to files at staging time.

  Since we still suffer from infamous issue 38[2], I think it would
  make sense to have such a format start out only by supporting the
  read/write/execute mode bits for the owner/group/other fields.

    A.) This can be extended in a central location in such a way
        that any plugin using the related utilities will benefit
        when support is added for any additional file attributes.

    B.) When designing this format, we should ignore the fact that
        the new CAS based artifact cache introduces a pretty bad
        regression[3]. Let's not let temporary breakage inform our
        design choices.

        While other parts of issue 38 are technically challenging,
        supporting at least the read/write/execute bits for the
        owner/group/other fields does not present such technical
        difficulty (arguably, neither does the setuid bit).
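
  To make the shape of such central utilities a little more concrete,
  here is a minimal sketch covering only the read/write/execute bits
  for owner/group/other. Everything in it is an assumption made for
  illustration: the mapping layout (`owner: rwx` and so on), the
  function names and the 'file-mode' cache key entry are hypothetical
  and are not existing BuildStream API.

```python
import os
import stat

# Hypothetical format: a mapping such as
#   {'owner': 'rwx', 'group': 'rx', 'other': 'rx'}
# as it might be loaded from a plugin's YAML configuration.
_BITS = {
    "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def load_file_mode(node):
    """Convert the loaded mapping into numeric permission bits."""
    mode = 0
    for who, flags in node.items():
        if who not in _BITS:
            raise ValueError("Unknown field: {}".format(who))
        for flag in flags:
            if flag not in _BITS[who]:
                raise ValueError("Unknown flag '{}' for {}".format(flag, who))
            mode |= _BITS[who][flag]
    return mode


def cache_key_fragment(mode):
    """Return a stable value a plugin could fold into its unique key."""
    return {"file-mode": "{:04o}".format(mode)}


def apply_file_mode(path, mode):
    """Apply the permission bits to a file at staging time."""
    os.chmod(path, mode)
```

  A plugin would call `load_file_mode()` while loading its
  configuration, include `cache_key_fragment()` in its key, and call
  `apply_file_mode()` on each staged file. Supporting further
  attributes later would mean extending these helpers in one place,
  which is the point of A.) above.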


I lean towards maximum configurability, but I am much more concerned
that we have a consistent API moving forward than I am with the
actual decision.

So I invite the community to participate in this choice. I do think
that minimum/maximum are the only good options, but feel free to
propose a third category of options if you feel you have a good idea :)

Cheers,
    -Tristan


[0]: https://gitlab.com/BuildStream/buildstream/merge_requests/581
[1]: https://gitlab.com/BuildStream/buildstream/merge_requests/581#note_92119402
[2]: https://gitlab.com/BuildStream/buildstream/issues/38
[3]: https://gitlab.com/BuildStream/buildstream/issues/38#note_87327672


