Re: Project options and other format enhancements (and dropping "variants")

From: Angelos Evripiotis <angelos evripiotis gmail com>
To: Tristan Van Berkom <tristan vanberkom codethink co uk>
Cc: buildstream-list gnome org
Subject: Re: Project options and other format enhancements (and dropping "variants")
Date: Fri, 22 Sep 2017 05:17:01 -0700

Oh my, I quite recently underwent a 15+ hour full body procedure of
transportation... which was commencing around when this email was sent, sorry
for missing this email :)


No worries! I think the conversation moved on to what's important for right now
anyways, so I'm happy this became a side-line. I'm also now on hols for two
weeks so will be less responsive for a bit.

Thanks for taking the time to reply so fully after spotting it.

Regarding your PS, I should just mention for your own benefit that you have
forked this thread, this might mean that you did not group reply this email
originally; and also caused it to fall behind in my message history (actually
I found this message because my iPhone did something weird and associated the
two threads together using some kind of extra sensory heuristics, or I might
have missed this message entirely).


Ah yes, thanks for pointing that out, I think my problem was trying to reply to
'mailing list digest', I've opted for individual mails now. I'll look out for
that in future.

  o If I have per element options, I can use them to apply an
    option only to a specific element, but I can't make sweeping statements
    about how to build all the autotools elements in the entire
    project, for example.

  o If I have project wide options, I can use them to apply a project
    wide sweeping statement.

    And, I can *also* use them to make element specific statements.

In fact, I would say that per-element options introduce more
complexity, both in terms of implementation and in terms of using
buildstream.


I agree with all this, I read the first point as "If I *only* have per-element
options". Per-element options do give users more rope to tangle themselves
with.

Taking the debug example; if you have a few hundred elements, and they all
have to declare their own individual 'debug' booleans; you suddenly have to
*declare* a few hundred debug booleans right ?


Yeah I think it would be terrible if elements had lots of repetition like this,
where they're all just specifying less optimization and exporting debug
symbols. I'm thinking it wouldn't have to be that way, but I don't have a worked
example to offer. I figure they should only specify what's special about
themselves, the rest can be global.

Seems to me that a simple (contains debug_elements "myelement.bst")
will cover that use case with less hassle.


For 1.0 I think this could work well enough.
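
As a rough illustration of the idea, here is a minimal Python sketch of a single
list-valued project option standing in for many per-element 'debug' booleans.
The names `project_options`, `debug_elements` (as a dictionary key) and
`debug_enabled` are hypothetical and are not BuildStream API.

```python
# Hypothetical sketch: one project-wide list option replaces hundreds of
# per-element 'debug' declarations. None of these names are BuildStream API.

def debug_enabled(element_name, project_options):
    """Return True if the element is listed in the 'debug_elements' option."""
    return element_name in project_options.get("debug_elements", [])


project_options = {"debug_elements": ["myelement.bst"]}

# Only the listed element gets debug behaviour; everything else is untouched.
flags = {
    name: debug_enabled(name, project_options)
    for name in ["myelement.bst", "other.bst"]
}
```

Elements declare nothing themselves here; the project-level list is the only
place that names them.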

I'm wondering if this global config change would invalidate the cache for the
whole pipeline. Will 'bst build' be smart enough not to rebuild everything
after such an edit?
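
One way to reason about that question: if an element's cache key were derived
from its resolved configuration (after options have been applied) plus the keys
of its dependencies, then editing a project-wide list would only change the keys
of the named elements and of whatever depends on them. The sketch below
illustrates that assumption only; it is not a description of how BuildStream
actually computes cache keys, and every name in it is hypothetical.

```python
import hashlib
import json

# Hypothetical sketch, not BuildStream's real cache key algorithm.
# Assumption: a key hashes the element's *resolved* config plus the keys
# of its dependencies.

DEPENDS = {
    "base.bst": [],
    "lib.bst": ["base.bst"],
    "app.bst": ["lib.bst"],
    "docs.bst": ["base.bst"],
}


def resolved_config(name, options):
    # Only the effect of the option on this particular element is recorded.
    return {"debug": name in options.get("debug_elements", [])}


def cache_key(name, options):
    payload = {
        "name": name,
        "config": resolved_config(name, options),
        "deps": [cache_key(dep, options) for dep in DEPENDS[name]],
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def changed_elements(before, after):
    """Names of elements whose key differs between two option sets."""
    return sorted(
        name for name in DEPENDS
        if cache_key(name, before) != cache_key(name, after)
    )


# Enabling debug for lib.bst changes lib.bst and its reverse dependency
# app.bst, while base.bst and docs.bst keep their keys.
rebuilt = changed_elements(
    {"debug_elements": []},
    {"debug_elements": ["lib.bst"]},
)
```

Under this assumption the rebuild is limited to the edited elements and their
reverse dependencies, rather than the whole pipeline.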

Right, I am trying to veer away from element centric options for now
unless (or until) we can really make a case that we need that level of
complexity.

As far as configurability goes, project centric options are enough,
beyond that; asserting matters of compatible options is unfortunately
not as automated and implicit as variants was, one would need to write
conditions which produce assertions.


I agree for 1.0, less is more, and we have enough for now.

One could compare this with a state machine in a C program, or a
horribly written kernel driver with a dozen boolean checks on each
entry point of its file handle (seen it before...).

The analogy suggests that one should instead write many smaller,
coherent C programs which work together - this is where we are going
with inter project dependencies (recursive pipelines); and is another
opportunity to have projects actually dictate how projects they depend
on are built.


I think I can imagine that nightmare too, where each element is responsible for
expecting the unexpected, thanks to a lack of global guarantees, or inadequate
library functions.

+1 for smaller, coherent, and co-operative. Also I think we should pull much of
the Unix philosophy in general into our build pipelines. I think it's related
to why I can't shake the idea that elements can help us more.

<vague dreaming of a smarter-element future>
Intuitively I can see a use for both project-level and element-level options.
Perhaps a level in-between - it might be covered by recursive pipeline config.
I want to do better than intuition at some point later, so I can talk about
actual use-cases :)

I get some inspiration from the output of 'brew info python', there's an
options section particular to Python:

    ==> Options
    --with-berkeley-db@4
        Build with berkeley-db@4 support
    --with-poll
        Enable select.poll, which is not fully implemented on macOS
        (https://bugs.python.org/issue5154)
    --with-quicktest
        Run `make quicktest` after the build (for devs; may fail)
    --with-sphinx-doc
        Build HTML documentation
    --with-tcl-tk
        Use Homebrew's Tk instead of macOS Tk (has optional Cocoa and threads
        support)
    --without-gdbm
        Build without gdbm support
    --without-readline
        Build without readline support
    --without-sqlite
        Build without sqlite support
    --HEAD
        Install HEAD version

I really like that I get options specific to Python, without reading the
formula for Python, and without needing to learn about global options for
Homebrew.

I'm thinking about some of these cases for config:

- Global, project-level: for things like 32-bit vs. 64-bit.

- Specified on dependencies: for enabling features like '--with-sqlite',
  without needing to know about them at the project level. Users would likely
  rely on 'bst show' to determine what got selected.

- Also specified on dependencies: requesting a particular version, for
  upgrading piece-wise when breaking changes are introduced into an element.

- Overridden per-element for dev purposes: e.g. enable expensive asserts,
  optimisations off, single-process, link in special dumper, etc. Orthogonal
  things that you don't want to flip globally, maybe not crowd the global
  namespace with either.

</vague dreaming of a smarter-element future>

On (2) Allowing elements to be more specific:

Yes, this is really what I like so much about variants.

The problems I have with moving forward for a full blown variant
solution with competing orthogonal variants, are mostly that:

o It's going to be very, very tricky to implement.

We already have an imperfect algorithm for resolving variants when
there can only be one variant per element. The solution which works
exactly to spec takes several minutes to solve (so already we need
work to get it right with a proper constraint solving algorithm or
engine, which still won't work in linear time but should be usable).

o While it may be easy enough to explain the rules of constraint
resolution to the user, it will often be difficult for the user
to easily predict what variant of what element will be chosen.


Now that I've read the tricky.bst[1] test case for variants, I realise they do
much more than I thought. Automatic resolution is cool, I like the maximum
ambivalence. I see what you mean that it leads to implementation difficulty and
sometimes user confusion.

I think they'd still be very usable without automatic resolution. Maybe I need
to read more of those test cases :)

In the case of tricky.bst, as a user I'd be happy with getting an error message
for that and resolving the conflict myself. I'd be ok with fixing tricky.bst to
depend on the second variant of tricky-first, adding a comment as to why it's
there so it might go away if tricky-first changes. If we added some sort of
semantic tag in the comment about the resolution, then we might be able to
automate some of that outside of BuildStream.


Okay, I think I follow, but I don't believe it's practically workable.

The tricky.bst is the smallest test case I could put together which
causes the engine to choose:

             tricky
             /    \
            /      \
           /        \
   tricky-first   tricky-second(second)

Where:

   tricky-first(first) -> tricky-second(first)
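
To make the shape of the problem concrete, here is a small brute-force sketch in
Python. It is a hypothetical model of the case above, not the actual resolver:
it assumes that tricky asks for tricky-second(second), that tricky-first has a
'second' variant which places no requirement on tricky-second, and that variants
are tried in preference order with the default first.

```python
from itertools import product

# Hypothetical model of the tricky.bst case; names and data layout are
# illustrative only. Variants are listed in preference order (default first).
VARIANTS = {
    "tricky-first": ["first", "second"],
    "tricky-second": ["first", "second"],
}

# Each constraint is (owner, owner_variant, target, wanted_variant).
# owner_variant None means the constraint always applies.
CONSTRAINTS = [
    ("tricky", None, "tricky-second", "second"),
    ("tricky-first", "first", "tricky-second", "first"),
]


def consistent(choice):
    """True if every applicable constraint agrees with the chosen variants."""
    for owner, owner_variant, target, wanted in CONSTRAINTS:
        applies = owner_variant is None or choice.get(owner) == owner_variant
        if applies and choice[target] != wanted:
            return False
    return True


def resolve():
    names = list(VARIANTS)
    # Tries every combination: the search space is the product of the
    # variant counts, which is why naive resolution scales badly.
    for combo in product(*(VARIANTS[name] for name in names)):
        choice = dict(zip(names, combo))
        if consistent(choice):
            return choice
    return None


result = resolve()
```

In this model the default variant of tricky-first is rejected because it
conflicts with what tricky asked for, so the search settles on the 'second'
variant of both elements.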


Things will get more complex, once you add a bit of depth to your build
graph:

                   tricky
                  /     \
                 /       \
             pizza      buffalo
        (all dressed)   (naked)
             /             \
            /               \
           /                 \
   tricky-first        tricky-second(second)

Where again:

   tricky-first(first) -> tricky-second(first)


Now tricky.bst doesn't want to care about how a naked buffalo can
prepare an all dressed pizza; yet the same condition arises.

While one *could* give tricky explicit knowledge about how a pizza gets
dressed using a 'tricky-first' and suchlike, I think things get
even more hectic once you consider that tricky.bst is not necessarily a
toplevel target, but only sometimes a target, or sometimes used in the
context of another target.

Let's try another:

            target A    target B
                  \      /     \
                   \    /       \
                    \  /         \
                   tricky         \
                   /   \           \
                  /     \           \
                 /       \           \
             pizza      buffalo       \
        (all dressed)   (naked)        \
              /             \        tricky-second(second)
             /               \
            /                 \
   tricky-first          tricky-second

Still:

   tricky-first(first) -> tricky-second(first)


Depending on the context in which tricky.bst was built, we'll get
different variants of tricky-first and tricky-second.

I guess in this case we've just shifted the responsibility up the tree
to target B, still I'm not sure how workable this would be in a larger
project with a few variants in play.

Thinking about what the user has to do in order to say; add a new
element or build a new target, it looks like when one introduces a new
low level dependency with variants; then all of the toplevels which
indirectly depend on it need to be explicit (which kind of leaves the
main advantages of ambivalence dead in the water).

e.g.

    tricky.bst:
    ...
    depends:
      - filename: tricky-first.bst
        variant: second  #!resolves: tricky-second.bst:second
      - filename: tricky-second.bst
        variant: second

However, what we don't have is the element declaring the value of an option on
an element it depends on; depending on how the element itself was configured;
this all leads back down the variant path where an agreement between elements
must be reached.


When you say this about options in project.conf, I think something similar
about options in elements for large projects:

These options should have some metadata which can be used to declare the
defaults, assert valid values of the options, and also a description string
which the CLI can use to communicate the meaning of project options to
buildstream users (not all users building a project wrote the project.conf).
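
A minimal Python sketch of what such option metadata might carry: a default, a
set of valid values, and a description for the CLI. The class `OptionSpec`, its
fields, and the example `arch` option are all hypothetical and not part of any
BuildStream format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical metadata for one project option (not BuildStream API)."""
    name: str
    description: str
    values: tuple
    default: str

    def __post_init__(self):
        # The declared default must itself be a valid value.
        if self.default not in self.values:
            raise ValueError(
                f"default {self.default!r} is not a valid value for {self.name}"
            )

    def resolve(self, requested=None):
        """Return the requested value, or the default if none was given."""
        if requested is None:
            return self.default
        if requested not in self.values:
            raise ValueError(
                f"{self.name}: {requested!r} is not one of {list(self.values)}"
            )
        return requested

    def help_line(self):
        """Text a CLI could print for users who did not write project.conf."""
        choices = "|".join(self.values)
        return f"{self.name} ({choices}, default {self.default}): {self.description}"


arch = OptionSpec(
    name="arch",
    description="Target machine architecture",
    values=("x86_32", "x86_64"),
    default="x86_64",
)
```

Calling `arch.resolve()` falls back to the default, while an unknown value such
as `arch.resolve("arm")` is rejected with an error naming the valid choices.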


It feels like in addition to the top-level choices of project.conf and the CLI,
elements could encapsulate some config complexity and provide an interface to
things that depend on them. I think that's another thing that draws me (and I'm
guessing you) to variants.

Here is a similar example to the one I brought for my 'point (2)' before, now
a bit fuller:

    app.bst:
    ...
    depends:
      - lib1.bst
      - filename: lib2.bst
        variants: flying-ponies
    ...

    lib1.bst:
    ...
    depends:
      - filename: lib2.bst
        variants: dancing-badgers
    ...

    lib2.bst:
    ...
    variants:
      - name: flying-ponies
      - name: dancing-badgers
        description: Summons a clan of gyrating badgers for a short
                     period. The top speed of a badger is 30km/h, fact.
    variables:
    - conf-extra:
      '??':
        - condition: (ifvariant "flying-ponies")
          value: --enable-flying-ponies
        - condition: (ifvariant "dancing-badgers")
          value: --enable-dancing-badgers
        - condition: >-
            (and
              (ifvariant "dancing-badgers")
              (ifvariant "flying-ponies"))
          value: --enable-flying-ponies --enable-dancing-badgers
    ...

If the ponies / badgers combo is no good then we can do:

    lib2.bst:
    ...
    variables:
    - conf-extra:
      '??':
        - condition: (ifvariant "flying-ponies")
          value: --enable-flying-ponies
        - condition: (ifvariant "dancing-badgers")
          value: --enable-dancing-badgers
        - condition: >-
            (and
              (ifvariant "dancing-badgers")
              (ifvariant "flying-ponies"))
          value: '!!': Sorry, it's badgers or ponies, not both.
    ...
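
The explicit scheme in these two examples can be sketched in Python as: gather
every variant requested for an element across the dependency graph, then let the
element's own rules accept or reject the combination. This is a hypothetical
model only; `DEPENDS`, `FORBIDDEN`, `requested_variants` and `check` are made-up
names, and no automatic resolution is attempted.

```python
# Hypothetical sketch of explicit variant requests with no automatic
# resolution. The data mirrors the app.bst / lib1.bst / lib2.bst example.

DEPENDS = {
    "app.bst": [("lib1.bst", None), ("lib2.bst", "flying-ponies")],
    "lib1.bst": [("lib2.bst", "dancing-badgers")],
    "lib2.bst": [],
}

# Combinations that lib2.bst declares invalid, like the '!!' condition.
FORBIDDEN = {
    "lib2.bst": [frozenset({"flying-ponies", "dancing-badgers"})],
}


def requested_variants(target):
    """Walk the dependency graph and gather variant requests per element."""
    requests = {}
    seen = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        requests.setdefault(name, set())
        for dep, variant in DEPENDS[name]:
            requests.setdefault(dep, set())
            if variant is not None:
                requests[dep].add(variant)
            stack.append(dep)
    return requests


def check(target):
    """Return the gathered requests, or raise if an element forbids them."""
    requests = requested_variants(target)
    for name, variants in requests.items():
        for combo in FORBIDDEN.get(name, []):
            if combo <= variants:
                raise ValueError(
                    f"{name}: cannot combine {', '.join(sorted(combo))}"
                )
    return requests
```

Building lib1.bst on its own passes, since only dancing-badgers is requested;
building app.bst fails with an error from lib2.bst, leaving the user to resolve
the conflict by hand.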

Some criticism of my own example:

- You can see I've mangled the syntax of variants in with conditions, I'm not
  very attached to my choices there, it's more to show that it's not a big step
  away from your proposal. It also shows it doesn't need automatic constraint
  resolution, it's all explicit.

- Even with only two variants (they could have been options), I can see it is
  harder to read. I at least need to factor in architecture, debug, etc. too.

- It's a contrived example, I need to relate it back to actual problems I have
  in the BuildStream adoption POC I'm working on.

I'm torn, of course I wish variants was going to work perfectly. On the other
hand I have to concede that the implicit nature of how things resolved was
already a little bit complex for the user to digest, and increasing that
complexity by making them orthogonal (both to implement and to use) seems to
be unwise.


Would you consider dropping the automatic resolution, or is that maybe one of
the main attractions for you?


The ambivalence is rather the main attraction, but for that you need
resolution.

For reference (you might or might not want to read through it, but it
has some examples of what I intended to use variants for), here is the
email of my original proposal for variants... from almost 2 years ago
now when we were talking about baserock:

  https://listmaster.pepperfish.net/pipermail/baserock-dev-baserock.org/2015-November/013337.html

So "the point" of variants was to reduce duplication and make it easy
to define full systems with least redundancies possible, and without
opening the door on the combinatorial explosion too much (or naturally
limiting the possible combinations).

Consider a project that describes a small OS, now you want to build the
whole thing but swap out one of the low level dependencies for another;
ambivalence lets us proceed with the following workflow:

  o Take tls.bst, which never previously had variants, and now
    add 2 variants to it

  o The first (default) variant will be called 'openssl', and should
    result in no change, in cache key or anything; so by default
    tls.bst has not changed.

  o The second variant of tls.bst is 'nss'

  o Let's say you have a stack element that is bootable-os.bst; let's
    add a new stack which depends on it, called bootable-alt-os.bst

  o Let's now make bootable-alt-os.bst explicitly ask for the 'nss'
    variant. The explicit dependency from the toplevel to the bottom
    layers is made only for the sake of tailoring that system build
    output; but all of your hundreds of elements in between remain
    unchanged and ambivalent and reusable for other systems you might
    want to deploy based on the same code bases.

These are my motivations for variants.


Thanks for the write-up! I like the (fixed) diagrams! This whole bit is great
reference, it's nice to see the idea has been cooking for some time. I'll put
it on my holiday reading list :)

To be honest; while I think that for now the prudent thing to do is to
introduce options *at the project level only*, I wonder, if any kind of
element specific configuration is added in the future, whether it would
make sense to just bring back variants as is in that case.


The rationale for this ran like:

  o We need to specify multiple orthogonal settings

  o Variants is only one setting, per element

  o Variants would be way too complex if an element was allowed
    to list more than one named *group* of variants (i.e. allow
    them to be parallel/orthogonal)

  o So we'll need something additional in order to cater to those
    typical use cases; use cases which are not really about *what*
    we're building; but rather *how* we're building it.

  o If we're going to have something else, it makes little sense to
    keep variants around.


It could be that I faltered on that last 5th point, it could very well
be that it makes perfect sense to have variants as a "one-variant-per-
element" thing, which helps you dictate *what* you build...

...where *what* you build includes different configurations of systemd
or a GTK+(wayland) vs GTK+(x11) vs GTK+(both).

And then the project centric options would be there to augment things
for *how* you build what you're building (and this is where we could do
debugging and profiling and lots of stuff; using sweeping project wide
statements or options that are lists; ala debug_elements = "this.bst").


I feel like this would not be so horrible API wise, but we still need
to optimize and perfect the variant resolution algo.


But in any case I still think for right now it's prudent to stick with
project centric options and then explore the additional element variants
approach whenever (if ever) we run out of road with what we have.


Sounds good to me, it's really valuable to see your thoughts here, so thanks
for laying them out. I figure we'll have the next 6 months after 1.0 to
identify any concrete issues with project-centric config and work through them.

Cheers!
Angelos
