Re: Project options and other format enhancements (and dropping "variants")



On Mon, 2017-09-18 at 12:35 +0100, Sam Thursfield wrote:
On 16/09/17 07:22, Tristan Van Berkom wrote:
On Fri, 2017-09-15 at 15:53 +0100, Sam Thursfield wrote:
Hi Tristan,

In general I agree with your premises, and I think the proposal is
workable. I don't have anything better to propose.
     Option Declaration
     ------------------
     A project declares which options are valid for the
     project in the project.conf.

     These options should have some metadata which can be used
     to declare the defaults, assert valid values of the options,
     and also a description string which the CLI can use to communicate
     the meaning of project options to buildstream users (not all users
     building a project wrote the project.conf).

Are you expecting to support only enum values, or freeform strings and
integers too?

I certainly don't want to support only enums, although I was undecided
on whether the data would simply be in string form and have conditional
statements deal with typing; or to have typing encoded into the option
metadata.

I would much prefer typed values. Otherwise you end up in the land of 
"do I turn off this feature with 'no', 'false' or 'off'" ?

I would also like to avoid or at least strongly discourage having 
freeform strings. From an analysis point of view, an enum option with 15 
possible values represents 15 possible variants of your build. A 
freeform string option represents effectively infinite variants of your 
build. (In small projects you may be able to drill down and see how many 
string values are actually used, but in huge projects that becomes 
impractical).

Yeah I can get behind this.
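
A rough sketch of what typed option declarations could look like, only to
make the discussion concrete. The class names (BoolOption, EnumOption), the
OPTIONS table and the resolve() helper are all made up for illustration and
are not an existing BuildStream API; the option names are hypothetical too.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoolOption:
    """A typed on/off option; only real booleans are accepted."""
    description: str
    default: bool = False

    def validate(self, value):
        if not isinstance(value, bool):
            raise ValueError(f"expected True or False, got {value!r}")
        return value


@dataclass(frozen=True)
class EnumOption:
    """An option restricted to a declared, finite set of values."""
    description: str
    values: Tuple[str, ...]
    default: str

    def validate(self, value):
        if value not in self.values:
            raise ValueError(f"{value!r} is not one of {self.values}")
        return value


# Hypothetical declarations, standing in for what project.conf might hold.
OPTIONS = {
    "debug": BoolOption("Build with debugging symbols"),
    "arch": EnumOption("Target architecture", ("x86_64", "armv7"), "x86_64"),
}


def resolve(user_values):
    """Merge user-supplied values over the defaults, validating each one."""
    unknown = set(user_values) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {
        name: opt.validate(user_values.get(name, opt.default))
        for name, opt in OPTIONS.items()
    }
```

With typing in the metadata, the "no" / "false" / "off" ambiguity turns into
an early error: resolve({"debug": "no"}) is rejected because "no" is a string
and not a boolean.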

One idea I am liking more and more is the (contains list_opt "string")
kind of conditional, where one could check for the presence of a word
in a whitespace separated wordlist, which one could use to whitelist
elements for a feature or to test for a single feature in a list (like
the compiler tuning example I made in my reply to Sander).

You probably already made the connection between this idea and Gentoo's 
USE flags: https://gentoo.org/support/use-flags/

It's a good system provided the possibilities are limited. So the option 
definition should have to declare all the possible flags that can be 
used in the list, rather than it just being a freeform string.
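
To illustrate the point about declaring every possible flag up front, here is
a minimal sketch of a word-list option combined with a "contains" test. The
FlagsOption class, the contains() helper and the feature names are
assumptions for the sake of the example, not part of any existing
implementation.

```python
class FlagsOption:
    """A list-valued option whose words must come from a declared set."""

    def __init__(self, description, allowed, default=()):
        self.description = description
        self.allowed = frozenset(allowed)
        self.default = self.parse(" ".join(default))

    def parse(self, text):
        """Split a whitespace separated word list and reject unknown words."""
        words = text.split()
        unknown = [w for w in words if w not in self.allowed]
        if unknown:
            raise ValueError(f"undeclared flags: {unknown}")
        return frozenset(words)


def contains(flags, word):
    """The (contains list_opt "string") test: is the word in the list?"""
    return word in flags


# Hypothetical usage: the project declares three flags, the user enables two.
features = FlagsOption("CPU features", ["neon", "vfp3", "vfp4"])
enabled = features.parse("neon vfp4")
```

Because the set of valid words is finite, the number of build variants stays
enumerable, and a misspelled flag is rejected at parse time rather than
silently ignored.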

I did, but honestly I was more thinking about this directory and its
include subdirectory:

http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/conf/machine

For instance looking at the cortexa17 configuration:

http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/conf/machine/include/tune-cortexa17.inc

Basically only enumerates the features available on a given machine
architecture; and by exporting these separately, allows the definitions
of gcc configure flags (or 'tunes') sitting in the arm/ subdirectory to
apply the correct flags for all the features available on the board.

<slight deviation from topic>

As you've been looking into bootstrapping compilers with BuildStream
maybe you can shed some light on what we could do for this, because I
feel my approach doesn't solve it perfectly.

From what I understand, currently we can only special case symbolic
machine names and make a huge list of full tuning flags depending on
that symbolic name. This is an area I think yocto excels at and I would
like to have a solution that allows enough flexibility for this (of
course without being shell scripts which execute and source each other).

At the basic level, maybe this could be done by allowing a project to:

   A.) Define symbolic names, maybe they are "macros" or "presets"
   B.) The symbolic preset defines values for options
   C.) Write conditionals based on the options

It's just the first approach that comes to mind, but would allow us to
define feature lists associated to symbolic machine names, and then
write conditional YAML fragments based on the resulting feature sets
instead of having to special case every machine name individually.
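
The A/B/C steps above could be sketched roughly as follows. Everything here
is hypothetical: the board names, the PRESETS and FEATURE_FLAGS tables, and
the flag strings, which are deliberately placeholders rather than real gcc
configure options.

```python
# A.) and B.) Symbolic presets: a machine name sets values for options.
PRESETS = {
    "board-a": {"arch": "arm", "features": {"neon", "vfp4"}},
    "board-b": {"arch": "arm", "features": {"vfp3"}},
}

# C.) Conditionals are written once per feature, not once per machine.
# These strings are placeholders, not real gcc configure options.
FEATURE_FLAGS = {
    "neon": "--placeholder-enable-neon",
    "vfp3": "--placeholder-enable-vfp3",
    "vfp4": "--placeholder-enable-vfp4",
}


def configure_flags(machine):
    """Expand a symbolic machine name into a sorted list of configure flags."""
    if machine not in PRESETS:
        raise ValueError(f"unknown machine preset: {machine!r}")
    flags = []
    for feature in sorted(PRESETS[machine]["features"]):
        if feature not in FEATURE_FLAGS:
            # Fail before the build starts, not in the middle of configure.
            raise ValueError(f"no rule for feature {feature!r}")
        flags.append(FEATURE_FLAGS[feature])
    return flags
```

Adding a new board then means adding one PRESETS entry that lists its
features; the per-feature rules are shared by every machine.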

Any other ideas ?

Implementing a solution to this should not block our landing of a
project options feature; however, our approach should probably be
informed by if/how we intend to address this kind of complex case.

My goal with architecture names in Baserock has always been to limit the 
possibilities to a fixed set that we can exhaustively test. So in a way 
it's a feature rather than a bug that the way to enable extra 
project-wide compile flags is to modify gnu-toolchain/gcc.bst or 
project.conf directly.

I'm fine with people setting random GCC tuning flags if they want, but 
it should be clear that if they do this then they have diverged from 
the upstream reference builds and any breakages are for them to fix, not us.

I don't believe this is workable for a low level compiler bootstrapping
project if that project is to be competitive and usable for a wide
variety of actors; this approach will quickly become unwieldy to
maintain when supporting only 10 specific gcc tunings for various
cpu/board setups.

You want to be able to:

  o Reuse some code; if the same gcc configure option enables vfp4 on 
    every single arm targeting gcc; then it would seem to me redundant
    to enumerate every gcc configure flag manually for every one of the
    (potentially dozens) of valid arm board configurations.

    Even if this amounted to similar redundancies (e.g. we have lists
    of symbolic feature names instead of lists of options specified),
    we still get to error out on an unrecognized feature before
    passing a `--no-such-switch` option to gcc configure (which might
    just give us a nice "WARNING: unrecognized --no-such-switch"
    message to ignore in our successful compilation).

  o Have some structure and pattern to follow

    When I approach such a project and I really only want to add
    a new arm board configuration, it would be nice to just look at
    the other 5 or 10 existing ones and tweak it.

    Besides it being easier to contribute to a project with such a
    structure (compared to forcing one to write an entire tuning
    just for "cortex a7"), it also ensures some consistency; i.e.
    if I have 10 arm configurations and those share the majority
    of configuration; I wouldn't ever want something that is common,
    done differently from one arm tuning to another.

    Doing things differently can range simply from specifying
    configure options in a different order (pretty minor offense),
    to using entirely different build approaches (e.g. one uses
    sysroots and configure options exclusively; another uses env vars
    to specify which cross tools to use at what stage, etc).


So I don't think we need any complex solution here. However there are 
certain variations where we probably do need to support both upstream 
and for those encoding info in the architecture name sucks. The only 
current example I can think of is ARM hard-float versus soft-float.

Even hard/soft float is not at all that simple, one machine has one
version of vfp (vfp3, vfp4 now...), others have the neon simd; in some
cases you want to make sure you did not enable vfp3 even though the
board in question supports it, because it supports it as a sort of
backwards compatibility option while you really should be using neon.

But I know you know these things, just trying to stress that when it
comes to tailoring a compiler build to a specific machine/appliance;
you want to consider all of these details. Take this in contrast with
something like Flatpak arm builds; those are intentionally targeting a
very narrow subset of arm tunings which will "work on most modern 32bit
arm boards".

Having a "CPU architecture" flag set which can contain specific flags 
based on the overall symbolic architecture name seems like the nicest 
approach there. So if your arch name is "armv8l64" you get 
arch_flags=hard-float and you can override that to arch_flags=soft-float 
if you wish. If your arch name is "x86_64" then neither option is valid 
(or perhaps more accurately 'hard-float' is the only valid option).
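
A minimal sketch of that per-architecture flag set, assuming a simple table
keyed by the symbolic architecture name. The ARCH_FLAGS table and the
resolve_arch_flags() helper are illustrative names only; the architecture
names and flag values are the ones used in the paragraph above.

```python
# arch name: (default flag, every flag that is valid for this arch)
ARCH_FLAGS = {
    "armv8l64": ("hard-float", {"hard-float", "soft-float"}),
    "x86_64": ("hard-float", {"hard-float"}),
}


def resolve_arch_flags(arch, override=None):
    """Return the float ABI for an arch, honouring a valid user override."""
    if arch not in ARCH_FLAGS:
        raise ValueError(f"unknown architecture: {arch!r}")
    default, valid = ARCH_FLAGS[arch]
    if override is None:
        return default
    if override not in valid:
        raise ValueError(
            f"{override!r} is not valid for {arch}; choose from {sorted(valid)}"
        )
    return override
```

The validity check runs before any build is started, so asking for
soft-float on x86_64 is reported as a configuration error up front.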

Perhaps, I'm not sure and this is why I'm raising the question to you.

Let's phrase it a different way:

  If I wanted to salvage what I consider to be one of the most
  interesting bits of yocto (the referred machine database above
  and all of those rich compiler tunings);

  Would I be able to translate that database into a single
  arch features list; and also have gcc bootstrapping build
  instructions which consume that list seamlessly ?

  What will I be missing ? Will I get an error (before running
  the actual build), when I specify an arm specific board feature
  but I'm trying to build a mips targeting compiler ?

  How do I, as a user, know exactly which flags to specify for
  this exact CPU architecture ? Can't I browse a list of implemented
  architectures and hope that mine is there ?


Again, this kind of support does not need to be there today in order
to get by our present variants hurdle. But, if another project is going to
be serious about bootstrapping fine tuned compilers for a multitude of
machine architectures, I would hope that:

  A.) BuildStream makes this convenient for them

  B.) The project itself is an ambitious one, i.e. at least as workable
      as yocto is in terms of fine tuning compilers.


</slight deviation from topic>



As I've said elsewhere, I've migrated from:

   '??':
     condition:
     ...

to:

   (?):
     condition:
     ...

I feel like it will stand out more and I don't like the quotes. That
said I'm open to changing the conditionals to something more
conforming, if we really expect that the result is going to be more
legible.

I don't like the quotes either, (?) is probably better.

Ok so frankly I'm not attached to parenthesized nested lists, aesthetically
speaking; I am however quite attracted by the simplicity of it, and we
probably can achieve similar simplicity with something else.

I'm not convinced that a:

    condition: |

      %{foo} == "bar"

kind of notation is simple though; it tries to be very human friendly
and programming-languagey, and then leaves us a bit blind if we want to
later extend the operators: what would we use for the kind of word in
list 'contains' conditional ? (maybe we do like python sets and use C
bitwise field test operators on them ?).

This operator comparison expression approach is also more rigid and
demanding, it would have to be done perfectly the first time.

We don't have a nice relaxed namespace in which we can deprecate the
'ifeq' symbol for a new 'equals' symbol on the day that we figure out
that comparisons should have been case insensitive, we might instead be
in a corner left with yucky workaround alternatives, advising users
that they should use the '===' operator in new projects, instead of the
existing but botched and unrecommended '==' operator.
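
To make the namespace argument concrete, here is a small sketch of named
operators held in a lookup table, where an old symbol can be kept as a
deprecated alias for a new one. The evaluate() function and the table layout
are assumptions for illustration; only the 'ifeq', 'equals' and 'contains'
names come from this thread. Conditions are given as already-parsed lists of
the form [operator, option-name, literal].

```python
import warnings


def _equals(value, literal):
    return value == literal


def _contains(value, literal):
    # The option value is treated as a whitespace separated word list.
    return literal in value.split()


OPERATORS = {"equals": _equals, "contains": _contains}

# Old symbol -> replacement; old project files keep working with a warning.
DEPRECATED = {"ifeq": "equals"}


def evaluate(expr, options):
    """Evaluate [operator, option-name, literal] against option values."""
    name, option, literal = expr
    if name in DEPRECATED:
        replacement = DEPRECATED[name]
        warnings.warn(
            f"'{name}' is deprecated, use '{replacement}'", DeprecationWarning
        )
        name = replacement
    if name not in OPERATORS:
        raise ValueError(f"unknown operator: {name!r}")
    return OPERATORS[name](options[option], literal)
```

A new operator is one more table entry, and a rename is one more alias;
neither requires changing the grammar, which is the contrast being drawn with
having to invent '===' next to '=='.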

I think deprecating "ifeq" for "equals" is pretty horrible too. We 
should just get it right first time, whichever approach we choose :-)

Of course any need to rethink and deprecate something is undesirable,
but I think you know that is beside the point.

Right now I need to make a decision yesterday about how to proceed last
week, so that it's done by Friday and we can all breathe easy again.

In this case of course, the most desirable path to follow is the one
where least concrete decisions are made - the safest would be to assume
that everything we're going to introduce is going to change at least
once, and it needs to happen in a backward compatible way.


I agree the simplicity of s-expressions and nested lists is a strength, 
but I still think it comes at a usability cost. Also, just because it's 
simple now doesn't mean that it won't have grown to be a monster in 2 
years time because everyone who uses it gets excited and sends in a 
patch to add their favourite language construct.

What I'm suggesting is that we avoid creating a new DSL for expressions 
in the first place, we try and reuse an existing one instead. Ansible's 
use of Jinja proves that this is possible. I'm not entirely sure how it 
works though -- if I get time I will try and make a proof of concept 
expression parser and we can see if it can beat your 218 lines of code :-)

Myeah... I'm going to take another look at this but the comparison you're
making is not the one I was looking for.


  o import sexpdata

    The above is ~600 rather loose literal lines of code, certainly
    less than 300 logical lines of code.

    This means the dependency is not complex and burdensome, I can
    easily throw it away and write a replacement, if ever I don't
    like the license or for whatever reason.

    Also by virtue of the library itself not being too complex, we
    probably improve our chances of repeatability.

  o import jinja2

    I still have no idea what impact this has, how huge it is, what
    part of it we're using, how difficult it might be to fork out
    only the part we need, etc...
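
As an indication of how small a throwaway replacement for the s-expression
reader could be, here is a standard-library-only sketch. It is not
sexpdata's API and makes simplifying assumptions: it handles only
parentheses, bare symbols and double-quoted strings without escapes, and it
returns both symbols and strings as plain Python strings.

```python
import re

# One token: an open paren, a close paren, a quoted string, or a bare symbol.
_TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


def tokenize(text):
    """Split the input into tokens, rejecting anything unrecognized."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"bad input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _read(tokens, i):
    """Read one expression starting at index i; return (value, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[i]
    if token == "(":
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            item, i = _read(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise ValueError("missing ')'")
        return items, i + 1
    if token == ")":
        raise ValueError("unexpected ')'")
    return token.strip('"'), i + 1


def parse(text):
    """Parse exactly one s-expression into nested lists of strings."""
    tokens = tokenize(text)
    expr, end = _read(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after expression")
    return expr
```

For example, parse('(contains list_opt "string")') yields the list
["contains", "list_opt", "string"], which is the shape a conditional
evaluator would then walk.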


A lot to think about...

Cheers,
    -Tristan



