Re: [BuildStream] API proposal - enable iterating through all elements



Hi Douglas,

Thanks for getting back.

Some comments inline.

On Mon, Jan 27, 2020 at 5:28 PM Douglas Winship via buildstream-list
<buildstream-list gnome org> wrote:

Can you please clarify what you mean here? Do you wish to get the raw value of
the alias?

Yes (assuming I understand what you mean by raw value). If the url is defined in the bst file as 
"foo:bar/baz.git" then I'm looking for something that outputs "foo". It shouldn't matter whether 
project.conf defines "foo:" as an alias for "foolab.net/", "github.com/", or "gnome.org/". In any of 
those cases, the output should still be just "foo".

Sorry, terminology is confusing. I could've been clearer. I think you have
rightly decided to go with an example. So, let me try to explain what I meant.

Continuing with the sample "foo:bar/baz.git", I was thinking that the first
command to output source fields would give the entire url as-is.

And the second command for listing aliases could either print all aliases as
key-value pairs, and/or get the value for a given key. This would tell you
whether or not "foo" is defined as an alias.

This means that you will have to find/replace the alias with its value
afterwards yourself. It is also worth noting that an alias may not be defined.
This is not an error, though; BuildStream will use the URL as-is in this case. In
the above example, if "foo" is not defined as an alias, BuildStream will treat
it as literally "foo:bar/baz.git".
(This may or may not work depending on what kind of source it is. For example,
"foo:bar/baz.git" is a valid url for a git repository assuming the user has
defined an alias "foo" in their ssh config.)

So, while neither of the two commands I suggested would output just "foo", I
think this approach should still work for what you are trying to do. Let me know
what you think.
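
To make the find/replace step concrete, here is a minimal sketch in Python. It
is not BuildStream's own implementation; the function names and the "aliases"
dictionary are made up for illustration, and the dictionary stands in for
whatever the alias-listing command would print.

```python
def split_alias(url):
    """Split 'foo:bar/baz.git' into ('foo', 'bar/baz.git').

    Returns (None, url) when the URL has no ':' separator at all.
    """
    alias, sep, rest = url.partition(":")
    if not sep:
        return None, url
    return alias, rest


def resolve(url, aliases):
    """Replace a leading alias with its value, if that alias is defined.

    `aliases` is a hypothetical dict of alias name -> value. An undefined
    alias is not an error: the URL is returned unchanged.
    """
    alias, rest = split_alias(url)
    if alias is not None and alias in aliases:
        return aliases[alias] + rest
    return url


if __name__ == "__main__":
    aliases = {"foo": "https://foolab.net/"}  # assumed project aliases
    print(split_alias("foo:bar/baz.git")[0])
    print(resolve("foo:bar/baz.git", aliases))
    print(resolve("foo:bar/baz.git", {}))
```

With the assumed alias defined, the first call gives "foo" and the second gives
"https://foolab.net/bar/baz.git"; with no aliases defined, the URL comes back
as-is.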

So, even if the `translate_url` method becomes a public static function,
BuildStream still won't know which URLs to translate. Of course, you as the
author of this tool may know which plugins you use and what they call their
URL fields.

Interesting. If I've understood, you're saying that BuildStream defines a format for specifying the 
aliases, and it defines functions that can do the appropriate substitution if you give them a URL to 
translate, but BuildStream doesn't actually define any standard method for listing all the URLs that need 
translating?

Precisely!

For instance, all the plugins that I've seen so far (at least, the ones which use source URLs), have used 
the same format. The element includes a list called "sources", each entry in the list is a YAML dictionary, 
and each dictionary generally has exactly one source url, stored in a field called "url:". And what you're 
pointing out is that this structure isn't actually guaranteed? Other plugins could have their source data 
stored in different formats, or have more than one URL listed per source, or even have source URLs 
hard-coded into the plugin itself?

This is definitely accurate for source plugins, less so for element plugins. Let
me explain.

Source plugins have full control over how to structure their data. This includes
everything you see under the "sources" field. There are some source fields that
are special, and BuildStream treats them differently. But, "url" isn't one of
them. The only such field we have currently is "directory". You can refer to
https://docs.buildstream.build/master/buildstream.source.html#core-source-builtins
for more details on defaults.

"url" is just a convention, albeit a popular one as you have noticed so far.
But, plugins can define fields containing urls however they like. For example,
the docker source plugin has a field called "registry-url" that accepts urls
which can be defined as aliases
(https://buildstream.gitlab.io/bst-plugins-container/sources/docker.html).

Coming back to .bst file format, the top-level fields are all well-defined and
controlled by BuildStream itself. So, the "sources" part you noticed is actually
a rule, not just a convention. For details on this bit, you can refer to
https://docs.buildstream.build/master/format_declaring.html.
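
As a rough sketch of what this means for a tool author: the "sources" list can
be relied upon, but which field holds a URL has to be supplied per source kind.
The Python below assumes the element has already been parsed into plain dicts
and lists; the URL_FIELDS table and the function name are hypothetical and
would need to match the plugins actually in use.

```python
# Hypothetical table maintained by the tool author: source kind -> URL fields.
# BuildStream itself does not provide such a table.
URL_FIELDS = {
    "git": ["url"],
    "docker": ["registry-url"],
}


def collect_urls(element, url_fields=URL_FIELDS):
    """Return (kind, field, value) for every known URL field in the sources.

    `element` is assumed to be an already-parsed element definition. Sources
    of a kind missing from `url_fields` are skipped, since there is no
    general way to tell which of their fields contain URLs.
    """
    found = []
    for source in element.get("sources", []):
        kind = source.get("kind")
        for field in url_fields.get(kind, []):
            if field in source:
                found.append((kind, field, source[field]))
    return found


if __name__ == "__main__":
    element = {
        "kind": "manual",
        "sources": [
            {"kind": "git", "url": "foo:bar/baz.git"},
            {"kind": "docker", "registry-url": "foo:registry"},
            {"kind": "local", "path": "files/"},
        ],
    }
    for entry in collect_urls(element):
        print(entry)
```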

That makes sense, and I can see that it would make it impossible to write a generalized function into 
BuildStream that just lists all source URLs. That's... frustrating.

But it's also interesting, because I think it helps me clarify the kind of API access I'm looking for.

I can sympathize with that. But, this is also part of what makes BuildStream a
very flexible tool.

All I really want to be able to do with an element, is to explore the data structure defined in the .bst 
file for that element. I can almost do that just by importing a YAML library and parsing the file without 
using any buildstream functionality at all, but that wouldn't correctly process things like the "include" 
directive. I want to be able to look at the nodes and the data after all the include-files have been 
included, after buildstream has substituted the appropriate variables, and possibly after conditional 
statements have been resolved. But also (if possible) before anything has actually been 'built'.

Is that an achievable / realistic goal?

What I find interesting, is that if I write a plugin then I can already access exactly that kind of 
information. So that means the individual functions that I want to access for an element, are already part 
of the public facing API. It's just that doing it with a plugin adds some unpleasant overhead: (1) I have 
to add a whole new element to my project, as an instance of the new plugin. (2) I have to add the element 
I'm actually interested in, as a dependency of the new element. And (3) I have to "bst build" my new 
element. And at least in my particular use case, I then couldn't avoid having to build the target element I 
was originally interested in, along with its entire dependency tree.

Points 1 & 2 are annoying but not too hard to deal with. Point 3 is the real kicker, for me. It feels like 
what I'm trying to achieve should just be a glorified yaml.load(), it should take a few seconds. But 
instead, the only way to actually execute the code I've written is to run "bst build", and if I don't 
already have things cached that can mean downloading hundreds of different sources from the internet, and 
waiting hours for an enormous project to compile from source code when I didn't actually need any of it to 
be built.

No, this is definitely not the best way, and we shouldn't subject anyone
to this much effort just to get parsed element definitions :) Writing a plugin
to process element definitions doesn't feel right to me.

So my rephrased question for the mailing list is this:

Can anyone see a good way to let users explore the nodes in an element definition, and run the same kind of 
arbitrary processing that you can _already_ include in a plugin, just without enforcing the overhead that's 
usually associated with 'bst build'?

When you say "arbitrary processing", are you also expecting to access any
BuildStream API methods, besides getting the data?

In other words, would it be sufficient to get just the parsed data (with proper
substitution, composition etc) about the element, its dependencies and the
project itself? Or do you also want access to other parts of the API?

If it's the former and you just need that data, I think that can be solved
entirely by extending the CLI. And this would require merely a "bst show" to get
the required output. For example, something like:

    $ bst show --deps none --format 'url - %{sources[url]}' target.bst
    url - foo:bar/baz.git

    $ bst show --deps none --format 'alias - %{aliases[foo]}' target.bst
    alias - foobar.com      # Assuming it's defined, error otherwise

This doesn't require fetching/building anything, and can be extended to work
with dependencies as well by controlling the "--deps" option accordingly.

---

If we do this, there's nothing special about aliases, so we should also consider
providing access to other project configuration options. Similarly, the
mechanism to get the "url" field should likely be a generic way to query element
definitions.
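
To illustrate what "generic" could mean here, one option is a path-style lookup
into the parsed data. The sketch below is only an illustration in Python over
plain dicts and lists; the dotted-path syntax and the function name are
assumptions, not an existing BuildStream feature.

```python
def query(data, path):
    """Follow a dotted path such as 'sources.0.url' through dicts and lists.

    List elements are addressed by integer index. A missing key raises
    KeyError and a bad index raises IndexError, like the underlying types.
    """
    node = data
    for part in path.split("."):
        if isinstance(node, list):
            node = node[int(part)]
        else:
            node = node[part]
    return node


if __name__ == "__main__":
    # Hypothetical parsed element, used only to exercise the lookup.
    element = {
        "kind": "autotools",
        "sources": [{"kind": "git", "url": "foo:bar/baz.git"}],
    }
    print(query(element, "sources.0.url"))
    print(query(element, "kind"))
```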

Thanks!
Chandan

