Re: [BuildStream] API proposal - enable iterating through all elements



Hi Chandan, thanks for the comments, and sorry for the late reply.

On 13/01/2020 18:41, Chandan Singh wrote:
I appreciate this use case and what you are trying to do. I was trying to do a
similar thing in the past where I wanted to ensure that all elements were using
_some_ alias for their URLs. That has similar requirements so I'm glad to see
other related use cases as well.

I also think that you are right in saying that the current public API can't
solve this fully.

<snip>
Thanks for the positive feedback.
While we support some format options for `bst show` (see help text), we don't
have anything that can output source fields just yet. But if we had the ability
to print those and all available aliases, do you think that would be useful for
your use case?

That would definitely be useful! (So long as the output is formatted to be machine-readable and unambiguous.)

Providing access to all alias definitions seems less controversial and more
feasible to me. We can even expose that as a CLI format option as I proposed
above.

Agreed. Getting full access to the "original source" aliases, and to the "download mirror" aliases would be very useful, and presumably much easier to achieve than a major API overhaul.

- Desirable: a function or an attribute for sources, that gives the alias.
Can you please clarify what you mean here? Do you wish to get the raw value of
the alias?

Yes (assuming I understand what you mean by raw value). If the URL is defined in the bst file as "foo:bar/baz.git" then I'm looking for something that outputs "foo". It shouldn't matter whether project.conf defines "foo:" as an alias for "foolab.net/", "github.com/", or "gnome.org/". In any of those cases, the output should still be just "foo".

But I do think it should check that the alias is defined as _something_ in project.conf (i.e. it should check that "foo" is actually an alias that exists in the project).

There could be functions like "get_alias" and "get_mirror_urls" that work the same way as translate_url() works, i.e. BuildStream defines them as functions that accept a raw URL as input, but individual plugins are left to define where that URL comes from when they call the function.

At the moment, I'm just taking the raw URL, and extracting whatever's on the left of the first colon, and treating that as the alias. But then I have to validate that I've got an actual alias, and check it against the list of aliases in project.conf. That means I'm making some assumptions about the project layout, that I'd rather not have to make.
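For illustration, the workaround described above amounts to something like the following minimal sketch. The function name `extract_alias` and the `aliases` dictionary are hypothetical: the dictionary stands in for the alias table that I currently read out of project.conf myself, and none of this is BuildStream API.

```python
def extract_alias(raw_url, known_aliases):
    """Return the alias prefix of raw_url, or None if it is not a known alias.

    known_aliases is assumed to be a dict mapping alias names to base URLs,
    as read from the project's alias definitions (hypothetical input).
    """
    # Everything to the left of the first colon is the candidate alias.
    prefix, sep, _rest = raw_url.partition(":")
    if not sep:
        # No colon at all, so there is nothing that could be an alias.
        return None
    # Only accept the prefix if the project actually defines it.
    return prefix if prefix in known_aliases else None


# Hypothetical alias table for demonstration purposes only.
aliases = {"foo": "https://foolab.net/"}

print(extract_alias("foo:bar/baz.git", aliases))            # known alias
print(extract_alias("https://example.com/x.git", aliases))  # scheme, not an alias
```

The validation step is the part that forces assumptions about the project layout, since the caller has to build the alias table itself.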

- Desirable: access to the value stored in project.element_path (so that the script doesn't have to assume a directory called 'elements').
Assuming that we open up some sort of API, definitely +1 in that case. If not,
I again wonder if the format options of `bst show` can be enhanced to satisfy
this requirement as well?

I think a "bst show" option would certainly work for this.

---

Elements, by definition, only make sense in the context of project.
So, we are implicitly asking for a public way to instantiate Project
objects as well. Which means that we have to make X, Y & Z public, and
so on. My point being that this will require some careful thought.
Agreed, and I've got no idea what the best way forward is. That's why I've been encouraged to try and start the conversation on the ML. I'm very curious to see where things might go in the future.
Wishlist: things we think we'd need for this use case:
- Essential: A function for source objects, that produces mirror urls (like 'translate_url' does for the main source URL).
  - (OR access to the mirror-aliases, so we can do our own substitutions)
This bit may not be as straightforward as it seems, since it's not BuildStream
itself that calls the `translate_url` method. The plugins do. And BuildStream
has no way of knowing where that URL comes from. For all BuildStream knows,
the plugin could be passing it a hardcoded url each time.

So, even if `translate_url` method becomes a public static function, BuildStream
still won't know which URLs to translate. Of course, you as the author of this tool,
may know which plugins you use and what do they call their URL fields.

Interesting. If I've understood, you're saying that BuildStream defines a format for specifying the aliases, and it defines functions that can do the appropriate substitution if you give them a URL to translate, but BuildStream doesn't actually define any standard method for listing all the URLs that need translating?

For instance, all the plugins that I've seen so far (at least, the ones which use source URLs), have used the same format. The element includes a list called "sources", each entry in the list is a YAML dictionary, and each dictionary generally has exactly one source url, stored in a field called "url:". And what you're pointing out is that this structure isn't actually guaranteed? Other plugins could have their source data stored in different formats, or have more than one URL listed per source, or even have source URLs hard-coded into the plugin itself?
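To make that assumed layout concrete, here is a minimal sketch of what my script effectively relies on. It works on an already-parsed element represented as plain Python dictionaries, and the helper name `collect_source_urls` is hypothetical. It only finds URLs stored under a "url" key in each entry of "sources", which is exactly the convention that isn't guaranteed.

```python
def collect_source_urls(element):
    """Collect the 'url' field from every entry in the element's 'sources' list.

    Assumes the conventional layout: 'sources' is a list of dictionaries and
    each URL-based source keeps its URL under the key 'url'. Sources that use
    a different field name, or hard-code their URL, are silently missed.
    """
    urls = []
    for source in element.get("sources", []):
        url = source.get("url")
        if url is not None:
            urls.append(url)
    return urls


# Hypothetical element data, standing in for a parsed .bst file.
element = {
    "kind": "autotools",
    "sources": [
        {"kind": "git", "url": "foo:bar/baz.git"},
        {"kind": "local", "path": "files/fix.patch"},  # no URL field
    ],
}

print(collect_source_urls(element))
```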

That makes sense, and I can see that it would make it impossible to write a generalized function into BuildStream that just lists all source URLs. That's... frustrating.

But it's also interesting, because I think it helps me clarify the kind of API access I'm looking for.

All I really want to be able to do with an element, is to explore the data structure defined in the .bst file for that element. I can almost do that just by importing a YAML library and parsing the file without using any BuildStream functionality at all, but that wouldn't correctly process things like the "include" directive. I want to be able to look at the nodes and the data after all the include-files have been included, after BuildStream has substituted the appropriate variables, and possibly after conditional statements have been resolved. But also (if possible) before anything has actually been 'built'.

Is that an achievable / realistic goal?

What I find interesting, is that if I write a plugin then I can already access exactly that kind of information. So that means the individual functions that I want to access for an element, are already part of the public facing API. It's just that doing it with a plugin adds some unpleasant overhead: (1) I have to add a whole new element to my project, as an instance of the new plugin. (2) I have to add the element I'm actually interested in, as a dependency of the new element. And (3) I have to "bst build" my new element. And at least in my particular use case, I then couldn't avoid having to build the target element I was originally interested in, along with its entire dependency tree.

Points 1 & 2 are annoying but not too hard to deal with. Point 3 is the real kicker, for me. It feels like what I'm trying to achieve should just be a glorified yaml.load(), it should take a few seconds. But instead, the only way to actually execute the code I've written is to run "bst build", and if I don't already have things cached that can mean downloading hundreds of different sources from the internet, and waiting hours for an enormous project to compile from source code when I didn't actually need any of it to be built.

So my rephrased question for the mailing list is this:

Can anyone see a good way to let users explore the nodes in an element definition, and run the same kind of arbitrary processing that you can _already_ include in a plugin, just without enforcing the overhead that's usually associated with 'bst build'?


