Re: [BuildStream] Proposal: Allowing download-only sources to work on local files



Hi Ben,

On Wed, 2019-12-11 at 09:45 +0000, Benjamin Schubert wrote:
On Monday, 02 December 2019 10:19, Tom Mewett via buildstream-list
<buildstream-list gnome org> wrote:

Firstly, I'm going to use the term "URL" in quite a general way here.
That's just to keep with existing terminology; it should probably be
changed to avoid confusion.

I disagree with changing the terminology here. Nothing in 'URL' means it should be only to
target a resource on the web. A local file path _is_ a valid url. As such, it is also
understood in the same way by all python tooling relative to URLs. More on this later.

While semantically a URL can definitely point to a local file (and the
path can be relative or absolute also as an URL), I want to point out
here that syntactically not every local file path is a valid URL. E.g.,
space, ':', '@' and '%' can't be used in a URL without percent encoding
and non-ASCII characters depend on whether we use URLs/URIs or IRIs .

I.e., depending on what characters are used in local filenames, URLs
can be more painful to the user.


I propose that the act of "resolving a URL to a file" be generalised and have
plugins only operate on the result of that resolution, not knowing or
caring whether the file was downloaded or is local.

While this would work in theory, I don't think this should be done in practice.
The reason for this is that we would then have a set of sources (say Zip,Deb,Tar) that
would work with both a local path or a remote one. However other plugins like 'git'
would then only be supporting remote? Why such a discrepancy?

The git example seems a bit far-fetched to me. We only support local
paths within the project path. Having a git/bzr/svn repository as a
subdirectory of a project is a bit odd. It could theoretically be
usable with git submodules. However, I think it would be unusual for a
BuildStream project to use git submodules for element sources as that
functionality should already be covered by BuildStream's `fetch`,
`track` and `ref` functionality.

That said, if there was a use case for a non-file-based source plugin
to support local paths, the plugin could presumably still add explicit
support for `path`.

And then, if we want support for all kind of sources, with this
proposal, we would need to split plugins in two: SourceFetchers and
Sources, which would add complexity and make it less easy to
extend/write new plugins.

We already have the split (DownloadableFileSource) to avoid code
duplication for file-based plugins. I don't see how this proposal would
make this any worse or how your counter-proposal would simplify that
aspect. I might well be forgetting something here, in which case,
please elaborate.


- All source plugins which operate on files are unified as subclasses of
  a single base class, say FileBasedSource
- This base class handles the 'url' and 'ref' keys of the source config
- First it checks whether 'url' is a fully-qualified URL or is just a
  relative path. If it's the former, it is fetched as necessary and
  stored in the source cache
- If the URL is a relative path, specifying a ref is optional. If it is
  given and is different from what is calculated, an error is thrown

The 'local' and 'remote' sources could then also be unified to some
kind of 'copy' source, as 'copy' would act as either one depending on
whether a local path or full URL was given.

If desirable, the 'url' key could be split into mutually-exclusive
'url' and 'path' keys which would decide the behaviour.

This would lead to two radically different handling of the _same_ source type, making it
harder ot understand what is happening and how to write a source element correctly.
I think this is too complex and will lead to confusion for newcomers.

Are you also opposed to making an improved version of
`DownloadableFileSource` public? If yes, what are you suggesting as
alternative that doesn't involve code duplication? If no, why can't
`DownloadableFileSource` (or rather, `FileBasedSource`) transparently
handle local files as well without adding complexity to source plugins
that use it?


Another advantage would be that local files imported into the project
can be given refs, meaning that they would not need to be present to
compute cache keys of depending elements. (This is not possible with
the current 'local' source.)

That would mean that depending on whether a ref is set or not, a change will
be picked up or not. I think this will lead to too many confusion with editing
said files.

Yes, I'd be cautious about this as well. I'd suggest we keep using ref
only for remote files for this proposal and consider refs for local
files in a separate discussion if/when a use case arises.


---
Based on those points, I dislike the current proposal and believe we can do
better at providing an easy experience for those cases.

Here is my alternate proposition, which builds upon Tristan's one a lot:

1) We add a '%{project-path}' variable to BuildStream, handled by the core,
   which resolves to the path to the directory in which 'project.conf' is.
   Sources that want to be local _have_ to use the 'file://' protocol.
2) The rest stays as is.

 * How would we handle cache keys? We don't want the cache key to
   depend on the project path.
 * Wouldn't this mean that `ref` has to be used for local (tar) files?
   You mentioned above that this may lead to confusion (and I agree).
 * As I understand it, the `local` source plugin would continue to
   exist. This means there would be a discrepancy between local files
   specified with the `local` source plugin (regular paths) and local
   files specified with other sources plugins (URLs). I fail to see how
   this is better than the original proposal. To me it provides less
   consistency.
 * We also need to prohibit file URLs that point to files outside the
   project directory.

Cheers,
Jürg



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]