Re: [BuildStream] Proposal: Configuration format for CAS/artifact server split



Hi,

>> Hi,
>>
>> I've started looking at [supporting separate endpoints for
>> CAS/artifact services][1]. While the issue seems pretty
>> straightforward, we'll need to change the configuration format
>> slightly, and I wanted to check if I'm going in the right direction
>> since I've not really played with CAS all that much.

For an update on this, the work now lives in [this MR][1], and I've
had some preliminary review.

From Tristan's reply:

>> The configuration format I'd like to use looks like this:
>>
>> ```yaml
>> artifacts:
>>    # Un-split caches can still be defined using the old spec
>>    - url: https://foo.com:11001
>>      server-cert: foo.crt
>>
>>    # Split caches are made up of two separate specs like this
>>    - metadata:   # Can anyone think of more obvious names for users?
>>        url: https://foo2.com:11001
>>        server-cert: foo2.crt
>>      blobs:      # Can anyone think of more obvious names for users?
>>        url: https://foo3.com:11001
>>        server-cert: foo3.crt
>> ```
>
> I think this looks confusing, as in without heavily commenting your
> config file, it won't be very obvious to anyone what this stuff means.
>
> Right now `artifacts` is clearly a list of "servers", and from what I
> gather from your proposal; it must be beneficial to us if the
> relationship of the CAS cache and the artifact metadata cache can be
> declared and known (which is why they appear in the same dictionary in
> the config I presume).
>
> If it is not important to express that a given artifact server is
> related to a given CAS server, then I suggest to make this config more
> readable and just add a new "type" field, letting the user decide what
> to store in this server (payload, artifact data, or both).

It seems my original reply to this was eaten by the void, but I
*thought* expressing the pairing was important: without knowing which
CAS server goes with which artifact server, we'd need to query lots of
CAS servers to find our blobs.

Some discussion with Jürg makes me believe this isn't the case, simply
because it should be uncommon for anyone to *use* more than one CAS
server, especially since doing so goes against the guidelines specified
by [Google][2].

There are a couple of other advantages that Jürg outlines in his
[comment on the above MR][3]:

> This could be interesting in some cases. E.g., CAS servers don't
> need to fully trust the client to allow pushing while artifact
> servers do. Organizations might set up central big CAS servers (and
> use it also for remote execution) with push access for lots of
> developers while setting up various instances of the BuildStream
> artifact server with limited access for project-specific needs. In
> this scenario the user might want to configure the CAS server as
> global CAS server in userconfig and specify only the artifact server
> per project.

With that restriction, the code for this can also be a lot simpler,
and the configuration format (as pointed out by Tristan) less
confusing.
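
As a rough sketch of what the flatter format might look like, assuming
a `type` field along the lines Tristan suggested. The field name and
the values `metadata`, `blobs` and `both` are placeholders borrowed
from the names used earlier in this thread, not something the MR has
settled on:

```yaml
# Hypothetical sketch: `type` and its values are placeholders.
artifacts:
  # A server providing both services, as with the old spec
  - url: https://foo.com:11001
    server-cert: foo.crt
    type: both

  # Artifact metadata only
  - url: https://foo2.com:11001
    server-cert: foo2.crt
    type: metadata

  # Blob (CAS) storage only
  - url: https://foo3.com:11001
    server-cert: foo3.crt
    type: blobs
```

In Jürg's scenario, the `blobs` entry would presumably live in the
user configuration and only the `metadata` entry in each project's
configuration.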

To make sure this assumption holds: is there anyone on this list who
thinks they'd want to specify more than, say, 5 CAS caches, or can
anyone see a use case where we would end up with that many?

Thanks,

Tristan Maat

---

[1]: https://gitlab.com/BuildStream/buildstream/merge_requests/1540
[2]: https://docs.google.com/document/d/17WJ4cz150IHeTgvJGxcbSGK1Zg2vsLpcgUSVffrwGfk/edit
[3]: https://gitlab.com/BuildStream/buildstream/merge_requests/1540#note_204325382
