Re: [BuildStream] Invalid characters for element names



On Mon, 2018-12-10 at 09:12 +0000, Daniel Silverstone via BuildStream-list wrote:
> On Fri, Dec 07, 2018 at 17:35:47 +0100, Jürg Billeter wrote:
> > > Also, if we are staying with ASCII for now, do we want to support
> > > extended ASCII?
> >
> > I don't understand what you're asking. So-called "Extended ASCII" would
> > mean not staying with ASCII. Either we support full Unicode/UTF-8 with
> > clearly defined exceptions or we support a subset of (7-bit) ASCII
> > (e.g., the one that Bazel uses). It doesn't make sense to support any
> > of the non-Unicode extensions to ASCII, in my opinion.
>
> I think it's worth jumping in here and saying that (at least) UNIX filenames
> are *NOT* unicode/utf-8.  They are bytestrings.  As such we either limit
> to 7-bit ASCII or we punch holes in the valid *bytes* permitted.  We *cannot*
> ascribe meaning to the byte sequences on Linux or other UNIXlike OSes.

While you're correct about POSIX filenames being bytestrings, I think
it would be very bad if BuildStream accepted non-UTF-8 bytestrings.
POSIX platforms are allowed to arbitrarily restrict supported
bytestrings, so accepting them generally wouldn't allow
interoperability across POSIX systems (not to mention Windows). Modern
UNIX environments, including macOS and typical Linux setups, typically
use UTF-8. It would also be painful to have non-UTF-8 .bst files.
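
To make the bytestring point concrete, here is a minimal Python sketch
of a strict UTF-8 check on a raw filename. The helper name
is_valid_utf8 and the sample filenames are made up for illustration;
this is not BuildStream code.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same visible name, encoded two different ways (hypothetical examples).
utf8_name = "café.bst".encode("utf-8")      # b'caf\xc3\xa9.bst'
latin1_name = "café.bst".encode("latin-1")  # b'caf\xe9.bst'

print(is_valid_utf8(utf8_name))    # True
print(is_valid_utf8(latin1_name))  # False
```

Both byte sequences are acceptable filenames on a typical Linux
filesystem, but only the first can be read back as UTF-8 text.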

I'd be happy with Chandan's proposal of restricting element names to a
sensible 7-bit ASCII subset for now, and we can consider opening the
can of worms in the future if the need arises.

The proposed subset includes characters that are not allowed on
Windows, though. Should we reduce the set of valid characters further
or shall we leave this up to individual projects (if/when we support
non-WSL Windows at all)?
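
As a rough sketch of the two checks in question, the Python below tests
a name against an allowed 7-bit ASCII set and, separately, against the
printable characters that Windows rejects in file names. ALLOWED_CHARS
is a hypothetical set chosen for illustration, since the proposed
subset is not reproduced in this message; the function names are also
made up. Windows has further restrictions (control characters, reserved
device names) that this sketch ignores.

```python
import string

# Hypothetical allowed set, for illustration only; not the subset
# actually proposed on the list.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + "-_+.:<>|")

# Printable characters that Windows does not permit in file names.
WINDOWS_RESERVED_CHARS = frozenset('<>:"/\\|?*')


def invalid_chars(name: str) -> set:
    """Characters in name that fall outside the allowed set."""
    return set(name) - ALLOWED_CHARS


def windows_problem_chars(name: str) -> set:
    """Characters in name that Windows would reject in a file name."""
    return set(name) & WINDOWS_RESERVED_CHARS


name = "gtk+:3.bst"  # hypothetical element name
print(invalid_chars(name))          # set()
print(windows_problem_chars(name))  # {':'}
```

A project that cares about Windows could run the second check on top of
the first, which is one way of leaving the stricter rule up to
individual projects.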

Cheers,
Jürg


