Re: C character encodings in libgudev and GLibby projects



On Mo, 23.11.20 16:50, John Scott via gnome-devel-list (gnome-devel-list gnome org) wrote:

Hello,

It's my understanding from the docs that unless specified otherwise, functions
in GNOME's realm leverage UTF-8 strings. The (unstable) GLib docs say:
GLib uses UTF-8 for its strings, and GUI toolkits like GTK+ that use GLib do
the same thing. If you get a file name from the file system, for example, from
readdir() or from g_dir_read_name(), and you wish to display the file name to
the user, you will need to convert it into UTF-8.
The opposite case is when the user types the name of a file they wish to
save: the toolkit will give you that string in UTF-8 encoding, and you will
need to convert it to the character set used for file names before you can
create the file with open() or fopen().

In this example, I presume that it would require conversion to UTF-8 if I were
to use a GNOME function to show the filename, and otherwise I ought to be able
to use puts() or printf() as per usual.

This seems a little complicated if there are many things talking to each
other, like DBus or other backends. At the moment they're not on
developer.gnome.org but one can see the libgudev docs at [1] (thanks Bastien
Nocera).

At [2] one sees g_udev_client_new() does specify UTF-8 is necessary. For the
functions returning file paths, though, this isn't so clear [3], yet the
distinction is especially important there.

The code [4] reveals that the string is returned unchanged from a udev
function [5]. Since the libudev (without the 'g') functions don't specify an
encoding I'd normally presume, as with strtol() and such, that it's in the
user's default encoding (of course assuming I've called setlocale(LC_ALL, "")).
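
For illustration, here is a minimal sketch of how a program could find out what the "user's default encoding" actually is. It relies only on POSIX setlocale() and nl_langinfo(CODESET); the helper names current_codeset() and codeset_is_utf8() are made up for this example and are not part of libudev, libgudev or GLib.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: adopt the user's locale and return its codeset name. */
static const char *current_codeset(void)
{
    /* Without this call the program stays in the default "C" locale. */
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: true if a codeset name denotes UTF-8.
 * The spelling varies between systems, so compare case-insensitively
 * and accept the form without the hyphen as well. */
static bool codeset_is_utf8(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}
```

Calling codeset_is_utf8(current_codeset()) once at startup tells the program whether strings in the locale encoding can be handed to UTF-8-only APIs without conversion.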

Then again libudev and systemd are under the seemingly-not-so-far
FreeDesktop.org umbrella so maybe they tend to follow the same UTF-8-only
practices.

Is there some rule of thumb I'm missing or a misunderstanding on my part, or
does this look like underspecification for which I should reach out to the
authors?

udev escapes non-ASCII strings (such as disk labels, device strings
and so on) C-style. Thus all strings udev exports should generally
be UTF-8 (given that it is a superset of ASCII).
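
As a sketch of what undoing such escaping could look like on the application side: the code below assumes the escapes take the "\xNN" form (a backslash, 'x' and two hex digits) that libudev's udev_util_encode_string() is documented to produce. The helper names hex_value() and decode_hex_escapes() are hypothetical and not part of any udev API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copy src into dst, replacing every "\xNN" sequence
 * with the byte 0xNN. Anything that is not a complete escape is copied
 * unchanged. Returns the number of bytes written (not counting the
 * terminating NUL), or (size_t)-1 if dst is too small. */
static size_t decode_hex_escapes(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        unsigned char byte = (unsigned char)*src;
        size_t consumed = 1;

        if (src[0] == '\\' && src[1] == 'x') {
            int hi = hex_value(src[2]);
            /* Only look at src[3] if src[2] was a hex digit, so we never
             * read past the terminating NUL. */
            int lo = (hi >= 0) ? hex_value(src[3]) : -1;
            if (lo >= 0) {
                byte = (unsigned char)(hi * 16 + lo);
                consumed = 4;
            }
        }

        /* Always keep one byte free for the terminating NUL. */
        if (out + 1 >= dst_size)
            return (size_t)-1;
        dst[out++] = (char)byte;
        src += consumed;
    }

    if (out >= dst_size)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

With this sketch, the input `My\x20Disk` (ten characters, backslash included) decodes to "My Disk". The decoded bytes are not guaranteed to be valid UTF-8, and an escaped `\x00` would yield an embedded NUL, so a caller should still validate the result before displaying it.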

Or to say this differently: if there are non-UTF8 strings exported by
the udev db to apps we'd consider that a bug.
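
An application that wants to detect such a bug rather than trust the input could validate the strings it receives. GLib offers g_utf8_validate() for this; the standalone sketch below does a similar check in plain C, with a hypothetical helper name is_valid_utf8(). It rejects overlong encodings, surrogates and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the NUL-terminated string s is valid UTF-8. */
static bool is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t extra;       /* number of continuation bytes expected */
        size_t i;

        if (*p < 0x80) {            /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {
            cp = *p & 0x1F; extra = 1; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) {
            cp = *p & 0x0F; extra = 2; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) {
            cp = *p & 0x07; extra = 3; min = 0x10000;
        } else {
            return false;           /* stray continuation or invalid lead byte */
        }
        p++;

        for (i = 0; i < extra; i++, p++) {
            /* A terminating NUL also fails this test, so truncated
             * sequences are rejected without reading past the end. */
            if ((*p & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;           /* overlong, out of range, or surrogate */
    }
    return true;
}
```

A Latin-1 encoded "café" (final byte 0xE9) fails this check, while the UTF-8 encoded form (final bytes 0xC3 0xA9) passes.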

Lennart

--
Lennart Poettering, Berlin

