Encoding in g_filename_to_uri()



Here is what I think is a bug.  Do this:

0. Make sure you are using GtkFileSystemUnix.
1. export LANG=es_MX.ISO8859-1
2. export G_FILENAME_ENCODING=@locale
3. gedit
4. Save a file called áéíóú.
5. Hit File/Open
6. Select that file
7. Gedit will tell you that the file does not exist and ask whether
you would like to create it.

The filename on disk is 5 bytes long with non-ASCII characters, as it is
in ISO8859-1.  If it had been created without G_FILENAME_ENCODING, it
would be 10 bytes long and in UTF-8.
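
To make the byte counts concrete, here is a minimal C sketch of the
ISO8859-1 to UTF-8 conversion.  latin1_to_utf8() is a hypothetical
helper written only for this illustration, not a GLib function, and it
assumes the file is named áéíóú (the name implied by the escaped URIs
later in this message).

```c
#include <stddef.h>

/* Hypothetical helper (not part of GLib): convert a NUL-terminated
 * ISO8859-1 string to UTF-8.  Returns the number of bytes written,
 * excluding the terminating NUL, or 0 if the output buffer is too
 * small (an empty input also yields 0). */
static size_t
latin1_to_utf8 (const char *in, char *out, size_t out_size)
{
  size_t o = 0;

  for (; *in != '\0'; in++)
    {
      unsigned char c = (unsigned char) *in;
      size_t need = (c < 0x80) ? 1 : 2;

      /* Keep room for this character plus the terminating NUL */
      if (o + need + 1 > out_size)
        return 0;

      if (c < 0x80)
        out[o++] = (char) c;                     /* ASCII: unchanged */
      else
        {
          out[o++] = (char) (0xC0 | (c >> 6));   /* lead byte */
          out[o++] = (char) (0x80 | (c & 0x3F)); /* continuation byte */
        }
    }

  if (o + 1 > out_size)
    return 0;

  out[o] = '\0';
  return o;
}
```

Passing the five bytes E1 E9 ED F3 FA through this helper yields the ten
bytes C3 A1 C3 A9 C3 AD C3 B3 C3 BA, which matches the 5-byte versus
10-byte difference described above.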

Internally, Gedit uses gtk_file_chooser_get_uris(), and then for each
one of those it does gnome_vfs_uri_exists().  Note that this has
nothing to do with whether you are using GtkFileSystemUnix or
GtkFileSystemGnomeVFS; Gedit uses gnome-vfs itself.

gtk_file_chooser_get_uris() gets the list of internal GtkFilePath, which
for the unix backend are filenames in the local encoding, and converts
them to URIs using g_filename_to_uri().

However, g_filename_to_uri() does essentially this:

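char *
g_filename_to_uri (const char *filename)
{
  char *utf8_filename;
  char *escaped;

  /* Convert from the filename encoding to UTF-8 (signature simplified) */
  utf8_filename = g_filename_to_utf8 (filename);
  /* Percent-escape the UTF-8 bytes and prepend "file://" */
  escaped = g_escape_file_uri (utf8_filename);
  g_free (utf8_filename);

  return escaped;
}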

g_filename_to_utf8() converts the filename from the local filename
encoding to UTF-8.  So, our 5-byte filename from above
gets converted into a 10-byte UTF-8 string.  Later, g_escape_file_uri()
turns this into a percent-escaped string and prepends a "file://".  The
end result is something like

	file:///home/federico/%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA

which is a valid URI, but does *not* refer to the filename above.  I
think the result should be

	file:///home/federico/%E1%E9%ED%F3%FA

That is, the hexadecimal representation of "áéíóú" in ISO8859-1.
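
The difference between the two URIs can be reproduced with a small
byte-wise escaper.  This is only a sketch: escape_bytes() is a
hypothetical stand-in for the escaping step, not the actual GLib code,
and the set of characters it leaves unescaped is an assumption chosen
for readability.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GLib): percent-escape every byte
 * that is not an ASCII letter, digit, or one of "/-._~".  Output is
 * truncated if the buffer is too small, and is always NUL-terminated
 * when out_size > 0. */
static void
escape_bytes (const char *in, char *out, size_t out_size)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;

  if (out_size == 0)
    return;

  /* Each input byte needs at most 3 output bytes, plus the final NUL */
  for (; *in != '\0' && o + 4 <= out_size; in++)
    {
      unsigned char c = (unsigned char) *in;

      if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
          (c >= '0' && c <= '9') || c == '/' || c == '-' ||
          c == '.' || c == '_' || c == '~')
        out[o++] = (char) c;
      else
        {
          out[o++] = '%';
          out[o++] = hex[c >> 4];
          out[o++] = hex[c & 0x0F];
        }
    }

  out[o] = '\0';
}
```

Escaping the raw ISO8859-1 bytes E1 E9 ED F3 FA gives %E1%E9%ED%F3%FA,
while escaping the UTF-8 form of the same name gives
%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA.  Only the first decodes back to the
bytes that are actually on disk.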

When gnome-vfs gets the URI to see if it exists, it decodes it and fails
to locate the file because the URI is encoded incorrectly in glib.

I think g_filename_to_uri() should not call g_filename_to_utf8() and
just pass the filename to g_escape_file_uri().

Is this analysis correct?

  Federico



