Re: Can someone please comment on this short program

dannym wrote:

std::string fnIso8859_15Foobar = dirhandle.read_next()->get_name();
Glib::ustring u8Foobar = filename_to_utf8(fnIso8859_15Foobar);
Uri uriFoobar = Uri::create(u8Foobar);

Well, except that you renamed the variables, this is what I do :)
That doesn't help solve the problem, however.

Is filename_to_utf8 a glibmm wrapper or the original function? If it is
a wrapper it should be made to have a signature like:
Glib::ustring filename_to_utf8(std::string fnSource);
(if it doesn't have one already, that is)

Yes, that's right.

  // error! Invalid byte sequence in conversion input
  RefPtr<Uri> uri = Uri::create(filename);

If the filename is some kind of whack local encoding, this is expected
and actually good. Think if it didn't notice and you passed the url
around to some unsuspecting innocent other machine...

I don't think users who use encodings other than UTF-8 (which are still prominent, by the way) will feel like using my file manager if it breaks each time it reads a filename not encoded in UTF-8. So no, this is not a good thing :)

Uri::create() expects UTF-8, so you /have/ to convert the filename to UTF-8 using the glib conversion functions:

  RefPtr<Uri> uri = Uri::create(filename_to_utf8(filename));

No conversion error anymore, but now the Uri object is effectively useless, because it doesn't point to an existing entity anymore; there is no file on the system by the name the conversion yields.

What do you mean? It should "point" to the right file...

No, it doesn't. That's because special characters such as umlauts are encoded as different byte sequences in UTF-8 than in ISO-8859-1, so the converted name no longer matches the bytes stored on disk. I think my initial post showed that this doesn't work:

File "täst":

Filename encoded in ISO-8859-1:
  file:///home/matthias/t%E4st | exists: false

Filename encoded in UTF-8 (and THIS is actually pointing to something, although it's the same "name"):
  file:///home/matthias/t%C3%A4st | exists: true

So yes, it does make a difference. Both times the same file, but in different encodings; for one the Uri says it doesn't exist, while in the other encoding it says it does.
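
To make the byte-level difference concrete, here is a minimal sketch in plain C++ that does not use glib or gnome-vfs at all. The helper names latin1_to_utf8 and percent_encode are made up for illustration; they only mimic what the conversion and the URI escaping do to the bytes of the name, and are not the library's implementation.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// In ISO-8859-1 every byte value is its own code point, so bytes
// >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: percent-encode every byte that is not an
// unreserved URI character or a path separator.
std::string percent_encode(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        const bool keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                          c == '_' || c == '~' || c == '/';
        if (keep) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

Calling percent_encode("/home/matthias/t\xE4st") on the raw ISO-8859-1 bytes gives /home/matthias/t%E4st, while percent_encode(latin1_to_utf8("/home/matthias/t\xE4st")) gives /home/matthias/t%C3%A4st. Only one of those two byte sequences is the name that is actually stored in the directory.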

... it just passes whatever bytes it finds. So how is glib finding out that
this filename on that partition is "ISO-8859-1" encoded?

Only by looking at the environment, or by the user telling it that filenames are encoded in whatever G_FILENAME_ENCODING holds.
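
As a rough sketch of that lookup, assuming the behaviour described in the GLib documentation for Unix (G_FILENAME_ENCODING is a comma-separated list whose first entry wins, "@locale" stands for the locale's charset, G_BROKEN_FILENAMES is the older fallback, and UTF-8 is the default): the function first_filename_charset below is hypothetical and takes the environment values as parameters instead of reading them, so it is only a model of the decision, not glib code.

```cpp
#include <string>

// Hypothetical model of how the filename charset is chosen.
// A nullptr argument means "environment variable not set".
std::string first_filename_charset(const char* g_filename_encoding,
                                   const char* g_broken_filenames,
                                   const std::string& locale_charset) {
    if (g_filename_encoding != nullptr && *g_filename_encoding != '\0') {
        // Take the first entry of the comma-separated list.
        const std::string list(g_filename_encoding);
        const std::string first = list.substr(0, list.find(','));
        return first == "@locale" ? locale_charset : first;
    }
    if (g_broken_filenames != nullptr) {
        // Older switch: filenames are in the locale's encoding.
        return locale_charset;
    }
    return "UTF-8";  // assumed default when nothing is set
}
```

In a real program the first two arguments would come from std::getenv("G_FILENAME_ENCODING") and std::getenv("G_BROKEN_FILENAMES"). Either way, nothing on the partition itself is inspected; the answer depends entirely on what the user's environment claims.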

I really don't see the point in clinging to this anymore, we should just
make _all_ the filesystem names UTF-8 (or at least look like that to
userland for filesystems that have assumptions of encoding)...

I very much agree, but that still doesn't solve my problem :)

