Re: Glib::filename_to_unicode bug?

From: Tadej Borovšak <tadeboro gmail com>
To: Dave Hayes <dave jetcafe org>
Cc: gtk-perl-list gnome org
Subject: Re: Glib::filename_to_unicode bug?
Date: Thu, 15 Apr 2010 00:12:37 +0200

Hi.

> Hm. I'm unclear why either of these should matter in what I would expect
> to be a simple filename character conversion, but here you go:


This is important, since GLib on Windows always uses UTF-8 for
filenames, no matter how those names are encoded on disk. On other
platforms, filenames are returned in the encoding specified by the
current locale.

> This is on FreeBSD 8.0 using perl 5.10.1 with gtk 2.18.7 and glib
> 2.22.4. The file names are from a public archive of files where ghod
> only knows who's written what, so there could very well be malformed
> unicode strings in them. I was very deliberate in my choice of test
> cases.


Hmm, it looks like perl doesn't care about the encoding at all and
succeeds in opening that file simply because the bytes stored in $foo
match those on disk. GLib, on the other hand, asks the locale machinery
which encoding the filenames should be in and then tries to convert
them to UTF-8. This is where the problems start.

> (16 bytes): 3036 3620 4465 6a61 a020 5675 2e6d 3461  066 Deja. Vu.m4a


Looking at this hexdump, I would say that the offending byte here is
probably 'a0' (the non-breaking space in extended ASCII encodings such
as ISO-8859-1). On its own, 0xa0 is not a valid UTF-8 sequence, so this
is probably what makes the conversion fail.
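
To check that guess outside of GLib, here is a minimal Python sketch
(Python only because it makes byte handling easy to show; it is not
what Glib::filename_to_unicode does internally). It takes the 16 bytes
from the hexdump, tries a strict UTF-8 decode and records where it
fails. The variable names are made up for illustration.

```python
# The 16 bytes of the file name, copied from the hexdump.
raw = bytes.fromhex("3036 3620 4465 6a61 a020 5675 2e6d 3461")

bad_offset = None
bad_byte = None
try:
    # A strict UTF-8 decode rejects a stray continuation byte.
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    bad_offset = err.start       # position of the first invalid byte
    bad_byte = raw[err.start]    # value of that byte

# Decoding as ISO-8859-1 never fails; 0xa0 maps to U+00A0 (NBSP).
as_latin1 = raw.decode("latin-1")

print(bad_offset, hex(bad_byte) if bad_byte is not None else None)
print(repr(as_latin1))
```

If the name really was written in ISO-8859-1, a locale that declares
UTF-8 would explain the failed conversion, but that is an assumption
about how the archive was created.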


As for the solution, I'm not entirely sure how to solve this. Renaming
the files so that their names contain only characters with codes < 128
might do the trick.
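
A rough sketch of such a rename, again in Python and only as an
illustration: the helper names ascii_safe_name and rename_to_ascii are
hypothetical, and replacing each offending byte with "_" is just one
possible policy. It works on raw bytes throughout, so no encoding has
to be guessed.

```python
import os


def ascii_safe_name(raw_name: bytes, replacement: bytes = b"_") -> bytes:
    """Return raw_name with every byte >= 128 replaced (hypothetical helper)."""
    return b"".join(
        bytes([b]) if b < 128 else replacement for b in raw_name
    )


def rename_to_ascii(directory: bytes) -> None:
    """Rename entries in directory whose names contain bytes >= 128."""
    # Passing bytes to os.listdir makes it return names as bytes.
    for raw_name in os.listdir(directory):
        safe = ascii_safe_name(raw_name)
        if safe == raw_name:
            continue  # already pure ASCII
        src = os.path.join(directory, raw_name)
        dst = os.path.join(directory, safe)
        if os.path.exists(dst):
            continue  # skip collisions rather than overwrite a file
        os.rename(src, dst)
```

For the file above, ascii_safe_name would turn the name into
"066 Deja_ Vu.m4a". Try it on a copy of the directory first.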

Tadej

-- 
Tadej Borovšak
tadeboro.blogspot.com
tadeboro gmail com
tadej borovsak gmail com

References:
- Re: Glib::filename_to_unicode bug?
  - From: Dave Hayes
