Re: Can someone please comment on this short program



Hi,

On Tuesday, 13.12.2005 at 21:18 +0100, Matthias Kaeppler wrote:
> Nelson Benítez wrote:
> > On Tue, December 13, 2005 1:39 pm, Murray Cumming said:
> > 
> >>>Some tips for encoding charsets troubles:
> >>
> >>Isn't this for file contents, not file names?
> > 
> > 
> >  Yes, I think the problem is that the C file edited in Emacs, which
> > contains the filename, is in ISO-8859-x, and so is the filename; if the
> > C file were in UTF-8, and so the filename, then it wouldn't fail.
> > That's what I understood.
> 
> The whole point isn't about Emacs anyway. The problem, to my 
> understanding, lies in the Uri class expecting UTF-8 encoded strings, 
> while FileInfo handles strings in the locale's encoding; the two don't go 
> well together for non-ASCII characters.
> 
> Example:
> You read a file with a non-ASCII name from the disk, and the name is
> encoded in ISO-8859-15:

I have the kernel store stuff in UTF-8 on disk. There is no point in
using legacy encodings if you don't have to (if you have to, that's
different :))

Really makes my life a lot easier (of course users who don't store stuff
in UTF-8 get to discover the bugs I didn't notice, but ... too bad for
them :))

> 
>    // returns std::string in the locale's encoding
>    std::string filename = dirhandle.read_next()->get_name();
> 
> You want to create a URI object from this filename, but...

The problem is one of two things:
1) encodings other than UTF-8 are still in use at all
2) the string class doesn't carry its encoding

When getting rid of reason 1 is impractical, you need to fight reason 2 (of
course the STL does not provide for that, so you mostly emulate it in your
head by using Hungarian notation... or you can derive from std::string,
maybe)

For the special case of Unicode strings there is Glib::ustring (nice to
see :)), so you might get somewhat more type checking with that...
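
To show what I mean by putting the encoding into the type instead of the
variable name, here is a minimal sketch. Everything in it (EncodedString,
the tag types, utf8_byte_length) is made up for illustration and is not
part of the STL or glibmm; it only demonstrates that the compiler can
refuse to mix the two kinds of string.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tag types naming the encoding of the bytes they label.
struct Utf8Tag {};
struct FilenameTag {};  // whatever encoding the on-disk names use

// Thin wrapper that records the encoding in the type. It wraps a
// std::string instead of deriving from it, so there is no implicit
// conversion back to a plain, untagged string.
template <typename Encoding>
class EncodedString {
public:
    explicit EncodedString(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

using Utf8String = EncodedString<Utf8Tag>;
using FilenameString = EncodedString<FilenameTag>;

// Hypothetical consumer that accepts only UTF-8, in the spirit of
// Uri::create() expecting UTF-8 input.
inline std::size_t utf8_byte_length(const Utf8String& s) {
    return s.bytes().size();
}

// A filename string cannot silently become a UTF-8 string.
static_assert(!std::is_convertible<FilenameString, Utf8String>::value,
              "encodings must not mix without an explicit conversion");
```

With this, passing a FilenameString to utf8_byte_length() is a compile
error, and the only way across is a conversion function you write
yourself.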

So:

std::string fnIso8859_15Foobar = dirhandle.read_next()->get_name();
Glib::ustring u8Foobar = filename_to_utf8(fnIso8859_15Foobar);
RefPtr<Uri> uriFoobar = Uri::create(u8Foobar);

Is filename_to_utf8 a glibmm wrapper or the original function? If it is
a wrapper, it should have a signature like:
Glib::ustring filename_to_utf8(std::string fnSource);
(if it doesn't have that already, that is)

> 
>    // error! Invalid byte sequence in conversion input
>    RefPtr<Uri> uri = Uri::create(filename);

If the filename is in some kind of whack local encoding, this is expected
and actually good. Imagine if it didn't notice and you passed the URL
around to some unsuspecting innocent other machine...
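
To illustrate why such a name gets rejected, here is a rough stand-alone
UTF-8 well-formedness check. It is a sketch written for this mail, not
the check glib actually performs, and is_valid_utf8 is a name I made up.
A lone byte such as 0xE4 (a-umlaut in ISO-8859-1 and ISO-8859-15) looks
like the start of a three-byte UTF-8 sequence with the rest missing, so
it fails.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects truncated
// sequences, stray continuation bytes, overlong forms, surrogates,
// and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {  // plain ASCII
            ++i;
            continue;
        }
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        unsigned long min = 0;   // smallest code point valid for this length
        if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i <= extra) return false;  // sequence is cut short
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}
```

The same character encoded as UTF-8 is the two bytes 0xC3 0xA4, which
pass the check.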

> 
> Uri::create() expects UTF-8, so you /have/ to convert the filename to 
> UTF-8 using the glib conversion functions:
> 
>    RefPtr<Uri> uri = Uri::create(filename_to_utf8(filename));
> 
> No conversion error anymore, but now the Uri object is effectively 
> useless, because it doesn't point to an existing entity anymore; there 
> is no file on the system by the name the conversion yields.

What do you mean? It should "point" to the right file...
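
At the byte level, this is what I would expect to happen. The sketch
below assumes ISO-8859-1 for simplicity (ISO-8859-15 differs from it in
only a handful of code points, the euro sign among them), and both
function names are made up for this mail. The converted name has
different bytes than the name on disk, so it only "points" to the file
if whoever finally opens it converts back first.

```cpp
#include <cstddef>
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte maps to the
// code point with the same value, so bytes >= 0x80 become two bytes.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Inverse conversion. Returns false if the input is malformed or
// contains a code point outside U+0000..U+00FF.
bool utf8_to_latin1(const std::string& in, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
            continue;
        }
        // Only the lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
        if ((c != 0xC2 && c != 0xC3) || i + 1 >= in.size()) return false;
        const unsigned char d = static_cast<unsigned char>(in[++i]);
        if ((d & 0xC0) != 0x80) return false;
        out += static_cast<char>(((c & 0x03) << 6) | (d & 0x3F));
    }
    return true;
}
```

So the one byte 0xE4 on disk becomes 0xC3 0xA4 in UTF-8, and converting
that back yields 0xE4 again; the round trip is lossless as long as both
ends agree on the encoding.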

That said, I have no idea how filename_to_utf8 finds out what
encoding the original filename is in... I wouldn't know how to find out
at all... the kernel is basically ignorant about filename encodings so
it just passes whatever bytes it finds. So how is glib finding out that
this filename on that partition is "ISO-8859-1" encoded?
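
For what it's worth, the GLib documentation for its filename conversion
functions says that on Unix nothing is detected per file or per
partition: it consults the environment variables G_FILENAME_ENCODING
and G_BROKEN_FILENAMES and otherwise assumes UTF-8. A rough sketch of
that lookup order follows. This is not GLib's code; the function name
and the parameters are made up, the environment values are passed in so
the logic can be read on its own, and treating an empty
G_FILENAME_ENCODING as unset is my assumption.

```cpp
#include <string>

// Hypothetical helper mirroring the documented lookup order.
// `filename_encoding` and `broken_filenames` stand for the values of
// G_FILENAME_ENCODING and G_BROKEN_FILENAMES (nullptr when unset), for
// example as returned by std::getenv. `locale_charset` is the charset
// of the current locale, supplied by the caller.
std::string guess_filename_charset(const char* filename_encoding,
                                   const char* broken_filenames,
                                   const std::string& locale_charset) {
    if (filename_encoding != nullptr && *filename_encoding != '\0') {
        // The variable holds a comma-separated list; the first entry is
        // the filename encoding. "@locale" means the locale's charset.
        const std::string list(filename_encoding);
        const std::string first = list.substr(0, list.find(','));
        return first == "@locale" ? locale_charset : first;
    }
    if (broken_filenames != nullptr) {
        return locale_charset;  // filenames are in the locale's encoding
    }
    return "UTF-8";  // neither variable set
}
```

If that description is right, the answer to "how does it know" is that
it doesn't; the user has to tell it.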

I really don't see the point in clinging to this anymore, we should just
make _all_ the filesystem names UTF-8 (or at least look like that to
userland for filesystems that have assumptions of encoding)...

(I wonder why the kernel doesn't default to that...)

> 
> And /that/ is the dilemma; regardless how you bend and turn it, one of 
> the actions will fail.
> 
> - Matthias

cheers,
   Danny




