Re: ustring::validate() costs?



Chris Vine wrote:
Are you worried about the codeset of a file's contents or of its filename? You begin by referring to filenames, but you appear to end by referring to the codeset in which a file has been written to. All filenames in any one system will use the same codeset - you cannot have "files with mixed encodings", as you put it, in that sense.

Yes, I am only referring to the names of files, not their contents. I don't think what you say is true, though. For example, on my notebook my locale is set to ISO-8859-1, and if I create a file in Nautilus, Nautilus encodes the name in UTF-8 and not in my current locale encoding. Maybe that's what you are referring to in your other post, but this is actually very likely to happen.

<snip>
If all you want to do is to force a conversion of a filename from the locale codeset to UTF-8 and you don't want to bother with the G_BROKEN_FILENAMES or G_FILENAME_ENCODING environmental variables, just use Glib::locale_to_utf8() (this will have the same effect as calling Glib::filename_to_utf8() with the G_BROKEN_FILENAMES environmental variable set). You lose the flexibility of being able to cater for the locale codeset and the filename codeset being different, but how many systems would do something as insane as that anyway?

Wait a second, you always distinguish between the locale codeset and the filename codeset. Aren't filenames always encoded in the codeset specified by the current locale, unless some application explicitly creates them in a different codeset? (For example, all Gtk+ based apps where you enter a filename in some widget and which don't call filename_from_utf8() before writing it to the disk.)

Well, maybe I'll just post the code I have written so far. Be warned that it doesn't work: I'm still getting conversion errors thrown at me as soon as I read a non-UTF-8 filename with umlauts in it:

// Here I am reading files from the disk
            file_info = dir.read_next(file_exists);
            if (!file_exists)
                break;

            filename = file_info->get_name();
            if (filename.validate())
            {
                Glib::setenv("G_FILENAME_ENCODING", "UTF-8");
                Glib::setenv("G_BROKEN_FILENAMES", "0");
            }
            else
            {
                std::string charset;
                Glib::get_charset(charset);
                std::cout << "Current locale: " << charset << std::endl;
                Glib::setenv("G_FILENAME_ENCODING", charset);
                Glib::setenv("G_BROKEN_FILENAMES", "1");
            }
            // this call throws if the filename contains special chars
            // and is not encoded in UTF-8, how can this happen??
            filename = Glib::filename_to_utf8(filename);
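
For comparison, the per-name fallback that the code above is aiming for can be sketched in plain C++, without Glib and without touching any environment variables. This is only an illustration under stated assumptions: the helper names (is_valid_utf8, latin1_to_utf8, filename_to_display_utf8) are made up for this sketch and are not part of glibmm, and the fallback codeset is hard-coded to ISO-8859-1 because that is the locale mentioned above. In glibmm the corresponding pieces would presumably be Glib::ustring::validate() and Glib::locale_to_utf8().

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a glibmm function): returns true if `s` is
// well-formed UTF-8. Rejects truncated sequences, overlong forms,
// surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;           // overlong form
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8. Every
// ISO-8859-1 byte value N maps directly to code point U+00NN.
std::string latin1_to_utf8(const std::string& s)
{
    std::string out;
    out.reserve(s.size() * 2);
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Per-name decision: keep names that already are UTF-8, convert the rest.
// Assumption: every non-UTF-8 name is ISO-8859-1.
std::string filename_to_display_utf8(const std::string& raw)
{
    return is_valid_utf8(raw) ? raw : latin1_to_utf8(raw);
}
```

With these helpers, a name stored as the ISO-8859-1 bytes 66 FC 72 comes back as the UTF-8 bytes 66 C3 BC 72, while a name that is already valid UTF-8 is returned unchanged. The obvious weakness is that an ISO-8859-1 name whose bytes happen to form valid UTF-8 will be misdetected, so this is a heuristic, not a guarantee.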

Regards,
Matthias



