ustring::validate() costs?



Hey guys,

I am reading filenames from the hard disk which may or may not be UTF-8 encoded. Since Gtk+ and Glib naturally expect UTF-8, I somehow have to make sure my code doesn't break when the user's filenames are encoded differently.

I spent quite some time searching the web and reading source code and documentation on how to do this properly, and what I could figure out so far is this:

If G_BROKEN_FILENAMES is set to 1 in the environment, then g_filename_to_utf8 will try to convert from the current locale's encoding to UTF-8; otherwise, the string is copied 1:1. For some reason this variable isn't mentioned in the documentation of the Glib character set conversion functions, so maybe my information is outdated. The documentation only mentions G_FILENAME_ENCODING, as the way to determine the character set when you're converting /from/ UTF-8 to the locale's encoding.

Anyway, I really don't want to force the user to set some obscure environment variable just so the program will work for him (since there are still users who do not use UTF-8 yet, this is just not acceptable).

So I thought I could do this:
For every file I read, I first check whether its name is valid UTF-8 using ustring::validate(). If it isn't, I get the locale's character encoding with Glib::get_charset() and pass it to Glib::setenv("G_FILENAME_ENCODING", result_of_get_charset). Otherwise I set the environment variable to "" again. Bottom line, in either case the call to Glib::filename_to_utf8() will succeed (that's the intention at least).

This way I can be sure that even files with mixed encodings (UTF-8 and non-UTF-8) are converted correctly, plus I don't need to force the user to supply these values.

However, I'm concerned about runtime costs. How exactly does validate() work? How expensive is it to call on say 1000 files?
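
To make the cost question concrete, here is a minimal sketch of what a UTF-8 validity check has to do. This is not the Glib implementation; is_valid_utf8 is a hypothetical standalone helper written only for illustration. The assumption is that any validator, including the one behind ustring::validate(), has to look at each byte about once, so the cost grows linearly with the length of the string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Glib or glibmm): returns true if `s`
// is well-formed UTF-8. Every byte is visited exactly once.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) {  // plain ASCII byte
            ++i;
            continue;
        }
        std::size_t len = 0;   // total bytes in this sequence
        unsigned long cp = 0;  // decoded code point
        unsigned long min = 0; // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence is cut off at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

If that assumption holds, checking 1000 filenames of a few dozen bytes each amounts to a few tens of thousands of byte comparisons, which I would expect to be small next to the cost of reading the directory entries from disk in the first place.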

This whole conversion topic has been bugging me for months now. I'd appreciate your input, preferably in an encoding I can read ;)

Best regards,
Matthias Kaeppler



