ustring::validate() costs?
- From: Matthias Kaeppler <matthias finitestate org>
- To: gtkmm-list gnome org
- Subject: ustring::validate() costs?
- Date: Thu, 01 Dec 2005 21:41:06 +0100
Hey guys,
I am reading filenames from the hard disk which may or may not be in
UTF-8 encoding. So, since Gtk+ and Glib naturally expect UTF-8, I
somehow have to make sure my code doesn't break when the user's
filenames are encoded differently.
I spent quite some time searching the web and reading source code and
documentation to find out how to do this properly, and what I could
figure out so far is this:
If G_BROKEN_FILENAMES is set to 1 in the environment, then
g_filename_to_utf8 will try to convert from the current locale to UTF-8;
otherwise, the string is copied 1:1. For some reason this variable isn't
mentioned in the documentation of the Glib character set conversion
functions, so maybe my information is outdated. The documentation only
mentions G_FILENAME_ENCODING, which determines the character set when
you're converting /from/ UTF-8 to the locale's encoding.
Anyway, I really don't want to force the user to set some obscure
environment variable just so the program will work for him. Since there
are still users who do not use UTF-8 yet, this is just not acceptable.
So I thought I could do this:
For every file I read, I first check if it's valid UTF-8 using
ustring::validate(). If it isn't, I get the locale's character encoding
with Glib::get_charset() and pass it to
Glib::setenv("G_FILENAME_ENCODING", result_of_get_charset). Otherwise I
set the env-variable to "" again. Bottom line, in any case the call to
Glib::filename_to_utf8() will succeed (that's the intention at least).
This way I can be sure that even files with mixed encodings (UTF-8 and
non-UTF-8) are converted correctly, plus I don't need to force the user
to supply these values.
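
To make the idea concrete, here is a rough standalone sketch of the
per-filename check. It deliberately does not use Glib, and the three
function names are made up for illustration: looks_like_utf8() stands in
for ustring::validate(), and latin1_to_utf8() stands in for the locale
conversion, assuming ISO-8859-1 as the fallback encoding. I don't know
whether Glib's validation works this way internally; this only shows
that such a check can be done in a single pass over the bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ustring::validate(): returns true if `s` is
// well-formed UTF-8. Each byte is visited once, so the cost is linear in
// the length of the name.
bool looks_like_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;      // total bytes in this sequence
        unsigned long cp = 0;     // decoded code point
        unsigned long min = 0;    // smallest code point allowed for `len`
        if (c < 0x80) { ++i; continue; }                       // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;        // stray continuation or invalid lead byte

        if (n - i < len) return false;                         // truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;             // not 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return false;                            // overlong form
        if (cp > 0x10FFFF) return false;                       // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;        // surrogate
        i += len;
    }
    return true;
}

// Assumed fallback: treat every byte as ISO-8859-1 and re-encode it as
// UTF-8. A real program would convert from the locale's charset instead.
std::string latin1_to_utf8(const std::string& s)
{
    std::string out;
    out.reserve(s.size() * 2);
    for (char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Valid UTF-8 names pass through untouched; anything else is converted.
std::string filename_for_display(const std::string& raw)
{
    return looks_like_utf8(raw) ? raw : latin1_to_utf8(raw);
}
```

With this, a name that is already UTF-8 is returned unchanged, while a
name containing a lone 0xE9 byte gets that byte re-encoded as the
two-byte sequence 0xC3 0xA9.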
However, I'm concerned about runtime costs. How exactly does validate()
work? How expensive is it to call on, say, 1000 filenames?
This whole conversion topic has been bugging me for months now. I'd
appreciate your input, preferably in an encoding I can read ;)
Best regards,
Matthias Kaeppler