Re: ustring::validate() costs?



On Friday 02 December 2005 06:55, Matthias Kaeppler wrote:

[snip]

> Yes, I am only referring to the names of files, not their contents. I
> don't think what you say is true though. For example, on my notebook my
> locale is set to ISO-8859-1 and if I create a file in Nautilus, then
> Nautilus will encode the name in UTF-8 and not in my current locale
> encoding. Maybe that's what you are referring to in your other post but
> it's very likely that this happens, actually.

But does it do it if you have G_BROKEN_FILENAMES or G_FILENAME_ENCODING 
correctly set (before you start Nautilus, which usually occurs when GNOME 
starts up)?  If it does, you probably need to report a bug.

> Wait a second, you always separate between locale codeset and filename
> codeset. Aren't filenames always encoded in the codeset specified by the
> current locale, unless some application explicitly creates them in a
> different codeset? (For example all Gtk+ based apps where you enter a
> filename in some widget and which don't call filename_from_utf8() before
> writing it to the disk.)

Your first and second sentences contradict each other.  As to your question: 
yes, normally, but glib provides Glib::filename_to_utf8() (and vice versa) so 
that you can do otherwise if you wish.  Since there is that option, I assume 
that there is some set of circumstances in which it may be desirable to have 
different codesets.  Forgetting to call Glib::filename_from_utf8() would not 
be one of them.

Some history: the glib developers have taken rather a high-handed approach to 
filename codesets, as represented by the name of the environment variable 
G_BROKEN_FILENAMES.  They would like you to use UTF-8 whatever your 
locale codeset (probably so that filenames are portable between systems - 
particularly relevant if you are using networked filesystems such as NFS or 
CIFS).  Most people ignore that advice.

> Well, maybe I'll just post the code I have written so far. Let me tell
> you that it doesn't work, I'm still getting conversion errors thrown at
> me as soon as I read something encoded in non-UTF-8 with umlauts in it:
>
> // Here I am reading files from the disk
>              file_info = dir.read_next(file_exists);
>              if (!file_exists)
>                  break;
>
>              filename = file_info->get_name();
>              if (filename.validate())
>              {
>                  Glib::setenv("G_FILENAME_ENCODING", "UTF-8");
>                  Glib::setenv("G_BROKEN_FILENAMES", "0");
>              }
>              else
>              {
>                  std::string charset;
>                  Glib::get_charset(charset);
>                  std::cout << "Current locale: " << charset << std::endl;
>                  Glib::setenv("G_FILENAME_ENCODING", charset);
>                  Glib::setenv("G_BROKEN_FILENAMES", "1");
>              }
> 	    // this call throws if the filename contains special chars
> 	    // and is not encoded in UTF-8, how can this happen??
>              filename = Glib::filename_to_utf8(filename);

Your code setting environment variables is pointless.  If you want to 
convert filenames from the locale codeset to UTF-8 (if the filename codeset is 
not UTF-8) as a mandatory policy in your program, use Glib::locale_to_utf8().  
It is bizarre to programmatically set G_BROKEN_FILENAMES or 
G_FILENAME_ENCODING so that Glib::filename_to_utf8() will do the same thing.
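
To illustrate the policy I mean, here is a rough, dependency-free sketch.  It 
does not use glibmm: is_valid_utf8() and latin1_to_utf8() are hypothetical 
stand-ins for Glib::ustring::validate() and Glib::locale_to_utf8(), and it 
assumes the locale codeset is ISO-8859-1 (as on your notebook).  In real code 
you would call the glibmm functions instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for Glib::ustring::validate():
// returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;   // continuation bytes expected after c
        unsigned long cp = 0;    // code point being decoded
        unsigned long min = 0;   // smallest code point legal for this length
        if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte
        if (i + extra >= s.size())
            return false;        // sequence truncated at end of string
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;    // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;        // overlong, out of range, or surrogate
        i += extra + 1;
    }
    return true;
}

// Hypothetical stand-in for Glib::locale_to_utf8(), ASSUMING the locale
// codeset is ISO-8859-1: every byte maps to the code point of the same value.
std::string latin1_to_utf8(const std::string& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// The policy: keep names that are already UTF-8, convert the rest from the
// locale codeset.  No environment variables are touched.
std::string filename_for_display(const std::string& raw)
{
    return is_valid_utf8(raw) ? raw : latin1_to_utf8(raw);
}
```

With this, a name stored as ISO-8859-1 bytes and the same name stored as 
UTF-8 both come out as the same UTF-8 string.  Bear in mind that the validity 
check is only a heuristic: a byte sequence in the locale codeset can 
occasionally happen to be valid UTF-8 as well.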

Though it is not relevant for the reasons mentioned above, you should not set 
both G_BROKEN_FILENAMES and G_FILENAME_ENCODING - you set one or the other.  
If you set both, glib resolves the conflict by choosing G_FILENAME_ENCODING.
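
As a sketch of that lookup order (my reading of the documented behaviour on 
Unix, not glib's actual source), with the hypothetical function 
pick_filename_charset() taking the two variables' values as arguments, 
nullptr meaning unset:

```cpp
#include <string>

// Models how the filename codeset is chosen from the two environment
// variables.  `filename_encoding` and `broken_filenames` are the values of
// G_FILENAME_ENCODING and G_BROKEN_FILENAMES (nullptr when unset);
// `locale_charset` is the codeset of the current locale.
std::string pick_filename_charset(const char* filename_encoding,
                                  const char* broken_filenames,
                                  const std::string& locale_charset)
{
    // G_FILENAME_ENCODING wins if set and non-empty.  It may hold a
    // comma-separated list; the first entry is the one used for filenames,
    // and the token "@locale" means the locale codeset.
    if (filename_encoding && *filename_encoding) {
        std::string first(filename_encoding);
        first = first.substr(0, first.find(','));
        return first == "@locale" ? locale_charset : first;
    }
    // Otherwise G_BROKEN_FILENAMES, if set at all, selects the locale codeset.
    if (broken_filenames)
        return locale_charset;
    // With neither variable set, filenames are taken to be UTF-8.
    return "UTF-8";
}
```

You could feed it std::getenv("G_FILENAME_ENCODING") and 
std::getenv("G_BROKEN_FILENAMES") to see what your own environment selects.  
Note that, as far as I know, only the presence of G_BROKEN_FILENAMES matters, 
not its value, so setting it to "0" as in your code does not switch it off.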

Chris



