Re: ustring::validate() costs?

From: Murray Cumming <murrayc murrayc com>
To: Ole Laursen <olau hardworking dk>
Cc: gtkmm-list gnome org
Subject: Re: ustring::validate() costs?
Date: Wed, 21 Dec 2005 09:02:32 +0100

On Tue, 2005-12-20 at 23:46 +0100, Ole Laursen wrote:
> Stephan Puchegger <stephan puchegger univie ac at> writes:
> 
> >> It would be nice to figure out in my program for /each/ file I read,
> >> in which character set it is encoded. Is this possible? I only found
> >> functions so far which can either read the locale's character set or
> >> check if some filename is valid UTF-8 (or not), but no function
> >> which individually probes for a certain file in which character set
> >> its filename is encoded.
> >
> > I am no expert in "character-string encodings", but I guess that this is 
> > not possible, since no string contains information about the actual 
> > encoding type. The only thing it contains is the encoded string itself. 
> > The encoding type is usually taken from the locale if I am not 
> > completely mistaken.
> 
> This is an old thread but: most browsers have an auto-detect option for
> guessing the encoding of web pages because web authors are sloppy and
> forget to specify what encoding they are using.
> 
> With file names, you would only have very little data to base the
> guess on, but on the other hand you probably only have to worry about
> two encodings, UTF-8 and the encoding of the locale. So it should be
> doable. In fact, I think trying UTF-8 and then falling back to the
> encoding of the locale if the file name is not valid UTF-8 will get
> you through most cases intact, at least for European languages.

I _think_ that regexxer (a gtkmm app) guesses encodings and has a
fallback. gedit probably does too, because plain text files don't have
encoding information.
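
For what it's worth, the "try UTF-8, then fall back to the locale's
encoding" idea Ole describes could look roughly like the sketch below.
It is only an illustration, not what regexxer or gedit actually do. To
keep it self-contained it uses a hand-written UTF-8 check and assumes
the fallback encoding is ISO-8859-1 (Latin-1); the function names
is_valid_utf8(), latin1_to_utf8() and filename_to_utf8_guess() are made
up for this example. In a real glibmm program you would more likely
call Glib::ustring::validate() for the check and Glib::locale_to_utf8()
for the fallback conversion.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects overlong forms,
// UTF-16 surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                 // stray continuation or invalid lead byte
        if (i + len > n) return false;     // sequence truncated at end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;     // overlong encodings
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && cp < 0x10000) return false;
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogates
        i += len;
    }
    return true;
}

// ASSUMPTION: the locale's encoding is ISO-8859-1, where every byte
// maps directly to the Unicode code point with the same value.
std::string latin1_to_utf8(const std::string& s)
{
    std::string out;
    for (char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: keep the name if it is already valid UTF-8,
// otherwise treat it as being in the (assumed Latin-1) locale encoding.
std::string filename_to_utf8_guess(const std::string& raw)
{
    return is_valid_utf8(raw) ? raw : latin1_to_utf8(raw);
}
```

The guess can still be wrong: a short Latin-1 name may by chance be
valid UTF-8 too, and a pure-ASCII name is identical in both encodings
anyway. With so little data per file name, that is about the best a
heuristic like this can do.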

-- 
Murray Cumming
murrayc murrayc com
www.murrayc.com
www.openismus.com
