Re: ustring::validate() costs?



On Tue, 2005-12-20 at 23:46 +0100, Ole Laursen wrote:
> Stephan Puchegger <stephan puchegger univie ac at> writes:
> 
> >> It would be nice to figure out in my program for /each/ file I read,
> >> in which character set it is encoded. Is this possible? I only found
> >> functions so far which can either read the locale's character set or
> >> check if some filename is valid UTF-8 (or not), but no function
> >> which individually probes for a certain file in which character set
> >> its filename is encoded.
> >
> > I am no expert in "character-string encodings", but I guess that this is 
> > not possible, since no string contains information about the actual 
> > encoding type. The only thing it contains is the encoded string itself. 
> > The encoding type is usually taken from the locale if I am not 
> > completely mistaken.
> 
> This is an old thread but: most browsers have an auto-detect option for
> guessing the encoding of web pages because web authors are sloppy and
> forget to specify what encoding they are using.
> 
> With file names, you would only have very little data to base the
> guess on, but on the other hand you probably only have to worry about
> two encodings, UTF-8 and the encoding of the locale. So it should be
> doable. In fact, I think trying UTF-8 and then falling back to the
> encoding of the locale if the file name is not valid UTF-8 will get
> you through most cases intact, at least for European languages.

I _think_ that regexxer (a gtkmm app) guesses encodings and has a
fallback. gedit probably does too, because plain text files don't have
encoding information.
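
For illustration, the "try UTF-8, then fall back" idea that Ole describes
could be sketched roughly as below. This is only a minimal standalone
sketch, not what regexxer or gedit actually do. The function names
(is_valid_utf8, latin1_to_utf8, filename_to_display_utf8) are made up for
this example, and ISO-8859-1 is assumed as a stand-in for "the encoding of
the locale". In a glibmm program, Glib::ustring::validate() and
Glib::locale_to_utf8() would presumably take the place of the two helpers.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects truncated sequences,
// overlong forms, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // outside Unicode
        i += len;
    }
    return true;
}

// Assumed fallback: treat every byte as ISO-8859-1 and re-encode as UTF-8.
// A real program would convert from the locale's actual character set.
std::string latin1_to_utf8(const std::string& s) {
    std::string out;
    out.reserve(s.size() * 2);
    for (char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Try UTF-8 first; if the bytes are not valid UTF-8, use the fallback.
std::string filename_to_display_utf8(const std::string& raw) {
    return is_valid_utf8(raw) ? raw : latin1_to_utf8(raw);
}
```

Note that this can still guess wrong: a short name in a legacy encoding may
happen to be valid UTF-8 as well, so the result is a best effort rather than
real detection.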

-- 
Murray Cumming
murrayc murrayc com
www.murrayc.com
www.openismus.com



