Re: How to get character encoding...
- From: "David Necas (Yeti)" <yeti physics muni cz>
- To: Micah Carrick <email micahcarrick com>
- Cc: gtk-list <gtk-list gnome org>
- Subject: Re: How to get character encoding...
- Date: Sun, 29 May 2005 22:08:12 +0200
On Sun, May 29, 2005 at 01:56:26PM -0400, Micah Carrick wrote:
> Is there a routine I can use to determine the character encoding of a
> text file so I can then convert it to UTF-8 for display in a gtkTextView?
Generally, no. For a short text in an arbitrary language
and an arbitrary encoding, even a human may not be able to
determine the encoding.
It's quite easy to tell legacy 8-bit encodings apart from the
Unicode variants UTF-8, UTF-16 and UCS-4. Quite a few programs
can do it (e.g., file), although there's no such routine in
GLib AFAIK.
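
To illustrate the easy half of the problem, here is a minimal sketch in plain C that separates the Unicode family from "some legacy 8-bit encoding" by looking for a byte order mark and, failing that, checking whether the bytes are well-formed UTF-8. The names EncGuess, is_valid_utf8 and guess_unicode_family are made up for this example and are not GLib API. It assumes the whole text is in memory and does not try to recognize UTF-16 or UCS-4 text that lacks a byte order mark.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of GLib. */
typedef enum {
    ENC_UTF8,        /* UTF-8 BOM, or well-formed UTF-8 (plain ASCII included) */
    ENC_UTF16_LE,
    ENC_UTF16_BE,
    ENC_UCS4_LE,
    ENC_UCS4_BE,
    ENC_LEGACY_8BIT  /* none of the above; which 8-bit encoding is unknown */
} EncGuess;

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;     /* stray continuation byte or invalid lead byte */

        if (len - i <= n) return 0;  /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates and values past U+10FFFF. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += n + 1;
    }
    return 1;
}

/* Guess the encoding family from a BOM, falling back to a UTF-8 check. */
EncGuess guess_unicode_family(const unsigned char *buf, size_t len)
{
    /* 4-byte BOMs first: FF FE 00 00 also begins with the UTF-16LE BOM. */
    if (len >= 4 && memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0) return ENC_UCS4_LE;
    if (len >= 4 && memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0) return ENC_UCS4_BE;
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) return ENC_UTF8;
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0) return ENC_UTF16_LE;
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0) return ENC_UTF16_BE;
    if (is_valid_utf8(buf, len)) return ENC_UTF8;
    return ENC_LEGACY_8BIT;
}
```

Note that ENC_LEGACY_8BIT only says the text is neither BOM-marked nor valid UTF-8. It says nothing about which 8-bit encoding it is, and that is the hard part.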
But if you need to recognize legacy 8-bit encodings, you are
in trouble. I've written a program, Enca, that does it for
some East-European languages, but that's probably of little
help here. Various detection routines for Asian languages
can be found on the web, and methods to determine both
language and encoding exist too, but they need fairly
long, typical text. If it's reasonable to assume the text
is somehow related to the current locale, you can simply try
nl_langinfo(CODESET) from the non-Unicode version of that
locale. Or something like that, depending on the situation.
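
One way the locale fallback might look is sketched below. The helpers strip_codeset and legacy_locale_codeset are hypothetical names for this example. The sketch assumes the locale name has the usual language_TERRITORY.codeset@modifier shape and that the variant without a codeset (e.g. cs_CZ for cs_CZ.UTF-8) is actually installed; if it is not, the function reports failure and you are back to asking the user.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `name` into `out` with any ".codeset" part
 * removed, keeping an "@modifier" suffix ("de_DE.UTF-8@euro" -> "de_DE@euro").
 * Returns 0 on success, -1 if `out` is too small. */
int strip_codeset(const char *name, char *out, size_t outlen)
{
    const char *dot = strchr(name, '.');
    const char *at = strchr(name, '@');
    int n;

    if (dot == NULL || (at != NULL && at < dot)) {
        /* No codeset part to strip; copy the name unchanged. */
        n = snprintf(out, outlen, "%s", name);
    } else {
        n = snprintf(out, outlen, "%.*s%s",
                     (int)(dot - name), name, at != NULL ? at : "");
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Hypothetical helper: store in `out` the codeset of the user's LC_CTYPE
 * locale with its codeset part stripped. Returns 0 on success, -1 if that
 * locale is not available. Side effect: LC_CTYPE is set from the
 * environment, as GTK programs normally do at startup anyway. */
int legacy_locale_codeset(char *out, size_t outlen)
{
    char saved[128], plain[128];
    const char *cur = setlocale(LC_CTYPE, "");
    int rc = -1;

    if (cur == NULL) return -1;
    if (snprintf(saved, sizeof saved, "%s", cur) >= (int)sizeof saved) return -1;
    if (strip_codeset(saved, plain, sizeof plain) != 0) return -1;

    if (setlocale(LC_CTYPE, plain) != NULL) {
        int n = snprintf(out, outlen, "%s", nl_langinfo(CODESET));
        if (n >= 0 && (size_t)n < outlen) rc = 0;
        setlocale(LC_CTYPE, saved);  /* restore the user's real locale */
    }
    return rc;
}
```

The resulting name is only a guess at the file's encoding, to be offered as a default rather than applied silently.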
In all cases, if the file is user-supplied, allow the user to
choose the encoding.
Yeti
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?