Re: Detecting file encoding
- From: Jonathon Jongsma <jonathon quotidian org>
- To: Adrián Ortega <elfus0 1 gmail com>
- Cc: gtkmm-list gnome org
- Subject: Re: Detecting file encoding
- Date: Wed, 11 Aug 2010 16:04:54 -0500
On Wed, 2010-08-11 at 15:21 -0500, Adrián Ortega wrote:
> Hello,
>
>
> I'm writing a small text editor to learn as much as I can about gtkmm,
> and I've come across a problem that I haven't been able to solve.
>
>
> The main issue is I don't know how to detect the file encoding of a
> given file. I've been reading a lot about this, and found that glibmm
> has some functions that could help me, i.e.
>
> bool get_charset ()
> bool get_charset (std::string& charset)
> std::string convert (const std::string& str,
>                      const std::string& to_codeset,
>                      const std::string& from_codeset)
> Glib::ustring locale_to_utf8 (const std::string& opsys_string)
> std::string locale_from_utf8 (const Glib::ustring& utf8_string)
>
> However, I haven't been able to detect the encoding of a file. I know
> these functions help me convert from one encoding to another, but for
> that I need to know the file's current encoding.
> Do you have any idea, suggestion or reference that could help me?
> Sorry if this is not strictly about gtkmm, but I think it's somewhat
> related to glibmm.
> Thanks in advance!
GLib doesn't really provide any way to do this reliably. It's not a
simple problem to solve, since the same bytes are often valid in several
encodings. ICU can do this
(http://userguide.icu-project.org/conversion/detection), and Mozilla
also has its own character set detection algorithms
(http://www.mozilla.org/projects/intl/chardet.html). But most people
are not very excited about adding a dependency on either of those, so
some applications (e.g. gedit, I believe) just do a poor-man's charset
detection: they try a few common encodings and use the first one that
converts without error, which is often good enough in practice.
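
For what it's worth, the try-a-few-and-take-the-first idea can be
sketched roughly as below. This is not gedit's code, just a minimal
illustration under some assumptions: the names is_valid_utf8,
accepts_anything, Candidate and guess_charset are made up, and the
candidate list (UTF-8, then ISO-8859-1 as a catch-all) is only an
example. In a glibmm program each attempt would more likely be a
Glib::convert() call to UTF-8 from the candidate charset, moving on to
the next candidate when it throws Glib::ConvertError; the sketch uses a
hand-written UTF-8 check instead so that it has no dependencies.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. Rejects overlong forms,
// UTF-16 surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;     // continuation bytes expected
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest value legal for this length

        if (c < 0x80)                { i += 1; continue; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
        else return false;         // stray continuation or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}

// Catch-all check: a conversion from ISO-8859-1 accepts every byte value,
// so this candidate never fails and must come last.
bool accepts_anything(const std::string&) { return true; }

// Hypothetical candidate entry: a charset name plus a check that tells
// whether the bytes could be in that charset.
struct Candidate {
    const char* name;
    bool (*accepts)(const std::string&);
};

// Returns the name of the first candidate that accepts the bytes.
std::string guess_charset(const std::string& bytes)
{
    static const Candidate candidates[] = {
        { "UTF-8",      is_valid_utf8 },
        { "ISO-8859-1", accepts_anything },
    };
    for (const Candidate& c : candidates) {
        if (c.accepts(bytes)) return c.name;
    }
    return "";  // not reached while a catch-all is in the list
}
```

The order matters: strict encodings such as UTF-8 go first because
random non-UTF-8 data rarely passes the check, while a single-byte
charset accepts anything and so can only serve as a fallback. A result
from the fallback is a guess, not a detection, so an editor would
normally still let the user pick the encoding by hand.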