How to deal with different encodings?
- From: Debajyoti Bera <dbera web gmail com>
- To: Beagle <dashboard-hackers gnome org>
- Subject: How to deal with different encodings?
- Date: Tue, 1 Apr 2008 09:14:08 -0400
Hey folks,
We are having trouble deciding (*) how to handle files whose encoding differs
from the system encoding. By default, we use UTF-8 everywhere and assume all
data is UTF-8. Some file formats and data sources declare their encoding
(emails, HTML files, office documents, etc.), so those are not a problem.
If filenames and similar metadata are not in UTF-8, a lot of non-Beagle
software breaks too; we are trying to use MONO_EXTERNAL_ENCODINGS to handle
this case (**).
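For context: MONO_EXTERNAL_ENCODINGS holds a colon-separated list of
encodings (for example `utf8:latin1`) that Mono tries, in order, when turning
externally generated text such as filenames into Unicode. The Python sketch
below only illustrates that try-in-order idea; it assumes nothing about Mono's
internals, and `decode_external` is a hypothetical helper, not Beagle or Mono
code.

```python
import os
from typing import Optional


def decode_external(raw: bytes, spec: Optional[str] = None) -> Optional[str]:
    """Decode raw bytes by trying each encoding in a colon-separated list.

    Hypothetical helper that mimics the idea behind MONO_EXTERNAL_ENCODINGS;
    it is not Mono's implementation. Returns None if no encoding fits.
    """
    if spec is None:
        # Fall back to plain UTF-8 when the variable is not set.
        spec = os.environ.get("MONO_EXTERNAL_ENCODINGS", "utf8")
    for name in spec.split(":"):
        name = name.strip()
        if not name:
            continue  # tolerate empty entries such as "utf8::latin1"
        try:
            return raw.decode(name)  # strict: fails on invalid byte sequences
        except (UnicodeDecodeError, LookupError):
            continue  # invalid bytes for this encoding, or unknown codec name
    return None
```

With `decode_external(b"caf\xe9", "utf8:latin1")` the strict UTF-8 attempt
fails and Latin-1 yields `café`. Because Latin-1 accepts every byte sequence,
it only makes sense as the last entry in such a list.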
For other files, depending on the file format, either UTF-8 or the platform
encoding is used. It's really a clumsy affair. Apparently Windows XP has a
system setting "how should I handle non-unicode programs" where it is possible
to assign an ISO8859-1 codepage. I have no idea how it determines whether data
is in a non-UTF-8 encoding. So a user's system encoding may be one thing, while
file data and metadata use a completely different encoding. It's a perfect
encoding mess :-/.
I know it's not always possible to determine the right encoding. We could add
a BEAGLE_LANG variable which, if set, would specify the encoding to use when
extracting data, regardless of the system encoding. Most apps will probably
fail to display that data, but as an indexer, how far should Beagle push its
indexing ability?
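One possible decoding order for the proposed BEAGLE_LANG variable, sketched
in Python under loudly stated assumptions: BEAGLE_LANG (hypothetically) holds
a codec name such as `iso8859-1`; it is tried first, then strict UTF-8, then
the platform encoding reported by `locale.getpreferredencoding(False)` with
undecodable bytes replaced. `decode_file_data` and this fallback order are
illustrative only, not existing Beagle behavior.

```python
import locale
import os
from typing import Mapping, Optional


def decode_file_data(raw: bytes, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical decoding order for extracted file data:

    1. BEAGLE_LANG (the proposed variable), if set and usable;
    2. strict UTF-8;
    3. the platform's preferred encoding, replacing undecodable bytes.
    """
    if env is None:
        env = os.environ
    override = env.get("BEAGLE_LANG")
    if override:
        try:
            # The override wins regardless of the system encoding.
            return raw.decode(override)
        except (UnicodeDecodeError, LookupError):
            pass  # bad bytes or unknown codec name: fall through, don't drop the file
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Last resort: never fails, but may contain U+FFFD replacement characters.
    platform_encoding = locale.getpreferredencoding(False)
    return raw.decode(platform_encoding, errors="replace")
```

Passing the environment as a mapping keeps the function testable. Note that a
set override wins even for valid UTF-8 input, which is exactly the trade-off
raised above: the indexed text may then look wrong to other applications.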
Any suggestions on how to get as close as possible to the right encoding?
- dBera
(*) http://bugzilla.gnome.org/show_bug.cgi?id=524077
(**) "non UTF8 folders are not indexed" - in progress -
http://bugzilla.gnome.org/show_bug.cgi?id=440458
--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE / Mandriva / Inspiron-1100