Re: [gedit-list] Fix Bug 342918 and even support universal encoding detection



Whatever will not make me touch that code again is fine by me :)

On Tue, Oct 16, 2012 at 5:14 PM, Jesse van den Kieboom <jessevdk gnome org> wrote:
2012/10/16 Ma Xiaojun <damage3025 gmail com>
On Tue, Oct 16, 2012 at 9:14 AM, Jesse van den Kieboom
<jessevdk gnome org> wrote:
> Well, that's not what it says in the blog post you refer to (there it just
> mentions the list), but sure, I didn't look further into it. Maybe you can
> clarify what "using statistics" means?
It means that, instead of trial and error over a limited encoding list,
the prober tries all possible encodings.
The prober will often find that the document is valid in multiple encodings.
However, since text documents and web pages are more or less filled with
natural language, statistics can tell which encoding is most likely.
For example, there are two major Chinese encoding families:
GB (2312/K/18030) and BIG5 (-HKSCS).
If a user uses the wrong encoding to interpret a text file, she will see
many rarely used, weird Chinese characters.
Then she may realize the encoding was wrong and try to open the same
file with GB* or UTF-8.
A computer program can do something similar once it notices that the
text file is not UTF-8.
It can try both GB* and BIG5*, query a character frequency database,
and guess that the file is in the encoding that yields more common
characters.
A more advanced language model is also possible.
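
To make the idea concrete, here is a rough Python sketch of the frequency
approach described above. It is only an illustration, not how uchardet or
the Mozilla prober is actually implemented: the names COMMON_CHARS, score
and guess_encoding are made up for this example, and the tiny character set
stands in for a real character frequency database.

```python
# Hypothetical, tiny stand-in for a real character frequency database.
COMMON_CHARS = set(
    "的一是不了人我在有他这中大来上国个到说们"  # common simplified forms
    "這來國個說們"                            # traditional counterparts
)


def score(text):
    """Return the fraction of non-ASCII characters found in COMMON_CHARS."""
    non_ascii = [ch for ch in text if ord(ch) > 127]
    if not non_ascii:
        return 0.0
    return sum(ch in COMMON_CHARS for ch in non_ascii) / len(non_ascii)


def guess_encoding(data, candidates=("gb18030", "big5")):
    """Return the most plausible encoding name, or None if nothing decodes."""
    # First check whether the bytes are valid UTF-8 at all.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    best, best_score = None, -1.0
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # the bytes are not even valid in this encoding
        current = score(text)
        if current > best_score:
            best, best_score = name, current
    return best
```

Calling guess_encoding(raw_bytes) on the contents of a file returns the
candidate whose decoding contains the largest share of common characters.
A real prober would use much larger frequency tables and more candidates.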


> gedit is not going to link to a KDE library (just as KDE apps are not going
> to link to gobject libs) because it pulls in too many dependencies, not to
> mention that I'm guessing it's written in C++. With regards to uchardet, it
> would be very interesting to use it, but we'll have to see if it's possible.
> For one, is the MPL compatible with GPL? I see that there is a uchardet
> package in ubuntu at least, I don't know about other distros (also the
> version of the package is 0.0.1). Traditionally, gedit does not depend on
> much software outside of the GNOME core libraries, but I don't see anyone
> reimplementing something like this any time soon.
Well, the original Mozilla code (the upstream of uchardet) is
MPL 1.1/GPL 2.0/LGPL 2.1.
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/
Is this acceptable to gedit?
Could we ask the author of uchardet to preserve the triple license?
Or we could fork from Mozilla directly, choose GPL or LGPL, and make a
GNOME-style library, say using GObject Introspection.

So, looking at the code, it seems the actual C wrapper thingie is in fact triple licensed. It's just the project as a whole that has the MPL 1.1 license (whether or not that is valid, I'm not a license expert). In any case, the C wrapper seems sufficiently easy to reimplement in GObject C, and I think it would be quite valuable. Maybe we can have some input from Paolo and Nacho before continuing down this path? (To me it seems like a good option, at least to try.)


_______________________________________________
gedit-list mailing list
gedit-list gnome org
https://mail.gnome.org/mailman/listinfo/gedit-list




--
Ignacio Casal Quinteiro

