Re: [Tracker] Automatic Language Detection

I think I got it -- new patch in bugzilla.

> > > I just wrote a patch for #377891[1], could I get some of you to test
> > > it.  I ran some pdfs I found with and, and it
> > > seems to be working correctly...but more eyes the better.
> Both from
> > great stuff but we only support utf-8 - are all those language modules
> > utf-8 based?
> """Our main focus will be on compiling a list of fingerprints of UTF-8
> encoded languages, since Unicode is clearly the way to go and UTF-8 is
> usually the best way to do Unicode."""
> It works (for my tests) if I encode the buffer to UTF-8 first, and
> I've been able to get away with just sending the first 1K of the file.
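
One thing to watch with the "first 1K" approach: once the buffer is UTF-8, a hard cut at 1024 bytes can land in the middle of a multi-byte character. Below is a rough sketch of one way to trim the sample back to a character boundary. The names utf8_sample_length and SAMPLE_MAX_BYTES are made up for illustration and are not taken from the patch, and the code assumes the buffer has already been converted to valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical cap, taken from the "first 1K" mentioned in the thread. */
#define SAMPLE_MAX_BYTES 1024

/*
 * Return how many bytes of buf (len bytes, assumed to be valid UTF-8)
 * to hand to the language detector: at most max_bytes, and never
 * ending in the middle of a multi-byte character.
 */
size_t
utf8_sample_length (const char *buf, size_t len, size_t max_bytes)
{
    size_t n;

    /* Short buffers are passed through whole. */
    if (len <= max_bytes)
        return len;

    n = max_bytes;

    /*
     * buf[n] is the first byte that would be dropped. If it is a
     * continuation byte (bit pattern 10xxxxxx), the cut falls inside
     * a character, so back up until buf[n] starts a character.
     */
    while (n > 0 && ((unsigned char) buf[n] & 0xC0) == 0x80)
        n--;

    return n;
}
```

A caller would pass the first utf8_sample_length (buf, len, SAMPLE_MAX_BYTES) bytes to the detector instead of a fixed 1024. In GLib code the same check could probably be done with the g_utf8_* helpers; this version sticks to plain C so it stands alone.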

Before I accept the patch, can you:

1) just include langs we have stopwords/stemmers for
2) check and verify each lang we support with utf8 content
3) if (2) fails, use g_convert to convert from utf8 to the necessary charset

I will fiddle with it once you have done the above.


