Re: spell checking (long, sorry...)



Hi Christophe, hi all!

Inspired by your mail, I tried spell checking on my box (LinuxPPC 2000, glibc
2.1.3). First the good news: it works for me, and after I installed the German
language package from http://aspell.sourceforge.net it even works in German.
Here is a list of the *spell-related RPMs and libs on my box:

[albrecht@regulus albrecht]$ rpm -qa | grep spell
pspell-ispell-devel-0.10.2-0_helix_1
pspell-ispell-0.10.2-0_helix_1
ispell-german-3.1.20-25
pspell-devel-0.11.1-0_helix_2
pspell-0.11.1-0_helix_2
ispell-3.1.20-25
aspell-devel-0.32.1-0_helix_1
aspell-0.32.1-0_helix_1
[albrecht@regulus albrecht]$ /sbin/ldconfig -p | grep spell
        libpspell_ispell.so.0 (libc6) => /usr/lib/libpspell_ispell.so.0
        libpspell_ispell.so (libc6) => /usr/lib/libpspell_ispell.so
        libpspell_aspell.so.1 (libc6) => /usr/lib/libpspell_aspell.so.1
        libpspell_aspell.so (libc6) => /usr/lib/libpspell_aspell.so
        libpspell.so.2 (libc6) => /usr/lib/libpspell.so.2
        libpspell.so (libc6) => /usr/lib/libpspell.so
        libpspell-modules.so.1 (libc6) => /usr/lib/libpspell-modules.so.1
        libpspell-modules.so (libc6) => /usr/lib/libpspell-modules.so
        libpspell-impl.so.3 (libc6) => /usr/lib/libpspell-impl.so.3
        libpspell-impl.so (libc6) => /usr/lib/libpspell-impl.so
        libaspell.so.7 (libc6) => /usr/lib/libaspell.so.7
        libaspell.so (libc6) => /usr/lib/libaspell.so

But now for the bad news... for me it does not work for words with umlauts
(German national characters). Looking into src/spell-check.c, line 1200, I
found the following regexp to isolate words:

    const gchar *new_word_regex = "\\<[[:alpha:]']*\\>";

Apparently, my glibc implementation (yes, LANG/LC_ALL are de_DE.ISO-8859-1)
recognises umlauts neither in the regexp nor in a call to isalpha().
I am not sure if this changed in glibc 2.2. Changing the expression to
"\\<[[:alpha:]äöüÄÖÜß']*\\>" helps a little, as most words are now recognised.
The exceptions are words *starting* with an umlaut (like "ähnlich")...
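
One way to narrow this down is to test isalpha() outside of Balsa. The sketch
below is not taken from the Balsa sources; count_alpha_bytes and
report_umlauts are hypothetical helper names. It assumes ISO-8859-1 byte
values for the umlauts and relies on two points that are easy to miss:
isalpha() only follows the locale after a successful setlocale() call, and its
argument has to be representable as unsigned char, so a plain (signed) char
holding 0xE4 must be cast first.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Count how many of the first n bytes isalpha() accepts in the
 * LC_CTYPE locale that is active at the time of the call. */
static size_t count_alpha_bytes(const unsigned char *bytes, size_t n)
{
    size_t i, hits = 0;

    assert(bytes != NULL);
    for (i = 0; i < n; i++) {
        /* bytes[i] is unsigned char, so values >= 0x80 are passed safely */
        if (isalpha(bytes[i]))
            hits++;
    }
    return hits;
}

/* Hypothetical diagnostic: switch LC_CTYPE to locale_name and report how
 * many of the ISO-8859-1 umlaut bytes count as letters there.
 * Returns the number of accepted bytes, or -1 if the locale is missing. */
static int report_umlauts(const char *locale_name)
{
    /* Assumption: ISO-8859-1 codes for a-, o-, u-umlaut (lower and upper
     * case) and sharp s */
    static const unsigned char umlauts[] = {
        0xE4, 0xF6, 0xFC, 0xC4, 0xD6, 0xDC, 0xDF
    };
    size_t hits;

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        printf("locale %s is not available\n", locale_name);
        return -1;
    }
    hits = count_alpha_bytes(umlauts, sizeof umlauts);
    printf("%s: isalpha() accepts %lu of %lu umlaut bytes\n", locale_name,
           (unsigned long) hits, (unsigned long) sizeof umlauts);
    return (int) hits;
}
```

Calling report_umlauts("de_DE.ISO-8859-1") and then report_umlauts("C") shows
whether the locale is installed at all and whether it changes the
classification. The first locale name is the one quoted above and may be
spelled differently on other systems.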

Another problem might be the "empty word separator" expressions (\< and \>).
During the discussions about the URL regexps it emerged that there are
probably more people around whose regexp implementation does not support this
feature. So I guess we should think about rewriting this part of the code, and
maybe replace the regexec stuff with something hard-coded. However, if the
isalpha implementation has not changed in recent glibc versions, then we have
the problem that we would have to hand-code all national character sets...
Opinions?
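
To make the hard-coded alternative concrete, here is a minimal sketch of a
scanner that finds words without regexec and without \< and \>. It is only an
illustration: is_word_char and next_word are hypothetical names, not existing
Balsa functions, and the scanner still relies on isalpha(), so it inherits the
locale problem described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* A word character is a letter in the current LC_CTYPE locale or an
 * apostrophe, roughly mirroring the class [[:alpha:]'] of the regexp. */
static int is_word_char(char c)
{
    return c == '\'' || isalpha((unsigned char) c);
}

/* Find the next word at or after text[pos]; pos must not exceed the
 * length of text. On success, store the word's offset and length and
 * return 1. Return 0 if no further word exists. */
static int next_word(const char *text, size_t pos, size_t *start, size_t *len)
{
    size_t i = pos;

    assert(text != NULL && start != NULL && len != NULL);

    /* Skip everything that cannot be part of a word */
    while (text[i] != '\0' && !is_word_char(text[i]))
        i++;
    if (text[i] == '\0')
        return 0;

    /* Consume the run of word characters */
    *start = i;
    while (text[i] != '\0' && is_word_char(text[i]))
        i++;
    *len = i - *start;
    return 1;
}
```

A caller would loop with pos = start + len until next_word returns 0. Note
that a lone apostrophe counts as a word here, so a real implementation would
probably want to filter such matches before handing them to the checker.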

On 05.07.2001 18:50:00, christophe barbé wrote:
> > I admit though that I crashed balsa/pspell/aspell a number of times but
> > never had time to look closer at the problem...

I can confirm that I got a segfault after using the spell checker... I have
not been able to reproduce it yet.

Cheers, Albrecht.
 
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Albrecht Dreß  -  Monschauer Straße 22  -  D-53121 Bonn (Germany)
      Phone (+49) 228 6199571  -  E-Mail albrecht.dress@arcormail.de
_________________________________________________________________________



