Portable Spell Checker Interface Library




I have been working on the specs for a Portable Spell Checker Interface
Library.  I thought I would share it with the GNOME team in case you have
any suggestions to make other than writing it in pure C.





               Portable Spell Checker Interface Library               
                                                                      
                            Kevin Atkinson                            
                          kevinatk@home.com                           

                            March 9, 2000                             
                             (Revision 2)                             


1 Goal

The goal of the library is to provide a generic interface to the spell
checker libraries installed on the system.

2 Overview

The Pspell library contains two main classes and several helper
classes. The two main classes are PspellConfig and PspellManager. The
PspellConfig class is used to set initial defaults and to change spell
checker specific options. The PspellManager class does most of the
real work. It is responsible for managing the dictionaries, checking if
a word is in the dictionary, and coming up with suggestions, among
other things. There are many helper classes; the important ones are
PspellWordList, PspellMutableWordList, and Pspell*Emulation. The
PspellWordList class is used for accessing the suggestion list, as
well as the personal and session word lists currently in use. The
PspellMutableWordList class is used to manage the personal, and perhaps
other, word lists. The Pspell*Emulation classes are used for iterating
through a list.

A C interface will also be provided, as well as a few STL-like helper
classes for those who prefer more modern C++.

3 Usage

When your application first starts you should get a new configuration
class with the command:

    PspellConfig * spell_config = new_pspell_config();

which will create a new PspellConfig class. It is allocated with new
and it is your responsibility to delete it with either
delete_pspell_config or the standard C++ delete. Once you have the
config class you should set some variables. The most important one is
the language variable. To do so use the command:

    spell_config->replace("lang", "en-US");

which will set the default language to American English. The
language is expected to be the standard two letter ISO 639 language
code, with an optional two letter ISO 3166 country code after a dash
or underscore. Other things you might want to set are the encoding of
``char'' strings, the preferred spell checker to use, the search path
for dictionaries, and the like.
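
The accepted form of the language value can be made concrete with a small
parser. This is only a sketch: parse_lang and LangTag are hypothetical names
that are not part of the proposed interface, and the case normalization it
performs is an assumption rather than something the specification requires.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of the Pspell interface: splits a "lang"
// value such as "en-US", "en_US", or "nl" into its two parts.
struct LangTag {
  bool valid;
  std::string language;  // two letter ISO 639 code, lowercased
  std::string country;   // two letter ISO 3166 code, uppercased, or empty
};

static bool two_letters(const std::string & s) {
  return s.size() == 2 &&
         std::isalpha(static_cast<unsigned char>(s[0])) &&
         std::isalpha(static_cast<unsigned char>(s[1]));
}

LangTag parse_lang(const std::string & value) {
  LangTag tag = {false, "", ""};
  std::string lang = value.substr(0, 2);  // substr clamps short input
  if (!two_letters(lang)) return tag;
  std::string country;
  if (value.size() > 2) {
    // A country code must follow exactly one '-' or '_' separator.
    if (value.size() != 5 || (value[2] != '-' && value[2] != '_')) return tag;
    country = value.substr(3, 2);
    if (!two_letters(country)) return tag;
  }
  for (char & c : lang)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  for (char & c : country)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  tag.valid = true;
  tag.language = lang;
  tag.country = country;
  return tag;
}

// Usage: both separators are accepted and the country part is optional.
void demo_parse_lang() {
  assert(parse_lang("en-US").country == "US");
  assert(parse_lang("en_US").language == "en");
  assert(parse_lang("nl").valid && parse_lang("nl").country.empty());
  assert(!parse_lang("english").valid);
}
```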

Whenever a new document is created a new PspellManager class should
also be created. There should be one manager class per document. To
create a new manager class use the command:

    PspellManager * spell_checker = new_pspell_manager(spell_config);

which will create a new PspellManager class using the defaults found
in spell_config. If for some reason you want to use different defaults
simply clone spell_config and change the setting like so:

    PspellConfig * spell_config2 = spell_config->clone();
    spell_config2->replace("lang","nl");
    PspellManager * spell_checker = new_pspell_manager(spell_config2);
   
    delete_pspell_config(spell_config2);

Once the manager class is created you can use the check method to see
if a word in the document is correct like so:

    bool correct = spell_checker->check(<word>);

<word> can be any one of const char *, const u16int *, or const u32int
*, where u16int and u32int are the unsigned 16 and 32 bit integer types
on the current platform, respectively. Strings of const char * are
expected to use iso8859-1 or some other 8-bit (256 character) character
set as determined by the current language in use. Other encodings, such
as UTF-8, are allowed but they must be explicitly set via a
configuration option before their first use. Strings of const u16int *
and const u32int * are expected to be in Unicode.

If the word is not correct then the suggest method can be used to come
up with likely replacements:

    PspellWordList & suggestions = spell_checker->suggest(<word>);
    PspellStringEmulation * elements = suggestions.elements();
    const char * word;
    while ( (word = elements->next()) != NULL ) {
      // add to suggestion list
    }
    delete elements;

(It is also possible to access elements as const u16int *, or const
u32int *. See the class reference section for how to do so.)
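
The loop above can be tried out against stand-in classes. In the sketch below
StringEmulation and FixedEmulation are assumptions that only mimic the next()
and at_end() methods described in the class reference, and collect_suggestions
is a hypothetical helper. It copies each word right away, since the returned
pointer is only promised to be valid until the next call, and it uses
std::unique_ptr so the emulation is deleted on every exit path.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the proposed PspellStringEmulation (an assumption).
class StringEmulation {
public:
  virtual ~StringEmulation() {}
  virtual const char * next() = 0;  // null once the list is exhausted
  virtual bool at_end() const = 0;
};

// Toy emulation over a fixed list of words, for illustration only.
class FixedEmulation : public StringEmulation {
  std::vector<std::string> words_;
  std::size_t pos_;
public:
  explicit FixedEmulation(const std::vector<std::string> & words)
    : words_(words), pos_(0) {}
  const char * next() { return at_end() ? nullptr : words_[pos_++].c_str(); }
  bool at_end() const { return pos_ == words_.size(); }
};

// Takes ownership of the emulation, as the caller of elements() would.
std::vector<std::string> collect_suggestions(StringEmulation * raw) {
  std::unique_ptr<StringEmulation> elements(raw);
  std::vector<std::string> result;
  const char * word;
  while ((word = elements->next()) != nullptr) {
    result.push_back(word);  // copy now; the pointer may not stay valid
  }
  return result;
}

// Usage: the words come back in the order the emulation yields them.
void demo_collect_suggestions() {
  std::vector<std::string> words;
  words.push_back("begging");
  words.push_back("begin");
  std::vector<std::string> got = collect_suggestions(new FixedEmulation(words));
  assert(got == words);
}
```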

Once a replacement is made the store_repl method should be used to
communicate the replacement pair back to the spell checker (see
section 7.1 for why). Its usage is as follows:

    spell_checker->store_repl(<misspelled word>, <correctly spelled word>);

If the user decides to add the word to the session or personal
dictionary then the word can be added using the add_to_session or
add_to_personal methods respectively, like so:

    spell_checker->add_to_session(<word>);    // or
    spell_checker->add_to_personal(<word>);

It is better to let the spell checker manage these words rather than
doing it yourself so that the words have a chance of appearing in the
suggestion list.

Finally, when the document is closed the PspellManager class should be
deleted like so:

    delete_pspell_manager(spell_checker);

The standard C++ delete may also be used.
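
The one manager per document rule can be tied to object lifetime. The sketch
below is an assumption-laden illustration, not part of the proposal:
PspellManager is an empty stand-in, new_pspell_manager takes no config
argument here, and live_managers, ManagerDeleter, and Document are
hypothetical names. It shows a std::unique_ptr with a custom deleter that
calls delete_pspell_manager when the owning document goes away.

```cpp
#include <cassert>
#include <memory>

// Empty stand-ins for the proposed class and its factory functions.
struct PspellManager {};
static int live_managers = 0;  // counts managers that are still allocated

PspellManager * new_pspell_manager() {
  ++live_managers;
  return new PspellManager;
}

void delete_pspell_manager(PspellManager * m) {
  --live_managers;
  delete m;
}

// Deleter so that std::unique_ptr releases through the library function.
struct ManagerDeleter {
  void operator()(PspellManager * m) const { delete_pspell_manager(m); }
};
typedef std::unique_ptr<PspellManager, ManagerDeleter> ManagerPtr;

// Hypothetical document type: each document owns exactly one manager.
struct Document {
  ManagerPtr spell_checker;
  Document() : spell_checker(new_pspell_manager()) {}
};

// Two open documents hold two managers.
int managers_while_open() {
  Document a;
  Document b;
  return live_managers;
}

// Usage: closing the documents releases their managers.
void demo_document_lifetime() {
  assert(managers_while_open() == 2);
  assert(live_managers == 0);
}
```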

4 Class Reference

Methods that return a bool generally return false on error and true
otherwise. To find out what went wrong use the error_num and
error_message methods. Unless otherwise stated, methods that return a
const char * will return null on error. The character string returned
is only valid until the next method which returns a const char * is
called.

STRING is used to represent one of const char *, const u16int *, or
const u32int *.

All methods are virtual and abstract, thus these classes are really
abstract base classes. Therefore you cannot simply store the object
directly. In order to make copies of the objects use the clone and
assign methods if they are provided.
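
A minimal sketch of the clone and assign pattern follows. Config and
SimpleConfig are hypothetical stand-ins (the lang and set_lang accessors are
not part of the proposed interface); the point is only that a copy is made
through the base class pointer, and that assign assumes both objects have
the exact same type.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical abstract base class in the style described above.
class Config {
public:
  virtual ~Config() {}
  virtual Config * clone() const = 0;
  virtual void assign(const Config * other) = 0;
  virtual std::string lang() const = 0;
  virtual void set_lang(const std::string & lang) = 0;
};

class SimpleConfig : public Config {
  std::string lang_;
public:
  explicit SimpleConfig(const std::string & lang) : lang_(lang) {}
  // clone returns a new object of the same concrete type.
  Config * clone() const { return new SimpleConfig(*this); }
  // Only valid when other really is a SimpleConfig; anything else is
  // undefined, matching the wording used for assign in this document.
  void assign(const Config * other) {
    lang_ = static_cast<const SimpleConfig *>(other)->lang_;
  }
  std::string lang() const { return lang_; }
  void set_lang(const std::string & lang) { lang_ = lang; }
};

// Usage: the clone is independent until assign copies the state back.
void demo_clone_assign() {
  std::unique_ptr<Config> a(new SimpleConfig("en"));
  std::unique_ptr<Config> b(a->clone());
  b->set_lang("nl");
  assert(a->lang() == "en");
  a->assign(b.get());
  assert(a->lang() == "nl");
}
```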

4.1 PspellConfig

The PspellConfig class is used to hold configuration information. It
has a set of keys which it will accept.  Inserting or even trying to
look at a key that it does not know will produce an error. Extra
accepted keys can be added with the set_extra method.

PspellConfig * clone() const

void assign(const PspellConfig *)

if the two objects are not of the exact same type the assign method is
undefined.

int error_num()

const char * error_message()

string valid until the next error

void set_extra(const PspellKeyInfo * begin, const PspellKeyInfo * end)

const PspellKeyInfo * keyinfo(const char * key) const

PspellKeyInfoEmulation * possible_elements(bool include_extra = true)
const

const char * get_default(const char * key) const

PspellStringPairEmulation * elements() const

bool insert(const char * key, const char * value)

Insert will NOT overwrite an existing entry

bool replace(const char * key, const char * value)

bool remove(const char * key)

All the retrieve methods will

 1. return the default if the value is not set
 2. give an error if the key is not a known key
 3. give an error if the value is not in the right format

const char * retrieve (const char * key) const

const char * retrieve_list (const char * key) const

bool retrieve_list (const char * key, PspellMutableContainer &) const

int retrieve_bool(const char * key) const

return -1 on error, 0 if false, 1 if true

int retrieve_int(const char * key) const

return -1 on error
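
The rules for insert, replace, and the retrieve methods can be illustrated
with a toy class. SketchConfig is a stand-in, not the proposed PspellConfig.
The key "ignore-case", the numeric error codes, and the choice to have insert
return false when the key is already set are all assumptions made for the
example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative stand-in for the proposed PspellConfig.
class SketchConfig {
  std::map<std::string, std::string> defaults_;  // known keys and defaults
  std::map<std::string, std::string> values_;    // explicitly set values
  mutable int error_;                            // 0 means no error so far

  bool known(const std::string & key) const {
    if (defaults_.count(key)) return true;
    error_ = 1;  // hypothetical code: unknown key
    return false;
  }
public:
  SketchConfig() : error_(0) {
    defaults_["lang"] = "en";
    defaults_["ignore-case"] = "false";  // hypothetical key
  }
  int error_num() const { return error_; }

  // insert does NOT overwrite an existing entry.
  bool insert(const char * key, const char * value) {
    if (!known(key)) return false;
    if (values_.count(key)) { error_ = 2; return false; }  // already set
    values_[key] = value;
    return true;
  }
  bool replace(const char * key, const char * value) {
    if (!known(key)) return false;
    values_[key] = value;
    return true;
  }
  // Returns the set value, else the default; null for an unknown key.
  const char * retrieve(const char * key) const {
    if (!known(key)) return nullptr;
    std::map<std::string, std::string>::const_iterator it = values_.find(key);
    return (it != values_.end() ? it->second
                                : defaults_.find(key)->second).c_str();
  }
  // -1 on error, 0 if false, 1 if true.
  int retrieve_bool(const char * key) const {
    const char * v = retrieve(key);
    if (!v) return -1;
    std::string s(v);
    if (s == "true") return 1;
    if (s == "false") return 0;
    error_ = 3;  // hypothetical code: value in the wrong format
    return -1;
  }
};

// Usage: defaults, insert versus replace, and the error cases.
void demo_sketch_config() {
  SketchConfig c;
  assert(std::string(c.retrieve("lang")) == "en");  // default
  assert(c.insert("lang", "nl"));
  assert(!c.insert("lang", "de"));                  // no overwrite
  assert(c.replace("lang", "de"));
  assert(std::string(c.retrieve("lang")) == "de");
  assert(c.retrieve("bogus") == nullptr);           // unknown key
  assert(c.retrieve_bool("lang") == -1);            // wrong format
}
```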

PspellConfig * new_pspell_config()

returns a new config class for setting things up before a manager
class is created

delete_pspell_config(PspellConfig *)

deletes a PspellConfig class. You can also use the standard C++ delete.

4.2 PspellManager

This class is responsible for keeping track of the dictionaries, coming
up with suggestions, and the like. Its methods are NOT meant to be used
by multiple threads and/or documents. If you wish to have more than
one language per document simply have multiple manager classes for
each document, but DO NOT share a manager class between more than one
document.

Almost all of the manipulation of options is done via the Config class,
thus this class has precious few methods.

int error_num()

const char * error_message()

string valid until the next error

PspellConfig & config()
const PspellConfig & config() const

the config returned is NOT the same object as the one you pass in.

const char * lang_name() const

bool check(STRING) const

bool add_to_personal(STRING)

bool add_to_session(STRING)

PspellWordList & master_word_list() const
PspellWordList & personal_word_list() const
PspellWordList & session_word_list() const

because the word lists may potentially have to convert from non-Unicode
to Unicode or vice versa, the pointer returned by the emulation is only
valid until the next call.

bool save_all_wls()

void clear_session()

PspellWordList & suggest(STRING)

the suggestion list and the elements in it are only valid until the
next call to suggest.

bool store_repl(STRING mis, STRING cor)

PspellManager * new_pspell_manager(const PspellConfig * config)

returns a new manager class, allocated with new, based on the settings
in config

delete_pspell_manager(const PspellManager *)

deletes a PspellManager class, you may also use the standard C++
delete

4.3 PspellWordList

bool empty() const

int size() const

PspellStringEmulation * elements() const

PspellShortUniStringEmulation * short_uni_elements() const

PspellUniStringEmulation * uni_elements() const

4.4 PspellMutableWordList

public PspellWordList

int error_num()

const char * error_message()

string valid until the next error

bool add(STRING)

bool clear_all()

bool save()

PspellMutableWordList * new_pspell_personal_word_list(PspellConfig *)

returns a new personal word list so that you can manage it

delete_pspell_mutable_word_list(PspellMutableWordList *)

deletes a PspellMutableWordList, you may also use standard C++ delete.

4.5 PspellEmulation

PspellEmulation * clone() const

void assign(const PspellEmulation *)

if the two objects are not of the exact same type the assign method is
undefined.

delete_pspell_emulation(PspellEmulation *)

deletes a PspellEmulation, you may also use standard C++ delete.

4.6 Pspell*Emulation

public PspellEmulation

All emulations have the following two methods.

<type> next()

bool at_end() const

where <type> is specific to the particular emulation, as given by the
following table:



           Name                           Type                        
           PspellStringEmulation          const char *                
           PspellShortUniStringEmulation  const u16int *              
           PspellUniStringEmulation       const u32int *              
           PspellKeyInfoEmulation         PspellKeyInfo *             
           PspellStringPairEmulation      PspellStringPair            




4.7 Other Minor Classes

    class PspellMutableContainer {
    public:
      virtual void insert(const char *) = 0;
      virtual void remove(const char *) = 0;
      virtual void clear() = 0;
      PspellMutableContainer();
    };
     
    enum PspellKeyInfoType {Bool, String, Int, List};
     
    struct PspellKeyInfo {
      const char * name;
      PspellKeyInfoType  type;
      const char * def;
      const char * desc; // null if internal value
    };
     
    struct PspellStringPair { 
      const char * first;
      const char * second; 
    };
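
As an illustration of how an application might implement the container
interface, the sketch below backs it with a std::set. MutableContainer is a
stand-in that mirrors the three pure virtual methods above (the virtual
destructor is an addition made for the sketch), and StringSet and fill_list
are hypothetical names; fill_list only plays the role that
retrieve_list(key, container) would play.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Stand-in mirroring the proposed PspellMutableContainer.
class MutableContainer {
public:
  virtual ~MutableContainer() {}
  virtual void insert(const char *) = 0;
  virtual void remove(const char *) = 0;
  virtual void clear() = 0;
};

// Application-side container that keeps unique, sorted strings.
class StringSet : public MutableContainer {
  std::set<std::string> items_;
public:
  void insert(const char * s) { items_.insert(std::string(s)); }
  void remove(const char * s) { items_.erase(std::string(s)); }
  void clear() { items_.clear(); }
  const std::set<std::string> & items() const { return items_; }
};

// Hypothetical producer: empties the container, then adds every entry.
bool fill_list(const std::vector<std::string> & stored, MutableContainer & out) {
  out.clear();
  for (std::size_t i = 0; i < stored.size(); ++i) out.insert(stored[i].c_str());
  return true;
}

// Usage: duplicates collapse because the backing store is a set.
void demo_string_set() {
  std::vector<std::string> paths;
  paths.push_back("/usr/share/dict");
  paths.push_back("/opt/dict");
  paths.push_back("/opt/dict");
  StringSet s;
  assert(fill_list(paths, s));
  assert(s.items().size() == 2);
}
```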

5 C Interface

An extern "C" interface will also be provided. Methods will be mapped to
functions in the following manner:

    <class name in lowercase with underscores>_<method name>([const] <Class> *, <other parameters if any>)

For example ``PspellManager::lang_name() const'' would become
``pspell_manager_lang_name(const PspellManager *)''. For methods which
are overloaded on the string type, the u16int and u32int versions will
be mapped the same way with a final _16 or _32 added to the function
name. For example ``PspellManager::check(const u16int *)'' would have a
function name of pspell_manager_check_16.

Methods that return a bool will instead return an int in the C
interface.
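
The mapping could be realized with thin wrapper functions, roughly as
sketched below. The PspellManager shown here is a cut-down stand-in with only
three methods, OneWordManager is a toy implementation invented for the
example, and the u16int typedef is an assumption about how the 16 bit type
would be defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef std::uint16_t u16int;  // assumed definition of the 16 bit type

// Cut-down stand-in for the proposed PspellManager.
class PspellManager {
public:
  virtual ~PspellManager() {}
  virtual const char * lang_name() const = 0;
  virtual bool check(const char * word) const = 0;
  virtual bool check(const u16int * word) const = 0;
};

extern "C" {
  // <class in lowercase with underscores>_<method>(const <Class> *, ...)
  const char * pspell_manager_lang_name(const PspellManager * m) {
    return m->lang_name();
  }
  // bool becomes int in the C interface.
  int pspell_manager_check(const PspellManager * m, const char * word) {
    return m->check(word) ? 1 : 0;
  }
  // The u16int overload gets a final _16.
  int pspell_manager_check_16(const PspellManager * m, const u16int * word) {
    return m->check(word) ? 1 : 0;
  }
}

// Toy manager whose dictionary holds the single word "hello".
class OneWordManager : public PspellManager {
public:
  const char * lang_name() const { return "en"; }
  bool check(const char * word) const { return std::strcmp(word, "hello") == 0; }
  bool check(const u16int * word) const {
    const char * known = "hello";
    std::size_t i = 0;
    for (; known[i] != '\0'; ++i)
      if (word[i] != static_cast<u16int>(known[i])) return false;
    return word[i] == 0;
  }
};

// Usage: the C functions forward to the virtual methods.
void demo_c_interface() {
  OneWordManager m;
  const u16int wide[] = {'h', 'e', 'l', 'l', 'o', 0};
  assert(std::strcmp(pspell_manager_lang_name(&m), "en") == 0);
  assert(pspell_manager_check(&m, "hello") == 1);
  assert(pspell_manager_check_16(&m, wide) == 1);
}
```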

6 Modern C++ Helper Classes

An almost forward iterator class will be provided to wrap the Pspell*
Emulation classes in. It is almost a forward iterator because two
iterators will not be able to be compared to each other unless it is to
check if the iterator is at the end.

I strongly recommend the use of auto_ptr with all pointers returned.
All pointers returned that you are responsible for freeing will be able
to be deleted with the standard C++ delete.

These helper classes will be provided in separate header files so those
who do not wish to use them will not have to.
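
One possible shape for such a wrapper is sketched here. Nothing in it is the
planned helper class: StringEmulation and FixedEmulation are stand-ins, and
EmulationIterator and to_vector are hypothetical names. The iterator reads
one element ahead and turns itself into the end iterator when next() returns
null, which is why comparison is only meaningful against the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the proposed PspellStringEmulation (an assumption).
class StringEmulation {
public:
  virtual ~StringEmulation() {}
  virtual const char * next() = 0;
  virtual bool at_end() const = 0;
};

// Toy emulation over a fixed list of words.
class FixedEmulation : public StringEmulation {
  std::vector<std::string> words_;
  std::size_t pos_;
public:
  explicit FixedEmulation(const std::vector<std::string> & w) : words_(w), pos_(0) {}
  const char * next() { return at_end() ? nullptr : words_[pos_++].c_str(); }
  bool at_end() const { return pos_ == words_.size(); }
};

// Hypothetical "almost forward" iterator over an emulation.
class EmulationIterator {
  StringEmulation * source_;  // not owned; null marks the end iterator
  const char * current_;
  void advance() {
    current_ = source_->next();
    if (current_ == nullptr) source_ = nullptr;  // become the end iterator
  }
public:
  EmulationIterator() : source_(nullptr), current_(nullptr) {}
  explicit EmulationIterator(StringEmulation * s) : source_(s), current_(nullptr) {
    advance();
  }
  const char * operator*() const { return current_; }
  EmulationIterator & operator++() { advance(); return *this; }
  // Only a comparison against the end iterator is meaningful.
  bool operator!=(const EmulationIterator & other) const {
    return source_ != other.source_;
  }
};

// Usage: an ordinary iterator loop over the emulation.
std::vector<std::string> to_vector(StringEmulation & e) {
  std::vector<std::string> out;
  for (EmulationIterator it(&e), end; it != end; ++it) out.push_back(*it);
  return out;
}

void demo_emulation_iterator() {
  std::vector<std::string> words;
  words.push_back("began");
  words.push_back("begun");
  FixedEmulation e(words);
  assert(to_vector(e) == words);
}
```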

7 Rationale


7.1 store_repl method

This method is needed because Aspell (http://aspell.sourceforge.net/)
is able to learn from users' misspellings. For example, on the first
pass a user misspells beginning as beging, so Aspell suggests:

    begging, begin, being, Beijing, bagging, ....

However, the user then tries "begning" and Aspell suggests:

    beginning, beaning, begging, ...

so the user selects beginning. However, later on in the document
the user misspells it as begng (NOT beging). Normally Aspell would
suggest:

    began, begging, begin, begun, ....

However, because it knows the user misspelled beginning as beging, it
will instead suggest:

    beginning, began, begging, begin, begun ...
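
The effect described above can be imitated with a very small model. This is
NOT how Aspell works internally, which this document does not describe; it is
only a sketch under the assumption that a remembered correction is promoted
whenever the new misspelling is within one edit of a stored misspelling.
ReplacementMemory, rerank, and edit_distance are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Plain Levenshtein distance (insert, delete, substitute each cost 1).
std::size_t edit_distance(const std::string & a, const std::string & b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min(subst, std::min(prev[j] + 1, cur[j - 1] + 1));
    }
    prev.swap(cur);
  }
  return prev[b.size()];
}

// Toy model of a checker that remembers replacement pairs.
class ReplacementMemory {
  std::map<std::string, std::string> pairs_;  // misspelling -> correction
public:
  void store_repl(const std::string & mis, const std::string & cor) {
    pairs_[mis] = cor;
  }
  // Moves (or adds) to the front every remembered correction whose stored
  // misspelling lies within max_dist edits of the word being checked.
  std::vector<std::string> rerank(const std::string & word,
                                  std::vector<std::string> suggestions,
                                  std::size_t max_dist = 1) const {
    for (auto it = pairs_.begin(); it != pairs_.end(); ++it) {
      if (edit_distance(word, it->first) > max_dist) continue;
      suggestions.erase(
          std::remove(suggestions.begin(), suggestions.end(), it->second),
          suggestions.end());
      suggestions.insert(suggestions.begin(), it->second);
    }
    return suggestions;
  }
};

// Usage: reproduces the begng example from the text.
void demo_store_repl() {
  ReplacementMemory memory;
  memory.store_repl("beging", "beginning");
  std::vector<std::string> plain = {"began", "begging", "begin", "begun"};
  std::vector<std::string> ranked = memory.rerank("begng", plain);
  assert(ranked.size() == 5);
  assert(ranked[0] == "beginning");
  assert(ranked[1] == "began");
}
```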

I myself often misspelled beginning (and still do) as something close
to begging and too many times wind up writing sentences such as
"begging with ....".

8 Timeframe

An alpha version of this interface should be available by the end of
March 2000 or the beginning of April. An Aspell
(http://aspell.sourceforge.net) module will also be provided. I am
hoping someone else will come up with the Ispell
(http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html) module.
Modules for other spell checkers are more than welcome.

9 Future

Future versions of the interface will provide better support for
multilingual documents as well as methods for spell checking whole
regions of text. Letting the spell checker check whole regions of text
will allow the spell checker to skip over formatting commands, URLs, and
the like.

10 Feedback

As always, feedback is most appreciated. I can be contacted at
kevinatk@home.com.

11 Other Formats

This document is available in several other formats:



         Format  Location                                             
         HTML    http://pspell.sourceforge.net/interface.html         
         Text    http://pspell.sourceforge.net/interface.txt          
         TEX     http://pspell.sourceforge.net/interface.tex          
         PS      http://pspell.sourceforge.net/interface.ps           
         Dvi     http://pspell.sourceforge.net/interface.dvi          
         LyX     http://pspell.sourceforge.net/interface.lyx          




---
Kevin Atkinson
kevinatk@home.com
http://metalab.unc.edu/kevina/


