Hyphenation Design (Was Re: Possible Pango 1.4 ideas)



Here are my current ideas for the hyphenation API:


PangoHyphenator
===============

This is the object that PangoLayout uses to control the hyphenation
of the paragraph. It contains hyphenation routines and exception data
for each language.

Typically the application can just use the default
PangoHyphenator, but if it allows hyphenation exceptions to be set for
particular documents it may create additional hyphenators itself.


/* This will create a new PangoHyphenator, registering our basic
   hyphenation function for all the languages we have patterns
   files for (the patterns files and standard exceptions are only
   loaded when needed). */
PangoHyphenator *pango_hyphenator_new (void);


/* Simple way to add an exception for a word, e.g. 'ex-am-ple', or
   'ba{k-}{k}{ck}en' if the spelling changes when the word is broken
   (the braces hold the pre-break, post-break and no-break text).
   Passing NULL as word_hyphenated removes the exception. */
void pango_hyphenator_set_exception_word (PangoHyphenator *hyphenator,
				          PangoLanguage   *language,
				          const gchar     *word,
				          const gchar *word_hyphenated);

/* Complex way to add an exception, as an array of PangoHyphenPoint. */
void pango_hyphenator_set_exception_data (PangoHyphenator *hyphenator,
				          PangoLanguage   *language,
				          const gchar     *word,
				          GArray	  *hyphen_data);
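
To make the relationship between the two exception forms concrete, here
is a rough sketch of how the simple notation might be turned into hyphen
points. Everything named Sketch* or sketch_* is hypothetical and not part
of the proposal: the struct is a stand-in for PangoHyphenPoint, the
buffer size is a guess, a plain C array replaces the GArray, and offsets
are counted in bytes, so it only handles ASCII words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_BREAK_TEXT 8   /* assumed size, not from the proposal */

/* Stand-in for the proposed PangoHyphenPoint. */
typedef struct
{
  int  offset;
  int  chars_to_remove;
  char pre_break_text[SKETCH_MAX_BREAK_TEXT];
  char post_break_text[SKETCH_MAX_BREAK_TEXT];
  int  penalty;
} SketchHyphenPoint;

/* Copies the contents of one "{...}" group at *p into out and advances
   *p past the closing brace.  Returns the length, or -1 if malformed. */
static int
sketch_read_group (const char **p, char *out, size_t out_size)
{
  const char *start, *end;
  size_t len;

  if (**p != '{')
    return -1;
  start = *p + 1;
  end = strchr (start, '}');
  if (end == NULL)
    return -1;
  len = (size_t) (end - start);
  if (len + 1 > out_size)
    return -1;
  memcpy (out, start, len);
  out[len] = '\0';
  *p = end + 1;
  return (int) len;
}

/* Parses notation such as "ex-am-ple" or "ba{k-}{k}{ck}en" into the
   plain word and its hyphen points.  Returns the number of points, or
   -1 on malformed input or if a buffer is too small. */
static int
sketch_parse_exception (const char *notation,
                        char *word, size_t word_size,
                        SketchHyphenPoint *points, int max_points)
{
  const char *p = notation;
  size_t word_len = 0;
  int n_points = 0;

  assert (notation != NULL && word != NULL && points != NULL);
  assert (word_size > 0);

  while (*p != '\0')
    {
      if (*p == '-' || *p == '{')
        {
          SketchHyphenPoint *pt;

          if (n_points >= max_points)
            return -1;
          pt = &points[n_points++];
          memset (pt, 0, sizeof *pt);
          pt->offset = (int) word_len;
          pt->penalty = 50;   /* the basic hyphenator's fixed penalty */

          if (*p == '-')
            {
              strcpy (pt->pre_break_text, "-");
              p++;
            }
          else
            {
              char no_break[SKETCH_MAX_BREAK_TEXT];
              int n;

              if (sketch_read_group (&p, pt->pre_break_text,
                                     sizeof pt->pre_break_text) < 0
                  || sketch_read_group (&p, pt->post_break_text,
                                        sizeof pt->post_break_text) < 0
                  || (n = sketch_read_group (&p, no_break,
                                             sizeof no_break)) < 0
                  || word_len + (size_t) n + 1 > word_size)
                return -1;
              /* The no-break text is what appears in the unbroken word. */
              memcpy (word + word_len, no_break, (size_t) n);
              word_len += (size_t) n;
              pt->chars_to_remove = n;
            }
        }
      else
        {
          if (word_len + 2 > word_size)
            return -1;
          word[word_len++] = *p++;
        }
    }
  word[word_len] = '\0';
  return n_points;
}
```

One limitation of the notation shows up here: a word that already
contains a literal hyphen cannot be told apart from a hyphenation point,
which may be one reason to keep the array form as well.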

/* Processes the word and returns information about hyphenation points.
   The results are placed in a GArray of PangoHyphenPoint, which can be
   reused for subsequent calls (it will be cleared automatically), thus
   saving lots of mallocs. */
void pango_hyphenator_process_word (PangoHyphenator *hyphenator,
				    PangoLanguage   *language,
				    gchar	    *word,
				    gint	     n_chars,
				    GArray	    *results);
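
The point about reusing the results array can be illustrated with a small
sketch. It assumes nothing about GArray itself: SketchResults is a
hypothetical growable array of offsets, and sketch_process_word is a toy
routine that simply offers a break before every second character. The
idea is only that clearing resets the length while keeping the storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the GArray of results. */
typedef struct
{
  int    *offsets;
  size_t  len;
  size_t  capacity;
} SketchResults;

/* Appends one offset, growing the storage only when it is full.
   Returns 0 on success, -1 if memory could not be allocated. */
static int
sketch_results_append (SketchResults *results, int offset)
{
  if (results->len == results->capacity)
    {
      size_t new_capacity = results->capacity ? results->capacity * 2 : 4;
      int *grown = realloc (results->offsets, new_capacity * sizeof *grown);

      if (grown == NULL)
        return -1;
      results->offsets = grown;
      results->capacity = new_capacity;
    }
  results->offsets[results->len++] = offset;
  return 0;
}

/* Toy processing routine: clears the results, then records a break
   before every second character.  Returns 0 on success, -1 on failure. */
static int
sketch_process_word (const char *word, SketchResults *results)
{
  size_t i, n = strlen (word);

  assert (word != NULL && results != NULL);

  results->len = 0;   /* cleared automatically; the storage is kept */
  for (i = 2; i < n; i += 2)
    if (sketch_results_append (results, (int) i) < 0)
      return -1;
  return 0;
}
```

After the first few words the array has usually grown to its working
size, so later calls tend not to allocate at all.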

/* To get the hyphenation routine for a particular language. */
PangoHyphenatorFunc pango_hyphenator_get_language_processor
                                    (PangoHyphenator    *hyphenator,
			 	     PangoLanguage      *language);


/* Very sophisticated apps may want to use custom hyphenation routines
   for particular languages. You specify the hyphenation function and
   any data to pass to it. You could pass NULL as the function to turn
   off hyphenation for the language. */
void pango_hyphenator_set_language_processor
                                    (PangoHyphenator    *hyphenator,
			 	     PangoLanguage      *language,
				     PangoHyphenatorFunc func,
				     gpointer            data,
				     GDestroyNotify      destroy_func);

/* This is the type of the function that hyphenates words. */
typedef void (*PangoHyphenatorFunc) (PangoLanguage   *language,
				     gchar	     *word,
				     gint	      n_chars,
				     gpointer	      data,
				     GArray	     *results);
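
As a rough sketch of how the per-language processors might hang
together, here is a toy registry. All of the names are hypothetical, the
callback is simplified to fill a plain int array rather than a GArray of
PangoHyphenPoint, and languages are plain strings instead of
PangoLanguage. Calling destroy_func on the old data when a processor is
replaced is an assumption borrowed from the usual GLib convention; the
proposal does not spell it out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LANGUAGES 8

/* Simplified stand-in for PangoHyphenatorFunc: it writes break offsets
   to a plain int array and returns how many it wrote. */
typedef int  (*SketchHyphenatorFunc) (const char *language, const char *word,
                                      void *data, int *offsets, int max_offsets);
typedef void (*SketchDestroyNotify)  (void *data);

typedef struct
{
  const char          *language;      /* e.g. "en_GB"; borrowed, not copied */
  SketchHyphenatorFunc func;          /* NULL means hyphenation is off */
  void                *data;
  SketchDestroyNotify  destroy_func;
} SketchProcessor;

typedef struct
{
  SketchProcessor processors[SKETCH_MAX_LANGUAGES];
  int             n_processors;
} SketchHyphenator;

static SketchProcessor *
sketch_find_processor (SketchHyphenator *h, const char *language)
{
  int i;

  for (i = 0; i < h->n_processors; i++)
    if (strcmp (h->processors[i].language, language) == 0)
      return &h->processors[i];
  return NULL;
}

/* Registers func for language, replacing any earlier entry.
   Returns 0 on success, -1 if the table is full. */
static int
sketch_set_language_processor (SketchHyphenator *h, const char *language,
                               SketchHyphenatorFunc func, void *data,
                               SketchDestroyNotify destroy_func)
{
  SketchProcessor *p;

  assert (h != NULL && language != NULL);
  p = sketch_find_processor (h, language);
  if (p == NULL)
    {
      if (h->n_processors == SKETCH_MAX_LANGUAGES)
        return -1;
      p = &h->processors[h->n_processors++];
      p->language = language;
    }
  else if (p->destroy_func != NULL)
    p->destroy_func (p->data);  /* assumed: old data is released here */

  p->func = func;
  p->data = data;
  p->destroy_func = destroy_func;
  return 0;
}

/* Returns the number of offsets written; 0 if the language has no
   processor or hyphenation was turned off with a NULL function. */
static int
sketch_process_word (SketchHyphenator *h, const char *language,
                     const char *word, int *offsets, int max_offsets)
{
  SketchProcessor *p = sketch_find_processor (h, language);

  if (p == NULL || p->func == NULL)
    return 0;
  return p->func (language, word, p->data, offsets, max_offsets);
}

/* Toy custom routine: offers a break every N characters, with N passed
   through data.  Real routines would use patterns or a dictionary. */
static int
sketch_every_n (const char *language, const char *word,
                void *data, int *offsets, int max_offsets)
{
  int step = *(int *) data;
  int len = (int) strlen (word);
  int n = 0, i;

  (void) language;
  if (step <= 0)
    return 0;
  for (i = step; i < len && n < max_offsets; i += step)
    offsets[n++] = i;
  return n;
}

/* Toy destroy notification that just counts its calls. */
static int sketch_destroy_calls = 0;

static void
sketch_count_destroy (void *data)
{
  (void) data;
  sketch_destroy_calls++;
}
```

The SketchHyphenator must be zero-initialised before use. Registering a
NULL function keeps the language's entry but makes processing return no
break points, which matches the "turn off hyphenation" case above.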


PangoHyphenPoint
================

This is a struct to contain information about possible hyphenation
points in a word.

typedef struct _PangoHyphenPoint PangoHyphenPoint;
struct _PangoHyphenPoint
{
  /* The character offset of the hyphenation point in the word,
     i.e. the place to break just before. */
  gint offset;

  /* The number of characters to remove from the original text,
     after the hyphenation point, if we do break here. */
  gint chars_to_remove;

  /* The text to insert before the break. */
  gchar pre_break_text[MAX_PRE_BREAK_TEXT];

  /* The text to insert after the break. */
  gchar post_break_text[MAX_POST_BREAK_TEXT];

  /* The penalty to use if we break here. This is always 50 in the basic
     hyphenator, though more advanced hyphenators may adjust this, so
     some hyphenation points are preferred over others. */
  gint penalty;
};

Normal hyphenation points have chars_to_remove set to 0,
pre_break_text set to "-", and post_break_text set to "".

This concept is similar to TeX, where you can define discretionary
hyphens like "\discretionary{k-}{k}{ck}" for use in words like
"backen" (which hyphenates to "bak-ken"), where the chars in the
brackets specify the pre-break text, the post-break text, and the
no-break text respectively.

It also handles ligatures, e.g. to break "difficult" between the f's
in the "ffi" ligature, pre_break_text would be "f-", post_break_text
would be "fi", and chars_to_remove would be 1 (the "ffi" ligature
character).
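
To check that the fields cover these cases, here is a minimal sketch of
what the layout code might do once it decides to break at a given point.
The names are hypothetical (SketchHyphenPoint stands in for
PangoHyphenPoint, and the buffer size is a guess since the MAX_* values
are not defined above). It assumes the word is valid UTF-8 and counts
offsets in characters, as described for the offset field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_BREAK_TEXT 8   /* assumed size, not from the proposal */

/* Stand-in for the proposed PangoHyphenPoint. */
typedef struct
{
  int  offset;           /* character offset to break just before */
  int  chars_to_remove;  /* characters dropped after the break point */
  char pre_break_text[SKETCH_MAX_BREAK_TEXT];
  char post_break_text[SKETCH_MAX_BREAK_TEXT];
  int  penalty;
} SketchHyphenPoint;

/* Returns the byte index of character number n_chars in a UTF-8 string,
   stopping early at the terminating NUL. */
static size_t
sketch_utf8_byte_index (const char *s, int n_chars)
{
  size_t i = 0;

  while (n_chars > 0 && s[i] != '\0')
    {
      i++;
      /* Skip continuation bytes (10xxxxxx) of a multi-byte character. */
      while (((unsigned char) s[i] & 0xC0) == 0x80)
        i++;
      n_chars--;
    }
  return i;
}

/* Splits word at point, writing the text that ends the first line to
   before and the text that starts the next line to after.
   Returns 0 on success, -1 if either buffer is too small. */
static int
sketch_apply_hyphen_point (const char              *word,
                           const SketchHyphenPoint *point,
                           char *before, size_t before_size,
                           char *after,  size_t after_size)
{
  size_t head_len, skip_len, pre_len, post_len, tail_len;
  const char *tail;

  assert (word != NULL && point != NULL);
  assert (before != NULL && after != NULL);

  head_len = sketch_utf8_byte_index (word, point->offset);
  skip_len = sketch_utf8_byte_index (word + head_len,
                                     point->chars_to_remove);
  tail = word + head_len + skip_len;
  pre_len = strlen (point->pre_break_text);
  post_len = strlen (point->post_break_text);
  tail_len = strlen (tail);

  if (head_len + pre_len + 1 > before_size
      || post_len + tail_len + 1 > after_size)
    return -1;

  memcpy (before, word, head_len);
  memcpy (before + head_len, point->pre_break_text, pre_len + 1);
  memcpy (after, point->post_break_text, post_len);
  memcpy (after + post_len, tail, tail_len + 1);
  return 0;
}
```

With offset 2, chars_to_remove 2, pre_break_text "k-" and post_break_text
"k", the word "backen" splits into "bak-" and "ken". With the "ffi"
ligature character (U+FB03) in "difficult", offset 2, chars_to_remove 1,
"f-" and "fi" give "dif-" and "ficult".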


PangoLayout Changes
===================

Each PangoLayout will be created with its hyphenator set to the same
default PangoHyphenator.

/* To get the hyphenator of a PangoLayout. */
PangoHyphenator *pango_layout_get_hyphenator (PangoLayout     *layout);

/* To change the hyphenator of a PangoLayout. */
void pango_layout_set_hyphenator (PangoLayout     *layout,
				  PangoHyphenator *hyphenator);


/* To get the default hyphenator. */
PangoHyphenator *pango_layout_get_default_hyphenator (void);

/* To change the default hyphenator for all new PangoLayouts.
   You can pass NULL to turn hyphenation off completely. */
void pango_layout_set_default_hyphenator (PangoHyphenator *hyphenator);



Use Cases
=========

 o Applications don't need to do anything to get standard hyphenation
   (once they've turned on the new layout code somehow).

 o If they want to add hyphenation exceptions for all layouts, they
   can change the default hyphenator:

   PangoHyphenator *hyphenator = pango_layout_get_default_hyphenator ();
   PangoLanguage *language = pango_language_from_string ("en_GB");
   pango_hyphenator_set_exception_word (hyphenator, language,
                                        "example", "ex-am-ple");

 o If they want to use different exceptions for different layouts,
   e.g. for different documents, they can create multiple hyphenators:

   PangoHyphenator *hyphenator = pango_hyphenator_new ();
   PangoLanguage *language = pango_language_from_string ("en_GB");
   pango_hyphenator_set_exception_word (hyphenator, language,
                                        "example", "ex-am-ple");

   Then whenever they create a new PangoLayout for a particular document
   they can set the corresponding hyphenator:

   pango_layout_set_hyphenator (layout, hyphenator);

 o If they want to use custom hyphenation routines, they can set them
   on the default hyphenator or the individual hyphenators just like
   adding the exceptions above.


Widget writers developing complex text layout widgets should also
provide a way to set a hyphenator.


Damon



