Re: Putback thai.c to CVS

From: Owen Taylor <otaylor redhat com>
To: gtk-i18n-list gnome org
Cc: Chookij Vanatham <chookij vanatham eng sun com>
Subject: Re: Putback thai.c to CVS
Date: 14 Nov 2000 13:15:26 -0500
This looks good, but there are a few things I'd like to see fixed
before it is committed to CVS.

I think it is very important that each module have a consistent
identation style. This is for two reasons:

    - Improve legibility
    - Prevent excessive diffs as each person editing the 
      file reformats it to their own taste.

I'd strongly encourage using the Pango coding style for this
(pango/docs/TEXT/coding-style) for this, but it is only mandatory for
the Pango library and examples.

The thai.c module previously followed the Pango guidelines; if you
need to use something else, please make the whole file use that
consistently. Currently, there are places where within in a single
block there are two different indentation levels...


Additional detailed comments on parts of the file:

   * Copyright (C) 1999 Red Hat Software

Since you've added significant amounts of code, you should add the
appropriate copyright notice here. ("Copyright 2000, Sun
Microsystems", or whatever.) Adding or not adding the copyright notice
does not change the copyright status of the code, but it makes it
clear who the copyright holders of the code are.

  #include <iconv.h>
[...]
  #include <fribidi/fribidi.h>

The includes of iconv and fribidi were never necessary for thai.c,
and actually now will cause compilation errors since I've removed the
direct dependencies here.

 
  #define	NoTailCons		_NC
  #define	UpTailCons		_UC
  #define	BotTailCons		_BC
  #define	SpltTailCons		_SC
  #define	Cons			(NoTailCons|UpTailCons|BotTailCons|SpltTailCons)
  #define AboveVowel		_AV
  #define BelowVowel		_BV
  #define	Tone			_TN
  #define AboveDiac		_AD
  #define BelowDiac		_BD
  #define	SaraAm			_AM

It seems that these #defines are meant to be legible and the _NC-style
short forms just meant for the table; I think it would make the code
more legible if you used the long #defines everywhere but the table.

  /* Returns a structure with information we will use to rendering given the
   * #PangoFont. This is computed once per font and cached for later retrieval.
   */
  static ThaiFontInfo *
  get_font_info (PangoFont *font)
  {
    static const char *charsets[] = {
      "xtis620.2529-1",
      "xtis-0",
      "tis620.2533-1",
      "tis620.2529-1",
      "iso8859-11",
      "tis620-0",
      "tis620-1",
      "tis620-2",
      "iso10646-1",
    };

Is the ordering here still right? I would expect that tis620-1 and tis620-2
should go at the top, since they should provide as good results as the 
XTIS fonts, but with a more compact set of glyphs. They certainly will
provide better results than the plain-tis fonts, so should be preferred
to them.

(The reason I decided to do a XTIS shaper was mostly because it made a
nice simple example, not because I really think people should have
fully-precomposed XTIS fonts...)

        switch (num_chrs) {
          case 1:
            if (IsChrType(cluster[0], _BV|_BD|_AV|_AD|_TN)) {
                glyph_lists[0] =
                          PANGO_X_MAKE_GLYPH (font_info->subfont, 0x7F);
                glyph_lists[1] =
                          PANGO_X_MAKE_GLYPH (font_info->subfont,
                                              ucs2tis(cluster[0]));
                return 2;

What do these fonts have in position 0x7f? If it is a character with negative
advance width, I'm not sure Pango will handle this properly, but from
the context it also seems like it might be a blank character of the width
of the cell.

  /* Return the glyph code within the font for the given Unicode Thai 
   * code pointer
   */
  get_glyph (ThaiFontInfo * font_info, gunichar wc)

The return value for this function seems to have been lost, but since
you no longer are using it, it should just be removed.

  gboolean
  IsWttComposible(gint cur_wc, gint nxt_wc)

This function should be static. Keeping exported symbols to a minimum
prevents various problems.

  static char *
  g_utf8_get_next_cluster(const char	*text,
                          gint		length,
                          gunichar	**cluster,
                          gint		*num_chrs)

I think this name is less than ideal - since it is a static function it doesn't
need to be namespaced. It definitely shouldn't be namespaced within
the namespace of GLib UTF-8 manipulation functions. get_next_cluster()
would be fine.

    if ((text == NULL) || (length == 0) ||
        ((text) && (*text == '\0')) ) {
        if (*num_chrs)
            *num_chrs = 0;
        if (cluster)
            cluster[0] = 0;
        return (char *)NULL;
    }

If this check was ever hit, it would cause really bad results, since you
are running this function within a loop

    while (p < text + length)
      {
        gunichar cluster[MAX_CLUSTER_CHRS];
        gint num_chrs;

        log_cluster = p;
        p = g_utf8_get_next_cluster(p, text + length - p, &cluster, &num_chrs);
        add_cluster(font_info, glyphs, log_cluster - text, cluster, num_chrs);
      }

So having get_next_cluster return NULL is not a good idea. However,
the check will never be hit since a shaper is allowed to assume that
it will be called with a non-NULL text with an accurate length. 

In general, I would tend to avoid this sort of "something went wrong,
let me silently ignore it" check. Either you should simply assume that
your arguments or valid, or if you check their validity you should do
so in a way that you get useful debugging information if the check
fails - the g_assert() and g_return_if_fail() macros are useful for
this.

    p = text;
    if (cluster)
        cluster[0] = g_utf8_get_char(p);

Again, there is no point in trying to behave properly if someone calls
you with cluster == NULL. Either assume that the arguments are right
(after all, the function is called only once from immediately below)
or put at the top of the function:

 g_assert (cluster != NULL);

What I expect you were copying here is that for public functions we
generally allow the pointer passed in for out arguments to be NULL;
however, that doesn't really apply to a static "helper" function that
occurs only in one place.

    p = g_utf8_next_char(p);
    do {
      if (p >= text + length)
          break;
      if (cluster)
          cluster[nChrs] = g_utf8_get_char(p);
      if (IsWttComposible(cluster[nChrs - 1], cluster[nChrs])) {
          nChrs++;
          if (nChrs == 3)
              ClusterNotFound = FALSE;
          p = g_utf8_next_char(p);
      } else {
          if ((nChrs == 1) &&
              IsChrType(cluster[nChrs - 1], Cons) &&
              IsChrType(cluster[nChrs], SaraAm) ) {
              nChrs = 2;
              p = g_utf8_next_char(p);
          } else if ((nChrs == 2) &&
                     IsChrType(cluster[nChrs - 2], Cons) &&
                     IsChrType(cluster[nChrs - 1], Tone) &&
                     IsChrType(cluster[nChrs], SaraAm) ) {
                     nChrs = 3;
                     p = g_utf8_next_char(p);
          }
          ClusterNotFound = FALSE;
      }
    } while (ClusterNotFound);

I find the above a little confusing with the update of p and nChrs all
over the place; can't you do something like the following?

  while (p < text + length && n_chars < 3)  
    {
      gunichar current = g_utf8_get_char (p);
      
      if (n_chars = 0 ||
	  IsWttComposible (cluster[n_chars - 1], current) ||
	  (nChrs == 1 &&
	   IsChrType (0, Cons) && 
	   IsChrType (current, SaraAm)) ||
	  (nChrs == 2 &&
	   IsChrType (cluster[0], Cons) &&
	   IsChrType (cluster[1], Tone) &&
	   IsChrType (current, SaraAm)))
	{
	  cluster[n_chars++] = current;
	  p = g_utf8_next_char (p);
	}
      else
	break;
    }

[...]

  /* Load a particular engine given the ID for the engine
   */
  PangoEngine *
  MODULE_ENTRY(script_engine_load) (const char *id)
  {
    if (!strcmp (id, "ThaiScriptEngineLang")) {
      return thai_engine_lang_new ();
    } else if (!strcmp (id, "ThaiScriptEngineX")) {
      return thai_engine_x_new ();
    } else {
      return NULL;
    }
  }

As a minor quibble, I'm not sure what the motivation was for adding
the braces here; generally keeping non-significant changes to a
minimum helps keep code review simpler.


Anyways, the changes look sensible, though I certainly haven't
verified the MAC/WIN shaping rules in detail. With a few cleanups as
noted above, I don't see any problem with replacing the existing
thai.c with it.

Regards,
                                        Owen
References:
- Putback thai.c to CVS
  - From: Chookij Vanatham
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]