Re: [HarfBuzz] HarfBuzz API design

From: Behdad Esfahbod <behdad behdad org>
To: Adam Twardoch <list adam twardoch com>
Cc: Martin Hosken <martin_hosken sil org>, Carl Worth <cworth cworth org>, "gtk-i18n-list gnome org" <gtk-i18n-list gnome org>, Harfbuzz <harfbuzz lists freedesktop org>
Subject: Re: [HarfBuzz] HarfBuzz API design
Date: Wed, 19 Aug 2009 17:04:24 -0400

On 08/19/2009 04:15 PM, Adam Twardoch wrote:

I think it would be useful to have a helper function akin to Microsoft's
ScriptItemizeOpenType()* that breaks a Unicode string into individually
shapeable items (runs) and provides an array of feature tags for each
shapeable item for OpenType processing.

Thanks Adam. So far the focus has been to unify the shaping logic (most important for Indic). Itemization, while pretty well defined, is something everyone does slightly differently. It requires:


  - Applying Unicode Bidi Algorithm,

  - Script tagging heuristic for Script=Common characters

  - Language tagging heuristic

  - Font assignment

Except for the first item, which is well-defined by Unicode, the other steps are less well-defined, and different use cases require slightly different solutions. For example, web browsers have very strict font assignment rules that follow the CSS spec. Other applications, less so. It would be harder to justify using a unified itemizer, at least initially. But yes, that's one of the logical next steps.
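The script-tagging step in the list above can be sketched as follows. This is a minimal, hypothetical illustration in C, not HarfBuzz API: it assumes per-character script values are already known, resolves Script=Common characters with the simplest possible heuristic (inherit from the preceding character), and then splits the text into same-script runs. Bidi, language tagging, and font assignment are left out, and real itemizers usually add further rules, for example for paired punctuation.

```c
#include <stddef.h>

/* Hypothetical script codes for illustration; a real itemizer would use the
   Unicode Script property values (UAX #24). */
typedef enum {
    SCRIPT_COMMON,
    SCRIPT_LATIN,
    SCRIPT_ARABIC,
    SCRIPT_DEVANAGARI
} script_t;

/* Resolve Script=Common characters in place: each inherits the script of the
   nearest preceding non-Common character, and a leading Common span takes the
   script of the first non-Common character after it. */
void resolve_common(script_t *scripts, size_t n)
{
    script_t last = SCRIPT_COMMON;
    size_t i;

    for (i = 0; i < n; i++) {
        if (scripts[i] == SCRIPT_COMMON)
            scripts[i] = last;
        else
            last = scripts[i];
    }

    /* Backfill the leading span that had nothing before it. */
    for (i = 0; i < n && scripts[i] == SCRIPT_COMMON; i++)
        ;
    if (i < n) {
        script_t first = scripts[i];
        while (i > 0)
            scripts[--i] = first;
    }
}

/* Record the start offset of every maximal same-script run in run_starts
   (capacity >= n) and return the number of runs. */
size_t split_runs(const script_t *scripts, size_t n, size_t *run_starts)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (i == 0 || scripts[i] != scripts[i - 1])
            run_starts[count++] = i;
    return count;
}
```

For example, a string tagged Common, Latin, Latin, Common, Arabic, Arabic, Common resolves to four Latin characters followed by three Arabic ones, giving two runs that start at offsets 0 and 4. Whether a space between two scripts joins the preceding or the following run is exactly the kind of choice that differs between implementations.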


behdad

* http://msdn.microsoft.com/en-us/library/dd368557%28VS.85%29.aspx

Adam

Behdad Esfahbod wrote:

On 08/19/2009 02:57 AM, Martin Hosken wrote:

Dear Behdad,

I feel that this is the core of the API since it specifies what inputs and outputs harfbuzz works with (particularly outputs).


Hi Martin,

Yes, hb_shape() and the hb_glyph_info_t are essentially the core of the API.

typedef struct _hb_glyph_info_t {
     hb_codepoint_t codepoint;
     hb_mask_t      mask;
     uint32_t       cluster;
     uint16_t       component;
     uint16_t       lig_id;
     uint32_t       internal;
} hb_glyph_info_t;

I may have misinterpreted, but mask, lig_id, and probably component seem to be OT-specific, in that a consumer of the output is unlikely to ever need them.


Yes and no.  Mask is used to mark which user features should be applied to
which glyphs, and I think at least AAT can/will use that too.  As for lig_id
and component, they are not inherently OT-specific.  They are implementation
details of how HarfBuzz implements the OT spec.  We may decide to hide them
too, and just have another internal member.  Individual shapers can then use
the internal members as they wish.  That's actually a good idea.  Unless I
find a use for the client having access to those values, they had better be
hidden.  I'll make that change now.
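The role of the mask described above can be illustrated with a small sketch. The names here (glyph_t, feature_range_t, set_feature_masks, feature_applies) are hypothetical, not HarfBuzz API; the sketch assumes each user feature is assigned one bit of a 32-bit mask plus a character range, and that a feature's lookups skip any glyph lacking its bit.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mask_t; /* assumed 32-bit, like hb_mask_t */

/* Hypothetical user feature request: the bit assigned to the feature and the
   character range [start, end) it was requested for. */
typedef struct {
    mask_t   bit;
    uint32_t start, end;
} feature_range_t;

/* Minimal glyph record carrying only the fields this sketch needs. */
typedef struct {
    uint32_t codepoint; /* glyph id after shaping */
    mask_t   mask;      /* which user features apply to this glyph */
    uint32_t cluster;   /* index of the originating character */
} glyph_t;

/* Set each feature's bit on the glyphs whose cluster lies in its range. */
void set_feature_masks(glyph_t *glyphs, size_t n,
                       const feature_range_t *features, size_t nfeatures)
{
    for (size_t i = 0; i < n; i++)
        for (size_t f = 0; f < nfeatures; f++)
            if (glyphs[i].cluster >= features[f].start &&
                glyphs[i].cluster < features[f].end)
                glyphs[i].mask |= features[f].bit;
}

/* A lookup belonging to a feature would only touch glyphs carrying its bit. */
int feature_applies(const glyph_t *g, mask_t feature_bit)
{
    return (g->mask & feature_bit) != 0;
}
```

With one bit per feature, overlapping feature ranges simply OR their bits together, so a single pass over the glyphs decides applicability for all user features at once.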

I'm thinking about adding some other fields here though (without changing
the size).  Things like justification points, etc.
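To make the size constraint concrete, here is a sketch that compiles the struct quoted above on its own and compares it with a hypothetical variant in which component and lig_id are folded into a second opaque member. The two typedefs are assumptions (current HarfBuzz headers define both as 32-bit unsigned integers); glyph_info_hidden_t and the packing helpers are illustrative only.

```c
#include <stdint.h>

/* Assumed typedefs, matching current HarfBuzz headers. */
typedef uint32_t hb_codepoint_t;
typedef uint32_t hb_mask_t;

/* The struct as quoted in this thread. */
typedef struct _hb_glyph_info_t {
    hb_codepoint_t codepoint;
    hb_mask_t      mask;
    uint32_t       cluster;
    uint16_t       component;
    uint16_t       lig_id;
    uint32_t       internal;
} hb_glyph_info_t;

/* Hypothetical variant: component and lig_id folded into a second opaque
   member that individual shapers may use as they wish. */
typedef struct {
    hb_codepoint_t codepoint;
    hb_mask_t      mask;
    uint32_t       cluster;
    uint32_t       internal1; /* formerly component + lig_id */
    uint32_t       internal2; /* formerly internal */
} glyph_info_hidden_t;

/* Hiding the fields must not change the public struct's size. */
_Static_assert(sizeof(hb_glyph_info_t) == sizeof(glyph_info_hidden_t),
               "hiding component/lig_id must not change the size");

/* A shaper that still needs the old values can pack them into internal1. */
void set_lig_props(glyph_info_hidden_t *g, uint16_t lig_id, uint16_t component)
{
    g->internal1 = ((uint32_t) lig_id << 16) | component;
}

uint16_t get_lig_id(const glyph_info_hidden_t *g)
{
    return (uint16_t) (g->internal1 >> 16);
}

uint16_t get_component(const glyph_info_hidden_t *g)
{
    return (uint16_t) (g->internal1 & 0xFFFFu);
}
```

On common ABIs both structs occupy 20 bytes with no padding, so an opaque slot can later be repurposed (for example for justification data) without changing the public layout.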

The disadvantage I see with having a single buffer that changes its contents from chars to glyphs is that then you lose the association map between underlying chars and glyphs. I suppose it can be recreated using the component information, but it's going to be problematic when it comes to cursor hit testing.


The decision is only relevant inside the hb_shape() call.  The user has the
original text still.  Please see the last part of my reply to Carl Worth.
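Since the client keeps the original text, the cluster field is what ties glyphs back to characters. Below is a minimal sketch of building a character-to-glyph map and a glyph-to-character range for hit testing. It assumes left-to-right output in which cluster holds the index of the first character of each cluster, starts at 0, and never decreases; glyph_rec_t and both functions are hypothetical, not HarfBuzz API.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the glyph record; only cluster matters here. */
typedef struct {
    uint32_t glyph;
    uint32_t cluster; /* index of the first character of this glyph's cluster */
} glyph_rec_t;

/* Fill char_to_glyph[0..nchars) with the index of the first glyph of the
   cluster each character belongs to. */
void build_char_to_glyph(const glyph_rec_t *glyphs, size_t nglyphs,
                         size_t *char_to_glyph, size_t nchars)
{
    size_t g = 0;
    while (g < nglyphs) {
        uint32_t start = glyphs[g].cluster;
        size_t next = g + 1;
        /* Skip the remaining glyphs of this cluster. */
        while (next < nglyphs && glyphs[next].cluster == start)
            next++;
        /* Characters up to the next cluster (or the end) belong to this one. */
        size_t end = next < nglyphs ? glyphs[next].cluster : nchars;
        for (size_t c = start; c < end && c < nchars; c++)
            char_to_glyph[c] = g;
        g = next;
    }
}

/* Hit testing: the character range [*start, *end) covered by glyph gi's
   cluster. */
void glyph_to_char_range(const glyph_rec_t *glyphs, size_t nglyphs,
                         size_t nchars, size_t gi, size_t *start, size_t *end)
{
    size_t next = gi;
    *start = glyphs[gi].cluster;
    while (next < nglyphs && glyphs[next].cluster == glyphs[gi].cluster)
        next++;
    *end = next < nglyphs ? glyphs[next].cluster : nchars;
}
```

For five characters shaped into glyphs with clusters 0, 0, 2, 3, the map is 0, 0, 2, 3, 3: characters 3 and 4 (say, a ligature) share glyph 3, and clicking glyph 1 selects characters 0 to 1. For right-to-left runs the cluster values would decrease instead, and the scan would need to run in the other direction.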

For script and language, it's a bit more delicate.  I'm also convinced that
they belong to the buffer.  With script it's fine, but with language it
introduces a small implementation hassle: I would have to deal with
copying/interning language tags, something I was trying to avoid.  The other
options are:

     - Extra parameters to hb_shape().  I'd rather not do this.  Keeping details
like this out of the main API and adding setters where appropriate makes the
API cleaner and more extensible.

     - Use the feature dict for them too.  I'm strictly against this one.  The
feature dict is already too high-level for my taste.
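The interning hassle mentioned above, together with the setter approach from the first option, might look like the following. Everything here (lang_t, lang_intern, buffer_t and its setters) is a hypothetical sketch rather than HarfBuzz API; it assumes language tags compare case-insensitively and that interned copies live for the lifetime of the process.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned language handle: equal tags share one pointer. */
typedef const char *lang_t;

#define MAX_LANGS 64
static char *lang_table[MAX_LANGS];
static size_t lang_count;

/* Return the canonical (lowercased) copy of tag, copying it on first use.
   Returns NULL if the table is full or allocation fails. */
lang_t lang_intern(const char *tag)
{
    size_t len = strlen(tag);
    char *canon = malloc(len + 1);
    if (!canon)
        return NULL;
    for (size_t i = 0; i <= len; i++) /* also copies the terminating NUL */
        canon[i] = (char) tolower((unsigned char) tag[i]);
    for (size_t i = 0; i < lang_count; i++)
        if (strcmp(lang_table[i], canon) == 0) {
            free(canon);
            return lang_table[i];
        }
    if (lang_count == MAX_LANGS) {
        free(canon);
        return NULL;
    }
    lang_table[lang_count++] = canon;
    return canon;
}

/* Hypothetical buffer with setters, instead of extra shaping parameters. */
typedef struct {
    unsigned script;   /* placeholder script code */
    lang_t   language; /* interned; the buffer does not own it */
} buffer_t;

void buffer_set_script(buffer_t *b, unsigned script) { b->script = script; }

void buffer_set_language(buffer_t *b, const char *tag)
{
    b->language = lang_intern(tag);
}
```

Interning lets the buffer hold a bare pointer without owning or freeing it, and comparing two buffers' languages becomes a pointer comparison. The cost is a global table, which as written is not thread-safe.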

Why do you say the feature dict is too high level? It seems just the right place, to me. Or it could be stored in the buffer, since it is buffer specific.


It's just not as efficient and easy to use as I'd like.  But it's just fine
for user features, yes.

One question: is a buffer representing a single run for which the language doesn't change or is it potentially multiple runs that are yet to be segmented?


The way I'd recommend using it is for one run.  The API already limits it to
one font anyway.  Doesn't mean we can't add API to do multiple runs in the
future though.

behdad

Yours,
Martin

_______________________________________________
HarfBuzz mailing list
HarfBuzz lists freedesktop org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz

