HarfBuzz API design



[Warning: long email ahead]

Hi all,

With the rewritten HarfBuzz OpenType Layout engine merged in pango master now, I've been working on the public API for a few weeks. I have it mostly designed, though there are a few open questions still and I appreciate any feedback. The current code can be browsed here:

  http://git.gnome.org/cgit/pango/tree/pango/opentype

I will add a separate configure.ac to that directory in the coming days such that it can be built as a standalone library. In a couple of weeks I may even move it back out to its own git repo and use git magic to pull it in pango until we start using it as a shared library (expect end of year).


When designing the HarfBuzz API my reference has been cairo. That is, usability has been the top priority. Beyond that, the other goals are to hide technical details while remaining powerful enough to implement advanced features internally.

In this mail I'll only discuss the backend-agnostic API, which is what I expect most users will use. This is what will be available by including "hb.h". OpenType-specific APIs, for example, will be included in "hb-ot.h" only. That includes querying the list of supported OpenType scripts, language systems, features, etc.

Finally, the other strict goal of the API is to be fully thread-safe. That means I had to bite the bullet and add a refcounting API already. The object lifecycle API is like cairo's, that is, each object has: _create(), _reference(), _destroy(), and _get_reference_count(). At some point we may want to add _[gs]et_user_data() also, which is useful for language bindings.
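
To make the lifecycle convention concrete, here is a minimal sketch of the four entry points on a hypothetical toy_obj_t. The type, the function names and the "finalized" flag are illustrative assumptions, not HarfBuzz API, and the plain int counter is not thread-safe; a real implementation would need atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object, for illustration only (not a HarfBuzz type). */
typedef struct toy_obj_t {
  int  ref_count;   /* plain int: NOT thread-safe, a real library would use atomics */
  int *finalized;   /* optional flag set to 1 when the object is actually freed */
} toy_obj_t;

/* _create(): returns a new object holding one reference, or NULL on malloc failure. */
toy_obj_t *
toy_obj_create (int *finalized)
{
  toy_obj_t *obj = calloc (1, sizeof *obj);
  if (!obj)
    return NULL;
  obj->ref_count = 1;
  obj->finalized = finalized;
  return obj;
}

/* _reference(): adds a reference and returns the same pointer. */
toy_obj_t *
toy_obj_reference (toy_obj_t *obj)
{
  if (obj)
    obj->ref_count++;
  return obj;
}

/* _get_reference_count(): mostly useful for debugging. */
int
toy_obj_get_reference_count (const toy_obj_t *obj)
{
  return obj ? obj->ref_count : 0;
}

/* _destroy(): drops one reference and frees the object when the last one goes. */
void
toy_obj_destroy (toy_obj_t *obj)
{
  if (!obj)
    return;
  assert (obj->ref_count > 0);
  if (--obj->ref_count > 0)
    return;
  if (obj->finalized)
    *obj->finalized = 1;
  free (obj);
}
```

Note that _destroy() only releases the caller's reference; the memory goes away when the count reaches zero.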

Error handling is also designed somewhat like cairo's, that is, objects keep track of failure internally (including malloc failures), but unlike cairo, there's no direct way to query objects for errors. HarfBuzz simply does its best to get you the result you wanted. In case of errors, the output may be wrong, but there's nothing you can do to improve it. There's not much point in reporting that state anyway. So, no error handling in the API.

Before jumping into the API, let me first introduce a memory management construct I added:


Blobs
=====

hb_blob_t is a refcounted container for raw data and is introduced to make memory management between HarfBuzz and the user easy and flexible. Blobs can be created by:

typedef enum {
  HB_MEMORY_MODE_DUPLICATE,
  HB_MEMORY_MODE_READONLY,
  HB_MEMORY_MODE_WRITEABLE,
  HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,
  HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,
} hb_memory_mode_t;

typedef struct _hb_blob_t hb_blob_t;

hb_blob_t *
hb_blob_create (const char        *data,
                unsigned int       length,
                hb_memory_mode_t   mode,
                hb_destroy_func_t  destroy,
                void              *user_data);

The various mode parameters mean:

  DUPLICATE: copy data right away and own it.

  READONLY: the data passed in can be kept for later use, but should not be modified. If modification is needed, the blob will duplicate the data lazily.

  WRITEABLE: data is writeable, use it freely.

  READONLY_NEVER_DUPLICATE: data is readonly and should never be duplicated. This disables operations needing write access to data.

  READONLY_MAY_MAKE_WRITEABLE: data is readonly but may be made writeable using mprotect() or equivalent win32 calls. It's up to the user to make sure calling mprotect() or system-specific equivalents on the data is safe. In practice, that's never an issue on Linux and (according to Tor) on win32.


One can also create a sub-blob of a blob:

hb_blob_t *
hb_blob_create_sub_blob (hb_blob_t    *parent,
                         unsigned int  offset,
                         unsigned int  length);

Blob's data can be used by locking it:

const char *
hb_blob_lock (hb_blob_t *blob);


One can query whether data is writeable:

hb_bool_t
hb_blob_is_writeable (hb_blob_t *blob);

One can request that it be made writeable in place:

hb_bool_t
hb_blob_try_writeable_inplace (hb_blob_t *blob);

Or can request making data writeable, making a copy if need be:

hb_bool_t
hb_blob_try_writeable (hb_blob_t *blob);

For the latter the blob must not be locked. The lock is recursive. The blob internal stuff is protected using a mutex and hence the structure is threadsafe.
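
As a rough sketch of how the two "try writeable" requests could differ, the toy blob below duplicates its data lazily unless the mode forbids it. Everything here (toy_blob_t, the toy_ functions, the reduced set of modes) is a hypothetical stand-in; it leaves out refcounting, locking, the mutex, and the mprotect() path of READONLY_MAY_MAKE_WRITEABLE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduced, hypothetical set of modes for illustration. */
typedef enum {
  TOY_MODE_READONLY,                 /* may be duplicated lazily */
  TOY_MODE_WRITEABLE,                /* caller allows writes */
  TOY_MODE_READONLY_NEVER_DUPLICATE  /* no writes, no copies */
} toy_mode_t;

typedef struct {
  const char  *data;
  unsigned int length;
  toy_mode_t   mode;
  char        *owned;  /* non-NULL once the blob has made a private copy */
} toy_blob_t;

toy_blob_t
toy_blob_init (const char *data, unsigned int length, toy_mode_t mode)
{
  toy_blob_t blob;
  assert (data != NULL || length == 0);
  blob.data = data;
  blob.length = length;
  blob.mode = mode;
  blob.owned = NULL;
  return blob;
}

/* Succeeds only if no copy is needed. */
int
toy_blob_try_writeable_inplace (toy_blob_t *blob)
{
  return blob->mode == TOY_MODE_WRITEABLE;
}

/* Succeeds in place if possible, otherwise falls back to a private copy. */
int
toy_blob_try_writeable (toy_blob_t *blob)
{
  char *copy;
  if (blob->mode == TOY_MODE_WRITEABLE)
    return 1;
  if (blob->mode == TOY_MODE_READONLY_NEVER_DUPLICATE)
    return 0;
  copy = malloc (blob->length ? blob->length : 1);
  if (!copy)
    return 0;
  if (blob->length)
    memcpy (copy, blob->data, blob->length);
  blob->owned = copy;
  blob->data = copy;
  blob->mode = TOY_MODE_WRITEABLE;
  return 1;
}

void
toy_blob_fini (toy_blob_t *blob)
{
  free (blob->owned);
  blob->owned = NULL;
}
```

The point of the split is that a caller can first ask for the cheap in-place route and only pay for a copy when that fails.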

The main use of the blob is to provide font data or table data to HarfBuzz. More about that later.



Text API
========

Perhaps the biggest difference between the old Qt-based HarfBuzz shaper API and the new one is that the new API reuses hb-buffer for its shaping input and output. So, this is how you will use HarfBuzz in a few steps:

  - create buffer
  - add text to buffer    ---> buffer contains Unicode text now
  - call hb_shape() on it
  - use output glyphs     ---> buffer contains positioned glyphs now

Within that picture, there are three main objects in HarfBuzz:

  - hb_buffer_t: holds text/glyphs, and is not threadsafe

  - hb_face_t: represents a single SFNT face, fully threadsafe beyond construction. Maps to cairo_font_face_t.

  - hb_font_t: represents a face at a certain size with certain hinting options, fully threadsafe beyond construction. Maps to cairo_scaled_font_t.


Buffer
======

The buffer's output is two arrays: glyph infos and glyph positions. Eventually these two items will look like:

typedef struct _hb_glyph_info_t {
  hb_codepoint_t codepoint;
  hb_mask_t      mask;
  uint32_t       cluster;
  uint16_t       component;
  uint16_t       lig_id;
  uint32_t       internal;
} hb_glyph_info_t;

typedef struct _hb_glyph_position_t {
  hb_position_t  x_pos;
  hb_position_t  y_pos;
  hb_position_t  x_advance;
  hb_position_t  y_advance;
  uint32_t       internal;
} hb_glyph_position_t;


One nice thing about using hb-buffer for input is that we can now easily add UTF-8, UTF-16, and UTF-32 APIs to HarfBuzz by simply implementing:

void
hb_buffer_add_utf8 (hb_buffer_t  *buffer,
                    const char   *text,
                    unsigned int  text_length,
                    unsigned int  item_offset,
                    unsigned int  item_length);

void
hb_buffer_add_utf16 (hb_buffer_t    *buffer,
                     const uint16_t *text,
                     unsigned int    text_length,
                     unsigned int    item_offset,
                     unsigned int    item_length);

void
hb_buffer_add_utf32 (hb_buffer_t    *buffer,
                     const uint32_t *text,
                     unsigned int    text_length,
                     unsigned int    item_offset,
                     unsigned int    item_length);

These add individual Unicode characters to the buffer and set the cluster values accordingly.
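
The following sketch shows one way the UTF-8 variant could fill in code points and cluster values. It assumes that the cluster of a character is the byte offset of its first code unit within text, and that item_offset/item_length select the part of text to add; the mail does not spell either out. toy_info_t and the toy_ functions are hypothetical, and the decoder is simplified (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down glyph info: just the two fields this sketch fills in. */
typedef struct {
  uint32_t codepoint;
  uint32_t cluster;
} toy_info_t;

/* Decodes one UTF-8 sequence starting at p (end is one past the last byte).
 * Returns the number of bytes consumed; malformed input yields U+FFFD and
 * consumes a single byte. */
static unsigned int
toy_utf8_next (const unsigned char *p, const unsigned char *end, uint32_t *cp)
{
  unsigned int c = p[0], len, i;
  if (c < 0x80)                { *cp = c; return 1; }
  else if ((c & 0xE0) == 0xC0) { len = 2; *cp = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0) { len = 3; *cp = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0) { len = 4; *cp = c & 0x07; }
  else                         { *cp = 0xFFFD; return 1; }
  if ((unsigned int) (end - p) < len) { *cp = 0xFFFD; return 1; }
  for (i = 1; i < len; i++) {
    if ((p[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
    *cp = (*cp << 6) | (p[i] & 0x3F);
  }
  return len;
}

/* Appends the characters of text[item_offset .. item_offset + item_length)
 * to out (capacity out_size) and returns how many were written. */
unsigned int
toy_add_utf8 (toy_info_t *out, unsigned int out_size,
              const char *text, unsigned int text_length,
              unsigned int item_offset, unsigned int item_length)
{
  const unsigned char *base = (const unsigned char *) text;
  const unsigned char *p, *end;
  unsigned int n = 0;
  assert (item_offset <= text_length);
  assert (item_length <= text_length - item_offset);
  p = base + item_offset;
  end = p + item_length;
  while (p < end && n < out_size) {
    uint32_t cp;
    unsigned int used = toy_utf8_next (p, end, &cp);
    out[n].codepoint = cp;
    out[n].cluster = (uint32_t) (p - base);  /* byte offset in the full text */
    n++;
    p += used;
  }
  return n;
}
```

Keeping the cluster relative to the whole text, rather than to the item, lets the caller map output glyphs back to the original string without extra bookkeeping.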


Face
====

HarfBuzz is built around the SFNT font format. A Face simply represents an SFNT face, although this is all transparent to the user: you can pass junk to HarfBuzz as font data and it will simply ignore it. There are two main face constructors:

hb_face_t *
hb_face_create_for_data (hb_blob_t    *blob,
                         unsigned int  index);

typedef hb_blob_t * (*hb_get_table_func_t)  (hb_tag_t tag, void *user_data);

/* calls destroy() when not needing user_data anymore */
hb_face_t *
hb_face_create_for_tables (hb_get_table_func_t  get_table,
                           hb_destroy_func_t    destroy,
                           void                *user_data);

The for_tables() version uses a callback to load SFNT tables, whereas the for_data() version takes a blob which contains the font file data, plus the face index for TTC collections.


The face is only responsible for the "complex" part of the shaping right now, that is, OpenType Layout features (GSUB/GPOS...). In the future we may also access cmap directly. The old-style 'kern' table is not implemented right now, but it will be handled in the same layer.

The reason for introducing the blob machinery is that the new OpenType Layout engine, and any other table work we'll add, use the font data directly instead of parsing it into separate data structures. For that reason, we need to "sanitize" the font data first. When sanitizing, instead of a plain pass/fail, upon finding irregularities (say, an offset that points outside the table), we may modify the font data to make it clean enough to pass to the layout code. In those cases, we first try to make the blob writeable in place, and if that fails, we make a writeable duplicate of it. That is, copy-on-write, the easy way or the hard way. For sane fonts, this means zero per-process memory is consumed. In the future, we'll cache sanitize() results in fontconfig so that not every process has to sanitize() clean fonts.
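
A tiny sketch of the "repair instead of reject" idea: it checks one big-endian 16-bit offset field and, if the offset points outside the table, neutralizes it by writing zero. Treating a zero offset as "nothing here" is an assumption of this sketch, as are the function name and signature; the real sanitizer walks whole table structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian 16-bit value, as SFNT tables store them. */
static uint16_t
toy_read_u16 (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Checks the 16-bit offset stored at byte position pos of the table.
 * Returns 1 if the table is usable afterwards (clean, or repaired),
 * 0 otherwise. Each repair increments *edits. */
int
toy_sanitize_offset (unsigned char *table, size_t table_length, size_t pos,
                     int writeable, unsigned int *edits)
{
  assert (table != NULL && edits != NULL);
  if (pos > table_length || table_length - pos < 2)
    return 0;  /* the offset field itself is out of bounds */
  if (toy_read_u16 (table + pos) < table_length)
    return 1;  /* offset is in range: nothing to do */
  if (!writeable)
    return 0;  /* a caller would now try to make the blob writeable */
  table[pos] = 0;      /* neutralize the bad offset */
  table[pos + 1] = 0;
  (*edits)++;
  return 1;
}
```

A clean font never takes the write path, which is why sane fonts cost no per-process memory in this scheme.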



Font
====

Normally I would have made the font constructor take a hb_face_t (as cairo's does). A font is a face at a certain size and with certain hinting / other options, after all. However, FreeType's lack of refcounting makes this really hard. The reason is that Pango caches hb_face_t on the FT_Face instance's generic slot, whereas a hb_font_t should be attached to a PangoFont or PangoFcFont.

As everyone knows, FT_Face is not threadsafe, is not refcounted, and is not just a face, but also includes sizing information for one font at a time. For these reasons, whenever a font wants to access a FT_Face, it needs to "lock" one. When you lock it though, you don't necessarily get the same object that you got the last time. It may be a totally different object, created for the same font data, depending on who manages your FT_Face pool (cairo in our case). Anyway, for this reason, having hb_font_t hold a ref to hb_face_t makes life hard: one would either have to create/destroy hb_font_t between FT_Face lock/unlock, or risk having a hb_face_t pointing to memory owned by a FT_Face that may have been freed since.

For the reasons above I opted for not refing a face from hb_font_t and instead passing both a face and a font around in the API. Maybe I should use a different name (hb_font_scale_t?). I'd rather keep names short, instead of the cairo-style hb_font_face_t and hb_scaled_font_t.

Anyway, a font is created easily:

hb_font_t *
hb_font_create (void);

One then needs to set various parameters on it, and after the last change, it can be used from multiple threads safely.


Shaping
=======

The main hb_shape() API I have written down right now (just a stub) is:

typedef struct _hb_feature_t {
  const char   *name;
  const char   *value;
  unsigned int  start;
  unsigned int  end;
} hb_feature_t;

void
hb_shape (hb_face_t    *face,
          hb_font_t    *font,
          hb_buffer_t  *buffer,
          hb_feature_t *features,
          unsigned int  num_features);

where features are normally empty, but can be used to pass things like:

  "kern"=>"0"         -------> no kerning
  "ot:aalt"=>"2"      -------> use 2nd OpenType glyph alternative
  "ot:mkmk"=>"0"      -------> never apply 'mkmk' OpenType feature

Perhaps:

  "ot:script"=>"math" ------> Force an OpenType script tag
  "ot:langsys"=>"FAR " -----> Force an OpenType language system

Maybe:

  "ot"=>"0"           ------> Disable OpenType engine (prefer AAT, SIL, etc)

Or perhaps even features marking visual edge of the text, etc.
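
To illustrate how such a feature list might be consumed, the sketch below looks up the value that applies to a given text index. It assumes that start/end describe a half-open range [start, end) of buffer positions and that later entries override earlier ones; neither is stated in the stub API. The struct mirrors hb_feature_t but is named toy_feature_t to make clear it is a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same fields as the hb_feature_t stub above; hypothetical stand-in type. */
typedef struct {
  const char   *name;
  const char   *value;
  unsigned int  start;
  unsigned int  end;
} toy_feature_t;

/* Returns the value of the last feature called name whose range covers
 * index, or NULL if no such feature was passed. */
const char *
toy_feature_value_at (const toy_feature_t *features, unsigned int num_features,
                      const char *name, unsigned int index)
{
  const char *value = NULL;
  unsigned int i;
  assert (features != NULL || num_features == 0);
  for (i = 0; i < num_features; i++)
    if (strcmp (features[i].name, name) == 0 &&
        features[i].start <= index && index < features[i].end)
      value = features[i].value;  /* later entries override earlier ones */
  return value;
}
```

A caller that wants default behaviour simply passes no features, in which case every lookup returns NULL.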





Discussion
==========


Script and language
===================

Normally the shape() call needs a few more pieces of information. Namely: text direction, script, and language. Note that none of those belong on the face or font objects. For text direction, I'm convinced that it should be set on the buffer, and already have that in place.

For script and language, it's a bit more delicate. I'm also convinced that they belong to the buffer. With script it's fine, but with language it introduces a small implementation hassle: that I would have to deal with copying/interning language tags, something I was trying to avoid. The other options are:

- Extra parameters to hb_shape(). I'd rather not do this. Keeping details like this out of the main API and adding setters where appropriate makes the API cleaner and more extensible.

- Use the feature dict for them too. I'm strictly against this one. The feature dict is already too high-level for my taste.

So, comments here are appreciated.



Unicode callbacks
=================

HarfBuzz itself does not include any Unicode character database tables, but needs access to a few properties, some of them for fallback shaping only. Currently I have identified the following properties as being useful at some point:

typedef hb_codepoint_t
(*hb_unicode_get_mirroring_func_t) (hb_codepoint_t unicode);

Needed to implement character-level mirroring.


typedef hb_category_t
(*hb_unicode_get_general_category_func_t) (hb_codepoint_t unicode);

Used for synthesizing GDEF glyph classes when the face doesn't have them.


typedef hb_script_t
(*hb_unicode_get_script_func_t) (hb_codepoint_t unicode);

Not needed unless we also implement script itemization (which we can do transparently, say, if user passed SCRIPT_COMMON to the shape() function).


typedef unsigned int
(*hb_unicode_get_combining_class_func_t) (hb_codepoint_t unicode);

Useful for all kinds of mark positioning when GPOS is not available.


typedef unsigned int
(*hb_unicode_get_eastasian_width_func_t) (hb_codepoint_t unicode);

Not sure it will be useful in the HarfBuzz layer. I recently needed it to correctly set text in the vertical direction in Pango.


I've added an object called hb_unicode_funcs_t that holds all these callbacks. It can be ref'ed, as well as copied. There's also a hb_unicode_funcs_make_immutable() call, useful for libraries that want to give out references to a hb_unicode_funcs_t object they own but want to make sure the user doesn't modify the object by mistake.
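
Here is a minimal sketch of the "callbacks object with an immutable switch" idea, reduced to the mirroring callback. The toy_ names, the setter's return value, and the identity default are assumptions made for the example; refcounting and copying are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_codepoint_t;

typedef toy_codepoint_t
(*toy_get_mirroring_func_t) (toy_codepoint_t unicode);

/* Hypothetical cut-down funcs object holding a single callback. */
typedef struct {
  int                      immutable;
  toy_get_mirroring_func_t get_mirroring;
} toy_unicode_funcs_t;

/* Default callback: no mirroring information, return the input unchanged. */
static toy_codepoint_t
toy_mirroring_nil (toy_codepoint_t unicode)
{
  return unicode;
}

void
toy_unicode_funcs_init (toy_unicode_funcs_t *funcs)
{
  assert (funcs != NULL);
  funcs->immutable = 0;
  funcs->get_mirroring = toy_mirroring_nil;
}

/* After this call every setter refuses to change the object. */
void
toy_unicode_funcs_make_immutable (toy_unicode_funcs_t *funcs)
{
  assert (funcs != NULL);
  funcs->immutable = 1;
}

/* Returns 1 if the callback was installed, 0 if the object is immutable.
 * Passing NULL restores the default. */
int
toy_unicode_funcs_set_mirroring_func (toy_unicode_funcs_t *funcs,
                                      toy_get_mirroring_func_t func)
{
  assert (funcs != NULL);
  if (funcs->immutable)
    return 0;
  funcs->get_mirroring = func ? func : toy_mirroring_nil;
  return 1;
}

/* Demo callback that only knows about ASCII parentheses. */
toy_codepoint_t
toy_mirror_parens (toy_codepoint_t unicode)
{
  if (unicode == '(') return ')';
  if (unicode == ')') return '(';
  return unicode;
}
```

A library can configure such an object once, freeze it, and then hand out references without worrying that a client swaps the callbacks underneath it.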

The hb-glib.h layer then implements:

hb_unicode_funcs_t *
hb_glib_get_unicode_funcs (void);


The question then is where to pass the unicode funcs to the shape() machinery. My current design has it on the face:

void
hb_face_set_unicode_funcs (hb_face_t *face,
                           hb_unicode_funcs_t *unicode_funcs);

However, that is quite arbitrary. There is nothing in the face alone that requires Unicode functionality. Moreover, I want to keep the face very objective. Say, you should be able to get the hb_face_t from whoever provides you with one (pango, ...), and use it without worrying about what settings it has. The Unicode funcs, while well-defined, can still come from a variety of sources: glib, Qt, Python's, your own experiments, ...

I started thinking about moving that to the buffer instead. That's the only other place that Unicode comes in (add_utf8/...), and the buffer is the only object that is not shared by HarfBuzz, so user has full control over it.

One may ask why the callbacks should be settable to begin with. We could hardcode them at build time: if glib is available, use it, otherwise use our own copy or something. While I may make it fall back to whatever was available at compile time, I like being able to let the user set the callbacks. At least until I write one UCD library to rule them all... /me grins

So that's another question I need feedback about.


Font callbacks
==============

These are the font callbacks (font class, font funcs, ...) that I've prototyped. Note that the font, the face, and a user_data parameter are passed to all of them. Some of these callbacks technically just need a face, not a font, but since many systems implement these functions on actual fonts rather than faces, we implement it this way. Right now one can set the hb_font_callbacks_t object on the hb-font and set user_data there (hb_font_set_funcs()).


typedef hb_codepoint_t
(*hb_font_get_glyph_func_t) (hb_font_t *font, hb_face_t *face,
                             const void *user_data,
                             hb_codepoint_t unicode,
                             hb_codepoint_t variant_selector);

This is the cmap callback. Note the variant_selector: it supports the new cmap14 tables. Older clients can ignore that argument and do the mapping as before. We will probably implement support for Unicode cmaps internally, but chain to this function for missing glyphs or if no suitable cmap was found. That has three advantages:

- Pango etc. can pass whatever code they want for missing glyphs, to use later to draw a hexbox,

- Pango, through fontconfig, knows how to handle non-Unicode cmaps, so that will continue to work,

- For non-SFNT fonts, HarfBuzz should happily sit back and still make things work; this callback is how that will work.
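
The chaining described above could look roughly like the sketch below: try an internal mapping first, then hand missing characters to the client callback. The flat lookup table, the simplified callback signature (no font/face arguments) and the TOY_UNKNOWN_GLYPH_FLAG value are all hypothetical; the only fact relied on is that glyph 0 is the conventional "missing glyph".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_codepoint_t;

/* Simplified client callback: font and face arguments are omitted here. */
typedef toy_codepoint_t
(*toy_get_glyph_func_t) (const void *user_data,
                         toy_codepoint_t unicode,
                         toy_codepoint_t variant_selector);

/* Stand-in for an internally parsed Unicode cmap. */
typedef struct {
  toy_codepoint_t unicode;
  toy_codepoint_t glyph;
} toy_cmap_entry_t;

/* Hypothetical marker a client might use to draw a hexbox later. */
#define TOY_UNKNOWN_GLYPH_FLAG 0x10000000u

/* Example client callback: encodes the character into the glyph id. */
toy_codepoint_t
toy_missing_glyph (const void *user_data,
                   toy_codepoint_t unicode,
                   toy_codepoint_t variant_selector)
{
  (void) user_data;
  (void) variant_selector;
  return TOY_UNKNOWN_GLYPH_FLAG | unicode;
}

/* Looks the character up internally, then chains to the callback. */
toy_codepoint_t
toy_get_glyph (const toy_cmap_entry_t *cmap, unsigned int num_entries,
               toy_get_glyph_func_t fallback, const void *user_data,
               toy_codepoint_t unicode, toy_codepoint_t variant_selector)
{
  unsigned int i;
  assert (cmap != NULL || num_entries == 0);
  for (i = 0; i < num_entries; i++)
    if (cmap[i].unicode == unicode)
      return cmap[i].glyph;
  if (fallback)
    return fallback (user_data, unicode, variant_selector);
  return 0;  /* glyph 0: missing glyph */
}
```

With an empty internal table this degenerates into "always ask the client", which is the non-SFNT case.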


typedef hb_bool_t
(*hb_font_get_contour_point_func_t) (hb_font_t *font, hb_face_t *face,
                                     const void *user_data,
                                     hb_codepoint_t glyph,
                                     hb_position_t *x, hb_position_t *y);

Needed for complex GPOS positioning. Pango never did this before. Pretty straightforward; we just need to make clear which space the positions are returned in. I'll discuss that in the next section.


typedef void
(*hb_font_get_glyph_metrics_func_t) (hb_font_t *font, hb_face_t *face,
                                     const void *user_data,
                                     hb_codepoint_t glyph,
                                     hb_glyph_metrics_t *metrics);

This one is a bit more tricky. Technically we just need the advance width. The rest of the metrics are only used for fallback mark positioning. So maybe I should split this into a get_glyph_advance and a full get_glyph_metrics one. Current HarfBuzz has a single call to get the advance width of multiple glyphs. If that kind of optimization proves necessary in the future, we can add a callback that takes an entire buffer and sets the advances.

There are more issues here though:

1) The metrics struct most probably should be public. However, in the future I'd like to use bearing-deltas to improve positioning. A transparent struct doesn't help in those situations. Not sure what the alternatives are.

2) It's not exactly clear how to deal with vertical fonts. One way would be to assume that if buffer direction is vertical, then the font already knows that and returns the vertical metrics. That's not a completely off assumption, though that may not be how win32 fonts work?


typedef hb_position_t
(*hb_font_get_kerning_func_t) (hb_font_t *font, hb_face_t *face,
                               const void *user_data,
                               hb_codepoint_t first_glyph,
                               hb_codepoint_t second_glyph);

Again, most probably we will read 'kern' table internally anyway, but this can be used for fallback with non-SFNT fonts. You can even pass, say, SVG fonts through HarfBuzz such that the higher level just deals with one API.



Another call that may be useful is a get_font_metrics one. Again, it is only useful in fallback positioning. In that case, ascent/descent as well as slope come in handy.


Font scale, etc
===============

Currently, based on the old code, the font object has the following setters:

void
hb_font_set_scale (hb_font_t *font,
                   hb_16dot16_t x_scale,
                   hb_16dot16_t y_scale);

void
hb_font_set_ppem (hb_font_t *font,
                  unsigned int x_ppem,
                  unsigned int y_ppem);

The ppem API is well-defined: that's the ppem to use for hinting and device-dependent positioning. Old HarfBuzz also had a "device-independent" setting, which essentially turned hinting off. I've removed that setting in favor of passing zero as ppem. That allows hinting in one direction and not the other. Unlike old HarfBuzz, we will do metrics hinting ourselves.

The set_scale() API is modeled after FreeType, but otherwise very awkward to use. There are four different spaces relevant in HarfBuzz:

  - Font design space: typically a 1024x1024 box per glyph. The GPOS and 'kern' values are in this space. This maps to the EM space by a per-face value called upem (units per em).

  - EM space: 1em = 1em.

  - Device space: actual pixels. The ppem maps EM space to this space, if such a mapping exists.

  - User space: the user expects glyph positions in this space. This can be different from device space (it is, for example if you use cairo_scale()). Current/old pango ignore this distinction and hence kerning doesn't scale correctly [1].


Now, what the hb_font_set_scale() call accepts right now is a 16.16 pair of scales mapping from font design space to device space. I'm not sure, but getting that number from font systems other than FreeType may actually be quite hard. The problem is, upem is an implementation detail of the face, and the user shouldn't care about it.

So my proposal is to separate upem and make it a face property. We can read upem from the 'head' SFNT table and assume a 1024 upem for non-SFNT fonts (that's what Type1 did IIRC). In fact, we wouldn't directly use upem for non-SFNT fonts right now.

Then the scale would simply need to map EM space to device space. But notice how that's the same as the ppem. But then again, we really just care about user space for positioning (device space comes in only when hinting). So, set_scale should be changed to accept em-to-user-space scale. Not surprisingly, that's the same as the font size in the user-space.
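
As a worked example of the proposed model, mapping a font-design-space value (say, a kerning amount) to user space is then just value * font_size / upem. The sketch below does this in 16.16 fixed point; toy_16dot16_t and the function name are placeholders, and the division truncates toward zero, which a real implementation might want to round instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: 1.0 is represented as 65536. */
typedef int32_t toy_16dot16_t;

/* Maps a value in font design units to user space.
 *   design_value:      e.g. a GPOS or 'kern' adjustment, in design units
 *   upem:              units per em of the face
 *   em_to_user_scale:  font size in user space, as 16.16
 * Returns the user-space value as 16.16. */
toy_16dot16_t
toy_design_to_user (int32_t design_value, unsigned int upem,
                    toy_16dot16_t em_to_user_scale)
{
  assert (upem > 0);
  /* Widen to 64 bits so the intermediate product cannot overflow. */
  return (toy_16dot16_t) ((int64_t) design_value * em_to_user_scale
                          / (int64_t) upem);
}
```

For instance, with upem 2048 and a 16-unit font size, 512 design units come out as 4.0 user-space units.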

Another problem I would need to solve here is that cairo allows a full matrix for device-to-user space. That is, glyphs can be rotated in place, for example. That's what we use to implement vertical text. I'm inclined to also add a full-matrix setter. The behavior would be:

- If (1,0) maps to (x,y) with nonzero y, then many kinds of positioning should be completely disabled,

- Somehow figure out what to do with vertical. Not sure right now, but it should be ok to detect whether the font is rotated by 90 degrees and compensate for that.
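
The two checks above can be sketched as a small classifier over a 2x2 matrix laid out like cairo_matrix_t (x' = xx*x + xy*y, y' = yx*x + yy*y), so that (1,0) maps to (xx, yx). The enum, the names and the exact-zero comparisons are assumptions of this sketch; real code would likely compare against a tolerance.

```c
#include <assert.h>
#include <stddef.h>

/* Linear part of a cairo-style matrix: x' = xx*x + xy*y, y' = yx*x + yy*y. */
typedef struct {
  double xx, yx;
  double xy, yy;
} toy_matrix_t;

typedef enum {
  TOY_BASELINE_HORIZONTAL,  /* (1,0) maps to (x,0): positioning can stay on */
  TOY_ROTATED_90,           /* quarter turn: candidate for vertical handling */
  TOY_OTHER                 /* anything else: disable most positioning */
} toy_orientation_t;

toy_orientation_t
toy_matrix_classify (const toy_matrix_t *m)
{
  assert (m != NULL);
  /* (1,0) maps to (xx, yx); a zero yx keeps the baseline horizontal. */
  if (m->yx == 0.0)
    return TOY_BASELINE_HORIZONTAL;
  /* A quarter turn has no diagonal terms, and its off-diagonal terms
   * have opposite signs (same signs would include a reflection). */
  if (m->xx == 0.0 && m->yy == 0.0 && m->xy * m->yx < 0.0)
    return TOY_ROTATED_90;
  return TOY_OTHER;
}
```

Note that a pure shear (as used for synthetic slanting) still counts as horizontal here, since it leaves (1,0) on the x axis.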


In that model, however, I wonder how easy or hard it would be for callbacks to provide the requested values (contour point, glyph metrics, etc) in user space. For cairo/pango I know that's actually the easiest thing to do, and anything else would need conversion, but I'm not sure about other systems. An alternative would be to let the callbacks choose which space the returned value is in, so we can map appropriately.



I guess that's it for now.  Let the discussion begin.  Thanks for reading!

behdad

[1] http://bugzilla.gnome.org/show_bug.cgi?id=341481

