Re: HarfBuzz API design

On 08/19/2009 02:45 PM, Carl Worth wrote:
Excerpts from Behdad Esfahbod's message of Tue Aug 18 16:23:50 -0700 2009:
[Warning: long email ahead]

My reply might bounce to unsubscribed-from lists. Feel free to forward
if you think there's anything important in here for others to read.

Thanks Carl.

_reference(), _destory(), and _get_reference_count().  At some point we may
want to add _[gs]et_user_data() also which is useful for language bindings.

That should be "destroy" not "destory", of course. I normally wouldn't
point out a typo in an email message, but I did see the same error in
a comment that might be copied from actual code:

Ouch.  Fixed the typo.

/* calls destory() when not needing user_data anymore */
hb_face_t *
hb_face_create_for_tables (hb_get_table_func_t  get_table,
                             hb_destroy_func_t    destroy,
                             void                *user_data);

So don't forget to spell-check your header files. :-)

Yeah, postponed that to when I add actual docs.

Humm, seems like s/writeable/writable/g is also needed. Stupid me. The geek inside me prefers writeable though.

typedef enum {
} hb_memory_mode_t;
    DUPLICATE: copy data right away and own it.

    READONLY: the data passed in can be kept for later use, but should not be
modified.  If modification is needed, the blob will duplicate the data lazily.

    WRITEABLE: data is writeable, use it freely.

    READONLY_NEVER_DUPLICATE: data is readonly and should never be duplicated.
   This disables operations needing write access to data.

    READONLY_MAY_MAKE_WRITEABLE: data is readonly but may be made writeable
using mprotect() or equivalent win32 calls.  It's up to the user to make sure
calling mprotect() or system-specific equivalents on the data is safe.  In
practice, that's never an issue on Linux and (according to Tor) on win32.

I don't think these names are quite right yet.

That's definitely one of the most arbitrary parts of the code, thanks for catching! I do think there's a slight confusion though, which if I clarify (as the docs will eventually will), the values make more sense. Look at it this way, the mode parameter describes the characteristics of the input data only. It's not a property of the blob itself, and for that reason there is no getter for it.

Looking at it that way, these values are accurate:


This one, not so much:


As it also describe a behavior of the blob itself. That value however is for very corner case uses (when you're just interested in knowing whether the data is sane and you want to avoid the cost of copying).

I think the thing that strikes me first as wrong is that if you create
a READONLY blob then it's still perfectly valid to write to it, (such
that you have to have a separate READONLY_NEVER_DUPLICATE for
"readonly---and I mean it).

More clear in the above context?

I also did some Hoffman encoding here. I initially had READONLY and READONLY_MAY_DUPLICATE, but figured that READONLY_MAY_DUPLICATE is what I want most users to use, and one should think twice before using READONLY. So I gave the short name to the common one and made the corner case use a longer name.

One difficulty is that you're capturing separate notions here,
(whether the data buffer is writeable vs. whether the created blob
should be writeable).

Yes and no indeed.

And there's some missing orthogonality too. For example, if I have
data that I don't want the blob to reference, (hence DUPLICATE), might
I not still want to create a blob in which all operations requiring
write access are disabled? That combination is not possible to express
from what I can see.

Initially I had that design in mind, with separate writeable. But I figured the extra control is not needed, no.

So, here's pseudocode trying to capture the decision-tree for choosing
one of the modes currently:

if (the data cannot be referenced by the blob)
else if (the data buffer can be written to)
	if (writing requires calling mprotect)
else if (the resulting blob should be readonly)

Again, lets forget about READONLY_NEVER_DUPLICATE, my mental model for this was:

if (I don't own the data)
else if (I malloced it)
else if (it's mmapped)
else /* ie.  it's shared with someone else, I don't have perms to modify it */

So perhaps what's actually desired here is a set of flags to express
each of those conditions independently? I don't have a good proposal
for what those flag names might be.

I found the flag approach even more confusing as one would then think about (and the docs have to explain) all different kind of combinations.

Although, having a name for the entire behavior is helpful, (rather
than just a conjunction of flags). So maybe we just need simpler names
to capture the behaviors of interest:

COPY: Data is immediately copied by the blob. Resulting blob supports
	write operations. [New name for DUPLICATE]

COPY_ON_WRITE: Data is referenced by the blob. Resulting blob support
	write operations which are implemented by lazily copying the
	data, (the original data is never modified). [New name for

READ_WRITE: Data is referenced by the blob. Resulting blob supports
	write operations which will directly modify original data.
	[New name for WRITEABLE. Of the new names I'm proposing, this
	is the one I like the least.]

READONLY: Data is referenced by the blob. Resulting blob does not
	support write operations. [New name for READONLY_NEVER_DUPLICATE]

MPROTECT: Data is referenced by the blob. Resulting blob supports
	write operations which are implemented by first calling
	mprotect or equivalent. [New name for
	READONLY_MAY_MAKE_WRITEABLE. Obviously this isn't an ideal
	name since it references an operating-system-specific
	feature. Suggestions welcome.]

I wanted to keep MPROTECT out of the name. Again, please think about it again without the mode determining whether the blob will be writeable or not. Just as describing the access mode of the input data only. Do you think the current enum makes sense that way?

I don't have much to say about the actual font-related parts of the
proposal. But I do think it's perhaps a bad idea to reuse the same
buffer for text input and glyph output. Is memory that tight? Might a
user not want to be able to query the text back out even after

Memory is not tight, no. Note that this is only about what the input/output to the hb_shape() function will be. Users already keep the original text in some location anyway, and the cluster mapping is what's used to map the glyphs back. All I'm doing here essentially is to let the user populate a hb-buffer and pass that to hb_shape(), instead of hb_shape() doing it.

The benefit of course is having one hb_shape() API, instead of hb_shape_utf8(), hb_shape_utf16(), and hb_shape_utf32().

Makes more sense?

Thanks again,


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]