ART_HACKPDF 1.4 transparency

From: Raph Levien <raph acm org>
To: libart-hackers gnome org
Subject: ART_HACKPDF 1.4 transparency
Date: Mon, 24 Jul 2000 14:35:53 -0700
[This is a copy of an email written to L. Peter Deutsch regarding
adding PDF 1.4 transparency to Ghostscript. It is likely to be of some
interest to Libart hackers, as I'm planning on supporting the full PDF
1.4 transparency model in Libart.]

   I promised I'd write you in more detail when I'd had a chance to
absorb the PDF 1.4 transparency spec:
http://partners.adobe.com/asn/developer/acrosdk/DOCS/PDF_Transparency.pdf

   I've just gone through another pass through it, and _still_ don't feel
I fully understand it, but have enough insight that we can start
productively talking about the impact on the gs code base.

   Let me start by expressing a few concerns. Patents top the list.
Adobe basically gives notice that there _will_ be patents covering PDF
1.4, and that no license for these patents is implied unless Adobe
publishes a Patent Clarification Notice. Pierre tells me (informally!)
that Adobe will probably grant any generic patents on the transparency
model, but will keep specific implementation techniques. This could be
an area of major concern - I believe the difference in speed (and
possibly quality) between a naive implementation and a clever one is
likely to be dramatic. Pierre also tells me that his patent was filed
around Christmastime. Thus, we're not likely to find out the status of
the patent situation for another year or two.

   My other concern is the inconsistency between different
implementations. In particular, an implementation based on
supersampling can produce quite different results than one performing
all rasterization at the device resolution, and simulating
antialiasing through alpha channel techniques. In fact, from playing
with AI9, it's clear that Adobe has implemented _both_ techniques. In
general, they use alpha-channel for screen display, and supersampling
for raster image export. It is not, in fact, at all difficult to spot
differences, even when the "Pixel Preview" View option is
selected.

   So now I believe that the issue of "correctly implementing the
spec" is made considerably more complex - if you want to match Adobe's
behavior, you also have to match the supersample/alpha choice.

   Ok, now for the fun stuff. I'm not going to assume here that you've
read the spec in detail - even if so, I hope you may find this summary
helpful for purposes of intuition. The spec is not light beach reading
:)

   The basic rendering model is a tree, with "elementary objects" at
the leaves and groups everywhere else. The interpretation of this tree
is not simple - neither the model where a group is a function of its
children, nor the model of a canvas over which things are painted,
applies. In attribute grammar terminology, rendering requires both
synthesized and inherited attributes.

   For the purpose of our discussion, I think the most illuminating
model will be traversal of the tree while modifying a stack of
temporary images, with the obvious mapping between stack depth and
level in the tree. Each intermediate image has pixels composed of one
scalar per device color component, plus two alpha values, called
"shape" and "opacity". Note that the product of opacity and shape
corresponds directly to classical "alpha". The distinction becomes
important only in the case of knockout.
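Here is a minimal C sketch of one such intermediate pixel, to make the layout concrete. The names (TrPixel, tr_pixel_alpha) and the choice of three color components are my own assumptions for illustration. They are not taken from the spec or from libart.

```c
#include <assert.h>

/* Number of device color components; 3 (RGB) is assumed for illustration. */
#define NCOMP 3

/* One pixel of an intermediate image on the group stack: one value per
   device color component, plus separate shape and opacity. */
typedef struct {
    double color[NCOMP]; /* device color components, 0..1 */
    double shape;        /* geometric coverage, 0..1 */
    double opacity;      /* opacity, 0..1 */
} TrPixel;

/* Classical alpha is the product of shape and opacity. Keeping the two
   factors apart only matters once knockout groups are involved. */
static double tr_pixel_alpha(const TrPixel *p)
{
    return p->shape * p->opacity;
}
```

For example, shape 0.5 with opacity 1 and shape 1 with opacity 0.5 give the same classical alpha. Only a knockout group can tell them apart.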

   Exactly what happens on tree pushing and popping depends on two
boolean flags: "isolated" and "knockout". Furthermore, all objects
(groups and elementary objects both) have an associated blending mode
that controls how the top-of-stack image gets composited over the
next-to-top-of-stack image. The blending function itself is a function of two
colors, ie, result_color = blend (background_color,
foreground_color). It might help your intuition to know that Normal
(bg, fg) = fg. Compositing extends the blending function in a more or
less expected manner to handle alpha. Thus, the compositing function
specialized to the Normal blend mode is exactly equivalent to
Porter-Duff "over".
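The sketch below shows one color channel of the compositing step. It assumes non-premultiplied values in [0,1]. The general formula is my reading of the PDF 1.4 basic compositing formula and should be checked against the spec. BlendFn and the function names are hypothetical. A plain Porter-Duff "over" is included alongside so that the two can be compared.

```c
#include <assert.h>
#include <math.h>

/* A separable blend function: result = blend(background, foreground). */
typedef double (*BlendFn)(double bg, double fg);

/* Normal(bg, fg) = fg. */
static double blend_normal(double bg, double fg) { (void)bg; return fg; }

/* Composite a source (cs, as) over a backdrop (cb, ab) using blend B.
   Colors are non-premultiplied. */
static void composite(double cb, double ab, double cs, double as,
                      BlendFn B, double *cr, double *ar)
{
    double a = ab + as - ab * as;   /* union of the two alphas */
    if (a == 0.0) { *cr = 0.0; *ar = 0.0; return; }
    /* Over a transparent backdrop the source shows unblended;
       over an opaque one the blended color shows. */
    double mixed = (1.0 - ab) * cs + ab * B(cb, cs);
    *cr = (1.0 - as / a) * cb + (as / a) * mixed;
    *ar = a;
}

/* Porter-Duff "over" in the same non-premultiplied representation. */
static void porter_duff_over(double cb, double ab, double cs, double as,
                             double *cr, double *ar)
{
    double a = as + ab * (1.0 - as);
    *ar = a;
    *cr = a == 0.0 ? 0.0 : (as * cs + ab * (1.0 - as) * cb) / a;
}
```

With B = Normal, the bracketed term reduces to cs. The result color then becomes (ab (1 - as) cb + as cs) / a, which is exactly "over".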

   Note that the spec does _not_ specify the actual behavior of the
blending modes. I don't remember our conversation well, but it seems
to me I win something for predicting that. In any case, I have
successfully reverse engineered all the separable blend functions to
+/- 2 pixel counts, so it's not a stumbling block.

   The simplest case is isolated and !knockout. In this case, pushing
the stack creates a new, empty image. After the children are
composited onto the top of the stack, popping the stack composites the
top-of-stack over the second-on-stack using the blending function
specified in the group.
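Here is a toy sketch of the isolated, non-knockout case on a four-pixel, single-channel image. It inlines a non-premultiplied compositing formula, which is my reading of the spec's, so treat it as an assumption. Children are painted in Normal mode for brevity. The Img type, the width W and the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

#define W 4   /* width of the toy single-channel images */

typedef struct { double c[W], a[W]; } Img;   /* non-premultiplied */
typedef double (*BlendFn)(double bg, double fg);

static double blend_normal(double bg, double fg) { (void)bg; return fg; }

/* Composite src over dst in place, using blend B. */
static void composite_img(Img *dst, const Img *src, BlendFn B)
{
    for (int i = 0; i < W; i++) {
        double ab = dst->a[i], as = src->a[i];
        double a = ab + as - ab * as;
        if (a == 0.0) { dst->c[i] = 0.0; dst->a[i] = 0.0; continue; }
        double mixed = (1.0 - ab) * src->c[i] + ab * B(dst->c[i], src->c[i]);
        dst->c[i] = (1.0 - as / a) * dst->c[i] + (as / a) * mixed;
        dst->a[i] = a;
    }
}

/* Isolated, non-knockout group. Push a fresh transparent image, paint the
   children into it, then pop by compositing it over the backdrop with the
   group's blend. */
static void render_isolated_group(Img *backdrop, const Img *children, int n,
                                  BlendFn group_blend)
{
    Img tos;
    memset(&tos, 0, sizeof tos);                 /* push: empty image */
    for (int k = 0; k < n; k++)
        composite_img(&tos, &children[k], blend_normal);
    composite_img(backdrop, &tos, group_blend);  /* pop */
}
```

With a Normal group blend, the result should match compositing the children directly onto the backdrop, because "over" is associative. Isolation only becomes visible with other blend modes.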

   The !isolated and !knockout case involves a bit of cleverness. In
this case, pushing the stack dups the top-of-stack image. Popping the
stack involves an _uncompositing_ operation, then recompositing using
the group's blending mode. More specifically, if the second-on-stack is
x and the tos is y, you solve $y = z \mathbin{\mathrm{over}} x$ for z, then composite z
over x using the blending function specified in the group. In the case
where the blending function and masks all reduce to Porter-Duff
"over", the uncompositing and recompositing cancel each other out. In
the case of other blend modes, interesting and nontrivial things
happen.
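This single-pixel sketch pops a non-isolated group by following the description literally. It solves y = z over x for z in premultiplied terms, then composites z over x with the group's blend. Px, the function names, and Multiply defined as bg * fg are assumptions for illustration. This literal version is only well defined where the backdrop alpha is below 1. With an opaque backdrop, z's alpha cannot be recovered from y, so a real implementation would need to track the group's own alpha separately.

```c
#include <assert.h>
#include <math.h>

typedef double (*BlendFn)(double bg, double fg);
static double blend_normal(double bg, double fg) { (void)bg; return fg; }
/* Multiply as bg * fg is an assumption; the spec does not define it. */
static double blend_multiply(double bg, double fg) { return bg * fg; }

typedef struct { double c, a; } Px;   /* non-premultiplied color, alpha */

static Px composite(Px b, Px s, BlendFn B)
{
    Px r;
    r.a = b.a + s.a - b.a * s.a;
    if (r.a == 0.0) { r.c = 0.0; return r; }
    double mixed = (1.0 - b.a) * s.c + b.a * B(b.c, s.c);
    r.c = (1.0 - s.a / r.a) * b.c + (s.a / r.a) * mixed;
    return r;
}

/* Solve y = z over x for z. Requires x.a < 1. */
static Px uncomposite(Px y, Px x)
{
    Px z;
    assert(x.a < 1.0);
    z.a = (y.a - x.a) / (1.0 - x.a);
    if (z.a <= 0.0) { z.a = 0.0; z.c = 0.0; return z; }
    /* premultiplied: y.a*y.c = z.a*z.c + (1 - z.a)*x.a*x.c */
    z.c = (y.a * y.c - (1.0 - z.a) * x.a * x.c) / z.a;
    return z;
}

/* Pop a non-isolated, non-knockout group. x is the second-on-stack pixel;
   y is the top of stack, which started as a copy of x. */
static Px pop_nonisolated(Px x, Px y, BlendFn group_blend)
{
    return composite(x, uncomposite(y, x), group_blend);
}
```

With Normal, pop_nonisolated returns y unchanged, which is the cancellation described above. With Multiply, the same group darkens the result.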

   I'm not going to define the knockout cases in detail here (they are
hairy). However, here is an outline for the purpose of intuition:
pushing the stack copies the tos to a temporary image buffer. Each
child is composited over this temporary image buffer, in a "sandbox"
(you can push the stack and use that as the "sandbox" if that helps
visualize it). Then, this result is composited over the tos using a
rather intricate compositing function defined in terms of both shape
and opacity. Effectively, instead of the images being composited in a
stack (ie, i3 over i2 over i1), each individual object is separately
composited over the backdrop (ie some composition of (i1 over backdrop),
(i2 over backdrop) and (i3 over backdrop)). Popping the stack then
works analogously to the !knockout cases.
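Knockout is not defined precisely here, so the following is only an intuition aid under a simplifying assumption. Each object is composited over the group's initial backdrop. Its shape (coverage) then linearly selects, in premultiplied terms, between that result and the result so far. This is not the spec's knockout formula. Px, over and knockout_group are hypothetical names. The object's alpha field is taken to be its opacity, with shape supplied separately.

```c
#include <assert.h>
#include <math.h>

typedef struct { double c, a; } Px;   /* non-premultiplied, one channel */

/* Porter-Duff over (Normal blend). */
static Px over(Px b, Px s)
{
    Px r;
    r.a = s.a + b.a * (1.0 - s.a);
    r.c = r.a == 0.0 ? 0.0 : (s.a * s.c + b.a * (1.0 - s.a) * b.c) / r.a;
    return r;
}

/* Simplified knockout group for one pixel. objs[k].a is the object's
   opacity and shape[k] its coverage. The interpolation rule is an
   assumption for illustration. */
static Px knockout_group(Px backdrop, const Px *objs, const double *shape, int n)
{
    Px acc = backdrop;
    for (int k = 0; k < n; k++) {
        Px fresh = over(backdrop, objs[k]);   /* object over backdrop */
        double f = shape[k];
        double ap = (1.0 - f) * acc.a + f * fresh.a;
        double cp = (1.0 - f) * acc.a * acc.c + f * fresh.a * fresh.c;
        acc.a = ap;
        acc.c = ap == 0.0 ? 0.0 : cp / ap;
    }
    return acc;
}
```

With full shape, the last object simply replaces whatever came before it within its area. That is the knockout behavior. A plain stack would instead let the earlier objects show through.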

   Efficiently implementing PDF 1.4 transparency is a very large
can of worms. A naive implementation would clearly be very
slow. The space of potential optimizations is large and diverse. Here
is a somewhat random list:

1. Having separate shape and opacity is not necessary in the common
   case. I believe that a simple and effective criterion is whether
   any ancestor in the group tree has the "knockout" option set. If
   not, alpha can be safely computed as a single channel.

2. Algebraic simplification. There are a whole bunch of little things
   here - for example, in groups with Normal blend mode and no mask,
   you can usually just remove the enclosing push/pop. The same
   generally applies for groups with one object. Also, when the
   composition backdrop is known to be blank, much simpler math can be
   used.

3. Local simplification. For example, compositing a group in
   non-isolated mode with a Normal blending mode but nonunity opacity
   can be done in one shot rather than uncompositing and
   recompositing.

4. Region-based simplification. It is very common for objects to have
   large regions of zero alpha, unity alpha, etc. Within these regions,
   compositing can be simplified dramatically. In areas of zero alpha,
   you don't have to touch the pixels at all, which is a _huge_ win.

5. Computations in vector space. If you have elementary objects of
   flat color, it may make sense to analytically split them up into
   Venn-diagram subareas, then compute the transparency color only
   once per area. The advantage of this technique is that it scales up
   very nicely as rez goes up. The disadvantages are numerous,
   including numerical stability issues and the possibility for a
   combinatorial explosion of pieces. Yet, I do know that Adobe has
   implemented this for PostScript and pre-1.4 PDF output.

6. Runlength computations. Again, for regions of flat color, a
   runlength representation makes sense. This is simpler and lower
   level than computing in vector space.

7. Adaptive resolution rendering. Neither vector-space nor runlength
   computations will work in general for gradients. However,
   generating gradients at 2400 (or even 720) dpi and doing the
   blending pixel-by-pixel can be incredibly wasteful, when you can
   get almost exactly the same results computing the interior at
   dramatically reduced resolution.

8. Judicious choice of premultiplied vs. separated alpha. Separated
   alpha is fastest in the common case of compositing rgba images over
   an rgb buffer (similarly for cmyk). However, working with
   intermediate groups in Normal blend mode, premultiplied is going to
   be faster. For many operations (such as abNormal blend modes), it's
   probably a wash.
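Item 1 can be sketched as a single top-down pass over the group tree. The Node type, with its fixed array of four children, is a hypothetical stand-in for whatever structure the interpreter actually builds. The rule itself is the one proposed in item 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical group-tree node, for illustration only. */
typedef struct Node {
    bool knockout;              /* this group's knockout flag */
    bool needs_shape;           /* output: track shape apart from opacity */
    struct Node *children[4];   /* NULL-terminated; fixed size for brevity */
} Node;

/* A node needs separate shape and opacity only if some ancestor group
   has knockout set. */
static void mark_shape(Node *n, bool knockout_above)
{
    n->needs_shape = knockout_above;
    for (size_t i = 0; i < 4 && n->children[i]; i++)
        mark_shape(n->children[i], knockout_above || n->knockout);
}
```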
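One way to read item 3 is through the premultiplied algebra. Suppose y = z over x, and the group has Normal blend with constant opacity k. Then compositing (k z) over x equals x + k (y - x), which is a plain linear interpolation with no uncompositing step. The sketch below checks the one-shot form against the long way. PPx and the function names are hypothetical, and pop_slow assumes the backdrop alpha is below 1.

```c
#include <assert.h>
#include <math.h>

typedef struct { double p, a; } PPx;   /* premultiplied color, alpha */

static PPx over(PPx b, PPx s)          /* Porter-Duff over */
{
    PPx r = { s.p + (1.0 - s.a) * b.p, s.a + (1.0 - s.a) * b.a };
    return r;
}

/* Long way: recover z from y = z over x, scale it by the group opacity k,
   then composite it over x again. Needs x.a < 1. */
static PPx pop_slow(PPx x, PPx y, double k)
{
    double za = (y.a - x.a) / (1.0 - x.a);
    PPx z = { y.p - (1.0 - za) * x.p, za };
    PPx zk = { k * z.p, k * z.a };
    return over(x, zk);
}

/* One shot: lerp between backdrop and group result, premultiplied. */
static PPx pop_fast(PPx x, PPx y, double k)
{
    PPx r = { x.p + k * (y.p - x.p), x.a + k * (y.a - x.a) };
    return r;
}
```

Besides saving arithmetic, the one-shot form never divides by (1 - x.a).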
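This per-scanline sketch shows the fast paths of item 4 for Normal blend on premultiplied single-channel data. The test here is made per pixel, which is only a stand-in. The real win comes from knowing the zero-alpha and unity-alpha regions in advance (for instance, from the rasterizer's spans), so those pixels are never even loaded. The function name and buffer layout are assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Composite a premultiplied source scanline over a destination scanline
   (Normal blend), skipping work where source alpha is exactly 0 or 1.
   Returns how many pixels took the general path. */
static size_t over_scanline(double *dst_p, double *dst_a,
                            const double *src_p, const double *src_a,
                            size_t n)
{
    size_t general = 0;
    for (size_t i = 0; i < n; i++) {
        double a = src_a[i];
        if (a == 0.0)
            continue;                   /* zero alpha: leave pixel alone */
        if (a == 1.0) {                 /* unity alpha: plain copy */
            dst_p[i] = src_p[i];
            dst_a[i] = 1.0;
            continue;
        }
        dst_p[i] = src_p[i] + (1.0 - a) * dst_p[i];
        dst_a[i] = a + (1.0 - a) * dst_a[i];
        general++;
    }
    return general;
}
```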
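This is item 6 in its simplest form. Two run lists of flat premultiplied color are walked in step, and the Normal-blend arithmetic runs once per overlapping piece rather than once per pixel. Run and over_runs are hypothetical names. A fuller version would also merge adjacent output runs that come out equal.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* A run of identical premultiplied single-channel pixels. */
typedef struct { int len; double p, a; } Run;

/* Composite run list src over run list dst (same total length) into out.
   Returns the number of output runs; out needs room for n_dst + n_src. */
static size_t over_runs(const Run *dst, size_t n_dst,
                        const Run *src, size_t n_src, Run *out)
{
    size_t i = 0, j = 0, n = 0;
    int left_d = n_dst ? dst[0].len : 0, left_s = n_src ? src[0].len : 0;
    while (i < n_dst && j < n_src) {
        int len = left_d < left_s ? left_d : left_s;   /* overlapping piece */
        Run r = { len,
                  src[j].p + (1.0 - src[j].a) * dst[i].p,
                  src[j].a + (1.0 - src[j].a) * dst[i].a };
        out[n++] = r;
        left_d -= len;
        left_s -= len;
        if (left_d == 0 && ++i < n_dst) left_d = dst[i].len;
        if (left_s == 0 && ++j < n_src) left_s = src[j].len;
    }
    return n;
}
```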
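Here is a 1-D sketch of the reduced-resolution idea in item 7. The shading function, which may be expensive, is evaluated only every `step` pixels and linearly interpolated in between. This assumes the gradient is smooth at that scale and that edges are handled at full resolution elsewhere. GradientFn, fill_adaptive, linear_ramp and the choice of step are all hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef double (*GradientFn)(double x);   /* color as a function of x */

/* Example shading: a linear ramp across 100 pixels. */
static double linear_ramp(double x) { return x / 99.0; }

/* Fill out[0..n-1], evaluating f at pixel 0, every step pixels after that,
   and the last pixel, and interpolating linearly in between. Returns the
   number of evaluations made. */
static int fill_adaptive(GradientFn f, double *out, int n, int step)
{
    int evals = 1;
    double c0 = f(0.0);
    out[0] = c0;
    for (int i0 = 0; i0 < n - 1; i0 += step) {
        int i1 = i0 + step < n - 1 ? i0 + step : n - 1;
        double c1 = f((double)i1);
        evals++;
        for (int i = i0 + 1; i <= i1; i++)
            out[i] = c0 + (c1 - c0) * (double)(i - i0) / (double)(i1 - i0);
        c0 = c1;
    }
    return evals;
}
```

For a linear ramp, the interpolation is exact. Filling 100 pixels at step 16 costs 8 evaluations instead of 100.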
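This sketch shows the per-channel arithmetic behind item 8, using hypothetical function names. Compositing a separated-alpha source onto an opaque destination takes a single multiply-add per channel and keeps the destination as plain color. Premultiplied "over" avoids any division when the destination itself carries alpha, as intermediate group buffers do. It illustrates operation counts only and is not a benchmark.

```c
#include <assert.h>
#include <math.h>

/* Separated-alpha source over an opaque destination: one multiply-add. */
static double over_opaque_separated(double dst, double src_c, double src_a)
{
    return dst + src_a * (src_c - dst);
}

/* Premultiplied source over a premultiplied destination with alpha. */
static void over_premultiplied(double *dst_p, double *dst_a,
                               double src_p, double src_a)
{
    *dst_p = src_p + (1.0 - src_a) * *dst_p;
    *dst_a = src_a + (1.0 - src_a) * *dst_a;
}
```

On an opaque destination, both forms give the same color.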

   Next, I've been thinking about supersampling. Consider three
different rendering strategies: no antialiasing, alpha-compositing
based aa, and supersampling. In the common case, many pixels will be
identical across all three versions.

   I hypothesize that the following criterion can predict when the
pixel values will be identical: for each object, compute a bitmap with
value 0 where there is no edge present inside the pixel, 1 where there
is an edge present. Then add all these bitmaps pointwise. For values
of 0, the three rendered images will match. For values of 1, the alpha
and supersampled images will match, but will in general differ from
the non-aa. For values of 2 or greater, none in general will match.
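Here is a 1-D version of the edge-count bitmap, with objects as intervals on a scanline of unit pixels. In this 1-D setting, "an edge inside the pixel" is the same as partial coverage (strictly between 0 and 1). Span, coverage and edge_counts are hypothetical names.

```c
#include <assert.h>

/* 1-D toy: an object is an interval [x0, x1) on a scanline. */
typedef struct { double x0, x1; } Span;

/* Fraction of pixel [px, px+1) covered by span s. */
static double coverage(Span s, int px)
{
    double lo = s.x0 > px ? s.x0 : px;
    double hi = s.x1 < px + 1 ? s.x1 : px + 1;
    return hi > lo ? hi - lo : 0.0;
}

/* Per pixel, count the objects with an edge inside it. Per the hypothesis:
   0 -> all three renderings agree, 1 -> alpha and supersampled agree,
   2 or more -> in general none agree. */
static void edge_counts(const Span *objs, int n_objs, int *counts, int width)
{
    for (int px = 0; px < width; px++) {
        counts[px] = 0;
        for (int k = 0; k < n_objs; k++) {
            double c = coverage(objs[k], px);
            if (c > 0.0 && c < 1.0)
                counts[px]++;
        }
    }
}
```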

   If this hypothesis is true, then a valid rendering strategy would
be to render the entire image using alpha compositing, then rework all
pixels with a count of 2 or more using supersampling. Treating the 0 case
specially is not likely to help much, as libart already fills in
non-edge pixels at memset speeds.
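Here is a 1-D sketch of that hybrid strategy, using the same interval model. Opaque gray objects are painted in order over white. Coverage serves as alpha on the fast path, and point sampling does the rework. Obj and the render_* functions are hypothetical. The motivating case is two objects abutting in the middle of a pixel. Alpha compositing leaves a 25% seam of background showing, while supersampling covers the pixel fully.

```c
#include <assert.h>
#include <math.h>

/* Opaque interval [x0, x1) with gray level c, painted in order over white. */
typedef struct { double x0, x1, c; } Obj;

static double coverage(const Obj *o, int px)
{
    double lo = o->x0 > px ? o->x0 : px;
    double hi = o->x1 < px + 1 ? o->x1 : px + 1;
    return hi > lo ? hi - lo : 0.0;
}

/* Alpha-compositing antialiasing: coverage is treated as alpha. */
static double render_alpha(const Obj *objs, int n, int px)
{
    double v = 1.0;
    for (int k = 0; k < n; k++) {
        double a = coverage(&objs[k], px);
        v = a * objs[k].c + (1.0 - a) * v;
    }
    return v;
}

/* Supersampling: ns point samples per pixel; the last object hit wins. */
static double render_super(const Obj *objs, int n, int px, int ns)
{
    double sum = 0.0;
    for (int s = 0; s < ns; s++) {
        double x = px + (s + 0.5) / ns, v = 1.0;
        for (int k = 0; k < n; k++)
            if (x >= objs[k].x0 && x < objs[k].x1)
                v = objs[k].c;
        sum += v;
    }
    return sum / ns;
}

/* Hybrid: alpha compositing, except supersample where two or more object
   edges fall inside the pixel. */
static double render_hybrid(const Obj *objs, int n, int px, int ns)
{
    int edges = 0;
    for (int k = 0; k < n; k++) {
        double c = coverage(&objs[k], px);
        if (c > 0.0 && c < 1.0)
            edges++;
    }
    return edges >= 2 ? render_super(objs, n, px, ns) : render_alpha(objs, n, px);
}
```

If the hypothesis holds, the supersampled pixels are limited to places where edges of different objects meet.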

   It's interesting to note that Adobe Illustrator 9 appears to
implement supersampling naively. In all tests I've done, memory usage
and cpu seem to scale quite linearly with areal resolution (ie
quadratically with linear rez).

   Lastly, while some of these optimizations fit in nicely with the
existing PostScript code to traverse the PDF structure, others may
not. I think we need to carefully select a set of optimizations that
do a reasonable job simplifying the common case, while ideally not
affecting the structure of the gs code base too drastically.

Take care,

Raph