Re: gdk_pixbuf interpolation tables

Hi Brian,

Not having touched this code in a while, I had to spend some
time going back through the code and the thought processes
behind it. While doing that I wrote up some of the "math"
behind it:

 http://people.redhat.com/otaylor/gtk/pixbuf-transform-math.ps

It's nothing very deep... just a little bit of symbol juggling
really. But it makes it a little more obvious why
the filters we use are actually separable.

I've also documented in there what the filters are that are being
used.

The observation that all of the filters we use now are separable
is an important one. (I had some inkling of this before as shown
by how I defined the "BILINEAR" filter, but never really made the
generalization). It certainly does reduce the complexity and
expense of calculating the tables a lot, and anything that makes
pixops.c shorter and simpler (not to mention faster) is a good thing.
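
To make the separability point concrete: for any one (x phase,
y phase) pair, the 2-D table is just the outer product of the two
1-D vectors. A minimal sketch (my own illustration with made-up
names, not the actual pixops.c code):

  /* Build the 2-D weight table for one phase pair as the outer
   * product of the 1-D x and y weight vectors. */
  static void
  make_2d_weights (double       *weights_2d, /* n_x * n_y, row-major */
                   const double *x_weights,  /* n_x entries */
                   const double *y_weights,  /* n_y entries */
                   int           n_x,
                   int           n_y)
  {
    int i, j;

    for (j = 0; j < n_y; j++)
      for (i = 0; i < n_x; i++)
        weights_2d[j * n_x + i] = x_weights[i] * y_weights[j];
  }

If each 1-D vector sums to 1.0, the product table sums to 1.0 as
well, which is what makes the separable construction so convenient.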

Still, I wonder if pursuing this is really right in the long
term:

 - Once calculating the filter tables gets to be a big
   bottleneck, we're already in a regime where our
   algorithms aren't a good idea.

   A well known defect of the current pixops code is that
   it performs horribly for large scale-down factors.
   Say you are scaling a 1024x768 image to 128x96 for a
   thumbnail. The scale-down factor is 8, so the filter
   support is n_x = n_y = 8 + 1 = 9, and with 16 subsample
   phases in each direction we compute

     9*9*16*16 == 20736

   filter coefficients. If the user hits the zoom-out button
   a few more times, and the program tries to scale the
   image down to 1x1, then we have:

     1025*1025*16*16 == ~269 million

   filter coefficients. Oops!

   Of course, if we can compute the coefficients faster, 
   we're better off, but not that much better off.

 - The traditional 3D graphics way of doing downscaling
   is to create a mipmap of scaled down images and then
   do bilinear or trilinear interpolation from there.
   Scaling down by a factor of two is something that 
   we can code to be really fast, so this isn't unreasonable
   even if we have to generate the mipmap on the fly.

   The main quality defect of this approach is that when
   you are scaling by very different amounts in the two
   directions, you lose detail. But anisotropic scaling is
   going to be less common for our uses than it is in 3D
   graphics since we aren't drawing lots of patterned floors 
   receding off into the distance.

   You can also (like current 3D cards) use a larger
   filter on top of the MIP-map to deal with the anisotropic
   case.

 - We get fairly frequent requests for rotation support
   in gdk-pixbuf. When you switch from a simple scale to
   an arbitrary affine transform, the filter tables are
   no longer separable, and also aren't easily calculable
   analytically. So a table-based approach is a lot less
   attractive.

So, I'm wondering if a complete rewrite of pixops.c around
a mipmap'ed approach may be the right thing in the long term.
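
To give a feel for why the factor-of-two step is cheap, here's a
rough sketch of the sort of inner loop the mipmap generation would
need. This is purely illustrative (one 8-bit channel, even
dimensions, invented names), not existing pixops.c code:

  /* 2x2 box-filter scale-down of one 8-bit channel. */
  static void
  scale_down_by_2 (const unsigned char *src, int src_width, int src_height,
                   int src_rowstride,
                   unsigned char *dest, int dest_rowstride)
  {
    int dest_width = src_width / 2;
    int dest_height = src_height / 2;
    int x, y;

    for (y = 0; y < dest_height; y++)
      {
        const unsigned char *s0 = src + 2 * y * src_rowstride;
        const unsigned char *s1 = s0 + src_rowstride;
        unsigned char *d = dest + y * dest_rowstride;

        for (x = 0; x < dest_width; x++)
          /* Average the 2x2 block; the +2 rounds to nearest. */
          d[x] = (s0[2 * x] + s0[2 * x + 1] +
                  s1[2 * x] + s1[2 * x + 1] + 2) >> 2;
      }
  }

Repeatedly halving like this until the remaining scale-down factor
is less than 2, then doing a single bilinear pass, bounds both the
table sizes and the per-pixel work.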

A couple of detailed comments on your mail.

> 2. Separated tables can save memory.  It is only necessary to build
>    the full 2d tables as they are accessed.  For example, most of
>    the pixops_process functions deal with 1 line of tables.  Therefore
>    the logic could easily be modified to only build each line of 
>    tables as needed.  This has no extra CPU cost, but reduces memory
>    used.  I did not yet implement this change, however I think it is
>    clear that this would be fairly straightforward to do.

Building the tables on an as-needed basis doesn't save you much
on average, since for almost all scale factors you'll need
all the horizontal and vertical phases. Though the cases
where the scale-down factor is a small integer fraction are
certainly more common than others. (That is, to scale from
200 to 160, the scale factor is 5:4, so we only need the 4 phases
0, 0.25, 0.50, 0.75 rather than all 16 phases.)
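
To make the phase counting concrete, here is a tiny standalone
illustration (mine, not anything in pixops.c) that computes how many
of the 16 subsample phases a given scale actually touches, assuming
the source step per destination pixel is an exact multiple of 1/16:

  #include <stdio.h>

  static int
  gcd (int a, int b)
  {
    while (b != 0)
      {
        int t = a % b;
        a = b;
        b = t;
      }
    return a;
  }

  int
  main (void)
  {
    int src = 200, dest = 160;          /* the 5:4 example above */
    int step_16ths = src * 16 / dest;   /* source step per dest pixel, in 1/16ths */

    if ((src * 16) % dest == 0)
      printf ("phases needed: %d of 16\n", 16 / gcd (step_16ths, 16));
    else
      printf ("step is not a multiple of 1/16; expect all 16 phases\n");

    return 0;
  }

For 200 -> 160 this prints "phases needed: 4 of 16", matching the
0, 0.25, 0.50, 0.75 list above.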

I think it probably would work best to figure out the needed phases
in advance and pregenerate just those, rather than building them
on demand, since that avoids complexity in the inner horizontal loop.

I'm a little wary of an approach, however, that works well
for scaling from 256x256 to 64x64 but suddenly takes much
longer and much more memory to scale to, say, 63x63.

If we stick with a table-based approach for all scale factors,
one thing we might want to consider is simply computing each
2-D filter coefficient as needed. If we convert the 1-D tables
to integers and normalize them to sum correctly, then this
should only require 1 multiplication per source pixel
read, which isn't a huge deal. (We are doing ~20 arithmetic
operations per destination pixel for the compositing.)
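
As a sketch of what I mean (invented names; the real pixops.c inner
loops are laid out differently), with the 1-D tables held as
integers scaled by 2^8, one channel of one destination pixel could
be accumulated like this:

  /* Accumulate one channel of one destination pixel, forming each
   * 2-D coefficient on the fly.  x_weights and y_weights are the
   * 1-D tables for this pixel's phases, each scaled by 1 << 8, so
   * their products are scaled by 1 << 16. */
  static unsigned char
  filter_one_channel (const unsigned char *src, int src_rowstride,
                      const int *x_weights, int n_x,
                      const int *y_weights, int n_y)
  {
    unsigned int total = 0;
    int i, j;

    for (j = 0; j < n_y; j++)
      {
        const unsigned char *line = src + j * src_rowstride;

        for (i = 0; i < n_x; i++)
          /* x_weights[i] * y_weights[j] is the one extra multiply
           * compared to reading a precomputed 2-D table. */
          total += x_weights[i] * y_weights[j] * line[i];
      }

    return (total + (1u << 15)) >> 16;
  }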
   
> 3. The need to include overall_alpha in the structure is that it
>    indicates what the tables sum to.  The vectors will always sum
>    to overall_alpha.  This information is needed for later processing
>    (correcting totals for 2d integer tables, or when scaling only
>    in one dimension - see below for more information about this).

[...]

> 2. I put much thought into what to do with overall_alpha. 
>    Spreading overall_alpha evenly over both dimensions is correct for
>    all situations except for when scaling an image in one dimension
>    only.  In this case you would obviously want to multiply that
>    dimension by overall_alpha (when overall_alpha != 1.0), instead of
>    sqrt(overall_alpha).  Note that GTK+ currently doesn't support
>    scaling in one dimension, but setting up the vectors in this
>    fashion takes into account the future evolution of the code.
>    
>    I ended up deciding that the cleanest solution is to evenly 
>    distribute overall_alpha across the x/y dimension.  The one corner
>    case mentioned above can easily be handled by multiplying the 
>    one vector used by the additional constant of sqrt(overall_alpha).
>    
>    An alternative solution that is equally correct is to force the
>    *make-weights functions to *always* return tables that sum to 1.0
>    (rather than to overall_alpha).  And then move the multiplication of
>    overall_alpha to the conversion function.  This would be slightly
>    slower, but might be considered more mathematically pure.

This is almost certainly the right thing to do. The overall alpha
has nothing to do with the structure of the filter tables; it's
just included in the filter tables as an optimization to speed
things up. It might even make sense to drop the optimization entirely
and do it when computing each pixel if that would simplify dealing
with roundoff errors.
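
A rough sketch of that variant (invented helper names; the real code
is structured differently): the *make_weights functions return
vectors that sum to exactly 1.0, and overall_alpha is applied once
per destination pixel at conversion time:

  /* Force a 1-D weight vector to sum to exactly 1.0. */
  static void
  normalize_weights (double *weights, int n)
  {
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
      sum += weights[i];

    for (i = 0; i < n; i++)
      weights[i] /= sum;
  }

  /* Apply overall_alpha at conversion time instead of folding it
   * (or its square root) into the filter tables. */
  static unsigned char
  convert_channel (double filtered_value,  /* 0.0 .. 255.0 */
                   double overall_alpha)   /* 0.0 .. 1.0 */
  {
    return (unsigned char) (filtered_value * overall_alpha + 0.5);
  }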

> Lastly I would like to discuss a few problems that I noticed with the
> interpolation tables that are generated with INTERP_PIXOPS_HYPER.  These
> problems exist equally in the current code and in my patch.  
> 
> 1. I notice that when using INTERP_PIXOPS_TILE and INTERP_PIXOPS_BILINEAR
>    that the corrections range from -16 to +16.  This range of correction
>    makes sense due to correcting for rounding errors.  This range of 
>    32 is only .04% of 65536, so the correction will obviously not 
>    greatly affect the end result.
>    
>    However, with INTERP_PIXOPS_HYPER the correction is often very large
>    sometimes as high as 11943.  This is 18.22% of 65536, which is 
>    obviously significant enough to cause significant distortion to the
>    filter when the correction is not applied evenly to the filter.  The
>    current correct_total() implementation only corrects the 1st value
>    in the table it finds that can handle the correction without going
>    negative.  I suspect that this creates unacceptable results, especially
>    for a filter which is intended to be of highest quality (and is also
>    the slowest).  I suspect that the problem could be corrected by 
>    improving the logic of bilinear_make_weights.  However, the problem
>    could also be addressed by distributing the correction more evenly
>    across the filter.

If the HYPER table is that far off, it's just buggy, and we need to
debug it.
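
For what it's worth, the "distribute the correction more evenly"
idea from the quoted text could look something like this (a sketch
only; this is not the existing correct_total(), and the round-robin
policy is just one possibility):

  /* Spread an integer rounding correction across a weight table
   * one unit at a time, skipping entries that a negative
   * correction would push below zero. */
  static void
  spread_correction (int *weights, int n, int correction)
  {
    int i = 0;
    int skipped = 0;

    while (correction != 0 && skipped < n)
      {
        int delta = (correction > 0) ? 1 : -1;

        if (weights[i] + delta >= 0)
          {
            weights[i] += delta;
            correction -= delta;
            skipped = 0;
          }
        else
          skipped++;

        i = (i + 1) % n;
      }
  }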

I haven't reviewed the details of your code yet. It looks reasonable
in general, except for the overall_alpha problem and some of the
function names. (E.g., bilinear_quadrant() should be renamed if it
is dealing with half of the [0,1] interval rather than a quarter
of the [0,1]x[0,1] square. :-)

I think it probably makes sense to go ahead and apply some version
of your patch; making the code faster never hurts, and making the
code simpler will help future changes to actually address the important
problems the code has.

Regards,
                                              Owen
