Inserting T=float casts makes the function perform better (at least here): it avoids the conversions between single-precision and double-precision values (e.g. cvtsd2ss) that would otherwise be emitted. So this version
static inline float G_GNUC_CONST
fast_log2ff (float value)
{
  union {
    float f;
    int   i;
  } float_u;
  float_u.f = value;
  // compute log_2 using float exponent
  const int log_2 = ((float_u.i >> 23) & 255) - 128;
  // replace float exponent
  float_u.i &= ~(255 << 23);
  float_u.i += BSE_FLOAT_BIAS << 23;
  typedef float T;
  T u, x = float_u.f;
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  u = T (-2.5691088815846393966e-2l);
  u = u * x + T (2.7514877034856806734e-1l);
  u = u * x + T (-1.2669182593669424748l);
  u = u * x + T (3.2865287704176774059l);
  u = u * x + T (-5.3419892025067624343l);
  u = u * x + T (6.1129631283200211528l);
  x = u * x + T (-2.040042118396715321l);
  return x + log_2;
}
is faster, because all operations are on floats. This costs a bit of precision, but the float version (fast_log2ff) is faster than the double version (fast_log2fd) or the long double version (fast_log2fl).
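fast_log2fd and fast_log2fl are not quoted here; my assumption is that they are the same function with typedef double T and typedef long double T respectively. For illustration, this is what the double variant would then look like; the compiler has to widen x before the polynomial and narrow the result back on return:

static inline float G_GNUC_CONST
fast_log2fd (float value)
{
  union {
    float f;
    int   i;
  } float_u;
  float_u.f = value;
  // compute log_2 using float exponent
  const int log_2 = ((float_u.i >> 23) & 255) - 128;
  // replace float exponent
  float_u.i &= ~(255 << 23);
  float_u.i += BSE_FLOAT_BIAS << 23;
  typedef double T;                   // the only difference to fast_log2ff
  T u, x = float_u.f;                 // float -> double (cvtss2sd)
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  u = T (-2.5691088815846393966e-2l);
  u = u * x + T (2.7514877034856806734e-1l);
  u = u * x + T (-1.2669182593669424748l);
  u = u * x + T (3.2865287704176774059l);
  u = u * x + T (-5.3419892025067624343l);
  u = u * x + T (6.1129631283200211528l);
  x = u * x + T (-2.040042118396715321l);
  return x + log_2;                   // double -> float (cvtsd2ss)
}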
$ g++ -std=c++17 -Wall -g -O3 -o l2 l2.cc `pkg-config --cflags --libs spectmorph glib-2.0 bse`
$ l2
log2f: 5.369997 ns/call
fast_log2fl: 7.890487 ns/call
fast_log2fd: 4.662395 ns/call
fast_log2ff: 3.652096 ns/call
prec: fast_log2ff: 4.493532e-06
prec: fast_log2fd: 3.721012e-06
prec: fast_log2fl: 3.691373e-06
$ clang++ -std=c++17 -g -O3 -o l2 l2.cc `pkg-config --cflags --libs spectmorph glib-2.0 bse`
$ l2
log2f: 5.323792 ns/call
fast_log2fl: 7.597113 ns/call
fast_log2fd: 5.201006 ns/call
fast_log2ff: 4.071403 ns/call
prec: fast_log2ff: 4.493532e-06
prec: fast_log2fd: 3.721012e-06
prec: fast_log2fl: 3.691373e-06
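The l2.cc harness itself isn't quoted in this thread. A minimal sketch of how such measurements could be taken is shown below, assuming fast_log2ff from above is in scope; RUNS, the input range and the error metric are my guesses, not necessarily what l2.cc actually does:

#include <cmath>
#include <cstdio>
#include <vector>
#include <chrono>
#include <algorithm>

// hypothetical timing/precision harness, not the actual l2.cc
template<class Fn> static void
measure (const char *name, Fn fn, const std::vector<float>& xs)
{
  volatile float sink = 0;                   // defeat dead code elimination
  const int RUNS = 1000;
  auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < RUNS; r++)
    for (float x : xs)
      sink = fn (x);
  auto t1 = std::chrono::steady_clock::now();
  double ns = std::chrono::duration<double, std::nano> (t1 - t0).count();
  double err = 0;                            // max absolute error of the log2 value
  for (float x : xs)
    err = std::max<double> (err, std::fabs (fn (x) - std::log2 ((double) x)));
  printf ("%s: %f ns/call, max error %e\n", name, ns / (RUNS * xs.size()), err);
  (void) sink;
}

int
main()
{
  std::vector<float> xs;
  for (float x = 0.01f; x < 100; x += 0.01f) // arbitrary test range
    xs.push_back (x);
  measure ("log2f", [] (float x) { return log2f (x); }, xs);
  measure ("fast_log2ff", fast_log2ff, xs);  // the other variants are analogous
  return 0;
}

Note that the absolute error of the log2 result is the musically relevant metric: a fixed error on the log2 axis corresponds to a fixed relative (pitch) error of the input frequency.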
On the other stuff I mostly agree. If you have use cases in mind that need exp2 (k) to be exactly 2^k for integer k (for key tracking or filter frequency modulation it doesn't matter) and you think you want to pay for it with one add-mul, ok. I think relative error is the most important goal here, though. For instance, if the key tracking algorithm returns 222 instead of 220, from a musician's point of view that is as bad as returning 888 instead of 880. Both sound equally wrong, and both have the same relative error (not the same absolute error).
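To make that concrete: 222/220 = 888/880 ≈ 1.00909, so both detunings are the same size on the log-frequency axis, about 15.7 cents sharp. A quick check:

#include <cmath>
#include <cstdio>

// same frequency ratio => same relative error => same detuning in cents
int
main()
{
  printf ("%.2f cents\n", 1200 * log2 (222.0 / 220.0)); // ~15.67
  printf ("%.2f cents\n", 1200 * log2 (888.0 / 880.0)); // ~15.67
  return 0;
}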
Applying corrections so that fast_log2 (2^k) yields k for integer k sounds ok to me. Note that it doesn't fix fast_log2 (7.999999) to be 3, as you patched only the case where the input is equal to or slightly greater than 2^k, not the case where it is slightly smaller.
fast_log2fl (7.999999) = 2.999996; log2f (7.999999) = 3.000000
fast_log2fl (8.000000) = 3.000000; log2f (8.000000) = 3.000000
fast_log2fl (8.000001) = 3.000000; log2f (8.000001) = 3.000000
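(The comparison above can be reproduced with a loop along these lines, assuming the fast_log2fl variant is in scope:)

#include <cmath>
#include <cstdio>

int
main()
{
  // compare fast_log2fl against the libm reference around 2^3
  for (float v : { 7.999999f, 8.000000f, 8.000001f })
    printf ("fast_log2fl (%f) = %f; log2f (%f) = %f\n", v, fast_log2fl (v), v, log2f (v));
  return 0;
}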
The 7.999999 case could be fixed by adjusting the linear coefficient of the Remez polynomial, but this would make our worst-case error larger, and since the result is already so close to the perfect value, I think it is probably not worth it.
As for whether to approximate at all on AMD64: my impression from the benchmarks is that in many cases one of the approximations would yield sufficient quality faster than exp2f or log2f, on AMD64 especially when using T=float internally.
However, the gain is not dramatic, and maybe we're trying to optimize something with approximations that is not really a performance problem. For instance, the LadderFilter (the place where this started) typically only needs one log2 value per note-on. Only portamento would affect this negatively, and we do not support that at the moment. What I'm trying to say is: if we use log2f/exp2f and one day we run perf on beast and see that 10% of the CPU usage is spent in exp2f, we could still deal with it at that point in time.