As mentioned on IRC, enabling optimizations with MODE=release and choosing clang++ (6.0 here) or g++ (7.4 here) makes a major difference when benchmarking exp2f and log2f from glibc against our approximations. On a modern AMD64 processor, glibc is often faster. Internally it also uses polynomials of roughly order 4, but selects the coefficients from a table depending on the input argument. That way it achieves errors < 1 ULP, and it is often faster because it can also use hand-crafted SSE2 implementations.
I haven't had a chance to benchmark the approximations on musl, but so far, based on your submission, I'm inclined to integrate the following:
Here's the error correction I'm talking about. Note that exchanging "long double" for "float" makes the code significantly slower, because it forces the compiler to add code to reduce precision. On my machine, this version is roughly as fast as log2f when compiled with optimizations, with both compilers:
static inline long double G_GNUC_CONST
fast_log2f (float value)
{
  union {
    float f;
    int i;
  } float_u;
  float_u.f = value;
  // compute log_2 using float exponent; the bias is 127, the extra 1
  // cancels the "+1" folded into the polynomial below
  const int log_2 = ((float_u.i >> 23) & 255) - 128;
  // replace float exponent, so the mantissa is remapped into [1, 2)
  float_u.i &= ~(255 << 23);
  float_u.i += BSE_FLOAT_BIAS << 23;
  long double u, x = float_u.f;
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  // degree 6 polynomial, evaluated with Horner's scheme
  u = -2.5691088815846393966e-2l;
  u = u * x + 2.7514877034856806734e-1l;
  u = u * x + -1.2669182593669424748l;
  u = u * x + 3.2865287704176774059l;
  u = u * x + -5.3419892025067624343l;
  u = u * x + 6.1129631283200211528l;
  x = u * x + -2.040042118396715321l;
  return x + log_2;
}
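To illustrate the exponent trick in isolation, here is a minimal sketch of the decomposition the function relies on: a positive, normal float is split into an integer power of two and a mantissa in [1, 2), so that log2(value) = exponent + log2(mantissa) and the polynomial only has to cover [1, 2). The names `split_float`, `float_parts` and `FLOAT_BIAS` are made up for this example (the bias of 127 is the IEEE-754 single-precision value, assumed here to match BSE_FLOAT_BIAS). It uses memcpy instead of the union, and it subtracts 127 rather than 128 because no "+1" is folded into a polynomial here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOAT_BIAS 127  /* assumption: IEEE-754 single-precision exponent bias */

/* Hypothetical helper type: value == mantissa * 2^exponent. */
typedef struct {
  int   exponent;  /* unbiased power of two */
  float mantissa;  /* in [1, 2) for positive, normal inputs */
} float_parts;

/* Split a positive, normal float; zero, denormals, inf and NaN are not handled. */
static float_parts
split_float (float value)
{
  uint32_t bits;
  float_parts parts;
  assert (sizeof bits == sizeof value);
  memcpy (&bits, &value, sizeof bits);  /* copy out the raw bit pattern */
  /* the 8 exponent bits sit directly above the 23 mantissa bits */
  parts.exponent = (int) ((bits >> 23) & 255u) - FLOAT_BIAS;
  /* overwrite the exponent field with the bias, which encodes 2^0 */
  bits &= ~(255u << 23);
  bits |= (uint32_t) FLOAT_BIAS << 23;
  memcpy (&parts.mantissa, &bits, sizeof bits);
  return parts;
}
```

For example, split_float (10.0f) yields exponent 3 and mantissa 1.25, since 10 = 1.25 * 2^3.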
Error samples (input value, error), compared to log2l(3):
+0.0, -0.00000231613294631
+0.5, +0.00000000000000000
+1.0, +0.00000000000000000
+1.1, -0.00000181973000285
+1.5, -0.00000130387210186
+1.8, -0.00000312228549678
+2.0, +0.00000000000000000
+2.2, -0.00000181973000285
+2.5, -0.00000140048214306
+3.0, -0.00000130387210186
+4.0, +0.00000000000000000
+5.0, -0.00000140048214306
+6.0, -0.00000130387210186
+7.0, -0.00000312228549678
+8.0, +0.00000000000000000
+9.0, -0.00000084878575295
+10.0, -0.00000140048214306
+11.0, -0.00000368176020430
+16.0, +0.00000000000000000
+32.0, +0.00000000000000000
+40.0, -0.00000140048214306
+48.0, -0.00000130387210186
+54.0, -0.00000149844406951
+64.0, +0.00000000000000000
+127.0, -0.00000162654178981
+128.0, +0.00000000000000000