Re: Suggestion: Change the specification for grand in glib

From: Sebastian Wilhelmi <wilhelmi ira uka de>
To: Sverre Johansen <sj ifi uio no>
Cc: Gtk Development List <gtk-devel-list gnome org>
Subject: Re: Suggestion: Change the specification for grand in glib
Date: Thu, 14 Dec 2000 16:12:24 +0100
Hi Sverre,

> g_random_int_range:
> -------------------
> 
> g_random_int_range returns a random number over the range
> [min..max-1].  I think this is a really bad idea. The parameter is
> named min and max, and I think that most programmers will presume that
> the range is [min..max]. If they miss reading the documentation and
> don't test their software well, we all know what the result will
> be. It will be much harder to change the specification after it is
> included in a stable version of glib.

Designing such things you can never satisfy everyone. The most common mistake
in programming is said to be the off-by-one error. For example you make an
array of 6 elements, but access elements 0-6, that is 7 ones. Thats why it is
good programming practice to always make intervalls with a closed (aka
contained) lower bound and an open (that is not contained) upper bound, such
that you uniformly can use loops of the form:

for (i = min; i < max: i++)

and that (max - min) really gives the number of elements in that range!

That is done that way in the C++ STL (for that reason) and in other places as
well, so I thought, it is reasonable here as well. But I agree, it's not a
canonical decision. It could as well have been made the other way round. But
now we have it and I would like to keep it. I hope the rational for this is
now more clear.

> g_rand_double:
> --------------
> 
> g_rand_double returns a random number over the range [0..1).  Taking a
> theoretical perspective the range should really be (0..1), but because
> of the floating-point-representation the ideal value will be rounded,
> down, up or to the nearest which is representable by the
> floating-point format.

Actually writing [0..1) or [0..1] is not a big difference, which shouldn't be
noticable by the mean or something like that (it's 1/2^52), but now, as I'm
exprementaing a bit with that, I have noticed, that tehe multiplicator is not
very good! Indeed using the following program it turns out to be quite bad:

#include <glib.h>

main()
{
  double a = 1;
  guint32 b = 0;
  
  double c;

  b--;

  c = a / b;

  printf ("%.70f\n%.70f\n%.70f\n%.70f\n",a, c, b*c, b*
2.3283064365386963e-10);
}

and that of course is something to be corrected. What we could do is using the
data from GDoubleIEEE754 in gtypes.h and filling in tha dara by "hand". But
that might pose more problems (portability comes to mind) as it solves. And
thinking more about it, it isn't trivially done.

> The implementation that is selected does not give a mean value of
> 0.5. It is slightly lower, how much depends on the rounding.

That must be due to the bad multiplicator. If choosen carefully it should
work.

> If the returned value is converted to a float and the value is rounded
> up or to the nearest we may get 1.0.

1.0 is very easily represented in IEEE double: it is 1 * 2^0 that is it has
mantissa 0 (the leading 1 is not encoded, as it is always 1) and biased
exponent 1023 and sign 0.

> I suggest that the implementation should be changed so that every
> possible floating-point number over the range [0..1] that is
> representable by IEEE754-double may be returned.

I would still prefer, that 1 itself is not returned for similiar reasons as
above, but that's not a dogma of course. In practice there won't be a
difference.

> Therefore the specification should say [0..1] instead of [0..1).
> 
> The changed specification allow others and more accurate
> implementations. I Suggest the following implementation of
> g_rand_double:
> 
> ------------------------------------------------
> /**
>  * g_rand_double:
>  * @rand: a #GRand.
>  *
>  * Return the next random #gdouble from @rand equaly distributed over
>  * the range [0..1].
>  *
>  * Return value: A random number.
>  **/
> gdouble
> g_rand_double (GRand* rand)
> {
>   gdouble transform;
>   guint32 r1 = g_rand_int (rand);
>   gdouble r2 = g_rand_int (rand);
> 
>   for (transform = G_RAND_DOUBLE_TRANSFORM; r1 == 0;
>        transform *= G_RAND_DOUBLE_TRANSFORM)
>     r1 = g_rand_int (rand);
> 
>   if (r1 < 1<<20)
>     r2 += g_rand_int (rand) * G_RAND_DOUBLE_TRANSFORM;
> 
>   return (r1 + r2 * G_RAND_DOUBLE_TRANSFORM) * transform;
> }
> ------------------------------------------------

That implementation looks really weird! why should transform become
G_RAND_DOUBLE_TRANSFORM ^ n if g_rand_int is zero more than once (which of
course wont happen, but why the loop then). That really seems questionable.
 
> With this implementation of g_rand_double, g_rand_int_range can use this
> function for every range. We could use the old implementation of
> g_rand_double as an optimization for ranges below 2^16.

Ah, so what your basically saying is, that your implementation of
g_rand_double is setting all 52 bits of gdouble, not only 32 as mine and it
can thus be used to circumvent the 'if (dist > 2^16)' part of
g_rand_int_range. Actually that seems like a good idea, even if it requires
g_rand_double to fetch to 32 bit random numbers. But your implementation
really looks weird to me (That can be my fault, though). Can you please
explain, what your are doing there.

> I have implemented other functions with other distributions and the
> derived g_random_* functions. These functions are the following:
> 
>  - g_rand_int_poisson
>  - g_rand_double_normal
>  - g_rand_double_negexp
>  - g_rand_double_erlang
>  - g_rand_boolean
> 
> The first four depends on the Math library and should therefore not be
> placed in glib, but where should they then be placed?  My
> g_rand_double_normal implementation is not based on the Polar method
> so no extra state information is needed, so it could depend on
> g_rand_double unchanged.
> 
> If there is no other library that these functions fit into we could
> make one new library called glibm, or conditionally include it in
> glib.

I don't have anything against these functions in glib, but last time I asked
noone was very fond of requiring -lm for glib, so that won't happen.
g_rand_boolean is also not necessary, but might provide for better readability
in the code. I might add macros to not bloat the code more.

Bye,
Sebastian

P.S.: Please let's continue this discussion on gtk-devel and not gtk-bugs.
-- 
Sebastian Wilhelmi
mailto:wilhelmi ira uka de
http://goethe.ira.uka.de/~wilhelmi
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]