Re: GNOME user survey 2011 (v4)

On Fri, Aug 19, 2011 at 11:14 PM, Matthew Garrett <mjg59 srcf ucam org> wrote:
> On Fri, Aug 19, 2011 at 10:26:08PM +0300, Felipe Contreras wrote:
>> > Yes, because you have no idea how big the population is. Maybe 10
>> > million is the total population and it's representative. Maybe it's 50%
>> > of the population, disproportionately biased towards those of a given
>> > prior opinion. You can't know.
>> Do you have any idea what the likelihood of that happening is? Try
>> throwing a die 10 times and always getting 1-3. Even if the die is
>> rigged, it's very unlikely. With 1 million throws it gets
>> exponentially less likely.
> That's clearly wrong. If you have a bucket of red balls and blue balls
> and you draw 10 million balls, and you find that you drew 6 million red
> balls and 4 million blue balls, what does that tell you? If you're
> sampling randomly it tells you that there are more red balls than blue
> balls. If you're subconsciously preferring to pick up red balls then it
> tells you nothing. So we need to avoid subconsciously picking red balls,
> which means we need to pick users randomly which is something we can't
> do with a voluntary survey. Cochran's formulas don't apply here because
> you're not picking your sample set at random.
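Both arguments in the quoted exchange can be checked numerically. The dice figure is an exact power: each throw lands on 1-3 with probability 1/2. The urn shows the point about sample size. Below is a minimal Python sketch; the 50/50 urn and the `red_weight` preference are illustrative assumptions, not survey data.

```python
from fractions import Fraction
import random


def prob_all_low(throws: int) -> Fraction:
    """Exact probability that a fair six-sided die shows 1-3 on every throw."""
    # Each throw lands on 1-3 with probability 3/6 = 1/2, independently.
    return Fraction(1, 2) ** throws


def biased_red_share(n_draws: int, red_weight: float, seed: int = 0) -> float:
    """Share of red balls drawn from a 50/50 urn by a picker who favors red.

    red_weight is a hypothetical preference: a red ball is picked
    red_weight times as often as a blue one (1.0 means no preference).
    """
    rng = random.Random(seed)
    p_red = red_weight / (red_weight + 1.0)
    reds = sum(1 for _ in range(n_draws) if rng.random() < p_red)
    return reds / n_draws


print(prob_all_low(10))  # 1/1024
for n in (1_000, 100_000):
    # The observed share settles near 0.6, not the true 0.5, however large n is.
    print(n, round(biased_red_share(n, red_weight=1.5), 3))
```

The fair-die probability shrinks exponentially with the number of throws. The biased share does not move toward 0.5 as `n_draws` grows, because a larger sample only estimates the skewed picking rate more precisely.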

That's a very bad example. An example closer to reality would be one
where color is indeed the bias, but we are not interested in the color;
we are interested in the size of the balls. After the survey, we find
that, overall, red balls are bigger than blue balls. Fortunately, we
don't care about the proportion of blue vs. red balls in the total
population; we only care about blue balls, so we only consider the size
of those.

In the GNOME case, the color of the balls corresponds to the bias we
want to identify (e.g. geekness), and the size is the thing we are
actually interested in: the users' happiness. We only care about
non-geeks (blue balls); as many GNOME people have stated, the real
target users are the ones who don't even know what GNOME is.
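The color/size argument can be sketched as a small simulation: restrict the sample to one color and average only within it. This minimal Python sketch uses made-up numbers. It assumes a population split 50/50 between red ("geeks") and blue ("non-geeks"), sizes that differ by color, and a response rate that depends on color alone.

```python
import random
import statistics


def simulate(n_pop: int = 200_000, seed: int = 1) -> dict:
    rng = random.Random(seed)
    # Hypothetical population: half red balls ("geeks"), half blue ("non-geeks").
    # Ball size stands in for happiness and differs by color.
    population = []
    for _ in range(n_pop):
        color = "red" if rng.random() < 0.5 else "blue"
        size = rng.gauss(7.0 if color == "red" else 5.0, 1.0)
        population.append((color, size))

    # Assumed selection model: the chance of ending up in the sample depends
    # ONLY on color (red balls respond five times as often), not on size.
    p_respond = {"red": 0.5, "blue": 0.1}
    sample = [ball for ball in population if rng.random() < p_respond[ball[0]]]

    def mean_size(balls, color=None):
        # Average size, optionally restricted to a single color.
        return statistics.fmean(s for c, s in balls if color is None or c == color)

    return {
        "population_all": mean_size(population),
        "sample_all": mean_size(sample),
        "population_blue": mean_size(population, "blue"),
        "sample_blue": mean_size(sample, "blue"),
    }


for key, value in simulate().items():
    print(f"{key:16} {value:.2f}")
```

The overall sample mean is pulled toward the red mean. The blue-only sample mean stays close to the blue mean of the whole population. This relies on the stated assumption that, within a color, the chance of responding does not depend on size.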

Now, if what you are worried about is self-selection bias, we can add
a new question, "Why are you taking this survey?", with the option
"Somebody is pushing me", and encourage people to push their
relatives/colleagues/friends to fill in the survey (just like a
"professional" firm would, except "crowd-sourced"). Then, for external
validity, you only consider the results of the people who answered
"Somebody is pushing me" (they don't have self-selection bias).
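The proposed filtering step could look roughly like this minimal Python sketch. The response format, with the keys `why` and `happiness` and a numeric happiness score, is a hypothetical assumption; the survey's actual export format is not specified here.

```python
import statistics

# Hypothetical wording of the answer option proposed above.
PUSHED = "Somebody is pushing me"


def pushed_only_mean(responses) -> float:
    """Mean happiness of respondents who said somebody pushed them to take part.

    Each response is assumed to be a dict with hypothetical keys
    "why" (answer to "Why are you taking this survey?") and
    "happiness" (a numeric score).
    """
    scores = [r["happiness"] for r in responses if r["why"] == PUSHED]
    if not scores:
        raise ValueError("no respondents chose the 'pushed' option")
    return statistics.fmean(scores)


responses = [
    {"why": PUSHED, "happiness": 3},
    {"why": "I follow GNOME development", "happiness": 9},
    {"why": PUSHED, "happiness": 5},
]
print(pushed_only_mean(responses))  # 4.0
```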

Felipe Contreras

