[Fwd: a request for gnome bugzilla data]

To whom it may concern,

Please see:

I sincerely hope the gnome community can do the same for a good cause.

Thank you, Minghui

-------- Forwarded Message --------
From: Minghui Zhou <zhmh pku edu cn>
To: gnome-infrastructure gnome org
Subject: a request for gnome bugzilla data
Date: Sun, 22 Jul 2012 19:35:17 +0800

Dear gnome-infrastructure masters,

Here is a very sincere request from a software engineering researcher
at Peking University regarding gnome Bugzilla data.

It's a long story, so please allow me to take a little of your time.

First, the data I want is as follows.
For every bug, I would like two pages (with full information, in
particular the performer's email), e.g.,

Second, here is why I want this data. To help you understand, I'll try
to give some details.

I'm investigating long term contributors (LTCs: people who stay with the
project for at least three years and are above the 10th percentile of
bugs per year) in OSS projects, in particular how their attitude and
environment in their first month with the project impact their chance
of becoming LTCs. I hope to understand best practices and help design a
better community architecture.

I used the issue workflow recorded in the bug tracking system to conduct
this study. For example, the first thing I need to do is locate LTCs. I
consider the first day a contributor had any activity in Bugzilla as his
joining day, and the period between his joining day and the day of his
last activity (before the day the data was retrieved) as the duration he
stays with the project.
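The LTC definition and the tenure rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's actual scripts: the `Activity` record, the `find_ltcs` function, the three-365-day-years cutoff, and the choice to measure "bugs per year" as distinct bugs touched per year of tenure (with tenure under one year counted as one year) are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import quantiles


@dataclass(frozen=True)
class Activity:
    performer: str  # person handle; assumed here to be the performer's email
    bug_id: int     # bug on which the activity was recorded
    day: date       # day the activity happened


# "At least three years", approximated as three 365-day years (assumption).
MIN_TENURE_DAYS = 3 * 365


def find_ltcs(activities):
    """Return the set of performers classified as long term contributors."""
    first_day, last_day = {}, {}
    bugs = defaultdict(set)
    for a in activities:
        # Joining day = first activity; end of stay = last activity seen.
        first_day[a.performer] = min(first_day.get(a.performer, a.day), a.day)
        last_day[a.performer] = max(last_day.get(a.performer, a.day), a.day)
        bugs[a.performer].add(a.bug_id)

    # Distinct bugs per year of tenure; tenure under one year counts as one
    # year (an assumption that also avoids dividing by zero).
    rate = {}
    for p in first_day:
        years = max((last_day[p] - first_day[p]).days / 365.25, 1.0)
        rate[p] = len(bugs[p]) / years

    # 10th percentile of bugs per year across all performers
    # (older Pythons need at least two data points here).
    p10 = quantiles(list(rate.values()), n=10, method="inclusive")[0]

    return {
        p for p in first_day
        if (last_day[p] - first_day[p]).days >= MIN_TENURE_DAYS and rate[p] > p10
    }
```

Both conditions must hold, so a person who stays four years but touches very few bugs is still not counted as an LTC.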
Based on this, I model people's attitude and environment through the
issue workflow. For example, I found that a newcomer's pro-community
attitude, represented by her first contribution being a comment on an
existing issue rather than a bug report, or a report through the
Bugzilla interface rather than through a crash-reporting tool, doubles
her odds of becoming an LTC. Having any of the issues she reported
during the first month fixed has the same effect. Her micro-climate,
represented by low attention or too rapid a response, and her
macro-climate, represented by increased project popularity, reduce her
odds. And so forth.
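To make "doubles her odds" concrete: within a group of newcomers, the odds of becoming an LTC are the number who became LTCs divided by the number who did not, and an odds ratio of 2 means the group with a given trait has twice the odds of the group without it. The sketch below computes an odds ratio from a simple 2x2 table. The function names and the counts in the usage line are hypothetical, and the study itself may well have estimated these effects with a regression model rather than from raw counts.

```python
def odds(successes, failures):
    """Odds of an outcome: how many had it per one who did not."""
    return successes / failures


def odds_ratio(exposed_ltc, exposed_non_ltc, unexposed_ltc, unexposed_non_ltc):
    """Odds of becoming an LTC for newcomers with a trait, divided by the
    odds for newcomers without it (>1 raises the odds, <1 reduces them)."""
    return odds(exposed_ltc, exposed_non_ltc) / odds(unexposed_ltc, unexposed_non_ltc)


# Hypothetical counts: 20 LTCs vs. 200 others with the trait,
# 10 LTCs vs. 200 others without it.
print(odds_ratio(20, 200, 10, 200))  # prints 2.0
```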

In all these calculations, it is extremely important to identify real
people from their logins, names, emails, and activities. In general, I
consider a person's email as the handle that distinguishes him from
everybody else, and this is a relatively good approach (the activity
page only has emails, and not everybody has a name on the information
page). If people have multiple emails, I also have a way to deal with
that; it is out of scope here.

However, the gnome Bugzilla extract we have is extremely difficult to
work with, for the following reason.

We did retrieve gnome Bugzilla data once, in Jan 2011. We understood
that the retrieval might cause problems for gnome Bugzilla, so we were
very careful about it. The retrieval was done without logging in, so the
data does not contain performers' emails. For example, every activity he
performed appears as jhs instead of jhs jsschmid de. Likewise,
bugzilla-gnome appears instead of bugzilla-gnome vitters nl.
The problem with this data is that too many people share the first part
of their email addresses. For example, for jhs:
u2n;jhs;5680;1;Johnny Haugen

This has many consequences. For example, it is difficult for scripts to
determine whether jhs is an LTC, so I had to drop people with multiple
names from the calculation.
This certainly hurts the soundness of the study, and I'm not sure how
much of the truth was lost.
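The login ambiguity described above can be detected mechanically. This sketch assumes, based only on the single record shown (`u2n;jhs;5680;1;Johnny Haugen`), that records are semicolon-separated with the login in the second field and the display name in the last; the meaning of the other fields is not known here, and `parse_record` and `split_by_ambiguity` are hypothetical helpers. Logins seen with more than one distinct name are set aside, matching the decision to drop people with multiple names.

```python
from collections import defaultdict


def parse_record(line):
    """Return (login, name) from one semicolon-separated record.
    Assumed layout: login in the 2nd field, display name in the last."""
    fields = line.strip().split(";")
    return fields[1], fields[-1]


def split_by_ambiguity(lines):
    """Return (unambiguous, ambiguous) sets of logins, where a login is
    ambiguous if it appears with more than one distinct name."""
    names = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        login, name = parse_record(line)
        names[login].add(name)
    ambiguous = {login for login, ns in names.items() if len(ns) > 1}
    return set(names) - ambiguous, ambiguous
```

Dropping the ambiguous set keeps the analysis consistent, but, as noted above, it also discards real contributors, which is exactly why the emails are needed.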

In any case, I was wondering if there is a way to get a Bugzilla extract
for gnome that includes performers' emails.

I understand the data may be sensitive to some extent, but I believe
this is for a good cause. Understanding gnome practice not only helps
other projects, but also helps the gnome community itself. I guarantee
it will be used only for research purposes. If necessary, I could sign
any agreement that protects any private information you do not want
exposed.

Sorry if this is too much of a bother.

Thanks,  Minghui Zhou
