Re: [BuildStream] Re-structure of BuildStream Gitlab policy document



On Wed, 2018-11-07 at 18:27 +0000, Laurence Urhegyi via BuildStream-list wrote:
Apologies I haven't got around to responding to this earlier.

Have commented on the MR also:

https://gitlab.com/BuildStream/nosoftware/alignment/merge_requests/7/diffs?commit_id=921b10e1569e45723878ce19e92a34ca91251b76

Hi Laurence,

Thanks for bringing this back to the list, this conversation deserves
much more attention than dwelling in the above merge request, which I
would implore others who are interested in this thread to also read.

On 29/10/18 15:05, Tristan Van Berkom via BuildStream-list wrote:
It's probably a good time to start discussion around our (over)use of
gitlab, indeed I think the bar is too high, and more importantly we
have a terrible signal to noise ratio.

Agree entirely about simplicity. If people feel we are over-using then we should 
take on-board that feedback and adapt the policy accordingly. We need more 
contributors to join in this discussion, really.

My goal here is to create a policy that is clearly defined and simple for anyone to 
pick up. Some stuff may not work, and if we realise that over time then we can 
change it if we agree, but let's get things simplified, agreed and defined.

We had a very simple policy before, which I believe allowed for decent
level of notifications and visibility, this was basically:

* There are only two types of label

   * severity (enhancement, bug, critical, blocker), these affect the
     default sorting of the main issue list, via label "prioritization".

   * category (frontend, cache, optimization, logging, etc), these allow
     one to view issues related to a specific area or aspect of the
     codebase.

* If your merge request is not ready for upstream review, prefix it
   with WIP so that we don't have to consider it.

* The assignee field is only there to ensure we avoid duplication of
   work, and we only ever self-assign.

But our gitlab was a bit of a mess back then. It could take a long time to find 
the issue you were looking for, for example. And the amount of contributors has 
grown massively since (and thus so has the amount of tickets raised) so we 
needed to mature a bit, I think.

It could have been more clearly defined, but it was at a level of
simplicity where documentation of how to use it was arguably not even
required.

I disagree that issues were any more difficult to find than now, in
fact now I think they are much more difficult to find.

Now that we have several boards to view the issues which people insist
on using, the rest of us need to see the issues in several different
views.

There are currently only 277 open issues:

   https://gitlab.com/BuildStream/buildstream/issues

But if I have to see them through several different windows, I feel
like we are multiplying the cognitive effort of consuming the issues by
the amount of boards we have. The boards should only be useful to those
who use them, and using them should not be imposed on the rest of us,
this only adds confusion to what is otherwise a nicely sortable and
filterable list of only 277 issues.

But this problem with the boards is the least of our worries, if this
was the only problem then I don't think we would have reached a point
where affirmative action is required.

More below...

Let's take a step back and see what we really need, and what are the
problems with the current setup:

* Signal to noise ratio is unbearable.

   I get hundreds of emails a day related to issues and merge requests,
   and the majority of these are quite meaningless. The biggest offender
   here I think is over-use of the assignee field, while having too many
   labels still contributes to this.

I think you may be exaggerating how much those particular emails add, but yes, 
there are *a lot* of emails if you watch all of the gitlab notifications as we 
both do, so less of those can only be an improvement. Personally it is fine by 
me to go to 'only self-assign'. And on that note try to keep gitlab tickets to 
technically relevant details and minimize the noise, as opposed to comments 
related to 'status' etc.

I can guarantee you that Jürg also receives the emails, and probably a
small handful of others, and I was alerted that even Daniel has given
up on being included in the feedback loop due to the overhead.

This, coupled with the amount of metadata required to be input by
contributors in order to satisfy the boards, are our two biggest pain
points at the moment.

Still more below...

* Assignee button pushing is not a civilized means of communication.

   People should be capable of forming complete sentences and actually
   engaging in human communication when requesting that someone review
   something, or requesting that someone comment with their opinion on
   something.

   There is no value to the project to know what is assigned to whom
   beyond just knowing who is doing the engineering work to close a
   given issue at a given time (i.e. to avoid duplication of work, we
   want to use the assignee).

You and others have differing opinions on this. I don't feel very strongly 
either way. This would be solved anyway if we followed the above points.

* Labels are overly complicated, as you point out.

   This adds burden to anyone filing an issue or later tracking
   progress, not to mention that bookkeeping of too many details
   contributes to the bad quality of our signal to noise ratio.

   I would argue that the new status labels (todo, doing, verify, etc)
   are irrelevant to the tracking of project issues, we can do away with
   all of this complexity.

I hardly think the status labels are adding much of an overhead. It takes 2 
seconds to move a ticket from one column to another.

Rather, the inconsistent and confusing way that 'severity / high_impact' is 
being applied needs addressing, and there are a lot of 'category' labels which 
don't even get used, as far as I can see. Having so many labels in a drop 
down list is a bit overwhelming for a new contributor and will surely only 
drive them away from bothering with labels.

   It is more important to have a simplified issue filing and tracking
   experience, optimized for new contributors and drive by contributors,
   than it is valuable to do this level of book keeping in the issue
   tracker proper.

Agree. Which is why I wrote the patch.

   From a high-level project perspective, we should really only be
   concentrating on the merge request list, and only those merge
   requests which are not marked as WIP. Prioritization of work that is
   in progress, and tracking of that work, should be up to those who are
   doing the work and sometimes managers (in the instances where
   contributors have managers).
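
As a rough illustration of "only those merge requests which are not
marked as WIP", here is a minimal Python sketch that filters a list of
merge request titles. The function name and the exact prefixes
("WIP:" and "[WIP]") are assumptions made for this example; it works on
plain strings and does not call any GitLab API.

```python
# Hypothetical helper: keep only merge request titles that are not
# marked as work in progress. The accepted prefixes are an assumption.
WIP_PREFIXES = ("wip:", "[wip]")


def ready_for_review(titles):
    """Return the titles that do not start with a WIP marker."""
    ready = []
    for title in titles:
        normalized = title.strip().lower()
        if not normalized.startswith(WIP_PREFIXES):
            ready.append(title)
    return ready


if __name__ == "__main__":
    example_titles = [
        "WIP: Rework artifact cache",
        "Fix logging of failed builds",
        "[WIP] New frontend option",
    ]
    for title in ready_for_review(example_titles):
        print(title)
```

A title such as "Wipe stale cache" is kept, since only the explicit
prefixes count as a WIP marker in this sketch.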


Regarding your question about severity / impact labels:

   I personally only care about the truth: if there are 500 open issues,
   and 400 of them are critical, then so be it. There is no point in
   pretending that issues are less severe because we have limited
   resources; prioritization of work on more severe issues can just as
   well be done outside of the issue tracker.

   The story behind the "High_Impact" label is basically that it should
   be the "Critical" label, except that there is a desire to keep the
   number of critical issues in the "severity board" to a minimal
   number, I cannot really explain or understand the rationale behind
   this.

   For this case I would suggest that we base the critical label on
   actual conditions like:

      - Does it cause someone to lose data
      - Does it happen often
      - Does it result in an irrecoverable project state or corruption
        of the local artifact cache
      - Does it result in a stack trace without a decent explanation of
        what went wrong to the user, or explanation of how the user
        can recover
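
To make the idea of basing the label on actual conditions concrete,
here is a minimal Python sketch of such a check. The class and field
names are hypothetical, and treating any single condition as sufficient
is an assumption of this example; the list above does not say whether
one condition or a combination should be required.

```python
from dataclasses import dataclass


@dataclass
class IssueReport:
    """Hypothetical answers to the four questions listed above."""
    loses_data: bool = False
    happens_often: bool = False
    irrecoverable_state: bool = False      # includes artifact cache corruption
    unexplained_stack_trace: bool = False  # no explanation or recovery hint


def is_critical(report: IssueReport) -> bool:
    # Assumption: one satisfied condition is enough to call it critical.
    return any((
        report.loses_data,
        report.happens_often,
        report.irrecoverable_state,
        report.unexplained_stack_trace,
    ))


if __name__ == "__main__":
    print(is_critical(IssueReport(loses_data=True)))
    print(is_critical(IssueReport()))
```

Requiring a combination instead (say, "happens often" plus one of the
others) would only mean changing the body of is_critical().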

Sounds sensible to me. As long as we clarify and define.


   And then we can just remove the High_Impact label completely, since
   the intention of the High_Impact was to be the "Real Critical" label
   (as opposed to the "What we pretend is really critical based on what
   we have time to focus on").

Is this really the case? I don't think it is, and I certainly hope not.

It is absolutely the case, ask Agustín.

<snip>

I'm not going to jump the gun and change any policy at this time,

And nor should you: I don't think anyone has the right to do that. It's a policy 
that we'll all have to use daily and we should put it forward on this list to 
agree before settling on something.

Sorry but I have to stop you right here and set things straight, while
we will normally discuss most major things on the list, I fear the way
you have phrased this is subject to wildly inaccurate interpretations.

Under absolutely no circumstances is this project going to devolve into
some kind of democracy. While everyone who contributes will always have
a voice, it is not true that they will have a vote.

This project will always have strong leadership, and we will always
have people around who are in charge and will be able to effect
affirmative action, otherwise wounds like this one can be left to
fester.

My main mission right now is to grow the plurality of this leadership
such that there are always a healthy number of maintainers around to
lead the project, such that I am not a bottleneck, and may even be
able to work on completely different projects without this one
suffering.

Quite ironically, the very gitlab problem that is under discussion is a
huge stumbling block to my achieving this goal of getting others
actively involved in leadership roles.


Let's structure the rest of the email a bit more clearly...


Problem statement
-----------------


Contributor difficulty and friction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I believe you (Laurence) have captured and understood this point well
enough, and that it is your motivation for creating
https://gitlab.com/BuildStream/nosoftware/alignment/merge_requests/7.

Basically, this is a free and open source software (FOSS) project,
which means we need to make this place friendly for all contributors.

This applies equally to contributors who are employed full time to work
on the project, hobbyists who want to spend some free time hacking on
BuildStream, and those who are employed full time to work on the
projects by companies which we do not have any affiliations with
whatsoever.

For the time being, BuildStream has more full time paid contributors
than other hackers, which is a rare and fortunate situation, but we
should not assume this will last forever. We should prepare the project
to stand on its own two feet and continue to be a good place to
contribute to when the funding evaporates, which I guarantee you it
will (whether it is due to its success or its failure).

At this point, the policies around metadata for issue filing are too
complicated even for the full time engineers which we have strong
affiliations with, but it needs to be simple enough for a drive by
contributor to easily submit a weekend patch with another weekend
followup in review, and have a pleasant experience doing it.


Notification spam
~~~~~~~~~~~~~~~~~
As a maintainer of BuildStream, either at a global level or to a minor
degree (say, a maintainer of a subsystem), it is important to be
involved in the feedback loop at all levels.

This means:

  * Knowing when users have filed a new defect or enhancement request.

    When appropriate, we provide some feedback to the user who filed
    the issue, or close the issue as a duplicate, etc.

  * Knowing when a new branch is submitted for review, or when a new
    iteration of a branch is submitted for review.

    This ends up in a queue, but it is very useful to us to see
    the review comments as they roll in, and be alerted via email
    to the newly created merge request.

    Unfortunately, people file way too many merge requests, and
    we have tons of them sleeping in WIP needlessly, this we should
    also put an end to with stronger policy.

  * Knowing when an issue is closed and why, perhaps another maintainer
    landed a contributed branch and at least reading the subject line
    tells me if that unblocks other things.

  * Knowing when someone starts working on an issue, and outlines
    what they intend to do in the issue (filing of "task" issues,
    or comments left on an existing issue by a contributor who is
    starting to work on something).

    At this stage, it is highly important that maintainers get the
    notification and have a chance to read a brief outline of what
    a contributor intends to do in their patch.

    3 out of 4 times, there is at least one implementation detail
    that is either one of:

      (A) completely incorrect, and the proposed solution won't work

      (B) overkill of an implementation to solve an otherwise simple
          problem

      (C) an approach which appears to be the right architectural
          choice right now, but will not make sense when considering
          other developers' ongoing work

      (D) an approach that will work, but will simply be messy and
          unhealthy for the codebase long term
    
    At this stage, it is appropriate for a maintainer to correct the
    problem before it happens, saving the contributor possibly days
    or even a week or more, when it finally comes review time and the
    patch is unacceptable.

Now to put this into perspective, rewind a handful of months to when
we did not have this much overhead in gitlab... My every day morning
involved 30 minutes to 1.5 hours of going through the notifications in
my inbox. Even with 15+ active full time developers, I usually could
provide minimal feedback, only on the issues and merge requests which
required it, in this morning ritual (of course Jürg helped out on a
good portion of these issues, without that help it might have been more
like 2 or 3 hours per day of just providing high-level feedback).

Now fast forward to present day... My every day morning starts the same
way, but I really have close to 5 or even 10 times the amount of
notifications, for what is really close to the same amount of ongoing
issue discussion and merge requests - and I am unable to easily identify
which of the emails I can safely ignore just by reading the subject
line, because now we have tons of emails sent just because some WIP
board related tag changed, or an assignee changed.

The other week, when replying to this email, I found that by the end of
the day, I was unable to even get through a whole day's worth of
notifications (because during the day, I also get prodded for reviews of
things, and my remaining time is not enough to cover a single day of
providing feedback).

By the end of the week, I was days behind on notifications, and there
may very well be new important issues being filed that I am not even
aware of because I just can't get through my daily inbox.


Spreading and sharing leadership
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As I mentioned earlier in this email, it's right at the top of my
list of priorities to grow the leadership pool for this project.

A part of this effort will be effecting a new committer policy as was
discussed on list:

   https://mail.gnome.org/archives/buildstream-list/2018-September/msg00044.html

... but another part of this is to get people involved enough that they
develop their own sense of ownership, feel they have a personal stake
in the codebase, and feel confident in assuming leadership in various
capacities and levels.

In my experience in other FOSS projects; the first big milestone for a
contributor in their path from being a casual patch submitter to
becoming a core developer or even maintainer, is when they take the
leap and subscribe to the issue tracker notifications.

This point is much related to the previous point, currently the
notifications are not concise enough, to the point where I cannot even
catch up at the rate they are coming in, so I cannot recommend that
people include themselves in the feedback loop until this is fixed,
which is a very sad state of affairs.

Ideally I would expect that all developers who work full time on
developing BuildStream to be subscribed.

The current workflow of the feedback loop is basically:

  * Users file issues
  * Someone marks the issue as "important", or else it might
    just fall through cracks and not get looked at for a long time
  * We might delegate it to a developer if we have available
    resources
  * Maintainers have to chase down developers who should be fixing
    the issue (e.g., the issue is a result of a feature they landed)

This is horribly inefficient, the right way for this to work is:

  * Users file issues
  * All core developers and maintainers receive notifications
  * A developer notices the issue and immediately recognizes that
    this bug is fallout from their feature
  * The correct developer fixes the issue on their own, and either
    commits the fix if it is simple enough or seeks a review
  * A lot of the issues are already solved and we don't need to
    go chasing down developers and telling them that some fallout
    happened due to a branch they landed

Slowly but surely, those who demonstrate responsibility through this
process will be granted wider scope commit rights and essentially
become leaders in the project, and perform more reviews.


Solution
--------
Revert mostly to what we were doing before.

There are a few major points to change in order to have a simpler
experience interacting with the issue tracker, and restrict
notifications to only relevant ones.

  * Revert to the old usage of the assignee field

    Revert to using the assignee field as self-assign only,
    and as a signal to show that someone is working on patches to solve
    the issue, so as to avoid any duplication of work.

  * Back out the WIP related tags and kanban-like boards

    This stuff adds a ton of notifications that are unrelated to
    issue tracking and only pertinent to tracking work in progress.
    Backing it out reduces that noise while at the same time lowering
    the bar for anyone who wants to work on BuildStream and interact
    with the issue tracker.

    We can keep the severity board around, for those who prefer
    to view the issue list as separate columns instead of just
    using the prioritized labels to view the entire issue list.

    In any case, we need to keep severity labels around, so
    having it drive a board will not be an obstruction or add
    any cognitive complexity for those who do not look at boards.

  * Implement a policy for not creating merge requests prematurely

    These prematurely created merge requests add a ton of notifications
    which are not pertinent to all core developers, and all core
    developers should be subscribed to the project level notifications
    in order to be included in the pertinent feedback loop.

    We need to be careful to not spam people.

    If you are working on a branch, you will still get the CI benefits,
    and you can still ask your peers for reviews on your branch before
    it is ready to be submitted for a merge - there is no substantial
    need or use to file a merge request before you are... requesting
    that the branch get merged.


Conclusion
----------
I think it is important that we all keep in mind that this is not a
controversial change.

Agustín proposed that we try a new approach in May:

   https://mail.gnome.org/archives/buildstream-list/2018-May/msg00033.html

And the result of this thread is essentially that we allowed this
experiment to take place on the conditions that Agustín would maintain
all of the extra metadata and that we would never expect, especially
new contributors, to need to have any understanding of these, and that
core developers would eventually get the hang of things, which we
mostly have.

Further this was partly designed to improve visibility of work in
progress, which is nice-to-have, on the condition that it does not
impose too much of an onerous and bureaucratic experience interacting
with the issue tracker, which I believe to be a must-not-have.

A lot of things came out of this work which are nice, like having issue
templates, and setting up milestone metadata, but I think Agustín will
agree that it's quite fair to say that the tracking of WIP side of
things is a failed experiment and is now more of an impediment to our
workflow than a benefit in any specific way.


If people feel strongly about keeping the priority/severity labels
around, then I implore you to please reply with concrete advantages and
examples of how these things are improving our lives, weighing these
against the clear disadvantages I have tried to outline in this email.

Cheers,
    -Tristan


