Re: [BuildStream] Some bad mistakes, lets understand them, and make this work better



Hi,

On Wed, Sep 12, 2018 at 12:09 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:

[...]
> > On Tue, Sep 11, 2018 at 10:04 AM Tristan Van Berkom via BuildStream-list <buildstream-list gnome org> wrote:
> > TL;DR: We had a regression which causes the artifact cache to get
> >        corrupted[1], and fixing it was extremely difficult because
> >        developers are making pretty bad mistakes and they are landing
> >        on master, outlined in this email. Inviting discussion on how we
> >        can improve things.
> >
> >
> > Hi all,
> >
> >     I'm writing you today because I see evidence that we are all not
> > doing well enough in review, we are rushing things in too quickly,
> > regardless of whether it is a new feature, or a frantic bug fix or
> > performance improvement for our stable release. We have a lot of
> > developers on hand now, so I don't think we have an excuse.
>
> I think there is more judgement here than is warranted.

Perhaps. I tried to clarify this is not about blame, and if I'm passing
judgment, I think it is on all of us and I want to share in that as
well.

Understood.  And I [think I] understand your intention; I'm just not fully agreeing with the conclusions.  And while the disclaimer is there, it's hard to read past it.
 
But to take a step back and put things into perspective...

  * Last week this issue came up with the artifact cache; that by
    itself is not a problem.

  * I tried fixing it with a few simple commits, but that was not
    enough to solve it; that is also fine.

  * Finally, when looking more deeply at what was happening around the
    calculation of the cache size, I could not in fact even understand
    what was going on in there.

This to me is red flag number one.
 
  * Not understanding the implementation is one thing, but more than
    this, I could not even understand what was *intended* to be going
    on in there by a reading of the code.

And this is number two.
 
  * I have to conclude that there is not a single intention, but rather
    a few crossed intentions.

That in itself doesn't necessarily need to be problematic, just as long as it is clear.
 
This raised a big red flag for me, or rather, it was quite a shock.

Now to circle back to the intent of my message: I think that if people
were to see the examples I've laid out here, they will empathize and
share in the shock which I experienced.

By making sure that people understand how only a few bad choices here
and there (especially breaking of encapsulation, but the other rules of
thumb also apply) can add up to something that is entirely
incomprehensible, I hope to raise awareness such that this kind of
thing doesn't need to happen in master.

While I think this approach is laudable, I don't think it will resonate enough.

> > In the interest of improving things, I am going to call out a bunch of
> > examples of things which have landed on master in recent months, which
> > really should never have gone as far as landing on master.
>
> I think it might have been better to lead with how we can improve all
> the way at the bottom, because I'm pretty sure the majority of the
> audience will tune out halfway through.

Definitely, I didn't order that in the best way.

I'm still impressed with the structure you managed to keep though :).
 
> I think the emphasis here should be on what is being paid attention
> to in review.
> > Let me be clear: this is not intended as a rant, and I don't intend to
> > place blame on specific people; I am as much to blame as the rest of
> > us, as some of my examples involve code which I did not take enough
> > time to review, I think we all have to slow our roll here and be more
> > careful.
> >
> > Also, especially to those of us who are more experienced: we have to be
> > spending more time reviewing code than writing code; we need to set a
> > good example otherwise peer review really doesn't work and ends up
> > being just a checkmark that somebody else has looked at the code.
> >
> > Since we switched to aggressively granting commit access almost 2
> > months ago[0], we have been allowing some unacceptable things to land,
> > along with some things which could be better. On the surface, we have
> > an improved velocity, and while this checks a lot of boxes, this is not
> > worth while if our standards of quality go completely off the rails,
> > leaving us with a dead ended codebase.
> >
> > One thing I have to stress is, when you perform a review; first ask
> > yourself these questions:
> >
> >   * Does this patch make the codebase simpler than it was before ?
> >
> >   * Is the newly added code going to be easy for the next developer to
> >     read ?
> >
> >   * Does this patch circumvent encapsulation of logic in modules ?
> >
> >   * Does this patch add transient state to object instances instead of
> >     properly propagating transient state through function calls ?
>
> Can we add this to HACKING.rst?

I worry that the more we throw into HACKING.rst, the more it becomes a
sort of black hole where we put policy statements that nobody is going
to actually read... that said, after looking at your Subversion
example, I am significantly swayed.
 
I hear you, but I do think it's worth it, and will help newcomers a great deal more with understanding the codebase.  Having something similar to http://subversion.apache.org/docs/community-guide/general.html#code-to-read would already be a good first step.

We can probably come up with a few more points here based on the rest
of this email... but these points on their own are pretty solid.

[...] 
> > A series of ugly stories
> > ========================
>
> [....really useful specific examples...]
>
> The examples above are really good material for HACKING.rst.  Even
> just linking to this thread might be useful.
>  

I am happy to make HACKING.rst more elaborate and structured; maybe if
we structure it well enough we can point to specific sections of it,
and then we can hope that at least the parts of it we point at will be
read.

+1.
 
> >   Exclusivity of the CACHE resource
> >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >   I hold myself primarily responsible for having let this issue creep
> >   in in the first place, as it should have been obvious and I was not
> >   attentive enough in my review of Tristan Maat's code, even if time
> >   constraints was an issue it is no excuse; it only demonstrates that
> >   we are rushing things into master way too quickly.
>
> I don't think that we can jump to that conclusion.  Or maybe it's the
> way that you've phrased it here.  When I read quickly, I am thinking
> about time.  And time is not a good measure in this case.  A code
> change doesn't become better with age.
>
> https://gitlab.com/BuildStream/buildstream/merge_requests/347 was
> started 5 months ago.  It landed 1 month ago
> through https://gitlab.com/BuildStream/buildstream/merge_requests/548
> .

By gitlab's "A month ago" fuzzy logic perhaps; it landed July 18th,
though.

Apologies, I didn't find the actual dates, and indeed went by the UI.
 
I agree that it took quite some time, but in this particular case, I
really feel that Tristan Maat was out of his depth and this required a
lot of design work before implementation. So I helped a lot with
designing the changes which needed to happen to the scheduler.

Did we actually bring this to the list?  Because changes to the scheduler are changes to the core, and that is something that everyone should be aware of.  It might have drawn some additional eyeballs to the design and intent.
 
Before continuing on this point, I should point out that this example
should not have been in the email.

I wanted to point out that I made the mistake of being careless, and
that thinking "If I reviewed it a bunch of times, it must be okay" was
irresponsible of me.

However, this is not one of those obvious mistakes (or wrong choices)
about which I can really argue that it should never have landed in
master.

Breaking encapsulation, calling private functions, adding transient
state to instance variables: pretty much all of the other examples I
listed here are things we should be able to avoid off the bat.

> There's 113 "discussions" on there, with several senior developers participating.
>
> >   In order to implement automatic purging of LRU (least recently used)
> >   artifacts in the local artifact cache, we needed a bit of churn in
> >   the scheduler.
> >
> >   I guided Tristan Maat a lot in the features which would be needed,
> >   and had to cleanup a bit of a mess he left behind in the logging
> >   which needlessly duplicated code and caused things to be hard to
> >   follow.
> >   In a nutshell; tests were added, there were many iterations of
> >   review, and by the end I did not have the attention span left to
> >   properly follow that everything is working properly.
>
> To me this also smells of the long-lived-branch problem.  The
> problems weren't seen because they weren't landing on master.  And
> when the whole change did land on master, it was too big to properly
> review again.  And besides, several people had looked at it already,
> it baked for 4 months, so it must be good, right?

So, I quite strongly disagree with this.

I should have tagged that last line with <sarcasm/>.  I should know by now that humor in written form does not always come across.
 
Deciding that a patch makes the cut due to the number of review rounds
which previously occurred can only make sense if the branch under
review is really only churning slightly in response to the review
rounds - which really wasn't the case here.

[...]
> >   Essentially, the underlying implementation details were implemented
> >   properly, basically we added a "read write lock" mechanism in
> >   _scheduler/resources.py, to ensure that we could handle the
> >   exclusivity of jobs which require the artifact cache resource, and
> >   ensure we don't have starvation by prioritizing jobs which require
> >   exclusive access to a resource.
> >
> >   The result in the end however was that:
> >     * Cache size calculation jobs became mutually exclusive to cache
> >       cleanup jobs
> >     * The pull jobs and build jobs, which write to the artifact cache,
> >       did not require the artifact cache resource *at all*
> >     * Only one job was allowed to access the artifact cache at a time,
> >       regardless of exclusivity.
> >
> >   The result is that the whole thing we designed to ensure pull/build
> >   could not run at the same time as cleanup, was not actually being
> >   used for this purpose.
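
Just to make the intended semantics concrete for anyone following along
in the archive, here is a minimal sketch of the kind of shared/exclusive
("read write") resource described above.  The names are made up; this is
not the actual _scheduler/resources.py API:

    class CacheResource():

        def __init__(self, max_shared):
            self._max_shared = max_shared     # how many jobs may share the resource
            self._shared_held = 0             # e.g. running pull/build jobs
            self._exclusive_held = False      # e.g. a running cleanup job
            self._exclusive_waiters = 0       # queued cleanups; prevents starvation

        def register_exclusive_interest(self):
            # Called once when a cleanup job is queued, so that new shared
            # acquisitions back off and the cleanup cannot be starved.
            self._exclusive_waiters += 1

        def try_acquire_shared(self):
            # Pull and build jobs, which write to the artifact cache, take
            # *shared* access; they back off while a cleanup holds or waits
            # for the resource.
            if self._exclusive_held or self._exclusive_waiters:
                return False
            if self._shared_held >= self._max_shared:
                return False
            self._shared_held += 1
            return True

        def try_acquire_exclusive(self):
            # Cleanup jobs wait until all shared holders have drained.
            if self._exclusive_held or self._shared_held:
                return False
            self._exclusive_waiters -= 1
            self._exclusive_held = True
            return True

        def release_shared(self):
            self._shared_held -= 1

        def release_exclusive(self):
            self._exclusive_held = False

In a sketch like this, pull and build jobs would take shared access and
the cleanup job exclusive access - which, going by the description
above, is the wiring that did not end up in place.
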
>
> After 4 months it's easy to lose track of the true intention / design
> of the code change, and to lose excitement.
>
> [...] 
> > In summary, How can we do better ?
> > ==================================
> > In closing, I think it's appropriate to review the "Project is under
> > heavy load" discussion we had back in April[2].
> >
> > I have to say that, I agree with Sander for the most part that the
> > "land first and fix it later" approach works well in the settings which
> > I have experienced them. But I should admit that in those circumstances
> > we did not allow relative novices commit large features without heavy
> > review either, and with a handful of 5 to 7 full time developers with a
> > good 10 years of experience under their belts, you only have to worry
> > about the occasional regression, which you can indeed fix quickly after
> > the fact.
>
> I'll admit that I didn't quite expect us to be as aggressive with our
> commit access as we have - we sort of went from one extreme to the
> other :).  I was expecting us to have existing committers invite
> others to the set of committers.  Essentially, give commit access to
> people you feel can be trusted to land a change with confidence, or
> call out for help when they need it.  In essence that they show good
> judgement.

I very strongly agree with you and would prefer a tighter circle of
committers myself.

I'm not sure how we can really describe that in terms of policy,
though, and it was raised to me that we probably need to be acting on
policy.

I can see that side of the coin, even though I personally have greater
confidence in a completely ad hoc approach where people use critical
thinking to decide to hand out commit access on a case by case basis
(it wouldn't be perceived as "fair", but I'm not sure it needs to be).

The process can be subjective.  I'll refer back to http://subversion.apache.org/docs/community-guide/roles.html#committers as an example of something that works.
 
> I definitely want to be careful with putting people in categories
> like novice or senior or junior.  I prefer us to think of everyone as
> peers, each with strengths and weaknesses.  I'm not worried about
> people overstepping boundaries; all the code is in version control,
> and we can always go back to a previous state - at a cost.

Well, I think that what you are saying in the last paragraph shows that
you agree that we have a higher level of trust with some actors, and
those are preferably the people we would choose as committers.

I would agree with that.
 
I might be wrong to use those words, but to me, demonstrated experience
in maintaining software - learning from past mistakes which led to
unmaintainable code and having to aggressively refactor - is usually an
indicator that someone is considering the right things when landing
patches.

That said, this is my explanation of why I chose those words; I am fine
with other vocabulary, or with accepting more broadly that some people
are better at some things.

It's very easy to demotivate people by putting a label on them; that is the sole reason I'm pointing this out.
 
> > I am not suggesting that we abandon the approach entirely either, we
> > have to work on making it work better.
>
> +1.
>  
> > I should also stress that I am not worried about our team members
> > ability to add regression tests with their patches and make the
> > software mostly "work", I am worried about dead-ending the overall
> > codebase by growing warts on it such that there is no longer any
> > comprehensible design or control flow, at which point no feature can be
> > added or bug can be fixed without mass rewrites.
> >
> > What can we do to make things better ?
> >
> >   * I submit that peer review on its own is not enough, we need to
> >     have the more experienced developers perform review for the
> >     entry level developers.
>
> I want to be careful phrasing it like this.
>  
> >     For more trivial patches, developers should also seek out the
> >     review of those peers who are more familiar with a given area of
> >     the code base than the patch submitter.
>
> When someone _lands_ a change in an area that you are familiar with,
> you will pay more attention.  When a change sits on a branch, waiting
> for your review, it's a lot easier to let it sit there than to give
> it your full attention :)

Of course that view is also valid; on the other hand, that branch is
not doing any harm while it is not landed in master - once you land
something which needs a redesign in master, other branches start to
build on top of something broken, and things get crooked quickly.

Sure.  The part "that branch is not doing any harm while it is not landed in master" is what I am arguing is actually problematic: there is an incentive to _not_ land it.  And if it isn't landing, then review is less important as well, and so on.

I am sure that for the category of mistakes, or bad choices, which I've
called out, we need to do better at making sure those never hit master
in the first place.

The reason for this is that I in fact agree with you that we should
land more branches and sometimes have to fix problems afterwards, rather
than stagnating *too* long in review.

*nod*
 
[...]
> >     If you didn't have time to review a patch, and nobody appropriate
> >     had the time to review a patch, it is not acceptable that patches
> >     keep landing without the review of someone who knows what they
> >     are doing - it is MUCH better that the feature is postponed if
> >     it saves us from the kinds of mistakes I've outlined in the horror
> >     stories above.
>
> I'm off a slightly different mind here.  Typically people will pay
> attention to what is landing on master - given enough visibility.  We
> haven't really gotten our -commits@ list sorted properly in that
> respect.
>
> If we feel that changes are landed prematurely, and effectively
> people are showing poor judgement, then that's a different story.  We
> should go back to reviewing commit privileges in that case.

I feel that we are.

But more accurately, I feel that we are not making changes in the right
ways in the first place, and would rather put the emphasis on that than
on the timing.

Making changes in the right ways comes way ahead of review, so we should indeed put emphasis there.
 
[...]
> I do find it a bit daunting that you posit you're the only one who
> cares currently.  That isn't inspiring nor motivating for others that
> put in time to do reviews.

Yes, bad me, that was drastically phrased.

It is how you feel, as you state below.  I was merely reflecting back what impact such a statement might have on others.  It wasn't meant to make you feel worse :).
 
As I've also mentioned in the same email, I know that Tiago
specifically has demonstrated that he cares enough to submit patches
which refactor some of the things I've called out in this email.

That said, I find myself rather alone on this front, which is sad, and
more seriously: I am not enough people to refactor the code base at the
rate at which others are landing features.

Nor should you.  But clearly the understanding of the codebase and the intents are not aligned.  While that is the case, you are setting yourself up for an impossible mission and a world of frustration.  For yourself, but also for others.
 
I do want to see more patches which don't actually close issues, but
just fix code that is redundant, misleading, or outright dangerous.

Again, we need more alignment for that.  If it isn't obvious, it's not going to happen.
 
Also, "consideration" is a better word here than "caring".
Consideration of the side effects your patch has to others who will
have to contribute to the same code base, takes a conscious effort, and
we need more of that effort.

I don't believe that "people don't care", but I believe that where
there is a lack of consideration, it is mostly a lack of experience.

I think that Jim's response shows this to be not quite the case.
Clearer guidance and expectations make up for a lot of lack of experience.  And lack of experience also means not having to un-learn certain things :).
 
We just need to communicate better why we make the technical decisions
we make in review (which I'm trying to illustrate with my examples of
what not to do), so that others who end up being reviewers make the
same considerations we do.

Right, and that's the gist of it.  We need to make this front and center.  And more approachable than reading this thread, or rather, your original post :).

[...] 
> Admittedly, when I finally got around to taking your original
> diagrams and document, and then comparing it to the codebase, things
> are not quite the same anymore.  There is now a Stream, for
> instance.  I timed out, and apologize for that.

Yes, since that document was created before the first line of code was
written, I am very happy to see just how much it actually resembles the
beast we've created :)

You can definitely still see the outlines :).
 
> > I'm not sure what else we can do here. I think that we need to make
> > public vs private members, and single vs double underscores, more clear
> > in the HACKING.rst somehow. The concept is not very difficult, and we
> > only really deviate from the Python standards because we need to hide
> > some methods on public classes which our core calls into internally.
> >
> > We cannot however solve everything with policy in the HACKING.rst,
> > people have to care about concepts like encapsulation; and the
> > HACKING.rst cannot become a course on how to write maintainable code.
>
> My suggestions above were really helpful in that respect :D
> Seriously, HACKING.rst is already at a point where it needs an
> index.  Maybe we need to split things out.  An example from the
> Subversion project that grew over time:   
>  http://subversion.apache.org/docs/community-guide/
>
> I would recommend reading through the first two sections just to get an idea.

This does look well structured and quite elaborate, without necessarily
being too onerous for anyone to read.

I did comment above that it can be a bit of a black hole where we only
tick boxes that make us feel like we've advertised policy, but I agree
from your example that it can actually be useful.
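
Coming back to the single vs. double underscore point quoted above:
maybe HACKING.rst could carry one tiny, self-contained illustration of
the convention.  Something along these lines (hypothetical code, and
only my reading of the intent):

    class Element():
        """A public, plugin-facing class."""

        def stage(self, sandbox):
            # No underscore: public API, plugins may call or implement this.
            pass

        def _schedule_assemble(self):
            # Single underscore: hidden from the public/plugin API, but
            # intended to be called by the BuildStream core internally.
            pass

        def __update_state(self):
            # Double underscore: private to this class; nothing outside
            # of Element should ever reach in here.
            pass

A handful of lines like that would probably do more than a page of
policy prose.
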

> > Any other ideas on how we can implement better pre-merge disaster
> > prevention ?
>
> I'm going to say pre- or post- merge :).
>
> The intent of the change needs to be clear when doing a review. 
> Knowing what changes are expected to be in there is useful.
> Continuing with the same example project as above, I used to
> appreciate the commit log messages a great deal.  Random
> example: https://github.com/apache/subversion/commit/d29903f400be65a6
> 398bba58d446fd95b69ce019
>
> The commit message outlines the overall change, and then the change
> per file (or group of files).  This allows for two things:
> - verify that the summary makes sense.
> - verify whether the code changes are matching the summary.

For my part, I usually look at the commit messages when reviewing a
merge request, and only sometimes do I review a merge request commit by
commit, usually to ensure we are not bringing the history of how the
feature was created and fixed into mainline.

That said, I should probably review commit by commit more often, and
if people do that, we should see improvements there.

Do we want to do something at the MR level that provides guidance like:
http://subversion.apache.org/docs/community-guide/conventions.html#log-messages

Or do we want to do that on a per commit basis?  Or not at all?
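
For illustration, applying that convention to one of our merge requests
might look roughly like this (the summary first, then the change per
file; the file names and symbols below are made up):

    Make cleanup jobs take the cache resource exclusively.

    * buildstream/_scheduler/resources.py
      (Resources): Describe what changed in this class and why.

    * tests/artifactcache/expiry.py
      (test_cleanup_waits_for_builds): New test.
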
 
I have another technical proposal to solve parts of the issues in my
examples, which is an approach I considered when originally writing up
the public-facing APIs we have for Element / Source etc., but at the
time I wrote the idea off as over-engineering.

Basically the idea is like this:

  * For every class that is public API, we have a second class that is
    private API; for the sake of using trendy names, let's call these
    classes Controllers.

    E.g. ElementController, SourceController, etc.

  * The controllers hold public functions for BuildStream internals
    to *call*, but all state is still held on the actual public
    instances.

  * Only the controllers can access the private members of their
    public counterparts.

This approach could allow us to actually lint for any private member
accesses, by only excluding the controllers themselves from this
linting.

Then we could use the regular Python conventions for public/private
members and their notations, hopefully making things all around more
clear.

Of course, it's significant churn and costs some effort, and feature
branches would conflict a lot; but the refactor would be rather
mechanical and straightforward to carry out (albeit still a lot of
work).

Perhaps this change is worth it for the added linting guarantees it
can provide.
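
If I follow the idea correctly, a rough sketch (hypothetical names, not
a working patch) of what I understand you to mean:

    class Element():
        """Public class: all state lives here, only the plugin-facing
        API is exposed."""

        def __init__(self, name):
            self.name = name
            self._cache_key = None   # private state, still stored on the instance


    class ElementController():
        """Private counterpart: the only place allowed to touch the
        private members of its Element."""

        def __init__(self, element):
            self._element = element

        def update_cache_key(self, key):
            # BuildStream internals call this instead of poking
            # element._cache_key directly; a linter rule could then flag
            # any private member access outside of the controller modules.
            self._element._cache_key = key

That would indeed make the "no private member access" rule mechanically
checkable, which seems to be the main appeal.
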

It feels like an extreme measure, and potentially unnecessary if people know what the expectations/rules are?  

> If you managed this far, thanks for reading ;).

It was a great reply, thanks for writing it !

Cheers,
    -Tristan

Cheers,

Sander
 

