Re: [BuildStream] Nuking the pickle jobber.
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Sander Striker <s striker striker nl>
- Cc: BuildStream <buildstream-list gnome org>
- Subject: Re: [BuildStream] Nuking the pickle jobber.
- Date: Tue, 16 Jun 2020 15:44:18 +0900
Hi,
Thanks for bringing it to the list.
On Mon, 2020-06-15 at 22:52 +0200, Sander Striker wrote:
Hi Tristan,
Angelos responded on !1965 (
https://gitlab.com/BuildStream/buildstream/-/merge_requests/1965#note_361444144
). I'll quote it below for completeness.
Cheers,
Sander
I think at this point, having this feature in its present state
serves mostly as an obstacle to any progress and refactoring,
although it would be great if someone wanted to own this feature,
fix it, and eventually complete #1021.
Unfortunately Mac, Windows, and "plugin replay" support didn't draw
the huge amount of interest I'd hoped for, so perhaps this is for the
best. It's nice to see code become less complicated at least.
Here are some other points in favour of keeping these changes, in
case one of them is persuasive :)
First, it may not be obvious that this is also a step back for
support on Mac. As of Python 3.8, the spawn method of starting child
processes is the default, as fork is considered unsafe; more details
here. That means that state must be pickled over to the child jobs.
The recklessness with which Python developers break API across minor
point versions never ceases to amaze me :)
That said, we can certainly force the method to use "fork", which is
what I expect many other python programs which were previously using
multiprocessing will be doing if they need Mac support.
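For reference, pinning the start method is straightforward with the standard multiprocessing API. A minimal sketch (the child function and its message are illustrative; note that the "fork" context is unavailable on native Windows):

```python
import multiprocessing


def child(queue):
    # Runs in the forked child; it inherits the parent's memory, so no
    # pickling of parent state is required to get work over there
    queue.put("hello from forked child")


if __name__ == "__main__":
    # Explicitly request "fork" instead of relying on the platform
    # default (which became "spawn" on macOS as of Python 3.8)
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    process = ctx.Process(target=child, args=(queue,))
    process.start()
    print(queue.get())  # → hello from forked child
    process.join()
```

Using `get_context()` rather than the global `set_start_method()` keeps the choice local, so it doesn't interfere with any other multiprocessing use in the same interpreter.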
[...]
Unfortunately errors during pickling are notoriously tricky to debug,
which certainly adds drag.
This cannot possibly be overstated.
As I described in the MR, the whole process of figuring out what was
going awry and fixing one of the easier instances of the problem ate a
full day of my time (for one of two problems, I am still blocked by the
fact that instantiating a plugin requires executing Loader code in
order to search for the junction from whence a plugin should be
loaded).
For someone less motivated, this kind of wall costs us a would-be
contributor, who I would expect to give up on our project entirely
when faced with such a problem.
For someone equally motivated but without prior knowledge of how object
references work, it could have easily cost a few days or even a week to
figure it out on their own.
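To illustrate the class of failure (a contrived sketch, not actual BuildStream code): pickle reports the offending *type*, but gives no hint of the attribute chain that led to it, so the developer is left to bisect the object graph by hand:

```python
import pickle
import threading


class Loader:
    def __init__(self):
        self._lock = threading.Lock()  # unpicklable, buried one level down


class Element:
    def __init__(self, name):
        self.name = name
        self._loader = Loader()  # indirect reference to the lock


try:
    pickle.dumps(Element("base.bst"))
except TypeError as e:
    # The message names '_thread.lock', but not Element._loader._lock,
    # which is what you actually need to know in order to fix it
    print(e)  # → cannot pickle '_thread.lock' object
```

In a codebase the size of BuildStream, the reference can be many levels deep and only reachable in certain scheduler states, which is exactly what makes these failures so expensive to track down.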
With the current motion to adopt PEP 484[0], I predict that the
necessary shuffling of object relationships to eliminate implied
cyclic dependencies will cause this brand of issue to resurface
multiple times.
I think that whatever happens, the code in its present form is
unacceptable in master. This may be controversial to say, but IMO
code introduced into master should not only work, but should at the
very least be designed to minimize the future cost of ownership as
much as is conceivable at the time we add it.
From a full reading of your message, I would propose the following
changes in order to make this burden acceptable:
* The act of clearing a member which should not be pickled
(regardless of whether it is in the regular case or in the
"Job specific" case), should be implemented as a decorator.
E.g.:
      def __init__(self, name):

          # Clear this member upon serialization
          #
          @serialization.clear
          self.name = name
Avoiding any duplication of knowledge in __getstate__,
__setstate__, or get_state_for_child_job_pickling().
* The same should be true of custom serialization of attributes,
e.g.:
      def __init__(self, name):

          # Use a custom serialization routine for this attribute
          #
          @serialization.custom(_serialize_name, _deserialize_name)
          self.name = name
* jobpickler.py should not have intimate knowledge of the classes
which are being pickled for the sake of job pickling.
Instead we should have a base interface which does the
get_state_for_child_job_pickling() implementation automatically
for attributes specifically marked with decorators. This could
be called `JobSerializable` and implemented/inherited by classes
which need to buy into this.
jobpickler.py then would simply override the dispatch table
once for this custom thing:
      pickler.dispatch_table[JobSerializable] = _reduce_object
Having these decorators and this class(interface) achieves two
things:
* Added safety, no need to declare all of this variable handling
in multiple places, or add assert statements as is discussed
here[1].
* Provides an actual interface to document, where we explain
to developers "how to do it", instead of requiring developers
actually read and deeply understand a complex conversation
going on in jobpickler.py.
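One possible shape for such a base interface, as a sketch under assumptions: the names `JobSerializable`, `_job_cleared` and `_rebuild` are invented here, and attributes are marked with a class-level set rather than the decorator syntax above, since Python decorators cannot annotate plain assignments:

```python
import pickle


def _rebuild(cls, state):
    # Recreate the instance in the child job without running __init__,
    # then restore the filtered state
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


class JobSerializable:
    """Base interface for objects that buy into job pickling."""

    # Attribute names to clear when serializing for a child job
    _job_cleared = frozenset()

    def get_state_for_child_job_pickling(self):
        state = self.__dict__.copy()
        for attr in self._job_cleared & state.keys():
            state[attr] = None
        return state

    def __reduce__(self):
        return (_rebuild,
                (type(self), self.get_state_for_child_job_pickling()))


class Element(JobSerializable):
    _job_cleared = frozenset({"_scheduler"})

    def __init__(self, name, scheduler):
        self.name = name
        self._scheduler = scheduler  # cleared on serialization


element = Element("base.bst", object())
restored = pickle.loads(pickle.dumps(element))
print(restored.name)        # → base.bst
print(restored._scheduler)  # → None
```

One caveat for the dispatch table idea: pickle's per-pickler `dispatch_table` is keyed by exact type, so registering `JobSerializable` there would not match its subclasses; this sketch therefore puts `__reduce__` on the base class instead, which subclasses inherit.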
* Similar to the above, we should have a `JobUnserializable`
interface which raises a *useful* error when __getstate__ is called
on that object.
This exception should have in its message a list of as much context
as possible; at the very least it should indicate:
* Which objects or functions have a direct reference on the
object at the time of serialization
* By what means they are referred to by this direct reference
(i.e. it should not be too hard to list which object attribute
is referencing the unserializable object).
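A sketch of the error-raising counterpart (again, the names are invented; `gc.get_referrers()` gives only best-effort context, and internal containers and frames show up among the referrers, but even the referrer types are far better than a bare pickling failure):

```python
import gc
import pickle


class JobUnserializable:
    """Objects that must never be pickled into a child job."""

    def __reduce__(self):
        # Best-effort context: name the types of the objects still
        # holding a direct reference to us at serialization time
        referrers = sorted({type(r).__name__ for r in gc.get_referrers(self)})
        raise TypeError(
            "{} cannot be serialized for a child job; currently "
            "referenced from objects of type: {}".format(
                type(self).__name__, ", ".join(referrers)))


class Scheduler(JobUnserializable):
    pass


class Element:
    def __init__(self):
        self._scheduler = Scheduler()  # the offending reference


try:
    pickle.dumps(Element())
except TypeError as e:
    print(e)
```

Raising from `__reduce__` means the error fires for every pickle protocol, and it fires at serialization time rather than surfacing as a confusing failure in the child process.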
What to do now?
~~~~~~~~~~~~~~~
I am personally a bit torn here, I want other developers to weigh in on
this.
The facts on the ground are:
* This is not a problem for Mac, we can always forcefully set
multiprocessing to use the "fork" method.
* This is a blocker for Win32 native support, and that is huge.
  I don't want to lose track of this; bringing it back after
  ditching it will also be difficult.
* I don't personally believe this should be a blocker for the
activity of subprocessing the scheduler. We should be able to
get away with a single (possibly expensive on WSL) fork() early
on and be able to do regular IPC over Queues to meet our needs.
This was never expected to be "easy" anyway.
And our options are:
1) Get rid of this serialization stuff
2) Refactor/reimplement the serialization stuff to reduce the cost
of ownership, and reduce the burden of knowledge required to
generally write BuildStream code.
This also requires solving the issue of instantiating plugins
in subprocesses via the "spawn" method (which does not work for
plugins loaded through the new "junction" origin).
If we decide to take this route, I would probably still try to
resolve the plugin instantiation issue first in order to land
long standing !1901 ASAP.
3) Solve the issue of instantiating plugins in subprocesses via the
"spawn" method without any refactoring of object serialization.
As I've pretty much stated, I feel strongly opposed to this
approach; I don't think the current code is viable.
Keeping in mind that I've been working on !1901[2] for a long time now
(on and off while we finally agree on a design) and that this very high
priority branch is blocking on a decision, where should I be spending
my time?
Best Regards,
-Tristan
[0]: https://www.python.org/dev/peps/pep-0484/
[1]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1463#note_194278834
[2]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901/