Re: [BuildStream] Performance: cython



Hey Tristan,

Responses inline

On Thursday, 23 May 2019 09:26, Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:

TLDR: I want to introduce Cython [0] to the codebase to selectively improve the
performance of BuildStream.
I have some benchmarks showing that the performance improvements are significant.

In general I think this sounds nice.

In the longer term, I would also like to have the hot paths (like
ruamel.yaml) implemented entirely in C and use only a bindings
interface to those, but that's a bit pie in the sky for now.


With that aim in head, adding Cython is a good thing, since we will already have
the interfaces to talk to C directly.

Also a side note: Some or our dependencies (Pyroaring, Ruamel) as partly written in
Cython. We might need to ask developers for a few changes (Pyroaring doesn't export
its Cython definitions) but we might be able to squeeze even more performance without
diving ourselves into C.

[...]

-   We would be unable to run against the Pypy and Jython interpreters unless we
      also provide a pure python implementation of the cythonized modules.
      We already can't do this today, due to grpc.


From what I understand, cpython is the main interpretor which we
expect, and probably if there are installation targets which have other
python interpretors, these can be installed in parallel.

Yes

So if I understand correctly, this is only a concern in terms of
features we can consume from other interpretors, but it is not a
concern in term of being easily installable on any host, correct ?

Not sure what you mean about "features we can consume from other interpretors".

But if the concern is easy installation, I would not expect this to be a problem.


-   We would need to have a C compiler in the environment where we
    install.
      - This is already the case because PyRoaring doesn't publish wheels to PyPi.
      - We can also ship wheels, removing this need.


We need to have a C compiler in the environment where webuild, not
where we install.

Correct.

During development however, we of course always build where we install.

-   Finally, and perhaps the most annoying, is that we now need a compilation
      phase before testing. For people that do not modify the Cython files, this
      can be done only once the first time.
      For tox, however, we would now need an explicit build step. In order for this
      to work, we would need build time dependencies, which requires pyproject.toml.
      We had decided to not introduce it without a need [3], I adapted the code in
      my MR to show how it would look like.


Just for clarification, this build step will be incorporated into the
regular `python3 setup.py build` phase correct ?

Correct, we will however need "Cython" installed in the environment where
we build if we haven't pre-generated the C files before.

For this, we should publish to pypi source distributions with pre-generated
files, and on developers machines just require Cython.

Will this not integrate transparently with `tox` ?

Tox, unless using pyproject.toml doesn't install build dependencies before
installing the package. We would therefore have no guarantees that cython
was installed in the tox environment in which we build the package.

Mostly, I want to know that the packaging story of BuildStream is not
harmed by this, and the regular `build` `install` and `test` commands
used for the purpose of building, installing into a DESTDIR, and
testing against host dependencies in an isolated distro build
environment, is not regressed by this change.

I don't expect this to present problems, just putting it out there :)


This will require a few changes in our setup.py. Mainly what I intend to do:

1) Require Cython when running "setup.py sdist". That way we ensure that our
source distributions always contain the C files
2) When invoking "setup.py build" or "setup.py test" or "setup.py install", check if we have Cython.
   - If we have Cython, everything works transparently.
   - If we don't have Cython, check if we have the generated ".c" files.
      - If we do, use them and hope they are up to date
      - Otherwise abort and ask user to install cython.

And otherwise, everything should work transparently.
Does that seem acceptable to you?

Cheers,
-Tristan

Thanks for the answer!
Ben



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]