Re: [BuildStream] Performance: cython



Hi Benjamin,

On Wed, 2019-05-22 at 09:41 +0000, Benjamin Schubert wrote:
Hey everyone,

TLDR: I want to introduce Cython [0] to the codebase to selectively improve the
performance of BuildStream.
I have some benchmarks showing that the performance improvements are significant.

In general I think this sounds nice.

In the longer term, I would also like to have the hot paths (like
ruamel.yaml) implemented entirely in C and use only a bindings
interface to those, but that's a bit pie in the sky for now.

[...]
Disadvantages of Cython compared to pure Python
-----------------------------------------------

There is multiple disadvantages I can see at introducing Cython in the project.

- Cython is Yet Another Language. That means that people wanting to touch the
  cython files need to learn it.
  - I consider this better than having to learn the python C API.
  - In addition, if we keep this targeted to specific files, we would minimize
    the number of people having to touch it.
   
- Since Cython compiles to C, we can now get Segfaults and other C niceties.

- There is no linter currently supporting Cython syntax. That means less
  rigurous checks on the cython files. However, IDEs like VSCode and Pycharm
  support Cython files.
 
- Profiling cython code is not as easy as profiling Python code. That is
  because, in order to integrate with Python profiling tools, Cython needs to
  wrap every C function call in a Python function call, to get profiling times
  per function. This adds overhead on the caller function.
  This however doesn't prevent profiling, and is jsut something that needs to
  be kept in mind (A good trick is to run two profiles, one with profiling,
  one without and see the differences).
 
- We would be unable to run against the Pypy and Jython interpreters unless we
  also provide a pure python implementation of the cythonized modules.
  We already can't do this today, due to grpc.

From what I understand, cpython is the main interpretor which we
expect, and probably if there are installation targets which have other
python interpretors, these can be installed in parallel.

So if I understand correctly, this is only a concern in terms of
features we can consume from other interpretors, but it is not a
concern in term of being easily installable on any host, correct ?
 
- We would need to have a C compiler in the environment where we
install.
  - This is already the case because PyRoaring doesn't publish wheels to PyPi.
  - We can also ship wheels, removing this need.

We need to have a C compiler in the environment where we *build*, not
where we install.

During development however, we of course always build where we install.

- Finally, and perhaps the most annoying, is that we now need a compilation
  phase before testing. For people that do not modify the Cython files, this
  can be done only once the first time.
  For tox, however, we would now need an explicit build step. In order for this
  to work, we would need build time dependencies, which requires pyproject.toml.
  We had decided to not introduce it without a need [3], I adapted the code in
  my MR to show how it would look like.

Just for clarification, this build step will be incorporated into the
regular `python3 setup.py build` phase correct ?

Will this not integrate transparently with `tox` ?

Mostly, I want to know that the packaging story of BuildStream is not
harmed by this, and the regular `build` `install` and `test` commands
used for the purpose of building, installing into a DESTDIR, and
testing against host dependencies in an isolated distro build
environment, is not regressed by this change.

I don't expect this to present problems, just putting it out there :)

Cheers,
    -Tristan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]