[BuildStream] Performance: cython



Hey everyone,

TLDR: I want to introduce Cython [0] to the codebase to selectively improve the
performance of BuildStream.
I have some benchmarks showing that the performance improvements are significant.

What I intend to change if my proposal is accepted:

- Rewrite _variables.py and _yaml.py in Cython (I want to keep the scope small
for now).

As an implementation detail, we would also need to:
- Introduce pyproject.toml as per [3]
- Add a now-necessary build phase in tox.

I will explain the reasons for those changes later on.

I have set up a WIP MR [1] that shows what the code would look like. I'm still
refining it, but I think it is close to the MR I would submit if we agree to
go down this path.

While looking at how to improve the performance of BuildStream further, I came
to think that we are gently hitting the limits of Python. I would therefore
like to consider introducing a lower-level language, in this case Cython, which
lets us easily compile Python-like code into efficient C code that interfaces
easily with the Python interpreter.

This change would be for selected parts of the codebase. The intent is NOT to
replace everything.

I'm going through the pros and cons in this email. Please let me know what you
think or if I overlooked something.

The outline is:

- Motivation
- What is Cython and why this and not X (C/Rust/...)
- Disadvantages of Cython compared to pure Python
- Visions for Cython in BuildStream
- Conclusion


Motivation
----------

BuildStream has a few very hot paths in the code, where small improvements can
give big gains for BuildStream users. Cython helps speed up Python code
without changing too much how the code is written.

I had a play rewriting _yaml.py and _variables.py to see what it would give.
I then ran some benchmarks against the master branch in order to see the
difference in performance that Cython would bring us.

These benchmarks were run on a Lenovo T470 laptop with 2 physical/4 virtual
cores, 32 GB of DDR4 RAM and an NVMe SSD. The machine was running Fedora 30,
fully updated as of the 22nd of May, and had no other significant program
running.

Each benchmark was run 7 times on base-files/base-files.bst from jennis' debian
project [4], with the first two runs of each discarded.

------------------------------------------------------------------------------------------
| commit                      | action        | python_version   |   max_memory |   time |
|:----------------------------|:--------------|:-----------------|-------------:|-------:|
| bschubert/cython - 238458d2 | build - 4     | py37             |          176 | 150.98 |
|                             | build - 8     | py37             |          176 | 164.62 |
|                             | show          | py37             |          176 |   5.22 |
|                             | show - cached | py37             |          176 |   5.35 |
| master - 25172ed2           | build - 4     | py37             |          185 | 156.38 |
|                             | build - 8     | py37             |          185 | 171.02 |
|                             | show          | py37             |          162 |   8.7  |
|                             | show - cached | py37             |          185 |   8.18 |
------------------------------------------------------------------------------------------
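The measurement procedure described above (7 runs, first two discarded) could
be sketched roughly as follows. This is only an illustration, not the harness
that produced the table: the `benchmark` helper and its parameters are
hypothetical, and how the kept timings were aggregated is not stated here, so
the sketch simply returns them.

```python
import time


def benchmark(func, runs=7, discard=2):
    """Call func() `runs` times and return the timings of all but the
    first `discard` calls (treated as warm-up runs)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # Drop the warm-up runs; aggregation (mean, median, ...) is left to the
    # caller since it is an assumption either way.
    return timings[discard:]
```

A caller could then, for example, pass the result to `statistics.mean()`.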

I also ran just "show" on 'debian-stack.bst', on the same machine, again
7 times with the first two runs dropped, and got:

-------------------------------------------------------------------------------------
| commit                      | action   | python_version   |   max_memory |   time |
|:----------------------------|:---------|:-----------------|-------------:|-------:|
| bschubert/cython - 238458d2 | show     | py37             |         1438 |  76.36 |
| master - 25172ed2           | show     | py37             |         1554 | 111.25 |
-------------------------------------------------------------------------------------

As we can see, the performance improvement is significant for "show" (the time
drops by roughly 31% to 40%) and smaller for "build" (about 3.5% to 3.7%), with
few changes to the codebase.
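To make the comparison concrete, the relative time reduction can be computed
directly from the "time" columns of the two tables above. The snippet below is
just arithmetic on those published numbers; the `reduction` helper is a
hypothetical name.

```python
def reduction(master, cython):
    """Percentage of time saved by the Cython branch relative to master."""
    return (master - cython) / master * 100


# (master time, bschubert/cython time), copied from the tables above.
results = {
    "build - 4": (156.38, 150.98),
    "build - 8": (171.02, 164.62),
    "show": (8.7, 5.22),
    "show - cached": (8.18, 5.35),
    "show (debian-stack.bst)": (111.25, 76.36),
}

for action, (master, cython) in results.items():
    print(f"{action}: {reduction(master, cython):.1f}% less time")
```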


What is Cython and Why Cython and not X?
----------------------------------------

Cython [0] is a tool to write C extensions for Python.
Its aim is to be able to seamlessly mix Python and efficient C/C++ code in the
same codebase. It supports adding "extension types" and pure C functions for
some parts of the code, while still using Python classes and dynamic dispatch
in others.

It uses a syntax that is very close to Python, adding some additional keywords
to the language. That means that a normal Python file is already valid Cython
code.
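As a small illustration of that last point, here is a plain Python function
that Cython would accept unchanged. It is a made-up example, not code from
_variables.py: the `expand` name and the regular expression are assumptions.
The comments indicate where Cython-specific static typing could later be added.

```python
import re

# Hypothetical pattern matching %{name} style variable references.
_VARIABLE = re.compile(r"%\{([^}]+)\}")


def expand(text, variables):
    """Replace every %{name} in `text` with variables[name]."""
    # In a .pyx file this function compiles as is. Static typing would be a
    # Cython-only addition, for example a typed signature such as
    # `def expand(str text, dict variables)`.
    def lookup(match):
        return variables[match.group(1)]

    return _VARIABLE.sub(lookup, text)
```

For instance, `expand("%{prefix}/bin", {"prefix": "/usr"})` gives `"/usr/bin"`.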

Cython is fairly mature and is used widely. Projects like Sage, SciPy and
pandas have been leveraging it for multiple years.

The reasons for using Cython instead of another language would, in my opinion, be:

- Cython interfaces transparently with Python. If we were to use another
  language, we would need to write the bindings ourselves and maintain them
  for future Python versions.
- Python code is already valid Cython code, which means we can easily move only
  some parts of a Python file to actually use static definitions and keep the
  rest as is.
- Cython integrates well with the Python ecosystem since it is designed for
  this exact purpose.
- Accurate code coverage for Cython can be obtained through integration with
  coverage.py [2]


Disadvantages of Cython compared to pure Python
-----------------------------------------------

There are multiple disadvantages I can see in introducing Cython in the project.

- Cython is Yet Another Language. That means that people wanting to touch the
  Cython files need to learn it.
  - I consider this better than having to learn the Python C API.
  - In addition, if we keep this targeted to specific files, we would minimize
    the number of people having to touch it.
   
- Since Cython compiles to C, we can now get Segfaults and other C niceties.

- There is no linter currently supporting Cython syntax. That means less
  rigorous checks on the Cython files. However, IDEs like VSCode and PyCharm
  support Cython files.
 
- Profiling Cython code is not as easy as profiling Python code. That is
  because, in order to integrate with Python profiling tools, Cython needs to
  wrap every C function call in a Python function call to get profiling times
  per function. This adds overhead to the calling function.
  This doesn't prevent profiling, however, and is just something that needs to
  be kept in mind (a good trick is to run the code twice, once with profiling
  and once without, and compare the timings).
 
- We would be unable to run against the PyPy and Jython interpreters unless we
  also provide a pure Python implementation of the cythonized modules.
  That said, we already can't run on them today, due to grpc.
 
- We would need to have a C compiler in the environment where we install.
  - This is already the case because PyRoaring doesn't publish wheels to PyPI.
  - We can also ship wheels, removing this need.
 
- Finally, and perhaps most annoying, we now need a compilation phase before
  testing. For people who do not modify the Cython files, this only needs to
  be done once, the first time.
  For tox, however, we would now need an explicit build step. For this to
  work, we would need build-time dependencies, which requires pyproject.toml.
  We had decided not to introduce it without a need [3]; I adapted the code in
  my MR to show what it would look like.
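Coming back to the profiling point in the list above, the "run it twice" trick
can be sketched in plain Python with cProfile. This is only a sketch: `work`,
`timed` and `compare` are hypothetical names, and `work` stands in for whatever
code path is being measured. For Cython modules, profiling support additionally
has to be enabled at compile time (for example with the `# cython: profile=True`
directive), which is where the per-call wrapping overhead comes from.

```python
import cProfile
import time


def work():
    """Stand-in workload made of many small function calls."""
    total = 0
    for i in range(200_000):
        total += abs(i)
    return total


def timed(func):
    """Return the wall-clock time taken by a single call to func()."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def compare(func):
    """Time func once without and once with the profiler attached."""
    plain = timed(func)
    profiler = cProfile.Profile()
    profiled = timed(lambda: profiler.runcall(func))
    return plain, profiled
```

The gap between the two numbers gives an idea of how much of a profiled
timing is overhead rather than real work.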


Visions for Cython in BuildStream
---------------------------------

Given that Cython is less easy to use than Python, I would like to restrict its
use to real bottlenecks, mostly on "sunny paths" of the code.

Moreover, since we can now get segfaults and the like, I would like to restrict
this to very well tested areas of the code that don't change often, mostly
files whose API is pretty stable.

Files that only expose a few endpoints to other modules are also a good fit,
since we can move more of the underlying code to close-to-C code.

All in all, I think `_yaml` and `_variables` are the best fits, with
potentially a few selected files later on. Ideally, as few as possible.


Conclusion
----------

I think that adding Cython to the codebase has benefits in terms of performance
that largely compensate for the slight hit developers of the project will need
to take.

I think that Cython code is close enough to Python for Python developers to
understand it and be able to work with it when needed.

Please let me know what you think and feel free to review and post
comments/concerns to the MR [1] too.

Sorry for the long email,

Benjamin


- [0] https://cython.org/
- [1] https://gitlab.com/BuildStream/buildstream/merge_requests/1350
- [2] https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html#enabling-coverage-analysis
- [3] https://gitlab.com/BuildStream/buildstream/merge_requests/639
- [4] https://gitlab.com/jennis/debian-stretch-bst/tree/jennis/use_remote_file



