Improving staging performance



Hello!

In this message I hope to present a convincing case that staging delay is
significant for a reasonable amount of files, and that for rebuilds it can be
drastically reduced.

I ran an experiment locally with an alpine sysroot and 100k files. I think this
is a reasonable amount of files because it's roughly the same number I work
with on another project, which is DPKG-based.

Similar to the results in previous posts on performance, I see that staging
takes around 20 seconds. This is a significant speed bump in the workflow.

I found that with a trivial hack, it's possible to cache the results of
staging. This means that for rebuilds, or builds that use the same
dependencies, we don't pay the cost of linking files into the sandbox. With the
hack applied, the trivial re-build feels near-instant.

I think a lot of work would need to be done to make the approach
production-ready.

My hope is that this message will start a conversation to explore the
possibilities for speeding up staging :)

Cheers,
Angelos

P.S.

In trying to address the same problem in Bazel, some engineers have created
'sandboxfs' for avoiding the burden of linking into the sandbox. Instead it
creates a FUSE fs that maps as desired.

https://blog.bazel.build/2017/08/25/introducing-sandboxfs.html

Separately I'm going to continue the work of measuring BuildStream performance
in general, as discussed on the 'Benchmarking BuildStream commands from user
point of view' thread.

P.P.S.

Here's my full repro steps and local patch, for reference.

Steps to setup the test:
```
wget https://nl.alpinelinux.org/alpine/v3.6/releases/x86_64/alpine-minirootfs-3.6.1-x86_64.tar.gz

mkdir -p sysroot
tar -x -f alpine-minirootfs-3.6.1-x86_64.tar.gz -C sysroot --exclude=./dev
tar -z -c -f alpine-3.6.1-x86_64.tar.gz -C sysroot .
rm -rf sysroot

mkdir manyfiles
cd manyfiles
mkdir -p {1..100}/{1..100}
for name in {1..100}/{1..100}/{1..10}; do echo $name > $name; done
cd ..
tar czf manyfiles.tar.gz manyfiles/
rm -rf manyfiles/

mkdir bst-alpine

cat > bst-alpine/alpine.bst <<EOF
kind: import
description: Import Alpine Linux.
sources:
- kind: tar
  url: file://$(pwd)/alpine-3.6.1-x86_64.tar.gz
EOF

cat > bst-alpine/manyfiles.bst <<EOF
kind: import
description: Import many files.
sources:
- kind: tar
  url: file://$(pwd)/manyfiles.tar.gz
EOF

cat > bst-alpine/test.bst <<EOF
kind: manual
description: Trivially exercise a manual step.
depends:
- filename: manyfiles.bst
  type: build
- filename: alpine.bst
  type: build
config:
  install-commands:
    - touch /buildstream/install/1
EOF

cd bst-alpine
echo 'name: bst-alpine' > project.conf

bst track alpine.bst
bst track manyfiles.bst
```

```diff
diff --git a/buildstream/element.py b/buildstream/element.py
index b76ef68..567e1e8 100644
--- a/buildstream/element.py
+++ b/buildstream/element.py
@@ -1057,6 +1057,21 @@ class Element(Plugin):
             def cleanup_rootdir():
                 utils._force_rmtree(rootdir)

+            cache_key = utils._generate_key({
+                'dependencies': [
+                    e._get_cache_key_from_artifact()
+                    for e in self.dependencies(Scope.BUILD)
+                ],
+            })
+
+            cache_path = os.path.join(context.builddir,
'staging-cache-' + cache_key)
+            using_cache = False
+            if os.path.exists(cache_path):
+                using_cache = True
+            else:
+                os.makedirs(cache_path, exist_ok=True)
+            rootdir = cache_path
+
             with _signals.terminator(cleanup_rootdir), \
                 self.__sandbox(rootdir, output_file, output_file) as
sandbox:  # nopep8

@@ -1070,8 +1085,14 @@ class Element(Plugin):
                 try:
                     # Step 1 - Configure
                     self.configure_sandbox(sandbox)
-                    # Step 2 - Stage
-                    self.stage(sandbox)
+
+                    if not using_cache:
+                        # Step 2 - Stage
+                        self.debug('Using root: {}'.format(rootdir))
+                        self.stage(sandbox)
+                    else:
+                        self.debug('Using cache: {}'.format(cache_path))
+
                     # Step 3 - Assemble
                     collect = self.assemble(sandbox)
                 except BstError as e:
@@ -1132,7 +1153,11 @@ class Element(Plugin):
                     self.__artifacts.commit(self, assembledir)

             # Finally cleanup the build dir
-            cleanup_rootdir()
+            # cleanup_rootdir()
+
+            utils._force_rmtree(os.path.join(rootdir, 'artifact'))
+            utils._force_rmtree(os.path.join(rootdir, 'scratch'))
+            utils._force_rmtree(os.path.join(rootdir, 'root', 'buildstream'))

     # _pull():
     #
```

Build without staging cache patch applied (27 secs):
```
time bst --debug build test.bst
DEBUG: Early enablement of messages
Loading:   003
[pid:1377  id:001][--:--:--][        ][ main:test.bst
    ] DEBUG   Created: manual element at test.bst [line 1 column 0]
[pid:1377  id:002][--:--:--][        ][ main:alpine.bst
    ] DEBUG   Created: import element at alpine.bst [line 1 column 0]
[pid:1377  id:003][--:--:--][        ][ main:alpine.bst-0
    ] DEBUG   Created: tar source at alpine.bst [line 4 column 2]
[pid:1377  id:004][--:--:--][        ][ main:manyfiles.bst
    ] DEBUG   Created: import element at manyfiles.bst [line 1 column
0]
[pid:1377  id:005][--:--:--][        ][ main:manyfiles.bst-0
    ] DEBUG   Created: tar source at manyfiles.bst [line 4 column 2]

Checking:  003/003

BuildStream Version 0.1.dev1451+gde35793.d20171110
  Session Start: Wednesday, 15-11-2017 at 09:55:14
  Project:       alpine (/test/alpine-bst)
  Targets:       test.bst

User Configuration
  Configuration File:      Default Configuration
  Log Files:               /tmp/angelos/.cache/buildstream/logs
  Source Mirrors:          /tmp/angelos/.cache/buildstream/sources
  Build Area:              /tmp/angelos/.cache/buildstream/build
  Artifact Cache:          /tmp/angelos/.cache/buildstream/artifacts
  Maximum Fetch Tasks:     10
  Maximum Build Tasks:     4
  Maximum Push Tasks:      4
  Maximum Network Retries: 2

Pipeline
      cached d92feae0 alpine.bst
      cached f255b0f8 manyfiles.bst
   buildable c8fc78c0 test.bst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[pid:1377  id:000][--:--:--] START   Starting build
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
alpine/test/c8fc78c0-build.1384.log
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
Staging dependencies
[pid:1384  id:001][00:00:22][c8fc78c0][build:test.bst       ] SUCCESS
Staging dependencies
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
Integrating sandbox
[pid:1384  id:001][00:00:00][c8fc78c0][build:test.bst       ] SUCCESS
Integrating sandbox
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
Staging sources
[pid:1384  id:001][00:00:00][c8fc78c0][build:test.bst       ] SUCCESS
Staging sources
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
Running install-commands
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] STATUS
Running install-commands

    touch /buildstream/install/13

[pid:1384  id:001][00:00:00][c8fc78c0][build:test.bst       ] SUCCESS
Running install-commands
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
Running strip-commands
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] STATUS
Running strip-commands

    find "/buildstream/install" -type f \
      '(' -perm -111 -o -name '*.so*' \
          -o -name '*.cmxs' -o -name '*.node' ')' \
      -exec sh -ec \
      'read -n4 hdr <"$1" # check for elf header
       if [ "$hdr" != "$(printf \\x7fELF)" ]; then
           exit 0
       fi
       debugfile="/buildstream/install/usr/lib/debug/$(basename "$1")"
       mkdir -p "$(dirname "$debugfile")"
       objcopy --only-keep-debug --compress-debug-sections "$1" "$debugfile"
       chmod 644 "$debugfile"
       strip --remove-section=.comment --remove-section=.note
--strip-unneeded "$1"
       objcopy --add-gnu-debuglink "$debugfile" "$1"' - {} ';'

[pid:1384  id:001][00:00:00][c8fc78c0][build:test.bst       ] SUCCESS
Running strip-commands
[pid:1384  id:001][--:--:--][c8fc78c0][build:test.bst       ] START
Caching Artifact
[pid:1384  id:001][00:00:00][c8fc78c0][build:test.bst       ] SUCCESS
Caching Artifact
[pid:1384  id:001][00:00:26][c8fc78c0][build:test.bst       ] SUCCESS
alpine/test/c8fc78c0-build.1384.log
[pid:1377  id:000][00:00:26] SUCCESS Build Complete

Pipeline Summary
  Total:       3
  Session:     1
  Fetch Queue: processed 0, skipped 1, failed 0
  Build Queue: processed 1, skipped 0, failed 0
DEBUG: Destroyed: manual element at test.bst [line 1 column 0]
DEBUG: Destroyed: import element at alpine.bst [line 1 column 0]
DEBUG: Destroyed: import element at manyfiles.bst [line 1 column 0]
DEBUG: Destroyed: tar source at alpine.bst [line 4 column 2]
DEBUG: Destroyed: tar source at manyfiles.bst [line 4 column 2]

real    0m27.307s
user    0m13.430s
sys     0m12.360s
```

Priming staging cache with initial build of test.bst (25 secs):
```
time bst --debug build test.bst
DEBUG: Early enablement of messages
Loading:   003
[pid:1396  id:001][--:--:--][        ][ main:test.bst
    ] DEBUG   Created: manual element at test.bst [line 1 column 0]
[pid:1396  id:002][--:--:--][        ][ main:alpine.bst
    ] DEBUG   Created: import element at alpine.bst [line 1 column 0]
[pid:1396  id:003][--:--:--][        ][ main:alpine.bst-0
    ] DEBUG   Created: tar source at alpine.bst [line 4 column 2]
[pid:1396  id:004][--:--:--][        ][ main:manyfiles.bst
    ] DEBUG   Created: import element at manyfiles.bst [line 1 column
0]
[pid:1396  id:005][--:--:--][        ][ main:manyfiles.bst-0
    ] DEBUG   Created: tar source at manyfiles.bst [line 4 column 2]

Checking:  003/003

BuildStream Version 0.1.dev1451+gde35793.d20171110
  Session Start: Wednesday, 15-11-2017 at 10:09:06
  Project:       alpine (/test/alpine-bst)
  Targets:       test.bst

User Configuration
  Configuration File:      Default Configuration
  Log Files:               /tmp/angelos/.cache/buildstream/logs
  Source Mirrors:          /tmp/angelos/.cache/buildstream/sources
  Build Area:              /tmp/angelos/.cache/buildstream/build
  Artifact Cache:          /tmp/angelos/.cache/buildstream/artifacts
  Maximum Fetch Tasks:     10
  Maximum Build Tasks:     4
  Maximum Push Tasks:      4
  Maximum Network Retries: 2

Pipeline
      cached d92feae0 alpine.bst
      cached f255b0f8 manyfiles.bst
   buildable 0eb54564 test.bst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[pid:1396  id:000][--:--:--] START   Starting build
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
alpine/test/0eb54564-build.1403.log
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] DEBUG
Using root: 
/tmp/angelos/.cache/buildstream/build/staging-cache-d13087cbaf56233b5bbdd49d070455c917042aa9c965d55ccdbedbd20ef4d2aa
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
Staging dependencies
[pid:1403  id:001][00:00:22][0eb54564][build:test.bst       ] SUCCESS
Staging dependencies
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
Integrating sandbox
[pid:1403  id:001][00:00:00][0eb54564][build:test.bst       ] SUCCESS
Integrating sandbox
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
Staging sources
[pid:1403  id:001][00:00:00][0eb54564][build:test.bst       ] SUCCESS
Staging sources
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
Running install-commands
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] STATUS
Running install-commands

    touch /buildstream/install/14

[pid:1403  id:001][00:00:00][0eb54564][build:test.bst       ] SUCCESS
Running install-commands
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
Running strip-commands
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] STATUS
Running strip-commands

    find "/buildstream/install" -type f \
      '(' -perm -111 -o -name '*.so*' \
          -o -name '*.cmxs' -o -name '*.node' ')' \
      -exec sh -ec \
      'read -n4 hdr <"$1" # check for elf header
       if [ "$hdr" != "$(printf \\x7fELF)" ]; then
           exit 0
       fi
       debugfile="/buildstream/install/usr/lib/debug/$(basename "$1")"
       mkdir -p "$(dirname "$debugfile")"
       objcopy --only-keep-debug --compress-debug-sections "$1" "$debugfile"
       chmod 644 "$debugfile"
       strip --remove-section=.comment --remove-section=.note
--strip-unneeded "$1"
       objcopy --add-gnu-debuglink "$debugfile" "$1"' - {} ';'

[pid:1403  id:001][00:00:00][0eb54564][build:test.bst       ] SUCCESS
Running strip-commands
[pid:1403  id:001][--:--:--][0eb54564][build:test.bst       ] START
Caching Artifact
[pid:1403  id:001][00:00:00][0eb54564][build:test.bst       ] SUCCESS
Caching Artifact
[pid:1403  id:001][00:00:23][0eb54564][build:test.bst       ] SUCCESS
alpine/test/0eb54564-build.1403.log
[pid:1396  id:000][00:00:23] SUCCESS Build Complete

Pipeline Summary
  Total:       3
  Session:     1
  Fetch Queue: processed 0, skipped 1, failed 0
  Build Queue: processed 1, skipped 0, failed 0
DEBUG: Destroyed: manual element at test.bst [line 1 column 0]
DEBUG: Destroyed: import element at alpine.bst [line 1 column 0]
DEBUG: Destroyed: import element at manyfiles.bst [line 1 column 0]
DEBUG: Destroyed: tar source at alpine.bst [line 4 column 2]
DEBUG: Destroyed: tar source at manyfiles.bst [line 4 column 2]

real    0m24.859s
user    0m13.130s
sys     0m10.440s
```

Change test.bst to touch file '2' instead.
Now it uses the staging cache (1.5 secs):
```
time bst --debug build test.bst
DEBUG: Early enablement of messages
Loading:   003
[pid:1413  id:001][--:--:--][        ][ main:test.bst
    ] DEBUG   Created: manual element at test.bst [line 1 column 0]
[pid:1413  id:002][--:--:--][        ][ main:alpine.bst
    ] DEBUG   Created: import element at alpine.bst [line 1 column 0]
[pid:1413  id:003][--:--:--][        ][ main:alpine.bst-0
    ] DEBUG   Created: tar source at alpine.bst [line 4 column 2]
[pid:1413  id:004][--:--:--][        ][ main:manyfiles.bst
    ] DEBUG   Created: import element at manyfiles.bst [line 1 column
0]
[pid:1413  id:005][--:--:--][        ][ main:manyfiles.bst-0
    ] DEBUG   Created: tar source at manyfiles.bst [line 4 column 2]

Checking:  003/003

BuildStream Version 0.1.dev1451+gde35793.d20171110
  Session Start: Wednesday, 15-11-2017 at 10:10:59
  Project:       alpine (/test/alpine-bst)
  Targets:       test.bst

User Configuration
  Configuration File:      Default Configuration
  Log Files:               /tmp/angelos/.cache/buildstream/logs
  Source Mirrors:          /tmp/angelos/.cache/buildstream/sources
  Build Area:              /tmp/angelos/.cache/buildstream/build
  Artifact Cache:          /tmp/angelos/.cache/buildstream/artifacts
  Maximum Fetch Tasks:     10
  Maximum Build Tasks:     4
  Maximum Push Tasks:      4
  Maximum Network Retries: 2

Pipeline
      cached d92feae0 alpine.bst
      cached f255b0f8 manyfiles.bst
   buildable 34f5b215 test.bst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[pid:1413  id:000][--:--:--] START   Starting build
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] START
alpine/test/34f5b215-build.1420.log
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] DEBUG
Using cache: 
/tmp/angelos/.cache/buildstream/build/staging-cache-d13087cbaf56233b5bbdd49d070455c917042aa9c965d55ccdbedbd20ef4d2aa
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] START
Running install-commands
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] STATUS
Running install-commands

    touch /buildstream/install/15

[pid:1420  id:001][00:00:00][34f5b215][build:test.bst       ] SUCCESS
Running install-commands
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] START
Running strip-commands
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] STATUS
Running strip-commands

    find "/buildstream/install" -type f \
      '(' -perm -111 -o -name '*.so*' \
          -o -name '*.cmxs' -o -name '*.node' ')' \
      -exec sh -ec \
      'read -n4 hdr <"$1" # check for elf header
       if [ "$hdr" != "$(printf \\x7fELF)" ]; then
           exit 0
       fi
       debugfile="/buildstream/install/usr/lib/debug/$(basename "$1")"
       mkdir -p "$(dirname "$debugfile")"
       objcopy --only-keep-debug --compress-debug-sections "$1" "$debugfile"
       chmod 644 "$debugfile"
       strip --remove-section=.comment --remove-section=.note
--strip-unneeded "$1"
       objcopy --add-gnu-debuglink "$debugfile" "$1"' - {} ';'

[pid:1420  id:001][00:00:00][34f5b215][build:test.bst       ] SUCCESS
Running strip-commands
[pid:1420  id:001][--:--:--][34f5b215][build:test.bst       ] START
Caching Artifact
[pid:1420  id:001][00:00:00][34f5b215][build:test.bst       ] SUCCESS
Caching Artifact
[pid:1420  id:001][00:00:00][34f5b215][build:test.bst       ] SUCCESS
alpine/test/34f5b215-build.1420.log
[pid:1413  id:000][00:00:00] SUCCESS Build Complete

Pipeline Summary
  Total:       3
  Session:     1
  Fetch Queue: processed 0, skipped 1, failed 0
  Build Queue: processed 1, skipped 0, failed 0
DEBUG: Destroyed: manual element at test.bst [line 1 column 0]
DEBUG: Destroyed: import element at alpine.bst [line 1 column 0]
DEBUG: Destroyed: import element at manyfiles.bst [line 1 column 0]
DEBUG: Destroyed: tar source at alpine.bst [line 4 column 2]
DEBUG: Destroyed: tar source at manyfiles.bst [line 4 column 2]

real    0m1.451s
user    0m0.640s
sys     0m0.100s
```


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]