Re: jack-driver updated

From: Stefan Westerfeld <stefan space twc de>
To: Tim Janik <timj gnu org>
Cc: beast gnome org
Subject: Re: jack-driver updated
Date: Fri, 15 Sep 2017 21:22:22 +0200

   Hi!

On Tue, Aug 22, 2017 at 11:44:01AM +0200, Tim Janik wrote:

On 16.07.2017 15:42, Stefan Westerfeld wrote:

 IMHO none of these issues is so critical that it is a blocker for
merging. I.e. even if we ship jack-driver as-is, we're still better off than
without it.


Wrong. Sorry, there's no easier way to put it. The current shape of the jack
driver comes close to an end user assault and the world needs to be saved from it!

With that off my chest, let me try to be constructive ;-)

The Good:
We have a jack driver now, that means in theory the pitiful fraction of mistaken
linux users that think jackd would actually benefit them, instead of eating
their latency and stabbing their stability are now able to run Beast along side
their bug ridden jack related software stack.

The Bad:
The code quality of the BSE jack driver leaves lots of things to wish for, at
the very least, please run 'make check' on the Beast tree when you hack on code.
The check routines have been highly tuned not to waste too much developer
patience. I know your laptop is faster than mine, and it takes merely 25 seconds
here to run check + installcheck, you can afford that investment before
requesting a merge.


Ok. I did that and fixed make check for the jack driver now.

While we're at it, the deprecations during compilation also need fixing:
      jack_port_get_total_latency(jack_client_t*, jack_port_t*)' is deprecated


Fixed. Using jack_port_get_latency_range() now, which is not deprecated. While
I was at it, also added a latency callback, so BEAST sets the correct the
latency (with ringbuffer) on its input/output ports.

There're like a dozen FIXMEs in the code, kudos for adding the comments that
point at areas that need improvements, but hey, we really have to have those
fixed before we can merge and ship.
A lot of places need smaller updates to comply with the coding standards
elsewhere in BSE, I saw you made a start of that already, but what's still
missing is, of the top of my head:
      * assert_return() instead of g_return*_if_fail


Nothing to do here (fixed already), this is just because you reviewed an old
version of the driver.

      * Bse::printerr instead of g_print*


Fixed.

      * member_ instead of m_member variables


Right, I fixed this now.

Also the libjack dependency must be optional. The peaceful living individuals
that have no clue about jackd or libjack-jackd2-dev should not be confronted
with a hard libjack dependency that breaks Beast builds without providing any
benefit for them.


Done. You can compile BEAST now with --without-jack to disable the jack driver
explicitely. If you specify nothing, the jack driver will be built if
pkg-config finds libjack*.

I added a status message at the end of configure, so the user can see which
optional dependencies will be built (SpectMorph will be added to that list).

For the record, the *VAST* majority of linux users these days can be expected to
have pulseaudio installed, they wouldn't even be able to run the latest Firefox
if they didn't. The same can not be said of jackd, so in a purely utilitarian
world, it'd make a lot more sense to spend effort on improving pulseaudio
support than jackd support.


Among all linux users, you may be right. But we target a special group of users,
musicians, and in the linux world these are more likely to already use jackd.

The Ugly:
Beast crashes when Jackd exits (or crashes).
Even if Beast is *not* playing audio, the moment jackd exits, all user data held
by any currently running Beast instance is destroyed in flames. This is why we
definitely are better off *without* it, stability is paramount and this needs to
be fixed before end users can try the jack driver.


I don't see this issue here. All my tests (also killing jackd) have shown that
the driver doesn't crash. Maybe it is because you reviewed an old version. Can
you reproduce with the latest jack-driver branch version?

The Details:

- Why is SIGPIPE blocked? *If* blocking SIGFPE for our engine threads from jack
driver initialization works at all, then by pure luck. If needed, this'd have to
be moved to engine thread creation.


SIGPIPE is triggered if we write() data to a process we communicate with, and
that process dies. This is what can happen if we are connected to jackd, jackd
dies, and we use jack_*() functions after that.

I've done more tests, and was not able to reproduce the problem when I remove
the SIGPIPE code from our jack driver. Not blocking SIGPIPE doesn't lead to
crashes (like it did when I originally wrote the jack driver).

Nowadays, Gtk unconditionally ignores SIGPIPE signals. See gtk/gtkmain.c, which
contains

  signal (SIGPIPE, SIG_IGN);

This alone is enough to make beast perfectly stable, even if jackd dies, as we
initialize Gtk before that. So if you do

$ kill -13 <beast_pid>    # 13 = SIGPIPE

with a vanilla BEAST, nothing happens. So the JACK driver should be stable whether
or not we add more code.

I wasn't so happy that we depend on another library doing this for us (for
instance, what happens if we write a commandline program that interacts with
the JACK driver?). Also note that the manual page for signal() says that

... The effects of signal() in a multithreaded process are unspecified.

So I removed the SIGPIPE code from the JACK driver and added code to libbse in
bsemain.cc which blocks SIGPIPE explicitely. It should be early enough so that
it happens before other threads are initialized, so that all of our threads
inherit the signal mask.

As a final remark, from looking at the source code, it seems that the jackd2
appears to have some SIGPIPE handling code, whereas jackd1 doesn't. So this
may be a non-issue as long as the user runs jackd2.

- jack_client_open ("beast" <- no translation needed


Right. Not mentioned any more.

- "FIXME: get rid of device unloading, so that the jackd connection can be kept
initialized" <- send patches


Device unloading is already gone, I haven't seen that BEAST would unload the
driver (you reviewed old code which still has this FIXME). However it seems
that the right way to do it is the one from our last phonecall, that is: BEAST
itself should keep the driver open between explicit

 - start audio engine
 - stop audio engine

so this is not really a JACK driver issue, but should be fixed indepently of
the driver the user uses. So I removed the FIXME from the driver code.

- "FIXME: should we try to tweak local buffering so that it corresponds to the
user defined latency" <- Yes, I found myself adjusting the Beast latency
settings with jackd and it had no effect. Also, if latency is / stays huge with
jackd, that totally defeats running it in the first place.


I don't think we can do much better latency wise. We have these ring buffers,
and these need to contain a minimum amount of data to be able to always satisfy
the JACK callback.

The only alternative way to implement things would be without ring buffers. We
would directly run the engine within the JACK callback itself. So whenever
JACK thinks we should produce (for instance) 256 samples, we would compute
these samples. This would allow us to do zero latency. However, we would
explicitely violate the rules JACK has for code that runs within the callback,
which the JACK developer docs state as

...The code in the supplied function must be suitable for real-time execution.
That means that it cannot call functions that might block for a long time. This
includes all I/O functions (disk, TTY, network), malloc, free, printf,
pthread_mutex_lock, sleep, wait, poll, select, pthread_join, pthread_cond_wait,
etc, etc.

And of course since our driver API is just not designed this way, we would need
deeper changes in BEAST to get callback driven audio to work.



Anyway, lets go back to what we currently have: the user will specify the
amount of buffering within BEAST. This cannot be arbitarily small, because
the ringbuffers within BEAST must be big enough to ensure that the JACK
callback will have enough samples available when it occurs.

The jack latency will be added to the size of the ringbuffer yielding the
total value for the buffer. Now what is the minimum obtainable latency? This
depends on the jackd period size. I did an experiment with a deliberately small
period size:

$ jackd -d alsa -p 128

and a small latency setting in BEAST of 5ms. Then my jack driver reports

rlatency=2.667 ms wlatency=5.333 ms ringbuffer=7.000 ms total_latency=15.000 ms

So the effective output latency is the size of the ringbuffer + the jack write
latency, that is 12.33 ms. Note however that although I could play the demo
song with these settings, I observed occasional xruns. So this is about as
little latency as possible with my system.

- "FIXME no guarantee that beast will read the data from jack since this depends
on the presence of a module that reads data the synthesis graph" <- There is no
way a synthesis graph is executed without output modules, output modules are the
only thing driving the engine


Yes, this FIXME is about input. BEAST will in some situations *read* data from
the JACK driver, in other situations it won't. This even can change while the
driver is open, like for instance if you remove the input module from the full
duplex example.

However, this FIXME is fixed already: the code in jack_device_write() will now
trigger reading if necessary. Here is the comment from the code

  /* our buffer management is based on the assumption that jack_device_read()
   * will always be performed before jack_device_write() - BEAST doesn't
   * always guarantee this (for instance when removing the pcm input module
   * from the snet while audio is playing), so we read and discard input
   * if BEAST didn't call jack_device_read() already
   */

- "/* get rid of this *slow* interleaving code */" <- seems this should be a FIXME ?


No. Basically we could in some cases use faster (de)interleaving code. We could
even when redesigning the driver API avoid interleaving and deinterleaving because
the input/output module in the engine already has seperate buffers for each channel
which is just the memory layout the JACK callback needs/produces.

However, IMHO the total cost of the interleaving/deinterleaving compared to the
total cost of synthesis itself make this an optimization that we shouldn't do.

- "FIXME: we need a way to indicate an error here" <- at the very least, *print*
a message, so developers can trace the error conditions mentioned in the code


Ok, I added a Bse::printerr for the case that jackd dies. Ideally this would
stop the playback, too and popup a dialog, but I don't know how to do that
properly.

- If jackd stops (or crashes), Beast crashes (even if not playing), loosing all
user data:
  beast-0.11.1: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion
`mutex->__data.__owner == 0' failed.


By the way, this doesn't look like a SIGPIPE issue. SIGPIPE usually causes a
silent and clean exit from the application (no messages). If you still see this
with the latest version, a backtrace would be nice.

- Beast should reconnect to jackd when it starts playing, and completely
disconnect from jack on stop, so the following sequence works:
      Beast plays through pulseaudio
      Beast stops
      pasuspend -- jackd
      Beast plays through jackd
      Beast stops
      jackd exits
      Beast plays through pulseaudio again


Yes, this works. At least I can do it without any problems with the current
version of the driver.

- If jackd is not running and Beast plays through PA, the jack driver still
spams stderr with error messages, please silence the driver if its not working.


Done. All errors produced by jack_client_open() are now suppressed. And of
course if errors do occur during JACK driver operation, they will still be
printed.

In summary, the driver needs a lot more love from you and fixes before it can be
considered for inclusion. The FIXMEs need to be straightened out one way or the
other - it's ok if other parts of BSE need touching for that (e.g. to mask
signals in the engine threads or to have a BSE callback to stop playback
altogether).

1.) signal handling

SIGPIPE must be blocked in our thread, because otherwise if the jack server
dies, BEAST will die receiving SIGPIPE - see FIXME - this should not be done
in our driver code, but BEAST should setup SIGPIPE handling globally


I can't reproduce this atm, Beast simply crashes when Jackd dies.


See remarks about SIGPIPE above.

2.) keeping the connection open

Currently, we reconnect to the JACK server every time the user presses play.


No we don't. That's what we *should* do.


Yes, we do. We reconnect on every play.

3.) jack device latency

The driver API is suboptimal here (see FIXME).


I'm happy to review patches here.


I've now implemented that and removed the FIXME. Basically I changed

static guint            jack_device_latency             (BsePcmHandle           *handle);

to

static void             jack_device_latency             (BsePcmHandle           *handle,
                                                         guint                  *rlatency,
                                                         guint                  *wlatency);

in the alsa, jack and null drivers, and the BsePcmHandle code. Turns out that
the API is not used by anyone at the moment, but in any case, this is what the
API should look like.

4.) handling jack server down

It can happen that during play, the JACK server dies (user can manually stop
server, server can crash). When that happens, currently the play position
pointer will no longer advance.


As stated above, we now at least Bse::printerr() a message if the JACK server
dies.

No, Beast dies and crashes in flames.


Since you tested an old version of the code, I suggest that you try the latest version
of the jack-driver branch here. I can definitely kill the JACK server here, and BEAST
keeps on running.

Also useful information would be the version of JACK you're using, the JACK API you
and version built BEAST against (libjack-dev vs. libjack-jackd2-dev), any stdout/stderr
stuff, and possibly a backtrace.

5.) auto connect should be configurable

Auto connecting to the hardware device (which is what the driver currently
does) is often but not always the desired thing to do. Other JACK applications
allow disabling auto connect, so that external programs (such as qjackctl) can
manage connections for the client.

So we should have a way of somehow disabling auto connect. Ideally at the UI.


Can you add a BSE setting for this, or is anything else needed?


OK, I added the jack_autoconnect setting and use this in the JACK driver now.

6.) jack server name

We currently don't support connecting to a specific JACK server, if more than
one is running. See FIXME. Ideally this could be added at the UI.


What's the use case for multiple jackd here? What do other jackd clients do?


Ok, after a little research, I now removed the FIXME without changing the code.

Running multiple JACK servers (with different names) is considered an expert only
feature. Neither Ardour nor Qtractor have any code to support this. Users that
really know what they do can set the environment variable JACK_DEFAULT_SERVER
before starting an application like Ardour, to use a non-default jackd.

This environment variable works with BEAST, too (I tested it), and from all I
know now this should be good enough.

7.) xrun reporting

We currently report xruns (for our internal ringbuffer) on stdout. Not sure if
that is the right thing to do, maybe we should just ignore xruns alltogether or
report them using PDEBUG() - so that developers can see them, but not users.


Nope, use printerr for now. Users *hear* xruns, and prining a message gives them
a clue about the cause.


Ok, using Bse::printerr now.

   Cu... Stefan
-- 
Stefan Westerfeld, http://space.twc.de/~stefan

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]