Re: Pulseaudio

From: Lennart Poettering <mztabzr 0pointer de>
To: matteo member fsf org
Cc: desktop-devel-list gnome org
Subject: Re: Pulseaudio
Date: Fri, 12 Oct 2007 21:20:36 +0200

On Mon, 08.10.07 20:02, Matteo Settenvini (matteo-ml member fsf org) wrote:

Hi!

Ok, a little bit late because I was travelling, but here's my reply to
the whole PA thread on desktop-devel (as the
maintainer of PA).

This is a long reply, so you might want to grab yourself a cup
of coffee (or mango lassi?) before you start reading through it.

I'd like to thank davidz, matteo, haddess, jan for jumping into the
discussion in PA's defense and replying more quickly than I did. Thanks,
dudes!

> It has been a while since esound has received some attention - releases
> are almost stalled. Looking at the GNOME wiki, it seems that Pulseaudio
> is the stronger candidate between alternatives, and that it allows for
> quite a lot of nifty things.
>
> I'm running pulseaudio since four or five months now on two of my
> desktop systems, both x86 and PPC, and I must say that I'm really
> satisfied by it.
> It's quite stable and has very few compelling bugs for the normal user
> (e.g. when using it as an esound replacement on a machine with more than
> a logged in user it doesn't share the esd socket, or similar).
>
> It also seems to be actively developed, and is shipped by default with
> Fedora 8.
>
> Can it be eligible for inclusion in GNOME 2.22?

Coincidentally we discussed just this during the GNOME Summit on
Sunday. Here are my 10¢ on this and all the issues raised in the whole
thread following Matteo's proposal.

I am not sure that PA should become "part" of GNOME. A blessed
dependency sure, but really a new module of GNOME? Probably not.

Fedora now ships PA by default, and SuSE is moving to PA as
well. (Of the big distros only that spaceboy distro doesn't love us
anymore as it seems, as I haven't heard from them in a while) There
are still a couple of rough edges, like that we ship two volume
controls: one being the native PA volume control, which can do all
kinds of nifty things like per-stream volumes and moving streams
between devices. And gnome-volume-control which is much sexier UI-wise
(i18n, ...), but exposes a lot of cruft we'd prefer to get rid of
(i.e. all kinds of stupid alsa mixer tracks and checkboxes nobody
really understands, shows every device thrice, ...) Resolving this
duplication probably needs a little bit tighter integration of PA and
GNOME: either the volume control tool in GNOME would need to link
directly against PA -- or we'd have to wrap all the special PA
features in GST's mixer interfaces -- which I think doesn't make that
much sense. Too many abstraction layers are bad, especially if there'd
only be a single backend driver which would implement most of it.

A couple of direct replies to what people brought up in their emails:

Martin Meyer suggested that PA was "heavy-weight". This is quite
frankly bullshit. It depends how you compile PA. Sure, PA is a little
bit bigger than ESD, but not that much. It of course becomes bigger,
if you compile *all* the modules we ship. But you don't have to do
this -- if you just want the core, then just compile the core and PA
is tiny. A lot of embedded people now start to adopt PA -- people
with a lot stronger constraints than we generally have on GNOME as a
desktop for a PC. So, the "bloat", "heavy-weight" issue is
nonsense. You can compile the SVN PA fine with just two external
dependencies (ALSA and liboil -- both libraries are nowadays installed
on all distros anyway -- so they don't really count) and it works
fine. Everything else is optional, and can be split off in separate
packages. And even without those extra modules PA is still very
useful.

Regarding GST vs. PulseAudio: there is just no "vs."! GStreamer does
muxing/demuxing/decoding/encoding of media streams. PA is a low-level
PCM-only sound server. They're two different things. You could
compare this to X11 and GTK: X11 just does a bit of windowing and
drawing for you; GTK does all those UI things on top. PA does just a
bit of buffering, mixing, filtering for you; GST does all those nice
decoding/encoding/muxing/demuxing things on top.

Regarding the PA vs. dmix issue Sven Neumann brought up: yes, if you
only care about the simplest form of mixing, then dmix is sufficient
for you. However, if we want to provide anything that remotely comes
near to what Vista or MacOS X provides -- then we need some kind of
sound server, just like they are shipping one. (MS likes to call the
sound server a "userspace sound system", though, but that's just the
terminology. The important fact is that they have a real-time process
which serializes access to the PCM devices). So what does PA offer you
beyond dmix right now? From a user perspective this is: moving streams
on-the-fly between devices; distributing audio on multiple audio
devices at the same time; per-stream volumes; fast-user-switching
support; automatic saving/restoring of per-application devices,
volumes; sensible hotplug support; "rescuing" streams to another
audio device, if you accidentally pull your USB cable; network support;
... the list goes on and on and on. Also, ALSA is Linux specific
(though personally I think this doesn't really matter)

Gustavo brought up the issue that PA "hogs" the sound device. Sure we
do. The idea is having everything go through PA, so that we can treat
everything the same. However, since there are some APIs that are
notoriously hard to virtualize (e.g. OSS with mmap) and some areas
where you don't want the extra context-switching PA adds (pro audio,
for now), there's now a tool called "pasuspender": pass it a command
line and it will suspend PA's sound card access, execute the command,
and resume access again afterwards. So, prefix your
"quake2" invocation with "pasuspender" and everything should be
fine. Also, we now close all audio devices after 1s of idle time by
default. We do this mostly to save power. However this also has the
side effect of releasing the audio device quickly for other apps. The
drawback of course is that many sound cards generate pops and clicks
every time you open/close the device (some Intel HDA for example), but
that can probably be worked around in the drivers (according to
Takashi) and I guess you cannot have everything at the same time, so
power saving is more important for now. In practice you probably
shouldn't notice PA's presence at all -- unless you try to play an ALSA
stream to hw:0 and a PA stream at the same time. And last but not
least, we have been shipping a PA plugin for libasound for a while
now. It's enabled by default in F8 and redirects all ALSA audio to
PA -- unless some borked app hard codes "hw:0" as device name.

Regarding Flash and PA: As Bastien pointed out, in F8 we ship a plugin
for the flash player which makes it compatible with PA. With that
plugin Flash and PA are perfectly compatible.

Gustavo repeatedly brought up the compatibility with current
(closed-source) stuff: PA is also "the compatible sound server". We
provide compatibility with OSS, ALSA, ESD, GST, LIBAO, Xine, MPlayer,
... (in various degrees, but mostly pretty high-quality). Right now
Quake2 is the only relevant app I know that doesn't really work on top
of PA, but for those cases we have pasuspender. Basically, I think
this is a non-issue these days. And for almost all of the remaining
apps we have compat problems with, we can fix our compat layers for
them. Most of the time the applications are misusing the APIs, but
we're happy to try to add the necessary stuff to our compat layers to
get them working.

Regarding hardware mixing support: this is bullshit. You know, a while
back all sound cards had wavetable stuff built in hw. And then this
became obsolete - because it could be done with less effort and
without problems in software, with faster CPUs. Then, there were MPEG
decoder cards which soonishly became obsolete -- because it could be
done with less effort in software, with faster CPUs. And then, some
vendors added hw mixing to their cards. But that was 6 years ago -- if
you look at current sound card designs (HDA) you'll notice that they
only support a single stream. They are high-quality but very
feature-limited DACs. HW mixing is dead technology, it's out of
fashion, made redundant by stuff that nowadays is available in the
CPU: MMX, SSE. Using hw mixing imposes a greater burden on your USB,
PCI busses, might generate more IRQs. The place to do mixing is
nowadays the CPU -- it's one of the reasons MMX, SSE were added to
the CPU in the first place. Accelerating mixing in hw is really not
what you want to do these days. But, if you really insist that you
want to use this obsolete technology in your sound system then you're
welcome to send me a patch or add a module to PA. But honestly, the
next one who comes up with the hw mixing issue should please do his
homework and read up what happened in sound card design in the last 10
years, thank you very much. Asking for hw mixing in PA is like asking
for support for MPEG decoder cards in GST.
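
To illustrate how little "mixing on the CPU" takes, here is a minimal
sketch in C. It is not PA's actual mixer, and the function names are
made up for this example. It adds two signed 16-bit PCM streams and
saturates instead of wrapping around. A real mixer would also apply
per-stream volumes and would likely use MMX/SSE for the inner loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate sum to the signed 16-bit sample range. */
static int16_t clamp_s16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Mix n samples of streams a and b into out (hypothetical helper).
 * The sum is computed in 32 bits so that loud passages clip cleanly
 * rather than overflowing. */
void mix_s16(const int16_t *a, const int16_t *b, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = clamp_s16((int32_t) a[i] + (int32_t) b[i]);
}
```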

Also, never forget: PA does much more than just mixing audio. That's
just the tiniest part of it.

Gustavo then played the latency card: yes, PA increases the latency
over direct hw access. But so does dmix, because it enforces fixed
fragment settings for all apps. What you really want to do (which
however right now is only partially implemented in PA) is allowing
per-stream fragment settings, by scheduling audio based on timer
interrupts instead of sound io interrupts (based on fixed fragment
settings). Those timer interrupts can be dynamically changed so we can
change the wakeup points dynamically during playback without too much
effort. However this needs some kind of kernel support (hrtimers,
HPET), which only has become available very recently and on x86 only
(not even amd64 yet), so until we get this fully implemented a few
months will pass. If we have that however, we basically get the same
PCM pipeline that Vista and MacOS have: a huge mixing buffer managed
by a real-time userspace sound server which allows rewriting at any
time and notifying clients dynamically, scheduled via timer
interrupts. In essence, in the long run we really *need* something
like PA, if we want to provide low latencies (i.e. short fragments ==
frequent interrupts) and low power consumption (i.e. few interrupts ==
huge fragments) at the same time and switch between them
dynamically. Yes, right now, PA increases your achievable latencies a
bit (but just a bit), but in the end we *need* a process that does the
audio scheduling based on timers -- something that PA will then do. Of
course, PA doesn't fully implement this yet, which is partially PA's
fault and partially the fault of the kernel, which sucks when it comes
to timers right now. We're getting there.
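
To put rough numbers on the fragment trade-off described above, here
is a small sketch in C. It is an illustration only, not PA code, and
the function names are invented for it. It assumes the device wakes
the CPU once per fragment, so the fragment duration is roughly the
lowest latency you can get.

```c
#include <stdint.h>

/* Bytes of PCM data one stream consumes per second. */
static uint64_t bytes_per_second(unsigned rate, unsigned channels,
                                 unsigned bytes_per_sample) {
    return (uint64_t) rate * channels * bytes_per_sample;
}

/* Duration of one fragment in microseconds (rate must be non-zero). */
uint64_t fragment_usec(uint64_t fragment_bytes, unsigned rate,
                       unsigned channels, unsigned bytes_per_sample) {
    return fragment_bytes * 1000000u /
           bytes_per_second(rate, channels, bytes_per_sample);
}

/* Wakeups (interrupts) per second caused by that fragment size
 * (fragment_bytes must be non-zero). */
double wakeups_per_second(uint64_t fragment_bytes, unsigned rate,
                          unsigned channels, unsigned bytes_per_sample) {
    return (double) bytes_per_second(rate, channels, bytes_per_sample) /
           (double) fragment_bytes;
}
```

For 44.1 kHz, stereo, 16-bit audio (176400 bytes/s) a 1764-byte
fragment lasts 10 ms and costs 100 wakeups per second, while a
176400-byte fragment lasts a full second and costs a single wakeup per
second. With fixed fragments you have to pick one of the two up front.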

Then, Gustavo played the stability card: Yepp, sure, PA is relatively
new code. But I mean, esd is more than ten years old these days. And
you'd call it stable? Come on! PA is stable enough for inclusion in
F8, and it is actively maintained. And that should be all that
counts. Oh, and sound is not really life-depending, is it? If you
lose audio on your desktop all you lose is a bit of background music,
it's not that PA eats all your files for breakfast. The "stability"
argument is just a trick to disallow innovation.

Gustavo, PA in F8 is very much different from PA 0.9.6. As suggested
by Matthias, please try it in F8. You know, Gustavo, that RH did a lot
of work on PA before we included it in F8, to make it seamless and as
bug-free as possible? Sure there might be an issue left here and
there. But that's the case with every piece of software.

So, to the next big technical issue Gustavo found in PA: he thinks its
developers are stubborn. Thank you very much, Gustavo, I love you
too. Maybe it is you that is stubborn here, with spreading all this
FUD?

(Just as a side note: do you know that Takashi, the upstream ALSA
maintainer also maintains PA in Suse? Maybe you're more Catholic than
the Pope in your insistence on ALSA dmix?)

Regarding CPU load: the version of PA that ships in F8 uses exactly
0.00% CPU when idle -- unless some stupid app polls for the volume all
the time, which might raise it a bit -- but that should be fixed in
the app.

Frederic still loves ESD. ESD is bad, in latency, in features, in
code, in everything. I am not sure if you, Frederic, noticed that ESD
only supports 2ch, 16bit, 44khz audio. Have you noticed all those 5.1
sound systems popping all around you? Have you noticed that everyone
hates esd? And that the most well known trick to get your audio
working on your Linux desktop is called "killall esd"? No one wants to
maintain ESD -- do you? There are just so many reasons why ESD should
be obsoleted... Dude, for the next one who seriously suggests ESD as
our path to the future in desktop audio I will personally buy a ticket
for a time machine, so he can fast-forward for 10 years or so and join
the rest of us in 2007!

Regarding cross-desktop support: I personally don't care too much
about KDE, but apparently you can set it up just fine as described
here: http://pulseaudio.org/wiki/PerfectSetup Xine (which I think is
what amarock -- or whatever that awful media player everyone but me
loves so much is called -- uses for the hard stuff) also ships a native
PA driver.

Ronald, you say: "Userspace daemons are out." This is completely
bogus. Just have a look at other OSes. Like MacOSX, like Vista. One of
the new Vista features is the new "userspace sound system". In Unix
nomenclature this translates to "daemon". A userspace sound system is
the way it needs to be; it's the way it is done by the systems which
currently ship more powerful and useful sound systems than we do. As mentioned
earlier, the PCM pipeline you really want is one RT thread per device
that drives all streams based on timers, not on IO IRQs, managing a
large, rewritable playback buffer.

HW mixing is dead, and the lock-free magic dmix does is not really
powerful enough for what is required from a sound system these days.

PA is an implementation of the aforementioned ideal audio server
design. (Not complete, as mentioned above, though).

This is a very good read about the design of CoreAudio, which
basically does what we want to do in PA as well:

http://developer.apple.com/DOCUMENTATION/DeviceDrivers/Conceptual/WritingAudioDrivers/AudioFamilyDesign/chapter_3_section_3.html#//apple_ref/doc/uid/TP30000731-CJBIDABE

Ronald, you claim: "sound daemon is the right solution _only_ for
networked audio". This is also bogus. There's a lot of stuff you want
to do in a sound server. For example: policy decisions like "every time
I plug in my USB headset I want all voip playback streams to
automatically switch to it, and every time I start my voip app I want
its stream to go through the USB headset". Then, doing all this kind
of "compiz for audio" stuff. For example, what I will probably make
available in PA pretty soon is the ability to do "spatial" event
sounds, i.e. if you press a button on the left side of your screen its
event sound goes out of the left speaker, and vice versa. Or stuff
like automatically sliding down the volume of all windows that are
currently not in the foreground (i.e. you start two totems and only
the one in the foreground is at 100% volume, the other one at 30% or
so, and when you switch windows the volumes automatically slide to
the opposite). Right now, PA basically just provides the
infrastructure for these kinds of things, but now that the groundwork
is done, I can focus on the "earcandy" part.
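
As a rough illustration of the left/right event sound idea, here is a
sketch in C of how a pointer position could be turned into speaker
gains. This is not how PA does or will do it. The struct and function
names are hypothetical, and it uses a plain linear pan where a real
implementation would more likely pick a constant-power curve.

```c
/* Per-speaker gain, 0.0 (silent) to 1.0 (full volume). */
struct stereo_gain {
    float left;
    float right;
};

/* Map a pointer x coordinate to left/right gains with a linear pan.
 * x is clamped to [0, screen_width - 1]; a degenerate screen width
 * falls back to a centered sound. */
struct stereo_gain pan_from_pointer(int x, int screen_width) {
    struct stereo_gain g = { 0.5f, 0.5f };
    if (screen_width <= 1)
        return g;
    if (x < 0)
        x = 0;
    if (x > screen_width - 1)
        x = screen_width - 1;
    float pos = (float) x / (float) (screen_width - 1); /* 0 = left edge */
    g.left = 1.0f - pos;
    g.right = pos;
    return g;
}
```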

In short: there are both user-visible (like these effects, moving
streams between devices, per-stream volumes) and technical (doing
low-latency and low power-consumption at the same time) reasons why a
userspace sound daemon is the way forward.

Ronald, the "alsa-plugin" ships an OSS backend, just as a side note.

Regarding GSmartMix: some parts of gsm live on, like the new
sound preferences dialog which allows per-class devices and stuff. The
problem I saw with gsm is that it was limited to GST. And yeah, not
all apps use GST, and many apps never will. I hope to work with
Marc-André to get the remaining ideas of gsm into PA, as soon as I
export the necessary meta information for all streams in PA.

Ronald, in a way PA is just a reimplementation of dmix. You can
autolaunch it via libasound, and you shouldn't notice much of a
difference, except that you suddenly can do device aggregation,
per-stream volumes with just a few clicks, and so on.

Jan: dmix doesn't involve a daemon anymore. They now do some
atomic ops magic of mixing everything lock-free with a single mix
buffer and a couple of saturation buffers. It's a technically
brilliant solution, though probably not the best for your CPU
caches, and it falls back to locking mode on multicore and non-x86.

Gustavo: PA by default uses pretty large playback buffers which apps
can rewrite at any time. This is the very definition of what MS calls
"GlitchFree", and is the way to go to provide never-drop-out
guarantees and quick reaction when seeking. We don't really pass those
large buffers down to the hw yet, but that's mostly because of the
hrtimer mess mentioned above. PA in F8 should not drop out, unless
you configure it manually to some strange settings. If you ship a
shitty HZ=100-with-no-preemption kernel, then yes, this increases the
chance of a drop-out. But really, if you want to shoot yourself in the
foot then go for it, but don't blame PA for it, don't do it the ESR
way. In any reasonable setup PA shouldn't drop out.

The way forward to get something like "GlitchFree" on Linux is called
"PulseAudio" and, contrary to what you are claiming, ALSA dmix is not.

Gustavo: as I tried to make clear above the way to go is a userspace
sound server. And once we have that, it's perfectly fine to do network
support in it as well.

And again: no modern sound card supports hw mixing anymore. That's the
past, get over it.

Gustavo: OSS is only dead -- as an implementation of a kernel sound
system (though some people from 4front might even claim the contrary
here), OTOH it is very alive -- as an API, and (unfortunately) it is
going to stay around for a long time still. It's a much smaller API
than ALSA, and portable and used in a lot of commercial apps. That's
why we support it for compatibility in PA.

Regarding RT support in PA: Right now on F8 rt for pa is not enabled
by default, due to security. I'd really love to enable it by default,
which we could do if we had a safe process babysitter daemon which
would supervise PA and is running on a higher rtprio than
PA. Hopefully eventually someone will replace init/gnome-session with
something which can babysit processes very well, and this thing should
then do rt-supervising as well.

Also, contrary to what Gustavo says, you don't need to be root to do
RT, all you need is RLIMIT_RTPRIO set to something > 0.
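
A quick way to check this for your own session is to query the limit
with getrlimit(). The following is a minimal, Linux-specific sketch.
The helper name rt_allowed() is made up for this example and this is
not code taken from PA.

```c
#include <sys/resource.h>

/* Returns 1 if this process may request a real-time priority > 0
 * without being root, 0 if the soft limit is zero, -1 on error. */
int rt_allowed(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
        return -1;

    /* The soft limit is the ceiling for the rtprio an unprivileged
     * process can set via sched_setscheduler(). */
    return rl.rlim_cur > 0 ? 1 : 0;
}
```

On most distros the limit itself is typically raised for a user or
group through the "rtprio" item of pam_limits (limits.conf). Treat
that as a pointer for where to look, your setup may differ.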

Regarding event sounds: Yes, I disable them too by default, I think
everyone reasonable (except davidz, maybe :-)) does that. But why do
we do that? Partly because the sounds we have right now in GNOME suck
big time and are annoying like hell. And partly, because they are
triggered far too often. If you ever used a MacOS machine you probably
know that the event sounds there are a lot more subtle and ... useful. I
can think of a couple of places where sound events make a lot of
sense, if they are high-quality:

- when you get an email a human voice should say something like
"You've got mail", instead of some stupid "ding" sound no one knows
the meaning of.

- when long-running actions complete you might also want a human voice
saying "CD burning finished", or "download finished".

- For incoming IMs you should have a subtle "ping" sound. Having a
human voice every time probably is too much, given their frequency.

- Some UI actions like workspace switching/fast-user switching, and
minimizing/maximizing might be good candidates for event sounds too.

So basically, what I try to say is: just because current sound events
suck, there's no reason they *have to* suck. I hope someone will
eventually give the sound theming spec another shot and provide us
with more useful, internationalized default sound samples.

OK, so much for defending PA. I hope I answered every single
question, comment, and piece of FUD spread. If not, just give me a ping!

So, where do we go from here?

At the Summit and internally at RH we discussed a little how we should
go on with PA and GNOME. So, here is basically what I plan:

There are basically three areas where GNOME currently interfaces with
PA via compat layers only and where we should replace the relevant
code with something newer:

1. Currently esd is explicitly started via gnome-session. In F8 we
provide a compat script called "esd" that starts up PA. So,
g-s thinks it starts esd, while it actually starts PA. This is OK,
but this hard-coded dependency on a binary called "esd" should go
away. Instead PA should be started via XDG autostart or suchlike. This
would require some serializing of sound events to fix the race we get
when one app wants to play a sound event and PA is not fully started
yet. Not too difficult. This removes the hard dep on ESD and doesn't
even replace it with a PA-specific one. Gustavo, Ronald, I hope you
rejoice?

2. Sound events are generated directly via libesd from libgnome. This
hard dep sucks as well. What I propose instead is this: I will
introduce a new sound event API called "libcanberra", which is
intended to be cross-platform, cross-toolkit and well-supported on
PA. It basically exports just a single variadic function:

    cbr_play(c, id,
             CBR_META_ROLE, "event",
             CBR_META_NAME, "click-event",
             CBR_META_SOUND_FILE_WAV, "/usr/share/sounds/foo.wav",
             CBR_META_DESCRIPTION, "Button has been clicked",
             CBR_META_ICON_NAME, "clicked",
             CBR_META_X11_DISPLAY, ":0",
             CBR_META_X11_XID, "4711",
             CBR_META_POINTER_X, "46",
             CBR_META_POINTER_Y, "766",
             CBR_META_LANGUAGE, "de_DE",
             -1);

If that function is called, the caller should pass as many
properties as possible. Then, libcanberra will try to find the right
sound file for this event, and contact the sound server for
playback. The meta information is passed: to do transparent i18n, for
a11y, for sound effects (i.e. the spatial sound effects I mentioned
earlier with the POINTER_X and POINTER_Y props).

(In reality the API will probably have a couple of more functions,
for caching, and for predefining properties so that you don't have
to specify them for each event again. So maybe 5 functions or so.)

As soon as I have a version of this library I will write a small
module for gtk (the kind you can load into every gtk app with
--gtk-module) which will basically do what libgnome currently does:
hooking into a couple of signals -- but instead of direct calls to
libesd it will call the aforementioned libcanberra function with the
appropriate parameters.

Advantages: suddenly sound events work for non-GNOME apps (i.e. apps
that use only gtk) too. We can remove yet another part from libgnome,
and last but not least, yet another hard dep on ESD is gone, and
not even replaced by one on PA. Not even libcanberra becomes a hard
dep of Gtk. Gustavo, Ronald, this is where you should rejoice, again.

3. Mixer APIs. There are three mixer control tools right now: the OSD
that is shown when you press your volume-up/volume-down keys; the
mixer applet; and gnome-volume-control. The OSD is supported fine
through gst-pulse (our rocking PA plugin for gst), but for the
applet and the standalone mixer i'd like to see a replacement. Right
now both use the gst mixer abstraction API, which only exposes a
very limited set of what our PA mixer can do and which quite frankly
is a big mess. We'd have two options here: fix the gst mixer api, so
that it exports the whole functionality that PA offers. Or, just
make the mixer depend directly on the PA libs. I'd vote for the
latter. Why? Because abstraction APIs in most cases suck, and
especially if a large part of the API is only implemented in a single
backend (which would be PA). That's why in F9 we will probably drop
g-v-c and replace it with pa's specific mixer tool called
"pavucontrol", that we already ship. (I mentioned this already
above). So, what I'd like to see is that pavucontrol could become a
part of GNOME proper eventually, and for that to work PA would need
to become a blessed dependency. While I see not much worth in
developing two volume control tools in parallel, we could even keep
g-v-c around for those who prefer to stick with their bare
90s-style audio systems. (Ronald, Gustavo, that's
again where you should rejoice). The question of course remains,
which mixer app to maintain in GNOME. My own pavucontrol is quite
featureful, but I think it's not the best thing UI-wise (though some
people seem to disagree with me -- and do like it). I'd be happy if
someone would pick this up. If no one picks it up, I will probably hack up some
pa-specific applet and stick it together with pavucontrol in GNOME
SVN, and then suggest it for inclusion into GNOME proper.
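
Coming back to the variadic cbr_play() call proposed in item 2: the
following C sketch shows one way such a key/value argument list
terminated by -1 could be consumed with stdarg. Everything in it (the
META_* keys, struct props, collect_props()) is hypothetical and only
mirrors the shape of the proposed call. The eventual libcanberra API
may well look different.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical property keys, standing in for the CBR_META_* ones. */
enum { META_ROLE, META_NAME, META_LANGUAGE, META_MAX };

/* One string slot per key; NULL means "not specified by the caller". */
struct props {
    const char *value[META_MAX];
};

/* Collect (key, string) pairs until the -1 terminator is seen.
 * Returns the number of pairs stored, or -1 on an unknown key. */
int collect_props(struct props *p, ...) {
    va_list ap;
    int n = 0;

    for (int i = 0; i < META_MAX; i++)
        p->value[i] = NULL;

    va_start(ap, p);
    for (;;) {
        int key = va_arg(ap, int);
        if (key == -1)
            break;                      /* end of the property list */
        if (key < 0 || key >= META_MAX) {
            n = -1;                     /* unknown key: stop reading */
            break;
        }
        p->value[key] = va_arg(ap, const char *);
        n++;
    }
    va_end(ap);
    return n;
}
```

The upside of this style is that callers can pass only the properties
they know about, and new keys can be added later without changing the
function signature. The downside is that the compiler cannot check the
argument types, so a forgotten -1 terminator is undefined behaviour.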

So much for my plans. When we have dealt with these three issues, GNOME
should work fine both with PA and without PA. Will take some time to
implement them all. But I hope that even people like Gustavo and
Ronald can live with it.

Oh, and I hope that my comments on Gustavo's and Ronald's position
didn't sound too harsh. It's just that I consider your positions
badly-informed and a bit FUDish, it's not intended to be personal.

Any questions?

Yours,
the stubborn Lennart

--
Lennart Poettering Red Hat, Inc.
lennart [at] poettering [dot] net ICQ# 11060553
http://0pointer.net/lennart/ GnuPG 0x1A015CC4
