Gnome Audio Architecture and EsounD

Fredrik Ohrstrom,

Hi there, I saw this bit of information on one of the gnome sound
architecture pages, and wanted to know if you have seen the 
Enlightened Sound Daemon (EsounD) yet.  Details may be found on 
the web at .  Basically, it's 
an audio server along the lines of what you have suggested for 
gnome, and is currently in the gnome cvs tree as the module, 
"esound".  I'd very much appreciate any feedback and comments you 
may have on it.  If there is someone more appropriate for dealing 
with gnome audio issues, please feel free to forward this to them.
I have also CC'd this to the gnome mailing list, for additional
response.  The following is a point-by-point response to your 
ideas as expressed on the sound architecture page.


[snip background info]

> * Networked sound

> This is a necessity. We have had networked graphics for a looong time
> which is probably much more difficult to implement. Networked sound is
> slightly more sensible to network lag, but is in my opinion still
> easier to implement.  There are a lot of users running linux as
> x-terminals that would like to have sound support and I often use
> remote programs.

EsounD handles networked sound.  Once you go with sockets as an IPC, 
networking is a matter of a few extra arguments.  I think you mean 
"sensitive" to network lag above, and that's definitely a concern.
I claim that anyone relying on instantaneous playback of a sound from
a remote machine should rethink their design.  However, expecting a
reasonably short response time over a 10baseT Ethernet network is not
unreasonable.
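
Since the "few extra arguments" mostly boil down to pointing the client
at HOST:PORT instead of the local device, here is a minimal sketch of
parsing such an address spec, in the style of an X display string.  The
function name and format are illustrative, not EsounD's actual option
handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse an "HOST:PORT" display-style address into its parts.
   Returns 0 on success, -1 on a malformed spec.
   (Sketch only: the real esd option parsing may differ.) */
int parse_audio_addr(const char *spec, char *host, size_t hostlen, int *port)
{
    const char *colon = strrchr(spec, ':');
    if (!colon || colon == spec)
        return -1;
    size_t n = (size_t)(colon - spec);
    if (n >= hostlen)
        return -1;
    memcpy(host, spec, n);          /* copy the host part */
    host[n] = '\0';
    *port = atoi(colon + 1);        /* numeric port after the colon */
    return (*port > 0) ? 0 : -1;
}
```

With the host and port in hand, the client just connect()s a TCP socket
instead of opening /dev/dsp directly.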

> * The sound should be managed by a sound server

[snipped driver level issues]

I think the low level driver interface is handled well by the kernel,
as is.  I run the (non-free) OSS driver for my SB AWE 32, with no
problems as a loadable module.  I anxiously await the day when said 
driver is included on the installation media with a new sound card.

> Of course some support in the kernel is necessary
> for example to support switching of virtual terminals so the sound is
> related to a terminal and not global.

This may be more confusing than a single global audio device for the
machine.  For example, I may be watching a build on a different VT than
the one I'm checking mail in, yet still want to hear an audio completion
signal from the make process without constantly switching VTs.

> * The sound server also handles multiple clients

> Today only one user at a time can write to /dev/dsp. There have been
> several suggestions to solve this. A sound server would be designed
> from the beginning to serve several clients just like the Xserver
> serves several programs. I believe that a sound server is a cleaner
> solution.

EsounD handles multiple client connections, and sends their audio
to a single /dev/dsp device.  A kernel mod to handle multiple /dev/dsp
connections may also be a reasonable solution, but would require a lot
more thought to implement.  Networking would have to be handled 
via a proxy in this situation.
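
The heart of the single-/dev/dsp approach is just a mixing loop that
sums each client's samples into one master buffer with saturation.  A
rough sketch (illustrative only, not the actual EsounD mixing code):

```c
#include <assert.h>
#include <limits.h>

/* Mix one client's 16-bit samples into the master buffer with
   saturating addition -- the core of sending several clients'
   audio to a single /dev/dsp. */
void mix_into(short *master, const short *client, int nsamples)
{
    for (int i = 0; i < nsamples; i++) {
        int sum = (int)master[i] + (int)client[i];
        if (sum > SHRT_MAX) sum = SHRT_MAX;   /* clip, don't wrap */
        if (sum < SHRT_MIN) sum = SHRT_MIN;
        master[i] = (short)sum;
    }
}
```

The widening to int before the add is what keeps two loud clients from
wrapping around into noise.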

> * Base it on the X attitude of low round trip numbers

[snip comparison of streamed to "sampled" sound]

EsounD handles both audio streams and the "caching" of samples within
the server for playback by ID number.

> For some reason NAS and Xaudio have concentrated on streaming audio.

NAS also supports storing samples in "buckets" within the server.

[snip inapplicability of Xaudio for games]

The cached sample approach should be adequate for local, and locally
networked, games.  For something running on a *distant* server, over
a modem, you'd still be looking at a noticeable packet delay.  I 
suspect that if you're getting reasonable feedback from your Xserver,
you would get reasonable feedback from the audio server.

> Also other appliances is for synth programs
> and sequencers which, just like games, require some kind of realtime
> approach. 

Synth programs require much more information to generate the
appropriate sound: waveform, attack, decay, sustain, release, etc.
Pitch shifting with any accuracy is also a CPU-intensive operation,
and should be avoided, IMHO.  Preprocessing with Timidity or a mod
player, then streaming the result to the server, would work better
and be easier.  However, EsounD can support multiple "types" of
players, so if it can be done in a CPU-friendly manner, it can be
added.  I'm hesitant to add it at this time.

> 1> So I propose that the simplest sound server should work like a
> remotely controllable modplayer. We can have a table of samples on the
> server side. The samples have names just like colors on the Xserver
> has names. (Yep, it sure looks like we should take a look at the
> general midi tables to design these.) Also some certain sample names
> are good to have like: Error, Ok, WaitAMoment, Start, Stop

EsounD has the ability to cache samples; they are played back by ID
number.  Naming capabilities will be added shortly, although
"well-defined" numbers for common desired sounds require less 
interpretation on the part of the server.  Maybe define an enumerated
type to handle the common sounds?
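
A sketch of what such a cache and enumerated type might look like on
the server side; the names and table size here are hypothetical, not
EsounD's actual data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical well-known sound IDs, along the lines of the
   enumerated type suggested above -- not part of EsounD today. */
enum std_sound { SND_ERROR, SND_OK, SND_WAIT, SND_START, SND_STOP };

#define MAX_SAMPLES 64

struct sample { const short *data; int nsamples; int used; };
static struct sample cache[MAX_SAMPLES];

/* Cache a sample in the server; the returned ID is what the
   client later sends to trigger playback.  Returns -1 when full. */
int cache_sample(const short *data, int nsamples)
{
    for (int id = 0; id < MAX_SAMPLES; id++) {
        if (!cache[id].used) {
            cache[id].data = data;
            cache[id].nsamples = nsamples;
            cache[id].used = 1;
            return id;
        }
    }
    return -1;
}

/* Look an ID back up at play time; NULL for unknown IDs. */
const struct sample *lookup_sample(int id)
{
    if (id < 0 || id >= MAX_SAMPLES || !cache[id].used)
        return NULL;
    return &cache[id];
}
```

A client that cached a beep once would then send only the returned ID
to play it, which is far cheaper than re-sending the sample data.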

> We connect to the sound server and request that a certain  sample
> should be played at a certain time, volume, pitch, panning. 

Volume and panning of samples and streams will be added shortly.  
Adding time to the mix would only confuse things on the server side,
which would have to wait to play sounds.  I suspect *most* sounds 
will be of the "play right now" type.  Samples will also be able to 
be linked together, so that one finishes before the next begins.
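
For the curious, volume and panning reduce to integer scaling per
sample.  A sketch assuming 0-255 ranges for both (the actual ranges in
the eventual protocol may differ):

```c
#include <assert.h>

/* Apply a 0-255 volume and a 0-255 pan (0 = hard left,
   128 = center, 255 = hard right) to a mono sample, producing
   interleaved stereo.  The 0-255 scale is an assumption for
   illustration, not a settled wire format. */
void apply_vol_pan(const short *mono, short *stereo, int nsamples,
                   int volume, int pan)
{
    int lgain = volume * (255 - pan) / 255;   /* 0..255 */
    int rgain = volume * pan / 255;
    for (int i = 0; i < nsamples; i++) {
        stereo[2 * i]     = (short)(mono[i] * lgain / 255);
        stereo[2 * i + 1] = (short)(mono[i] * rgain / 255);
    }
}
```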

> With this
> we can do quite a lot of things and it cuts away  network traffic
> since its not streamed audio but instead the sample names which are
> much smaller and a playing sequence can be buffered. Of course the
> play sample command can skip the timing and request that it should be
> played as soon as it arrives at the server. This would be used for
> effects and the  buffered,timed approach for music.

> 2> When we got this working we can add the possibility to  upload
> clientspecific samples. Just like the XCreatePixmap creates a pixmap
> that resides on the server and can be used several times. Now the
> server is getting really useful. 

EsounD currently supports only client-specific samples.  Adding 
a few presets would not be a difficult task, but would require 
some configuration data.

> 3> The audio server should also be able to control the mixer. This
> sort of corresponds to the window manager. It manages different
> volumes from different devices and different programs.  

Volume mixing within the server can be handled by a control socket.
Naming of the streams/samples would make distinguishing between
samples and streams much easier.
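
One way to run the control socket is a simple line-based text protocol;
the command below is purely hypothetical, since the control protocol
hasn't been designed yet:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line from a hypothetical text control socket,
   e.g. "volume mp3_stream 128".  Returns 0 on success, -1 if
   the line doesn't have the command/name/value shape.
   Caller supplies cmd[32] and name[32] buffers. */
int parse_control(const char *line, char *cmd, char *name, int *value)
{
    return (sscanf(line, "%31s %31s %d", cmd, name, value) == 3) ? 0 : -1;
}
```

Naming the streams is what makes a command like this possible at all;
with only anonymous connections there is nothing for the mixer control
to refer to.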

> 4> Then we can add support for streaming data over the network or
> through shared memory. Getting sound back from the microphone could be
> interesting. Perhaps there should be a speech  to symbol converter in
> the sound server itself. 

Network: done.  Shared memory would be *very* useful, but I haven't
had time to add it yet.  I currently consider it an optimization, and
have put it off until later.  Recording is implemented, and full
duplex recording "works for me": simply request the input stream, and
you get it.  A monitoring stream is also implemented, so you can do
eye-candy tricks with the mixed waveform.  A speech-to-symbol
converter?  Sounds like a client program: read the microphone,
translate, output.
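
An eye-candy client reading the monitor stream would do something like
this with each block of the mixed waveform, e.g. computing a peak level
for a VU meter (sketch only):

```c
#include <assert.h>

/* Compute the peak amplitude of one block read from the monitor
   stream -- the sort of thing a VU-meter client would do with
   the mixed waveform. */
int peak_level(const short *buf, int nsamples)
{
    int peak = 0;
    for (int i = 0; i < nsamples; i++) {
        int v = buf[i] < 0 ? -(int)buf[i] : buf[i];   /* |sample| */
        if (v > peak)
            peak = v;
    }
    return peak;
}
```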

> 5> Remote control of music cds is probably also interesting. The
> protocol is language independent just as X The sound server is most
> likely written i C and also the first sound access libraries. Then we
> can use these raw or build a wrapper like GDK around for later use
> higher up in the  hierarchies like GTK.

Remote control of CDs is currently considered beyond the scope of
EsounD.

> * The protocol can be extended just like the X protocol.

> In the beginning though it is more likely that the protocol changes
> before it stabilizes. A normal modplayer would only require  a simple
> protocol but if we use Timidity as a base we could use some sort of
> modified midi over the wires. Whatever. 

> Timidity can be a little bit heavy on the processor so the server
> might only be able to support certain features corresponding to
> visuals) attack/decay part of sample, sustain part of sample, reverb,
> panning, number of simultaneously  playing samples and so on.

The protocol is *very* touchy at the moment, but I plan to add some
robustness to it.  Again, I claim the processing required for MIDI to
dsp conversion may be prohibitive.  Piping Timidity output (perhaps
through a compression algorithm like ADPCM) would be a much cleaner
solution.

> I have not spoken of FM modulated music because it is boring. But
> maybe someone wants to write a sound server for an adlib card that
> does its best (or worst) to simulate the different standard
> instruments with the FM synth.

I leave this area in the same land as direct MIDI support.
Currently beyond scope.

> Just my 2 cents. Please drop me a line if you have comments, want to
> code this or have coded this already! :-)

Have coded much already; are you interested in a test drive?  The
xmp mod player has experimental support built in now, if you're
interested in that.  Raster and I ported mpg123 to it (I can send you
the source).  Simple command line tools that can play .wav files are
also included, and someone has written a generic audio file player as
well.  Programs which already support /dev/dsp output are fairly easy
to port, assuming the source is available, of course. =P
Feedback is always appreciated.  I look forward to hearing from you.


-- ebm
|  __                         a.k.a. Eric B. Mitchell |
|  |_) .  _  _|      _|  _  |
|  | \ ( (_ (_| (_| (_| (/_ |
| How's My Programming?   Call:  1 - 800 - DEV - NULL |
