EsounD stuff. was[Re: midi player?]



Hello, again.

I think you have the meaning of client and server reversed in 
the following discussion.  The server is the EsounD daemon (esd) 
itself.  The server handles the requests made by client processes.  
The clients are sound players/mixers that make requests of the 
server.  I shall make the appropriate substitutions in my reply.

Zack Williams wrote:
> 
> There are basically 3 types of computer sound formats:
> 
> 1. Short Clip / Uncompressed - These are wav, au, and other such formats. Good
> for anything less than 5 seconds, gets too big above that size. Good for alert
> sounds and other such things. Almost no processor or memory overhead.
> 
> 2. Long Clip / Compressed - These are mp3's and other compressed sound mediums.
> These are good for longer sounds and music files. These usually have high
> amounts of processor overhead, because of the decompression involved.
> 
> 3. Event based / Sampled - These are midi and mod sounds. To play these a
> piece of hardware or software has to be used to convert the events and
> samples into coherent output. These are also used only for longer sounds

1 & 2, ok.
3, I'm not entirely certain as to the mod format.  MIDI data, however, is not
a digital sample of the audio spectrum.  It consists of events which determine
what notes to play, what effects to apply to those notes, how much effect to 
apply, etc.  MIDI is more of a specification of what to play, instead of the
actual digitized audio signal.  Special software/hardware is required to convert
that specification into a digitized audio signal.  This is typically performed 
by hardware on the sound card itself, although Timidity (?) is a software 
MIDI renderer that creates a digitized waveform as its output.  I believe it is
beyond the scope of ESD to handle converting a MIDI event stream to a digitized 
waveform.  Aspects of MIDI such as pitch shifting, volume, pan, and mod effects 
would drastically increase the CPU load of the server, as well as introduce even
more timing issues on the server side.
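
To illustrate the difference between an event and a digitized signal, here is
a minimal sketch of what a software renderer has to do for a single note.  The
function names, the sample rate, and the plain sine-wave "instrument" are all
assumptions made for the example; this is not how Timidity or any sound card
actually renders.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed for this sketch)

def note_to_hz(note):
    """Equal-tempered pitch of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def render_note(note, velocity, seconds):
    """Turn one (note, velocity, duration) event into signed 16-bit samples.

    A bare sine wave stands in for a real instrument patch.
    """
    amplitude = 32767 * velocity / 127.0
    freq = note_to_hz(note)
    count = round(SAMPLE_RATE * seconds)
    return [int(amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
            for i in range(count)]

event = (69, 100, 0.01)        # three numbers describe the note...
samples = render_note(*event)  # ...hundreds of samples are needed to hear it
```

Even this toy version costs a sine evaluation per sample per note, before any
pitch bend, pan, or effects are applied.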

> EsounD currently supports 1 very well because it allows for the storage of short
> clips on the [[server]] end to be played back at will. It does not support 2 as
> well because the sounds are decompressed on the [[client]] and then sent as a stream
> of uncompressed sound data, which puts a greater load on the [[client]] and the
> network, but still works ok. Type 3 sounds are supported in much the same way
> as type 2 sounds, and have the same network bandwidth and server load drawbacks.
> Also, EsounD currently ignores the fact that there exist hardware midi and mod
> renderers.

Of the two type 3 sounds, I believe neither is supported directly.  As I said
above, I believe it is beyond the scope of ESD to handle MIDI->DSP processing.
Therefore, I claim that ESD is perfectly aware that hardware MIDI renderers exist,
and defers all such processing to them.  A MIDI->DSP software renderer is free
to play the DSP stream to ESD, and have that audio played by the server.
As to the MOD format, there is experimental support for ESD in at least one MOD
tracker.  The support will be finalized as ESD stabilizes.  Again the MOD is 
converted to DSP format, and then played by ESD.  I would need more information 
regarding the MOD format and interfaces for hardware MOD rendering in order to 
determine whether MODs should be handled directly by ESD.

> Proposed solutions:
> 
> Allow for pluggable [[server]] side backends
>   Allow the user to plug in whatever program he/she wants that does the sound
>   rendering. This way we get the load off of the [[client]] and the network. mod and
>   midi sound rendering hardware can then be used. If the client cannot handle
>   the load of a certain renderer (ex. mp3 on a slow 486), the renderer could still
>   be done in software on the [[client]].  There are also plenty of backends out there
>   (just look at the number of frontends available for mpg123 and timidity)
> 
> Compress the sound coming across the network:
>   This would be good for type 1 and 3 (especially mod samples), but would be
>   counter productive for type 2 sounds. Anything that uses EsounD as a simple
>   /dev/dsp should be compressed also.

Pluggable backends require some form of mapping from a client stream
type, to a decoding engine to use.  This would require additional logic
in the protocol handler, as well as some more intelligence in selecting
a "translation" function.  Currently esd_players have no translation type,
and all are assumed to be copy only.  It should be fairly easy to add new 
"modes" and associated player types (see the server source).  Adding a 
generic interface such as this would require some more thought...
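
As a rough sketch of the mapping I have in mind, a table keyed by stream type
could select the translation function.  Everything below is hypothetical: the
names TRANSLATORS, register, and handle_stream do not exist in the ESD source,
and the byte-swapping "decoder" merely stands in for a real backend such as an
mp3 or MOD renderer.

```python
TRANSLATORS = {}  # hypothetical registry: stream type name -> function

def register(stream_type):
    """Decorator that plugs a translation function into the registry."""
    def wrap(func):
        TRANSLATORS[stream_type] = func
        return func
    return wrap

@register("copy")
def copy_only(data):
    """The behaviour described above: samples pass through untouched."""
    return data

@register("swap16")
def swap_bytes_16(data):
    """Stand-in backend: swap the byte order of 16-bit samples.

    Assumes an even number of bytes.
    """
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)

def handle_stream(stream_type, data):
    """Protocol-handler side: pick the translator for this client stream."""
    try:
        translate = TRANSLATORS[stream_type]
    except KeyError:
        raise ValueError("no backend registered for %r" % (stream_type,))
    return translate(data)
```

A client that announces an unknown type gets an error instead of silently
falling back to copy mode; whether that is the right policy is one of the
things that needs more thought.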

> I admit that the "one sound stream per channel" idea really sucked. I was
> thinking in terms of hardware that only supports a single midi or mod
> output stream at once. On the AWE card, I know you can have normal sound
> going out /dev/dsp, while simultaneously having midi's playing on the
> AWE chip. In the case of hardware acceleration support, if more than one
> sound format that uses that form of acceleration needs to be played, a software
> renderer could be used, and mixed in with everything else. I am kicking
> myself right now for not thinking of this earlier.

That's pretty much what ESD is for. =P  So that multiple processes attempting
to play sounds are converted into a single stream for the audio device.
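
In the simplest terms, mixing means adding the samples from each player and
clipping the result to the range of the output format.  The sketch below
assumes signed 16-bit samples already at a common rate; the function name is
made up, and the real server code is more involved than this.

```python
def mix(streams):
    """Sum same-position samples from every stream, clipped to signed 16-bit.

    Streams that run out early simply stop contributing.
    """
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(-32768, min(32767, total)))
    return mixed

# Two players at once: a short alert on top of a longer stream.
output = mix([[1000, 2000, 3000], [500, -500]])
```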

> I looked at midid and NetMidi. They both basically allow the bidirectional
> transmission of midi commands over a network to other midi devices, and
> bridging using a network instead of midi cabling, whereas EsounD is intended
> to provide sound playback of recorded sound files to an output device, a
> completely unidirectional task.

Actually, recording (full duplex) and monitoring are possible with ESD, so
it too is a bidirectional entity.  I'm not touching MIDI.

> In short, the server should conserve as much network bandwidth and processor
> time as possible. 

These are contradictory goals.  Better compression means less network bandwidth, 
but more processor usage.  No compression means the network is abused for the 
benefit of CPU usage.  Ideally, you would be able to decide which end you want
to abuse.  Got a 486 playing sounds?  Save the CPU, kill the network.  Got SMP
PentiumII-400s on both ends?  Save the network, and abuse the CPUs.  This is the
fundamental tradeoff of a distributed system.  The key is being able to tune
the usage to your hardware's capabilities.
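
A small sketch of such a tuning knob follows.  zlib is used purely as a
stand-in for whatever codec would really be chosen (it is not an audio codec
and nothing in ESD uses it), and the repeated sine period is made-up test
data, far more compressible than real audio.

```python
import math
import zlib

# One 100-sample period of a sine wave as signed 16-bit little-endian PCM,
# repeated to imitate one second of a steady mono tone at 44100 samples/s.
period = b"".join(
    int(20000 * math.sin(2 * math.pi * i / 100)).to_bytes(2, "little", signed=True)
    for i in range(100)
)
pcm = period * 441

def encoded_size(data, level):
    """Bytes on the wire after zlib at the given effort level (0 = store only)."""
    return len(zlib.compress(data, level))

cheap_cpu = encoded_size(pcm, 0)      # almost no CPU work, full network cost
cheap_network = encoded_size(pcm, 9)  # most CPU work, least network cost
```

The single level argument is the kind of setting a user could turn one way on
a 486 and the other way on a fast machine with a slow link.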

> Think about this - streaming a 44khz 16bit sound across a
> network requires about 150kbyte/sec, as compared to 128kbit/sec for an mp3 file.
> It is at least 8 times more efficient to simply stream the file instead of a sound
> stream. 

This is only the network side of the coin.  mp3s are also CPU intensive
to decode.  Compare the number of flops necessary to decode one minute of 
mp3 encoded audio to the number for a direct DSP sample.  Many, many, more.
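
To put numbers on the network side, the raw rates can be worked out directly.
The sketch assumes a 44100 Hz sample rate and takes 128 kbit/sec to mean
128000 bits per second; the helper name is made up for the example.

```python
def pcm_bytes_per_second(rate_hz, bits, channels):
    """Raw bandwidth of an uncompressed PCM stream."""
    return rate_hz * (bits // 8) * channels

mono = pcm_bytes_per_second(44100, 16, 1)    # 88200 bytes/sec
stereo = pcm_bytes_per_second(44100, 16, 2)  # 176400 bytes/sec
mp3 = 128000 // 8                            # 16000 bytes/sec

mono_ratio = mono / mp3      # about 5.5 times the mp3 rate
stereo_ratio = stereo / mp3  # about 11 times the mp3 rate
```

So the network saving depends on the channel count, and none of this says
anything about the decoding cost on the receiving end.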

Also, clients may be located on the same machine as the server, in
which case compression would be undesirable, and the use of a shared 
memory segment would result in zero transfer overhead.  Colocation 
optimizations such as this are also not yet implemented.

> Even better, a midi file is usually under 50kbytes and many mods are less
> than 300kbytes, so streaming the file instead of sound data would be even more
> efficient in those cases. This is why I think that we should avoid having the
> renderers on the [[client]] at all costs.

See above for my position on MIDI.

> # Zack Williams  zdw@u.arizona.edu  http://www.u.arizona.edu/~zdw #

I'm informed that ADPCM is a lossy compression algorithm that is well
suited to real time audio streaming.  At a minimum, an ADPCM mode 
should be added to ESD to reduce network bandwidth from continuous 
streams.  I can see the time has come to put the EsounD position and 
vector (where it is, and where it's headed) into a tangible form for
greater discussion.  Once some more debugging is done, I will focus 
on such documentation for a while.  Any other thoughts?

-- ebm
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+
|  __                         a.k.a. Eric B. Mitchell |
|  |_) .  _  _|      _|  _     ericmit@ix.netcom.com  |
|  | \ ( (_ (_| (_| (_| (/_   www.netcom.com/~ericmit |
| How's My Programming?   Call:  1 - 800 - DEV - NULL |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+


