Re: memcpy() in beast profiling



   Hi!

On Tue, Oct 26, 2010 at 06:05:39PM +0200, Stefan Westerfeld wrote:
> During my attempts to make SpectMorph fast, I've used oprofile to determine the
> CPU usage during playback of a small song. It turned out that a significant
> part of the CPU usage was caused by memcpy().
> 
> To determine if it was BEAST or SpectMorph causing this, I removed the
> SpectMorph plugin and replaced it with a BEAST oscillator, so that my
> instrument consisted of one ADSR envelope, one StandardOsc, and one Amplifier.
> 
> The amount of memcpy() during playing the song with that beast-only instrument
> was 17.17%. So BEAST spends a lot of time in memcpy(). More analysis revealed
> that this is the cost of virtualization, as BEAST does it:
> 
> Function master_process_locked_node, in bseenginemaster.c:
> 
>       /* catch obuffer pointer changes */
>       for (i = 0; i < ENGINE_NODE_N_OSTREAMS (node); i++)
>         {
>           /* FIXME: this takes the worst possible performance hit to support obuffer pointer virtualization */
>           if (node->module.ostreams[i].connected &&
>               node->module.ostreams[i].values != node->outputs[i].buffer + diff)
>             bse_block_copy_float (new_counter - node->counter, node->outputs[i].buffer + diff, node->module.ostreams[i].values);
>         }
> 
> Since it already has a FIXME, this is probably old news, but I thought it might
> be interesting to have a number for how much CPU time this FIXME really costs
> (although this will of course vary depending on what one actually does with BEAST).

I've performed a bit more profiling. Since most of this CPU load is caused by
output buffer poking, I've replaced the output buffer poking in the modules
that are used in my test song with explicit memcpy() calls. By doing so, I can
tell exactly which modules are causing the CPU load. Here is a list of memcpy
activity by module:

25.88%  sub_oport_process()
25.88%  sub_iport_process()
20.67%  voice_input_module_process_U()
10.32%  voice_switch_module_process_U()
5.16%   simple_adsr_process()
4.83%   context_merger_process()
2.63%   Bse::Summation::Summer::process()
2.63%   pcm_output_process()
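
To illustrate the measurement approach described above, here is a minimal
sketch in C. It is not the actual BSE code: SketchStream, process_poke() and
process_memcpy() are hypothetical names, and the real stream structs carry
more fields. How the copy ends up attributed to a module also depends on the
profiler setup, which is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal stream type, for illustration only. */
typedef struct {
  float *values;   /* block of sample values the stream points to */
} SketchStream;

/* Poking style: redirect the output pointer to the input block.
 * No copy happens here; the engine copies the block afterwards,
 * so the cost shows up in the engine, not in this function. */
static void
process_poke (const SketchStream *in, SketchStream *out, size_t n_values)
{
  (void) n_values;              /* unused: nothing is copied here */
  out->values = in->values;
}

/* Profiling variant: leave the output pointer alone and copy into the
 * buffer it already points to, so the copy happens inside the module. */
static void
process_memcpy (const SketchStream *in, SketchStream *out, size_t n_values)
{
  assert (in->values != NULL && out->values != NULL);
  memcpy (out->values, in->values, n_values * sizeof (float));
}
```

With the second variant the output pointer never changes, so an engine-side
pointer comparison would find nothing left to copy.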

There are two patterns that are often used in these modules. One is passing the input
through to the output without changes (for instance sub_oport_process()). The other is
setting the output to a constant value, like this (bsemidireceiver.cc):

  module->ostreams[i].values = bse_engine_const_values (cdata->values[i]);

Here, the lookup for a const value block seems to be reasonably efficient, but once
the process function returns, the engine will memcpy() the block into the output
buffer, and that copy is what causes the CPU load.
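
The interaction between both patterns and the engine's copy-back can be
sketched as follows. This is a simplified model under stated assumptions, not
the real engine: SketchNode, SKETCH_BLOCK_SIZE and all function names are
hypothetical, and the real check in master_process_locked_node() also handles
an offset (diff) and multiple output streams.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 64    /* hypothetical block size, not BSE's value */

/* Hypothetical, simplified stand-in for an engine node with one output. */
typedef struct {
  float  buffer[SKETCH_BLOCK_SIZE];  /* engine-owned output buffer */
  float *values;                     /* pointer a module may redirect */
} SketchNode;

/* Before processing: the output stream points at the engine's buffer. */
static void
sketch_node_prepare (SketchNode *node)
{
  node->values = node->buffer;
}

/* Pattern 1: pass the input through by redirecting the output pointer. */
static void
sketch_passthrough_process (SketchNode *node, float *input)
{
  node->values = input;
}

/* Pattern 2: point the output at a shared block of one constant value. */
static void
sketch_const_process (SketchNode *node, float *const_block)
{
  node->values = const_block;
}

/* After processing: if the module redirected the pointer, copy the block
 * into the engine buffer. Returns the number of floats copied. */
static size_t
sketch_catch_obuffer_change (SketchNode *node, size_t n_values)
{
  assert (n_values <= SKETCH_BLOCK_SIZE);
  if (node->values == node->buffer)
    return 0;                        /* module wrote in place: no copy */
  memcpy (node->buffer, node->values, n_values * sizeof (float));
  return n_values;
}
```

In this model both patterns look cheap inside the process function, but each
one triggers a full block copy afterwards. Only a module that writes directly
into the engine's buffer avoids it.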

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan

