Re: Fast factor 2 resampling



   Hi!

On Wed, Apr 05, 2006 at 02:05:37AM +0200, Tim Janik wrote:
> On Tue, 4 Apr 2006, Stefan Westerfeld wrote:
> >>ok, ok, first things first ;)
> >>
> >>as far as i see, we only have a couple use cases at hand, supposedly
> >>comprehensive filter setups are:
> >>-  8bit:  48dB
> >>- 12bit:  72dB
> >>- 16bit:  96dB
> >>- 20bit: 120dB
> >>- 24bit: 144dB
> >>
> >>if we have those 5 cases covered by coefficient sets, that'd be good enough
> >>to check the stuff in to CVS and have production ready up/down sampling.
> >
> >Yes, these sound reasonable. Although picking which filter setup to use
> >may not be as easy as looking at the precision of the input data.
> >
> >For example ogg input data could be resampled with 96dB coefficients for
> >performance reasons, or 8bit input data could be resampled with a higher
> >order filter to get better transition steepness.
> 
> but that'd just be another choice out of those 5, other than the obvious 
> one.
> or am i misunderstanding you and you want to point out a missing setup?

No, I was just pointing out that choosing from these 5 should not
(always) be automated for datahandles. In the plain C API, this means
that we now have

/* --- resampling datahandles with the factor 2 --- */
GslDataHandle* bse_data_handle_new_upsample2 (GslDataHandle *src_handle, int precision_bits);

instead of the old API

GslDataHandle* bse_data_handle_new_upsample2 (GslDataHandle *src_handle);

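To illustrate the mapping between precision and filter setup, here is a
minimal sketch (not the actual BSE code; filter_setup_for_bits is a
hypothetical helper) that picks one of the five coefficient sets listed
above, using the rule of thumb that each bit of precision needs roughly
6 dB of stop band attenuation:

/* minimal sketch, hypothetical helper - not the actual BSE code */
static int
filter_setup_for_bits (int precision_bits)
{
  if (precision_bits <= 8)  return 48;    /*  8 bit ->  48 dB */
  if (precision_bits <= 12) return 72;    /* 12 bit ->  72 dB */
  if (precision_bits <= 16) return 96;    /* 16 bit ->  96 dB */
  if (precision_bits <= 20) return 120;   /* 20 bit -> 120 dB */
  return 144;                             /* 24 bit -> 144 dB */
}

That way a datahandle for 16 bit input defaults to the 96 dB
coefficients, while a caller can still pass a different precision
explicitly, like 96 dB for ogg input or a higher setup for 8 bit input
when transition steepness matters more than speed.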

But actually there is a case that is not covered very well (and cannot
be covered with the code designed as it is right now), and that's
resampling files with a low sample rate. In that case, the aliasing
area that I've designed into the inaudible range (22050-26100 Hz if
we're resampling 44100 Hz recordings) moves down into the audible range
(for instance 11025-13050 Hz when upsampling a 22050 Hz recording by
factor 2).
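
Expressed as a tiny illustrative helper (hypothetical, not part of any
BSE API), the aliasing band simply scales linearly with the input rate:

/* illustrative only: the aliasing band was placed at 22050-26100 Hz
 * for 44100 Hz input, so it scales linearly with the input rate */
static void
aliasing_band (double input_rate, double *start_hz, double *end_hz)
{
  *start_hz = input_rate * (22050.0 / 44100.0);   /* = input_rate / 2 */
  *end_hz   = input_rate * (26100.0 / 44100.0);
  /* 44100 Hz input -> 22050-26100 Hz: inaudible
   * 22050 Hz input -> 11025-13050 Hz: clearly audible */
}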

If that is a problem, we need a non-halfband implementation as well (see
below for more cases where we need that), which is not too hard to write
but may be significantly slower (a factor of 2 or so).
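
For comparison, this is roughly what the half band structure buys for
factor 2 (a sketch with assumed names; block borders and delay
compensation are omitted): even output samples are plain copies, and the
filter only needs to store its nonzero coefficients:

/* sketch only, not the BSE implementation: c[] holds just the n_taps
 * nonzero coefficients of the half band filter - every other
 * coefficient of the full filter is zero and needs no storage */
static void
upsample2_halfband (const float *in, int n_in, float *out,
                    const float *c, int n_taps)
{
  int center = n_taps / 2;
  for (int i = center; i + center < n_in; i++)
    {
      out[2 * i] = in[i];         /* kept exactly: "real interpolation" */
      float accu = 0.0;
      for (int k = 0; k < n_taps; k++)
        accu += c[k] * in[i - center + 1 + k];
      out[2 * i + 1] = accu;      /* interpolated between in[i], in[i+1] */
    }
}

A general (non-halfband) filter loses both shortcuts - every output
sample needs a full convolution with no zero coefficients - hence the
slowdown estimate above.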

> >>then, if the octave files and the paper you pasted from permit, it'd be good
> >>to put the relevant octave/matlab files into CVS under LGPL, so the
> >>coefficient creation process can be reconstructed later on (and by other
> >>contributors).
> >
> >I've asked the author of the paper, and he said we can put his code in
> >our LGPL project. I still need to put some polishing into the octave
> >code, because I somewhat broke it when porting it from matlab to octave.
> 
> ugh. i think putting just the link to the paper into our docs or the code
> comments would be enough, given it is publicly available. but we can mirror
> it on site if we got permission for redistribution and there is reason to
> believe the original location may be non-permanent in any way.

I wasn't speaking about the paper, but only about the octave/matlab
source code given in the paper (for ultraspherical windows). He allowed
us to redistribute it, and that's what I want to do, so that everybody
can reproduce how we designed the filter coefficients.

> 
> >>>Linear phase filtering means three things:
> >>>
> >>>* we do "real interpolation", in the sense that for factor 2 upsampling,
> >>>every other sample is exactly kept as it is; this means that we don't
> >>>have to compute it
> >>>
> >>>* we keep the shape of the signal intact, thus operations that modify
> >>>the shape of the signal (non-linear operations, such as saturation)
> >>>will sound the same when oversampling them
> >>>
> >>>* we have the same delay for all frequencies - not having the same
> >>>delay for all frequencies may result in audible differences between
> >>>the original and up/downsampled signal
> >>>
> >>>  http://en.wikipedia.org/wiki/Group_delay
> >>>
> >>>gives a table, which however seems to indicate that "not being quite"
> >>>linear phase wouldn't lead to audible problems
> >>
> >>ok, thanks for explaining this. we should have this and similar things
> >>available in our documentation actually. either on a wiki page on synthesis,
> >>or even a real documentation chapter about synthesis. thoughts?
> >
> >Maybe a new doxi file on synthesis details? I could write a few
> >paragraphs on the resampler.
> 
> that'd be good. will you check that in to docs/ then?
> one file that may be remotely suitable is:
>   http://beast.gtk.org/architecture.html
> but it's probably much better to just start synthesis-details.doxi.

Will do.

> >Oversampling is first upsampling a 44100 Hz signal to 88200 Hz, and then
> >downsampling it again to 44100 Hz. It's what I first designed the filters
> >for: for oversampling the engine. Thus I benchmarked it as a separate
> >case.
> 
> hm, i still don't have a good idea if we won't need n-times oversampling
> for the whole engine. basically, because i have a rough idea on what usual
> input output rates are or could be (44.1k, 48k, 88.2k, 96k), but not what
> good rates are to run the synthesis engine at (48K, 56K, 66.15K, 64K, 
> 72K)...

Well, I've at least already thought about accelerated FIR based
implementations: you can accelerate any rate change by a rational
factor (P/Q), e.g. 3/2 or 5/4 upsampling (because you can build
complete coefficient tables then).

However, the code will be quite a bit slower than factor 2 upsampling,
for two reasons:

 * you cannot copy samples from the original signal as often as you
   can for factor 2 upsampling
 * for rational rates, you cannot use half band filters, so you don't
   get filters with every other coefficient zero

So in the worst case, we have a performance loss of a factor of 2 from
the first point and another factor of 2 from the second. But of course
these are estimates and can't replace real benchmarking of an
implementation once it exists.
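
For illustration, here is a rough sketch (hypothetical code, not BSE) of
such a rational P/Q resampler with precomputed polyphase coefficient
tables; the prototype low pass filter h is split into P phases of TAPS
coefficients each, and border handling is omitted:

enum { TAPS = 16 };     /* taps per phase, arbitrary for this sketch */

static void
resample_rational (const float *in, float *out, int n_out,
                   const float *h,   /* P * TAPS prototype coefficients */
                   int P, int Q)
{
  for (int n = 0; n < n_out; n++)
    {
      int phase = (n * Q) % P;      /* which coefficient subset */
      int pos   = (n * Q) / P;      /* input read position */
      float accu = 0.0;
      /* no samples can be copied and no coefficients are zero here,
       * which is exactly the slowdown described above */
      for (int k = 0; k < TAPS; k++)
        accu += h[phase + k * P] * in[pos + k];
      out[n] = accu;
    }
}

With P = 2, Q = 1 and a half band prototype, one phase degenerates into
a plain copy (only the center tap is nonzero) and the other holds the
remaining nonzero coefficients, which recovers the cheap factor 2 case.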

> >>>As you see, the variant which uses doubles for intermediate values is
> >>>not much better than the SSE variant, and both fulfill the spec without
> >>>problems.
> >>
> >>have you by any chance benched the FPU variant with doubles against the
> >>FPU variant with floats btw?
> >
> >Well, I tried it now: the FPU variant without doubles is quite a bit (15%)
> >faster than the variant which uses doubles as intermediate values.
> >
> >If you want *really cool* speedups, you can use gcc-4.1 with float
> >temporaries -ftree-vectorize and -ffast-math. That auto vectorization
> >thing really works, and replaces the FPU instructions with SSE
> >instructions automagically. It's not much slower than my hand-crafted
> >version. But then again, we wanted a FPU variant to have a FPU variant,
> >right?
> 
> erm, i can't believe your gcc did that without also specifying a
> processor type...

Well, I never have to specify a processor type, because my gcc only
supports one: native AMD64 code. But you are of course right in the
sense that my gcc always produces code which is perfectly optimized for
the processor it will run on, and my gcc always knows about all the
instructions my processor supports, and so on.

> and when we get processor specific, we have to provide alternative
> compilation objects and need a mechanism to clearly identify and
> select the required instruction sets during runtime.

I understand why you don't want to support _many_ object files per
algorithm. However, it may be reasonable to support _two_ object files
per algorithm: one compiled with -msse -ftree-vectorize and one without.
This would also need to be done for Bse::Resampler. It may be reasonable
to do it at least for common algorithms, like scaling a float block by a
float value or adding two float blocks together.

And we need a runtime check for whether SSE is available. But that's
not a problem, because there is one in arts/flow/cpuinfo.* that we can
simply copy and paste.
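
As a sketch of how the check and the per-algorithm dispatch could fit
together (hypothetical names throughout; __get_cpuid() is a GCC
convenience used here in place of the arts code, which performs an
equivalent runtime test):

#include <cpuid.h>        /* GCC's __get_cpuid() */
#include <stdbool.h>

static bool
cpu_has_sse (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;
  return (edx >> 25) & 1;         /* CPUID feature bit 25: SSE */
}

/* hypothetical: the same algorithm from the plain and the SSE object */
void block_scale_fpu (float *block, float factor, unsigned int n);
void block_scale_sse (float *block, float factor, unsigned int n);

/* resolved once at startup, called through a function pointer later */
void (*block_scale) (float *block, float factor, unsigned int n);

static void
block_init (void)
{
  block_scale = cpu_has_sse () ? block_scale_sse : block_scale_fpu;
}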

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan


