Sample Accurate Reconstruction with PV_RecordBuf and PV_BufRd

Context:
I’m currently working on a synth that will take in a bunch of impulse responses (HRTFs) and apply them to a set of channels for binaural rendering. I could simply use Convolution.ar to achieve this, but it would result in a lot of unnecessary FFTs & IFFTs and is pretty computationally expensive with the number of channels I’m using. Instead, I want to do one FFT per input channel, do the necessary operations in the frequency domain, then do one IFFT per ear.

As far as I understand it, the only way this would be possible is if I use the FFT / PV UGens. The solution that I’m looking at is to:

  1. Play back each impulse response (size of one frame) and store the frequency response using PV_RecordBuf
  2. In a separate synth, take the input channel, FFT it, convolve with the previously saved frequency response through PV_BufRd and PV_Mul, sum with the other channels for the respective ear, then IFFT.
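A rough sketch of what I mean by step 2, for one input channel and one ear (hedged: \convOneChannel, inBus and irSpectrum are made-up names, and 1024 is just an assumed frame size; irSpectrum is a buffer assumed to have been filled by the analysis in step 1):

```supercollider
(
SynthDef(\convOneChannel, { |out = 0, inBus = 0, irSpectrum|
    var in, chain, ir;
    in = In.ar(inBus, 1);
    chain = FFT(LocalBuf(1024), in);               // one FFT per input channel
    ir = PV_BufRd(LocalBuf(1024), irSpectrum, 0);  // read back the stored spectrum
    chain = PV_Mul(chain, ir);                     // complex multiply in freq domain
    Out.ar(out, IFFT(chain));                      // Out.ar sums with the other channels on the ear bus
}).add;
)
```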

Issue:
In the process of trying out the above solution, I’ve noticed that I can’t seem to get my impulse responses properly stored / played back by PV_RecordBuf and PV_BufRd. I seem to be able to get things that are similar, but nothing that is sample accurate.

Here is a code snippet that helps illustrate what I’m talking about. This is a modification of the PV_BufRd sample:

(
var sf;
// path to a sound file here
p = Platform.resourceDir +/+ "sounds/a11wlk01.wav";
// the frame size for the analysis - experiment with other sizes (powers of 2)
f = 1024;
// the hop size
h = 0.25;
// window type
w = 0;
// get some info about the file
sf = SoundFile.new( p );
sf.openRead;
sf.close;
// allocate memory to store FFT data to... SimpleNumber.calcPVRecSize(frameSize, hop) will return
// the appropriate number of samples needed for the buffer
y = Buffer.alloc(s, sf.duration.calcPVRecSize(f, h));
// allocate the soundfile you want to analyze
z = Buffer.read(s, p);
)

// this does the analysis and saves it to buffer 1... frees itself when done
(
SynthDef("pvrec", { arg recBuf=1, soundBufnum=2;
    var in, chain, bufnum;
    bufnum = LocalBuf.new(f);
    Line.kr(1, 1, BufDur.kr(soundBufnum), doneAction: 2);
    in = PlayBuf.ar(1, soundBufnum, BufRateScale.kr(soundBufnum), loop: 0);
    // note the window type and overlaps... this is important for resynth parameters
    chain = FFT(bufnum, in, h, w);
    chain = PV_RecordBuf(chain, recBuf, 0, 1.0, 0.0, h, w);
    // no output ... simply save the analysis to recBuf
    }).add;
)
a = Synth("pvrec", [\recBuf, y, \soundBufnum, z]);


// play your analysis back ... see the playback UGens listed above for more examples.
(
SynthDef("pvplay", { arg out=0, recBuf=1;
    var in, chain, bufnum;
	bufnum = LocalBuf.new(f);
	chain = PV_BufRd(bufnum, recBuf, Line.ar(dur:BufDur.ir(z)));
	Out.ar(out, IFFT(chain, w).dup);
    }).add;
);
b = Synth("pvplay", [\out, 0, \recBuf, y]); // the SynthDef uses a LocalBuf, so no \bufnum arg

// Plot 2 frames of the signal
// The PlayBuf plot has been delayed to align it with the PV_BufRd playback
// These signals sound the same, but look like their phase is different.
(
{[
	IFFT(PV_BufRd(LocalBuf.new(f), y, Line.ar(dur:BufDur.ir(z))), w),
	DelayL.ar(PlayBuf.ar(1, z, BufRateScale.kr(z), loop: 0), delaytime:(f * h)-64 * SampleDur.ir)
]
}.plot(2 * f / 48000)
)

My expectation with this code snippet is that recording the sound file with PV_RecordBuf and playing it back with PV_BufRd should return the original signal. Instead, it looks like the phase has been jumbled in some way. Changing the hop size seems to significantly change how it sounds too. Here is the plot I’m generating with the last code block:

Is there something I’m missing here? Does PV_RecordBuf not handle all of the phase information? Or is there some weird misalignment with the frame size, hop size, window type, etc? I’ve tried lots of combinations and can’t seem to come up with the answer.

Any help with the specific PV_RecordBuf / PV_BufRd issue or an alternate solution to the binaural decoder would be greatly appreciated.

IIRC PV_BufRd does apply a phase increment based on the phase difference (per bin) from one frame to the next, so it probably isn’t reliable to reproduce exactly the same frame over and over.

Convolution2 would at least not repeatedly FFT the fixed kernel.
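For illustration, a minimal Convolution2 sketch (assuming ~ir is a one-channel buffer holding a time-domain impulse response of 1024 frames and b is a source buffer; with trigger held at 0 the fixed kernel should only be FFT'd when a trigger fires, not every block):

```supercollider
(
{
    var sig = PlayBuf.ar(1, b, loop: 1);
    // framesize matches the kernel length; the kernel spectrum is reused across blocks
    Convolution2.ar(sig, ~ir, trigger: 0, framesize: 1024) * 0.2 ! 2
}.play;
)
```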

You can also load a buffer with the kernel in FFT’s format (which I would have to recheck in the source code) and use FFTTrigger to read it. Then PV multiply.

hjh


Here’s a working(?) example.

First, I wanted to test with a simple impulse as the kernel. This should at least sound transparent.

s.boot;

n = 1024;
~halfN = n div: 2;
~cos = Signal.fftCosTable(n);

// first test with a neutral IR (impulse)
~timeDomain = Signal.newClear(n).put(0, 1);

~fft = ~timeDomain.fft(Signal.newClear(n), ~cos);

(
~packed = [~fft.real[0 .. ~halfN - 1], ~fft.imag[0 .. ~halfN - 1]]
.collect(_.as(Array)).lace(n);

~packed[1] = ~fft.real[~halfN];  // nyquist is real-only
)

k = Buffer.sendCollection(s, ~packed, 1);

b = Buffer.read(s, Platform.resourceDir +/+ "sounds/a11wlk01.wav");

// should be transparent
(
a = {
	var sig = PlayBuf.ar(1, b, loop: 1);
	var fft = FFT(LocalBuf(n), sig);
	var kernel = FFTTrigger(k, polar: 0);  // filled above w/ complex data
	fft = PV_MagMul(fft, kernel);
	(IFFT(fft) * 0.2).dup
}.play;
)

(The buffer format is: DC real, Nyquist real, bin 1 real, bin 1 imag, bin 2 real, bin 2 imag … bin n/2-1 real, bin n/2-1 imag.)

It sounds OK though I’m not sure about plotting.

Then try another IR, such as one recorded from a filter.

t = Buffer.alloc(s, n, 1);

// record a time-domain IR
(
a = {
	var impulse = Impulse.ar(0);
	var response = Ringz.ar(impulse, 600, n/SampleRate.ir);
	RecordBuf.ar(response, t, loop: 0, doneAction: 2);
	response.dup
}.play;
)

t.getToFloatArray(wait: -1, action: { |data| ~timeDomain = data.as(Signal); "done".postln });  // fft below needs a Signal, not a FloatArray

// same procedure
~fft = ~timeDomain.fft(Signal.newClear(n), ~cos);

(
~packed = [~fft.real[0 .. ~halfN - 1], ~fft.imag[0 .. ~halfN - 1]]
.collect(_.as(Array)).lace(n);

~packed[1] = ~fft.real[~halfN];  // nyquist is real-only
)

// k is already allocated
k.sendCollection(~packed);

// same synth, should now be resonant-filtered
// Ringz is loud! So I pulled the volume back
(
a = {
	var sig = PlayBuf.ar(1, b, loop: 1);
	var fft = FFT(LocalBuf(n), sig);
	var kernel = FFTTrigger(k, polar: 0);  // filled above w/ complex data
	fft = PV_MagMul(fft, kernel);
	(IFFT(fft) * 0.03).dup
}.play;
)

PS I suggest PV_MagMul for this, not PV_Mul: “Multiplies magnitudes of two inputs and keeps the phases of the first input” (emph. mine).

hjh


Amazing! Thank you for the quick reply! This is exactly what I was looking for!

That’s definitely the detail I missed. Thanks for clarifying that!

The code you posted almost works. With a hop size of 0.5 and window type 0, the response I got was rotated in the frame (FFT shifted). The top plot is the output of PV_Mul applied to a Dirac impulse and the output of FFTTrigger; the bottom plot is the impulse response I’m trying to recreate:

I’m not exactly sure why this is; there is probably some detail somewhere else I missed. However, the easy workaround is to simply FFT-shift the impulse response before FFT’ing it, i.e. swap the first half of the time-domain response with the second half. Another way of looking at it is rotating the impulse response by half a frame. Here is the code snippet I used:

~halfN.do({
    arg i;
    ~timeDomain.swap(i, i + ~halfN);
});
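I believe the same shift can also be written with rotate, since for an even frame size rotating by half the frame swaps the two halves (using ~timeDomain and ~halfN as defined above):

```supercollider
// equivalent FFT shift in one line
~timeDomain = ~timeDomain.rotate(~halfN);
```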

With that in place, everything seems to be working! The impulse response seems to be slightly off, but I’m guessing that is related to the window settings I chose. It’s close enough to be passable for now.

One pitfall others might run into is that you have to be careful with how you use FFTTrigger. It’s important that the buffer referenced by FFTTrigger is not overwritten by other FFT / PV UGens. So for example, in PV_MagMul(fft, kernel), it’s important that the kernel is the second argument and not the first. I noticed this when trying to directly plot the output of FFTTrigger by IFFT’ing it.
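One way to sidestep the argument-order trap is PV_Copy, which copies a spectral frame into a scratch buffer so that later PV UGens write into the copy rather than into the kernel buffer. A sketch reusing k, b and n from the earlier example:

```supercollider
(
a = {
    var sig = PlayBuf.ar(1, b, loop: 1);
    var fft = FFT(LocalBuf(n), sig);
    // copy the stored kernel spectrum; downstream PV UGens now write into
    // the LocalBuf copy, leaving the buffer behind FFTTrigger untouched
    var kernel = PV_Copy(FFTTrigger(k, polar: 0), LocalBuf(n));
    fft = PV_MagMul(fft, kernel);  // argument order no longer matters for k
    (IFFT(fft) * 0.2).dup
}.play;
)
```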

The rest of this reply is going to respond to some minor comments you made and is not really relevant to the heart of the post, but if you’re curious about the binaural decoder, read on.

Unfortunately, not repeating the FFT of the fixed kernel is not a huge help here. With even just 16 channels of input, the duplicated IFFTs and FFTs (each input channel is convolved with a different stereo impulse response) can be pretty compute intensive.

When using HRTFs, the phase relationship between two different kernels is important, so discarding it with MagMul doesn’t help here.
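For what it’s worth, the per-ear structure I’m describing looks roughly like this, as a sketch with made-up names (\binauralEar, inBus, and ~hrtfSpectra, an array of spectrum buffers in FFTTrigger’s format for this ear; n and channel count are fixed at SynthDef build time). PV_Mul keeps the complex product and PV_Add sums the chains, so each ear ends with a single IFFT:

```supercollider
(
~numCh = 16;  // hypothetical channel count, baked in at build time
SynthDef(\binauralEar, { |out = 0, inBus = 0|
    var sum;
    ~numCh.do { |i|
        var sig = In.ar(inBus + i, 1);
        var fft = FFT(LocalBuf(n), sig);  // one FFT per input channel
        // copy so the stored spectrum isn't overwritten by the PV ops below
        var kernel = PV_Copy(FFTTrigger(~hrtfSpectra[i], polar: 0), LocalBuf(n));
        fft = PV_Mul(fft, kernel);        // complex multiply preserves phase
        sum = if(sum.isNil) { fft } { PV_Add(sum, fft) };
    };
    Out.ar(out, IFFT(sum));              // one IFFT per ear
}).add;
)
```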

Cool! Nice that the offset worked.

That makes sense. I misinterpreted your comparison of the plotted waveforms.

hjh