Simple HRTF implementation?

ortensie · December 27, 2020, 9:36pm

Hello folks!

Over these holidays I’ve been digging into spatialisation for a project I’m currently working on.
Until now I’ve only used super-simple spatialisation, like Pan and Pan2, but now I’m trying to push a little bit further.

What I’m working on is basically a system with 32 audio sources, each of which requires its own independent spatialisation. These 32 sources are organised into 8 SynthDefs.

I’ve already hacked around with some of the examples in the ATK, but they seem too complicated and computationally expensive for my needs, as I don’t need to render a 3D space, just a 2D one (front-back, left-right).
Moreover, I can’t get these encoders/decoders to work within SynthDefs (I admit I didn’t spend much time on it, but anyway…)

I was wondering if someone could suggest a very simple technique to render a 2D space (even NOT very accurately) into a stereo recording.
Maybe a simple HRTF filter for front and back, interpolating between two stereo signals?

Thank you so much!
All the best,
Ardan.

Benu · December 27, 2020, 10:29pm

Not sure: do you want to output to 32 speakers, or do you start with 32 sources/synths and place them in a plane for 2-channel stereo?

Ambisonics is more for the first case. For the second there are certainly better solutions, though in SC I don’t know of any off-the-shelf one. Pan only adjusts amplitude; adding some delay would already give a stronger effect.

Purely hypothetical: if I had to start from scratch, I’d mess around with amplitude and delay between left and right. The distance between human ears plus some math gives the delay between left and right depending on the angle. Then lowpass-filter the ear facing away from the source to muffle it; the phase shift around the cutoff might increase the delay effect.
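A rough sketch of that amplitude + delay + lowpass idea, written in Python rather than SuperCollider purely to show the arithmetic. Assumptions not from this thread: an 8.75 cm head radius, the Woodworth spherical-head formula for the interaural delay, whole-sample delays, and a far-ear "head shadow" of 6 dB and a 3 kHz one-pole lowpass at the sides. All of these are illustrative guesses, not measured values.

```python
import math

# Assumed constants for illustration: average head radius and speed of sound.
HEAD_RADIUS_M = 0.0875
SPEED_OF_SOUND = 343.0


def woodworth_itd(azimuth):
    """Interaural time difference (seconds) for a rigid spherical head.

    azimuth in radians: 0 = front, +pi/2 = right, -pi/2 = left, pi = behind.
    A positive result means the sound reaches the right ear first.
    """
    # Fold rear angles onto the front half: this model gives the same delay
    # for a source and its front/back mirror image.
    theta = math.atan2(math.sin(azimuth), abs(math.cos(azimuth)))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """One-pole lowpass, used here to muffle the ear facing away from the source."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y = (1.0 - coeff) * x + coeff * y
        out.append(y)
    return out


def spatialise(mono, azimuth, sample_rate=48000,
               shadow_db=6.0, shadow_cutoff_hz=3000.0):
    """Return (left, right): the far ear is delayed, attenuated and lowpassed.

    shadow_db and shadow_cutoff_hz are illustrative guesses, not measured values.
    """
    itd = woodworth_itd(azimuth)
    delay = int(round(abs(itd) * sample_rate))      # whole-sample delay only
    lateral = abs(math.sin(azimuth))                # 0 at front/back, 1 at the sides
    if lateral < 1e-9:
        # Straight ahead or straight behind: both ears get the same signal.
        return list(mono), list(mono)
    gain = 10.0 ** (-shadow_db * lateral / 20.0)
    # Slide the far-ear cutoff from Nyquist (front) down to shadow_cutoff_hz (side).
    nyquist = sample_rate / 2.0
    cutoff = shadow_cutoff_hz + (nyquist - shadow_cutoff_hz) * (1.0 - lateral)
    far = [0.0] * delay + [gain * s for s in one_pole_lowpass(mono, cutoff, sample_rate)]
    near = list(mono) + [0.0] * delay               # pad so both channels match
    # itd > 0: right ear hears it first, so the left ear is the far one.
    return (far, near) if itd > 0 else (near, far)
```

Note that 30° and 150° produce identical output here, which is exactly why this trick alone can’t separate front from back.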

If you want some random or strange stereo effects, try several different allpass filters on L and R. Not for super-accurate localisation, but good for widening sounds and giving them some air, especially if they’re rich in harmonics or noise…
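A minimal sketch of the allpass idea, again in Python for illustration only: the same mono signal goes through a different chain of first-order allpass filters on each channel, with arbitrarily chosen coefficients. Each channel keeps a flat magnitude response but gets a different phase response, so L and R decorrelate. This is a generic first-order allpass, not a model of SuperCollider’s own allpass UGens.

```python
def allpass1(signal, a):
    """First-order allpass, |a| < 1: y[n] = -a*x[n] + x[n-1] + a*y[n-1]."""
    out, x1, y1 = [], 0.0, 0.0
    for x in signal:
        y = -a * x + x1 + a * y1
        out.append(y)
        x1, y1 = x, y
    return out


def decorrelate(mono, left_coeffs, right_coeffs):
    """Run the same mono signal through a different allpass chain per channel."""
    left, right = list(mono), list(mono)
    for a in left_coeffs:
        left = allpass1(left, a)
    for a in right_coeffs:
        right = allpass1(right, a)
    return left, right


# Arbitrary example coefficients; each chain is still allpass (unit energy gain).
impulse = [1.0] + [0.0] * 511
left, right = decorrelate(impulse, [0.6, -0.3, 0.8], [-0.5, 0.7, 0.2])
```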

ortensie · December 27, 2020, 10:42pm

Yes, I just have 32 sources to be placed into a 2ch stereo out. Sorry if I wasn’t clear!

Pan works OK for left-right placement; I’m just struggling with rendering front and back…

Benu · December 27, 2020, 10:53pm

I was wrong about the ATK: I just discovered that it seems to have HRTF-based binaural decoding, so that’s probably the most obvious solution. Encoding/decoding etc. is a hassle if you don’t need ambisonic compatibility, and you lose some information too.

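If you want front/back, the delay idea won’t do it either: a source in front and its mirror image behind reach the two ears with (nearly) the same time difference. This probably needs some FIR voodoo.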

I wonder if one could extract the HRTFs from the ATK source?

jamshark70 · December 27, 2020, 11:26pm

I’ve found that the UHJ decoder(s?) in the ATK distinguish subtly between front and back, and UHJ requires less precision in choosing the kernel for the listener’s specific head and ear shape.

As for reducing computational complexity: 3D ambisonic B-format has 4 channels: “whole” (W, analogous to the Mid in Mid-Side format) and X, Y, Z channels for the deviation from W towards front, left and above respectively. If you’re not using up/down, then the Z channel will always be 0, so the ambisonic panner doesn’t have to calculate it. See PanB2, which should be faster than a full rotation matrix.
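To make the W/X/Y/Z description concrete, here’s a small Python sketch of first-order horizontal encoding. It assumes the traditional FuMa-style weighting (W scaled by 1/√2) and azimuth measured counter-clockwise from the front, so +90° is hard left. That’s roughly what a horizontal panner like PanB2 computes, but PanB2’s exact argument ranges and sign conventions are not reproduced here.

```python
import math


def encode_horizontal(signal, azimuth):
    """Encode a mono signal (list of samples) into first-order B-format.

    azimuth in radians, counter-clockwise from the front (pi/2 = hard left).
    Returns (W, X, Y, Z); for horizontal-only panning Z is all zeros, so a
    2D panner never needs to compute it.
    """
    w_gain = 1.0 / math.sqrt(2.0)       # FuMa-style W weighting (assumed)
    x_gain = math.cos(azimuth)          # front (+) / back (-)
    y_gain = math.sin(azimuth)          # left (+) / right (-)
    w = [w_gain * s for s in signal]
    x = [x_gain * s for s in signal]
    y = [y_gain * s for s in signal]
    z = [0.0] * len(signal)
    return w, x, y, z


# A source straight behind: X is negative, Y is (almost) zero.
w, x, y, z = encode_horizontal([1.0, -0.5], math.pi)
```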

Decoding the 3D signal using HRTF or UHJ is then:

- Convolve W against W kernels (L and R).
- Convolve X against X kernels.
- Convolve Y against Y kernels.
- And Z…
- Mix the 4 stereo signals.

If Z is 0, then its convolutions will be 0 too, so you can skip them and do 6 convolutions instead of 8. You could steal the logic from the ATK and just drop Z.
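The decode steps with Z dropped can be sketched in Python as follows. The kernels are placeholders: in practice they’d come from the ATK’s HRTF (or UHJ) decoder, and `kernels` here is a hypothetical dict of (left, right) FIR pairs, all of equal length, with W, X and Y also of equal length. The direct-form convolution is only for clarity; it’s far too slow for real-time use.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution; output length len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def decode_horizontal(w, x, y, kernels):
    """Binaural decode of horizontal B-format with 6 convolutions instead of 8.

    kernels: hypothetical {"W": (left_fir, right_fir), "X": (...), "Y": (...)}.
    Z is assumed silent, so its two convolutions are simply skipped.
    """
    n_out = len(w) + len(kernels["W"][0]) - 1
    left, right = [0.0] * n_out, [0.0] * n_out
    for name, channel in (("W", w), ("X", x), ("Y", y)):
        k_left, k_right = kernels[name]
        # Convolve this B-format channel with its left and right kernels,
        # then mix the result into the stereo output.
        for acc, kernel in ((left, k_left), (right, k_right)):
            for i, v in enumerate(convolve(channel, kernel)):
                acc[i] += v
    return left, right
```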

(Also, in Ambisonics you can mix the B-format signals directly and decode the mix. Convolution is expensive, so doing 6 convolutions on the mixdown rather than 6 × n, one set per source, will save a lot of CPU time.)
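Because encoding and convolution are both linear, decoding the summed B-format gives the same result as decoding each source and summing. A quick Python check of that claim, using random test signals and random placeholder kernels (hypothetical stand-ins for real HRTF kernels) and the same assumed horizontal encoding as the earlier sketches:

```python
import math
import random


def convolve(signal, kernel):
    """Direct-form FIR convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def encode(signal, azimuth):
    """Horizontal first-order B-format (W, X, Y); Z omitted because it is zero."""
    gains = (1.0 / math.sqrt(2.0), math.cos(azimuth), math.sin(azimuth))
    return [[g * s for s in signal] for g in gains]


def decode(bformat, kernels):
    """6 convolutions: each of W, X, Y against its (left, right) kernel pair."""
    n_out = len(bformat[0]) + len(kernels[0][0]) - 1
    left, right = [0.0] * n_out, [0.0] * n_out
    for channel, (k_left, k_right) in zip(bformat, kernels):
        for acc, kernel in ((left, k_left), (right, k_right)):
            for i, v in enumerate(convolve(channel, kernel)):
                acc[i] += v
    return left, right


def mix(a, b):
    """Sample-wise sum of two equal-length multichannel signals."""
    return [[p + q for p, q in zip(ca, cb)] for ca, cb in zip(a, b)]


rng = random.Random(0)
kernels = [([rng.uniform(-1, 1) for _ in range(16)],
            [rng.uniform(-1, 1) for _ in range(16)]) for _ in range(3)]
sources = [([rng.uniform(-1, 1) for _ in range(64)], rng.uniform(-math.pi, math.pi))
           for _ in range(8)]

# Option 1: decode every source separately (6 convolutions per source).
per_source = [decode(encode(sig, az), kernels) for sig, az in sources]
sum_left = [sum(vals) for vals in zip(*(l for l, _ in per_source))]
sum_right = [sum(vals) for vals in zip(*(r for _, r in per_source))]

# Option 2: mix in B-format first, then decode once (6 convolutions total).
bus = encode(*sources[0])
for sig, az in sources[1:]:
    bus = mix(bus, encode(sig, az))
mix_left, mix_right = decode(bus, kernels)

convolutions_separate = 6 * len(sources)   # 48 for 8 sources
convolutions_mixed = 6
```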

hjh

totalgee · December 30, 2020, 5:22pm

@jamshark is right that you should just mix the (four-channel) B-format signals and only decode at the end. I’ve not found it particularly expensive, and 32 independent sources spatialized this way (even without trying his optimizations) is definitely doable on modern laptops, especially if you’re just using the “matrix” encoders and transformations (vs. slightly heavier “kernel” encoders).

If your specific source locations were “fixed” (not moving) and there were only a few of them, you could consider doing the convolution yourself (see an example here that “transforms” eight fixed virtual speakers to binaural using direct convolution with the HRTFs of your choice)…

…but that’s a bit of a nuisance. I’d recommend you use the ATK for this (even if you’re only panning in the horizontal plane), just because it’s so easy to work with. Another example here assumes all your audio output will be in B-format, and just does decoding to stereo (using the ATK’s binaural decoder) as the last thing on the Server, so the final audio output is two-channel stereo.
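For comparison, the fixed-virtual-speaker route (direct convolution with HRTFs) can be sketched in Python like this. It is not the code from the example referred to above. Assumptions: eight speakers evenly spaced with speaker 0 at the front, constant-power pairwise panning between adjacent speakers, and `hrirs` as a hypothetical list of (left, right) impulse-response pairs from whatever HRTF set you choose. Sources are mixed onto the speaker buses first, so the cost stays at 16 convolutions no matter how many sources there are.

```python
import math


def convolve(signal, kernel):
    """Direct-form FIR convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def speaker_gains(azimuth, n_speakers=8):
    """Constant-power pairwise panning between evenly spaced virtual speakers.

    Speaker 0 is at the front; azimuth in radians, counter-clockwise.
    """
    spacing = 2.0 * math.pi / n_speakers
    pos = (azimuth % (2.0 * math.pi)) / spacing
    lower = int(pos) % n_speakers
    upper = (lower + 1) % n_speakers
    frac = pos - math.floor(pos)
    gains = [0.0] * n_speakers
    gains[lower] += math.cos(frac * math.pi / 2.0)
    gains[upper] += math.sin(frac * math.pi / 2.0)
    return gains


def render_binaural(sources, hrirs):
    """sources: list of (signal, azimuth), all signals the same length.
    hrirs: hypothetical list of (left_ir, right_ir), one per speaker, equal lengths.
    """
    n_speakers = len(hrirs)
    length = len(sources[0][0])
    buses = [[0.0] * length for _ in range(n_speakers)]
    # 1. Pan every source onto the virtual speaker buses (cheap gains only).
    for signal, azimuth in sources:
        for bus, g in zip(buses, speaker_gains(azimuth, n_speakers)):
            for i, s in enumerate(signal):
                bus[i] += g * s
    # 2. Convolve each bus with its HRIR pair: 2 * n_speakers convolutions total.
    n_out = length + len(hrirs[0][0]) - 1
    left, right = [0.0] * n_out, [0.0] * n_out
    for bus, (ir_left, ir_right) in zip(buses, hrirs):
        for acc, ir in ((left, ir_left), (right, ir_right)):
            for i, v in enumerate(convolve(bus, ir)):
                acc[i] += v
    return left, right
```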

The examples are from a “binaural livecoding” mini-workshop several of us did in November. Some of the examples are related to SuperDirt (because many of the attendees were TidalCycles users without much SC experience), but you can skip the SuperDirt bits to use with “regular” SuperCollider… (Also, you don’t need to use an Ndef this way to do the decoding – the only reason I prototyped it that way was for the ease of changing things on the fly while experimenting.) For a project or performance, I would simplify things by using Synths and Groups directly to manage the encoding/decoding setup (you can use things like a custom ServerTree (or s.tree) function to rebuild the desired configuration after a reboot or Cmd-period).

Glen.