# Sound Spatialization

Hi -

I’m somewhat new to SuperCollider, but this question is probably more generally about sound design.

I’d like to be able to mix different sounds in “virtual” 3D space… specifically, mixing mono sources so that they can sound three-dimensional to a person wearing headphones. I’m looking to understand this process a little better.

My understanding is that this is all based on HRTFs - and that the main way of handling it is to use the Ambisonic Toolkit (ATK), going through a process like the one outlined in this thread:
[Simple HRTF implementation? - #5 by jamshark70](Simple HRTF Implementation)

This thread got very complex very quickly, though - and it seems to involve multiple encodings and decodings. Essentially, a mono signal would be converted into a four-channel signal (B-format) and then decoded into a stereo binaural sound… Are there simpler ways of doing this? And for that matter, are there “better ways” of doing this?

I know there are a handful of VST plugins to do some version of this process - some are free, some cost thousands of dollars. I’m interested to know more about what makes for those distinctions and what kinds of possibilities are available in SuperCollider to do this kind of work.

Thank you a ton!

The theory of ambisonics is incredibly complex… Using the ATK is simple. That’s how you know it’s good!

Have a look at using FoaEncode with a direction, or even the built-in SuperCollider panner, PanB. Then you use a binaural decoder and supply it an HRTF. HRTFs are included with the ATK - just make sure you follow the install instructions for downloading the matrices.

Post back if you get stuck!
J
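For the built-in panner route, here is a minimal sketch - assuming PanB (SuperCollider's built-in first-order B-format panner) is the one meant, feeding an ATK binaural decoder. PanB's channel ordering matches the ATK's FOA (W, X, Y, Z), but its gain conventions may differ slightly from FoaPanB, so treat this as a sketch rather than a drop-in:

```supercollider
// Built-in PanB encoder -> ATK binaural decoder.
// Assumes the ATK kernels are installed.
s.waitForBoot {
    var kernel = FoaDecoderKernel.newListen; // HRTF decoder; loads kernels into buffers

    s.sync; // wait until the kernels are loaded

    {
        var mono = PinkNoise.ar * -15.dbamp;
        var ambi = PanB.ar(mono, \azimuth.kr(pi/4), \elevation.kr(0)); // 4-ch B-format
        FoaDecode.ar(ambi, kernel); // stereo binaural out
    }.play;
};
```

As with the kernel examples further down the thread, remember to call `kernel.free` when you are done with the decoder.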

To give an example…

Using FoaDecoderMatrix - you can just whack it right into FoaDecode.ar - it's just numbers.

{
    var mono = SinOsc.ar(220) * -15.dbamp;
    var ambi = FoaPanB.ar(mono, \theta.kr(pi/2), 0);
    var stereo = FoaDecode.ar(ambi, FoaDecoderMatrix.newStereo);
    Out.ar(0, stereo);
}.play


Using FoaDecoderKernel - you need to allocate a buffer for the convolution.

s.waitForBoot {
    // kernel decoders load their convolution kernels into buffers,
    // so allocate outside the SynthDef and sync with the server
    var kernel = FoaDecoderKernel.newUHJ;

    s.sync;

    SynthDef(\ambi_test, {
        var mono = SinOsc.ar(220) * -15.dbamp;
        var ambi = FoaPanB.ar(mono, \theta.kr(pi/2), 0);
        var stereoUHJ = FoaDecode.ar(ambi, kernel);
        Out.ar(0, stereoUHJ);
    }).play().onFree({ kernel.free });
};


There are two collections of HRTFs included with the ATK - see the help for FoaDecoderKernel.newListen for more info.

s.waitForBoot {
    var kernel = FoaDecoderKernel.newListen(1002); // see the help for more HRTFs

    s.sync;

    SynthDef(\ambi_test, {
        var mono = SinOsc.ar(220) * -15.dbamp;
        var ambi = FoaPanB.ar(mono, \theta.kr(pi/2), 0);
        var binaural = FoaDecode.ar(ambi, kernel);
        Out.ar(0, binaural);
    }).play().onFree({ kernel.free });
};


My attempt: a plain acoustic path simulation, done in a somewhat naive way. Certainly not bug-free.

/*
Simulation of room acoustics for first reflections.

3DSpace left-handed:
-x -> x left to right,
-y -> y bottom to top,
-z -> z from viewer into the screen.

room, birds eye view:
+------------front--------------+
|                               |
|   s1         s3               |
|        s2            s4  s5   |

.                               .
.left                      right.
.                               .

^  |           ol   or             |
|  |                               |
z  |                               |
+-------------back--------------+
/  x -> width
/
y = up

source:      synth
|
--------------------------------
|    |    |    |    |    |    | --> direct sound & 6 first reflections

filter & attn: sound radiation pattern

|    |    |    |    |    |    |

room

||   ||   ||   ||   ||   ||   ||--> 2x ds & first reflections, 2 observers

(filter & attn: room properties at reflection point)
distance delay
distance attn

||   ||   ||   ||   ||   ||   ||

filter & attn: 'mic' pattern      TODO

||   ||   ||   ||   ||   ||   ||
-+----+----+----+----+----+----+---- left  -> reverb
|    |    |    |    |    |    |
+----+----+----+----+----+----+----- right -> reverb

"pseudo cardioid" ray intersection

Center the virtual microphone at c=<0,0,0>. Then move the virtual source by
subtracting the actual mic position from it.
Assume mic sphere radius = 1.
Calculate the intersection of the ray from the virtual source with the mic
sphere. Use the first intersection point.
Calculate the distance from that intersection point to x. The result is the
attenuation / directivity of the microphone. Divide by two, the mic sphere
diameter?
Moving x closer to, or beyond, the boundary of the sphere gives a more
directive microphone.

.  .              .  .
.        .        .        .
.          .      .          .
c                  c
.          .      .          .
.      x .        . x      .
.  .              .  .

*/

(
~vdot = {
arg v1, v2;
(v1.x*v2.x) + (v1.y*v2.y) + (v1.z*v2.z)
};

~vlength = {
arg v1;
~vdot.value(v1, v1).sqrt
};

~pdistance = {
arg p1, p2;
~vlength.value(p2 - p1);
};

~vnormalize = {
arg v;
v / ~vlength.value(v);
};

~roomReflection = {
/*
calculates the (virtual)position, distance, delay, amplitude decay for a
source from the observer-pov for the direct sound and the first re-
flections. A rectangular/box room is assumed.

args:
observer(Cartesian): location of an observer in the room
source  (Cartesian): location of a source in the room
room    (Cartesian): The room is assumed from <0,0,0> to
<width, height, length> left hand coordinate system
so only the 'opposite' corner to the origin has to
be specified.
result:
vsources dict with (Cartesian) observer, source, room and arrays with
virtual sources, distance to observer, distance dependent delay and distance
dependent attenuation.
*/

| observer, source, room |

var posarr;
var vsources = Dictionary.new(n: 8);
// coordinates of direct sound and first reflections
vsources.put(\observer, observer);
vsources.put(\source, source);
vsources.put(\room, room);
posarr = Array[
// direct sound
source,
// virtual source first floor reflection
Cartesian(source.x, -1 * source.y, source.z),
// ceiling
Cartesian(source.x, room.y + (room.y - source.y), source.z),
// left
Cartesian(-1 * source.x, source.y, source.z),
// right
Cartesian(room.x + (room.x - source.x), source.y, source.z),
// back
Cartesian(source.x, source.y, -1 * source.z),
// front
Cartesian(source.x, source.y, room.z + (room.z - source.z))
];
vsources.put(\pos, posarr);
// distance to observer
vsources.put(\dist, vsources[\pos].collect({arg item; observer.dist(item)}));
// delay
vsources.put(\delay, vsources[\dist].collect({arg item; item / 340})); //speed of sound m/s
// distance amplitude attenuation
vsources.put(\attn, vsources[\dist].collect({arg item; 1 / item}));
};

~micAttenuation = {
/*
args:
vsources (Dict)     : result dict from ~roomReflection
opattern (Cartesian): vector that defines the  observer/mic pattern

result:
vsources dict with the addition of opattern and the microphone AOI dependent
attenuation.

TODO: frequency dependent pattern?
*/
| vsources, opattern |

var pos = vsources[\pos].collect(
{| item | item - vsources[\observer]}
);
var vec = pos.collect(
{| item | ~vnormalize.value(Cartesian(0,0,0) - item)}
);
var b = pos.collect(
{ | item, i | 2 * (~vdot.value(item, vec[i]))}
);
var c = pos.collect(
    {| item | ~vdot.value(item, item) - 1} // |p|^2 - r^2, with mic sphere radius r = 1
);
var delta = b.collect(
    {| item, i | (item * item) - (4 * c[i])}
);
var nd = b.collect(
    {| item, i | ((-1 * item) - (delta[i].sqrt)) / 2} // nearer root of t^2 + bt + c = 0: first intersection
);
var nip = nd.collect(   //nearest intersection point
{| item, i | pos[i] + (vec[i] * item)}
);
var micattn = nip.collect(
{ | item | item.dist(opattern)}
);
vsources.put(\micattn, micattn);
vsources.put(\opattern, opattern);
vsources
};

~stereo = {
| leftObserver, rightObserver, source, room |

var left = ~roomReflection.value(leftObserver, source, room);
var right = ~roomReflection.value(rightObserver, source, room);
left = ~micAttenuation.value(left,   Cartesian(0.0, 0.25,-0.7));
right = ~micAttenuation.value(right, Cartesian(0.0,-0.25,-0.7));
Dictionary.newFrom([
\left, left,
\right, right
]);
};
)
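As a quick numeric sanity check of the "pseudo cardioid" ray intersection described in the comment above (unit mic sphere at the origin, quadratic t^2 + bt + c = 0 with b = 2(p . d) and c = |p|^2 - 1): a virtual source at <0, 0, 3> aimed at the centre should first hit the sphere at distance |p| - 1 = 2. This sketch is self-contained and assumes, as the functions above do, that Cartesian supports arithmetic with scalars:

```supercollider
(
// sanity check of the ray -> unit-sphere intersection used for the mic pattern
var p = Cartesian(0, 0, 3);                 // virtual source, mic sphere centred at <0,0,0>
var len = (p.x.squared + p.y.squared + p.z.squared).sqrt;
var d = (Cartesian(0, 0, 0) - p) / len;     // unit ray direction, source -> mic centre
var b = 2 * ((p.x * d.x) + (p.y * d.y) + (p.z * d.z));
var c = (len * len) - 1;                    // |p|^2 - r^2, radius 1
var delta = (b * b) - (4 * c);
var tFirst = (b.neg - delta.sqrt) / 2;      // nearer root: first intersection
tFirst.postln; // expect 2.0: source is 3 away, sphere surface 1 from the centre
)
```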

(
// radiation pattern: one filtered copy of the source per path
// (direct sound & 6 first reflections); the name ~radiation is
// added here so the SynthDef below can use it
~radiation = {
    |sig|

    Array[
        sig,                                        // direct sound
        OnePole.ar(in: sig, coef: 0.75, mul: 0.65), // towards the floor
        OnePole.ar(in: sig, coef: 0.6, mul: 0.65),  // ceiling
        OnePole.ar(in: sig, coef: 0.3, mul: 0.8),   // left
        OnePole.ar(in: sig, coef: 0.4, mul: 0.8),   // right
        OnePole.ar(in: sig, coef: 0.85, mul: 0.6),  // back
        BLowPass.ar(in: sig, freq: 500, rq: 1.0, mul: 0.8) // front
    ]
};

SynthDef(\source, {
    | out |

    var sigl, sigr, room, rad;
    var env = EnvGen.kr(
        Env(
            levels: [0, 1, 0.0],
            times: [0.01, 0.03],
            curve: 0
        ),
        gate: Dust.kr(density: 5)
    );
    var sig = SinOsc.ar(freq: 1253) * SawDPW.ar(freq: 543) * env; // SawDPW is from sc3-plugins

    room = ~stereo.value(
        Cartesian( 9.5, 2,  4), //left observer
        Cartesian(10.5, 2,  4), //right observer
        Cartesian( 8,   3, 15), //source
        Cartesian(20,   6, 30)  //room
    );

    rad = ~radiation.value(sig); // 7 signals: direct sound + 6 reflections

    // maxdelaytime 0.2: the front reflection path can exceed 0.1 s here
    sigl = AllpassL.ar(
        in: rad, maxdelaytime: 0.2, delaytime: room[\left][\delay], decaytime: 0, mul: room[\left][\attn]
    );
    sigr = AllpassL.ar(
        in: rad, maxdelaytime: 0.2, delaytime: room[\right][\delay], decaytime: 0, mul: room[\right][\attn]
    );
    sigl = sigl * room[\left][\micattn];
    sigr = sigr * room[\right][\micattn];

    Out.ar(out, [sigl.sum, sigr.sum]); // mix the 7 paths down to one channel per ear
}).add;
)

Synth(\source);