# Sound Spatialization

Hi -

I’m somewhat new to SuperCollider, but this question is probably more generally about sound design.

I’d like to be able to mix different sounds in “virtual” 3D space… specifically, mixing mono sources so that they can sound three-dimensional to a person wearing headphones. I’m looking to understand this process a little better.

My understanding is that this is all based on HRTFs - and that the main way of handling it is to use the Ambisonic Toolkit (ATK), going through a process like the one outlined in this thread:
[Simple HRTF implementation? - #5 by jamshark70](Simple HRTF Implementation)

This thread got very complex very quickly, though - and it seems to involve multiple encodings and decodings. Essentially, a mono signal would be converted into a four-channel signal (B-format) and then decoded into a stereo binaural sound… Are there simpler ways of doing this? And for that matter, are there “better ways” of doing this?

I know there are a handful of VST plugins to do some version of this process - some are free, some cost thousands of dollars. I’m interested to know more about what makes for those distinctions and what kinds of possibilities are available in SuperCollider to do this kind of work.

Thank you a ton!

The theory of ambisonics is incredibly complex… Using the ATK is simple. That’s how you know it’s good!

Have a look at using FoaEncode with a direction, or even the built-in SuperCollider panner, PanB. Then you use a binaural decoder and supply it an HRTF. HRTFs are included with the ATK - just make sure you follow the install instructions for downloading the matrices.

Post back if you get stuck!
J
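For the built-in panner route, here is a minimal sketch - assuming PanB (SuperCollider's built-in first-order B-format panner) is the one meant, feeding an ATK binaural decoder. PanB's channel ordering matches the ATK's FOA (W, X, Y, Z), but its gain conventions may differ slightly from FoaPanB, so treat this as a sketch rather than a drop-in:

```supercollider
// Built-in PanB encoder -> ATK binaural decoder.
// Assumes the ATK kernels are installed.
s.waitForBoot {
    var kernel = FoaDecoderKernel.newListen; // HRTF decoder; loads kernels into buffers

    s.sync; // wait until the kernels are loaded

    {
        var mono = PinkNoise.ar * -15.dbamp;
        var ambi = PanB.ar(mono, \azimuth.kr(pi/4), \elevation.kr(0)); // 4-ch B-format
        FoaDecode.ar(ambi, kernel); // stereo binaural out
    }.play;
};
```

As with the kernel examples further down the thread, remember to call `kernel.free` when you are done with the decoder.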

To give an example…

Using FoaDecoderMatrix - you can just whack it right into FoaDecode.ar - it's just numbers.

{
    var mono = SinOsc.ar(220) * -15.dbamp;
    var ambi = FoaPanB.ar(mono, \theta.kr(pi/2), 0);
    var stereo = FoaDecode.ar(ambi, FoaDecoderMatrix.newStereo);
    Out.ar(0, stereo);
}.play


Using FoaDecoderKernel - you need to allocate a buffer for the convolution.

s.waitForBoot {
    // kernel decoders load their convolution kernels into buffers,
    // so allocate outside the SynthDef and sync with the server
    var kernel = FoaDecoderKernel.newUHJ;

    s.sync;

    SynthDef(\ambi_test, {
        var mono = SinOsc.ar(220) * -15.dbamp;
        var ambi = FoaPanB.ar(mono, \theta.kr(pi/2), 0);
        var stereoUHJ = FoaDecode.ar(ambi, kernel);
        Out.ar(0, stereoUHJ);
    }).play().onFree({ kernel.free });
};


There are two collections of HRTFs included with the ATK - see the help for FoaDecoderKernel.newListen for more info.

s.waitForBoot {
    var kernel = FoaDecoderKernel.newListen(1002); // see the help for more HRTFs

    s.sync;

    SynthDef(\ambi_test, {
        var mono = SinOsc.ar(220) * -15.dbamp;
        var ambi = FoaPanB.ar(mono, \theta.kr(pi/2), 0);
        var binaural = FoaDecode.ar(ambi, kernel);
        Out.ar(0, binaural);
    }).play().onFree({ kernel.free });
};


My attempt: a plain acoustic path simulation, done in a somewhat naive way. Certainly not bug-free.

/*
Simulation of room acoustics for first reflections.

3DSpace left-handed:
-x -> x left to right,
-y -> y bottom to top,
-z -> z from viewer into the screen.

room, birds eye view:
+------------front--------------+
|                               |
|   s1         s3               |
|        s2            s4  s5   |

.                               .
.left                      right.
.                               .

^  |           ol   or             |
|  |                               |
z  |                               |
+-------------back--------------+
/  x -> width
/
y = up

source:      synth
|
--------------------------------
|    |    |    |    |    |    | --> direct sound & 6 first reflections

filter & attn: sound radiation pattern

|    |    |    |    |    |    |

room

||   ||   ||   ||   ||   ||   ||--> 2x ds & first reflections, 2 observers

(filter & attn: room properties at reflection point)
distance delay
distance attn

||   ||   ||   ||   ||   ||   ||

filter & attn: 'mic' pattern      TODO

||   ||   ||   ||   ||   ||   ||
-+----+----+----+----+----+----+---- left  -> reverb
|    |    |    |    |    |    |
+----+----+----+----+----+----+----- right -> reverb

"pseudo cardioid" ray intersection

Center the virtual microphone at c=<0,0,0>. Then move the virtual source by
subtracting the actual mic position from it.
Assume mic sphere radius = 1.
Calculate the intersection of the ray from the virtual source with the mic
sphere. Use the first intersection point.
Calculate the distance from that intersection point to x. The result is the
attenuation / directivity of the microphone. Divide by two, the mic sphere
diameter?
Moving x closer to, or beyond, the boundary of the sphere gives a more
directive microphone.

.  .              .  .
.        .        .        .
.          .      .          .
c                  c
.          .      .          .
.      x .        . x      .
.  .              .  .

*/

(
~vdot = {
arg v1, v2;
(v1.x*v2.x) + (v1.y*v2.y) + (v1.z*v2.z)
};

~vlength = {
arg v1;
~vdot.value(v1, v1).sqrt
};

~pdistance = {
arg p1, p2;
~vlength.value(p2 - p1);
};

~vnormalize = {
arg v;
v / ~vlength.value(v);
};

~roomReflection = {
/*
calculates the (virtual)position, distance, delay, amplitude decay for a
source from the observer-pov for the direct sound and the first re-
flections. A rectangular/box room is assumed.

args:
observer(Cartesian): location of an observer in the room
source  (Cartesian): location of a source in the room
room    (Cartesian): The room is assumed from <0,0,0> to
<width, height, length> left hand coordinate system
so only the 'opposite' corner to the origin has to
be specified.
result:
vsources dict with (Cartesian) observer, source, room and arrays with
virtual sources, distance to observer, distance dependent delay and distance
dependent attenuation.
*/

| observer, source, room |

var posarr;
var vsources = Dictionary.new(n: 8);
// coordinates of direct sound and first reflections
vsources.put(\observer, observer);
vsources.put(\source, source);
vsources.put(\room, room);
posarr = Array[
// direct sound
source,
// virtual source first floor reflection
Cartesian(source.x, -1 * source.y, source.z),
// ceiling
Cartesian(source.x, room.y + (room.y - source.y), source.z),
// left
Cartesian(-1 * source.x, source.y, source.z),
// right
Cartesian(room.x + (room.x - source.x), source.y, source.z),
// back
Cartesian(source.x, source.y, -1 * source.z),
// front
Cartesian(source.x, source.y, room.z + (room.z - source.z))
];
vsources.put(\pos, posarr);
// distance to observer
vsources.put(\dist, vsources[\pos].collect({arg item; observer.dist(item)}));
// delay
vsources.put(\delay, vsources[\dist].collect({arg item; item / 340})); //speed of sound m/s
// distance amplitude attenuation
vsources.put(\attn, vsources[\dist].collect({arg item; 1 / item}));
};

~micAttenuation = {
/*
args:
vsources (Dict)     : result dict from ~roomReflection
opattern (Cartesian): vector that defines the  observer/mic pattern

result:
vsources dict with the addition of opattern and the microphone AOI dependent
attenuation.

TODO: frequency dependent pattern?
*/
| vsources, opattern |

var pos = vsources[\pos].collect(
{| item | item - vsources[\observer]}
);
var vec = pos.collect(
{| item | ~vnormalize.value(Cartesian(0,0,0) - item)}
);
var b = pos.collect(
{ | item, i | 2 * (~vdot.value(item, vec[i]))}
);
var c = pos.collect(
    {| item | ~vdot.value(item, item) - 1} // |p|^2 - r^2, with mic sphere radius r = 1
);
var delta = b.collect(
    {| item, i | (item * item) - (4 * c[i])}
);
var nd = b.collect(
    {| item, i | ((-1 * item) - (delta[i].sqrt)) / 2} // nearer root of t^2 + bt + c = 0: first intersection
);
var nip = nd.collect(   //nearest intersection point
{| item, i | pos[i] + (vec[i] * item)}
);
var micattn = nip.collect(
{ | item | item.dist(opattern)}
);
vsources.put(\micattn, micattn);
vsources.put(\opattern, opattern);
vsources
};

~stereo = {
| leftObserver, rightObserver, source, room |

var left = ~roomReflection.value(leftObserver, source, room);
var right = ~roomReflection.value(rightObserver, source, room);
left = ~micAttenuation.value(left,   Cartesian(0.0, 0.25,-0.7));
right = ~micAttenuation.value(right, Cartesian(0.0,-0.25,-0.7));
Dictionary.newFrom([
\left, left,
\right, right
]);
};
)
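As a quick numeric sanity check of the "pseudo cardioid" ray intersection described in the comment above (unit mic sphere at the origin, quadratic t^2 + bt + c = 0 with b = 2(p . d) and c = |p|^2 - 1): a virtual source at <0, 0, 3> aimed at the centre should first hit the sphere at distance |p| - 1 = 2. This sketch is self-contained and assumes, as the functions above do, that Cartesian supports arithmetic with scalars:

```supercollider
(
// sanity check of the ray -> unit-sphere intersection used for the mic pattern
var p = Cartesian(0, 0, 3);                 // virtual source, mic sphere centred at <0,0,0>
var len = (p.x.squared + p.y.squared + p.z.squared).sqrt;
var d = (Cartesian(0, 0, 0) - p) / len;     // unit ray direction, source -> mic centre
var b = 2 * ((p.x * d.x) + (p.y * d.y) + (p.z * d.z));
var c = (len * len) - 1;                    // |p|^2 - r^2, radius 1
var delta = (b * b) - (4 * c);
var tFirst = (b.neg - delta.sqrt) / 2;      // nearer root: first intersection
tFirst.postln; // expect 2.0: source is 3 away, sphere surface 1 from the centre
)
```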

(
// radiation pattern: one filtered copy of the source per path
// (direct sound & 6 first reflections); the name ~radiation is
// added here so the SynthDef below can use it
~radiation = {
    |sig|

    Array[
        sig,                                        // direct sound
        OnePole.ar(in: sig, coef: 0.75, mul: 0.65), // towards the floor
        OnePole.ar(in: sig, coef: 0.6, mul: 0.65),  // ceiling
        OnePole.ar(in: sig, coef: 0.3, mul: 0.8),   // left
        OnePole.ar(in: sig, coef: 0.4, mul: 0.8),   // right
        OnePole.ar(in: sig, coef: 0.85, mul: 0.6),  // back
        BLowPass.ar(in: sig, freq: 500, rq: 1.0, mul: 0.8) // front
    ]
};

SynthDef(\source, {
    | out |

    var sigl, sigr, room, rad;
    var env = EnvGen.kr(
        Env(
            levels: [0, 1, 0.0],
            times: [0.01, 0.03],
            curve: 0
        ),
        gate: Dust.kr(density: 5)
    );
    var sig = SinOsc.ar(freq: 1253) * SawDPW.ar(freq: 543) * env; // SawDPW is from sc3-plugins

    room = ~stereo.value(
        Cartesian( 9.5, 2,  4), //left observer
        Cartesian(10.5, 2,  4), //right observer
        Cartesian( 8,   3, 15), //source
        Cartesian(20,   6, 30)  //room
    );

    rad = ~radiation.value(sig); // 7 signals: direct sound + 6 reflections

    // maxdelaytime 0.2: the front reflection path can exceed 0.1 s here
    sigl = AllpassL.ar(
        in: rad, maxdelaytime: 0.2, delaytime: room[\left][\delay], decaytime: 0, mul: room[\left][\attn]
    );
    sigr = AllpassL.ar(
        in: rad, maxdelaytime: 0.2, delaytime: room[\right][\delay], decaytime: 0, mul: room[\right][\attn]
    );
    sigl = sigl * room[\left][\micattn];
    sigr = sigr * room[\right][\micattn];

    Out.ar(out, [sigl.sum, sigr.sum]); // mix the 7 paths down to one channel per ear
}).add;
)

Synth(\source);