Dot product UGen

ddgg · July 20, 2021, 10:04pm

Hello everybody!

I came across something that I thought being pretty straight forward to implement, but somehow is getting me crazy.

I’d like to calculate the dot product between an x number of signals and an x*x Matrix. Something that in the language I would write like this:

(
var dot = {
	|m,v|
	m.collect({
	|i|
	(i*v).sum
})
};
dot.value([[0.5,0],[0.5,0.5]], [3,4]) //  -> [1.5, 3.5]
)

But how could I multiply the output of a UGen with a Matrix?

DC.ar([3,4])*[[0.5,0],[0.5,0.5]]

Any suggestion? thank you!

ddgg

dkmayer · July 20, 2021, 10:55pm

Actually there’s no difference as multichannel expansion is done in the language.


// inner arrays of m understood as rows, vectors as columns
// of course convention can be changed

~linMap = { |m, v| (m.flop * v).flop.collect(_.sum) };

~linMap.([[10, 20], [50, 100]], [5, 7]);


x = { 
	Saw.ar(
		~linMap.(
			[[10, 20], [50, 100]], 
			[5, 7] * SinOsc.ar(0.1).range(0.98, 1.02)
		).poll,
		0.1
	)
}.play

x.release

ddgg · July 21, 2021, 9:29am

Hello Daniel!
Thank you. That seems pretty straight forward.
In your experience, would this work also for an x size vector and a x*x size Matrix where x is…lets say…40? Wouldn’t the Language start to suffer the computation?

dkmayer · July 21, 2021, 10:58am

Well, let’s check it out

To be exact, the language is busy only at compile time, then let’s see how much CPU is consumed by the server when running. You might have to increase the server option numWireBufs

In the following example, after ca. 4 seconds of compilation on my machine, 1768 ugens are engaged at 3.5 % CPU. That look pretty reasonable as long as you are not wanting to run many of these in parallel.

s.options.numWireBufs = 64 * 16;
s.reboot;


~linMap = { |m, v| (m.flop * v).flop.collect(_.sum) };

(
n = 40;

x = { 
	Saw.ar(
		~linMap.(
			{ { rand(5.0) } ! n } ! n, 
			{ rand(5.0) } ! n * SinOsc.ar(0.1).range(0.98, 1.02)
		),
		0.1
	)
}.play
)

s.scope(40)

x.release

It gets a bit heavier if you have a vector of independent ugens, though not all too much. Of course, it’s much heavier with a matrix of ugens, though from your post I assume you’re not wanting that.

ddgg · July 21, 2021, 2:56pm

Unfortunately this is exactly what I wanted. I’m sorry, I didn’t express myself clearly…
I would like to be able to change either the vector values or the matrix coefficient at each DSP block.
I’ll try to present another example and then we will see…

jamshark70 · July 21, 2021, 3:09pm

I’m curious where the weight would come from. AFAIK SomeUnit.ar * float and SomeUnit.ar * OtherUnit.ar would not perform very differently.

hjh

dkmayer · July 21, 2021, 10:37pm

Well, it still looked doable with a size of 40 in the examples I tried. If you have a chance of reducing the matrix (triangular, band, or sparse in another way), the better.

I had such cases in mind:


s.options.numWireBufs = 64 * 16;
s.reboot;


~linMap = { |m, v| (m.flop * v).flop.collect(_.sum) };



// constant matrix ...

// 1806 ugens, 5 % CPU

(
n = 40;

x = { 
	Saw.ar(
		~linMap.(
			{ { rand(5.0) } ! n } ! n, 
			SinOsc.ar({ rand(5.0) } ! n).range(0.5, 1.5)
		),
		0.1
	)
}.play
)

x.release


// ... versus constant vector

// 4926 ugens, 15 % CPU

(
n = 40;

x = { 
	Saw.ar(
		~linMap.(
			SinOsc.ar({ { rand(50.0) } ! n } ! n).range(0.5, 1.5), 
			{ rand(5.0) } ! n
		),
		0.1
	)
}.play
)

x.release

jamshark70 · July 21, 2021, 11:22pm

Indeed, in this case you couldn’t reduce the number of oscillators, and these will be heavier than math ops, so it’s always going to be more expensive.

As these are all linear operations, though, I wonder if there’s a way to reduce some redundancy by applying the range operation to the sums rather than to the individual oscillators: ab + ac + ad is 3 muls and 2 adds, while a(b + c + d) is 2 adds and 1 mul… and scale up to 1600 MulAdds vs potentially only 40 muls and I guess 39*40 adds. (But then, does the overhead of splitting up the MulAdd units outweigh the elimination of 1560 multiply ops? Optimization is tricky.)

hjh

jamshark70 · July 22, 2021, 12:09am

// special case assuming all oscillators
// are mul'ed and add'ed by the same amount
~linMapRange = { |oscMatrix, vector, oscMul, oscAdd|
	var vectorMul = vector * oscMul;
	var offset = vector.sum * oscAdd;
	(oscMatrix.flop * vectorMul).flop.collect(_.sum) + offset
};

(
n = 40;

x = { 
	Saw.ar(
		~linMapRange.(
			SinOsc.ar({ { rand(50.0) } ! n } ! n), 
			{ rand(5.0) } ! n,
			0.5, 1.0
		),
		0.1
	)
}.play
)

The original has 4296 ugens; this variant has 3366. But tbh, the Saws and SinOscs are responsible for most of the CPU usage.

Derivation:

~linMap.([[a * 0.5 + 1, b * 0.5 + 1], [c * 0.5 + 1, d * 0.5 + 1]], [e, f])

[(a * 0.5 + 1)e + (b * 0.5 + 1)f, (c * 0.5 + 1)e + (d * 0.5 + 1)f]

[ae * 0.5 + e, bf * 0.5 + f, ce * 0.5 + e, df * 0.5 + f]

let g = e * 0.5  // mul factor from 'range'
let h = f * 0.5

let i = e * 1  // add term from 'range'
let j = f * 1

[ag + i + bh + j, cg + i + dh + j]
[ag + bh + i + j, cg + dh + i + j]

let k = i + j
[ag + bf + k, cg + df + k]

hjh

ddgg · August 4, 2021, 2:58pm

Well, after a few attempts, here is my solution:

(
var signals = 50; 
/*
signals ugens comparison
for  10 signals -> 143 ugens  
     20 -> 454 
     30 -> 978 
     40 -> 1701 
     50 -> 2624
*/
var diag, coeffs, dot, init;

diag = {
	|cnt|
	cnt.collect({
		|i|
		cnt.collect({
			|j|
			if(j == i){-0.1}{-0.1.rand}
		})
	})
};

coeffs = diag.value(signals);
init = ({0.5.rand}!signals);

dot = {
	|m,v|
	m.collect({|row| (v*row).sum})
};

{
	var step;
	init = SinOsc.ar(({300.rand}!signals),0,0.01);
	step = dot.(coeffs, init);
	Mix.ar(step);
}.play; //.plot(300/s.sampleRate).superpose_(true).plotColor_(({Color.rand}!init.size))
s.scope;
)

thank you very much for your advices.
I’m still not completely satisfied with it since I would love to be able to change the matrix coefficients in real time and I don’t really foresee any “less-demanding” solution. So I’ll try to develop a UGen in cpp which handles matrix multiplication. Let’s see how it goes…

amindfv · August 4, 2021, 4:21pm

I think (unless the optimizer does this already) you’ll want to replace sum with Mix - it greatly reduces the total number of operations, plus makes use of SIMD instructions.

Tom

ddgg · August 4, 2021, 4:44pm

Thank you Tom. I just gave it a try, but it seems to me that there is not so much difference.
You meant inside the dot function:

dot = {
	|m,v|
	m.collect({|row| Mix.ar(v*row)})
};

right? it actually adds too many UGens in the Graph (for an high number of signals) leading to an Exception Error…

ddgg · August 4, 2021, 5:28pm

As a matter of fact I could avoid the dot function and simply write:

{
	var step;
	init = SinOsc.ar(({300.rand}!signals),0,0.01);
	step = Mix.ar(coeffs.flop*init); // instead of dot.(coeffs,init)
	Mix.ar(step);
}.play;

maybe defining coeffs as an array of multichannel Busses or Controls.
Thank you very much for your feedbacks. The discussion helped me a lot

ddgg · August 4, 2021, 5:43pm

sorry, I hope that you don’t think that I am spamming, but interestingly enough, if I write the exact same function defining coeffs as:

var coeffs = Control.names(\coeffs).kr([[1,2,3],[4,5,6],[7,8,9]]);

I don’t have to flop the array…Can anyone explain me why?

if you want a quick test:

(
var m = [[1,2,3],[4,5,6],[7,8,9]];
var v = [2,3,4];
(m.flop*v).sum
)

(
{
	Mix.ar([[1,2,3],[4,5,6],[7,8,9]].flop*DC.ar([2,3,4])).poll;
	0;
}.play
)

(
{
Mix.ar(Control.names(\coeffs).kr([[1,2,3],[4,5,6],[7,8,9]])*DC.ar([2,3,4])).poll;
}.play	
)