How to reduce CPU usage when dealing with UGen-filled matrices

Aueh · November 13, 2023, 12:04am

Some time ago I made a waveshaper that uses smooth, curved transition functions. I'm pretty happy with the result, but it's very CPU-expensive: a single instance already takes 40% of the CPU. I have a feeling it's because of some calculations that require 4-by-4 matrices. The waveshaper only takes one UGen as input, but because of multichannel expansion I end up with 4 times 4 instances of this UGen.

So I'm wondering how to reduce these costs. Is there a way to prevent multichannel expansion? What I mean is something like this.

sig = SinOsc.ar();
[sig, sig, sig]

But instead of having multiple channels, having, well… just an array filled with the same UGen three times.

I've noticed that a lot of people find my explanations a bit vague, so please don't hesitate to ask so I can try to clarify what I mean.

semiquaver · November 13, 2023, 12:23am

the array you describe only performs the SinOsc calculation once…

check out

{
	var sig = SinOsc.ar();
	[sig, sig, sig]
}.asSynthDef.dumpUGens

You’ll see that all three channels refer to the same single SinOsc…
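For contrast, here is a small sketch of the case where separate units do get built: passing an array as an argument, rather than reusing one variable. The frequency values are arbitrary and only for illustration. In a standard build, the second dump would be expected to list one SinOsc per array element, unless the SynthDef builder in use applies its own deduplication.

```supercollider
(
// Reusing one UGen: a single SinOsc feeds all three output channels.
{
	var sig = SinOsc.ar(440);
	[sig, sig, sig]
}.asSynthDef.dumpUGens;

// Array argument: multichannel expansion builds one SinOsc per element,
// even though the three frequencies happen to be equal.
{
	SinOsc.ar([440, 440, 440])
}.asSynthDef.dumpUGens;
)
```

Comparing the two dumps side by side is a quick way to check whether a given piece of code duplicates work or only duplicates references.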

Aueh · November 13, 2023, 12:33am

Thanks! I always thought it created multiple separate UGens when using multichannel expansion. This is good to know. Now I have to keep looking for the cause of the high CPU usage.

Aueh · November 13, 2023, 12:48am

So this is the end of a big UGen dump of my waveshaper.

[ 820_Lag, audio, [ 819_/, 0.1 ] ]
[ 821_MulAdd, audio, [ 820_Lag, 99.99999999999, 763_MulAdd ] ]
[ 822_MulAdd, audio, [ 820_Lag, -144.99999999999, 764_MulAdd ] ]
[ 823_*, audio, [ 822_MulAdd, 37_* ] ]
[ 824_MulAdd, audio, [ 821_MulAdd, 36_*, 823_* ] ]
[ 825_MulAdd, audio, [ 820_Lag, 69.749999999993, 765_MulAdd ] ]
[ 826_MulAdd, audio, [ 825_MulAdd, 2_Phasor, 824_MulAdd ] ]
[ 827_MulAdd, audio, [ 820_Lag, -11.137499999999, 766_MulAdd ] ]
[ 828_+, audio, [ 826_MulAdd, 827_MulAdd ] ]
[ 829_-, audio, [ 828_+, 650_MulAdd ] ]
[ 830_MulAdd, audio, [ 34_bitAnd, 829_-, 650_MulAdd ] ]
[ 831_Out, audio, [ 0_Control[0], 830_MulAdd ] ]

It's the first time I've used dumpUGens; I didn't know about it. But it seems my little invention has grown out of proportion. Are there any tips for increasing efficiency when you have to do a lot of mathematical operations?

Aueh · November 13, 2023, 2:50pm

The problem was not the number of operations, but that a lot of those operations created audio-rate BinaryOpUGens which didn't need to be audio rate. So I used A2K.kr() where I could, and now the CPU usage is down to 10%.
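A minimal sketch of that kind of change, not the actual waveshaper (which isn't shown in this thread): the SynthDef name, the LFNoise1 modulator and the coefficient values are all made up for illustration. The idea is that a slowly changing audio-rate value is converted once with A2K.kr, so the arithmetic derived from it is built at control rate, and only the operations that touch the audio signal stay at audio rate.

```supercollider
(
// Hypothetical example: names and numbers are assumptions, not from the original patch.
d = SynthDef(\a2kSketch, { |out = 0, freq = 110|
	// audio-rate signal path
	var phase = Phasor.ar(0, freq * SampleDur.ir, 0, 1);
	// slowly varying modulator, converted to control rate once
	var slow = A2K.kr(Lag.ar(LFNoise1.ar(1), 0.1));
	// coefficient math now builds control-rate operators
	var a = slow * 0.5 + 1;
	var b = slow * 0.25;
	// only these operations involve the audio-rate phase
	var sig = (a * phase + b) * phase;
	Out.ar(out, sig * 0.1);
});
d.dumpUGens;  // check the rate column: the coefficient lines should read "control"
)
```

This only makes sense for values that change slowly enough that updating them once per control block is acceptable; anything that has to follow the signal sample by sample still needs to stay at audio rate.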

jamshark70 · November 14, 2023, 12:32am

This may or may not be relevant to this specific problem, but one area that SynthDef doesn’t optimize well is repeated creation of the same deterministic operation.

(
d = SynthDef(\test, { |lo = 0, hi = 1, outLo = 0, outHi = 10|
	Out.kr(0,
		Array.fill(2, {
			Rand(lo, hi).linlin(lo, hi, outLo, outHi)
		})
	)
});
d.dumpUGens;
)

[ 0_Control, control, nil ]
[ 1_Rand, scalar, [ 0_Control[0], 0_Control[1] ] ]
[ 2_Clip, scalar, [ 1_Rand, 0_Control[0], 0_Control[1] ] ]
[ 3_-, control, [ 0_Control[3], 0_Control[2] ] ]
[ 4_-, control, [ 0_Control[1], 0_Control[0] ] ]
[ 5_/, control, [ 3_-, 4_- ] ]
[ 6_*, control, [ 5_/, 0_Control[0] ] ]
[ 7_-, control, [ 0_Control[2], 6_* ] ]
[ 8_MulAdd, control, [ 5_/, 2_Clip, 7_- ] ]
[ 9_Rand, scalar, [ 0_Control[0], 0_Control[1] ] ]
[ 10_Clip, scalar, [ 9_Rand, 0_Control[0], 0_Control[1] ] ]
[ 11_-, control, [ 0_Control[3], 0_Control[2] ] ]
[ 12_-, control, [ 0_Control[1], 0_Control[0] ] ]
[ 13_/, control, [ 11_-, 12_- ] ]
[ 14_*, control, [ 13_/, 0_Control[0] ] ]
[ 15_-, control, [ 0_Control[2], 14_* ] ]
[ 16_MulAdd, control, [ 13_/, 10_Clip, 15_- ] ]
[ 17_Out, control, [ 0, 8_MulAdd, 16_MulAdd ] ]

11-15 are the same as 3-7.

I have some experimental code that collapses this to:

[ 0_Control, control, nil ]
[ 1_Rand, scalar, [ 0_Control[0], 0_Control[1] ] ]
[ 2_Clip, scalar, [ 1_Rand, 0_Control[0], 0_Control[1] ] ]
[ 3_-, control, [ 0_Control[3], 0_Control[2] ] ]
[ 4_-, control, [ 0_Control[1], 0_Control[0] ] ]
[ 5_/, control, [ 3_-, 4_- ] ]
[ 6_*, control, [ 5_/, 0_Control[0] ] ]
[ 7_-, control, [ 0_Control[2], 6_* ] ]
[ 8_MulAdd, control, [ 5_/, 2_Clip, 7_- ] ]
[ 9_Rand, scalar, [ 0_Control[0], 0_Control[1] ] ]
[ 10_Clip, scalar, [ 9_Rand, 0_Control[0], 0_Control[1] ] ]
[ 11_MulAdd, control, [ 5_/, 10_Clip, 7_- ] ]
[ 12_Out, control, [ 0, 8_MulAdd, 11_MulAdd ] ]

… producing two chains of Rand → Clip → MulAdd (required, because there are two distinct Rand units whose results will be different), but only one set of coefficient calculations based on the control inputs.

Since you’re working with a matrix, probably every chain traces back to something different, which this would not be able to optimize. But it’s possible that, if you’re collapsing control inputs down to other values, and this is inside a loop, maybe some of those operations are duplicated.

To try the linked gist, save the .sc file into your extensions directory, recompile, and do UGenCache.optimize = true; before building your SynthDef (once in a session).
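Without the extension, the same collapse can be approximated by hand by computing the shared coefficients once, outside the loop. The following is a sketch under that assumption; the names `scale` and `offset` and the SynthDef name are made up here. It replaces `linlin` with an explicit multiply and add, which also drops the Clip that `linlin` inserts (the input is a Rand already bounded by `lo` and `hi`).

```supercollider
(
// Hypothetical hand-hoisted version of the \test example above.
d = SynthDef(\testHoisted, { |lo = 0, hi = 1, outLo = 0, outHi = 10|
	// coefficient calculations built once from the control inputs
	var scale = (outHi - outLo) / (hi - lo);
	var offset = outLo - (scale * lo);
	Out.kr(0,
		Array.fill(2, {
			// each iteration still gets its own Rand, but reuses scale and offset
			Rand(lo, hi) * scale + offset
		})
	)
});
d.dumpUGens;
)
```

The dump should show the subtraction, division and multiplication for the coefficients appearing once, with only the per-Rand scaling repeated for each array element.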

hjh

Aueh · November 14, 2023, 12:58pm

Wow, cool, I'll try it out when I'm home!