CPU headroom, multi threading

Benu · May 26, 2024, 8:18pm

I mainly build and play systems with a ridiculous number of channels and speakers. With multichannel expansion, multiple iterations and general megalomania, CPU headroom gets thin on a single thread.

I never understood the supernova thing. Is it still alive? I haven’t read about it for a while.

Is it possible to run synths in different threads?

Could I use multiple servers to distribute CPU load? One server per synth? They don’t speak to each other. It’s always direct to hardware output.

Any hints to unlock the power of modern CPUs in SC?

I’m a rookie in coding. I don’t always understand the discussion on the forum. I need a “for dummies” answer.

Thanks, Bernhard

josh · May 26, 2024, 8:31pm

I have had good luck running multiple instances of scsynth at the same time. I often have to allocate a lot of buffers though, so I tend to do that on one instance, and leave things that use their own memory (FFT, delays, etc.) on different instances. It takes some bookkeeping, but I feel like this is kind of what the architecture of scsynth / sclang was made for.

semiquaver · May 26, 2024, 9:23pm

supernova works on my machine… Server.supernova then s.reboot

I quickly checked by just piling up SinOscs in a ParGroup with supernova vs. a Group with scsynth, and I see about 50-60 percent better performance. You do have to plan your node order, though, to get any benefit.

Sam_Pluta · May 26, 2024, 11:13pm

Multiple servers is super easy and efficient.

//make 4 servers, each with its own port (the variable id below is the base port number)
(
    var id = 57100;
    
    ~servers = 4.collect{|i| 
        Server.new("server"++i, NetAddr("localhost", id+i), Server.local.options).boot
    };
)

//see the servers in the array - you should also look in Activity Monitor and see a bunch of scsynth instances
~servers.postln;

//each server sends its audio to the first hardware output (the SinOsc is mono), which here is the first channel of BlackHole (you won't hear anything at this point)
(
    4.do{|i|
        1000.do{{SinOsc.ar(Rand(1000,3000), 0, 0.001)}.play(~servers[i])};
    }   
)

//look at the 1000 synths on each server:
~servers[0].queryAllNodes;
~servers[1].queryAllNodes;
~servers[2].queryAllNodes;
~servers[3].queryAllNodes;

//check out the cpu usage:
~servers[0].avgCPU

Sam

prko · May 27, 2024, 1:24am

I have used multiple servers on one machine and also used Supernova, and I find Supernova easier to use. Please compare the following three cases and let me know if I have done something wrong!

1. Using multiple servers:

Step 1:

(
fork {
	Server.killAll; 
	Server.scsynth;
	4.do { |i|
		var thisServerKey = ("s" ++ i).asSymbol;
		var thisServerName = ("localhost" ++ i).asSymbol;
		var thisServerAddr = 58112 + i;
		var thisServerEnvVar = ("~" ++ thisServerKey).asString;
		var thisServerDefaultSynthName = ("default" ++ i).asSymbol;
		
		currentEnvironment.put(
			thisServerKey, 
			Server(thisServerName, NetAddr("localhost", thisServerAddr))
		);
		
		defer { thisServerEnvVar.interpret.makeWindow };
		
		thisServerEnvVar.interpret.waitForBoot{
			
			SynthDef(thisServerDefaultSynthName, { arg out=0, freq=440, amp=0.1, pan=0, gate=1;
				var z;
				z = LPF.ar(
					Mix.new(VarSaw.ar(freq + [0, Rand(-0.4,0.0), Rand(0.0,0.4)], 0, 0.3, 0.3)),
					XLine.kr(Rand(4000,5000), Rand(2500,3200), 1)
				) * Linen.kr(gate, 0.01, 0.7, 0.3, 2);
				OffsetOut.ar(out, Pan2.ar(z, pan, amp));
			}, [\ir]).send(Server.named.at(thisServerName)); // look the server up by its name, not by the "~s0" string
			
			//thisServerEnvVar.interpret.plotTree // Window should be rearranged
		};
	};
}
)

Step 2:

(
~test = fork {
	inf.do { |f|
		var nth = (f % 4).asInteger;
		var thisServerEnvVar = ("~s" ++ nth);
		(
			server: thisServerEnvVar.interpret,
			instrument: (\default ++ nth).asSymbol, 
			degree: rrand(0.0, 12.0).round(1 / 4), 
			db: rrand(-30, -25), 
			pan: rrand(-1.0, 1.0)
		).play;
		0.0014.wait;
	};
}
)

~test.stop

2. Using supernova:

(
s.serverRunning.if { s.quit };
Server.supernova;

s.options.threads = 8;
// Number of threads to use on the CPU.
// Depending on the number of cores on your machine.

s.waitForBoot{
    p = ParGroup.new;
    ~test = fork {
        loop{
            (
                degree: rrand(0.0, 12).round(1/4),
                group: p,
                db: rrand(-30, -25),
                pan: rrand(-1.0, 1.0)
            ).play;
            0.0014.wait
        }
    }
}
)

~test.stop

3. Using scsynth:

(
s.serverRunning.if { s.quit };
Server.scsynth;

// s.options.threads = 4
// Not applicable when using scsynth.

s.waitForBoot{
    g = Group.new;
    ~test = fork {
        loop{
            (
                degree: rrand(0, 12.0).round(1/4),
                group: g,
                db: rrand(-30, -25),
                pan: rrand(-1.0, 1.0)
            ).play;
            0.0014.wait
        }
    }
}
)

~test.stop

prko · May 27, 2024, 1:43am

Ah! @Sam_Pluta has already posted how to use multiple servers efficiently.
Sorry!

And, I have a question:

Assigning an application's threads to specific CPU cores is done by the OS, not by the user.
In this regard, which is more efficient: supernova or multiple scsynth servers?
I ask this because supernova seems to show flatter CPU usage, but I am not entirely sure…

semiquaver · May 27, 2024, 3:16am

IIRC supernova requires you to put processes that read each other’s outputs together into Groups, and processes that don’t into ParGroups. These can (and should) be nested as needed. So it requires a little care, but I’m getting some pretty great results just now goofing around…
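A minimal sketch of that nesting, assuming supernova is already booted as s. The SynthDef names \src and \fx, the bus layout and the count of four chains are made up for illustration and are not from the thread. Each source/effect pair depends on ordering, so it lives in its own Group; the chains do not read from each other, so the Groups sit inside one ParGroup.

```supercollider
// Run this block first: two hypothetical SynthDefs, a source and an effect.
(
SynthDef(\src, { |bus = 16|
	Out.ar(bus, Saw.ar(Rand(100, 400), 0.05))
}).add;
SynthDef(\fx, { |bus = 16, out = 0|
	Out.ar(out, CombC.ar(In.ar(bus), 0.2, 0.2, 2))
}).add;
)

// Then run this block: four independent chains inside one ParGroup.
(
~par = ParGroup.new;                  // children may be scheduled in parallel
~chains = 4.collect {
	var chain = Group(~par);          // order is kept inside each Group
	var bus = Bus.audio(s, 1);        // private bus for this chain only
	Synth(\src, [\bus, bus.index], chain, \addToHead);
	Synth(\fx, [\bus, bus.index, \out, 0], chain, \addToTail);
	chain
};
)
```

Calling ~par.free afterwards removes all four chains at once.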

scztt · May 27, 2024, 9:21am

I find that Supernova is more efficient with large, complex, routing-heavy synths and graphs even when it’s only being run on a single thread (e.g. without ParGroup).

prko · May 27, 2024, 11:23am

Here is a comparison between using ParGroup and not using it:

[Two screenshots: 2024-05-27 at 20.19.40 and 2024-05-27 at 20.19.16]

jordan · May 27, 2024, 12:01pm

The elements in your ParGroup aren’t computationally expensive enough to see a benefit from multithreading.

Switching threads is actually very expensive; it is only when the things inside the ParGroup are themselves computationally expensive that you’d see a benefit. Try making a synth with hundreds of comb filters, or placing Groups inside the ParGroup, each containing hundreds of these small synths; you should see a difference then.
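The second suggestion could be sketched roughly like this, assuming supernova is booted as s. The name \small and the numbers (8 Groups of 100 synths, which stays under the default limit of 1024 nodes) are illustrative assumptions, not values from the thread.

```supercollider
// Run this block first: a deliberately cheap, hypothetical SynthDef.
(
SynthDef(\small, { |out = 0|
	Out.ar(out, SinOsc.ar(Rand(200, 2000), 0, 0.0002))
}).add;
)

// Then run this block: batch the cheap synths into Groups inside a ParGroup.
(
~par = ParGroup.new;
~batches = 8.collect { Group(~par) };        // 8 larger units of parallel work
~batches.do { |g|
	100.do { Synth(\small, target: g) }      // synths in one Group run in sequence
};
)
```

The idea is that each DSP thread then gets one Group's worth of work per block instead of one tiny synth, so the thread-switching overhead is paid far less often.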

Measuring cpu usage like this isn’t simple either as (I think) the main thread does a ‘spin lock’ until the other threads are done, meaning it will report near to 100% usage while they are processing, but it is actually doing nothing meaningful. Also, cpu usage isn’t important really (well it is for energy consumption…), what matters for audio is how long it takes to perform the calculation — when waiting for memory to load, the cpu will report a low percentage, but might actually be unable to process the data fast enough.

Spacechild1 · May 27, 2024, 8:32pm

it is only when the things inside ParGroup are themselves computationally expensive that you’d see a benefit.

Yes! See also the last paragraph in Question on Graph / Topological Sort - #16 by Spacechild1

Also, cpu usage isn’t important really (well it is for energy consumption…), what matters for audio is how long it takes to perform the calculation

True! For benchmarking realtime audio performance, the criterion really is: “how many X can I run simultaneously before getting dropouts?”

Spacechild1 · May 27, 2024, 8:34pm

What’s your OS? How many cores/threads do you have? And what’s the reason for s.options.threads = 10?

prko · May 27, 2024, 11:51pm

The OS is macOS 14.5, and the machine is a MacBook Pro 2021 (M1 Max).

Spacechild1 · May 28, 2024, 9:33am

AFAICT, the M1 Max has 8 performance cores and 2 efficiency cores, so you don’t want more than 8 DSP threads!

In general, I can imagine that Supernova does not work all that well on Apple Silicon machines because of the new CPU design (performance cores + efficiency cores). I think there is no guarantee that the DSP helper threads will always run on the performance cores, and we would really need what is proposed in “Use audio workgroups for supernova DSP thread pool” (Issue #5624, supercollider/supercollider on GitHub).

Anyway, I would be curious to see your results with a lower number of threads, e.g. 8, 6 or 4.

prko · May 28, 2024, 1:03pm

Here is a demo video: [demo video]

and screenshots: [screenshots]

scztt · May 28, 2024, 1:20pm

A small addition - this kind of testing should always be done with a laptop plugged in - when running on battery power, CPU clock speeds are much more uneven so the effective performance you get for audio is much less than the “average” CPU clock speed at any given time. This has historically been a problem with Macs, as they often make more high-power processors feasible on their laptops by extremely aggressive power management - which is fine for most things, but can have an outsized impact on audio processing.

jordan · May 28, 2024, 3:31pm

Wanted to just post a benchmark that shows supernova’s strengths a little bit better.

12 × AMD Ryzen 5 5600X 6-Core Processor

I’m defining stable as ‘does not produce xruns while opening and closing the server meter over a one minute period’.

server type   num threads   num synths   stable     cpu% per core   perf delta   relative perf per core
scsynth       1             20           stable     37              100%         benchmark
nova          1             19           stable     35              95%          95%
nova          2             30           stable     41              150%         75%
nova          3             33           stable     40              165%         55%
nova          6             37           stable     48              185%         31%
nova          12            37           unstable   48              185%         15%
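The last two columns appear to be derived from the synth counts: perf delta as the number of synths relative to the 20 that scsynth managed, and relative perf per core as that figure divided by the thread count. That reading is an assumption based on the numbers, not something stated in the post. A small sclang sketch of the arithmetic (it runs in the language only, no server needed):

```supercollider
(
// Values copied from the nova rows of the table above.
var threads = [1, 2, 3, 6, 12];
var synths = [19, 30, 33, 37, 37];
var baseline = 20;                              // synths on single-threaded scsynth

// Assumed derivation: percent of the baseline synth count.
var perfDelta = synths / baseline * 100;
// Assumed derivation: that percentage shared across the threads used.
var perCore = (perfDelta / threads).round(1);

// Post one [threads, perf delta, per-core] row per configuration.
[threads, perfDelta, perCore].flop.do { |row| row.postln };
)
```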

Things to take away.

Using the virtual (hyper-threaded) cores that the CPU reports hinders performance (nova defaults to six threads here, not twelve, so it’s fine).
Supernova is roughly equivalent to scsynth when run on one core.
Diminishing returns are quickly reached.
Xruns occur far before reaching a CPU limit, implying that memory access or thread synchronisation is the bottleneck.
Across the used cores, supernova’s cpu usage is flat.

If anyone else wants to run this and report back, here it is!

(
var numThreads = 6;
var numSynths = 37;
var useSupernova = true;

if(useSupernova) {
	Server.supernova
} {
	Server.scsynth
};

s.options.threads = numThreads;
s.options.memSize = 2 * 1024 * 1024;

~fftsize = 2048;

~createIR = {
	var ir, irbuffer, bufsize, buffer;
	ir = [1] ++ 0.dup(100) ++ (
		(1, 0.999998 .. 0).collect {|f|
			f = f.squared.squared;
			f = if(f.coin) { 0 }{ f.squared };
			f = if(0.5.coin) { 0 - f } { f }
		} * 0.1
	);
	ir = ir.normalizeSum;
	
	irbuffer = Buffer.loadCollection(s, ir);
	
	s.sync;
	
	bufsize = PartConv.calcBufSize(~fftsize, irbuffer);		
	buffer = Buffer.alloc(s, bufsize, 1);
	buffer.preparePartConv(irbuffer, ~fftsize);
	
	s.sync;
	
	irbuffer.free; 	
	buffer;
};

s.waitForBoot {
	Window.closeAll;
	SynthDef(\bigOne, {
		var part = PartConv.ar(Impulse.ar(340), ~fftsize, \irbuf.kr(-1), 0.5);	
		Out.ar(0, part)
	}).add;
	
	s.sync;
	
	~g = ParGroup();
	
	s.sync;
	
	~irbuf = ~createIR.();
	numSynths.do{ 	
		Synth.head(~g, \bigOne, [\irbuf: ~irbuf])
	};	

};
)

Spacechild1 · May 28, 2024, 4:23pm

Which OS?

As a side note, there is a pending PR of mine that improves Supernova performance on Windows (and under certain circumstances also on Linux): Supernova: thread affinity fixes/improvements by Spacechild1 · Pull Request #5618 · supercollider/supercollider · GitHub

jordan · May 28, 2024, 4:42pm

Linux, current dev branch. Oh yes I saw that!

semiquaver · May 28, 2024, 7:03pm

If supernova is, as it seems, always at least as performant as scsynth, what is scsynth for?