CPU headroom, multi threading

I mainly build and play systems with a ridiculous number of channels and speakers. With multichannel expansion, multiple iterations and general megalomania, CPU headroom gets thin on a single thread.

I never understood the supernova thing. Is it still alive? I haven’t read about it for a while.

Is it possible to run synths in different threads?

Could I use multiple servers to distribute CPU load? One server per synth? They don’t speak to each other. It’s always direct to hardware output.

Any hints to unlock the power of modern CPUs in SC?

I’m a rookie in coding. I don’t always understand the discussion on the forum. I need a “for dummies” answer.

Thanks, Bernhard

I have had good luck running multiple instances of scsynth at the same time. I often have to allocate a lot of buffers though, so I tend to do that on one instance, and leave things that use their own memory (FFT, delays, etc.) on different instances. It takes some bookkeeping, but I feel like this is kind of what the architecture of scsynth / sclang was made for.

supernova works on my machine… Server.supernova then s.reboot

I quickly checked by just piling up SinOscs in a ParGroup with supernova vs. a Group with scsynth, and I see about 50-60 percent better performance. You do have to plan your node order, though, to get any benefit.

Multiple servers is super easy and efficient.

//make 4 servers, each with its own port
(
    var id = 57100;
    
    ~servers = 4.collect{|i| 
        Server.new("server"++i, NetAddr("localhost", id+i), Server.local.options).boot
    };
)

//see the servers in the array - you should also look in Activity Monitor and see a bunch of scsynth instances
~servers.postln;

//each server sends its audio to the first output channel (bus 0), which in my setup belongs to BlackHole (you won't hear anything at this point)
(
    4.do{|i|
        1000.do{{SinOsc.ar(Rand(1000,3000), 0, 0.001)}.play(~servers[i])};
    }   
)

//look at the 1000 synths on each server:
~servers[0].queryAllNodes;
~servers[1].queryAllNodes;
~servers[2].queryAllNodes;
~servers[3].queryAllNodes;

//check out the cpu usage:
~servers[0].avgCPU

Sam


I have used multiple servers on one machine and also used Supernova, and I find Supernova easier to use. Please compare the following three cases and let me know if I have done something wrong!

1. Using multiple servers:

Step 1:

(
fork {
	Server.killAll; 
	Server.scsynth;
	4.do { |i|
		var thisServerKey = ("s" ++ i).asSymbol;
		var thisServerName = ("localhost" ++ i).asSymbol;
		var thisServerAddr = 58112 + i;
		var thisServerEnvVar = ("~" ++ thisServerKey).asString;
		var thisServerDefaultSynthName = ("default" ++ i).asSymbol;
		
		currentEnvironment.put(
			thisServerKey, 
			Server(thisServerName, NetAddr("localhost", thisServerAddr))
		);
		
		defer { thisServerEnvVar.interpret.makeWindow };
		
		thisServerEnvVar.interpret.waitForBoot{
			
			SynthDef(thisServerDefaultSynthName, { arg out=0, freq=440, amp=0.1, pan=0, gate=1;
				var z;
				z = LPF.ar(
					Mix.new(VarSaw.ar(freq + [0, Rand(-0.4,0.0), Rand(0.0,0.4)], 0, 0.3, 0.3)),
					XLine.kr(Rand(4000,5000), Rand(2500,3200), 1)
				) * Linen.kr(gate, 0.01, 0.7, 0.3, 2);
				OffsetOut.ar(out, Pan2.ar(z, pan, amp));
			}, [\ir]).send(Server.named.at(thisServerName)); // Server.named is keyed by the server's name, not by the "~s0" string
			
			//thisServerEnvVar.interpret.plotTree // Window should be rearranged
		};
	};
}
)

Step 2:

(
~test = fork {
	inf.do { |f|
		var nth = (f % 4).asInteger;
		var thisServerEnvVar = ("~s" ++ nth);
		(
			server: thisServerEnvVar.interpret,
			instrument: (\default ++ nth).asSymbol, 
			degree: rrand(0.0, 12.0).round(1 / 4), 
			db: rrand(-30, -25), 
			pan: rrand(-1.0, 1.0)
		).play;
		0.0014.wait;
	};
}
)

~test.stop

2. Using supernova:

(
s.serverRunning.if { s.quit };
Server.supernova;

s.options.threads = 8;
// Number of DSP threads supernova should use.
// Choose this according to the number of cores on your machine.

s.waitForBoot{
    p = ParGroup.new;
    ~test = fork {
        loop{
            (
                degree: rrand(0.0, 12).round(1/4),
                group: p,
                db: rrand(-30, -25),
                pan: rrand(-1.0, 1.0)
            ).play;
            0.0014.wait
        }
    }
}
)

~test.stop

3. Using scsynth:

(
s.serverRunning.if { s.quit };
Server.scsynth;

// s.options.threads = 4
// Not applicable when using scsynth.

s.waitForBoot{
    g = Group.new;
    ~test = fork {
        loop{
            (
                degree: rrand(0, 12.0).round(1/4),
                group: g,
                db: rrand(-30, -25),
                pan: rrand(-1.0, 1.0)
            ).play;
            0.0014.wait
        }
    }
}
)

~test.stop

Ah! @Sam_Pluta has already posted how to use multiple servers efficiently.
Sorry!

And, I have a question:

Assigning an application's threads to specific CPU cores is done by the OS, not by the user.
Given that, which is more efficient: supernova or multiple scsynth servers?
I ask because supernova seems to show flatter CPU usage across cores, but I am not entirely sure…

IIRC supernova requires you to put processes that read each other’s outputs together into Groups, and processes that don’t into ParGroups. These can (and should) be nested as needed. So it requires a little care, but I’m getting some pretty great results just now goofing around…
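A minimal sketch of that nesting, assuming supernova is already booted as the default server `s`. The SynthDef names \src and \fx, the private buses and the number of chains are made up for illustration. The idea is only that each source/effect pair lives in an ordinary Group (fixed order inside), while the Groups themselves are children of one ParGroup (free to run in parallel).

```supercollider
(
fork {
    // Hypothetical source: writes a mono signal to a private bus.
    SynthDef(\src, { |bus|
        Out.ar(bus, Saw.ar(Rand(100, 400), 0.05));
    }).add;

    // Hypothetical effect: reads that bus, so it must run after its source.
    SynthDef(\fx, { |bus, out = 0|
        Out.ar(out, CombC.ar(In.ar(bus, 1), 0.2, 0.2, 2) ! 2);
    }).add;

    s.sync; // wait until the server has received both SynthDefs

    ~par = ParGroup.new(s); // children of this node may be processed in parallel

    ~chains = 4.collect {
        var bus = Bus.audio(s, 1);  // one private bus per chain
        var grp = Group.new(~par);  // plain Group: members run in node order
        Synth.head(grp, \src, [\bus, bus]);
        Synth.tail(grp, \fx, [\bus, bus, \out, 0]); // tail, so it runs after \src
        (group: grp, bus: bus)
    };
};
)
```

`~par.free` removes all four chains again. As far as I know the same code also runs on scsynth, where a ParGroup simply behaves like a normal Group.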

I find that Supernova is more efficient with large, complex, routing-heavy synths and graphs even when it’s only being run on a single thread (e.g. without ParGroup).

Here is a comparison between using ParGroup and not using it:

[Screenshot 2024-05-27 at 20.19.40]

[Screenshot 2024-05-27 at 20.19.16]

The elements in your ParGroup aren’t computationally expensive enough to see a benefit from multithreading.

Switching threads is actually very expensive. Only when the things inside the ParGroup are themselves computationally expensive will you see a benefit. Try making a synth with hundreds of comb filters, or placing Groups inside the ParGroup, each containing hundreds of these small synths. You should see a difference then.
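As a rough sketch of the "expensive synth" suggestion, assuming supernova is booted as `s`: the SynthDef name \heavyCombs, the 100 comb filters per synth and the 4 synths are arbitrary choices for illustration, not measured values. The delay times are kept short because comb delay lines are allocated from the server's real-time memory (`s.options.memSize`), which is limited by default.

```supercollider
(
fork {
    // Hypothetical heavy SynthDef: 100 comb filters fed by one impulse train.
    SynthDef(\heavyCombs, { |out = 0, amp = 0.01|
        var in = Impulse.ar(Rand(1, 4));
        var combs = Array.fill(100, {
            // max delay 0.02 s keeps the real-time memory use per filter small
            CombC.ar(in, 0.02, Rand(0.005, 0.02), 1)
        });
        Out.ar(out, (Mix(combs) * amp) ! 2);
    }).add;

    s.sync; // make sure the SynthDef has arrived before creating synths

    ~heavyPar = ParGroup.new(s);
    4.do { Synth.head(~heavyPar, \heavyCombs) }; // 4 heavy synths, no dependencies between them
};
)
```

Swapping `ParGroup.new(s)` for `Group.new(s)` and raising the number of synths until you get dropouts should show whether parallelism helps on your machine. `~heavyPar.free` cleans up.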

Measuring CPU usage like this isn’t simple either, as (I think) the main thread does a ‘spin lock’ until the other threads are done, meaning it will report close to 100% usage while they are processing even though it is doing nothing meaningful. Also, CPU usage isn’t really the important number (well, it is for energy consumption…). What matters for audio is how long it takes to perform the calculation. While waiting for memory to load, the CPU will report a low percentage but might actually be unable to process the data fast enough.


> it is only when the things inside ParGroup are themselves computationally expensive that you’d see a benefit.

Yes! See also the last paragraph in Question on Graph / Topological Sort - #16 by Spacechild1

> Also, cpu usage isn’t important really (well it is for energy consumption…), what matters for audio is how long it takes to perform the calculation

True! For benchmarking realtime audio performance, the criterion really is: “how many X can I run simultaneously before getting dropouts?”
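One simple way to run that kind of test by ear is sketched below, assuming a booted server `s`. It uses the built-in \default SynthDef only as a stand-in; replace it with the synth you actually care about. The names ~count and ~ramp and the one-second step are arbitrary. The CPU numbers are posted for reference only; the count that matters is the last one before you hear dropouts or see xruns.

```supercollider
(
~count = 0;
~ramp = fork {
    loop {
        // stand-in synth; very quiet so that many copies do not clip
        Synth(\default, [\amp, 0.001]);
        ~count = ~count + 1;
        "% synths running, avg CPU %, peak CPU %".format(
            ~count, s.avgCPU.round(0.1), s.peakCPU.round(0.1)
        ).postln;
        1.wait; // add one synth per second
    }
};
)

// stop adding synths as soon as you hear dropouts, then read off the last count
~ramp.stop;
```

Pressing Cmd-Period (or Ctrl-Period) afterwards frees all the synths.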


What’s your OS? How many cores/threads do you have? And what’s the reason for s.options.threads = 10?

The OS is macOS 14.5, and the machine is a MacBook Pro 2021 (M1 Max).

AFAICT, the M1 Max has 8 performance cores and 2 efficiency cores, so you don’t want more than 8 DSP threads!

In general, I can imagine that Supernova does not work all that well on Apple Silicon machines because of the new CPU design (performance cores + efficiency cores). I think there is no guarantee that the DSP helper threads will always run on the performance cores, and we would really need audio workgroup support for that (see “Use audio workgroups for supernova DSP thread pool”, Issue #5624 in supercollider/supercollider on GitHub).

Anyway, I would be curious to see your results with a lower number of threads, e.g. 8, 6 or 4.


Here is a demo video: [demo video]

and screenshots: [screenshots]



A small addition - this kind of testing should always be done with a laptop plugged in - when running on battery power, CPU clock speeds are much more uneven so the effective performance you get for audio is much less than the “average” CPU clock speed at any given time. This has historically been a problem with Macs, as they often make more high-power processors feasible on their laptops by extremely aggressive power management - which is fine for most things, but can have an outsized impact on audio processing.


Wanted to just post a benchmark that shows supernova’s strengths a little bit better.

12 × AMD Ryzen 5 5600X 6-Core Processor

I’m defining stable as ‘does not produce xruns while opening and closing the server meter over a one minute period’.

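| server type | num threads | num synths | stable | cpu% per core | perf delta | relative perf per core |
|---|---|---|---|---|---|---|
| scsynth | 1 | 20 | stable | 37 | 100% | benchmark |
| nova | 1 | 19 | stable | 35 | 95% | 95% |
| nova | 2 | 30 | stable | 41 | 150% | 75% |
| nova | 3 | 33 | stable | 40 | 165% | 55% |
| nova | 6 | 37 | stable | 48 | 185% | 31% |
| nova | 12 | 37 | unstable | 48 | 185% | 15% |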

Things to take away.

  • Using the virtual/hyper-threaded cores that the CPU reports hinders performance (nova defaults to six threads, not twelve, so it’s fine).
  • Supernova is roughly equivalent to scsynth when run on one core.
  • Diminishing returns are quickly reached.
  • Xruns occur far before reaching a CPU limit, implying that memory access or thread synchronisation is the bottleneck.
  • Across the used cores, supernova’s cpu usage is flat.
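For anyone wondering how the last two columns of the table relate to the synth counts: they appear to be plain ratios against the scsynth baseline of 20 synths. The small sketch below recomputes them under that assumption. It is my reading of the table, and the variable names are made up.

```supercollider
(
var baseline = 20; // stable synth count for scsynth on one thread
// [numThreads, numSynths] pairs for the supernova rows of the table
[[1, 19], [2, 30], [3, 33], [6, 37], [12, 37]].do { |row|
    var threads = row[0], synths = row[1];
    var delta = synths / baseline * 100; // performance relative to the baseline, in percent
    var perThread = delta / threads;     // the same figure divided by the threads used
    "% threads: % synths = % percent of baseline, % percent per thread".format(
        threads, synths, delta.round(1), perThread.round(1)
    ).postln;
};
)
```

Seen this way, going from 1 to 6 threads roughly doubles the number of stable synths, while the return per added thread keeps shrinking.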

If anyone else wants to run this and report back, here it is!

(
var numThreads = 6;
var numSynths = 37;
var useSupernova = true;

if(useSupernova) {
	Server.supernova
} {
	Server.scsynth
};

s.options.threads = numThreads;
s.options.memSize = 2 * 1024 * 1024;

~fftsize = 2048;

~createIR = {
	var ir, irbuffer, bufsize, buffer;
	ir = [1] ++ 0.dup(100) ++ (
		(1, 0.999998 .. 0).collect {|f|
			f = f.squared.squared;
			f = if(f.coin) { 0 }{ f.squared };
			f = if(0.5.coin) { 0 - f } { f }
		} * 0.1
	);
	ir = ir.normalizeSum;
	
	irbuffer = Buffer.loadCollection(s, ir);
	
	s.sync;
	
	bufsize = PartConv.calcBufSize(~fftsize, irbuffer);		
	buffer = Buffer.alloc(s, bufsize, 1);
	buffer.preparePartConv(irbuffer, ~fftsize);
	
	s.sync;
	
	irbuffer.free; 	
	buffer;
};

s.waitForBoot {
	Window.closeAll;
	SynthDef(\bigOne, {
		var part = PartConv.ar(Impulse.ar(340), ~fftsize, \irbuf.kr(-1), 0.5);	
		Out.ar(0, part)
	}).add;
	
	s.sync;
	
	~g = ParGroup();
	
	s.sync;
	
	~irbuf = ~createIR.();
	numSynths.do{ 	
		Synth.head(~g, \bigOne, [\irbuf: ~irbuf])
	};	

};
)

Which OS?

As a side note, there is a pending PR of mine that improves Supernova performance on Windows (and under certain circumstances also on Linux): “Supernova: thread affinity fixes/improvements”, Pull Request #5618 in supercollider/supercollider on GitHub.

Linux, current dev branch. Oh yes I saw that!

If supernova is, as it seems, always at least as performant as scsynth, what is scsynth for?