Scsynth underperforming compared to other Jack clients

It would also be interesting to see if there’s a significant difference in performance between scsynth and supernova with the same sclang code.
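
(For what it’s worth, switching the default server program from sclang is a one-liner; a sketch, to be evaluated before the server boots:)

// choose which program the default server will use
Server.scsynth;      // run the same test code against scsynth...
// Server.supernova; // ...or against supernova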

Supernova crashes due to bad_alloc, in a different version, with James’ test…

I am not sure if you were asking me, @lucas. In any case, my test machine is not a laptop but an Intel NUC Mini-PC. When I run the test, lscpu reports around 4 GHz.

It seems to perform well with that number. The hypothesis is that throttling degrades performance the same way turbo boosts it. Something I noticed on my laptop: if one core gets loaded, the temperature rises to 90–100 °C and the frequency drops to 2 GHz at most. That also affects the DSP load percentage, which seems to constantly adapt to the current top frequency. So it’s very difficult to measure real performance in terms of DSP load. That’s kind of complex and sad: I can’t know what I’m buying, it is supposed to be better but it isn’t.

And for that matter, I had a CPU from 2 or 3 generations earlier that ran at 2 GHz and 60 °C without fan noise. I can’t explain how disappointing it was to spend that money.

It won’t. Supernova does no parallel processing at all, unless you put nodes into a ParGroup.

Without ParGroups, supernova might actually clock in a little slower than scsynth.
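
(To illustrate the difference, a minimal sketch, assuming a booted supernova server; the frequencies are arbitrary:)

// in supernova, only the children of a ParGroup are processed in parallel;
// nodes in an ordinary Group run serially, exactly as in scsynth
g = ParGroup.new;                                  // parallel container
x = { SinOsc.ar(440, 0, 0.1) }.play(target: g);    // may run on any DSP thread...
y = { SinOsc.ar(550, 0, 0.1) }.play(target: g);    // ...concurrently with this one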

Revised for supernova (edited to remove an unrelated experiment):

( // number of synths is read from the first command-line argument
var num = thisProcess.argv[0].asInteger;
var group;

Server.supernova;

Server.default.waitForBoot({
	SynthDef(\sin10, { |out = 0|
		Out.ar(out,
			(SinOsc.ar(
				NamedControl.kr(\freq, Array.fill(10, 440))
			).sum()).dup()
		)
	}).add();
	group = ParGroup.new;
	Server.default.sync();
	n = Array.fill(num, {
		Synth.new(\sin10, [freq: Array.fill(10, { exprand(200, 800) })],
			target: group, addAction: \addToHead)
	});

	{ ReplaceOut.ar(0, In.ar(0, 2) * 0.001) }.play(target: group, addAction: \addAfter);

	loop {
		[s.avgCPU, s.peakCPU].postln;
		1.0.wait;
	};
});
)

I couldn’t reproduce this crash. But I have seen supernova crash when confronted with large numbers of groups being created and reordered (which is why I don’t dare use it in a show anymore).

~70% is already quite excellent. It’s indeed puzzling that you couldn’t get this at first. I can understand how that would erode confidence in the system, leading to more testing to be sure these results will hold.

But… bumping it up a few more percentage points may quickly become a matter of diminishing returns.

Also, comparing against your Mac results (since you said earlier that you wished to match the performance on your Mac):

… 2500 or 3000 more sines in Linux, at a lower CPU reading. Hm. It occurs to me that the Mac test may have been at 64 samples, so I may be drawing a wrong inference.

hjh

I find these tests interesting, but I’m not sure how informative they are.
SinOsc (for instance) does a lot: it checks inputs, has interpolation checks, etc.
scsynth also does a lot: it manages (even if you aren’t using them) a chunk of memory for control and audio buses, etc.
So I’m not sure what the comparison is really showing?

All these comparisons were a (possibly avoidable) detour to get to the bottom of the phenomenon that triggered this thread and its predecessor:

When using a small Jack blocksize (e.g. 64), the update of the CPU load numbers in scide causes xruns, at least on my machine.

I could verify this in two ways:

  1. Running the same code either from the IDE or by loading it directly into sclang (see the example below). In the former case, xruns appear at a much lower CPU load than in the latter.

  2. Disabling the load display in the IDE’s source code. With the IDE modified in this way, I could get almost the same performance as when loading the test code directly into sclang.

Below is my test code, which is designed to load the CPU unevenly. This can be verified by comparing it with another Jack client that produces a very even load: such a client can sustain higher CPU loads without causing xruns. Everything depends, of course, on the Jack blocksize, as xruns are more likely to happen with smaller blocks. See the test results earlier in this thread.

My initial goal was to see how much I can do in SC on Linux with a small blocksize, and I was frustrated by the fact that, if I ran my code in the IDE, I could do very little (num = 200 in the code below). When run directly in sclang from the command line, I can run 10 times more: sclang test.scd 2000. I tuned my numbers (200 and 2000) so that the test can run for at least one minute without an xrun.

( // file: test.scd
var arg1, num;

arg1 = thisProcess.argv[0];
arg1.isNil().if({
	num = 200;
}, {
	num = arg1.asInteger();
});

Server.default.quit();
Server.default.options.maxNodes = 4096;
Server.default.options.memSize = 16384;

Server.default.waitForBoot({
	num.do({ { SinOsc.ar([200, 202], 0, num.reciprocal()) }.play() });
});

)

Hmmm… Some conjecture. I’m no expert on thread load balancing, but something doesn’t make sense to me here.

The GUI update and the audio thread must obviously be different threads, since they belong to different processes. We’re assuming that the OS’s load balancer would put these on different cores. But if the GUI update is blocking the audio thread, that would suggest that these threads are ending up on the same core. Which is… :confused: why. Why would the OS do that, and do it consistently, when there are 11 other cores it could use?

I could be missing something, but I thought that was the whole benefit of multicore chips for real-time audio environments – if one core is too busy with DSP, user interaction could offload to another core and run literally concurrently.

I guess if it were me, I’d look up processor affinity commands to see if I could pin scsynth to one or two cores, and scide/sclang to a couple different cores – try to forbid the OS from doing IDE drawing on the same core.
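
For example, something along these lines (a rough sketch only, assuming Linux with the taskset utility and a recent sclang that has thisProcess.pid; the core numbers are arbitrary):

(
s.waitForBoot({
	("taskset -cp 0,1 " ++ s.pid).unixCmd;             // pin scsynth to cores 0 and 1
	("taskset -cp 2,3 " ++ thisProcess.pid).unixCmd;   // pin sclang to cores 2 and 3
	// scide is a separate process; its pid would have to be looked up by hand, e.g. with pgrep scide
});
)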

At least from my naive understanding of multicore hardware, this is not in the slightest expected behavior. (But I may well be misunderstanding – I’m not considering system threads in this speculation.)

hjh

Yes, I am puzzled too. I have tried to keep scide and scsynth on different processors with taskset, but this didn’t help. Minimizing the IDE does help. Maybe it is related to some weird behavior of Qt?

Right, my question wasn’t about running with ParGroups/on multiple cores, it was about having something very similar to scsynth to test with. It would be quite interesting, for example, if the same sclang code performed very differently on one core with supernova than it did with scsynth, which would point to an issue with scsynth rather than somewhere else.

It turns out that printing to the post window once a second also has a similar effect on my machine, i.e. it provokes xruns compared to not posting.

I thought that rolling my own server under the nose of the IDE would solve the problem, but it does so only partially. While printing to the post window, I can go up to 400 when running from the IDE and up to 1900 when running from the terminal.

It must be said that any update in the window system will provoke an xrun in this test, but printing to the terminal does so much less often than printing to the IDE post window. The latter might also be problematic in other situations, i.e. with larger block sizes.

Here is my test code:

(
var arg1, num, options, server;

arg1 = thisProcess.argv[0];
arg1.isNil().if({
	num = 400;
}, {
	num = arg1.asInteger();
});

options = ServerOptions.new();
options.maxNodes = 4096;
options.memSize = 16384;

server = Server.new("myserver", NetAddr.new("localhost", 57115), options);

server.waitForBoot({
	num.do({ { SinOsc.ar([200, 202], 0, num.reciprocal()) }.play(server) });
	loop {
		"% %\n".postf(server.avgCPU.round(0.1), server.peakCPU.round(0.1));
		1.wait();
	};
});
)

Yes, this is a very good idea. I tried but failed with:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Server 'localhost' exited with exit code 0.

when running the code below with sclang test-supernova.scd 2000. It works with 1000.

( // file: test-supernova.scd
var arg1, num;

arg1 = thisProcess.argv[0];
arg1.isNil().if({
	num = 200;
}, {
	num = arg1.asInteger();
});

Server.supernova;

Server.default.quit();
Server.default.options.maxNodes = 8192;
Server.default.options.memSize = 32768;

Server.default.waitForBoot({
	var group;

	group = ParGroup.new();
	num.do({ {
		SinOsc.ar([200, 202], 0, num.reciprocal());
	}.play(target: group, addAction: \addToHead) });
});

)

That GUI processing causes xruns, and that small buffer sizes like 64 cause lots of xruns with a good sound card that works fine on other systems, are both Linux problems; I’m quite sure about that from experience. On the internet I see recipes and advice, and people saying whether or not they achieved it, but no real results. Is it a distro problem? Is it the kernel’s scheduling model? Is it the drivers? Is it Jack? Is it Sunday? I don’t know.

@eckel An idea: could you test the very same thing with another OS/distro on the same machine? That could give you some other clues and a bigger picture. But it requires some work, so I wouldn’t do it myself (there are two things I dislike about electroacoustic music: cables and system configuration).

I also guess that every system will have a point of instability, so the problem may be that you want an extreme setup with variables you can’t control. That is my advice: most people set things up so that they work fine for most cases. Nonetheless, I like these threads because they are very informative and something useful can always come out of them.

Not all GUI processing does, and not every real-time process reacts with xruns. When I run this naive Pd test patch, I can print to the Pd post window 100 times a second and have the audio thread load the CPU to 80% without causing a single xrun. Pd runs with Jack at a blocksize of 64 at 44.1 kHz and a delay setting of 1 ms.

This is not the same as a JACK buffer size of 64. The JACK buffer size is independent of the size of a Pd control period.
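
(For reference, SC has the same split; a sketch with example values only: the server’s control period is set in ServerOptions, while the JACK period is chosen when starting JACK.)

// scsynth's control period, in samples (the SC-side analogue of Pd's block size);
// takes effect the next time the server boots
Server.default.options.blockSize = 64;
// the JACK period is set independently when starting JACK, e.g.:
// jackd -d alsa -r 44100 -p 64 -n 2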

hjh

I meant the Jack blocksize. Both the Pd blocksize and the Jack buffer size are 64.

Ok, thanks for clarifying that. It isn’t necessarily obvious that they are separate.

It may be that Qt is more CPU-expensive than Tcl/Tk. (One thing that annoys me no end in Pd is its antique pixelated appearance on Linux and Windows – c’mon, it’s 2021 – but if Tcl/Tk doesn’t antialias and Qt does, then Qt is doing more work. But why that work has to be at a high enough priority to bomb an audio thread, I don’t know.)

hjh

Just a quick note - the std::bad_alloc error is reported here: supernova: silence/crash when starting many synths · Issue #1840 · supercollider/supercollider · GitHub
TL;DR: this happens in supernova when it runs out of real-time memory with many nodes. Increasing server.options.memSize “fixes” it.
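
So a possible workaround for the crash above would be to give supernova more real-time memory before booting (a sketch only; memSize is in kilobytes and the value needed depends on the node count):

// raise real-time memory beyond the 32768 KB used in the test above
Server.supernova;
Server.default.options.memSize = 65536;   // 64 MB, as an example
Server.default.reboot;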

Ah, I see that I misread the post – I should have grouped “Pd, with Jack at a blocksize” together, rather than reading it as “Pd, with Jack, at a [Pd] blocksize…” :man_facepalming:

Over the years, I’ve seen a lot of confusion in both SC and Pd forums over control blocks vs audio buffers (I misunderstood some aspects myself, for a long time) and I saw the term “block” through that lens… which was not right in this case. I’m sorry about that.

May I suggest one other benchmark? Run the test in command-line sclang, but also open a server status window with Server.default.makeGui – this will also trigger Qt string drawing, but in sclang rather than in the IDE. Maybe there’s a difference (in thread priority?). Sclang forces all GUI operations onto a lower-priority thread… if you can get more juice with a 64-sample buffer and sclang GUI work vs the IDE, that might tell us something.
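
Something like this, for example (a sketch along the lines of the earlier test code; the file name is just an example, and the GUI call is deferred so that it runs on AppClock):

( // sketch: run from the terminal, e.g. sclang test-gui.scd 2000
var num = (thisProcess.argv[0] ? "2000").asInteger;

Server.default.waitForBoot({
	{ Server.default.makeGui }.defer;   // Qt status window drawn by sclang, not scide
	num.do({ { SinOsc.ar([200, 202], 0, num.reciprocal()) }.play() });
});
)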

hjh