SuperCollider on Linux

@ludo

Jack clients run independently from the Jack server.

So the difference between how Jack reports its load for Pure Data and SuperCollider has no real practical implication; it is just reported differently?

I have explained the load difference here: SuperCollider on Linux - #30 by Spacechild1

It was turbo boost; sorry for the noise.

Now I can see (with everything green except preemption) DSP peaks and xruns starting at 40~50% (onboard sound card) and 50~60% (external sound card)[1] constant CPU load on one core's thread. With a buffer size of 256 or 512 the behavior is the same: 40~50% CPU load gives xruns. But I can't confirm that the IDE is the issue; I get the same behaviour running a script from the terminal.

[1] Everything is configured the same and the test is the same; the USB devices on my laptop seem to require more CPU.

I have seen this with some plugins running under Wine: low CPU but still some xruns. It would be nicer if that weren't the case, and I don't like my laptop.

I would be surprised if the GUI threads interfered with the scsynth audio thread

I find that I can provoke xruns by moving windows around or rapidly maximizing and minimizing windows even when the CPU usage on one core is moderate and the others are at 0%. So there seems to be some interaction, somehow. I agree that it is surprising.

can you also post the output when running scsynth from scide?

Another test:

  • Set the CPU governor to “performance.”
  • I didn’t set a small buffer size – I went with 1024 because my intent was not to provoke xruns, but rather to see if there was an observable difference under a normal, usable setup.
  • Load my live coding environment.
  • Free all of the instruments except one.
  • Play a specific pattern at a specific tempo – because of the Voicer, this will cap at 20 waveguide synths (so there’s a consistent load to measure).
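
For reference, the first step above (setting the governor) can be done from a terminal. A sketch, assuming cpupower (from the distro's linux-tools/kernel-tools package) or a writable cpufreq sysfs; both need root:

```shell
# Show the current governor for each core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Set all cores to "performance" with cpupower (needs root)
sudo cpupower frequency-set -g performance

# Sysfs fallback if cpupower is not installed
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```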

Then I did this in three environments: A/ IDE, B/ commandline sclang, but running my performance load script (which creates sclang GUI windows), and C/ commandline sclang issuing live-coding commands directly (no GUI windows).

A minus B = IDE
B minus C = Qt windows belonging to sclang

After making sure I controlled for all variables (in a couple of tests, there were some mixers I forgot to delete), I found that all three environments reported about 31% CPU usage (from scsynth, which I believe is the most relevant measurement, since it’s actual execution time for one hardware block / hardware block duration).
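
As a sanity check on that definition (execution time per hardware block divided by block duration), the numbers can be worked through directly; a quick sketch of my own arithmetic, not from the measurements above:

```shell
# DSP load = time spent computing one hardware block / block duration.
# Block duration at 1024 frames / 44.1 kHz, and what a 31% load
# means in milliseconds of actual DSP work per block:
awk 'BEGIN {
    d = 1024 / 44100 * 1000
    printf "block duration: %.2f ms\n", d          # -> 23.22 ms
    printf "31%% load = %.2f ms of DSP work\n", d * 0.31  # -> 7.20 ms
}'
```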

I repeated the tests with a hardware buffer of 256 samples, and found:

A. (IDE) ~32% (but with occasional xruns – built-in soundcard, which I know to have suboptimal performance)
B. (sclang + GUI) 32.5%
C. (sclang no GUI) 32-32.5%

One conclusion that might be drawn is that perhaps the performance at 64 samples doesn’t predict real-world performance of a system that is actually usable on stage. In a performance, I typically use 256 samples (which works well on my system with a USB soundcard), and in that configuration, I find that GUIs introduce no difference in performance that is outside of a reasonable margin of error. (When the readings for one environment fluctuate within +/-1 or 2%, a half-percent difference between A, B or C doesn’t matter.)

hjh

I was running scsynth from scide but didn’t realise scide was not displayed. I have now filtered for scsynth, sclang, and scide separately.

While doing all that and posting this reply in Firefox, I get xruns when running this piece of code from the IDE, with Jack set to 64 frames and 2 periods and an external USB sound card:

Server.default.waitForBoot({
	200.do({ { SinOsc.ar([200, 202], 0, 0.001) }.play() });
});

As reported before, if I run sclang and scsynth headless, I can do 4000.do without xruns.

Inspired by James’s test, I used 1024 frames and 3 periods for my simple test. I start to get xruns with this, run from the IDE:

Server.default.waitForBoot({
	2000.do({ { SinOsc.ar([200, 202], 0, 0.001) }.play() });
});

Jack (and SC, which uses the Jack load) and htop report a load of about 35%. That is too little, in my opinion, to accept xruns. There has to be a way to improve this if Pd and other Jack clients can do it, and SC can do it when run headless.

Just out of curiosity, can you post the htop view when running Pd instead of SuperCollider?

Jack 64/2, 44.1 kHz, USB audio interface. Pd blocksize: 64, delay: 1ms. No xruns.

I’m curious how many sinewaves the Pd patch is running?

EDIT: Also: Tried with 256. (I can’t meaningfully run my built-in soundcard at 64 or 128.)

(
SynthDef(\sin, { |out, freq = 200, amp = 0.0001|
	var sig = SinOsc.ar([freq, freq+1]) * amp;
	Out.ar(out, sig);
}).add;
)

(
t = Task {
	s.options.maxNodes.do { |i|
		if(i % 100 == 0) { i.postln };
		Synth(\sin, [freq: exprand(200, 800), amp: 0.0001]);
		0.05.wait;
	}
}.play;
)

I hit maxNodes at about 35% CPU with a couple xruns reported, but not a cascade of them.

I have to suspect system tuning in your case. Something in my system is giving SC sufficient priority to run well, and you seem not to have that something in your system. I’m afraid I don’t know what that thing is, though.

hjh

It runs 7000 sine waves.

Gerhard

Can you make any sense of these htop numbers?

I just thought of one other thing.

{ SinOsc.ar }.play is not only a sinewave. It’s actually a sinewave plus an envelope generator and a multiplier (and if mul != 1, actually two multipliers). At least that much.

If Pd is simply summing sine waves while the SC test is structured around Function:play, then SC is being asked to do more work. It’s hard to evaluate this as the contents of the Pd patch haven’t been shared.

Benchmarking really should avoid convenience methods (as these are a good way to leave uncontrolled experimental variables floating around).

hjh

Yes, I agree with your observations. My main goal with all these tests is to get the maximum performance out of each core. The actual number of sine waves is not so important to me; what matters is knowing that I don’t waste resources. In the case of Pd, this goal seems to be better met in my current configuration.

@eckel

One difference I noticed is that pd runs with a lower RT priority (-7) than jackdbus (-11), whereas scsynth runs with a much higher RT priority (-51).
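
The scheduling class and realtime priority can be checked directly with chrt (from util-linux); a sketch, where the process names are examples to adjust to whatever is actually running:

```shell
# Print the scheduling policy and realtime priority of common audio
# processes (names are examples; adjust to your setup).
for name in jackd jackdbus scsynth pd; do
    for pid in $(pgrep -x "$name"); do
        printf '== %s (pid %s) ==\n' "$name" "$pid"
        chrt -p "$pid"    # realtime clients typically report SCHED_FIFO/SCHED_RR
    done
done

# For comparison, an ordinary process (this shell) reports SCHED_OTHER:
chrt -p $$
```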

Can you also show the htop output when running scsynth headless?

I’ve hesitated adding my own anecdote here because it’s just one data point. But it seems like it may be relevant:

I bought one computer in ~2014 and used it for years with SC (and didn’t update the SC version much). I bought another machine in 2018/2019 which was much faster (much better CPU, much more RAM, etc), but the newer machine was much more prone to hitting xruns. So much so that I switched back to the 2014 computer to play live shows.

I ran the Realtime Config Quick Scan on both machines and they had exactly the same results. Jack and scsynth settings were identical. Both using the same external USB audio interface.
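
For anyone wanting to reproduce that scan: as far as I know the tool lives in the raboof/realtimeconfigquickscan repository (repository and script name taken from the upstream project; please verify before relying on them), and can be run like this:

```shell
# Fetch and run the Realtime Config Quick Scan (needs git and perl)
git clone https://github.com/raboof/realtimeconfigquickscan.git
cd realtimeconfigquickscan
perl ./realTimeConfigQuickScan.pl
```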

I don’t know if it was the hardware or the SC version or something else, but notably I was not using the GUI at all - just running scsynth/supernova on the command line.

I’ve still got both setups so I can run benchmarking tests if they’d be helpful.

Thanks for sharing. I think this investigation is important. I agree that it is tricky with this kind of anecdotal evidence, but I think @eckel is doing a good job of providing straightforward numbers to support the idea that something is up. Perhaps we could put together some form of test suite and then rally some users with different machines and setups to run it, to get a bigger data set and thereby understand the problem better?
