SuperCollider on Linux

I just ran another test, loading an SC file directly into sclang. This is what the file test.scd contains:

Server.default.waitForBoot({
	500.do({ { SinOsc.ar([200, 202], 0, 0.001) }.play() });
});

If I run this in the IDE I get lots of xruns. If I run it from the terminal with sclang test.scd, I don’t get any xruns.

Yes, I can reproduce this as well. I had to go to 1000.do to create the xruns in the first place, but when I ran the file from the terminal it worked fine.

Jack reported ~25% load when I ran the code from the terminal and ~30% when I ran it from the IDE. Shouldn’t those numbers be the same?
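
As a side note, scsynth reports its own average and peak CPU figures via its status replies, and you can poll those from sclang to cross-check against Jack’s DSP load display. A rough sketch, assuming the default server is booted:

// print the server's own load figures (taken from /status replies) every two seconds
Routine({
	loop {
		"scsynth avg / peak CPU: % / %".format(
			Server.default.avgCPU.round(0.1),
			Server.default.peakCPU.round(0.1)
		).postln;
		2.wait;
	}
}).play(AppClock);

Those numbers reflect what scsynth itself measures for one hardware block, rather than what Jack sees.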

I want to add that using the configuration script, installing a realtime kernel and possibly applying some suggestions from the Ardour manual gives me a really well-configured audio system on Linux. Of course, installing Ubuntu Studio or AVLinux might give you good results more easily, but it’s certainly not the only way.
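
For reference, the realtime permission settings that such guides suggest usually boil down to something like the following in /etc/security/limits.d/audio.conf, plus your user being a member of the audio group (the file name and exact values vary per distribution, so treat this as a sketch rather than a recipe):

@audio - rtprio 95
@audio - memlock unlimited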

That is very bizarre. I can’t even speculate a way to explain that. In theory, the IDE does nothing to touch any audio threads or audio subsystem at all… but obviously the theory doesn’t cover everything.

I’m totally stumped by that.

It makes me wonder about other editors, then (emacs or vim)… same issue?

I’d call it a bug actually, if it can be consistently reproduced. There’s no reason to accept performance degradation based on the editor.

This at least could be substantiated in the source code, by comparing Pd sources against SC’s.

hjh

What is also curious in the case of Pd is that the load displayed in Jack is minimal compared to SC. In Pd it is only a few percent (while ps reports 85%), whereas in SC it is in the same order of magnitude as the CPU load reported by ps. It seems that Pd runs an independent audio thread and then only copies the frames into Jack. But this is just a wild guess, as I am not familiar with the source code of either.

This is correct! By default, Pd uses a “polling scheduler”, which means that the scheduler runs in a dedicated thread and communicates with the actual audio callback via a lockfree ringbuffer. This means the audio callback does almost no work. The size of the ringbuffer (= “delay” in Pd’s audio settings) introduces additional latency to compensate for CPU spikes and non-RT-safe operations. This is necessary because Pd runs DSP and control code deterministically in the same thread.

I compiled SC without Qt in order to run sclang headless. In this case I can pump up the number of synths to 4000, resulting in a ps CPU load of 85% in sustained operation. At that load I start to get a few xruns every now and then (I added a line in the xrun callback in SC to also have them reported in the console). This is the kind of behaviour I was expecting, and which I get with Pd and my test client.
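
In case anyone wants to reproduce this headlessly, here is a variant of the earlier test.scd that takes the synth count as a command-line argument; the use of thisProcess.argv and the default of 4000 are just one way to set it up:

Server.default.waitForBoot({
	// number of test synths, e.g. invoked as: sclang test.scd 4000
	var n = (thisProcess.argv[0] ? "4000").asInteger;
	n.do({ { SinOsc.ar([200, 202], 0, 0.001) }.play });
});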

So it all seems to have to do with GUI code interrupting the audio processing.


Something funny about DSP load indicators: on a laptop that is neither powerful nor specially configured, with a 256-sample buffer size, I get 5%+ DSP load in Jack with nothing running. If I run the yes command alone in a terminal, it goes down to 2%+ … why?

LOL, I get xruns purely from starting jackd with 64 sample frames and 2 periods, without even a single client connected; that script shows all green except inotify and the RT kernel. Any pointers for a hassle-free installation of an RT kernel on Debian?

Regarding performance in the IDE vs. the terminal: without particular knowledge of the internals, I would be surprised if the GUI threads interfered with the scsynth audio thread, since it’s an entirely independent process. I would guess that perhaps the system just runs a number of additional threads when the IDE is used. Perhaps this can be observed with top or something similar?

Great that you have narrowed it down! I wonder how hard it would be to fix the underlying issue. I suppose everyone would prefer the GUI lagging over having an xrun, but I still don’t understand why the two would even be connected (unless the two share a CPU core and that core is at 100%).

What is also curious in the case of Pd is that the load displayed in Jack is minimal as compared to SC.

This would also have implications for running multiple servers to make use of several cores, right? If the load is kept out of Jack, adding parallel processes should be fine; but if the load is somehow shared with Jack, then expanding to multiple servers might not yield the desired outcome, since Jack itself would act as a bottleneck.
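
(For context: spinning up an additional server from sclang is just a matter of pointing it at another scsynth instance, e.g. something like the following sketch, with the name and port chosen arbitrarily. Each scsynth then appears as its own Jack client and its own OS process.)

~s2 = Server(\second, NetAddr("127.0.0.1", 57111));
~s2.boot;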

@eckel

Can you post the top output of all relevant threads (scsynth, sclang and scide)? Then we can have a look at the thread priorities.
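
If it is easier to grab from sclang, something like this should do it (a sketch; it assumes pidof and ps are available and that only one scsynth is running), and the same can be repeated with sclang and scide in place of scsynth:

(
var pid = "pidof scsynth".unixCmdGetStdOut.asInteger;
// list the process's threads with scheduling class, RT priority and CPU usage
("ps -T -o tid,cls,rtprio,pri,pcpu,comm -p " ++ pid).unixCmd;
)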

@ludo

Jack clients run independently from the Jack server.

Jack clients run independently from the Jack server.

So the difference between how Jack reports its load for Pure Data and SuperCollider has no real practical implication; it is just reported differently?

I have explained the load difference here: SuperCollider on Linux - #30 by Spacechild1

Turbo boost, sorry for the noise.

Now I can see (with everything green except preemption) DSP peaks and xruns starting at 40~50% (onboard sound card) and 50~60% (external sound card)[1] constant CPU load on one core thread. With a 256 or 512 buffer size the behavior is the same: 40~50% CPU gives xruns. But I can’t confirm whether the IDE is the issue; I get the same behaviour running a script from the terminal.

[1] Everything is configured the same and the test is the same; the USB devices on my laptop seem to require more CPU.

I have seen this with some plugins running under Wine: low CPU but some xruns. It would be nicer if that weren’t the case, and I don’t like my laptop.

I would be surprised if the GUI threads interfered with the scsynth audio thread

I find that I can provoke xruns by moving windows around or rapidly maximizing and minimizing windows even when the CPU usage on one core is moderate and the others are at 0%. So there seems to be some interaction, somehow. I agree that it is surprising.

Can you also post the output when running scsynth from scide?

Another test:

  • Set the CPU governor to “performance.”
  • I didn’t set a small buffer size – I went with 1024 because my intent was not to provoke xruns, but rather to see if there was an observable difference under a normal, usable setup.
  • Load my live coding environment.
  • Free all of the instruments except one.
  • Play a specific pattern at a specific tempo – because of the Voicer, this will cap at 20 waveguide synths (so there’s a consistent load to measure).

Then I did this in three environments: A/ the IDE, B/ command-line sclang, but running my performance load script (which creates sclang GUI windows), and C/ command-line sclang issuing live-coding commands directly (no GUI windows).

A minus B = IDE
B minus C = Qt windows belonging to sclang

After making sure I controlled for all variables (in a couple of tests, there were some mixers I forgot to delete), I found that all three environments reported about 31% CPU usage (from scsynth, which I believe is the most relevant measurement, since it’s actual execution time for one hardware block / hardware block duration).

I repeated the tests with hardware buffer = 256 samples, and found:

A. (IDE) ~32% (but with occasional xruns – built-in soundcard, which I know to have suboptimal performance)
B. (sclang + GUI) 32.5%
C. (sclang no GUI) 32-32.5%

One conclusion that might be drawn is that perhaps the performance at 64 samples doesn’t predict real-world performance of a system that is actually usable on stage. In a performance, I typically use 256 samples (which works well on my system with a USB soundcard), and in that configuration, I find that GUIs introduce no difference in performance that is outside of a reasonable margin of error. (When the readings for one environment fluctuate within +/-1 or 2%, a half-percent difference between A, B or C doesn’t matter.)

hjh