CPU headroom, multi threading

Spacechild1 · May 28, 2024, 8:57pm

AFAICT Supernova was indeed intended as a replacement for scsynth. However, it hasn’t been tested all too well and had many bugs, which made many people hesitate to use it. I have fixed a few nasty bugs in the past, but there are still some open issues.

Also, Supernova has been designed as a singleton, i.e. there can only be a single instance per binary. This means that you cannot reasonably use it in an audio plugin, for example. Scsynth, on the other hand, allows multiple instances via libscsynth, similar to libpd.

Sam_Pluta · May 29, 2024, 9:29pm

This thread made me try and build a multithreaded UGen of 8 sawtooth waves using std::thread in C++. Long story short… this is not efficient at all! I’m happy I got it to work, though.

Sam

scztt · May 30, 2024, 6:33am

IIRC TimB’s thesis about Supernova has a section discussing things like thread wake-up and synchronization times - I remember these being useful for me even when I didn’t understand the deeper details, because it put a scale on the performance cost of having a multithreaded pipeline at all.

MacOS now has kernel-level ways to synchronize multiple threads to the main audio thread (or, at least, the audio workgroup stuff implies that they’re doing a better job at synchronization…). I’m curious whether there’s any significant performance improvement to be had with a Supernova-like architecture that properly uses these.

scztt · May 30, 2024, 6:40am

Somewhat tangentially, I’ve been cautiously in the market for a new laptop and I was surprised to find that ONE laptop review website actually runs realtime audio tests for most of their reviews. I don’t know the test they’re running very well, and it’s a Windows-only thing so it’s unlikely to apply cleanly to Linux or Mac - but this information is still important, and can make a huge difference for an audio laptop, and I don’t really know of any other way to find this (apart from borrowing a laptop and running tests yourself). I’ve literally brought a USB stick in to an Apple store in the past, to try to run SuperCollider dropout tests and see what a 3k laptop upgrade was actually getting me in terms of, you know, grain count - this is a bit easier.

For example: https://www.notebookcheck.net/Lenovo-ThinkPad-T16-G2-in-review-Quiet-office-laptop-with-long-battery-life.739438.0.html#c10090093 (look for the DPC Latency / LatencyMon test)

(If anyone wants to recommend a solid Linux-capable laptop with a big GPU, please DM me!)

Spacechild1 · May 30, 2024, 8:40am

Writing proper multi-threaded code, especially in a (soft)realtime context, is far from trivial. In particular, you can’t really create and join threads on the fly; instead, you need to maintain a thread pool and a (lockfree) task queue. You’d also need to set the appropriate thread priorities, possibly pin threads to specific cores, etc. Also, as @scztt already hinted, you’d need to make sure that the individual workloads are large enough to outweigh thread wake-up times and context switching costs.

That being said, it is possible to do multi-threaded DSP even within a UGen (or across multiple UGens), but you really need to know what you’re doing. For example, VSTPlugin has an option for multi-threaded plugin processing which can reduce CPU load significantly. (It essentially offloads the plugin processing to helper threads, keeping the main audio thread free for other tasks.)
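To make the "thread pool and (lockfree) task queue" idea a bit more concrete, here is a minimal sketch with a single long-lived worker fed by a single-producer/single-consumer ring buffer. All names (`SpscQueue`, `Task`, `Worker`, `demoWorker`) are made up for illustration and are not taken from Supernova, VSTPlugin or the SuperCollider plugin API. Thread priorities and core pinning are platform-specific and left out, and the worker simply yields when idle, whereas a real pool would more likely sleep on a semaphore.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Single-producer/single-consumer ring buffer. push() is meant for the audio
// thread, pop() for one worker; neither side locks or allocates.
// Holds at most N - 1 items.
template <typename T, std::size_t N>
class SpscQueue {
public:
    bool push(const T& item) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % N;
        if (next == read_.load(std::memory_order_acquire))
            return false; // full: the caller has to cope, it must never block
        slots_[w] = item;
        write_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& item) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return false; // empty
        item = slots_[r];
        read_.store((r + 1) % N, std::memory_order_release);
        return true;
    }
private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};

// Hypothetical unit of work, e.g. "render `count` oscillators starting at `first`".
struct Task { int first; int count; };

// One long-lived worker, created once and outside the audio callback.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        quit_.store(true, std::memory_order_release);
        thread_.join();
    }
    bool submit(const Task& t) { return queue_.push(t); }
    int processed() const { return processed_.load(std::memory_order_acquire); }
private:
    void run() {
        Task t;
        while (!quit_.load(std::memory_order_acquire)) {
            if (queue_.pop(t))
                processed_.fetch_add(t.count, std::memory_order_release);
            else
                std::this_thread::yield(); // assumption: a real pool would sleep instead
        }
    }
    SpscQueue<Task, 64> queue_;
    std::atomic<bool> quit_{false};
    std::atomic<int> processed_{0};
    std::thread thread_; // declared last so the members above exist before run() starts
};

// Submits 8 tasks of size 1 and waits until the worker has consumed them all.
int demoWorker() {
    Worker worker;
    int submitted = 0;
    for (int i = 0; i < 8; ++i)
        if (worker.submit(Task{i, 1}))
            ++submitted;
    assert(submitted == 8); // 8 tasks fit into the 63 usable slots
    while (worker.processed() < submitted)
        std::this_thread::yield();
    return worker.processed();
}
```

In this sketch the producer never blocks: if the queue is full, `submit` just returns false and the caller has to fall back to doing the work itself or dropping it. Whether the scheme pays off still depends on each task being large enough to outweigh the wake-up cost.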

Sam_Pluta · May 30, 2024, 3:25pm

I assumed the use-case was not ideal. I will try to see how many sawtooth or sine waves it takes per thread to make this efficient. I also didn’t create a threadpool, so I’ll try that as well.

FWIW - the NessStretch UGen does its threading in Rust (9 or 10 simultaneous threads). Maybe it is a more ideal use-case, since it is doing 1000-10000 FFT/IFFT operations in a single audio block. And I imagine the crossbeam crate deals with the pooling internally. Though if not, I am always open to more efficiency.

Sam

Spacechild1 · May 30, 2024, 6:49pm

Also keep in mind that creating/joining threads itself is not realtime safe, so it must not be done on the RT thread, i.e. neither in the process function, nor in the unit ctor/dtor! This basically leaves only two options:

1. create/join the thread(s) in the plugin load()/unload() functions; the thread(s) are shared between all UGen instances. See the DiskIO UGens, for example.
2. create/join the thread(s) on the NRT thread. You can dispatch to the NRT thread in the Unit ctor and dtor, e.g. with SendMsgFromRT or DoAsynchronousCommand, but it is not trivial to do this safely.

(In VSTPlugin I actually do a mix between 1 and 2: the DSP thread pool is shared between all UGen instances, but it is created lazily on the NRT thread by the first instance that requests it. This ensures that the threads are only created when needed.)
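The first option (one helper thread created at load time and shared by all instances) can be sketched roughly as follows. This is a self-contained illustration, not actual plugin code: `exampleLoad()` and `exampleUnload()` are stand-ins for the plugin's load()/unload() entry points, `ExampleUnit` is a hypothetical UGen, and the helper polls with a short sleep purely to keep the example small.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

namespace {
std::thread gHelper;               // one helper shared by every UGen instance
std::atomic<bool> gRunning{false};
std::atomic<int> gPending{0};      // work requested by the instances
std::atomic<int> gDone{0};         // work completed by the helper

void helperLoop() {
    while (gRunning.load(std::memory_order_acquire)) {
        if (gPending.load(std::memory_order_acquire) > gDone.load(std::memory_order_relaxed))
            gDone.fetch_add(1, std::memory_order_release); // pretend to do one job
        else
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
} // namespace

// Stand-in for the plugin's load() function: runs once, not on the RT thread,
// so creating a thread here is fine.
void exampleLoad() {
    assert(!gHelper.joinable());
    gRunning.store(true, std::memory_order_release);
    gHelper = std::thread(helperLoop);
}

// Stand-in for the plugin's unload() function: stops and joins the helper.
void exampleUnload() {
    gRunning.store(false, std::memory_order_release);
    if (gHelper.joinable())
        gHelper.join();
}

// Hypothetical UGen: nothing here creates or joins a thread, it only bumps
// an atomic counter that the shared helper picks up.
struct ExampleUnit {
    void requestWork() { gPending.fetch_add(1, std::memory_order_release); }
};

// Two instances share the single helper thread; returns the number of jobs done.
int demoLoadUnload() {
    exampleLoad();
    const int before = gDone.load();
    ExampleUnit a, b;
    a.requestWork();
    b.requestWork();
    const int target = gPending.load();
    while (gDone.load() < target)
        std::this_thread::yield();
    exampleUnload();
    return gDone.load() - before;
}
```

The point of the sketch is only the lifetime: the thread outlives every instance, so neither the unit constructor/destructor nor the process function ever has to create or join one.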

Again, also make sure that you set the right thread priorities.

When using multithreading libraries, you need to be aware of where/when exactly the threads are created. (The same goes for internal memory allocation and system calls!) Only call functions/methods or create/destroy objects on the RT thread if you are absolutely sure that doing so is realtime-safe.

Naturally, all of this applies to Rust as well!

Sam_Pluta · May 30, 2024, 7:45pm

At the risk of derailing this discussion, is there a reference for using the NRT thread inside UGens? Or good examples? VSTPlugin I guess? Anyone have any others - like a basic example?

Sam

Spacechild1 · May 30, 2024, 8:38pm

Not that I know of, I’m afraid. The main issue with SendMsgFromRT or DoAsynchronousCommand is that the UGen might get destroyed while NRT commands are still pending, which would lead to a crash. There are workarounds, but they are not particularly obvious.

> VSTPlugin I guess?

Yes, see the VSTPluginDelegate class in sc/src/VSTPlugin.h · master · Pure Data libraries / vstplugin · GitLab.

Another example would be the AooDelegate class in sc/src/Aoo.hpp · sc · cm / aoo · GitLab (Note: this extension hasn’t been released yet and is subject to change.)

The basic idea is to create a reference counted “stub” that points back to the UGen. The UGen itself holds a copy of this stub and invalidates it in the destructor (e.g. by setting a boolean). NRT command objects also hold a copy of this stub – instead of a raw pointer to the UGen – and when they call back into the RT thread, they can check if the stub is still valid.
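A stripped-down sketch of the stub idea, with the threading simulated by plain function calls. Everything here is hypothetical (`Stub`, `ExampleUnit`, `Command`, `runNrt`, `finishRt`) and is not how VSTPluginDelegate or AooDelegate are actually written. In particular, `std::shared_ptr` is only a stand-in for the reference counting: it allocates with the default allocator, which would not be realtime-safe in a real Unit ctor/dtor.

```cpp
#include <cassert>
#include <memory>

struct ExampleUnit; // hypothetical UGen type

// Shared between the UGen and every in-flight NRT command.
struct Stub {
    ExampleUnit* owner; // back-pointer to the UGen
    bool alive;         // only read and written on the RT thread
};

struct ExampleUnit {
    std::shared_ptr<Stub> stub;
    int result = 0;
    // Assumption: a real plugin would use an RT-safe allocator here.
    ExampleUnit() : stub(std::make_shared<Stub>(Stub{this, true})) {}
    // Invalidate the stub so pending commands know the UGen is gone.
    ~ExampleUnit() {
        stub->alive = false;
        stub->owner = nullptr;
    }
};

// A command holds a copy of the stub instead of a raw pointer to the UGen.
struct Command {
    std::shared_ptr<Stub> stub;
    int payload = 0;

    // Would run on the NRT thread: never touches the UGen.
    void runNrt() { payload = 42; }

    // Would run back on the RT thread: checks the stub before using the UGen.
    bool finishRt() {
        assert(stub);
        if (!stub->alive)
            return false; // UGen was destroyed in the meantime, drop the result
        stub->owner->result = payload;
        return true;
    }
};

// The UGen is still alive when the command completes.
bool demoLive() {
    ExampleUnit unit;
    Command cmd;
    cmd.stub = unit.stub;
    cmd.runNrt();
    return cmd.finishRt() && unit.result == 42;
}

// The UGen is destroyed while the command is still pending.
bool demoDestroyed() {
    Command cmd;
    {
        ExampleUnit unit;
        cmd.stub = unit.stub;
    } // unit's destructor invalidates the stub here
    cmd.runNrt();
    return cmd.finishRt(); // false: nothing is dereferenced
}
```

Because the validity flag is set in the destructor and checked in `finishRt`, both of which are assumed to run on the RT thread, a plain `bool` is enough in this sketch; only the reference count itself has to be thread-safe.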

Ideally, plugin authors really shouldn’t have to deal with this problem at all. I have some ideas on how to improve the plugin API in this respect, but too little time at the moment to come up with a proper PR.