How bad are mutexes in plugins, and what is the alternative?

Hello everyone

I am in the process of porting some really nice virtual analog BBD delay code to mkplugins. The original code seems to have been written for JUCE and the like, so I have spent some time making it more realtime safe by moving the delay line allocation to RTAlloc, but I have run into a bit of a problem. The delay has a filter on its input and output, and this filter is written in a pretty fancy way using unique pointers and a mutex:

In the Writing Unit Generators help file it says: “If synchronization with other threads is required, this has to be done in a lock-free manner.” So I assume this mutex stuff is a faux pas (but why is that - is it because we want supernova to handle this stuff for us if necessary to avoid surprises?).

Anyway - my main question is: Does anyone have ideas for rewriting this in a mutex-free and SC-friendly manner?

Since moving the delay line allocation to RTAlloc/RTFree, I have not experienced any performance issues with it, but this is only anecdotal since I have only tested on one laptop/system.

Thanks for the help and the enlightenment!

The delay has a filter on its input and output, and this filter is written in a pretty fancy way using unique pointers and a mutex:

The code in question seems to cache the filter coefficients, so they can be shared by several plugin instances. The mutex is there to make sure it’s always threadsafe. In the original MK plugins the filter coefficient computation is probably always done on the UI thread, in which case RT-safety wouldn’t be an issue. However, a SuperCollider UGen constructor runs on the audio thread, so you have to be careful.

Regarding the mutex: UGen constructors never run in parallel - not even in Supernova! - so you wouldn’t need the mutex at all, assuming that you only compute the coefficients once in the constructor.

(Curiously, supernova uses a spinlock when running UGen constructors [see https://github.com/supercollider/supercollider/blob/e4743da23948f6ce1cfe7750273814d0f489cc20/server/supernova/sc/sc_synth.cpp#L127], which I don’t think is even necessary, as OSC bundles are always dispatched on the main audio thread - before building the DSP graph and waking up the helper threads [see https://github.com/supercollider/supercollider/blob/e4743da23948f6ce1cfe7750273814d0f489cc20/server/supernova/server/server.hpp#L203]).

Now, the bigger problem is actually the hidden dynamic memory allocation behind std::vector and std::unique_ptr. As you probably know already, they internally end up calling malloc, which is forbidden on the RT thread. You have to use RTAlloc and friends. (Actually, you can use the STL containers with custom allocators, but often it’s just easier to allocate manually.)
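To illustrate the custom allocator route, here is a minimal sketch. Note that rtAllocStandIn and rtFreeStandIn are hypothetical stand-ins I made up so the example compiles on its own; in a real UGen they would forward to RTAlloc and RTFree with the World pointer. RTAllocator, RTVector and makeCoefficients are illustrative names as well, and the coefficient values are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins for the host's real-time allocator. In a real UGen
// these would forward to RTAlloc(world, size) and RTFree(world, ptr).
static int g_rtAllocCalls = 0;
static int g_rtFreeCalls = 0;

inline void* rtAllocStandIn(std::size_t bytes) {
    ++g_rtAllocCalls;
    return std::malloc(bytes); // placeholder only: malloc itself is NOT RT-safe
}

inline void rtFreeStandIn(void* ptr) {
    ++g_rtFreeCalls;
    std::free(ptr);
}

// Minimal allocator that routes all container storage through the stand-ins.
template <typename T>
struct RTAllocator {
    using value_type = T;

    RTAllocator() noexcept = default;
    template <typename U>
    RTAllocator(const RTAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* mem = rtAllocStandIn(n * sizeof(T));
        if (!mem)
            throw std::bad_alloc(); // a real plugin may prefer to check up front instead
        return static_cast<T*>(mem);
    }

    void deallocate(T* ptr, std::size_t) noexcept { rtFreeStandIn(ptr); }
};

template <typename T, typename U>
bool operator==(const RTAllocator<T>&, const RTAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const RTAllocator<T>&, const RTAllocator<U>&) noexcept { return false; }

using RTVector = std::vector<float, RTAllocator<float>>;

// Fill a vector with placeholder values; reserve() up front so push_back
// never has to grow the storage.
inline RTVector makeCoefficients(std::size_t count) {
    RTVector coeffs;
    coeffs.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        coeffs.push_back(1.0f / static_cast<float>(i + 1));
    return coeffs;
}
```

The important part is the reserve() call: without it the vector may reallocate several times while it grows, and every one of those reallocations would hit the allocator on the audio thread.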

In the Writing Unit Generators help file it says: “If synchronization with other threads is required, this has to be done in a lock-free manner.” So I assume this mutex stuff is a faux pas (but why is that - is it because we want supernova to handle this stuff for us if necessary to avoid surprises?).

It’s because locking a mutex is not deterministic. If you try to lock a mutex that is already locked, the OS will put the thread to sleep for an indeterminate amount of time - which is certainly not what you want the audio thread to do! Here is a very good introduction to this topic: Ross Bencina » Real-time audio programming 101: time waits for nothing
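For the general case where the audio thread really does need data from another thread, one common lock-free pattern is to hand over a pointer with an atomic exchange. The following is only a sketch of that idea, not something taken from the code in question; Coefficients and CoefficientMailbox are names I made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// Illustrative payload; the real filter would carry its own coefficient set.
struct Coefficients {
    float a0;
    float b1;
};

// Single-slot mailbox: a non-RT thread publishes, the audio thread takes.
class CoefficientMailbox {
public:
    // Non-RT thread: publish a new set. Returns a previously published set
    // that was never consumed (or nullptr), so the caller can free it
    // outside the audio thread.
    Coefficients* publish(Coefficients* fresh) {
        return pending_.exchange(fresh, std::memory_order_acq_rel);
    }

    // Audio thread: take the pending set, or nullptr if there is nothing new.
    // A single atomic exchange; it never waits on another thread.
    Coefficients* tryTake() {
        return pending_.exchange(nullptr, std::memory_order_acq_rel);
    }

private:
    std::atomic<Coefficients*> pending_{nullptr};
};
```

Keep in mind that the set the audio thread was using before must also be handed back to a non-RT thread for deletion, otherwise you end up calling free on the audio thread after all.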

(but why is that - is it because we want supernova to handle this stuff for us if necessary to avoid surprises?)

Supernova internally uses spinlocks for thread synchronization. See also the ACQUIRE_BUS_AUDIO, ACQUIRE_BUS_CONTROL and LOCK_SNDBUF macros in Unit.h. In a nutshell: a spinlock doesn’t put the thread to sleep, instead it will keep spinning until the lock is available. Note that the critical section should be as small as possible to avoid burning CPU cycles and must not contain potentially blocking function calls. Generally, spinlocks should be used carefully and only when absolutely necessary.
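To make the idea concrete, here is a minimal spinlock sketch built on std::atomic_flag. This is not Supernova's actual implementation, just the general technique; SpinLock, SharedGain and setGain are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex> // only for std::lock_guard

class SpinLock {
public:
    // Busy-wait until the flag is free; the thread is never put to sleep.
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin
        }
    }

    // Returns true if the lock was acquired, false if it is already held.
    bool try_lock() noexcept {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical shared state protected by the spinlock.
struct SharedGain {
    SpinLock lock;
    float value = 0.0f;
};

// Keep the critical section tiny: a single assignment, no allocation,
// no I/O, nothing that could block.
inline void setGain(SharedGain& shared, float newValue) {
    std::lock_guard<SpinLock> guard(shared.lock);
    shared.value = newValue;
}
```

Since SpinLock provides lock() and unlock(), it works with std::lock_guard, which makes it harder to forget the unlock.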

Since moving the delay line allocation to RTAlloc/RTFree, I have not experienced any performance issues with it, but this is only anecdotal since I have only tested on one laptop/system.

I’m not sure what you mean by “delay line allocation” because the code in question is only about allocating filter coefficients…

Of course, the easiest solution would be to just allocate and calculate the filter coefficients in each UGen instance. I don’t think people would create hundreds of virtual analog Delay UGens… How many filter “steps” are we talking about?
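If the number of steps is small and known at compile time, a rough sketch of the per-instance approach could look like this. Everything here is an assumption for illustration: kNumStages, DelayFilterState and the one-pole style formula are placeholders, not the actual BBD filter math.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical number of filter steps; the real value depends on the filter design.
constexpr std::size_t kNumStages = 4;

// Per-instance state: the coefficients live inside the struct itself,
// so there is no separate heap allocation and nothing shared to lock.
struct DelayFilterState {
    std::array<float, kNumStages> coeffs{};

    // Meant to be called once from the UGen constructor.
    // Placeholder math: one one-pole style coefficient per stage.
    void init(float sampleRate, float baseCutoffHz) {
        const float twoPi = 6.28318530718f;
        for (std::size_t i = 0; i < kNumStages; ++i) {
            const float cutoff = baseCutoffHz * static_cast<float>(i + 1);
            coeffs[i] = std::exp(-twoPi * cutoff / sampleRate);
        }
    }
};
```

With a fixed-size array as a member of the UGen struct, the coefficients would come along with the memory that the server allocates for the UGen, so neither RTAlloc nor a mutex is needed for them.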
