Handling control/audio rate server side in the calculation function

When calculating a sample block at audio rate, interesting/chaotic things may happen for input parameters that can be either audio rate or control rate.

For example, in a delay I am making, I get a pointer to the sample block like so:

const float *delayTime = in(1);

This is the delaytime parameter of my plugin.

If I then just access it in my for loop like so:

for (int i = 0; i < nSamples; ++i) {
    const float time = delayTime[i];
}

Things work as expected if I then plug an audio rate signal into this parameter in SuperCollider, but undefined behaviour occurs when using a control rate input or just a literal like 0.5. This is of course because the loop above treats the input at index 1 as a full block, but a control rate input only has a valid value at the first index (control rate = one value per block), so every iteration after the first reads past that single value.

The question then is - how do you handle this in a nice way?

You could use the calc_FullRate enum for comparison to check the input parameter’s rate and then accordingly switch between processing a block of samples or the first sample in the block.

for (int i = 0; i < nSamples; ++i) {
    const float time =
        (inRate(1) == calc_FullRate) ? delayTime[i] : delayTime[0];
}
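One way to avoid re-checking the rate on every sample is to resolve it once per block into a stride of 0 or 1. This is only a minimal sketch with the plugin API replaced by stand-ins: `Rate`, `processBlock` and its plain arguments are hypothetical names for illustration (in a real UGen the rate would come from inRate(1) and the pointer from in(1)).

```cpp
#include <cstddef>

// Stand-in for the server's rate enum; only the two cases discussed here.
enum class Rate { Control, Audio };

// Hypothetical helper: copies the per-sample delay time into `out`.
// `delayTime` holds 1 valid value at control rate, nSamples values at audio rate.
void processBlock(const float* delayTime, Rate rate, float* out, int nSamples)
{
    // Decide once per block: stride 0 keeps re-reading index 0,
    // stride 1 walks through the whole block.
    const std::size_t stride = (rate == Rate::Audio) ? 1 : 0;
    for (int i = 0; i < nSamples; ++i) {
        const float time = delayTime[static_cast<std::size_t>(i) * stride];
        out[i] = time; // a real delay would use `time` to read from its buffer
    }
}
```

The loop body stays identical for both rates, at the cost of one multiplication per sample.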

What is your preferred way of doing this?

The Server plugins often use different perform routines for different combinations of input rates, e.g. for SinOsc:

void SinOsc_next_ikk(SinOsc* unit, int inNumSamples);
void SinOsc_next_ika(SinOsc* unit, int inNumSamples);
void SinOsc_next_iak(SinOsc* unit, int inNumSamples);
void SinOsc_next_iaa(SinOsc* unit, int inNumSamples);

You can put the actual audio algorithm in a separate function, so the perform routines become almost trivial.

However, depending on the number of inputs and possible rates, you might get too many combinations. In that case, it is more practical to do some kind of branching in the perform routine.
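To make the first approach concrete, here is a rough sketch of two perform routines sharing one algorithm function, with the routine chosen once at construction time. Everything here is a stand-in so that it compiles on its own: `Rate`, `CalcFunc`, the `Delay` members, `processSample` and the `Delay_Ctor` signature are assumptions for illustration. In the stock plugins the selection is typically made in the constructor with the SETCALC macro after checking INRATE.

```cpp
// Hypothetical stand-ins for the plugin API.
enum class Rate { Control, Audio };

struct Delay;
using CalcFunc = void (*)(Delay*, int);

struct Delay {
    const float* delayTimeIn = nullptr; // would be in(1) in a real UGen
    float* out = nullptr;
    CalcFunc calc = nullptr;            // the routine the server would call per block
};

// The shared algorithm; here it just passes the delay time through.
inline float processSample(float delayTime) { return delayTime; }

// Control-rate delay time: read once per block.
void Delay_next_k(Delay* unit, int inNumSamples)
{
    const float delayTime = unit->delayTimeIn[0];
    for (int i = 0; i < inNumSamples; ++i)
        unit->out[i] = processSample(delayTime);
}

// Audio-rate delay time: read once per sample.
void Delay_next_a(Delay* unit, int inNumSamples)
{
    for (int i = 0; i < inNumSamples; ++i)
        unit->out[i] = processSample(unit->delayTimeIn[i]);
}

// Pick the perform routine once, when the unit is constructed.
void Delay_Ctor(Delay* unit, Rate delayTimeRate)
{
    unit->calc = (delayTimeRate == Rate::Audio) ? Delay_next_a : Delay_next_k;
}
```

With one variable-rate input this gives two routines; with N such inputs it grows to 2^N, which is the combinatorial problem mentioned above.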


Of course! I didn’t think of dividing it up like that. Only thought of creating control rate and audio rate calc functions. Thanks!

Keep in mind also that the compiler can do a lot of this heavy lifting for you. A rough and sketchy example:

#include <cstddef>

// Stand-in for the real unit struct; the buffer replaces the server's input.
struct Delay {
    float delayIn[64] = {};
};

// Control rate: only the first value of the block is valid.
float GetControlRateParam(const float* param, std::size_t /*ignored*/)
{
    return param[0];
}

// Audio rate: one value per sample.
float GetAudioRateParam(const float* param, std::size_t index)
{
    return param[index];
}

template <auto GetParam1>
void Delay_next(Delay* unit, int inNumSamples)
{
    const float* delayParam = unit->delayIn; // in a real UGen: in(1)
    for (int i = 0; i < inNumSamples; ++i)
    {
        auto delayTime = GetParam1(delayParam, i);
        // do something with delayTime
    }
}

auto Delay_next_k = Delay_next<GetControlRateParam>;
auto Delay_next_a = Delay_next<GetAudioRateParam>;

int main()
{
    Delay delay;

    Delay_next_k(&delay, 64);
    Delay_next_a(&delay, 64);

    return 0;
}

Since you’re passing the function for fetching your parameter in as a template argument, the compiler knows which fetching function you’re using when it’s compiling - it will inline and optimize based on this, allowing you to more or less write one version of your algorithm that can be used for both cases.

For the control rate version, the compiler will expand auto delayTime = GetParam1(delayParam, i); into auto delayTime = delayParam[0], and (based on a cursory test) is smart enough to hoist this out of the loop entirely.

You can generate versions of your next function pretty easily like this - suppose you rename GetControlRateParam -> Kr … then you could do:

template <auto DelayTime, auto DecayTime, auto Mul>
void Comb_next(Comb* unit, int inNumSamples) { }

auto Comb_next_akk = Comb_next<Ar, Kr, Kr>;
auto Comb_next_kak = Comb_next<Kr, Ar, Kr>;
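The instantiations still have to be matched to the actual input rates once, at run time, when the unit is created. One possible way to do that is a lookup table indexed by a bitmask of the rates. This is a sketch under assumptions, not how any particular plugin does it: the `Comb` members, `kCombTable`, `chooseCombCalc` and the bit layout are hypothetical, and the arithmetic in `Comb_next` is a placeholder rather than a comb filter.

```cpp
#include <cstddef>

// Hypothetical stand-in for the real unit: three input pointers and an output.
struct Comb {
    const float* in[3] = {nullptr, nullptr, nullptr};
    float* out = nullptr;
};

// Parameter getters for control rate and audio rate inputs.
float Kr(const float* p, std::size_t) { return p[0]; }
float Ar(const float* p, std::size_t i) { return p[i]; }

template <auto DelayTime, auto DecayTime, auto Mul>
void Comb_next(Comb* unit, int inNumSamples)
{
    for (int i = 0; i < inNumSamples; ++i) {
        const std::size_t n = static_cast<std::size_t>(i);
        // Placeholder arithmetic, just to use all three inputs.
        unit->out[i] = (DelayTime(unit->in[0], n) + DecayTime(unit->in[1], n))
                     * Mul(unit->in[2], n);
    }
}

using CalcFunc = void (*)(Comb*, int);

// All eight combinations. The index is a bitmask with one bit per input:
// bit 0 = delay time, bit 1 = decay time, bit 2 = mul; a set bit means audio rate.
static const CalcFunc kCombTable[8] = {
    Comb_next<Kr, Kr, Kr>, Comb_next<Ar, Kr, Kr>,
    Comb_next<Kr, Ar, Kr>, Comb_next<Ar, Ar, Kr>,
    Comb_next<Kr, Kr, Ar>, Comb_next<Ar, Kr, Ar>,
    Comb_next<Kr, Ar, Ar>, Comb_next<Ar, Ar, Ar>,
};

// Called once at construction time with the rates reported for each input.
CalcFunc chooseCombCalc(bool delayIsAudio, bool decayIsAudio, bool mulIsAudio)
{
    const int index = (delayIsAudio ? 1 : 0) | (decayIsAudio ? 2 : 0) | (mulIsAudio ? 4 : 0);
    return kCombTable[index];
}
```

The only run-time branching left is the single table lookup; each entry is a fully specialized loop.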

You can even branch in your code based on the rate of each argument - since the comparison will be known about at compile time, the compiler will go ahead and only generate the branch in question:

if (DelayTime == Kr) {
    // control rate specific stuff.... this disappears if DelayTime != Kr
}

In general, you’ll sometimes have to fiddle with how you structure your code so that the compiler generates the most optimized version - but usually this amounts to small refactorings, much less than trying to hand-roll N different versions for each combination of parameters.


Back in January I played with some code to permute phasor input rates. I am by no means versed in C++ templates, so this might be improved upon, but it is at least an example of an implementation that has been known to compile and work. It compiles a version of the next function for each combination of audio/control input rates.

Starts on line 1533 here: https://github.com/esluyter/supercollider/blob/topic/fix-phasor/server/plugins/TriggerUGens.cpp

This is brilliant. I will have to try out this technique of using templates for this. Thanks!

I’m curious Scott - how do you tell the results of the compiler optimizations here - are you looking at the assembly code in a debugger or something?

godbolt.org is your friend here. If you’re trying to think through how to compose generic things together into something that can be well optimized by the compiler for a UGen, I would suggest prototyping it in Compiler Explorer. There is no need to actually run the code, so you can have stand-in objects or forward declares for everything, or just remove the SuperCollider-specific objects entirely - it’s really more of a hypothesis-tester.

Here’s an excerpt of something I was playing around with recently - it’s pretty basic but I think illustrative:


(Importantly: you need to enable -O3 in the flags section on the right, else you’re not doing an optimized compile and it won’t really tell you anything interesting)

The question: If I have a struct with some function pointers representing operations, and I assign functions (lambdas in this case) to that struct and then run them in a different context, how much can the compiler optimize this?

You can immediately see two things:

  1. My main function compiles directly down to a return of a constant value, 321. Meaning: if the lambdas I assign to the struct are compile-time constants, then the compiler is able to look deep enough to perform the entire calculation at compile time.
  2. The calculate function is still compiled, and has three calls, suggesting that in cases where the compiler DOESN’T know what’s in Graph at compile time, it still has to call all three functions.
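The embedded excerpt is not reproduced here, so the following is only a hypothetical sketch of the kind of experiment described, not the original code: `Graph`, its members and `run` are assumed names, and the three lambdas are chosen arbitrarily so that the result comes out as 321.

```cpp
// Hypothetical struct of operations, each stored as a plain function pointer.
struct Graph {
    int (*first)(int);
    int (*second)(int);
    int (*third)(int);
};

// Runs the three operations in sequence. Compiled on its own, the compiler
// cannot know what the pointers refer to, so it has to emit three calls.
int calculate(const Graph& g, int x)
{
    return g.third(g.second(g.first(x)));
}

// Here the lambdas are visible at the call site, so an optimizing compiler
// is free to inline everything and fold the result into a constant.
int run()
{
    const Graph g{
        [](int x) { return x * 100; },
        [](int x) { return x + 20; },
        [](int x) { return x + 1; },
    };
    return calculate(g, 3); // 3 * 100 + 20 + 1
}
```

Pasting something like this into Compiler Explorer with optimizations enabled lets you compare the code generated for `run` against the standalone `calculate`.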

Even if you don’t know assembly, you can glean a lot - you can easily see what individual lines of code compile into. Some things are obvious: any presence of call means the thing being called can’t be inlined (which in turn means probably SOMETHING about that operation is opaque to the compiler).

Conversely: if you can get something to compile down to e.g. a single constant value return like my example, it means that the compiler has good insight into the entire tree of operations. For UGen-building, this can be critical: if you can extract values that are consumed by control flow code (e.g. values used in a for loop or if block) out of a function, and instead supply them as template arguments or compile-time constants (if there are e.g. a few well-known values for these), you can do your branching when constructing your UGen rather than in its _next.

This gives you a space where you can play around with configurations: for example, if you have a compile-time constant or template param that you’re branching on, e.g. template <bool DoInterpolation>, you can easily verify that the unused branch is compiled out entirely: just specify calculate<false> and the code in the true branch should not appear in the compiled version at all.
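A small sketch of such a compile-time switch, suitable for pasting into Compiler Explorer. The function is named `readTable` here instead of calculate, and the table lookup itself is a made-up example, not taken from any existing UGen:

```cpp
#include <cstddef>

// Hypothetical table read: DoInterpolation is a compile-time switch.
template <bool DoInterpolation>
float readTable(const float* table, std::size_t size, float phase)
{
    const std::size_t index = static_cast<std::size_t>(phase);
    if constexpr (DoInterpolation) {
        // Linear interpolation between neighbouring entries, wrapping at the end.
        const float frac = phase - static_cast<float>(index);
        const float a = table[index % size];
        const float b = table[(index + 1) % size];
        return a + frac * (b - a);
    } else {
        // Truncating lookup; the interpolation code is not instantiated here.
        return table[index % size];
    }
}
```

Instantiating both `readTable<false>` and `readTable<true>` side by side makes it easy to see that the truncating version contains none of the interpolation arithmetic.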

Incidentally, compiler explorer is a great way to get a handle on complex template usage stuff: the turnaround time is so short, you can more or less livecode with it, allowing you to experiment until you can figure out the right expression for something.

This is brilliant. Thanks a lot!