Figuring out how to develop SC plugins

hey, after spending some time setting up a C++ development environment in VSCode, I have tried to port my rung divisions prototype from gen~ to C++. I have implemented parts of it in C++ and built it using the cookiecutter plugin template.
I'm already getting the nice contrapuntal shapes from the reverse-encoded 3-bit and 8-bit outputs :slight_smile:
The shift register here uses a trick to double its length and to get these initial symmetric shapes. If chance is 1, you are always filling it with 1s, but because of the XOR, once it loops around the XOR sets them back to 0, and on the next loop it sets them to 1 again.

(
{
    ShiftRegister.ar(
        freq: 32,
        chance: 1,
        length: 8,
        direction: 1
    );
}.plot(1);
)


I'm currently working on using the 3-bit output to modulate the trigger frequency of the scheduling phasor that clocks the shift register, in a feedback loop (via the modDepth param; that's Rungler-style, as in the Benjolin / Blippo Box), and on using the 8-bit output as the data input, combined with the chance param, in a second feedback loop. That's already working fine :slight_smile:

(
{
    ShiftRegister.ar(
        freq: 32,
        chance: 0.5,
        length: 8,
        direction: 1,
        modDepth: 2,
    );
}.plot(1);
)

I think it would be nice to have one chaos param that controls both the 3-bit output used for feedback FM of the trigger frequency and the data input, which gets XORed (via the chance param) with the 8-bit output in a feedback loop (the 8-bit output in a feedback loop instead of random noise). So modDepth and chance would be replaced by a single chaos param normalized between 0 and 1.
There is just one design problem with that: to what state should the register be initialized?

The default initializes with 0 and when chance is initially set to 1, we get the triangle “shark fin” pattern because of the xor trickery.

If we initialize with 1 and set chance to 0, we just get one bit moving across the register, but then the output isn't properly scaled between 0 and 1. To get a normalized ramp pattern between 0 and 1, the initial bit would have to move across the register while, on every step the register advances, a 1 is put in behind the initial bit, and at the moment all bits are 1s they should reset to 0.

Additionally, if one chaos param controls both the modDepth of the 3-bit modulation and the chance of the 8-bit feedback, then the initial state with chance = 1 would include frequency modulation of the trigger frequency, which doesn't give us a predictable initial state (e.g. the shark fin pattern).
Maybe just implement a data / seed param and don't care about the initial state.

This video explains some fundamentals about shift registers: https://www.youtube.com/watch?v=va2XAdFtmeU
Unfortunately, this one about the Rung Divisions doesn't: https://www.youtube.com/watch?v=iUCxxVFRcaw&t=325s
and the sound examples in the Fancy Synthesis “rung divisions” playlist on SoundCloud are also way more interesting.
https://www.fancysynthesis.net/


I have also been investigating an interpolationMix param that linearly crossfades between the stepped outputs and their cubic interpolation (with cubic interpolation, overshoot is a common problem). That worked fine. But I think this should be a separate UGen, where you could plug in an arbitrary stepped signal (but would have to pass the phase as an argument). I have investigated different interpolation formulas: the cubic interpolation used in LFDNoise3 brute-forces the overshoot problem by multiplying the new values by 0.8; I don't know if that's the best way to go about it. There are others with less overshoot.

A signature @nathan LFO (it switches between interpolation and no interpolation):

var lfo = { |rate|
	var toggle = ToggleFF.ar(Dust.ar(rate));
	Select.ar(toggle, [
		LFNoise2.ar(rate),
		TRand.ar(-1, 1, Dust.ar(rate * 3))
	]);
};

I've recorded a short test snippet using the ramp output for deriving scheduling triggers and the 3-bit and 8-bit outputs for modulating the frequency of two pulsar streams, with additional PM cross-modulation. You get some nice canonical structures from the reverse-encoded outputs :slight_smile:


The first prototype is working fine now (here are just the current arguments):

(
{
    ShiftRegister.ar(
        freq: 1000,
        chance: 1.0,
        length: 8,
        rotate: 1,
        data: 0,
        fbIndex: 0,
        fbSource: 0,
    );
}.plot(0.021);
)

You can set a trigger freq, and fbIndex sets the amount of feedback FM from either the 3-bit or the 8-bit output, selectable via fbSource. You can rotate the shift register to the left or the right via rotate (between -16 and 16), set the current number of bits in the shift register via length (between 1 and 16), and set the initial shift register seed via data. After extracting the LSB (least significant bit), you can XOR it with a noise source (random values between 0 and 1 for every sample) via the chance param. With data = 0, length = 8, rotate = 1 and chance = 1, this results in a rising shift register pattern for the first 8 steps until it hits 1, and a falling pattern for steps 9 to 16 until it hits 0 (the stepped “shark fin” triangle shape, which effectively doubles the length of the shift register via the XOR trickery).


I have additionally studied the ChaoticCore of the BlippoBox and the Benjolin, came up with one idea that might be interesting, and am currently working on a solution for it:

In the Benjolin you have a pair of PulseOscs, a pair of TriangleOscs and the Rungler, which is basically a 3-bit encoded and normalized ShiftRegister (only capable of a left shift by one step per trigger, with no rotation param for left and right shifts by multiple steps, a hardcoded seed data input of 0 and chance set to a hardcoded value).

One PulseOsc triggers the ShiftRegister and the other PulseOsc is the input to sample (what we already had with the random values per sample, but with a fixed chance value). The output of the encoded and normalized ShiftRegister is then used for FM/PM of the PulseOscs and the TriangleOscs. The TriangleOscs (your “Havoc waves”) are then the dual output oscillators.

One idea I had was to set up two ramp signals between 0 and 1, with control over freq1 and freq2, and use one of them to clock the ShiftRegister and the other as the input to sample. The Rungler output would be scaled by fmIndex1 and fmIndex2 and used for FM/PM of the ramp signals in a feedback loop. Then you don't even need two sets of PulseOscs and TriangleOscs: the ramp signals clock the register, are the input to sample and are also the output. You could then use these coupled-FM ramp signals to drive your favourite bandlimited and oversampled wavetable oscillator. I'm currently working on that and will keep you in the loop (I can imagine putting together a Chaos Utils UGen suite. I would probably need a bit of help with best-practice UGen design and building on different platforms; I'm on Windows). There is one other idea I have; more about that later.


Here is a first prototype of the Rungler. It outputs two ramp signals between 0 and 1: one clocks the shift register (via the rampToTrig function) and the other flips the extracted least significant bit when it is > 0.5. The third output is the encoded unipolar 3-bit output of the ShiftRegister. I have implemented cubic interpolation for the bipolar 3-bit output (used only internally), which then phase-modulates the ramp signals in a feedback loop, controllable via fbIndex. The modulators are additionally passed through a tracking one-pole filter for different PM feedback flavours, controllable via fltRatio. Both ramp signals could then be used to drive OscOS. Let's see :slight_smile:


(
{
	
	var freqA = 441;
	var freqB = 231;
	var fbIndexA = 2;
	var fbIndexB = 1;
	var fltRatioA = 2;
	var fltRatioB = 1;
	
	var rungler = Rungler.ar(
		freqA,
		freqB,
		fbIndexA,
		fbIndexB,
		fltRatioA,
		fltRatioB
	);
	
	[
		rungler[0], 
		rungler[1], 
		rungler[2]
	];
	
}.plot(0.021);
)

crazy looking waves :smiley:


Here are some short test snippets where rampOutA and rampOutB each drive a separate OscOS (with 4x oversampling); these are mixed together and then plugged into a disperser and NonlinearFilter (in lowpass mode with tanh distortion in the feedback path), with the runglerOut modulating the cutoff frequency.




Another possible UGen I'm interested in developing right now is a dual wavetable oscillator with cross-modulation based on OscOS, basically a complex wavetable oscillator.
This is the holy grail to me right now; it would fit into my pulsar instrument really nicely!
I'm already using a version of it built in SC:

	// Create feedback loop
	fbChannels = LocalIn.ar(2 * numChannels);

	// Generate carrier for chain A with cross PM
	pmods_A = (0..numChannels - 1).collect{ |fbChan, i|
		var pmIndex = param.(\one, \xmIndex, 0, ControlSpec(0, 3));
		var pmFltRatio = param.(\one, \xmFltRatio, 1, ControlSpec(1, 5));
		var pmod = fbChannels[fbChan] / 2pi * pmIndex;
		~unitShapers.onePoleFilters[\lpf].(pmod, grainData_B[\grainSlopes][i] * pmFltRatio);
	};
	grainOscs_A = wavetableOsc.(\one, 0,
		(grainData_A[\grainPhases] + pmods_A).wrap(0, 1)
	);

	// Generate carrier for chain B with cross PM
	pmods_B = (numChannels..numChannels * 2 - 1).collect{ |fbChan, i|
		var pmIndex = param.(\two, \xmIndex, 0, ControlSpec(0, 3));
		var pmFltRatio = param.(\two, \xmFltRatio, 1, ControlSpec(1, 5));
		var pmod = fbChannels[fbChan] / 2pi * pmIndex;
		~unitShapers.onePoleFilters[\lpf].(pmod, grainData_A[\grainSlopes][i] * pmFltRatio);
	};
	grainOscs_B = wavetableOsc.(\two, 1,
		(grainData_B[\grainPhases] + pmods_B).wrap(0, 1)
	);

	LocalOut.ar(grainOscs_B ++ grainOscs_A);

But the SynthDef has to run at block size 1 to take full advantage of that, which is not really an option with multichannel expansion and oversampling: way too much CPU (about 60-80%). And feedback FM/PM with block sizes other than 1 doesn't give a different flavour, it's just awful! This would be a great fit for the possible Chaos UGen library.

DualOscOS should take two phases as arguments, phaseA and phaseB, which are cross-modulated via PM with pmIndexA and pmIndexB (not FM: you wouldn't have access to the phases for granulation, and FM doesn't behave well with feedback), plus an additional tracking one-pole lowpass filter in the feedback path with adjustable filterRatioA and filterRatioB, for well-behaved PM (check out the DX7 patent) and different PM flavours (like we had before with the Rungler).

This is cross modulation for pulsar synthesis / granulation using exponential FM instead of linear through-zero PM: https://dafx2020.mdw.ac.at/proceedings/papers/DAFx2020_paper_61.pdf


	// Create feedback loop
	fbChannels = LocalIn.ar(2 * numChannels);

	// oscillator A with cross-coupled exponential FM
	fmods_A = (0..numChannels - 1).collect{ |fbChan, i|
		var fmIndex = param.(\one, \xmIndex, 0, ControlSpec(0, 3));
		var fmFltRatio = param.(\one, \xmFltRatio, 1, ControlSpec(1, 5));
		var fmod = fbChannels[fbChan] * fmIndex;
		~unitShapers.onePoleFilters[\hpf].(fmod, grainData_B[\slopes][i] * fmFltRatio);
	};

	grainPhases_A = ~grainFunctions.multiChannel[\rampSubSample].(
		triggers: triggers,
		slopes: grainData_A[\freqs] * (2 ** fmods_A) * SampleDur.ir,
		subSampleOffsets: subSampleOffsets
	);
	grainOscs_A = wavetableOsc.(\one, 0, grainPhases_A.wrap(0, 1));

	// oscillator B with cross-coupled exponential FM
	fmods_B = (numChannels..numChannels * 2 - 1).collect{ |fbChan, i|
		var fmIndex = param.(\two, \xmIndex, 0, ControlSpec(0, 3));
		var fmFltRatio = param.(\two, \xmFltRatio, 1, ControlSpec(1, 5));
		var fmod = fbChannels[fbChan] * fmIndex;
		~unitShapers.onePoleFilters[\hpf].(fmod, grainData_A[\slopes][i] * fmFltRatio);
	};

	grainPhases_B = ~grainFunctions.multiChannel[\rampSubSample].(
		triggers: triggers,
		slopes: grainData_B[\freqs] * (2 ** fmods_B) * SampleDur.ir,
		subSampleOffsets: subSampleOffsets
	);
	grainOscs_B = wavetableOsc.(\two, 0, grainPhases_B.wrap(0, 1));

	LocalOut.ar(grainOscs_B ++ grainOscs_A);

This will take way more time, but I'm making a plan :slight_smile:


The dual OscOs shouldn’t be too difficult to make, especially if both oscillators use the same buffer. I guess you would need to turn OscOs into a class with a next function that returns just one sample at a time, then make this PM UGen contain two of those. I do this with SawNext, which is an object that lives inside most of the OSOscillators as the internal ramp.

Sam


hey, I have already created a namespace for the bit utils, which I'm using to derive the 3-bit and inverted 8-bit outputs from the shift register function in the main for loop. The shift register I have implemented uses modular arithmetic instead of shifting bits around, which I learned from the GO book; it has some advantages when you want to rotate or increase / decrease the current number of bits in the shift register:

Here is the namespace for the bit utils:

#include "BitUtils.hpp"

#include <cmath>   // std::pow, std::floor
#include <cstdlib> // rand, RAND_MAX

namespace BitUtils {

    float randomFloat() {
        return static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
    }

    int rotateBits(int value, int rotation, int length) {
        // Use wrap instead of % to handle negative rotation amount
        int normalizedRotation = sc_wrap(rotation, 0, length - 1);
        int complementRotation = length - normalizedRotation;
        
        // Calculate bit ranges for the given length
        int maxValueForLength = static_cast<int>(std::pow(2, length));
        double leftShiftMultiplier = std::pow(2, normalizedRotation);
        double rightShiftDivisor = std::pow(2, -complementRotation);
        
        // Perform the bit shifts
        double leftShifted = value * leftShiftMultiplier;
        double rightShifted = value * rightShiftDivisor;
        
        // Extract the relevant parts
        int leftPart = static_cast<int>(leftShifted) % maxValueForLength;
        int rightPart = static_cast<int>(std::floor(rightShifted));
        
        // Combine both parts to get the rotated result
        return leftPart + rightPart;
    }

    // Extract top numBits and apply LSB weighting (bit5*1 + bit6*2 + bit7*4)
    float getMSBBits(int value, int numBits, int totalBits) {
        int startBit = totalBits - numBits;  // Calculate start bit for MSB
        int result = 0;
        
        for (int i = 0; i < numBits; i++) {
            int bitIndex = startBit + i;
            
            // Extract the bit using power/modulus
            int divisor = static_cast<int>(std::pow(2, bitIndex));
            int bit = (value / divisor) % 2;
            
            // Apply LSB-first weighting
            int weight = static_cast<int>(std::pow(2, i));
            result += bit * weight;
        }
        
        // Normalize to 0-1 range
        int maxValue = static_cast<int>(std::pow(2, numBits)) - 1;
        return static_cast<float>(result) / static_cast<float>(maxValue);
    }

    // Extract bottom numBits and apply MSB weighting (bit0*128 + bit1*64 + ... + bit7*1)
    float getLSBBits(int value, int numBits, int totalBits) {
        int result = 0;
        
        for (int i = 0; i < numBits; i++) {
            int bitIndex = i;  // Start from bit 0
            
            // Extract the bit using power/modulus
            int divisor = static_cast<int>(std::pow(2, bitIndex));
            int bit = (value / divisor) % 2;
            
            // Apply MSB-first weighting
            int weight = static_cast<int>(std::pow(2, numBits - 1 - i));
            result += bit * weight;
        }
        
        // Normalize to 0-1 range
        int maxValue = static_cast<int>(std::pow(2, numBits)) - 1;
        return static_cast<float>(result) / static_cast<float>(maxValue);
    }
}

And here is the implementation inside the main for loop, on a trigger derived from the scheduling ramp:

        if (clockTrigger) {
            // Rotate shift register
            int rotated = BitUtils::rotateBits(m_shiftRegister, rotate, length);

            // Extract LSB for feedback
            int extractedBit = rotated % 2; // Get LSB
            int withoutLSB = rotated - extractedBit; // Remove LSB

            // XOR with modulated ramp2
            bool feedbackBit = (modulated_ramp2 < chance);
            int newBit = extractedBit ^ (feedbackBit ? 1 : 0);

            // Update shift register
            m_shiftRegister = withoutLSB + newBit;

            // 7. STATE UPDATING: Calculate new rungler output
            m_runglerOut_3bit = BitUtils::getMSBBits(m_shiftRegister, 3, NUM_BITS);
            m_runglerOut_8bit = 1.0f - BitUtils::getLSBBits(m_shiftRegister, 8, NUM_BITS);
        }

In my main Utils header I have already implemented the cubic interpolation, the rampToTrig function and the OnePole filter as classes, which I can use from the main .cpp file (and am already doing for these implementations):

#pragma once

#include "SC_PlugIn.hpp"

namespace Utils {

class CubicInterpolator {
private:
    double mHistory1{0.0}, mHistory2{0.0}, mHistory3{0.0}, mHistory4{0.0};
   
public:
    CubicInterpolator() = default;
   
    void reset(float initialValue = 0.0f) {
        float scaledValue = initialValue * 0.8f;
        mHistory1 = mHistory2 = mHistory3 = mHistory4 = scaledValue;
    }
   
    float process(float input, double phase, bool clockTrigger) {
        if (clockTrigger) {
            float scaledInput = input * 0.8f;
           
            mHistory4 = mHistory3;
            mHistory3 = mHistory2;
            mHistory2 = mHistory1;
            mHistory1 = scaledInput;
        }
       
        float t = static_cast<float>(phase);
        float y0 = static_cast<float>(mHistory4); 
        float y1 = static_cast<float>(mHistory3); 
        float y2 = static_cast<float>(mHistory2); 
        float y3 = static_cast<float>(mHistory1); 
       
        return cubicinterp(t, y0, y1, y2, y3);
    }
};

//////////////////////////////////////////////////////////////////////////////////////////////////////////////

class RampToTrigger {
private:
    double m_prevPhase{0.0};
    bool m_prevWrap{false};

public:
    RampToTrigger() = default;
    
    void reset() {
        m_prevPhase = 0.0;
        m_prevWrap = false;
    }
    
    bool process(double currentPhase) {
        // wrap detection (current sample vs previous sample)
        double delta = currentPhase - m_prevPhase;
        double sum = currentPhase + m_prevPhase;
        
        // Detect phasor wrap
        bool currentWrap = false;
        if (sum != 0.0) {
            currentWrap = (std::abs(delta / sum) > 0.5);
        }
        
        // Rising edge detection - only trigger once per wrap
        bool trigger = currentWrap && !m_prevWrap;
        
        // Store state for next sample
        m_prevPhase = currentPhase;
        m_prevWrap = currentWrap;
        
        return trigger;
    }
};

//////////////////////////////////////////////////////////////////////////////////////////////////////////////

class OnePoleFilter {
private:
    static constexpr float PI = 3.14159265358979323846f;
    float m_state{0.0f};
   
public:
    OnePoleFilter() = default;
   
    void reset() {
        m_state = 0.0f;
    }
   
    float processLowpass(float input, float slope) {
        // Clamp slope to full Nyquist range, then take absolute value
        float safeSlope = std::abs(sc_clip(slope, -0.5f, 0.5f));
       
        // Calculate coefficient
        float coeff = std::exp(-2.0f * PI * safeSlope);
       
        // OnePole lowpass: y[n] = x[n] * (1-b) + y[n-1] * b
        m_state = input * (1.0f - coeff) + m_state * coeff;
       
        return m_state;
    }
   
    float processHighpass(float input, float slope) {
        float lowpassed = processLowpass(input, slope);
        return input - lowpassed;
    }
};

} // namespace Utils

So I guess I will be able to turn the current implementation of OscOS into a class with a next function. It would probably be desirable to pass different buffers and different buf_locs for different modulation per OscOS. The main problem I'm facing right now is that the C++ course I'm taking is about C++20, and a lot of the code I'm studying is written in an older style of C++, which makes it harder to understand. There are some specific assignments and shortcuts I don't understand yet when studying code.

I think these are all great and thanks for doing this in the open!

Just a couple of notes about the c++.

It would be better to use structs, leave everything public and get rid of the destructor. See rule of zero, or rule of five.

The random float function isn’t in keeping with SC’s style, as you can’t set the seed; SC provides some way of doing this nicely (I can’t remember right now). Also, the C++ way of doing randomness is to use the facilities from the <random> header; rand is generally avoided.


hey, thanks.

While creating these classes and namespaces I got a bit confused about the best way to organize the utilities. I have to learn a bit more about all of these. I will have a look at structs :slight_smile:

So could I just use frand / frand2 from SC_RGen.h instead?

I'm not sure about the arguments, though.
I'm currently looking at NoiseUGens.cpp and how frand / frand2 are used in there.
For example, frand2(s1, s2, s3) in WhiteNoise: where are these arguments defined?

Yeah the UGen code is completely unreadable because it uses unsafe macros everywhere. It is basically impossible to learn how to write good unit code from looking at the builtin ones.

If we take WhiteNoise_next from NoiseUGens.cpp and expand the three main macros (RGET, LOOP1, RPUT)…

This gives us…

void WhiteNoise_next(WhiteNoise* unit, int inNumSamples) {
    float* out = ZOUT(0);
    RGen& rgen = *unit->mParent->mRGen;
    uint32 s1 = rgen.s1;
    uint32 s2 = rgen.s2;
    uint32 s3 = rgen.s3;

    int xxn = (inNumSamples);
    assert(inNumSamples);
    do {
       ZXP(out) = frand2(s1, s2, s3);
    } while (--xxn);

    rgen.s1 = s1;
    rgen.s2 = s2;
    rgen.s3 = s3;
}

This should really just be turned into a method/function that takes an RGen&, something like float frand2(RGen& r), but this part of the code base is in dire need of refactoring.

thanks, that's really confusing indeed :slight_smile: Does this mean one should write a wrapper function around it for now? Your comment was about creating random numbers, I then tried to figure out the SC way of doing it, and we ended up here, looking for a more appropriate SC way to replace my randomFloat() function.

Short little update: I think I will be ready within the next 1-2 months to upload a GitHub repository with these developments: the Chaotic Shift Register, the Rungler / chaotic core from the Blippo Box / Benjolin, and the Dual OscOS. I'm currently doing some more research and trying to figure out the best design decisions for these UGens (fewer parameters but lots of degrees of freedom). I will make sure all the technical details are covered, and then it would be nice if someone would like to help get the C++ structure right and help me build for different operating systems. After that I have some more things I could think of implementing…


okay, I think I figured this out much earlier than I thought :slight_smile:
The sincInterpolate function took me a few night shifts to debug because of one implicit cast to int that I had overlooked, and a memory corruption bug.
I have implemented the wavetable oscillator from scratch based on my initial SC implementation, with sinc interpolation and mipmapping, studying the chapter in the GO book once more, the SuperCollider API, the BufRd source code for LOOP_BODY and OscOS, with some additional AI help. There is some syntax in there that I'm not fully understanding yet, but it's nice to have a moment of success alongside my ongoing C++ course. The code is already quite clean in my opinion and it's working as intended. Next is the oversampling implementation and then cross-modulation PM. I'm so happy :slight_smile:

Here is Utils.hpp with all the utilities needed; together with it you have everything outside the next function in the main .cpp file:

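#pragma once
#include <algorithm> // std::max
#include <array>     // std::array
#include <cmath>     // std::log2, std::pow, std::ceil, std::abs
#include "SC_PlugIn.hpp"
#include "wavetables.h"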

namespace Utils {

// ===== BASIC MATH UTILITIES =====

inline float lerp(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}

// ===== PHASE PROCESSING UTILITIES =====

struct RampToSlope {
    float m_lastPhase{0.0f};
    
    void reset(float currentPhase) {
        m_lastPhase = currentPhase;
    }
    
    float process(float currentPhase) {
        float delta = currentPhase - m_lastPhase;
        m_lastPhase = currentPhase;
        return sc_wrap(delta, -0.5f, 0.5f);
    }
};

// ===== BUFFER INTERPOLATION UTILITIES =====

inline float peekNoInterp(const float* buffer, int bufSize, int index) {
    const int wrappedIndex = sc_wrap(index, 0, bufSize - 1);
    return buffer[wrappedIndex];
}

inline float peekLinearInterp(const float* buffer, int bufSize, float phase) {
    
    const float sampleIndex = phase;
    const int intPart = static_cast<int>(sampleIndex);
    const float fracPart = sampleIndex - intPart;
    
    const int idx1 = sc_wrap(intPart, 0, bufSize - 1);
    const int idx2 = sc_wrap(intPart + 1, 0, bufSize - 1);
    
    const float a = buffer[idx1];
    const float b = buffer[idx2];
    
    return lerp(a, b, fracPart);
}

inline float peekCubicInterp(const float* buffer, int bufSize, float phase) {

    const float sampleIndex = phase;
    const int intPart = static_cast<int>(sampleIndex);
    const float fracPart = sampleIndex - intPart;
    
    const int idx0 = sc_wrap(intPart - 1, 0, bufSize - 1);
    const int idx1 = sc_wrap(intPart, 0, bufSize - 1);
    const int idx2 = sc_wrap(intPart + 1, 0, bufSize - 1);
    const int idx3 = sc_wrap(intPart + 2, 0, bufSize - 1);
    
    const float a = buffer[idx0];
    const float b = buffer[idx1];
    const float c = buffer[idx2];
    const float d = buffer[idx3];
    
    return cubicinterp(fracPart, a, b, c, d);
}

// ===== SINC INTERPOLATION UTILITIES =====

struct SincTable {
    static constexpr int TABLE_SIZE = 8192;
    static constexpr int SINC_LEN = 8;
    static constexpr int SINC_HALF_LEN = 4;
    
    std::array<float, TABLE_SIZE> table;
    
    SincTable() {
        // Load table and convert in constructor
        auto doubleTable = get_sinc_window8();
        for (int i = 0; i < TABLE_SIZE; ++i) {
            table[i] = static_cast<float>(doubleTable[i]);
        }
    }
    
    const float* data() const { return table.data(); }
    constexpr int size() const { return TABLE_SIZE; }       
    constexpr int sincLen() const { return SINC_LEN; }  
    constexpr int sincHalfLen() const { return SINC_HALF_LEN; }
};

// Sinc interpolation function using the sinc table
inline float sincInterpolate(float scaledPhase, const float* buffer, int bufSize, int startPos, int endPos, int sampleSpacing, const SincTable& sincTable) {

    const float sampleIndex = scaledPhase / static_cast<float>(sampleSpacing);
    const int intPart = static_cast<int>(sampleIndex);
    const float fracPart = sampleIndex - static_cast<float>(intPart);

    float result = 0.0f;
    
    for (int i = 0; i < sincTable.sincLen(); ++i) {

        // === WAVEFORM BUFFER ACCESS (no interpolation) ===
        int waveIndex = startPos + (intPart + (i - sincTable.sincHalfLen())) * sampleSpacing;
        // int sc_wrap includes hi and endPos is one past the cycle, so wrap to endPos - 1
        waveIndex = sc_wrap(waveIndex, startPos, endPos - 1);
        float waveSample = peekNoInterp(buffer, bufSize, waveIndex);
        
        // === SINC TABLE ACCESS (cubic interpolation) ===
        float sincPhase = (static_cast<float>(i) - fracPart) / static_cast<float>(sincTable.sincLen()) * static_cast<float>(sincTable.size());
        float sincSample = peekCubicInterp(sincTable.data(), sincTable.size(), sincPhase);

        result += waveSample * sincSample;
    }

    return result;
}

// ===== MIPMAP UTILITIES =====

inline float mipmapInterpolate(float phase, const float* buffer, int bufSize, int startPos, int endPos, float slope, const SincTable& sincTable) {
    // Calculate mipmap parameters
    const float rangeSize = static_cast<float>(endPos - startPos);
    float samplesPerFrame = std::abs(slope) * rangeSize;
    float octave = std::max(0.0f, std::log2(samplesPerFrame));
    int layer = static_cast<int>(std::ceil(octave));
    
    // Calculate spacings for adjacent mipmap levels  
    int spacing1 = static_cast<int>(std::pow(2, layer));
    int spacing2 = static_cast<int>(std::pow(2, layer + 1));
    
    // Pre-scale phase by range size
    const float scaledPhase = phase * rangeSize;
    
    // Get interpolated signals within the specified range
    float sig1 = sincInterpolate(scaledPhase, buffer, bufSize, startPos, endPos, spacing1, sincTable);
    float sig2 = sincInterpolate(scaledPhase, buffer, bufSize, startPos, endPos, spacing2, sincTable);

    // Crossfade between the two interpolated signals
    return lerp(sig1, sig2, sc_wrap(octave, 0.0f, 1.0f));
}

// ===== MULTI-CYCLE WAVETABLE UTILITIES =====

inline float wavetableInterpolate(float phase, const float* buffer, int bufSize, int numCycles, float cyclePos, RampToSlope& rampToSlope, const SincTable& sincTable) {
    
    // Calculate slope
    float slope = rampToSlope.process(phase);
    
    // Calculate cycle parameters
    const int cycleSamples = bufSize / numCycles;
    
    // GO book approach: wrap cyclePos to 0-1, then scale by numCycles
    //const float wrappedPos = sc_wrap(cyclePos, 0.0f, 1.0f);
    //const float scaledPos = wrappedPos * static_cast<float>(numCycles);
    
    // OscOS approach: clip cyclePos to 0-1, then scale by (numCycles - 1)
    const float clippedPos = sc_clip(cyclePos, 0.0f, 1.0f);
    const float scaledPos = clippedPos * static_cast<float>(numCycles - 1);
    const int intPart = static_cast<int>(scaledPos);
    const float fracPart = scaledPos - static_cast<float>(intPart);
    
    // Calculate cycle indices
    const int cycleIndex1 = intPart % numCycles;
    const int cycleIndex2 = (intPart + 1) % numCycles;
    
    // Calculate start/end positions for each cycle
    const int startPos1 = cycleIndex1 * cycleSamples;
    const int endPos1 = startPos1 + cycleSamples;
    const int startPos2 = cycleIndex2 * cycleSamples;
    const int endPos2 = startPos2 + cycleSamples;
    
    // Process each cycle
    float output1 = mipmapInterpolate(phase, buffer, bufSize, startPos1, endPos1, slope, sincTable);
    float output2 = mipmapInterpolate(phase, buffer, bufSize, startPos2, endPos2, slope, sincTable);
    
    // Crossfade between the two cycles
    return lerp(output1, output2, fracPart);
}

} // namespace Utils
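Here is a small standalone sketch of the difference between the two cyclePos mappings in wavetableInterpolate. It only computes which two cycles get crossfaded and by how much. The CycleBlend struct and both function names are hypothetical, and std::clamp stands in for sc_clip. With clipping and scaling by (numCycles - 1), cyclePos = 1 lands exactly on the last cycle. The wrapping variant instead crossfades the last cycle back into cycle 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type: the two cycles to crossfade and the blend amount.
struct CycleBlend {
    int index1;  // first cycle index
    int index2;  // second cycle index
    float frac;  // crossfade amount toward index2 (0 = only index1)
};

// OscOS-style mapping (the active code above): clip to [0, 1], scale by
// (numCycles - 1), so cyclePos = 1 selects the last cycle with frac = 0.
CycleBlend clipMapping(float cyclePos, int numCycles) {
    assert(numCycles >= 1);
    const float clipped = std::clamp(cyclePos, 0.0f, 1.0f);
    const float scaled = clipped * static_cast<float>(numCycles - 1);
    const int intPart = static_cast<int>(scaled);
    return {intPart % numCycles, (intPart + 1) % numCycles,
            scaled - static_cast<float>(intPart)};
}

// GO-book-style mapping (the commented-out variant): wrap to [0, 1), scale
// by numCycles, so the last cycle crossfades back into cycle 0.
CycleBlend wrapMapping(float cyclePos, int numCycles) {
    assert(numCycles >= 1);
    const float wrapped = cyclePos - std::floor(cyclePos);
    const float scaled = wrapped * static_cast<float>(numCycles);
    const int intPart = static_cast<int>(scaled);
    return {intPart % numCycles, (intPart + 1) % numCycles,
            scaled - static_cast<float>(intPart)};
}
```

The clip version suits a morph control that should reach the last table and stop there. The wrap version suits continuous scanning through the tables in a loop.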

I've made some optimizations to the sincInterpolate function. I've completely unrolled the hot for loop and got rid of all the function calls (no call overhead), replaced all the array lookups with hardcoded integers / floats, and used bitwise masking in place of the costly sc_wrap calls.
I have also replaced the log/pow functions in the less frequently called mipmapInterpolate function with fast log / pow approximations by Jatin Chowdhury.
This led to an unbelievable performance gain: with DualOscOS I was at about 10-12% CPU before, now it's about 2-3%.
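Here is a minimal sketch of the two tricks described above. It assumes IEEE-754 single-precision floats and a power-of-two cycle length. wrapMask gives the same result as a modulo-based wrap, but only when the size is a power of two. fastLog2 is a generic exponent/mantissa approximation with a maximum error of roughly 0.01. It is not Jatin Chowdhury's implementation, only an illustration of the same idea. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Generic integer wrap into [0, size) that also handles negative indices.
int wrapModulo(int index, int size) {
    const int r = index % size;
    return r < 0 ? r + size : r;
}

// Same result with a single AND. Valid only for power-of-two sizes; relies on
// two's complement for negative indices (guaranteed since C++20, universal in practice).
int wrapMask(int index, int size) {
    assert(size > 0 && (size & (size - 1)) == 0);
    return index & (size - 1);
}

// Bit-trick log2 (generic technique, hypothetical name). Assumes x > 0 and normal.
// Splits x into exponent e and mantissa m in [1, 2), then approximates log2(m)
// with a quadratic that is exact at m = 1 and m = 2.
float fastLog2(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;
    bits = (bits & 0x007FFFFFu) | 0x3F800000u; // exponent field set to 0 -> m in [1, 2)
    float m;
    std::memcpy(&m, &bits, sizeof m);
    const float poly = (-1.0f / 3.0f * m + 2.0f) * m - 5.0f / 3.0f;
    return static_cast<float>(exponent) + poly;
}
```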

inline float sincInterpolate(float scaledPhase, const float* buffer, int bufSize, int startPos, int endPos, int sampleSpacing, const SincTable& sincTable) {

    const float invSampleSpacing = 1.0f / static_cast<float>(sampleSpacing);
    const float sampleIndex = scaledPhase * invSampleSpacing;
    const int intPart = static_cast<int>(sampleIndex);
    const float fracPart = sampleIndex - static_cast<float>(intPart);

    const float sincOffset = fracPart * SincTable::SPACING;
    const float* const sincData = sincTable.data();

    // Power-of-2 optimization: the & mask wraps the index, so (endPos - startPos) must be a power of two
    const int mask = (endPos - startPos) - 1;
    const int basePos = ((intPart * sampleSpacing - startPos) & mask);
    
    float result = 0.0f;
    
    {
        // i = 0: offset = -4
        const int wavePos = startPos + ((basePos - 4 * sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
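        const float sincPos = 8192.0f - sincOffset; // 0 - sincOffset shifted by the table size: keeps it positive so the int cast acts like floor (the & 8191 removes the shift)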
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 1: offset = -3
        const int wavePos = startPos + ((basePos - 3 * sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 1024.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 2: offset = -2
        const int wavePos = startPos + ((basePos - 2 * sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 2048.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 3: offset = -1
        const int wavePos = startPos + ((basePos - sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 3072.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 4: offset = 0
        const int wavePos = startPos + basePos;
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 4096.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 5: offset = 1
        const int wavePos = startPos + ((basePos + sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 5120.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 6: offset = 2
        const int wavePos = startPos + ((basePos + 2 * sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 6144.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }
    
    {
        // i = 7: offset = 3
        const int wavePos = startPos + ((basePos + 3 * sampleSpacing) & mask);
        const float waveSample = buffer[wavePos];
        
        const float sincPos = 7168.0f - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & 8191;
        const int sincIdx2 = (sincIdx1 + 1) & 8191;
        const float sincSample = sincData[sincIdx1] + sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]);
        result += waveSample * sincSample;
    }

    return result;
}
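For comparison, here is a sketch of the same computation written as a plain loop. This is the form an optimiser would normally unroll on its own. It assumes a table of 8 taps x 1024 points (8192 floats), inferred from the hardcoded constants above. SincTableSketch is a hypothetical stand-in for the plugin's SincTable, and sincInterpolateLoop is not part of any existing code. One deliberate difference: every sinc position is offset by the table size so it never goes negative. static_cast<int> truncates toward zero, which gives a wrong fractional part for the i = 0 tap when sincPos < 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the plugin's SincTable (assumption: 8 taps with
// 1024 points each = 8192 floats, matching the constants hardcoded above).
struct SincTableSketch {
    static constexpr int TAPS = 8;
    static constexpr int SPACING = 1024;
    static constexpr int SIZE = TAPS * SPACING; // 8192
    std::vector<float> values = std::vector<float>(SIZE, 0.0f);
    const float* data() const { return values.data(); }
};

// Rolled version of sincInterpolate (unused bufSize parameter dropped).
// TAPS is a compile-time constant, so an optimised build can unroll this loop
// itself; the generated assembly shows whether it did.
inline float sincInterpolateLoop(float scaledPhase, const float* buffer,
                                 int startPos, int endPos, int sampleSpacing,
                                 const SincTableSketch& sincTable) {
    constexpr int TAPS = SincTableSketch::TAPS;
    constexpr int SPACING = SincTableSketch::SPACING;
    constexpr int SIZE = SincTableSketch::SIZE;

    const float invSampleSpacing = 1.0f / static_cast<float>(sampleSpacing);
    const float sampleIndex = scaledPhase * invSampleSpacing;
    const int intPart = static_cast<int>(sampleIndex);
    const float fracPart = sampleIndex - static_cast<float>(intPart);
    const float sincOffset = fracPart * static_cast<float>(SPACING);
    const float* const sincData = sincTable.data();

    // The mask only acts as a wrap if the cycle length is a power of two.
    const int cycleLength = endPos - startPos;
    const int mask = cycleLength - 1;
    assert(cycleLength > 0 && (cycleLength & mask) == 0);
    const int basePos = (intPart * sampleSpacing - startPos) & mask;

    float result = 0.0f;
    for (int i = 0; i < TAPS; ++i) {
        const int offset = i - TAPS / 2; // -4 .. 3, same taps as the unrolled code
        const float waveSample =
            buffer[startPos + ((basePos + offset * sampleSpacing) & mask)];

        // + SIZE keeps sincPos positive so the int cast behaves like floor;
        // the & (SIZE - 1) below removes the offset again.
        const float sincPos = static_cast<float>(SIZE + i * SPACING) - sincOffset;
        const int sincIntPart = static_cast<int>(sincPos);
        const float sincFracPart = sincPos - static_cast<float>(sincIntPart);
        const int sincIdx1 = sincIntPart & (SIZE - 1);
        const int sincIdx2 = (sincIdx1 + 1) & (SIZE - 1);
        result += waveSample * (sincData[sincIdx1] +
                                sincFracPart * (sincData[sincIdx2] - sincData[sincIdx1]));
    }
    return result;
}
```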

Optimisers do this for you. Did you check that the optimiser failed to do this before doing it by hand?

Just to check… are you compiling in release mode?

Hey, thanks for your reply. I haven't checked that; I just noticed the high CPU load and tried to come up with a faster version. How can I check that?
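One way to check is to look at the generated assembly and see whether the loop was unrolled, either by compiling with -S or by pasting the function into Compiler Explorer (godbolt.org). Another is to time both versions outside the server. The helper below is a rough sketch for that; nanosPerCall is a hypothetical name. The numbers only mean something if it is built with the same optimisation flags as the plugin.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Rough timing helper (a sketch, not a rigorous benchmark): calls fn(n) for
// n = 0 .. iterations - 1 and returns the average time per call in nanoseconds.
template <typename Fn>
double nanosPerCall(Fn&& fn, std::size_t iterations) {
    assert(iterations > 0);
    volatile float sink = 0.0f; // every result is stored, so the calls can't be dropped
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t n = 0; n < iterations; ++n) {
        sink = sink + fn(n);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

To compare the two versions, wrap each one in a lambda that varies its phase argument with n. Otherwise the compiler may compute a constant result only once.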

I'm not sure about release mode. Do I have to set that in my CMakeLists.txt?
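With single-config generators (Makefiles, Ninja), CMake adds no optimisation flags at all unless CMAKE_BUILD_TYPE is set, e.g. cmake -DCMAKE_BUILD_TYPE=Release .. when configuring. With multi-config generators (Visual Studio, Xcode), the configuration is chosen at build time with cmake --build . --config Release. The sketch below lets a small test program report how it was compiled; buildModeDescription is a hypothetical helper. It relies on CMake's default Release / RelWithDebInfo / MinSizeRel flags defining NDEBUG, and on GCC/Clang defining __OPTIMIZE__ whenever optimisation is on.

```cpp
#include <cassert>
#include <string>

// Reports the build mode of this translation unit (sketch, hypothetical name).
// Note: when NDEBUG is defined, assert() itself is compiled out.
std::string buildModeDescription() {
    std::string desc;
#ifdef NDEBUG
    desc += "NDEBUG defined (release-style config)";
#else
    desc += "NDEBUG not defined (debug-style config)";
#endif
#if defined(__OPTIMIZE__)
    desc += ", optimiser on";
#elif defined(__GNUC__) || defined(__clang__)
    desc += ", optimiser off";
#else
    desc += ", optimiser state unknown (non-GCC/Clang compiler)";
#endif
    return desc;
}
```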

I was also reading about inline functions: Standard C++
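On inline: in standard C++ the keyword mainly lets a function be defined in a header that several .cpp files include, without violating the One Definition Rule. Whether a call actually gets inlined is still up to the optimiser. To force it, GCC/Clang and MSVC offer their own non-standard attributes. Here is a sketch; SKETCH_FORCE_INLINE and lerpSketch are hypothetical names.

```cpp
#include <cassert>

// Compiler-specific "force inline" (non-standard; falls back to plain inline).
#if defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Linear interpolation between a and b, t in [0, 1].
SKETCH_FORCE_INLINE float lerpSketch(float a, float b, float t) {
    return a + t * (b - a);
}
```

Forcing inlining rarely beats an optimised build's own decisions. It is mostly worth trying once profiling shows a small hot function that was not inlined.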